In the rapidly evolving landscape of intelligent manufacturing, industrial sorting processes face complex nonlinear constraints such as multi-target recognition, highly dynamic environments, and multi-modal interaction. Computer vision-driven intelligent perception has become a key technological support for building closed-loop, collaborative cyber-physical systems. Focusing on industrial informatization scenarios, this paper combines deep learning visual modeling, digital twin trajectory optimization, and self-healing compliant control to construct an industrial AI robot sorting system with autonomous perception and dynamic execution capabilities. The system is designed to handle heterogeneous objects under complex working conditions, ensuring high robustness and adaptability.
Computer vision aims to achieve high-precision target detection and classification through image acquisition, feature extraction, pattern recognition, and deep learning algorithms. In industrial sorting tasks, computer vision is primarily used for target recognition, pose estimation, defect detection, and sorting optimization. During the image acquisition phase, the system utilizes high-resolution industrial cameras to capture RGB or multi-spectral images. Image preprocessing includes gamma correction, histogram equalization, and noise filtering to ensure the accuracy of subsequent feature extraction. In the feature extraction stage, algorithms such as Scale-Invariant Feature Transform (SIFT), Speeded-Up Robust Features (SURF), and Histogram of Oriented Gradients (HOG) are employed to extract geometric features, color information, and texture patterns of target objects. Pattern recognition adopts deep learning frameworks to achieve target detection, defect identification, and classification optimization, making it suitable for high-speed conveyor belt sorting tasks. The integration of these technologies enables the AI robot to perceive and interpret its environment effectively.
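As an illustration of the preprocessing stage, the gamma-correction and histogram-equalization steps can be sketched in plain NumPy. This is a minimal sketch: the function names and the synthetic low-contrast frame are illustrative, not from the paper.

```python
import numpy as np

def gamma_correct(img: np.ndarray, gamma: float) -> np.ndarray:
    """Apply gamma correction to a uint8 grayscale image via a lookup table."""
    # out = 255 * (in / 255) ** gamma, precomputed for all 256 intensities
    lut = (255.0 * (np.arange(256) / 255.0) ** gamma).astype(np.uint8)
    return lut[img]

def equalize_hist(img: np.ndarray) -> np.ndarray:
    """Histogram equalization via the cumulative distribution function."""
    hist = np.bincount(img.ravel(), minlength=256)
    cdf = hist.cumsum()
    # Normalize the CDF to [0, 1], guarding against a constant image
    cdf_norm = (cdf - cdf.min()) / max(cdf.max() - cdf.min(), 1)
    lut = (255.0 * cdf_norm).astype(np.uint8)
    return lut[img]

# Synthetic low-contrast frame: mostly gray with one brighter pixel
frame = np.full((4, 4), 100, dtype=np.uint8)
frame[0, 0] = 120
bright = gamma_correct(frame, 0.5)  # gamma < 1 brightens mid-tones
flat = equalize_hist(frame)         # spreads the narrow histogram
```

In a production pipeline these lookup-table operations would typically be delegated to an optimized library, but the table-based formulation shown here is exactly what such libraries compute internally.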

The overall architecture of the intelligent AI robot sorting system consists of three core functional units: the computer vision module based on deep learning, the adaptive robot path planning module based on digital twins, and the multi-modal robot intelligent grasping module based on self-healing grasping. The vision module is responsible for target recognition and feature extraction, achieving precise target perception in complex environments through multi-scale fusion and attention mechanisms. The path planning module uses digital twin technology to build a virtual-physical integrated model, combining dynamic feedback and task switching strategies to generate real-time optimal motion trajectories. The intelligent grasping module relies on a six-axis robotic arm and self-healing strategies, adjusting posture and gripping force based on multi-modal feedback to ensure grasping stability. These three modules operate synergistically to achieve closed-loop control spanning perception, decision-making, and execution, ensuring that the AI robot can handle dynamic changes in the sorting process efficiently.
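The three-module closed loop can be sketched as a pipeline of swappable stages. Everything below is a hypothetical skeleton: the names, data shapes, and stand-in return values are placeholders for the actual modules, intended only to show how perception, planning, and grasping compose.

```python
from dataclasses import dataclass

@dataclass
class Target:
    label: str
    pose: tuple  # (x, y, theta) on the conveyor belt

def perceive(frame_id: int) -> Target:
    """Vision-module stand-in: detect and localize one target."""
    return Target(label="gear", pose=(0.4, 0.1, 0.0))

def plan(target: Target) -> list:
    """Path-planning stand-in: return a trajectory of waypoints."""
    x, y, _ = target.pose
    return [(0.0, 0.0), (x / 2, y / 2), (x, y)]

def grasp(trajectory: list) -> bool:
    """Grasping stand-in: execute the trajectory and report success."""
    return len(trajectory) > 0

def sorting_cycle(frame_id: int) -> bool:
    """One perception -> decision -> execution loop."""
    return grasp(plan(perceive(frame_id)))

ok = sorting_cycle(0)
```

Keeping each stage behind a plain callable interface is what lets the digital-twin planner or the self-healing gripper be replaced without touching the other modules.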
The computer vision module design based on deep learning is crucial for the AI robot’s perception capabilities. Feature screening primarily relies on the Convolutional Neural Network (CNN) framework, combining multi-scale receptive fields, attention mechanisms, and pooling strategies to achieve feature extraction and screening optimization. In the feature screening process, the CBR (Convolution-Batch Normalization-ReLU) module is used to extract initial features, and dilated convolutions (SConv) expand the receptive field to enhance the capture of target edges and texture details. Subsequently, a parallel pooling mechanism is introduced, using MaxPool to retain local extreme value information, MeanPool to reduce noise interference, and global average pooling (AvgPool) to calculate global feature distribution, improving the robustness of feature expression. The global average pooling operation is defined as:
$$ F_{Avg}(c) = \frac{1}{H \times W} \sum_{i=1}^{H} \sum_{j=1}^{W} F_{ij}(c) $$
where \( F_{Avg}(c) \) is the global feature mean on channel \( c \), \( H \) and \( W \) are the height and width of the feature map, respectively, and \( F_{ij}(c) \) is the feature value at position \( (i, j) \) on channel \( c \). This operation effectively integrates global information and improves the model’s generalization ability. To enhance the accuracy of feature screening, a channel attention mechanism is introduced, with weight normalization to enhance key features and suppress redundant information. Its calculation formula is:
$$ \alpha_c = \frac{\exp(F_{Avg}(c))}{\sum_{k=1}^{C} \exp(F_{Avg}(k))} $$
where \( \alpha_c \) is the normalized weight of channel \( c \), and \( C \) is the total number of channels. This mechanism adaptively allocates weights to different feature channels, allowing the model to focus more on the salient features of sorting targets. Finally, feature mapping compression is performed through 1×1 convolution to improve the computational efficiency of feature screening. This approach ensures that the AI robot can accurately identify and classify objects in real-time.
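The two formulas above, global average pooling followed by a softmax over channels, can be reproduced directly in NumPy. The max-subtraction inside the softmax is a standard numerical-stability trick that leaves the result mathematically unchanged; the random feature map is illustrative.

```python
import numpy as np

def channel_attention(F: np.ndarray) -> np.ndarray:
    """Reweight the channels of a (C, H, W) feature map.

    Implements F_Avg(c) = mean over the H x W plane of channel c,
    then alpha_c = softmax over channels, then per-channel scaling.
    """
    f_avg = F.mean(axis=(1, 2))          # F_Avg(c), shape (C,)
    e = np.exp(f_avg - f_avg.max())      # stabilized softmax numerator
    alpha = e / e.sum()                  # alpha_c, sums to 1 over channels
    return alpha[:, None, None] * F      # broadcast the weights over H, W

F = np.random.default_rng(0).normal(size=(8, 16, 16))
F_att = channel_attention(F)
```

In a real network the weights would feed back through training; here the point is only that the attention map is a convex combination over channels driven by each channel's global mean.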
In industrial AI robot intelligent sorting scenarios, target size, shape, and texture features are highly variable, making single-scale feature extraction inadequate for complex working conditions. Therefore, a multi-scale fusion mechanism is constructed to fully utilize local details and global information to improve target sorting accuracy. Multi-scale feature fusion is primarily based on deep convolutional neural networks that use convolutional layers with different receptive fields to extract feature maps, and fusion is achieved through cross-scale feature interaction and cascading mechanisms. Let the input image be \( I \), and feature maps at different scales be denoted as \( F_s^l \) (where \( s \) indexes the scale and \( l \) the network layer). The fused multi-scale feature is expressed as:
$$ F_{MSFF} = \sum_{s=1}^{S} w_s \cdot g(F_s^L) + \sum_{l=1}^{L} v_l \cdot h(F_S^l) $$
where \( F_{MSFF} \) is the final fused feature, \( S \) is the number of scales, \( L \) is the number of layers, \( w_s \) and \( v_l \) are the scale weights and layer weights, respectively, and \( g(\cdot) \) and \( h(\cdot) \) are the scale normalization and channel attention operators. This weighted superposition of cross-scale features guides the network to focus on discriminative information at different levels, achieving synergistic optimization of global structure and local details and preserving the separability of small-scale targets in deep semantic spaces. This multi-scale approach enhances the AI robot’s ability to handle objects of varying sizes and orientations.
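A minimal numerical sketch of the weighted fusion follows, with \( g(\cdot) \) and \( h(\cdot) \) reduced to simple per-map normalization as stand-ins for the paper's scale-normalization and attention operators; the map shapes and weight values are illustrative.

```python
import numpy as np

def fuse_multiscale(feats_by_scale, feats_by_layer, w, v):
    """Weighted cross-scale fusion following the F_MSFF equation.

    feats_by_scale: S maps taken at the deepest layer (resized to one shape).
    feats_by_layer: L maps taken at the finest scale (same shape).
    w, v: scale and layer weights.
    """
    def g(F):  # scale-normalization stand-in: zero-mean, unit-variance
        return (F - F.mean()) / (F.std() + 1e-8)
    h = g      # channel-attention stand-in, reduced to the same normalization
    out = sum(ws * g(F) for ws, F in zip(w, feats_by_scale))
    out = out + sum(vl * h(F) for vl, F in zip(v, feats_by_layer))
    return out

rng = np.random.default_rng(1)
scales = [rng.normal(size=(8, 8)) for _ in range(3)]   # S = 3
layers = [rng.normal(size=(8, 8)) for _ in range(2)]   # L = 2
F_msff = fuse_multiscale(scales, layers, w=[0.5, 0.3, 0.2], v=[0.6, 0.4])
```

The essential structural point carried over from the equation is that both sums operate on maps brought to a common resolution before the weighted superposition.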
The adaptive robot path planning module based on digital twins constructs a three-dimensional integrated architecture of “physical layer-mapping data layer-virtual layer,” relying on digital twin technology to achieve dynamic path reconstruction. The physical layer devices include touchscreen HMI, controller, servo drive module, stepper driver, photoelectric sensors, torque sensor, and industrial camera. The mapping data layer uses OPC UA as the core protocol to model virtual mappings of multi-source data. The G120 module outputs conveyor belt start/stop speed, the DM860 driver provides stepper start/stop position information, IN16X sensors feedback blocking/positioning cylinder signals, IB20-1700 transmits end-effector six-dimensional force data, and IS2000 collects workpiece color offset information. Through data standardization and label binding, a state quantity mapping vector is constructed and uniformly input into the digital twin control engine. The virtual layer uses improved rigid body dynamics modeling to construct a coupled multi-joint state space. The core equation for path planning is:
$$ M(q) \ddot{q} + C(q, \dot{q}) \dot{q} + G(q) + \mu(t) \text{sgn}(\dot{q}) = \tau $$
where \( q \) is the joint position vector, \( \dot{q} \) is the joint velocity, \( \ddot{q} \) is the joint acceleration, \( M(q) \) is the joint inertia matrix reflecting system mass coupling, \( C(q, \dot{q}) \) is the Coriolis and centrifugal force matrix describing nonlinear velocity disturbances, \( G(q) \) is the gravity term vector, \( \tau \) is the joint driving torque, \( \mu(t) \) is the time-dependent friction compensation factor modeled and updated online by particle swarm optimization, and \( \text{sgn}(\dot{q}) \) is the joint velocity sign function. The system virtualizes the G120 output as a conveyor transmission body in the 3D twin environment, models stepper control and cylinder behavior as rigid units such as sliding pairs/displacement pairs, and drives the virtual gripper compliance modeling with six-dimensional torque data, forming a complete dynamic interaction model. Path planning logic is processed by 5G-MEC edge-cloud collaboration, which performs edge fusion processing on physical layer feedback information to generate real-time trajectory optimization solutions. A dynamic task switching strategy is embedded to reconstruct trajectories based on order rhythm and workpiece position changes, achieving task-level path autonomous evolution. This digital twin approach enables the AI robot to adapt to changing environments seamlessly.
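For intuition, the rigid-body equation above can be instantiated for a single revolute joint, where the Coriolis/centrifugal term \( C(q,\dot{q})\dot{q} \) vanishes. The mass, link length, and friction coefficient below are illustrative values, and the constant `mu` stands in for the paper's PSO-updated friction factor \( \mu(t) \).

```python
import numpy as np

def joint_torque(q, qd, qdd, m=2.0, l=0.5, mu=0.05, g=9.81):
    """Required drive torque for one revolute joint (planar point-mass link).

    1-DOF instance of M(q)*qdd + C(q,qd)*qd + G(q) + mu*sgn(qd) = tau,
    with the Coriolis/centrifugal term identically zero for a single joint.
    """
    M = m * l**2                  # inertia of a point mass at distance l
    G = m * g * l * np.cos(q)     # gravity torque about the joint axis
    friction = mu * np.sign(qd)   # Coulomb friction compensation term
    return M * qdd + G + friction

tau = joint_torque(q=0.0, qd=0.3, qdd=1.2)
```

For the multi-joint arm of the paper, \( M \), \( C \), and \( G \) become configuration-dependent matrices, but each term plays the same role as in this scalar case.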
The multi-modal robot intelligent grasping module based on self-healing grasping is essential for the AI robot’s execution phase. The robot mechanical arm design adopts a six-degree-of-freedom serial structure. Joints 1 to 6 use high-precision harmonic reducers and servo motors to ensure high-precision motion control. The base (joint 1) achieves 360° rotation to adjust the overall operation direction, equipped with an absolute encoder to provide global positioning accuracy. The shoulder (joint 2) and elbow (joint 3) control the forward and backward swing of the mechanical arm, using AC servo motors combined with RV reducers to improve load capacity and stiffness. The wrist (joint 4) can rotate ±180° to adjust the grasping posture and integrates a Hall effect angle sensor for closed-loop control. Joint 5 is used for pitch adjustment with a maximum rotation angle of ±135°. Joint 6 drives the end-effector rotation, and its symmetrically structured harmonic reducer improves torque output stability. This design ensures that the AI robot can perform precise and stable grasping actions.
The self-healing grasping strategy relies on a multi-modal feedback loop and a virtual-physical linkage structure, building a grasping quality evaluation index system with contact area ratio (\( \eta \)), slip amount (\( \delta \)), and deformation coefficient (\( \epsilon \)) as the core. When real-time detection satisfies \( \eta < 0.8 \) or \( \delta > 1.5 \) mm, the system enters the compensation control state, and the 3D digital twin body starts the reverse simulation engine to generate posture adjustment parameters to optimize the execution mechanism set. The grasping quality function is defined as:
$$ Q = \omega_1 \cdot \eta + \omega_2 \cdot e^{-\delta} + \omega_3 \cdot (1 - \epsilon^2) $$
where \( Q \) is the comprehensive grasping quality evaluation value, and \( \omega_1 \), \( \omega_2 \), \( \omega_3 \) are normalized weighting factors satisfying \( \sum \omega_i = 1 \). The system dynamically determines grasping stability based on the Q function and schedules the backup adsorption module via the MEC node, implementing non-rigid contact mechanism switching, and corrects the gripping posture with a tactile sensor array, ultimately achieving the closed-loop self-healing logic of perception-evaluation-compensation-execution. This strategy enhances the AI robot’s ability to handle fragile or irregular objects without damage.
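The quality function and the compensation trigger can be sketched directly. The weights \( \omega_i \) below are illustrative values summing to 1, since the paper does not report its actual weighting.

```python
import math

def grasp_quality(eta, delta, eps, w=(0.4, 0.35, 0.25)):
    """Comprehensive grasp quality Q = w1*eta + w2*exp(-delta) + w3*(1 - eps^2).

    eta: contact area ratio, delta: slip amount in mm,
    eps: deformation coefficient.  Weights are illustrative.
    """
    w1, w2, w3 = w
    return w1 * eta + w2 * math.exp(-delta) + w3 * (1 - eps**2)

def needs_compensation(eta, delta):
    """Trigger condition from the text: eta < 0.8 or delta > 1.5 mm."""
    return eta < 0.8 or delta > 1.5

q_good = grasp_quality(eta=0.92, delta=0.2, eps=0.1)  # stable grasp
q_bad = grasp_quality(eta=0.55, delta=2.0, eps=0.4)   # triggers compensation
```

Note how the exponential slip term penalizes even small slips smoothly, while the quadratic deformation term stays forgiving for small \( \epsilon \), which matches the strategy's emphasis on protecting fragile objects.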
Testing and analysis were conducted to evaluate the performance of the AI robot sorting system. The experiment was carried out in an intelligent manufacturing laboratory, with target objects covering regular rigid workpieces (metal gears, electronic components) and irregular flexible items (packaging bags, soft plastics). The sorting task ran on a high-speed industrial conveyor belt (1.35 m/s) and was tested under different lighting intensities (300 lx, 600 lx, 900 lx) and object placement modes (single layer, stacked, tilted) to measure system accuracy. A traditional system (static path planning with a fixed force control strategy) was compared against the proposed system. The specific tests were:

1. Target detection: industrial cameras capture images and output target category, position, and posture information.
2. Path planning: the traditional system uses the Dijkstra algorithm for static path planning, while the proposed system combines reinforcement learning with the digital twin system to achieve dynamic trajectory optimization.
3. Grasping execution: the traditional system grasps with a preset clamping force, while the proposed system adjusts the gripping force and end-effector mode based on the self-healing grasping strategy and a deep learning model that predicts the target material.
4. Sorting placement: the robot places the target object in a designated area, and key indicators such as grasping success rate, mis-grasp rate, and average execution time are calculated and recorded.
| Test Metric | Traditional System | Proposed System |
|---|---|---|
| Sorting Efficiency (items/min) | 18.42 | 26.73 |
| Grasping Success Rate (%) | 82.31 | 94.67 |
| Mis-grasp Rate (%) | 6.28 | 2.94 |
| Average Execution Time (s) | 3.17 | 2.21 |
The test results demonstrate the superiority of the proposed AI robot system. In terms of sorting efficiency, the proposed system achieves 26.73 items/min, higher than the traditional system’s 18.42 items/min, indicating that the proposed system effectively reduces redundant movements and improves operation rhythm. For grasping success rate, the proposed system achieves 94.67%, higher than the traditional system’s 82.31%, showing that the self-healing grasping strategy enhances adaptability to objects of different materials and shapes. Regarding mis-grasp rate, the proposed system reduces it to 2.94%, indicating that adaptive control based on the digital twin system effectively reduces misoperations. The average execution time of the proposed system is also shortened to 2.21 s, proving that digital twin path planning reduces unnecessary path switching and improves task execution efficiency. These results highlight the effectiveness of the AI robot in real-world industrial applications.
In conclusion, this paper constructs an intelligent AI robot sorting system architecture that integrates feature extraction, path planning, and self-healing grasping. Based on computer vision, multi-scale feature fusion is achieved, combined with digital twin simulation to realize dynamic path planning, and through self-healing strategies, grasping stability and structural robustness are improved. The research results show that the designed system exhibits high efficiency, adaptability, and intelligent collaborative control capabilities in industrial heterogeneous environments. The AI robot system represents a significant advancement in automated sorting technology, paving the way for more flexible and resilient manufacturing processes. Future work will focus on enhancing the AI robot’s learning capabilities through reinforcement learning and expanding its application to more complex scenarios.