Vision-Guided Target Stacking Technology for Controllable Metamorphic Palletizing Robots

In modern industrial automation, palletizing robots play a crucial role in material handling and logistics. However, these robots often face challenges in accurately identifying and grasping diverse materials due to complex stacking scenarios, where visual overlaps between targets and the robot structure can lead to low success rates. Traditional control methods, such as single PID techniques, struggle to adapt to varying connection forms in the servo control structure of robotic arms under different stacking conditions. To address these limitations, we propose a novel vision-guided target stacking technology for controllable metamorphic palletizing robots. This approach integrates high-precision visual recognition, dynamic modeling, PID feedback control, and intelligent trajectory planning to enhance stacking accuracy and efficiency. By leveraging advanced robot technology, our method ensures robust performance in single-target, simple, and complex stacking environments, achieving over 90% success rates in challenging scenarios.

The core of our robot technology involves a vision-guided system that captures real-time images of stacking materials using high-resolution cameras. These images are processed through machine vision software to extract critical information such as material type, position, and orientation. This visual data guides the robotic arm’s movements, enabling precise target identification and reducing visual overlaps. We analyze the connection forms of the servo control structure under visual guidance to minimize interference and construct a dynamic model for stacking operations. Combined with a PID feedback controller, this model allows for exact control of the robot’s stacking behavior. Furthermore, we define target stacking samples, create pose point datasets, and apply data enhancement techniques to improve recognition accuracy. By evaluating grab priorities and determining auxiliary stacking relationships, we plan optimal robot trajectories, ensuring efficient and reliable stacking. Experimental results demonstrate that our robot technology achieves a 100% success rate in single-target and simple stacking scenarios and exceeds 90% in complex environments, highlighting its practical applicability in industrial settings.

To elaborate on the vision-guided system, we utilize industrial cameras with high resolution and anti-glare capabilities to capture 2D images and depth information of the stacking area. The image processing software employs 3D vision algorithms and intelligent trajectory planning to derive accurate grasping, placement, and trajectory points. This robot technology enables the robotic arm to perform rapid and precise movements. The motion control platform translates these points into actionable commands, coordinating the robot’s joints and end-effectors. The visual guidance expression can be represented as:

$$ Q = \frac{w_{\text{max}} - w_{\text{min}}}{\omega - 1} + \xi q W $$

where \( w_{\text{max}} \) and \( w_{\text{min}} \) are the maximum and minimum positioning parameters of the target stacking material, \( \omega \) is the path planning coefficient, \( W \) is the positioning vector of the grab sample point, \( \xi \) is the image participation parameter, and \( q \) is the preprocessing feature of the visual image. This equation ensures that the robotic arm adapts to dynamic environments, a key aspect of advanced robot technology.
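To make the relation concrete, the following is a minimal sketch of how the guidance quantity \( Q \) could be evaluated from measured parameters. The function name and all sample values are illustrative assumptions, not values taken from the experiments.

```python
import numpy as np

def guidance_quantity(w_max, w_min, omega, xi, q, W):
    """Evaluate Q = (w_max - w_min) / (omega - 1) + xi * q * W.

    W is the positioning vector of the grab sample point, so Q is
    returned as a vector of the same shape.
    """
    W = np.asarray(W, dtype=float)
    return (w_max - w_min) / (omega - 1.0) + xi * q * W

# Illustrative values only (not taken from the experiments)
Q = guidance_quantity(w_max=0.85, w_min=0.15, omega=3.0, xi=0.6, q=0.9,
                      W=[120.0, 240.0, 310.0])
print(Q)  # guidance vector used to steer the arm toward the target
```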

Next, we design the servo control structure for the visually guided robotic arm. As shown in the control model, a visual camera detects the target’s position and converts coordinate parameters into the guidance coordinate system. The orientation angles of the target stacking object are integrated with the initial material position to plan the stacking trajectory. A pose controller and servo encoder serve as core modules, generating detection instructions based on the target’s location. The relationship between the projected sample point coordinates \( (X_1, Y_1, Z_1) \) and the real coordinates \( (X_0, Y_0, Z_0) \) is given by:

$$
\begin{cases}
X_1 = X_0 \sin \sigma_X \\
Y_1 = Y_0 \cos \sigma_Y \\
Z_1 = Z_0 \tan \sigma_Z
\end{cases}
$$

where \( \sigma_X \), \( \sigma_Y \), and \( \sigma_Z \) are the angles between the stacking trajectory and the axes of the guidance coordinate system. Combining this with the visual guidance expression, the standard definition of the servo control structure is:

$$ E = Q \times \frac{e_X e_Y e_Z}{X_1 Y_1 Z_1} $$

Here, \( e_X \), \( e_Y \), and \( e_Z \) are the pose vectors of the robotic arm along the respective axes. This formulation allows real-time adjustment of the arm’s motion, ensuring accurate stacking actions through sophisticated robot technology.
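The projection and the servo-structure expression can be evaluated together in a few lines. The sketch below follows the two formulas above directly; the angle values and pose vector entries are placeholder assumptions used only to show the computation.

```python
import numpy as np

def project_sample_point(p0, sigma):
    """Map real coordinates (X0, Y0, Z0) to projected coordinates
    (X1, Y1, Z1) using the trajectory angles (sigma_X, sigma_Y, sigma_Z)."""
    x0, y0, z0 = p0
    sx, sy, sz = sigma
    return np.array([x0 * np.sin(sx), y0 * np.cos(sy), z0 * np.tan(sz)])

def servo_structure(Q, e, p1):
    """E = Q * (e_X * e_Y * e_Z) / (X1 * Y1 * Z1)."""
    return Q * np.prod(e) / np.prod(p1)

# Illustrative values only
p1 = project_sample_point((100.0, 200.0, 300.0), np.radians([30.0, 45.0, 20.0]))
E = servo_structure(Q=1.2, e=np.array([0.8, 0.9, 1.1]), p1=p1)
```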

For dynamic modeling of the stacking operation, we consider factors such as robot mass, inertia, damping, and external forces. These elements influence the robot’s stability and precision during stacking. The dynamics equation, based on Newton’s second law and the angular momentum theorem, is expressed as:

$$ r \times u = E (F_{\text{gravity}} + F_{\text{drive}} - F_{\text{friction}} - F_{\text{air}} - R) $$
$$ \psi \times \alpha = M_{\text{drive}} - M_{\text{friction}} - M_{\text{air}} $$

where \( r \) is the robot mass, \( u \) is the acceleration, \( \psi \) is the inertia parameter, \( \alpha \) is the angular acceleration, and \( R = c v \) is the damping vector (with \( c \) as the damping coefficient and \( v \) as the velocity). The terms \( F_{\text{gravity}} \), \( F_{\text{drive}} \), \( F_{\text{friction}} \), and \( F_{\text{air}} \) denote the gravitational, drive, frictional, and air resistance forces, respectively, while the \( M \) terms represent the corresponding moments. Assuming the frictional and air resistances are proportional to velocity, i.e., \( F_{\text{friction}} = \lambda_1 v \) and \( F_{\text{air}} = \lambda_2 v \), the simplified equation becomes:

$$ r \times u = E \left( F_{\text{gravity}} + F_{\text{drive}} - (\lambda_1 + \lambda_2 + c) v \right) $$

This model captures the essential dynamics of the robot technology, enabling precise control over stacking tasks.
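As a rough illustration of how the simplified translational equation can be used in software, the sketch below integrates a one-dimensional version of it with an explicit Euler scheme. All numerical values are placeholder assumptions, not identified parameters of the real robot.

```python
def simulate_axis(mass, E, f_gravity, f_drive, lam1, lam2, c,
                  dt=0.001, steps=2000):
    """Explicit Euler integration of one translational axis of the
    simplified dynamics r*u = E*(F_gravity + F_drive - (lam1+lam2+c)*v)."""
    v, x = 0.0, 0.0
    for _ in range(steps):
        accel = E * (f_gravity + f_drive - (lam1 + lam2 + c) * v) / mass
        v += accel * dt
        x += v * dt
    return x, v

# Placeholder parameters, not calibrated values
x, v = simulate_axis(mass=12.0, E=1.0, f_gravity=-9.81 * 12.0,
                     f_drive=180.0, lam1=0.4, lam2=0.1, c=2.5)
```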

To enhance control accuracy, we incorporate a PID feedback controller that adjusts the robot’s stacking trajectory based on differential, integral, and proportional operations. The controller’s closed-loop structure processes dynamic model inputs to compute a core control variable, which is then applied to the robot’s motion. The PID controller definition is:

$$ T = \left( \frac{o_1 o_2 o_3}{p_1 p_2 p_3} \right) \times \frac{\psi}{\zeta} $$

where \( o_1 \), \( o_2 \), and \( o_3 \) are the differential, integral, and proportional control coefficients, respectively, \( p_1 \), \( p_2 \), and \( p_3 \) are control vectors for different stacking trajectory stages, \( \psi \) is the PID adjustment parameter, and \( \zeta \) is the feedback adjustment parameter. This controller ensures that the robot technology responds swiftly to errors, maintaining high precision in various stacking scenarios.
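The closed-form expression for \( T \) aggregates the three control actions; in discrete-time software the same proportional, integral, and differential corrections are typically applied once per control cycle. The sketch below shows that standard update, with \( o_3 \), \( o_2 \), and \( o_1 \) playing the roles of the proportional, integral, and differential coefficients. It is a generic PID step under those assumptions, not a literal transcription of the expression above, and the gains are placeholders.

```python
class PIDController:
    """Generic discrete PID step; o3, o2 and o1 act as the proportional,
    integral and differential coefficients described in the text."""
    def __init__(self, o1, o2, o3, dt):
        self.o1, self.o2, self.o3, self.dt = o1, o2, o3, dt
        self.integral = 0.0
        self.prev_error = 0.0

    def update(self, setpoint, measured):
        error = setpoint - measured
        self.integral += error * self.dt
        derivative = (error - self.prev_error) / self.dt
        self.prev_error = error
        # correction applied to the stacking trajectory
        return self.o3 * error + self.o2 * self.integral + self.o1 * derivative

pid = PIDController(o1=0.05, o2=0.8, o3=2.0, dt=0.01)  # placeholder gains
u = pid.update(setpoint=0.30, measured=0.27)
```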

For target stacking pose recognition, we define stacking samples as teach points that guide the robot’s movements. By demonstrating a single stacking point, the control system can define a complete linear trajectory through offset processing. The teach equation for a linear trajectory is:

$$ F(a') = \frac{\theta S_{a'}}{d} $$

where \( a' \) is an arbitrary stacking point on the linear trajectory, \( \theta \) is the offset amount, \( S_{a'} \) is the linear planning vector, and \( d \) is the standard teaching behavior vector. The standard definition of the target stacking sample point is:

$$ A = d \times \frac{\nu}{\Theta} \times F(a') \times \frac{1}{a_1 a_2 \cdots a_n} $$

Here, \( d \) represents the stacking point pattern, \( \Theta \) is the programming coefficient for robot motion behavior, \( \nu \) is the planning parameter, and \( a_1, a_2, \ldots, a_n \) are non-overlapping trajectory nodes. This approach guarantees that each linear path contains only one teach point, simplifying the robot technology’s operation.
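To illustrate the offset processing, the sketch below expands a single taught point into a row of stacking points along one line by repeatedly applying a fixed offset vector. The taught pose, the offset, and the point count are hypothetical values chosen only for demonstration.

```python
import numpy as np

def expand_teach_point(teach_point, offset, count):
    """Generate a linear sequence of stacking points from one taught
    point by applying the same offset repeatedly (offset processing)."""
    teach_point = np.asarray(teach_point, dtype=float)
    offset = np.asarray(offset, dtype=float)
    return [teach_point + k * offset for k in range(count)]

# One taught point expands into a whole row of placements (values illustrative)
row = expand_teach_point(teach_point=[100.0, 200.0, 300.0],
                         offset=[55.0, 0.0, 0.0], count=4)
```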

We then create a pose point dataset containing 3D coordinates, pose information, and other relevant parameters for stacking targets. This dataset is crucial for training and evaluating the robot’s recognition and grasping capabilities. The dataset definition is:

$$ \Delta = \left\{ \frac{A \times \mu^2}{D f g j – h^2} \right\} $$

where \( \mu \) is the 3D coordinate parameter, \( f \) is the pose sampling parameter, \( g \) and \( j \) are the rotation and translation parameters of the robotic arm, \( D \) is the size and shape feature of the material, and \( h \) is a randomly selected pose point. The dataset includes various parameters, as summarized in Table 1.

Table 1: Partial Pose Point Dataset
Sample Point Index | X Coordinate | Y Coordinate | Z Coordinate | Rotation Parameter (X-axis) | Rotation Parameter (Y-axis) | Material Size (L×W×H)
1 | 100 | 200 | 300 | 0 | 0 | 50×50×50
2 | 150 | 200 | 300 | 50 | 0 | 60×60×60
3 | 100 | 250 | 300 | 0 | 50 | 55×55×55
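One convenient way to hold the records of Table 1 in software is a small typed structure per sample point. The sketch below mirrors the table columns as reconstructed above; the class name and field names are assumptions, and the translation parameters mentioned in the text could be added as further fields in the same way.

```python
from dataclasses import dataclass

@dataclass
class PosePoint:
    """One record of the pose point dataset (columns as in Table 1)."""
    index: int
    x: float          # X coordinate
    y: float          # Y coordinate
    z: float          # Z coordinate
    rot_x: float      # rotation parameter about the X-axis
    rot_y: float      # rotation parameter about the Y-axis
    size: tuple       # material size (L, W, H)

# The three sample points listed in Table 1
dataset = [
    PosePoint(1, 100, 200, 300, 0, 0, (50, 50, 50)),
    PosePoint(2, 150, 200, 300, 50, 0, (60, 60, 60)),
    PosePoint(3, 100, 250, 300, 0, 50, (55, 55, 55)),
]
```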

To improve recognition accuracy, we apply data enhancement techniques, including rotation, scaling, translation, brightness adjustment, and noise addition. These processes simulate various environmental conditions, enhancing the robot technology’s adaptability. The enhancement expression is:

$$ G = J \times \frac{H_1 + H_2 + \cdots + H_n}{n!} \times \frac{k \iota}{j \iota - 1} $$

where \( H_1, H_2, \ldots, H_n \) are non-overlapping pose point parameters, \( \iota \) is the scaling coefficient, \( j\iota \) and \( k\iota \) are brightness and contour recognition vectors, and \( J \) is the distance between the material and the target stacking sample point. This step ensures robust performance in real-world applications, a hallmark of advanced robot technology.
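The listed augmentations map directly onto standard image operations. The sketch below applies rotation, scaling, translation, brightness adjustment, and Gaussian noise using OpenCV and NumPy; the specific magnitudes are placeholder assumptions rather than the configured ranges in Table 2.

```python
import cv2
import numpy as np

def augment(image, angle=15.0, scale=1.05, shift=(10, -5),
            brightness=20, noise_sigma=5.0):
    """Rotation, scaling, translation, brightness shift and Gaussian noise."""
    h, w = image.shape[:2]
    # Combined rotation + scaling about the image centre
    m = cv2.getRotationMatrix2D((w / 2, h / 2), angle, scale)
    # Translation is folded into the same affine matrix
    m[:, 2] += shift
    out = cv2.warpAffine(image, m, (w, h))
    # Brightness adjustment and additive Gaussian noise
    out = np.clip(out.astype(np.float32) + brightness +
                  np.random.normal(0.0, noise_sigma, out.shape), 0, 255)
    return out.astype(np.uint8)
```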

For grab priority assessment, we evaluate stacking targets based on factors like Euclidean distance, weight, shape, and stability. The Euclidean distance between the visual camera’s imaging plane and the target material plane is calculated as:

$$ L = \frac{\sqrt{(x_1 - x_2)^2 + (y_1 - y_2)^2 + (z_1 - z_2)^2}}{G \times \Delta l} $$

where \( (x_1, y_1, z_1) \) and \( (x_2, y_2, z_2) \) are the normal vector coordinates of the camera imaging plane and target material plane, respectively, \( G \) is the enhancement parameter, and \( \Delta l \) is the relative distance. Using a weighted average method, the final grab priority is:

$$ P = \sum_{m=1}^{m_{\text{max}}} \omega_m L_m $$

where \( P \) is the grab priority, \( m_{\text{max}} \) is the number of factors considered, \( \omega_m \) is the weight of the \( m \)-th factor, and \( L_m \) is the Euclidean distance for that factor. This assessment enables the robot technology to prioritize targets efficiently, reducing unnecessary movements.
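The distance and weighting steps combine naturally into a short scoring routine. The sketch below computes the plane distance \( L \) and the weighted-average priority \( P \); the factor scores and weights are illustrative assumptions, not tuned values.

```python
import numpy as np

def plane_distance(n_camera, n_target, G, delta_l):
    """Euclidean distance between the camera imaging plane and the target
    plane, normalised by the enhancement parameter G and relative distance."""
    n_camera, n_target = np.asarray(n_camera), np.asarray(n_target)
    return np.linalg.norm(n_camera - n_target) / (G * delta_l)

def grab_priority(factor_scores, weights):
    """Weighted-average priority P = sum_m w_m * L_m."""
    return float(np.dot(weights, factor_scores))

# Illustrative factor scores (distance, weight, shape, stability) and weights
scores = [plane_distance([0, 0, 1], [0.1, 0.05, 0.99], G=1.2, delta_l=0.4),
          0.7, 0.5, 0.9]
P = grab_priority(scores, weights=[0.4, 0.2, 0.2, 0.2])
```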

After determining grab priorities, we identify target stacking points by establishing minimal bounding boxes around the materials. This ensures stable grabs and prevents collisions. The stacking point definition is:

$$ C = V \times N \times \kappa \times e^{-L^2 / 2b} $$

where \( V \) is the optimal bounding feature of the discrete stacking points, \( N \) is the rectangular bounding-box parameter, \( \kappa \) is the planning coefficient under priority conditions, and \( b \) is the grab range control vector. The robotic arm’s end-effector is oriented perpendicular to already placed materials to maintain stability.
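A minimal bounding box of this kind can be obtained directly from the segmented contour of each material. The sketch below uses OpenCV’s rotated-rectangle fit and then evaluates the stacking-point score \( C \); the contour points and the values of \( V \), \( N \), \( \kappa \), \( L \), and \( b \) are hypothetical.

```python
import cv2
import numpy as np

def min_bounding_box(contour):
    """Minimal-area rotated rectangle around a material contour."""
    rect = cv2.minAreaRect(contour)     # ((cx, cy), (w, h), angle)
    corners = cv2.boxPoints(rect)       # four corner points of the box
    return rect, corners

def stacking_point_score(V, N, kappa, L, b):
    """C = V * N * kappa * exp(-L^2 / (2*b))."""
    return V * N * kappa * np.exp(-(L ** 2) / (2.0 * b))

# Hypothetical contour (pixel coordinates) and placeholder parameters
contour = np.array([[120, 80], [240, 85], [235, 200], [118, 195]], dtype=np.float32)
rect, corners = min_bounding_box(contour)
C = stacking_point_score(V=0.9, N=0.8, kappa=1.1, L=0.35, b=0.05)
```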

Finally, we plan the robot’s stacking trajectory using an algorithm that defines paths, generates smooth trajectories, avoids singularities, and allocates time efficiently. The trajectory generation formula is:

$$ M = \frac{C}{2} \times (N’ – N_0)^2 $$

where \( C \) is the stacking point term defined above, \( N_0 \) is the initial position term, and \( N' \) is the final position term. Simulation results verify that the trajectory is collision-free and smooth, as illustrated in the trajectory planning output. This comprehensive approach to robot technology ensures high efficiency and accuracy in stacking operations.
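To make the path-generation step concrete, the sketch below evaluates the \( M \) term for a start and end position and interpolates a smooth sequence of waypoints with a cubic time-scaling profile. The cubic profile is a common smoothing choice assumed here, not prescribed by the formula above, and the start and end poses are placeholders.

```python
import numpy as np

def trajectory_cost(C, n_start, n_end):
    """M = (C / 2) * (N' - N0)^2 for a single axis."""
    return 0.5 * C * (n_end - n_start) ** 2

def smooth_waypoints(p_start, p_end, steps=50):
    """Cubic time-scaling s(t) = 3t^2 - 2t^3 gives zero start/end velocity."""
    p_start, p_end = np.asarray(p_start, float), np.asarray(p_end, float)
    t = np.linspace(0.0, 1.0, steps)
    s = 3 * t**2 - 2 * t**3
    return p_start[None, :] + s[:, None] * (p_end - p_start)[None, :]

# Placeholder start/end poses of the end-effector
waypoints = smooth_waypoints([100.0, 200.0, 300.0], [400.0, 200.0, 150.0])
M = trajectory_cost(C=0.9, n_start=300.0, n_end=150.0)
```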

In experimental analysis, we set up a system comprising a projector, upper computer, industrial camera, and a six-axis palletizing robot with a pneumatic gripper. The gripper has a length of 30 cm and a maximum stroke of 15 cm, resulting in a working range of 15–45 cm. Materials weighing 5 kg are used, and parameters such as image resolution, edge detection thresholds, and stacking height are configured as shown in Table 2.

Table 2: Experimental Parameter Settings
Parameter Name | Parameter Value
Image Resolution | 1920×1080
Edge Detection Threshold | 50
Contour Extraction Threshold | 70
Material Recognition Accuracy Range | Min: 5 cm, Max: 50 cm
Pneumatic Gripper Working Range | 15–45 cm
Stacking Height | ≤50 cm
Rotation Angle | ±15°
Scaling Ratio | 0.9–1.1
Translation Distance | ±2 cm

The experimental steps involve calibrating the camera, capturing images, processing data, and conducting stacking tasks. Data enhancement techniques, including rotation at angles like 0°, 45°, and 90°, scaling with factors of 0.5 and 1.0, and brightness adjustment, are applied to improve recognition. The success rate is calculated as:

$$ \eta = \frac{\alpha_1}{\alpha_0} \times 100\% $$

where \( \alpha_1 \) is the number of successful experiments and \( \alpha_0 \) is the total number of experiments. Results show that in single-target and simple stacking scenarios, our robot technology achieves a 100% success rate, while in complex scenarios, it reaches 92%, outperforming comparative methods. For instance, in complex environments, traditional methods like instance segmentation and point-pair feature-based approaches show lower success rates due to difficulties in handling occlusions and variations. This underscores the superiority of our integrated robot technology in diverse industrial applications.
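The success-rate metric is straightforward to compute from trial logs. The sketch below tallies \( \eta \) per scenario from hypothetical (successes, total) counts that merely stand in for the real experiment records.

```python
def success_rate(successes, total):
    """eta = alpha_1 / alpha_0 * 100%."""
    return 100.0 * successes / total

# Hypothetical trial counts, used only to show the calculation
trials = {"single-target": (50, 50), "simple": (50, 50), "complex": (46, 50)}
for scenario, (ok, total) in trials.items():
    print(f"{scenario}: {success_rate(ok, total):.1f}%")
```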

In conclusion, our vision-guided target stacking technology for controllable metamorphic palletizing robots represents a significant advancement in robot technology. By combining visual guidance, dynamic modeling, PID control, and intelligent planning, we achieve high precision and adaptability in stacking operations. Future work will focus on optimizing algorithms for real-time processing and expanding the technology to handle more diverse material types and environments. This innovation not only enhances industrial automation but also paves the way for smarter, more efficient robot technology in global logistics and manufacturing.
