The advancement of industrial automation demands robotic systems with ever-increasing precision, flexibility, and adaptability. A critical component in modern industrial robotics, particularly in articulated arms, is the rotary vector reducer. Its exceptional performance characteristics, including high torsional stiffness, compact design, and excellent positioning accuracy, make it indispensable. However, the assembly of the rotary vector reducer itself involves precise mating of components with tight tolerances, a process often reliant on skilled manual labor which can be inconsistent and costly. This work presents the design, control strategy, and experimental validation of a mobile robotic system developed to automate the assembly of key sub-components within a rotary vector reducer, specifically the support plate and the pin housing.

The core assembly task focuses on two primary components: the support plate and the pin housing (or pinwheel). The support plate features a central hub and multiple peripheral holes for crank shafts. The pin housing contains a corresponding central bore for bearings and an internal ring of pin gears. Successful assembly requires not only coaxial alignment of the central features but also the parallel alignment of their mounting faces and the precise angular alignment of the multiple crank shaft holes. This constitutes a challenging multi-degree-of-freedom alignment problem with sub-millimeter clearance requirements, highly sensitive to positional and angular errors.
Assembly Requirements and Technical Challenges
The assembly process for the rotary vector reducer support plate and pin housing presents distinct technical hurdles. The mating features involve a combination of a central pilot and distributed peripheral holes, creating a compound alignment constraint. The clearance between mating pins and holes is typically less than 1 mm, demanding a robotic placement accuracy within a fraction of this value. Furthermore, the components are located at separate workstations in a manufacturing cell, necessitating material handling between stations. A fixed-base robot would require an excessively large workspace or complex part conveyance systems. A mobile platform, such as an Automated Guided Vehicle (AGV), offers superior flexibility and scalability. However, this introduces a primary challenge: the positional uncertainty of the mobile base directly transfers to the manipulator’s global reference frame, degrading the overall system’s absolute positioning accuracy. Compensating for this base error is paramount for successful assembly. The system must therefore integrate robust object recognition, high-precision local relative positioning, and strategies to decouple the mobile base’s errors from the fine assembly motions.
3D Object Recognition Based on CAD Model Matching
To reliably locate and grasp the randomly posed components, a vision-based recognition system using a 3D matching approach is employed. The core methodology is based on matching a live camera image against a pre-generated set of synthetic views derived from the component’s Computer-Aided Design (CAD) model. This method is implemented using the HALCON machine vision library.
The process begins with the offline creation of a shape model. The 3D CAD model of the target component (e.g., the support plate) is loaded into the system. A virtual camera is then used to project this 3D model onto a 2D image plane from a vast number of different viewpoints. These viewpoints systematically sample the expected orientation space of the part. For each viewpoint \(v_i\), defined by its position \((X_v, Y_v, Z_v)\) and orientation \((\alpha_v, \beta_v, \gamma_v)\) relative to the model, a synthetic template image \(T_i\) is generated. Each template is associated with its precise 3D pose \(P_i\). This set \(\{T_i, P_i\}\) forms the recognition model. The relationship between a 3D model point \(M\) and its 2D projection \(m\) in a template for a given pose is given by the perspective projection equation:
$$ m = K \cdot [R | t] \cdot M $$
where \(K\) is the camera intrinsic matrix, \([R | t]\) is the extrinsic matrix representing the rotation and translation of the pose \(P_i\), and \(m\) and \(M\) are expressed in homogeneous coordinates, so the projected pixel location is obtained after the perspective division by the third component.
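The projection can be made concrete with a small NumPy sketch. The intrinsic matrix and pose below are illustrative placeholders, not calibrated values; in the real system \(K\) comes from camera calibration and \([R | t]\) from the template pose \(P_i\):

```python
import numpy as np

# Hypothetical intrinsics and pose for illustration only; real values come
# from camera calibration and the template's stored pose P_i.
K = np.array([[800.0,   0.0, 320.0],
              [  0.0, 800.0, 240.0],
              [  0.0,   0.0,   1.0]])   # camera intrinsic matrix
R = np.eye(3)                            # rotation of the pose
t = np.array([0.0, 0.0, 500.0])          # translation (mm), 500 mm depth

def project(M):
    """Project a 3D model point M (mm) to pixels via m = K [R|t] M."""
    cam = R @ M + t                      # transform into the camera frame
    m_h = K @ cam                        # homogeneous image coordinates
    return m_h[:2] / m_h[2]              # perspective division

# The model origin, 500 mm along the optical axis, projects to the
# principal point (320, 240).
m = project(np.array([0.0, 0.0, 0.0]))
```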
During online operation, a grayscale image \(I\) is captured by the physical camera mounted on the robot. The recognition algorithm searches for instances of the shape model within \(I\) by comparing image regions against the stored templates \(T_i\). A similarity measure \(S(I_{region}, T_i)\) is computed, often based on normalized cross-correlation or edge-based features. The algorithm identifies the region and template with the highest similarity score, which yields not only the 2D location of the part in the image but also its corresponding 3D pose \(P_{est}\) relative to the camera. Through hand-eye calibration, this camera-relative pose \(P_{est}\) is transformed into a pose \(P_{robot}\) in the robot’s coordinate frame, enabling precise grasping.
A significant environmental challenge is inconsistent lighting, which can cast strong shadows, altering edge structures and local contrast, thereby reducing recognition robustness and accuracy. To mitigate this, an adaptive thresholding algorithm is incorporated during image preprocessing. Instead of a single global threshold, a dynamic local threshold is computed. A smoothed version of the image \(I_{smooth}\) is obtained using a large Gaussian filter. This smoothed image serves as a local brightness reference. The thresholded image \(I_{bin}\) is then generated pixel-by-pixel:
$$ I_{bin}(x,y) = \begin{cases} 1 & \text{if } I(x,y) \geq I_{smooth}(x,y) - C \\ 0 & \text{otherwise} \end{cases} $$
where \(C\) is a constant offset. This operation effectively subtracts the slowly varying background illumination, including shadows, and segments the object based on local contrast, significantly improving recognition rates under variable lighting.
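A minimal NumPy/SciPy sketch of this pixel-wise rule follows. The Gaussian width `sigma` and the offset `C` are illustrative values, not the tuned parameters of the deployed system, and the synthetic scene simply mimics an uneven-illumination case that would defeat a single global threshold:

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def adaptive_threshold(img, sigma, C):
    """Binarize `img` against a locally smoothed brightness reference,
    per the rule above: 1 where I >= I_smooth - C, else 0."""
    reference = gaussian_filter(img.astype(float), sigma=sigma)
    return (img >= reference - C).astype(np.uint8)

# Synthetic scene: a strong left-to-right illumination gradient plus a
# dark feature (e.g. a shadowed hole). A global threshold would confuse
# the dark feature with the dim side of the background; the local
# reference separates them.
scene = np.tile(np.linspace(40.0, 200.0, 200), (200, 1))
scene[90:110, 90:110] -= 60.0            # locally dark feature
binary = adaptive_threshold(scene, sigma=10.0, C=10.0)
# binary is 0 inside the dark feature and 1 across the background,
# on both its dim and bright sides.
```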
Mobile Robotic System Design
The designed assembly robot is a synergistic integration of mobility and dexterous manipulation. The system architecture comprises two main subsystems: a mobile base for gross positioning and a multi-degree-of-freedom manipulator for precise local motion and assembly.
Mechanical and Structural Configuration
The system possesses a total of nine degrees of freedom. The mobile base, an AGV, provides three degrees of freedom in the plane (translation in X, Y, and rotation \(\theta\)). Mounted on the AGV is a six-axis articulated robotic arm, which provides the necessary dexterity for approaching, orienting, and inserting the component from various angles. The kinematic chain, from the world frame \(\{W\}\) attached to the factory floor to the tool center point (TCP) frame \(\{T\}\) on the robot gripper, can be described as:
$$ {^W}T_{T} = {^W}T_{B}(\boldsymbol{q}_{AGV}) \cdot {^B}T_{0} \cdot {^0}T_{T}(\boldsymbol{\theta}_{arm}) $$
where \({^W}T_{B}\) is the homogeneous transformation from world to base, dependent on the AGV’s pose \(\boldsymbol{q}_{AGV}\); \({^B}T_{0}\) is the fixed mounting transformation; and \({^0}T_{T}\) is the forward kinematics of the arm dependent on its joint angles \(\boldsymbol{\theta}_{arm}\). This decoupled structure allows separate control strategies for gross navigation and fine manipulation.
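The transform chain is straightforward to evaluate numerically. In the sketch below the mounting offset and TCP pose are invented placeholder values (the real \({^0}T_{T}\) would come from the arm's forward kinematics):

```python
import numpy as np

def agv_pose_to_T(x, y, theta):
    """SE(3) transform ^W T_B for a planar AGV pose (x, y, theta).
    The base rotates about the vertical z-axis and stays on the floor."""
    c, s = np.cos(theta), np.sin(theta)
    T = np.eye(4)
    T[:3, :3] = [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]
    T[:3, 3] = [x, y, 0.0]
    return T

# Fixed mounting transform ^B T_0: arm base 0.2 m above the AGV deck
# (illustrative value, not the real mounting offset).
B_T_0 = np.eye(4)
B_T_0[2, 3] = 0.2

# ^0 T_T placeholder: TCP 0.5 m in front of the arm base along x.
T0_T_T = np.eye(4)
T0_T_T[0, 3] = 0.5

# Chain the transforms: ^W T_T = ^W T_B . ^B T_0 . ^0 T_T
W_T_B = agv_pose_to_T(1.0, 2.0, np.pi / 2)
W_T_T = W_T_B @ B_T_0 @ T0_T_T
# TCP position in the world frame: approximately [1.0, 2.5, 0.2]
```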
Control System Architecture
The control system is centered around an industrial PC (IPC) acting as the main supervisor. It handles high-level task planning, vision processing, and coordination between subsystems. The IPC communicates via wireless Ethernet with the AGV’s onboard controller, which executes low-level navigation, obstacle avoidance (using integrated Lidar), and docking. A separate, high-speed communication link (USB or Ethernet) connects the IPC to the controller of the 6-DOF collaborative robot arm (a UR3 model). A calibrated camera is mounted near the robot’s wrist (eye-in-hand configuration) or on a fixed bracket near the work area, streaming images directly to the IPC for processing. All subsystems are powered from a common onboard battery pack on the AGV, ensuring untethered operation.
Two-Stage Target Positioning Control Strategy
The key innovation to overcome AGV positioning error is a two-stage visual servoing strategy for both the pickup and assembly phases. The AGV’s typical docking accuracy may be on the order of ±10 mm, which is unacceptable for the sub-mm assembly task. A single visual recognition step from the AGV’s parked position might fail if the part is near the edge of the camera’s field of view, where lens distortion is higher, or if it is partially occluded.
The two-stage strategy is implemented as follows:
- Stage 1 – Coarse Centering: After the AGV parks at the target workstation, the vision system performs an initial recognition to locate the part. The robot arm then moves the camera (or the arm itself) to position the recognized part’s centroid at the very center of the image. This action compensates for the majority of the AGV’s positional error and ensures the part is in an optimal, low-distortion region of the camera view for the next stage.
- Stage 2 – Fine Recognition and Grasping: A second, more accurate recognition is performed with the part centered. The resulting pose estimate \(P_{robot}\) is highly precise because measurement errors from distortion are minimized. The robot uses this refined pose to execute the grasp or the insertion motion with high confidence.
This strategy is applied twice during a complete cycle: first for picking up the support plate, and again when aligning it with the pin housing for assembly. While it adds an extra motion and recognition step, keeping the part in the low-distortion center of the image markedly improves final positioning accuracy without requiring prohibitively expensive high-precision AGV docking systems. The overall workflow is: Navigate to Pickup Station → Two-Stage Pickup → Navigate to Assembly Station → Two-Stage Assembly.
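The control flow of the two stages can be sketched with a toy simulation. Everything here is a hypothetical stand-in for the real vision and motion interfaces: the camera is modeled by a linear image scale `PX_PER_MM`, and lens distortion and measurement noise are omitted:

```python
import numpy as np

IMAGE_CENTER = np.array([320.0, 240.0])
PX_PER_MM = 2.0   # assumed image scale at the working distance

def recognize(camera_xy, part_xy):
    """Toy camera model: return the part centroid in pixel coordinates
    for the current camera position (distortion and noise omitted)."""
    return IMAGE_CENTER + PX_PER_MM * (part_xy - camera_xy)

def two_stage_locate(camera_xy, part_xy):
    # Stage 1 - coarse centering: move the camera so the recognized
    # centroid lands at the image center (the low-distortion region).
    centroid = recognize(camera_xy, part_xy)
    offset_mm = (centroid - IMAGE_CENTER) / PX_PER_MM
    camera_xy = camera_xy + offset_mm
    # Stage 2 - fine recognition with the part centered in the image.
    centroid = recognize(camera_xy, part_xy)
    part_estimate = camera_xy + (centroid - IMAGE_CENTER) / PX_PER_MM
    return camera_xy, part_estimate

# The camera starts at the AGV's (imperfect) docked position; the part
# sits 8 mm right and 5 mm below the expected location.
cam, est = two_stage_locate(np.array([0.0, 0.0]), np.array([8.0, -5.0]))
```

In the real system, stage 2 gains accuracy because the refined measurement avoids the higher-distortion image periphery; the toy model only illustrates the sequencing.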
Experimental Validation and Performance Analysis
An extensive experimental campaign was conducted to evaluate the system’s recognition accuracy, speed, and overall assembly success rate.
Object Recognition Accuracy and Robustness
The support plate was placed in various positions and orientations within the workspace, including cases with partial occlusion to simulate a cluttered bin-picking scenario. The adaptive thresholding algorithm proved crucial for handling shadow artifacts. The system’s output is the estimated 3D position \((X, Y, Z)\) of the part’s centroid relative to the robot base. The Euclidean error between the estimated position and a ground-truth measurement (obtained via precise manual teaching) was calculated. Results from a sample set are shown below.
| Trial | X (mm) | Y (mm) | Z (mm) | Position Error (mm) |
|---|---|---|---|---|
| 1 | 77.84 | -82.58 | 571.62 | 0.906 |
| 2 | 50.41 | -67.85 | 564.64 | 0.187 |
| 3 | 76.10 | -52.90 | 573.22 | 0.533 |
| 4 | -0.79 | 37.06 | 574.27 | 0.816 |
| 5 | 77.21 | -12.99 | 575.07 | 0.051 |
| 6 | -21.77 | -98.40 | 566.44 | 0.102 |
The data demonstrates a high level of precision, with a maximum observed error of 0.906 mm and an average positioning error of 0.4325 mm. This level of accuracy is sufficient for engaging components with clearance fits around 1 mm.
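The reported figures follow directly from the table. A short check, using only the error column above (the per-axis ground-truth coordinates are not reproduced here):

```python
import numpy as np

def position_error(estimate, ground_truth):
    """Euclidean distance between estimated and taught centroid (mm)."""
    return float(np.linalg.norm(np.asarray(estimate) - np.asarray(ground_truth)))

# Error column from the six trials; the mean reproduces the reported
# 0.4325 mm average.
errors = [0.906, 0.187, 0.533, 0.816, 0.051, 0.102]
mean_error = sum(errors) / len(errors)   # 0.4325
```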
Recognition and Cycle Time Optimization
Recognition speed is critical for total cycle time. The original, unrestricted pose search across a full hemisphere was computationally expensive. By constraining the expected part orientation—limiting roll and pitch to ±20° while allowing unrestricted rotation about the vertical axis (yaw)—the number of potential template matches is drastically reduced. This constraint is realistic for parts resting stably on a surface. The effect on processing time is significant, as shown in the comparison between full (global) and constrained (local) pose search.
| Trial | Global Search Time (s) | Constrained Search Time (s) | Trial | Global Search Time (s) | Constrained Search Time (s) |
|---|---|---|---|---|---|
| 1 | 1.8628 | 0.8419 | 7 | 1.9491 | 0.7598 |
| 2 | 1.7184 | 0.7122 | 8 | 1.9041 | 0.8503 |
| 3 | 1.7529 | 0.6981 | 9 | 1.6507 | 0.6632 |
| 4 | 2.0642 | 0.7594 | 10 | 1.5314 | 0.6587 |
| 5 | 1.7003 | 0.6707 | 11 | 1.3867 | 0.5455 |
| 6 | 1.7113 | 0.6648 | 12 | 1.3078 | 0.5235 |
The constrained search reduced the average recognition time from approximately 1.712 seconds to 0.696 seconds, cutting recognition time by roughly 59% while maintaining a high recognition success rate.
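The averages and the relative reduction can be verified directly from the twelve timing trials above:

```python
# Timing columns transcribed from the table above (seconds).
global_times = [1.8628, 1.7184, 1.7529, 2.0642, 1.7003, 1.7113,
                1.9491, 1.9041, 1.6507, 1.5314, 1.3867, 1.3078]
constrained_times = [0.8419, 0.7122, 0.6981, 0.7594, 0.6707, 0.6648,
                     0.7598, 0.8503, 0.6632, 0.6587, 0.5455, 0.5235]

avg_global = sum(global_times) / len(global_times)            # ~1.712 s
avg_constrained = sum(constrained_times) / len(constrained_times)  # ~0.696 s
reduction = 1.0 - avg_constrained / avg_global                # ~0.59
```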
Complete Assembly Cycle Performance
The ultimate test is the complete assembly of the support plate into the pin housing of the rotary vector reducer. The two-stage positioning strategy was employed for both the pickup and assembly operations. The total cycle time, from AGV initiation at the pickup station to successful assembly and readiness for the next cycle, was measured and compared against a naive single-recognition approach.
| Trial | Single-Stage Time (s) | Two-Stage Time (s) | Improvement | Time Saved (s) |
|---|---|---|---|---|
| 1 | 134 | 94 | 29.8% | 40 |
| 2 | 150 | 92 | 38.6% | 58 |
| 3 | 131 | 95 | 27.5% | 36 |
| 4 | 133 | 90 | 32.3% | 43 |
| 5 | 135 | 97 | 28.1% | 38 |
| 6 | 137 | 94 | 31.4% | 43 |
The two-stage strategy yielded a stable average cycle time of about 94 seconds, within the 90-100 second target for practical deployment. While the single-stage approach achieved a slightly higher success rate in these trials (94% vs. 90%) owing to its fewer procedural steps, it was significantly slower and less consistent. The two-stage strategy provided the better balance, saving an average of 43 seconds per cycle while maintaining a robust 90% success rate, making it far more efficient for high-volume production of the rotary vector reducer.
Conclusion
This work successfully designed and demonstrated a mobile robotic system capable of performing the precise assembly task required for a critical sub-assembly of the rotary vector reducer. By integrating an AGV for mobility with a 6-DOF manipulator for dexterity and a sophisticated vision system for guidance, the system addresses the flexibility needs of modern manufacturing cells. The implementation of a CAD-based 3D recognition algorithm with adaptive thresholding ensures robust part localization under realistic lighting conditions. The introduced two-stage target positioning control strategy is the key enabler, effectively decoupling the coarse positioning errors of the mobile base from the fine manipulation task. This allows the use of a standard-precision AGV while achieving the sub-millimeter local accuracy needed for assembly.
Experimental results confirm the system’s capability, with an average part localization error of 0.43 mm and a complete assembly cycle time under 95 seconds. The system proves that automated, flexible assembly of complex components like those in the rotary vector reducer is feasible with current vision and robotic technology. Future work will focus on integrating force sensing for compliant insertion to handle even tighter tolerances, optimizing the path planning between stations to further reduce cycle time, and extending the recognition model library to allow the same system to assemble multiple variants of rotary vector reducer models, thereby enhancing its versatility and economic value in an automated production environment.
