In modern agriculture, robotics has transformed harvesting, particularly through the development of picking robots. These systems require precise control to handle delicate fruits without damage, which makes compliant force control critical. We propose a torque feedback method based on the Generalized Policy Iteration (GPI) algorithm to improve compliant control of the gripping force in picking robots. The approach addresses nonlinear characteristics in trajectory planning and force distribution in order to improve adaptability and robustness. By integrating dynamic models and iterative learning, our method supports stable and efficient harvesting operations.
The complexity of grasping trajectories in picking robots introduces significant nonlinearities, which traditional control methods struggle to manage. Existing approaches, such as fuzzy control or sliding mode control, often fail to provide sufficient precision under varying conditions. Our work focuses on optimizing torque feedback using GPI algorithms, which allows for real-time adjustments and minimizes errors in force application. This advancement in robot technology not only enhances picking efficiency but also reduces fruit damage, contributing to sustainable agricultural practices.
In this paper, we first establish a mechanical model for the picking robot, considering inertial forces, gravity, and gripping forces. We then design a torque feedback control model using a combination of feedback and inner-loop force controllers. The GPI algorithm is employed to optimize the parameters for flexible torque feedback. Subsequently, we develop a compliant control planning compensation algorithm based on inverse kinematics and motion task planning, incorporating model feed-forward compensation to improve torque flexibility. Experimental results demonstrate the superiority of our method in terms of torque error reduction and trajectory tracking accuracy.
The remainder of this paper is organized as follows: Section 2 details the modeling of the picking robot using D-H parameters. Section 3 describes the torque feedback control model and the GPI algorithm. Section 4 presents the controller design and stability analysis. Section 5 discusses the experimental setup and results, followed by a conclusion in Section 6.
Modeling of the Picking Robot
To accurately control the gripping force, we begin by modeling a 6-degree-of-freedom serial robot, which serves as our experimental platform. The robot consists of a dynamic platform and two branched chain structures, allowing for complex movements in three-dimensional space. We consider the combined effects of dynamic compensation inertial forces, gravitational forces, and gripping forces to build a comprehensive mechanical model. Using identified parameter values as inputs for the inertial torque control law, we establish a Denavit-Hartenberg (D-H) coordinate system. The base coordinate system is defined under static conditions, with the joint coordinate system set as $$^{base}_0T = \text{rot}(x, 180^\circ)$$. When the shoulder-wrist vector aligns with the fixed vector, we configure global parameters and set reference planes to parameterize the picking angles.
The homogeneous transformation matrix for the D-H parameter modeling is expressed as:
$$^{i-1}_iT = \begin{bmatrix}
\cos\theta_i & -\sin\theta_i \cos\alpha_i & \sin\theta_i \sin\alpha_i & a_i \cos\theta_i \\
\sin\theta_i & \cos\theta_i \cos\alpha_i & -\cos\theta_i \sin\alpha_i & a_i \sin\theta_i \\
0 & \sin\alpha_i & \cos\alpha_i & d_i \\
0 & 0 & 0 & 1
\end{bmatrix}$$
where $\theta_i$ is the joint angle of the $i$-th joint, $\alpha_i$ is the link twist angle, $a_i$ is the link length, and $d_i$ is the link offset. The upper-left $3 \times 3$ block of $^{i-1}_iT$ is the rotation matrix $^{i-1}_iR$ of the $i$-th joint frame in the $(i-1)$-th coordinate system under the arm configuration angle, and the first three entries of its last column form the position vector $^{i-1}_ip$ used in the inverse kinematics model $^0_7T = (^{base}_0T)^{-1} \times ^{base}_7T$.
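As a minimal sketch, the transformation matrix above can be written as a small function of $\theta_i$, $\alpha_i$, $a_i$, and $d_i$. The function name `dh_transform` and the nested-list matrix layout are illustrative choices, not part of the original method; the sample values are those listed for link 3 in Table 1, evaluated at an arbitrary $\theta_3 = 90^\circ$.

```python
import math

def dh_transform(theta, alpha, a, d):
    """Return the 4x4 D-H homogeneous transform (angles in radians, lengths in m)."""
    ct, st = math.cos(theta), math.sin(theta)
    ca, sa = math.cos(alpha), math.sin(alpha)
    return [
        [ct, -st * ca, st * sa, a * ct],
        [st, ct * ca, -ct * sa, a * st],
        [0.0, sa, ca, d],
        [0.0, 0.0, 0.0, 1.0],
    ]

# Link 3 of Table 1 (d = 0, alpha = 0, a = 0.43 m); theta = 90 deg is an arbitrary test angle.
T = dh_transform(math.radians(90.0), 0.0, 0.43, 0.0)
```

With these inputs the translation column places the link origin at roughly $(0, 0.43, 0)$ m, which matches the $a_i \cos\theta_i$ and $a_i \sin\theta_i$ entries of the matrix.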
Based on D-H theory, we construct the end-effector pose matrix as:
$$^{base}_7T = ^{base}_0T \times \prod_{i=1}^{7} ^{i-1}_iT$$
Considering external disturbances, we build the end-effector pose matrix in the 0-th joint coordinate system using an iterative learning approach:
$$^0_7T = (^{base}_0T)^{-1} \times ^{base}_7T$$
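The two pose expressions above can be illustrated with a short sketch: chain the link transforms after $^{base}_0T = \text{rot}(x, 180^\circ)$, then remove the base transform again by left-multiplying with its inverse. The helper names (`matmul`, `rot_x`, `translation`, `rigid_inverse`, `chain`) and the two pure-translation links are hypothetical stand-ins chosen to keep the example small; a real chain would use the D-H matrices of each joint.

```python
import math

def matmul(A, B):
    """Multiply two 4x4 matrices stored as nested lists."""
    return [[sum(A[i][k] * B[k][j] for k in range(4)) for j in range(4)]
            for i in range(4)]

def rot_x(angle):
    """Homogeneous rotation about the x-axis (angle in radians)."""
    c, s = math.cos(angle), math.sin(angle)
    return [[1.0, 0.0, 0.0, 0.0],
            [0.0, c, -s, 0.0],
            [0.0, s, c, 0.0],
            [0.0, 0.0, 0.0, 1.0]]

def translation(x, y, z):
    """Homogeneous pure translation (hypothetical stand-in for a link transform)."""
    return [[1.0, 0.0, 0.0, x],
            [0.0, 1.0, 0.0, y],
            [0.0, 0.0, 1.0, z],
            [0.0, 0.0, 0.0, 1.0]]

def rigid_inverse(T):
    """Invert a rigid transform using R^T and -R^T p instead of a general inverse."""
    R_t = [[T[j][i] for j in range(3)] for i in range(3)]
    p = [T[i][3] for i in range(3)]
    p_inv = [-sum(R_t[i][k] * p[k] for k in range(3)) for i in range(3)]
    return [R_t[i] + [p_inv[i]] for i in range(3)] + [[0.0, 0.0, 0.0, 1.0]]

def chain(base_T0, link_transforms):
    """Right-multiply the base transform by each link transform in order."""
    T = base_T0
    for link in link_transforms:
        T = matmul(T, link)
    return T

base_T0 = rot_x(math.pi)                                            # rot(x, 180 deg)
links = [translation(0.0, 0.0, 0.25), translation(0.43, 0.0, 0.0)]  # illustrative only
base_T7 = chain(base_T0, links)                                     # pose in the base frame
T0_7 = matmul(rigid_inverse(base_T0), base_T7)                      # pose in the 0-th joint frame
```

Because the base transform is a pure rotation, `T0_7` reduces to the product of the link transforms alone, while `base_T7` shows the same pose with the $z$-axis flipped by the $180^\circ$ rotation.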
Under the combined action of dynamic compensation inertial forces, gravitational forces, and gripping forces, we analyze the shoulder and wrist joint coordinates in the 0-th joint coordinate system:
$$^0p_s = ^{base}p_{bs} = [0, 0, -d_{bs}]^T$$
$$^0p_w = ^0p_7 - ^0R_7 \times ^7p_{wf} = ^0p_7 - ^0R_7 \times l_{wf}$$
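The wrist expression can be sketched directly from a pose matrix: take the rotation block and position column of $^0_7T$ and subtract the rotated wrist-to-flange offset. Here `wrist_position` is a hypothetical helper, $l_{wf}$ is assumed to be a 3-vector expressed in frame 7, and the numeric pose and offset are placeholders rather than measured values.

```python
def wrist_position(T0_7, l_wf):
    """Compute p_w = p_7 - R_7 * l_wf from a 4x4 pose and a 3-vector offset."""
    R = [row[:3] for row in T0_7[:3]]   # rotation block of the pose
    p = [row[3] for row in T0_7[:3]]    # position column of the pose
    return [p[i] - sum(R[i][k] * l_wf[k] for k in range(3)) for i in range(3)]

# Placeholder pose: no rotation, end-effector at (0.40, 0.00, 0.30) m.
T0_7 = [[1.0, 0.0, 0.0, 0.40],
        [0.0, 1.0, 0.0, 0.00],
        [0.0, 0.0, 1.0, 0.30],
        [0.0, 0.0, 0.0, 1.0]]
l_wf = [0.0, 0.0, 0.15]                 # assumed wrist-to-flange offset in frame 7
p_w = wrist_position(T0_7, l_wf)
```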
To account for torque disturbances, we use the shoulder and wrist vectors in the 0-th joint coordinate system to compute the elbow joint coordinates:
$$^0p_e = ^0p_s \times ^0p_w$$
This establishes a minimal motion transformation model, forming the basis for the robot’s D-H parameter and motion models in robot technology applications.
Torque Feedback Control Model
Building on the mechanical model, we aim to design a torque feedback algorithm that combines a feedback controller and an inner-loop force controller. This integration enables the construction of a control-type end-effector, essential for precise force management in robot technology. The driving forces for each branch chain are derived as:
$$\tau_a = M(\eta) \times \ddot{\eta} + C(\eta, \dot{\eta}) + G(\eta)$$
where $\eta$ is the vector of generalized joint coordinates, $M(\eta)$ is the inertia matrix, $C(\eta, \dot{\eta})$ collects the Coriolis and centrifugal terms, and $G(\eta)$ is the gravity term. In joint control, each joint’s driving force is supplied by motors and finely regulated through the torque feedback model.
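To make the driving-force expression concrete, the following sketch evaluates $\tau_a$ for a two-joint case, assuming $M(\eta)$ acts as a square matrix on the acceleration vector and that $C$ and $G$ are already evaluated as vectors at the current state. The function name `driving_torque` and all numeric values are illustrative assumptions, not identified parameters of the robot.

```python
def driving_torque(M, C, G, acc):
    """Evaluate tau = M * acc + C + G for already-evaluated M (n x n), C and G (n)."""
    n = len(acc)
    return [sum(M[i][j] * acc[j] for j in range(n)) + C[i] + G[i]
            for i in range(n)]

# Illustrative two-joint values (not taken from the experiments).
M = [[2.0, 0.5],
     [0.5, 1.0]]        # kg*m^2
C = [0.1, -0.2]         # N*m, velocity-dependent term at the current state
G = [9.81, 0.0]         # N*m, gravity term at the current configuration
acc = [1.0, 2.0]        # rad/s^2, desired joint accelerations
tau = driving_torque(M, C, G, acc)
```

For these inputs the first joint needs about 12.91 N·m, most of it to hold against gravity, and the second about 2.3 N·m.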
Given the open-chain topology of the picking robot’s grasping behavior, we structure the GPI algorithm controller into input, prediction, optimization, and output layers. This facilitates torque prediction and optimized control. The torque feedback control model is formulated as:
$$x(k, t) = [x_1(k, t), x_2(k, t)]^T \times K \times U$$
where $x_1(k, t)$ and $x_2(k, t)$ are the state input and control output vectors at time $t$ of the $k$-th iteration, $K$ is the number of iterations, and $U$ is the system control output.
This model monitors and adjusts torque variations in real-time, ensuring compliant and stable grasping actions. We further develop a combined control model that incorporates dynamic compensation inertial forces, gravitational forces, and gripping forces, enhancing the robustness of robot technology systems.
GPI Algorithm and Control Convergence Analysis
We employ the GPI algorithm to optimize parameters in the flexible torque feedback control process. Using inverse kinematics and motion task planning, we establish a compliant control planning compensation algorithm. Before designing the controller, we assume that $F(x(k, t), t)$, $B(x(k, t), t)$, and $E(x(k, t), t)$ satisfy the globally consistent Lipschitz condition under iterative learning for $t \in [0, T]$:
$$\begin{aligned}
\|F(x(k+1, t), t) - F(x(k, t), t)\| &\leq c_F \|x(k+1, t) - x(k, t)\| \\
\|B(x(k+1, t), t) - B(x(k, t), t)\| &\leq c_B \|x(k+1, t) - x(k, t)\| \\
\|E(x(k+1, t), t) - E(x(k, t), t)\| &\leq c_E \|x(k+1, t) - x(k, t)\|
\end{aligned}$$
where $c_F$, $c_B$, and $c_E$ are Lipschitz constants.
By solving the inverse kinematics for desired deviations and using force control with feed-forward flow compensation, we determine the desired torque for each joint. A GPI-based iterative learning controller is designed as:
$$u_{k+1}(t) = u_k(t) + K_{p,k} e_{k+1}(t) + K_{d,k} \dot{e}_{k+1}(t)$$
where $k$ is the iteration index, $e_{k+1}(t)$ is the tracking error in the $(k+1)$-th picking trajectory, and $K_{p,k}$ and $K_{d,k}$ are gain matrices for the GPI process.
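A minimal sketch of this update on a sampled trajectory is given below. It assumes scalar gains in place of the matrices $K_{p,k}$ and $K_{d,k}$, a single joint, and a backward difference for $\dot{e}_{k+1}(t)$; the function name `ilc_update` and the sample data are hypothetical. Because the law uses the error of the current iteration, a real controller would evaluate it sample by sample while the trajectory runs; the sketch only shows the arithmetic on recorded arrays.

```python
def ilc_update(u_prev, e_next, kp, kd, dt):
    """Apply u_{k+1}(t) = u_k(t) + kp * e_{k+1}(t) + kd * de_{k+1}(t)/dt per sample."""
    u_next = []
    for i, (u, e) in enumerate(zip(u_prev, e_next)):
        # Backward difference; the first sample has no predecessor, so use 0.
        e_dot = 0.0 if i == 0 else (e - e_next[i - 1]) / dt
        u_next.append(u + kp * e + kd * e_dot)
    return u_next

dt = 0.018                         # torque sampling time stated in the experimental setup
u_k = [0.0, 0.5, 1.0]              # previous-iteration torque command (illustrative)
e_k1 = [0.0, 0.018, 0.036]         # tracking error samples (illustrative)
u_k1 = ilc_update(u_k, e_k1, kp=2.0, kd=0.5, dt=dt)
```

With these inputs the error grows by 0.018 per sample, so the derivative term contributes a constant 0.5 from the second sample onward.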
Applying the Bellman-Gronwall theorem, we deduce that for $t \in [0, T]$, $u_d(t) - u_k(t)$, $x_d(t) - x_k(t)$, and $y_d(t) - y_k(t)$ are bounded and convergent. Taking norms on both sides over $[0, T]$ and introducing a sliding mode order to enhance robustness, the convergence error is:
$$S(t) = [y_d(t) - y_k(t)] \times (u_d(t) - u_k(t))^{-1}$$
As $k \to \infty$, with appropriate GPI coefficients, the convergence error $S(t)$ remains bounded; if disturbances such as mechanical vibration are ignored, the error converges to zero, which confirms the bounded convergence of the GPI algorithm.
Controller Design
We develop a compliant control planning compensation algorithm under torque feedback using inverse kinematics and motion task planning. The robot’s dynamics equation is constructed as:
$$\tau(k, t) = M(q(k, t)) \times \ddot{q}(k, t) + C(q(k, t), \dot{q}(k, t))$$
Optimizing for tracking accuracy, the trajectory tracking error is:
$$e(k, t) = q_d(t) - q(k, t) \times \tau(k, t)$$
Defining a sliding surface $s(k, t)$ with a positive coefficient $c_1$, the Lyapunov function is:
$$V_k(t) = \frac{1}{2} s(k, t)^2 + c_1 \times \xi$$
Incorporating model feed-forward compensation into the Lyapunov function, which represents the end-effector pose, and analyzing error convergence under iterative learning initial conditions:
$$\dot{V}_0 = s(k, t) \times M \times \dot{e}(k, t)$$
Since $V_0$ is bounded, $\lim_{k \to \infty} s(k, t) = 0$, ensuring controller stability. This design enhances the compliant control of gripping forces in robot technology.
Experimental Setup and Results Analysis
We built an experimental platform to validate our method. Torque sensors were installed on each joint of the picking robot, and force sensors were mounted at the drive end. The control system, based on a PTZ camera active design, used a spatial station platform as the main control room. The PTZ camera tracked picking targets, while a human-machine interface comprising smart headbands and tablets facilitated control via virtual cursors. The robot’s picking trajectory was displayed on a 23-inch screen with 1920×1080 resolution.

Key parameters include sensor accuracy of ±5%, response time <0.1 s, drive power of 500 W, speed range of 0–3000 rpm, position control accuracy of ±0.1 mm, and velocity control accuracy of ±1%. The controller, with a 2 GHz main frequency and 4 GB RAM, supports RS232, USB, and Ethernet. Initial pose parameters were $\alpha = 0^\circ$, $\beta = 0^\circ$, torque sampling time was 0.018 s, and position error threshold was 2 mm. The D-H parameters are summarized in Table 1.
| Link $i$ | Link offset $d_i$ (m) | Link twist $\alpha_i$ (°) | Link length $a_i$ (m) | Joint angle $\theta_i$ | Joint range (°) |
|---|---|---|---|---|---|
| 1 | 0.25 | 0 | 0 | $\theta_1$ | -150 to 150 |
| 2 | 0 | -90 | 0 | $\theta_2$ | -225 to 45 |
| 3 | 0 | 0 | 0.43 | $\theta_3$ | -45 to 225 |
| 4 | 0.25 | -90 | 0.02 | $\theta_4$ | -110 to 170 |
| 5 | 0 | 90 | 0 | $\theta_5$ | -100 to 100 |
| 6 | 0.15 | -90 | 0 | $\theta_6$ | -250 to 250 |
We compensated the D-H parameters and compared our GPI torque feedback method with fuzzy control (Literature [4]) and sliding mode control (Literature [5]). Torque errors for the inertial, gravitational, and gripping forces were analyzed; our method reduced the average errors by 9.51%, 6.82%, and 4.46% across the three branches, with disturbances eliminated within 0.01 s. Trajectory tracking simulations showed that our approach reduced the mean, maximum, and standard deviation of the tracking error by 23.2%, 48.4%, and 26.9%, respectively. In free space, position tracking accuracy improved by 24.12% and 42.86%.
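The percentage figures above are relative reductions in error statistics. As a hedged sketch of how such figures can be computed from two recorded error traces, the snippet below takes the mean, maximum, and population standard deviation of the absolute error and reports the reduction relative to a baseline. The function names and the two short error lists are placeholders for illustration only; they are not the experimental data.

```python
import statistics

def error_stats(errors):
    """Mean, maximum, and population standard deviation of the absolute error."""
    mags = [abs(e) for e in errors]
    return {
        "mean": statistics.mean(mags),
        "max": max(mags),
        "std": statistics.pstdev(mags),
    }

def percent_reduction(baseline, proposed):
    """Relative reduction of `proposed` with respect to `baseline`, in percent."""
    return 100.0 * (baseline - proposed) / baseline

# Placeholder tracking errors in mm (not measured values).
baseline_errors = [2.0, -4.0, 6.0]
proposed_errors = [1.0, -2.0, 3.0]

base = error_stats(baseline_errors)
prop = error_stats(proposed_errors)
reductions = {key: percent_reduction(base[key], prop[key]) for key in base}
```

In this toy case every error is halved, so all three statistics show a 50% reduction; real traces would generally give a different figure for each statistic, as in the results reported above.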
For a constant fruit mass of 5 kg, we analyzed node spacing distance and picking speed. Our method maintained higher node spacing (up to 1.46 m at 50 s) and variable speed (peak of 1.63 m/s at 40 s), outperforming other methods in efficiency and control, as shown in Table 2.
| Method | Torque Error Reduction (%) | Position Tracking Improvement (%) | Max Picking Speed (m/s) |
|---|---|---|---|
| GPI Torque Feedback | 9.51, 6.82, 4.46 | 24.12, 42.86 | 1.63 |
| Fuzzy Control | Lower | Lower | 1.18 |
| Sliding Mode Control | Lower | Lower | 1.24 |
Conclusion
In this study, we presented a compliant control method for picking robots using GPI-based torque feedback. By modeling the robot’s dynamics and designing a torque feedback control system, we optimized parameters with the GPI algorithm and incorporated feed-forward compensation to enhance flexibility. Experimental results confirmed significant reductions in torque errors and improvements in trajectory tracking, demonstrating the effectiveness of our approach in robot technology. Future work will focus on refining the GPI algorithm for higher precision and conducting extensive real-world tests to minimize fruit damage during harvesting, further advancing the field of agricultural robotics.
