In recent years, the development of quadruped robots has gained significant attention due to their potential applications in fields such as search and rescue, exploration, and logistics. Among these, hydraulically driven quadruped robots, often referred to as robot dogs, offer advantages such as high power density and robust load-bearing capability. However, controlling these systems is challenging because of the complex, high-dimensional coupling between hydraulic control parameters and body motion dynamics. Traditional control methods, which rely on precise modeling and parameter tuning, often struggle to adapt in unstructured environments. To address this, we explore the use of deep reinforcement learning (DRL) to optimize the jumping motion of a hydraulic quadruped robot's single leg. By combining trajectory planning with DRL, we aim to achieve efficient, stable, and adaptive control under varying conditions.
This paper presents a comprehensive approach to controlling a hydraulic quadruped robot leg using a DRL framework integrated with fifth-order polynomial trajectory planning. We begin by modeling the single leg of a Spurios hydraulic quadruped robot in MATLAB/Simulink, followed by kinematic analysis and the design of a trajectory planner. The DRL controller, based on the Twin Delayed Deep Deterministic Policy Gradient (TD3) algorithm, is then developed to optimize control parameters for different jumping tasks. Simulations demonstrate that our method enables the robot dog to perform adaptive jumping motions with high stability and efficiency, even in complex environments. Throughout this work, we emphasize the application of DRL to enhance the performance of quadruped robots, making them more autonomous and resilient.

The hydraulic quadruped robot leg consists of two revolute joints: the hip joint and the knee joint, each actuated by a linear hydraulic cylinder. The leg structure is modeled using the Denavit-Hartenberg (D-H) coordinate system to facilitate kinematic analysis. Let us define the coordinate frames as follows: $O_0$, $O_1$, and $O_2$ represent the base, hip, and knee frames, respectively. The joint angles are denoted as $q_1$ for the hip flexion-extension and $q_2$ for the knee flexion-extension. The link lengths and twists are derived from the robot’s physical parameters, as summarized in the table below.
| Parameter | Value (mm) |
|---|---|
| Thigh Length | 350.0 |
| Shank Length | 350.0 |
| Hydraulic Cylinder Stroke | 201.0 – 261.0 |
The forward kinematics of the leg can be expressed using the homogeneous transformation matrices. The position of the foot-end relative to the base frame is given by:
$$ x = L_1 \sin(q_1) + L_2 \sin(q_1 + q_2) $$
$$ z = -L_1 \cos(q_1) - L_2 \cos(q_1 + q_2) $$
where $L_1$ and $L_2$ are the lengths of the thigh and shank, respectively. The inverse kinematics, which compute the joint angles from the foot-end position $(x, z)$, are derived as:
$$ q_1 = -\arctan\left(\frac{x}{z}\right) - \arccos\left(\frac{L_1^2 - L_2^2 + x^2 + z^2}{2L_1 \sqrt{x^2 + z^2}}\right) $$
$$ q_2 = \pi - \arccos\left(\frac{L_1^2 + L_2^2 - x^2 - z^2}{2L_1 L_2}\right) $$
To map the joint angles to hydraulic actuator displacements, we consider the geometric relationships in the leg mechanism. For instance, the hip joint angle $q_1$ corresponds to a hydraulic cylinder displacement range that ensures the leg operates within its mechanical limits. This mapping is crucial for converting the DRL controller’s outputs into actionable commands for the hydraulic system.
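As a concrete illustration, the forward and inverse kinematics above can be written as a short script. The sketch below uses Python/NumPy rather than the MATLAB/Simulink implementation described later; the link lengths come from the parameter table, the function names are ours, and the foot is assumed to stay below the hip ($z < 0$) so the $\arctan(x/z)$ form is well defined.

```python
import numpy as np

# Link lengths from the parameter table (converted to meters).
L1 = 0.350  # thigh
L2 = 0.350  # shank

def forward_kinematics(q1, q2):
    """Foot-end position (x, z) in the base frame from joint angles (rad)."""
    x = L1 * np.sin(q1) + L2 * np.sin(q1 + q2)
    z = -L1 * np.cos(q1) - L2 * np.cos(q1 + q2)
    return x, z

def inverse_kinematics(x, z):
    """Joint angles (q1, q2) for a reachable foot-end position with z < 0."""
    r2 = x**2 + z**2
    q1 = -np.arctan(x / z) - np.arccos((L1**2 - L2**2 + r2) / (2.0 * L1 * np.sqrt(r2)))
    q2 = np.pi - np.arccos((L1**2 + L2**2 - r2) / (2.0 * L1 * L2))
    return q1, q2

# Round-trip check: with the leg fully extended straight down,
# forward_kinematics(0, 0) gives (0, -0.7) and inverse_kinematics recovers (0, 0).
```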
The trajectory planning module uses fifth-order polynomials to generate smooth foot-end trajectories during the jumping motion. This approach minimizes impact forces at touchdown and liftoff, ensuring stable motion. The trajectory is divided into two phases: the swing phase and the stance phase. For the swing phase, the foot-end position in the horizontal (X) and vertical (Z) directions is parameterized as follows:
$$ x_{sw}(t) = (6t_{sw}^5 - 15t_{sw}^4 + 10t_{sw}^3)(x_f - x_0) + x_0 $$
$$ z_{sw}(t) = \begin{cases}
[6(2t_{sw})^5 - 15(2t_{sw})^4 + 10(2t_{sw})^3](z_p - z_0) + z_0, & t_{sw} < 0.5 \\
[6(2t_{sw} - 1)^5 - 15(2t_{sw} - 1)^4 + 10(2t_{sw} - 1)^3](z_f - z_p) + z_p, & t_{sw} \geq 0.5
\end{cases} $$
where $t_{sw} = t / T_{sw}$ is the normalized time in the swing phase, $T_{sw}$ is the duration of the swing phase, $(x_0, z_0)$ is the initial foot position, $(x_f, z_f)$ is the target foot position, and $z_p$ is the peak height during the jump. For the stance phase, the trajectory is simpler, as the foot remains in contact with the ground:
$$ x_{st}(t) = (6t_{st}^5 - 15t_{st}^4 + 10t_{st}^3)(x_0 - x_f) + x_f $$
$$ z_{st}(t) = z_f $$
where $t_{st} = t / T_{st}$ and $T_{st}$ is the stance phase duration. This trajectory planning ensures that the quadruped robot leg moves smoothly, reducing oscillations and energy consumption.
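The piecewise quintic profile above is straightforward to evaluate directly. The sketch below, in the same illustrative Python style as the kinematics example, reproduces the swing- and stance-phase equations; the function and argument names are ours, not taken from the implementation.

```python
import numpy as np

def quintic(s):
    """Fifth-order (minimum-jerk) interpolation profile on s in [0, 1]."""
    return 6.0 * s**5 - 15.0 * s**4 + 10.0 * s**3

def swing_trajectory(t, T_sw, x0, z0, xf, zf, zp):
    """Foot-end reference (x, z) during the swing phase at time t."""
    s = np.clip(t / T_sw, 0.0, 1.0)           # normalized swing time t_sw
    x = quintic(s) * (xf - x0) + x0
    if s < 0.5:                               # rise from z0 to the peak height zp
        z = quintic(2.0 * s) * (zp - z0) + z0
    else:                                     # descend from zp to the target height zf
        z = quintic(2.0 * s - 1.0) * (zf - zp) + zp
    return x, z

def stance_trajectory(t, T_st, x0, xf, zf):
    """Foot-end reference (x, z) during the stance phase at time t."""
    s = np.clip(t / T_st, 0.0, 1.0)           # normalized stance time t_st
    return quintic(s) * (x0 - xf) + xf, zf
```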
The deep reinforcement learning controller is based on the TD3 algorithm, which addresses the overestimation bias common in value-based methods. The state space $S$ includes information about the leg’s position and velocity, as well as the hydraulic actuators’ states:
$$ S = [x, z, v_x, v_z, v_{a1}, v_{a2}] $$
where $x$ and $z$ are the foot-end positions in the horizontal and vertical directions, $v_x$ and $v_z$ are the corresponding velocities, and $v_{a1}$ and $v_{a2}$ are the velocities of the hip and knee hydraulic actuators. The action space $A$ consists of parameters that define the trajectory for each jump:
$$ A = [x_{end}, z_{end}, v_{end1}, v_{end2}] $$
Here, $x_{end}$ and $z_{end}$ are the target foot positions at the end of the jump, and $v_{end1}$ and $v_{end2}$ are the target velocities. These actions are used by the trajectory planner to generate reference signals for the hydraulic system.
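In a Gym-style formulation, the state and action definitions above translate into a six-dimensional observation vector and a four-dimensional action vector. The numeric bounds in the following sketch are placeholders chosen for illustration, not values from this work; only the space definitions are shown.

```python
import numpy as np
from gymnasium import spaces

# Observation S = [x, z, v_x, v_z, v_a1, v_a2]: foot-end position and velocity
# plus the hip and knee hydraulic-cylinder velocities. Left unbounded here.
observation_space = spaces.Box(low=-np.inf, high=np.inf, shape=(6,), dtype=np.float32)

# Action A = [x_end, z_end, v_end1, v_end2]: per-jump targets handed to the
# quintic trajectory planner. The ranges below are illustrative placeholders.
action_space = spaces.Box(
    low=np.array([0.0, 0.3, -1.0, -1.0], dtype=np.float32),
    high=np.array([1.5, 0.8, 1.0, 1.0], dtype=np.float32),
    dtype=np.float32,
)
```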
The reward function $R$ is designed to encourage efficient and stable jumping. It consists of three components: a reward for vertical displacement $R_1$, a reward for horizontal displacement $R_2$, and a penalty for hydraulic flow consumption $R_3$:
$$ R_1 = k_1 (z_t – z_{t-1}) $$
$$ R_2 = k_2 (x_t – x_{t-1}) $$
$$ R_3 = k_3 (Q_1 + Q_2) $$
where $k_1 = 10$, $k_2 = 10$, and $k_3 = -0.5$ are scaling factors, and $Q_1$ and $Q_2$ are the flow rates of the hip and knee hydraulic actuators (written with capital $Q$ to distinguish them from the joint angles $q_1$ and $q_2$). The total reward is:
$$ R = R_1 + R_2 + R_3 $$
This reward structure motivates the robot dog to achieve the desired jump height and distance while minimizing energy use. Additionally, termination conditions are set to reset the simulation if the leg exceeds safe operating limits, such as jumping too high or too far:
$$ \text{reset} = \text{true} \quad \text{if} \quad x > x_{target} + 0.05 \quad \text{or} \quad z > z_{target} + 0.05 $$
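Put together, the reward terms and the termination rule reduce to a few lines. The following sketch mirrors the definitions above, with the actuator flow rates passed in directly; the variable and function names are ours.

```python
K1, K2, K3 = 10.0, 10.0, -0.5    # scaling factors k1, k2, k3 from the text

def step_reward(x_t, x_prev, z_t, z_prev, Q1, Q2):
    """Per-step reward: vertical and horizontal progress minus a flow penalty."""
    r1 = K1 * (z_t - z_prev)     # R1: reward for vertical displacement
    r2 = K2 * (x_t - x_prev)     # R2: reward for horizontal displacement
    r3 = K3 * (Q1 + Q2)          # R3: penalty on hip + knee flow rates (K3 < 0)
    return r1 + r2 + r3

def should_reset(x_t, z_t, x_target, z_target, margin=0.05):
    """Reset the episode if the leg overshoots the target by more than the margin."""
    return x_t > x_target + margin or z_t > z_target + margin
```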
The TD3 algorithm employs an Actor-Critic architecture with two Critic networks to estimate the action-value function. The Actor network maps states to actions, while the Critic networks evaluate the actions based on the current state. The training parameters are summarized in the table below.
| Parameter | Value |
|---|---|
| Discount Factor | 0.99 |
| Soft Update Factor | 0.01 |
| Critic Learning Rate | 0.001 |
| Actor Learning Rate | 0.001 |
| Mini-batch Size | 256 |
| Target Policy Noise Variance | 0.2 |
| Training Episodes | 400 |
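The controller in this work is trained against the MATLAB/Simulink model, but an equivalent TD3 configuration is easy to express with an off-the-shelf implementation. The sketch below uses Stable-Baselines3 purely for illustration: `LegJumpEnv` is a hypothetical Gymnasium wrapper around the planner and the hydraulic leg model, the network architecture is the library default, and only the hyperparameters from the table are filled in.

```python
from stable_baselines3 import TD3

# Hypothetical environment wrapping the quintic planner and the hydraulic leg
# model (not shown here); observation/action spaces as sketched earlier.
env = LegJumpEnv(target_height=0.5, target_distance=1.0)

agent = TD3(
    "MlpPolicy",
    env,
    gamma=0.99,               # discount factor
    tau=0.01,                 # soft update factor for the target networks
    learning_rate=1e-3,       # SB3 applies one rate to both Actor and Critic (0.001 here)
    batch_size=256,           # mini-batch size
    target_policy_noise=0.2,  # target policy smoothing noise variance
    verbose=1,
)

steps_per_episode = 200                                # placeholder control horizon
agent.learn(total_timesteps=400 * steps_per_episode)   # roughly 400 training episodes
```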
We conducted simulations in MATLAB/Simulink to validate our approach. The hydraulic quadruped robot leg model was built using Simscape components, and contact with the ground was modeled with a spatial contact force block. The training involved three sets of target jump heights and distances: (1) height 0.5 m, distance 1.0 m; (2) height 0.6 m, distance 1.2 m; and (3) height 0.7 m, distance 1.4 m. For each set, the DRL controller was trained over 400 episodes, and the reward values were recorded. The reward increased steadily and converged after about 250 episodes, indicating effective learning. The trained leg achieved smooth and accurate jumps, closely matching the target trajectories.
To test adaptability in complex environments, we introduced obstacles of varying heights and distances: 0.35 m, 0.45 m, and 0.6 m tall, placed at 0.4 m, 1.6 m, and 3.2 m horizontally from the starting point. The reward function was modified to $R_1^* = k_1^* z_t$ with $k_1^* = -2$ to encourage jumping at the lowest possible height to clear obstacles. After training, the quadruped robot leg successfully navigated these obstacles with minimal height, demonstrating strong environmental adaptability and motion stability.
In conclusion, our deep reinforcement learning framework, combined with fifth-order polynomial trajectory planning, effectively optimizes the jumping control of a hydraulic quadruped robot leg. The method enables the robot dog to learn adaptive strategies for different tasks, such as fixed-distance jumps and obstacle clearance, while maintaining smooth and stable motion. Future work could extend this approach to full-body control of quadruped robots, incorporating more complex terrains and dynamic interactions. This research highlights the potential of DRL in advancing the autonomy and performance of hydraulically driven robot dogs, making them more capable in real-world applications.
The integration of trajectory planning with DRL not only improves learning efficiency but also ensures that the generated motions are physically feasible and energy-efficient. For instance, the use of fifth-order polynomials guarantees continuous acceleration, reducing mechanical stress on the hydraulic components. Moreover, the TD3 algorithm’s robustness to noise and overestimation makes it suitable for high-dimensional control problems in quadruped robots. As the field of robotics evolves, such hybrid approaches will be crucial for developing intelligent systems that can operate autonomously in unpredictable environments.
In summary, the key contributions of this work include the development of a kinematic model for a hydraulic quadruped robot leg, the design of a trajectory planner using fifth-order polynomials, and the implementation of a DRL-based controller for optimized jumping. The simulations confirm that our method achieves high performance and adaptability, paving the way for more advanced applications of robot dogs in various industries. By leveraging deep reinforcement learning, we can overcome the limitations of traditional control methods and unlock new possibilities for quadruped robot locomotion.
