The pursuit of versatile robotic platforms capable of operating seamlessly across the land-water boundary is a compelling frontier in robotics research. Inspired by the adaptability of crustaceans and amphibians, the development of an amphibious hexapod bionic robot presents a unique set of challenges and opportunities. This bionic robot must not only navigate the complexities of unpredictable, rugged terrestrial terrain but also achieve precise and agile locomotion in the dynamic underwater environment. The design of such a bionic robot, featuring articulated limbs that can function both as walking legs and as multi-vectored thrusters, inherently leads to a high-dimensional control problem. This article presents a comprehensive study on the motion control strategies developed for such a platform, decomposing the overarching challenge into distinct terrestrial and aquatic domains, each addressed with tailored methodologies.

The terrestrial locomotion challenge for a legged bionic robot centers on generating stable, efficient, and adaptive gaits. Traditional model-based or Central Pattern Generator (CPG) approaches, while effective for rhythmic motions on known or gently sloped terrain, often struggle with generalization to highly irregular and unseen landscapes. The need for a control policy that can dynamically adjust foot placement and body posture in response to immediate terrain feedback is paramount. To this end, we turned to Deep Reinforcement Learning (DRL), a paradigm where an agent learns optimal behaviors through trial-and-error interactions with a simulated environment. This data-driven approach is particularly well-suited for the bionic robot’s complex action space, defined by the numerous joints in its six limbs.
We constructed interactive training environments using the MuJoCo physics engine, accurately modeling the bionic robot’s multi-jointed limbs and their interactions with the ground. Terrain of varying ruggedness was generated using height maps, categorized into three primary types for training and evaluation: flat, mildly rugged, and severely rugged. The core of the learning framework was the Proximal Policy Optimization (PPO) algorithm, chosen for its sample efficiency and stability in handling continuous control tasks. The agent’s state observation $s_t$ was designed to capture information critical for stability and progress:
$$ s_t = [p_{CoG}, \theta_{joints,1 \dots 18}, c_{foot,1 \dots 6}]^T $$
where $p_{CoG}$ is the robot’s body-centered position and orientation, $\theta_{joints}$ are the current angles of all 18 limb joints, and $c_{foot}$ are boolean signals indicating foot-ground contact. The action $a_t$ output by the policy network directly commanded the angular velocities for all 18 joints. The reward function $r_t$ was carefully shaped to encourage forward propulsion while maintaining a stable, efficient gait:
$$ r_t = (x_t - x_{t-1}) + 0.001(\gamma_1 C_1 + \gamma_2 C_2 + \gamma_3 C_3) $$
subject to $\gamma_1 + \gamma_2 + \gamma_3 = 1, \gamma_i \in \{0,1\}$.
The first term rewards forward displacement along the body’s x-axis. The second term provides auxiliary rewards or penalties based on the number of legs lifted off the ground; because the weights $\gamma_i$ are binary and sum to one, exactly one of $C_1$, $C_2$, $C_3$ is active at each step. This encourages the bionic robot to discover stable tripod gaits ($C_2$, high reward) while penalizing unstable states with more than three legs lifted ($C_3$, negative reward).
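The reward above can be sketched in a few lines. This is a minimal illustration, not the training code: the numeric values of `c1`, `c2`, `c3` are placeholders, and the mapping of $C_1$ to "fewer than three legs lifted" and of $C_2$ to "exactly three legs lifted" is an assumption, since the text only specifies the tripod and more-than-three cases.

```python
def leg_term(num_lifted, c1=0.0, c2=1.0, c3=-1.0):
    """Pick exactly one of C1, C2, C3, mirroring the one-hot gamma weights.

    c1, c2, c3 are placeholder values (assumed, not from the article).
    """
    if num_lifted > 3:
        return c3  # unstable: more than three legs off the ground
    if num_lifted == 3:
        return c2  # tripod-like support (assumed trigger for C2)
    return c1      # assumed: fewer than three legs lifted


def reward(x_t, x_prev, foot_contact, scale=0.001):
    """r_t = (x_t - x_{t-1}) + 0.001 * (selected C_i).

    foot_contact is the six boolean contact signals c_foot from the state.
    """
    num_lifted = sum(1 for in_contact in foot_contact if not in_contact)
    return (x_t - x_prev) + scale * leg_term(num_lifted)
```

With these placeholder values, a step that advances the body while three feet are in contact earns a small bonus on top of the displacement term, while a step with four or more feet in the air is slightly penalized.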
The Actor-Critic networks within PPO were trained over numerous episodes. The learned policies demonstrated remarkable adaptability. The bionic robot successfully acquired distinct locomotion strategies $\pi_{flat}$, $\pi_{0.5H}$, $\pi_{1H}$, $\pi_{1.5H}$ optimized for terrains of increasing ruggedness (where H is the nominal standing height). As shown in the simulation results, the bionic robot controlled by these policies could traverse its respective target terrain rapidly and stably. A key finding was the downward compatibility of policies: strategies learned on more rugged terrain ($\pi_{1H}$, $\pi_{1.5H}$) performed competently, though not optimally, on flatter ground, indicating a degree of generalization in the learned behavior. The following table summarizes the average distance traveled in a fixed number of simulation steps under different policy-terrain combinations, clearly showing the specialization of each policy.
| Terrain / Policy | $\pi_{flat}$ | $\pi_{0.5H}$ | $\pi_{1H}$ | $\pi_{1.5H}$ |
|---|---|---|---|---|
| Flat | 6.75 m | 6.48 m | 6.07 m | 6.26 m |
| 0.5H Rugged | 5.79 m | 6.36 m | 5.98 m | 5.61 m |
| 1H Rugged | 5.09 m | 5.16 m | 5.77 m | 5.39 m |
| 1.5H Rugged | 4.82 m | 4.96 m | 4.92 m | 5.13 m |
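The specialization visible in the table can be checked directly. The short sketch below, using only the distances reported above, finds the best policy for each terrain and the percentage shortfall of every other policy relative to it; the variable names are illustrative.

```python
import numpy as np

terrains = ["flat", "0.5H", "1H", "1.5H"]
policies = ["pi_flat", "pi_0.5H", "pi_1H", "pi_1.5H"]

# Rows: terrain, columns: policy (average distance in metres, from the table).
distance_m = np.array([
    [6.75, 6.48, 6.07, 6.26],
    [5.79, 6.36, 5.98, 5.61],
    [5.09, 5.16, 5.77, 5.39],
    [4.82, 4.96, 4.92, 5.13],
])

# Index of the best-performing policy on each terrain.
best = distance_m.argmax(axis=1)
best_policy = {t: policies[i] for t, i in zip(terrains, best)}

# Shortfall of each policy relative to the best one on the same terrain, in %.
shortfall_pct = 100.0 * (1.0 - distance_m / distance_m.max(axis=1, keepdims=True))
```

The best policy for every terrain lies on the diagonal, i.e. each policy performs best on the terrain it was trained for.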
Transitioning to the aquatic domain, the control problem for the bionic robot shifts fundamentally. The limbs are reconfigured into multi-vectored thrusters, and the dynamics are governed by hydrodynamics rather than contact mechanics. The overarching goal is to achieve controlled motion in three-dimensional underwater space. We approached this by first deriving a simplified yet representative dynamic model for the bionic robot. Following standard marine vessel modeling theory and assuming near-neutral buoyancy, the equations of motion in the body-fixed frame can be expressed as:
$$ \mathbf{M} \dot{\boldsymbol{\nu}} + \mathbf{C}(\boldsymbol{\nu})\boldsymbol{\nu} + \mathbf{D}(\boldsymbol{\nu})\boldsymbol{\nu} = \boldsymbol{\tau} $$
$$ \dot{\boldsymbol{\eta}} = \mathbf{J}(\boldsymbol{\eta}) \boldsymbol{\nu} $$
Here, $\boldsymbol{\nu} = [u, v, w, r]^T$ represents the body-frame velocities (surge, sway, heave, and yaw rate). $\boldsymbol{\eta} = [x, y, z, \psi]^T$ is the pose vector in the inertial frame. $\mathbf{J}$ is the transformation matrix. $\boldsymbol{\tau}$ is the vector of generalized forces from the thrusters. Owing to the bionic robot’s symmetrical design, the inertia matrix $\mathbf{M}$ (including added mass) and the damping matrix $\mathbf{D}$ are diagonal, while $\mathbf{C}$ contains the Coriolis and centripetal terms:
$$
\mathbf{M} = \begin{bmatrix}
m-X_{\dot{u}} & 0 & 0 & 0\\
0 & m-Y_{\dot{v}} & 0 & 0\\
0 & 0 & m-Z_{\dot{w}} & 0\\
0 & 0 & 0 & I_z - N_{\dot{r}}
\end{bmatrix}, \quad
\mathbf{D}(\boldsymbol{\nu}) = \begin{bmatrix}
X_u+X_{|u|u}|u| & 0 & 0 & 0\\
0 & Y_v+Y_{|v|v}|v| & 0 & 0\\
0 & 0 & Z_w+Z_{|w|w}|w| & 0\\
0 & 0 & 0 & N_r+N_{|r|r}|r|
\end{bmatrix}
$$
Analysis of this model reveals a natural decoupling: the heave motion ($w$) is dynamically independent of the other degrees of freedom, while the motions in the horizontal plane (surge $u$, sway $v$) are coupled with the yaw motion ($r$). This insight allowed us to decompose the 3D motion control task into two more manageable sub-problems: planar trajectory tracking and depth (heave) control.
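A small numerical sketch makes the heave decoupling concrete. It assumes the usual Coriolis and centripetal form for a diagonal inertia matrix with roll and pitch neglected, and all parameter values are hypothetical placeholders rather than the identified hydrodynamic coefficients. Damping coefficients are taken as positive so that the $\mathbf{D}(\boldsymbol{\nu})\boldsymbol{\nu}$ term dissipates energy.

```python
import numpy as np

# Hypothetical parameters (NOT the identified values from the experiments).
M_DIAG = np.array([12.0, 15.0, 15.0, 0.8])   # m-X_udot, m-Y_vdot, m-Z_wdot, I_z-N_rdot
D_LIN = np.array([4.0, 6.0, 6.0, 0.5])       # linear damping terms
D_QUAD = np.array([8.0, 12.0, 12.0, 1.0])    # quadratic damping terms


def coriolis_force(nu):
    """C(nu) @ nu for a diagonal M (assumed standard form, no roll/pitch)."""
    u, v, w, r = nu
    m11, m22 = M_DIAG[0], M_DIAG[1]
    return np.array([-m22 * v * r, m11 * u * r, 0.0, (m22 - m11) * u * v])


def nu_dot(nu, tau):
    """Solve M nu_dot + C(nu) nu + D(nu) nu = tau for nu_dot (M is diagonal)."""
    damping = (D_LIN + D_QUAD * np.abs(nu)) * nu
    return (tau - coriolis_force(nu) - damping) / M_DIAG


def eta_dot(eta, nu):
    """eta_dot = J(eta) nu: rotate surge/sway by the yaw angle psi."""
    psi = eta[3]
    u, v, w, r = nu
    return np.array([
        u * np.cos(psi) - v * np.sin(psi),
        u * np.sin(psi) + v * np.cos(psi),
        w,
        r,
    ])


def step(eta, nu, tau, dt=0.01):
    """One explicit Euler step of the 4-DOF model."""
    return eta + dt * eta_dot(eta, nu), nu + dt * nu_dot(nu, tau)
```

In this sketch the third component of `coriolis_force` is identically zero and the damping acts element-wise, so the heave acceleration depends only on $w$ and the vertical thrust, whereas surge, sway, and yaw appear in each other's equations.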
For planar trajectory tracking, we employed a guidance-control hierarchy. The guidance law is based on the Line-of-Sight (LOS) principle. Given a desired path defined by a series of waypoints $P_k = (x_k, y_k)$, the LOS algorithm calculates a desired heading angle $\psi_{LOS}$ for the bionic robot to intercept the next waypoint:
$$ \psi_{LOS} = \text{atan2}(y_k - y, x_k - x) $$
where $(x, y)$ is the bionic robot’s current position. A Proportional-Integral-Derivative (PID) controller then regulates the yaw rate $r$ to drive the heading error $\psi_e = \psi_{LOS} - \psi$ to zero. Simultaneously, a separate PID controller manages the surge thrust for forward speed. When the bionic robot enters a circle of acceptance with radius $R$ around the current waypoint, the target waypoint is updated. This combination allows the bionic robot to smoothly follow curved paths. For depth control, a straightforward but effective PID controller was implemented. Using feedback from a pressure sensor, the controller adjusts the vertical thrust output to minimize the error between the desired depth $z_d$ and the measured depth $z$.
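The guidance logic described above (LOS heading plus waypoint switching inside the circle of acceptance) can be sketched as follows. The class and function names are hypothetical, and wrapping the heading error into $(-\pi, \pi]$ is an added precaution that the text does not mention explicitly.

```python
import math


def wrap_angle(angle):
    """Wrap an angle in radians into (-pi, pi]."""
    return math.atan2(math.sin(angle), math.cos(angle))


def los_heading(pos, waypoint):
    """psi_LOS = atan2(y_k - y, x_k - x)."""
    return math.atan2(waypoint[1] - pos[1], waypoint[0] - pos[0])


class WaypointGuidance:
    """Tracks the active waypoint and returns the heading error psi_e."""

    def __init__(self, waypoints, radius):
        self.waypoints = list(waypoints)
        self.radius = radius  # radius R of the circle of acceptance
        self.k = 0            # index of the active waypoint

    def update(self, pos, psi):
        wp = self.waypoints[self.k]
        inside = math.hypot(wp[0] - pos[0], wp[1] - pos[1]) <= self.radius
        # Switch to the next waypoint once inside the circle of acceptance.
        if inside and self.k < len(self.waypoints) - 1:
            self.k += 1
            wp = self.waypoints[self.k]
        return wrap_angle(los_heading(pos, wp) - psi)
```

The returned heading error would then be fed to the yaw PID controller, while the surge controller runs independently.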
The performance of these underwater control strategies was validated through pool experiments with a physical prototype of the bionic robot. The bionic robot was tasked with tracking a sigmoidal curve in the horizontal plane. The experimental results, compared against a simulation using the identified hydrodynamic parameters, demonstrated effective tracking. The table below shows key error metrics at selected points, confirming the bionic robot’s ability to follow the path with a tracking error generally within the acceptance radius $R$ (0.13 m).
| Point | Desired Position | Experimental Tracking Error (e) | Experimental Heading Error ($\psi_e$) |
|---|---|---|---|
| A | (1.00, 0.13) m | 0.030 m | 4.2° |
| B | (1.50, 0.89) m | 0.110 m | 3.4° |
| C | (1.80, 1.22) m | 0.060 m | 3.7° |
In the depth control experiment, the bionic robot was commanded to dive to 0.5 m, hold position, and then surface to 0.1 m. The depth profile over time showed stable, linear descent and ascent phases, with an average velocity of approximately 0.06 m/s during diving. The steady-state error during the holding phase was maintained within ±0.02 m, demonstrating the precision of the PID controller for this application. The bionic robot successfully executed the entire maneuver, showcasing its practical utility for subaquatic missions requiring depth stability.
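For illustration, a depth loop of this kind can be simulated on the decoupled heave equation. The sketch below is not a reproduction of the experiment: the gains, thrust limit, and hydrodynamic parameters are hypothetical, depth is taken as positive downward, and neutral buoyancy is assumed.

```python
class PID:
    """Textbook PID with output clamping (no anti-windup, for brevity)."""

    def __init__(self, kp, ki, kd, out_limit):
        self.kp, self.ki, self.kd = kp, ki, kd
        self.out_limit = out_limit
        self.integral = 0.0
        self.prev_error = None

    def update(self, error, dt):
        self.integral += error * dt
        if self.prev_error is None:
            derivative = 0.0  # no derivative kick on the first sample
        else:
            derivative = (error - self.prev_error) / dt
        self.prev_error = error
        out = self.kp * error + self.ki * self.integral + self.kd * derivative
        return max(-self.out_limit, min(self.out_limit, out))


# Hypothetical heave parameters (placeholders, not identified values).
M_HEAVE = 15.0   # m - Z_wdot
D_LIN = 6.0      # linear heave damping
D_QUAD = 12.0    # quadratic heave damping


def simulate_depth(z_d, t_end=60.0, dt=0.01):
    """Simulate M w_dot + (D_LIN + D_QUAD |w|) w = tau_z under PID depth control."""
    pid = PID(kp=20.0, ki=2.0, kd=10.0, out_limit=5.0)
    z, w = 0.0, 0.0
    for _ in range(int(t_end / dt)):
        tau_z = pid.update(z_d - z, dt)  # vertical thrust command
        w_dot = (tau_z - (D_LIN + D_QUAD * abs(w)) * w) / M_HEAVE
        w += dt * w_dot
        z += dt * w
    return z
```

Calling `simulate_depth(0.5)` drives the simulated depth toward the 0.5 m setpoint. Reproducing the measured descent rate would require the identified parameters and the actual controller gains.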
In conclusion, this work has presented and validated a dual-strategy approach for controlling an amphibious hexapod bionic robot. For terrestrial locomotion across challenging, rugged terrain, a Deep Reinforcement Learning framework based on the PPO algorithm proved highly effective in simulation. The learned policies enabled the bionic robot to generate adaptive, stable gaits without explicit terrain modeling. For underwater locomotion, a model-based analysis led to a decoupled control strategy utilizing LOS guidance for planar path following and PID control for depth regulation. Pool experiments with the physical bionic robot prototype confirmed the feasibility and performance of the underwater methods. The integration of these terrestrial and aquatic control paradigms forms a solid foundation for the development of truly autonomous and adaptable amphibious bionic robots capable of tackling complex multi-environment missions. Future work will focus on the hardware implementation and real-world transfer of the learned terrestrial policies, as well as the integration of environmental perception for fully autonomous navigation in both domains.
