With the rapid advancement of robotics, quadruped robots have become increasingly prevalent in applications such as rescue operations, military missions, and exploration activities. These robot dogs are designed to navigate complex terrains and perform tasks autonomously. However, a significant challenge in the deployment of quadruped robots is the high energy consumption associated with their control systems, which can limit their operational duration and efficiency. Traditional control methods, such as model predictive control (MPC) and oscillator-based gait generation, often require precise modeling of the robot dog and struggle to adapt to dynamic environments. In recent years, artificial intelligence techniques, particularly neural networks, have been integrated into the motion control of quadruped robots to enhance adaptability. Despite the success of deep neural networks (DNNs), their high computational demands and energy consumption pose limitations for real-time applications in bio-inspired robotics.
To address these issues, we propose a hierarchical control algorithm that combines spiking reinforcement learning (SRL) and central pattern generators (CPG) for the motion control of quadruped robots. This approach, termed SRL-CPG, leverages the low energy consumption of spiking neural networks (SNNs) and the rhythmic pattern generation capabilities of CPG to achieve efficient and adaptive control. The SRL component, based on SNNs, serves as the high-level controller, processing state information and adjusting CPG parameters, while the CPG acts as the low-level controller, generating stable gait patterns for the robot dog. This hierarchical structure simplifies the action space for SRL and enhances the robot’s ability to handle complex tasks with reduced energy consumption.
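Before detailing each component, the hierarchical structure can be summarized as a single control loop. The sketch below is illustrative only: `robot`, `srl_policy`, and `cpg_step` are hypothetical interfaces standing in for the simulator, the SNN policy, and the oscillator network described below.

```python
def control_loop(robot, srl_policy, cpg_step, cpg_state, n_steps):
    """Sketch of the SRL-CPG hierarchy: the SNN policy (high level)
    retunes the CPG (low level), which emits joint targets every cycle."""
    for _ in range(n_steps):
        obs = robot.read_state()              # body pose, velocities, joint angles
        mu, omega = srl_policy(obs)           # high level: adjust CPG parameters
        cpg_state, joints = cpg_step(cpg_state, mu, omega)  # low level: rhythmic gait
        robot.apply_joint_targets(joints)     # track the generated pattern
```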
The core of our method lies in the use of SNNs, which are considered the third generation of neural networks. Unlike traditional artificial neurons, spiking neurons operate in an event-driven manner, firing only when necessary, which significantly reduces energy usage. The leaky integrate-and-fire (LIF) model is employed to simulate the dynamics of these neurons, as described by the following equations:
$$ H_t = f(V_{t-1}, X_t), $$
$$ S_t = \Theta(H_t - V_{th}), $$
$$ V_t = H_t (1 - S_t) + V_{reset} S_t, $$
where \( H_t \) and \( V_t \) represent the membrane potential before and after spike generation at time step \( t \), \( X_t \) is the external input, \( S_t \) is the output spike, \( V_{th} \) is the threshold voltage, \( V_{reset} \) is the reset voltage, and \( \Theta(\cdot) \) is the Heaviside step function. The charge function \( f(\cdot) \) for the LIF neuron is defined as:
$$ f(V_{t-1}, X_t) = \alpha_V (V_{t-1} - V_{reset}) + V_{reset} + X_t, $$
where \( \alpha_V \) is the voltage decay factor. This model enables efficient computation and low power consumption, making it ideal for embedded systems in quadruped robots.
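To make the dynamics concrete, below is a minimal discrete-time sketch of a layer of LIF neurons following the equations above; the parameter values are illustrative defaults, not the ones used in our experiments.

```python
import numpy as np

def lif_step(v_prev, x_t, alpha_v=0.9, v_th=1.0, v_reset=0.0):
    """One discrete-time step of a layer of LIF neurons.

    v_prev : membrane potentials V_{t-1} (array)
    x_t    : external input X_t (array)
    Returns the output spikes S_t and the updated potentials V_t.
    """
    h_t = alpha_v * (v_prev - v_reset) + v_reset + x_t  # charge: f(V_{t-1}, X_t)
    s_t = (h_t >= v_th).astype(h_t.dtype)               # fire: Theta(H_t - V_th)
    v_t = h_t * (1.0 - s_t) + v_reset * s_t             # reset where a spike occurred
    return s_t, v_t
```

At inference time this update involves only comparisons, additions, and a single scaling per neuron; during training, the non-differentiable step \( \Theta(\cdot) \) is commonly handled with a surrogate gradient.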
For the CPG component, we utilize the Hopf oscillator model due to its mathematical simplicity and ease of parameter adjustment. The dynamics of a single Hopf oscillator are given by:
$$ \dot{x} = \alpha (\mu - r^2) x - \omega y, $$
$$ \dot{y} = \alpha (\mu - r^2) y + \omega x, $$
where \( x \) and \( y \) are the state variables, \( r = \sqrt{x^2 + y^2} \) is the oscillator radius, \( \alpha \) is a constant determining the convergence speed, \( \mu \) is the bifurcation parameter controlling the amplitude, and \( \omega \) is the oscillation frequency. The output amplitude converges to \( A = \sqrt{\mu} \). In our implementation, the CPG network consists of four coupled Hopf oscillators, each corresponding to one leg of the quadruped robot. The coupling between oscillators is described by:
$$ \begin{bmatrix} \dot{x}_i \\ \dot{y}_i \end{bmatrix} = \begin{bmatrix} \alpha(\mu - r_i^2) & -\omega_i \\ \omega_i & \alpha(\mu - r_i^2) \end{bmatrix} \begin{bmatrix} x_i \\ y_i \end{bmatrix} + \sum_{j=1}^{4} R(\theta_{ij}) \begin{bmatrix} x_j \\ y_j \end{bmatrix}, $$
where \( R(\theta_{ij}) \) is the rotation (coupling) matrix encoding the desired phase difference \( \theta_{ij} \) between oscillators \( i \) and \( j \). The joint angles for the hip and knee are derived from the oscillator outputs as follows:
$$ \theta_{hi} = x_i, $$
$$ \theta_{ki} = \begin{cases} -\text{sgn}(\psi) \frac{A_k}{A_h} y_i, & y_i < 0, \\ 0, & y_i \geq 0, \end{cases} $$
where \( \text{sgn}(\psi) \) is a sign function dependent on the phase relationship. This configuration allows the CPG to generate rhythmic patterns that facilitate stable locomotion in the robot dog.
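Combining the coupled oscillators with the joint mapping, the low-level controller can be sketched as one integration step per control cycle. The explicit Euler scheme, the fixed `sign_psi`, and the default constants below are illustrative choices rather than a definitive implementation:

```python
import numpy as np

def cpg_step(x, y, mu, omega, theta, alpha=10.0, ak_over_ah=0.5,
             sign_psi=1.0, dt=0.01):
    """One Euler step of four coupled Hopf oscillators (one per leg).

    x, y  : oscillator states, shape (4,)
    theta : phase offsets theta_ij, shape (4, 4), defining the gait
    Returns the updated states and the hip/knee joint angles.
    """
    r2 = x**2 + y**2
    dx = alpha * (mu - r2) * x - omega * y
    dy = alpha * (mu - r2) * y + omega * x
    for i in range(4):        # coupling: rotate each neighbor's state by theta_ij
        for j in range(4):
            c, s = np.cos(theta[i, j]), np.sin(theta[i, j])
            dx[i] += c * x[j] - s * y[j]
            dy[i] += s * x[j] + c * y[j]
    x, y = x + dt * dx, y + dt * dy
    theta_hip = x                                                  # hip tracks x_i
    theta_knee = np.where(y < 0, -sign_psi * ak_over_ah * y, 0.0)  # knee bends in swing only
    return x, y, theta_hip, theta_knee
```

Choosing the phase offsets \( \theta_{ij} \) selects the gait: a walk, trot, or pace corresponds to different fixed phase differences between the four legs.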
The SRL component is implemented using a population coding scheme to encode state information into spike trains. For an \( N \)-dimensional state vector \( S \), each dimension is encoded by a population of neurons with Gaussian receptive fields. The stimulation strength \( A_E \) for each neuron is computed as:
$$ A_E = e^{-\frac{1}{2} \left( \frac{s_i - \mu}{\sigma} \right)^2}, $$
where \( s_i \) is the \( i \)-th state component, and \( \mu \) and \( \sigma \) are the trainable mean and width of each neuron's Gaussian receptive field (distinct from the CPG parameters of the same name). Spike generation is deterministic, and gradients are propagated according to whether a spike is emitted. The SNN consists of fully connected layers followed by LIF neurons, and its output is decoded into continuous actions by averaging spike rates over time and applying weighted sums. This design enables the SRL to adjust CPG parameters such as \( \mu \) and \( \omega \) based on environmental feedback, thereby controlling the locomotion of the quadruped robot.
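A minimal sketch of the population encoder and the rate-based decoder follows; the array shapes and the affine readout are our illustrative assumptions.

```python
import numpy as np

def encode_population(s, mu, sigma):
    """Gaussian receptive-field stimulation strengths A_E.

    s     : state vector, shape (N,)
    mu    : trainable centers, shape (N, P), P neurons per state dimension
    sigma : trainable widths, shape (N, P)
    Returns stimulation strengths, shape (N, P), that drive spike generation.
    """
    return np.exp(-0.5 * ((s[:, None] - mu) / sigma) ** 2)

def decode_actions(spikes, w, b):
    """Decode output spike trains into continuous actions.

    spikes : output spikes over T time steps, shape (T, M)
    w, b   : readout weights (M, A) and biases (A,)
    """
    rates = spikes.mean(axis=0)  # average spike rate per output neuron
    return rates @ w + b         # weighted sum -> continuous CPG parameters
```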
To validate our approach, we conducted simulations in the Webots environment using a quadruped robot model named Gbot. The robot dog has a body mass of 6.5 kg and four legs, each with two joints (hip and knee). The simulation parameters are summarized in Table 1.
| Parameter | Value |
|---|---|
| Body Mass (\( M_{body} \)) | 6.5 kg |
| Link 1 Mass (\( m_{l1} \)) | 0.45 kg |
| Link 2 Mass (\( m_{l2} \)) | 0.35 kg |
| Body Length (\( L_{body} \)) | 0.27 m |
| Body Width (\( W_{body} \)) | 0.24 m |
| Body Height (\( H_{body} \)) | 0.09 m |
| Link 1 Length (\( L_1 \)) | 0.25 m |
| Link 2 Length (\( L_2 \)) | 0.20 m |
Table 1: Physical parameters of the Gbot quadruped robot model.
The first experiment focused on straight-line walking tasks in terrains of varying complexity, labeled as ENV1, ENV2, and ENV3. The reward function for straight-line walking was designed as:
$$ r_t = \xi_{\text{dist}} d_x - \xi_{\text{dev}} d_y - \xi_{\gamma} |\gamma| - \xi_j s_{\text{joint}}, $$
where \( d_x = x_{t+1} - x_t \) is the displacement in the forward direction, \( d_y = |y_{t+1}| - |y_t| \) is the growth of the lateral deviation, \( \gamma \) is the yaw angle, and \( s_{\text{joint}} \) penalizes abnormal joint movements. The coefficients were set to \( \xi_{\text{dist}} = 1 \), \( \xi_{\text{dev}} = 0.3 \), \( \xi_{\gamma} = 0.3 \), and \( \xi_j = 0.001 \).
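A minimal sketch of this reward computation; the `pose` fields are hypothetical names for the simulator readings.

```python
def straight_walk_reward(pose_t, pose_t1, s_joint,
                         xi_dist=1.0, xi_dev=0.3, xi_gamma=0.3, xi_j=0.001):
    """Straight-line walking reward; `pose` objects carry (x, y, yaw)."""
    d_x = pose_t1.x - pose_t.x             # forward progress
    d_y = abs(pose_t1.y) - abs(pose_t.y)   # growth of lateral deviation
    return xi_dist * d_x - xi_dev * d_y - xi_gamma * abs(pose_t1.yaw) - xi_j * s_joint
```

We compared the performance of SRL-CPG against standalone SRL and CPG controllers. The lateral deviation and the pitch angle variance are summarized in Table 2 and Table 3.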
| Environment | SRL | CPG | SRL-CPG |
|---|---|---|---|
| ENV1 | -0.03 m | -0.21 m | 0.01 m |
| ENV2 | 0.10 m | 6.59 m | 0.03 m |
| ENV3 | 0.18 m | 6.91 m | 0.05 m |
Table 2: Lateral deviation distance for the quadruped robot in straight-line walking tasks.
| Environment | SRL | CPG | SRL-CPG |
|---|---|---|---|
| ENV1 | 0.30 | 1.30 | 0.16 |
| ENV2 | 0.15 | 4.73 | 0.14 |
| ENV3 | 0.98 | 4.58 | 0.32 |
Table 3: Variance of pitch angle for the quadruped robot, indicating stability.
The results demonstrate that SRL-CPG achieves lower lateral deviation and better stability compared to SRL and CPG alone, especially in complex environments. The yaw angle control also showed superior performance, with SRL-CPG maintaining the desired direction more effectively.

The second experiment involved a velocity control task, where the quadruped robot was required to achieve target velocities in the X and Y directions (\( v_x = 0.25 \, \text{m/s} \), \( v_y = 0.25 \, \text{m/s} \)) and a target yaw angle of \( 45^\circ \). The reward function was defined as:
$$ r_t = \xi_{x,y} f(v_{x,y}) + \xi_{\omega_z} f(\omega_z) - \xi_j s_{\text{joint}}, $$
with \( f(x) = \exp\left( -\frac{\| x_{\text{targ}} - x \|^2}{0.25} \right) \), where \( x_{\text{targ}} \) denotes the corresponding target quantity. The coefficients were \( \xi_{x,y} = 1.0 \), \( \xi_{\omega_z} = 0.5 \), and \( \xi_j = 0.001 \).
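A short sketch of this Gaussian tracking reward; the argument names are illustrative.

```python
import numpy as np

def tracking_kernel(target, value):
    """Gaussian tracking term f(x) = exp(-||x_targ - x||^2 / 0.25)."""
    diff = np.asarray(target, dtype=float) - np.asarray(value, dtype=float)
    return np.exp(-np.sum(diff ** 2) / 0.25)

def velocity_reward(v_xy, v_xy_targ, omega_z, omega_z_targ, s_joint,
                    xi_xy=1.0, xi_wz=0.5, xi_j=0.001):
    """Velocity-control reward combining linear and yaw-rate tracking."""
    return (xi_xy * tracking_kernel(v_xy_targ, v_xy)
            + xi_wz * tracking_kernel(omega_z_targ, omega_z)
            - xi_j * s_joint)
```

We compared SRL-CPG with deep reinforcement learning combined with CPG (DRL-CPG) and with MPC. The percentage errors for yaw angle and velocities are presented in Table 4.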
| Error Type | SRL-CPG | DRL-CPG | MPC |
|---|---|---|---|
| Yaw Error | 1.22% | 1.89% | 9.71% |
| \( v_x \) Error | 1.34% | 6.48% | 0.21% |
| \( v_y \) Error | 5.48% | 1.32% | 12.72% |
Table 4: Control errors in velocity control task for the quadruped robot.
SRL-CPG achieved the lowest overall errors, particularly in yaw and velocity control, showcasing its adaptability and precision in complex tasks.
Energy consumption is a critical factor for the deployment of quadruped robots. We evaluated the energy efficiency of SRL-CPG compared to DRL-CPG by calculating the energy per operation based on 45 nm chip technology, where the energy for a multiply-accumulate (MAC) operation is \( E_{\text{MAC}} = 4.6 \, \text{pJ} \) and for an accumulate (AC) operation is \( E_{\text{AC}} = 0.9 \, \text{pJ} \). The total energy consumption for each model is given by:
$$ E_{\text{model}} = E_{\text{MAC}} \cdot \text{FL}_{\text{Other}} + E_{\text{AC}} \cdot \text{FL}_{\text{SNN}}, $$
where \( \text{FL}_{\text{Other}} \) and \( \text{FL}_{\text{SNN}} \) are the numbers of floating-point operations in the conventional modules and in the SNN modules, respectively. The results for different numbers of spiking time steps \( T \) are shown in Table 5.
| Cycle Step | DRL-CPG Energy (nJ) | SRL-CPG Energy (nJ) | Energy Saving (%) |
|---|---|---|---|
| \( T = 1 \) | 31.42 | 4.84 | 84.60 |
| \( T = 2 \) | 31.42 | 9.41 | 60.01 |
| \( T = 3 \) | 31.42 | 19.76 | 37.12 |
Table 5: Energy consumption comparison for quadruped robot control.
SRL-CPG consistently reduces energy consumption, with savings up to 84.6% for \( T = 1 \), highlighting the advantage of using SNNs in energy-constrained applications for robot dogs.
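The figures in Table 5 follow directly from this per-operation model; a minimal sketch, with hypothetical operation counts in place of our measured values:

```python
E_MAC = 4.6e-12  # J per multiply-accumulate (45 nm estimate)
E_AC = 0.9e-12   # J per accumulate (45 nm estimate)

def model_energy(fl_other, fl_snn):
    """Energy of one forward pass: MACs in conventional layers, ACs in SNN layers."""
    return E_MAC * fl_other + E_AC * fl_snn

# Illustrative comparison; the operation counts below are made up for the example.
e_drl = model_energy(fl_other=6800, fl_snn=0)    # DNN: every operation is a MAC
e_srl = model_energy(fl_other=300, fl_snn=3800)  # SNN: most operations are sparse ACs
saving = 100.0 * (1.0 - e_srl / e_drl)
print(f"energy saving: {saving:.1f}%")
```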
In conclusion, our proposed SRL-CPG hierarchical control algorithm effectively addresses the challenges of energy consumption and control performance in quadruped robots. By integrating spiking reinforcement learning with central pattern generators, we achieve adaptive and stable locomotion for the robot dog while significantly reducing power usage. The simulation results confirm that SRL-CPG outperforms traditional methods in straight-line walking and velocity control tasks, making it a promising solution for real-world applications of quadruped robots. Future work will focus on extending this approach to more dynamic motions, such as jumping and flipping, and implementing it on physical robot dog platforms.
