Humanoid Robot Locomotion in Unstructured Environments: A Comprehensive Review of Planning and Control

The quest to create machines that mirror human form and capability has long been a central pursuit in robotics and artificial intelligence. Among these, the humanoid robot stands as a particularly ambitious goal, promising a future where machines can operate seamlessly in human-centric environments. Unlike their wheeled or tracked counterparts, a humanoid robot possesses the unique potential to navigate narrow passages, climb stairs, traverse obstacles, and manipulate human tools, making it an ideal candidate for applications in disaster response, healthcare, domestic assistance, and advanced manufacturing.

However, transitioning this potential into reliable, real-world functionality hinges on solving the formidable challenge of stable locomotion in unstructured environments. Such environments are characterized by complex, uneven terrain featuring slopes, steps, and scattered debris; frequent and unpredictable external disturbances like pushes or payload shifts; and inherent uncertainties in perception, state estimation, and actuation. For a humanoid robot, achieving robust walking in these conditions requires the seamless integration of several sub-tasks: perception to understand the environment, state estimation for localization, planning to devise a feasible motion strategy, and control to execute it faithfully. While perception provides the map, it is the tight coupling of planning and control that translates intention into stable, dynamic movement, enabling a humanoid robot to evolve from merely “able to walk” to walking “steadily, swiftly, and intelligently” in complex settings.

Research in humanoid robot locomotion has evolved through distinct phases. The early stage focused on basic balance and periodic gait generation using simplified dynamics. The developmental stage introduced optimization and predictive control to tackle non-flat terrain and perturbation recovery. Currently, an emerging stage is characterized by the integration of deep learning and reinforcement learning, pushing towards end-to-end or tightly coupled frameworks that blend perception, planning, and control. Internationally, platforms such as Atlas, HRP, and iCub have driven progress, while in China development has accelerated with platforms from companies such as Unitree and UBTECH. This article reviews recent advances in planning and control for humanoid robot walking in unstructured environments, systematically analyzing core methods, comparing their strengths and limitations, and discussing future trends and bottlenecks.

1. Locomotion Planning for Humanoid Robots

Planning for a humanoid robot serves as the critical bridge between high-level task commands and low-level motor execution. The core challenge extends beyond finding a geometric path; it must incorporate the robot’s dynamics and stability constraints to produce a physically realizable motion. Modern planning frameworks are inherently hierarchical, decomposing the problem into global footstep planning, which decides where to step, and motion trajectory planning, which determines how to move the body and limbs between those steps.

1.1 Global and Local Footstep Planning

The first step in navigating a complex world is determining a sequence of discrete foot placements, or footsteps, that lead from start to goal while ensuring stability and avoiding obstacles.

1.1.1 Geometry-Based Planning. Early methods adapted classical algorithms from mobile robotics. The A* algorithm searches a discretized grid for a minimum-cost path, while sampling-based planners like Rapidly-exploring Random Trees (RRT) probe the continuous space. For a humanoid robot, these geometric paths are then converted into footsteps using heuristics, such as placing steps within the projected support polygon. While efficient, these pure geometric approaches often neglect kinematic feasibility and dynamic constraints.
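A minimal sketch of this geometry-first pipeline on a hypothetical grid world, with a Chebyshev step-length bound standing in for the kinematic reachability heuristic (all names, the grid size, and the cost model are illustrative, not from any specific system):

```python
import heapq

def astar_footsteps(start, goal, obstacles, max_step=1, grid=10):
    """A*-style search over a discrete grid of candidate foot placements.

    start/goal: (x, y) integer cells; obstacles: set of blocked cells.
    max_step bounds the per-step displacement (Chebyshev distance), a
    crude stand-in for the robot's kinematic step-length limit.
    """
    def h(c):  # admissible heuristic: Chebyshev distance to the goal
        return max(abs(c[0] - goal[0]), abs(c[1] - goal[1]))

    frontier = [(h(start), 0, start, [start])]
    best = {start: 0}
    while frontier:
        _, cost, cell, path = heapq.heappop(frontier)
        if cell == goal:
            return path
        for dx in range(-max_step, max_step + 1):
            for dy in range(-max_step, max_step + 1):
                nxt = (cell[0] + dx, cell[1] + dy)
                if nxt == cell or nxt in obstacles:
                    continue
                if not (0 <= nxt[0] < grid and 0 <= nxt[1] < grid):
                    continue
                ncost = cost + 1  # each footstep costs 1
                if ncost < best.get(nxt, float("inf")):
                    best[nxt] = ncost
                    heapq.heappush(frontier,
                                   (ncost + h(nxt), ncost, nxt, path + [nxt]))
    return None
```

The returned cell sequence would still need a heuristic conversion into left/right foot placements, which is exactly where the kinematic infeasibility noted above can creep in.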

1.1.2 Dynamics-Aware Planning. To ensure planned footsteps are executable, subsequent research integrated dynamic constraints directly into the search process. This involves formulating footstep planning as an optimization problem where footstep positions $(x_i, y_i, \theta_i)$ and timings are decision variables. A typical formulation for a humanoid robot might be:

$$
\min_{\mathbf{x}} \quad \sum_{i=1}^{N} ( \underbrace{w_1 \|\mathbf{p}_{CoM}^i - \mathbf{p}_{ZMP}^i\|^2}_{\text{Stability Cost}} + \underbrace{w_2 \|\mathbf{u}_i\|^2}_{\text{Control Effort}} )
$$

$$
\text{subject to:} \quad \mathbf{p}_{ZMP}^i \in \mathcal{SP}^i, \quad \mathbf{p}_{foot}^i \in \mathcal{T}_{feasible}, \quad \mathbf{x}_{min} \leq \mathbf{x} \leq \mathbf{x}_{max}
$$

where $\mathbf{p}_{CoM}^i$ and $\mathbf{p}_{ZMP}^i$ are the Center of Mass and Zero Moment Point positions at step $i$, $\mathcal{SP}^i$ is the support polygon, and $\mathcal{T}_{feasible}$ represents feasible terrain regions. This ensures the plan respects the humanoid robot’s balance limits and terrain affordances.
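As a toy instance of this formulation, the sketch below keeps only the control-effort term and the goal constraint, in one dimension (the stability cost and the terrain/bound constraints would enter as additional cost terms and inequalities), and solves the resulting equality-constrained QP via its KKT system:

```python
import numpy as np

def plan_steps(goal=1.0, n=4):
    """Effort-only footstep QP: min sum_i (x_i - x_{i-1})^2 subject to
    x_0 = 0 and x_n = goal, solved through the KKT linear system.
    D is the first-difference matrix, so D @ x gives the step lengths."""
    D = np.eye(n) - np.eye(n, k=-1)           # x_i - x_{i-1}, with x_0 = 0 folded in
    H = 2 * D.T @ D                           # Hessian of the quadratic cost
    a = np.zeros(n); a[-1] = 1.0              # equality constraint a @ x = goal
    # KKT system: [[H, a], [a^T, 0]] [x; lam] = [0; goal]
    K = np.block([[H, a[:, None]], [a[None, :], np.zeros((1, 1))]])
    sol = np.linalg.solve(K, np.concatenate([np.zeros(n), [goal]]))
    return sol[:n]
```

Because the cost penalizes squared step lengths with a fixed total distance, the optimum is evenly spaced steps; nonlinear terrain constraints are what force the full problem into NLP or MPC solvers.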

1.1.3 Hierarchical Footstep Planning. To handle environmental uncertainty and dynamic changes, a two-layer hierarchy is often employed. A global planner generates a coarse guide path using lower-resolution maps, while a local, receding-horizon planner refines footstep placements in real-time using high-fidelity local terrain data (e.g., from LiDAR). This structure balances long-term goal-directedness with local adaptability for the humanoid robot.

1.1.4 Learning-Assisted Planning. Recently, machine learning has been introduced to model the complex relationship between terrain geometry and viable footsteps. Deep neural networks or reinforcement learning agents can be trained to predict footstep feasibility or directly propose step locations, allowing a humanoid robot to generalize to novel terrain patterns. However, these methods depend heavily on the quality and breadth of training data.

The progression of footstep planning methods is summarized in Table 1.

| Planning Category | Core Idea | Typical Formulation/Algorithm | Advantages for Humanoid Robot | Limitations |
|---|---|---|---|---|
| Geometry-Based | Find a collision-free path, then sample footsteps. | A*, RRT, heuristic foot placement | Fast, simple, good for global navigation. | Ignores dynamics; may produce infeasible steps. |
| Dynamics-Aware | Optimize the footstep sequence under dynamic constraints. | Nonlinear optimization, Model Predictive Control (MPC) | Produces dynamically feasible, stable plans. | Computationally heavier; requires an accurate model. |
| Hierarchical | Separate a global guide path from local refinement. | Global A*/RRT + local MPC/optimization | Balances efficiency and adaptability. | Requires careful coupling between layers. |
| Learning-Assisted | Use learned models to evaluate or generate footsteps. | Deep Q-networks, feasibility classifiers | Handles complex terrain patterns; adaptive. | Data-intensive; generalization challenges; black-box. |

1.2 Motion Trajectory Planning

Once a footstep sequence is determined, the humanoid robot must generate smooth, dynamically consistent trajectories for its Center of Mass (CoM) and swing feet. These trajectories serve as the primary reference for the whole-body controller.

1.2.1 Trajectory Optimization with Simplified Models. The most established approach uses reduced-order models to make trajectory generation tractable. The Linear Inverted Pendulum Model (LIPM) is foundational, modeling the humanoid robot as a point mass atop a massless leg. The dynamics are given by:

$$
\ddot{x}_{CoM} = \frac{g}{z_c} (x_{CoM} - x_{ZMP})
$$

where $g$ is gravity and $z_c$ is the constant CoM height. The Zero Moment Point (ZMP), $\mathbf{p}_{ZMP}$, is the point on the ground where the net moment is zero. Stability is ensured by keeping the ZMP within the support polygon. By previewing desired ZMP trajectories, optimized CoM trajectories can be solved efficiently, often using Model Predictive Control (MPC). For a humanoid robot on uneven terrain, the 3D LIPM or variable-height versions are used:

$$
\ddot{\mathbf{r}}_{CoM} = \frac{g + \ddot{z}_c}{z_c} (\mathbf{r}_{CoM} - \mathbf{r}_{ZMP}) - \mathbf{g}
$$

This allows planning over non-planar surfaces.
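For the constant-height model, the CoM trajectory under a fixed ZMP has a closed form in cosh/sinh, which is what makes preview-style planning so cheap; a minimal sketch (parameters are illustrative) compares the closed form against direct integration of the ODE:

```python
import math

def lipm_analytic(x0, xd0, p, t, g=9.81, z_c=0.9):
    """Closed-form CoM trajectory of the LIPM with the ZMP held at p."""
    w = math.sqrt(g / z_c)                     # natural frequency omega
    x = p + (x0 - p) * math.cosh(w * t) + (xd0 / w) * math.sinh(w * t)
    xd = (x0 - p) * w * math.sinh(w * t) + xd0 * math.cosh(w * t)
    return x, xd

def lipm_euler(x0, xd0, p, t, g=9.81, z_c=0.9, dt=1e-4):
    """Forward-Euler integration of x'' = (g/z_c)(x - p) for comparison."""
    x, xd = x0, xd0
    for _ in range(int(t / dt)):
        xdd = (g / z_c) * (x - p)
        x += xd * dt
        xd += xdd * dt
    return x, xd
```

The hyperbolic growth away from the ZMP is the "divergent" behavior that ZMP preview and the DCM formulation below are designed to tame.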

1.2.2 Online Trajectory Adjustment and Recovery. To react to disturbances or perception updates, trajectories must be adjustable in real-time. This can involve modifying the swing foot trajectory to avoid a newly detected obstacle or performing “capture point” based adjustments to the CoM trajectory for balance recovery. The Instantaneous Capture Point (ICP) or Divergent Component of Motion (DCM), $\boldsymbol{\xi}$, is defined as:

$$
\boldsymbol{\xi} = \mathbf{r}_{CoM} + \frac{\dot{\mathbf{r}}_{CoM}}{\omega}, \quad \text{where } \omega = \sqrt{\frac{g}{z_c}}
$$

Controlling the ICP to a location inside the future support polygon guarantees that the humanoid robot can come to a stop. Online planners often adjust the step timing or location to steer the ICP.
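The "step on the capture point" property can be checked numerically: if the ZMP is placed exactly at the ICP, the LIPM CoM converges to that point at rest. A sketch (forward-Euler integration; gravity, height, and horizon are illustrative):

```python
import math

def capture_step(x0, xd0, g=9.81, z_c=0.9, dt=1e-4, T=1.0):
    """Place the ZMP at the instantaneous capture point and integrate the
    LIPM forward; the CoM should converge to the capture point at rest."""
    w = math.sqrt(g / z_c)
    p = x0 + xd0 / w                 # ICP: xi = x + xd / omega
    x, xd = x0, xd0
    for _ in range(int(T / dt)):
        xdd = (g / z_c) * (x - p)    # LIPM dynamics with ZMP fixed at p
        x += xd * dt
        xd += xdd * dt
    return p, x, xd
```

Analytically, this initial condition lies on the stable manifold of the pendulum, so both the position error and the velocity decay as $e^{-\omega t}$.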

1.2.3 Data-Driven Trajectory Planning. In highly complex environments, analytical models may fail to capture all dynamics or terrain interactions. Data-driven methods, particularly Reinforcement Learning (RL), are used to learn trajectory adaptation policies. A policy $\pi_\theta(\mathbf{a}_t | \mathbf{s}_t)$ can be trained to output adjustments to planned foot positions or torso orientation, where the state $\mathbf{s}_t$ includes proprioception, terrain heightmaps, and the current plan. The reward function $R(\mathbf{s}_t, \mathbf{a}_t)$ typically encourages stability, progress, and low effort.

Table 2 compares these trajectory planning paradigms for a humanoid robot.

| Trajectory Method | Core Model | Key Equation/Principle | Advantages | Limitations |
|---|---|---|---|---|
| Simplified Model (LIPM/ZMP) | Linear inverted pendulum | $\ddot{x}_{CoM} = \frac{g}{z_c}(x_{CoM}-x_{ZMP})$ | Very fast; convex optimization; proven stability. | Assumes constant height, a point mass, and no angular momentum. |
| Online Adjustment (ICP/DCM) | Capture point dynamics | $\boldsymbol{\xi} = \mathbf{r}_{CoM} + \frac{\dot{\mathbf{r}}_{CoM}}{\omega}$ | Intuitive for balance recovery; enables reactive stepping. | Still based on simplified dynamics; step adjustment may be abrupt. |
| Data-Driven (RL) | Learned policy network | $\mathbf{a}_t = \pi_\theta(\mathbf{s}_t)$ | Learns complex terrain adaptation; end-to-end optimizable. | Massive training data needed; sim-to-real gap; safety concerns. |

2. Locomotion Control for Humanoid Robots

Control is the layer that physically realizes the planned motion. Its objectives are to track reference trajectories, maintain stability under perturbations, and manage the contact interactions between the humanoid robot and the world.

2.1 Simplified Model-Based Control

These controllers explicitly use low-dimensional models to compute stabilizing actions.

2.1.1 ZMP-Based Tracking Control. The foundational method uses the LIPM model. A feedback controller, often a Proportional-Derivative (PD) or Linear Quadratic Regulator (LQR), tracks the planned ZMP or CoM trajectory by modulating the actual ZMP through ankle torques or CoM acceleration.

2.1.2 Capture Point Control. This method focuses on regulating the ICP/DCM. A simple yet effective controller for a humanoid robot drives the ICP to a desired location $\boldsymbol{\xi}^{des}$ within the support foot by commanding a CoM acceleration:

$$
\dot{\boldsymbol{\xi}} = \omega (\boldsymbol{\xi} - \mathbf{r}_{ZMP}) \quad \Rightarrow \quad \mathbf{r}_{ZMP}^{cmd} = \boldsymbol{\xi} - \frac{\dot{\boldsymbol{\xi}}^{des}}{\omega} + \frac{K_\xi}{\omega} (\boldsymbol{\xi} - \boldsymbol{\xi}^{des})
$$

where $K_\xi$ is a control gain. This provides a direct mechanism for push recovery.
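For pure regulation ($\dot{\boldsymbol{\xi}}^{des} = \mathbf{0}$), commanding $\mathbf{r}_{ZMP} = \boldsymbol{\xi} + (K_\xi/\omega)(\boldsymbol{\xi} - \boldsymbol{\xi}^{des})$ substitutes into the ICP dynamics to give the stable closed loop $\dot{\boldsymbol{\xi}} = -K_\xi(\boldsymbol{\xi} - \boldsymbol{\xi}^{des})$; a one-dimensional numerical sketch (gain and parameters illustrative):

```python
import math

def dcm_regulate(xi0, xi_des, k_xi=3.0, g=9.81, z_c=0.9, dt=1e-3, T=2.0):
    """Integrate the open-loop ICP dynamics xi' = w (xi - r_zmp) under the
    feedback r_zmp = xi + (k_xi / w) (xi - xi_des), which yields the
    stable closed loop xi' = -k_xi (xi - xi_des)."""
    w = math.sqrt(g / z_c)
    xi = xi0
    for _ in range(int(T / dt)):
        r_zmp = xi + (k_xi / w) * (xi - xi_des)  # capture point feedback
        xi += w * (xi - r_zmp) * dt              # open-loop ICP dynamics
    return xi
```

In practice the commanded ZMP must also stay inside the support polygon, which is what forces a step adjustment when a push is too large to absorb through the ankles.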

2.1.3 Momentum-Based Control. Recognizing the limitations of point-mass models, momentum-based methods control the linear and angular momentum of the full humanoid robot body. The rate of change of centroidal momentum $\mathbf{h} = [\mathbf{k}^\top, \mathbf{l}^\top]^\top$ (linear $\mathbf{k}$ and angular $\mathbf{l}$) is equal to the net external wrench:

$$
\dot{\mathbf{h}} = \sum_{i=1}^{N_c} \begin{bmatrix} \mathbf{I}_3 \\ [\mathbf{r}_i]_\times \end{bmatrix} \mathbf{f}_i + \begin{bmatrix} m\mathbf{g} \\ \mathbf{0} \end{bmatrix}
$$

where $\mathbf{f}_i$ is the contact force applied at position $\mathbf{r}_i$ (taken relative to the CoM), and $[\mathbf{r}_i]_\times$ is its skew-symmetric cross-product matrix. By optimizing for contact forces that achieve desired momentum rates, more natural balance control of the humanoid robot is achieved, especially on uneven contact surfaces.
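A minimal sketch of this force-distribution step, assuming contact positions expressed relative to the CoM and ignoring the friction-cone and unilaterality constraints a real controller would add:

```python
import numpy as np

def skew(r):
    """Skew-symmetric matrix so that skew(r) @ f == np.cross(r, f)."""
    return np.array([[0.0, -r[2], r[1]],
                     [r[2], 0.0, -r[0]],
                     [-r[1], r[0], 0.0]])

def distribute_forces(contacts, h_dot_des, m=50.0):
    """Least-squares contact forces realizing a desired centroidal
    momentum rate: stack [I3; [r_i]_x] per contact and solve
    A f = h_dot_des - [m g; 0] in the minimum-norm sense."""
    g = np.array([0.0, 0.0, -9.81])
    A = np.hstack([np.vstack([np.eye(3), skew(r)]) for r in contacts])
    b = h_dot_des - np.concatenate([m * g, np.zeros(3)])
    f, *_ = np.linalg.lstsq(A, b, rcond=None)
    return f.reshape(len(contacts), 3)   # one force row per contact
```

For a symmetric double-support stance and zero desired momentum rate, the minimum-norm solution splits the robot's weight evenly between the feet, matching intuition.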

2.2 Whole-Body Control (WBC)

WBC is the prevailing model-based paradigm for controlling a high-degree-of-freedom humanoid robot. It formulates control as a constrained optimization problem that solves for joint torques $\boldsymbol{\tau}$ directly, considering the full rigid-body dynamics and multiple competing tasks.

The dynamics of a floating-base humanoid robot are given by:
$$
\mathbf{M}(\mathbf{q})\ddot{\mathbf{q}} + \mathbf{C}(\mathbf{q}, \dot{\mathbf{q}}) = \mathbf{S}^\top \boldsymbol{\tau} + \mathbf{J}_c^\top(\mathbf{q}) \boldsymbol{\lambda}
$$
where $\mathbf{q}$ denotes the generalized coordinates, $\mathbf{M}$ is the inertia matrix, $\mathbf{C}$ collects the Coriolis, centrifugal, and gravity terms, $\mathbf{S}$ selects the actuated joints, $\mathbf{J}_c$ is the contact Jacobian, and $\boldsymbol{\lambda}$ are the contact forces.

A hierarchical whole-body controller solves a sequence of Quadratic Programs (QPs). At each priority level $k$, it finds control inputs that satisfy task $\mathbf{J}_k \ddot{\mathbf{q}} = \dot{\mathbf{v}}_k^{des}$ as closely as possible, subject to the dynamics and constraints, and without interfering with higher-priority tasks. A common formulation is:

$$
\min_{\ddot{\mathbf{q}}, \boldsymbol{\tau}, \boldsymbol{\lambda}} \quad \|\mathbf{J}_1 \ddot{\mathbf{q}} - \dot{\mathbf{v}}_1^{des}\|^2
$$

$$
\text{subject to:} \quad \mathbf{M}\ddot{\mathbf{q}} + \mathbf{C} = \mathbf{S}^\top\boldsymbol{\tau} + \mathbf{J}_c^\top\boldsymbol{\lambda} \quad \text{(Dynamics)}
$$

$$
\quad \quad \quad \quad \mathbf{J}_c \ddot{\mathbf{q}} + \dot{\mathbf{J}}_c \dot{\mathbf{q}} = \mathbf{0} \quad \text{(Contact Constraints)}
$$

$$
\quad \quad \quad \quad \boldsymbol{\lambda} \in \mathcal{FC} \quad \text{(Friction Cone)}
$$

$$
\quad \quad \quad \quad \boldsymbol{\tau}_{min} \leq \boldsymbol{\tau} \leq \boldsymbol{\tau}_{max}
$$

Subsequent QPs for lower-priority tasks (e.g., arm posture, gaze direction) are solved in the null-space of higher-priority constraints (e.g., contact dynamics, swing foot tracking). This framework allows a humanoid robot to simultaneously balance, walk, and use its arms, making it highly versatile.
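The core of each level can be sketched in a drastically simplified, contact-free, single-priority form: solve a regularized task-space least-squares for $\ddot{\mathbf{q}}$, then recover torques by inverse dynamics. This assumes a fully actuated robot with no contacts; a real WBC stacks the dynamics, contact, friction-cone, and torque constraints into prioritized QPs solved by a proper QP solver.

```python
import numpy as np

def wbc_step(M, C, J, a_des, reg=1e-6):
    """One step of a toy whole-body controller: minimize
    ||J qdd - a_des||^2 + reg ||qdd||^2 over joint accelerations qdd,
    then recover torques from the dynamics tau = M qdd + C."""
    n = M.shape[0]
    H = J.T @ J + reg * np.eye(n)        # normal equations + regularizer
    qdd = np.linalg.solve(H, J.T @ a_des)
    tau = M @ qdd + C                     # inverse dynamics (no contacts)
    return qdd, tau
```

The regularizer plays the role that null-space resolution plays in the full hierarchy: it picks a unique $\ddot{\mathbf{q}}$ when the task Jacobian is rank-deficient.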

2.3 Reinforcement Learning Control

Deep Reinforcement Learning (DRL) represents a paradigm shift from model-based design. A policy network $\pi_\theta$ is trained through trial and error in simulation to map observations $\mathbf{o}_t$ (proprioception, terrain scans, commands) directly to joint targets $\mathbf{a}_t$ (positions, torques, or residuals). The objective is to maximize the expected cumulative reward $R$:

$$
\max_\theta \mathbb{E}_{\tau \sim \pi_\theta} \left[ \sum_{t=0}^{T} \gamma^t R(\mathbf{s}_t, \mathbf{a}_t) \right]
$$

Rewards are engineered to encourage stable, efficient, and command-following locomotion for the humanoid robot. Key advantages include the automatic synthesis of complex recovery behaviors and inherent robustness to model inaccuracies. State-of-the-art approaches often use actor-critic architectures like Proximal Policy Optimization (PPO) or Soft Actor-Critic (SAC), coupled with extensive domain randomization to bridge the sim-to-real gap.
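The reward-shaping style described above can be illustrated with a few representative terms; the weights and exact shaping below are hypothetical, not drawn from any specific paper:

```python
import numpy as np

def locomotion_reward(v_cmd, v_actual, roll, pitch, torques,
                      w_track=1.0, w_upright=0.5, w_effort=1e-4):
    """Illustrative per-step reward: velocity tracking, uprightness,
    and a torque penalty, combined with hand-tuned weights."""
    r_track = np.exp(-np.square(v_cmd - v_actual))    # 1 when tracking well
    r_upright = np.exp(-(roll ** 2 + pitch ** 2))     # 1 when torso level
    r_effort = -np.sum(np.square(torques))            # discourage effort
    return w_track * r_track + w_upright * r_upright + w_effort * r_effort
```

Real training setups add many more terms (foot clearance, contact scheduling, action smoothness), and tuning their relative weights is a large part of the engineering effort.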

A comparison of control strategies is presented in Table 3.

| Control Strategy | Mathematical Foundation | Key Features for Humanoid Robot | Advantages | Challenges |
|---|---|---|---|---|
| Simplified Model (ZMP/ICP) | LIPM dynamics, $\dot{\boldsymbol{\xi}} = \omega(\boldsymbol{\xi}-\mathbf{r}_{ZMP})$ | Explicit stability criterion; low computational cost. | Simple, interpretable, real-time capable on embedded hardware. | Limited model fidelity; poor use of full-body dynamics. |
| Whole-Body Control (WBC) | Full rigid-body dynamics; hierarchical QP $\min \|\mathbf{J}\ddot{\mathbf{q}}-\dot{\mathbf{v}}^{des}\|^2$ | Full dynamics, multi-task prioritization, direct torque control. | High performance; dynamically consistent; enables complex multi-tasking. | Computationally intensive; sensitive to model inaccuracies; complex tuning. |
| Reinforcement Learning (RL) | Policy optimization $\max_\theta \mathbb{E}[\sum_t \gamma^t R_t]$ | End-to-end mapping from observations to actions. | Exceptional adaptability; learns recovery behaviors from vast experience; model-free. | Extensive simulation training; sim-to-real transfer; hard safety verification. |

3. Towards Integrated Architectures

The traditional modular pipeline of perception, planning, and control introduces latency and error propagation between stages. The frontier of research aims to create more tightly coupled or even end-to-end architectures for the humanoid robot.

3.1 Perception-Control Co-Design. Here, perceptual features (like terrain height maps or obstacle embeddings) are fed directly into the controller, bypassing explicit geometric planning. For example, a neural network policy might take a depth image and proprioceptive state to output joint torques, learning implicit strategies for terrain negotiation.

3.2 Integrated Planning and Control. Model Predictive Control (MPC) naturally integrates a finite-horizon planning problem within the control loop. A full-body or centroidal dynamics MPC for a humanoid robot solves, in real-time, for a sequence of future states and controls (footsteps, contact forces, CoM motion) that minimize a cost function, with only the first control applied. This unifies reaction and prediction.
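As a sketch of how MPC folds planning into the control loop, the toy problem below (in the spirit of ZMP preview control; the time step, weights, and condensed least-squares solve are illustrative choices, not from any particular system) optimizes a whole horizon of CoM jerks at once to track a ZMP reference:

```python
import numpy as np

def zmp_preview_mpc(zmp_ref, dt=0.05, z_c=0.9, g=9.81, w_u=1e-4):
    """Condensed linear MPC: the CoM jerk over the horizon is the decision
    variable; the cost trades ZMP tracking against jerk effort (w_u)."""
    N = len(zmp_ref)
    # discrete triple integrator: state [x, xd, xdd], input jerk u
    A = np.array([[1.0, dt, dt**2 / 2], [0.0, 1.0, dt], [0.0, 0.0, 1.0]])
    B = np.array([dt**3 / 6, dt**2 / 2, dt])
    c = np.array([1.0, 0.0, -z_c / g])        # zmp = x - (z_c/g) xdd
    # prediction matrices: zmp_k = c A^k x0 + sum_j c A^{k-1-j} B u_j
    Px = np.zeros((N, 3)); Pu = np.zeros((N, N))
    Ak = np.eye(3)
    for k in range(N):
        Ak = A @ Ak
        Px[k] = c @ Ak
        for j in range(k + 1):
            Pu[k, j] = c @ np.linalg.matrix_power(A, k - j) @ B
    x0 = np.zeros(3)
    # regularized least squares: min ||Px x0 + Pu u - ref||^2 + w_u ||u||^2
    H = Pu.T @ Pu + w_u * np.eye(N)
    u = np.linalg.solve(H, Pu.T @ (zmp_ref - Px @ x0))
    return Px @ x0 + Pu @ u                   # predicted ZMP trajectory
```

In a receding-horizon loop, only the first jerk would be applied before re-solving with fresh state and terrain information, which is exactly the "unifies reaction and prediction" property noted above.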

3.3 End-to-End Learning. The most integrated approach uses deep RL to train a single policy that processes raw sensor inputs (cameras, IMU) and outputs low-level actuator commands to achieve a high-level goal (e.g., “go to the red door”). This represents the ultimate fusion of perception, planning, and control but faces significant challenges in sample efficiency, safety, and interpretability.

4. Future Perspectives and Conclusion

Despite remarkable progress, enabling a humanoid robot to operate with the robustness and versatility of a human in arbitrary unstructured environments remains an open challenge. Future advancements will likely stem from several convergent trends.

4.1 Synergy with Large Foundation Models. Integrating Large Language Models (LLMs) and Vision-Language Models (VLMs) could revolutionize high-level task understanding and semantic reasoning for a humanoid robot. An LLM could translate natural language instructions (“unload the boxes from the uneven pallet”) into structured sub-tasks, while a VLM helps identify objects and terrain properties, guiding the low-level locomotion planner.

4.2 Hardware-Algorithm Coevolution. Advances in actuator technology (high-torque density, variable impedance), sensor fusion, and onboard computing will unlock new algorithmic capabilities. Conversely, algorithms must be designed for efficiency and robustness on resource-constrained platforms. The design of the humanoid robot itself—its limb morphology, joint compliance, and sensor placement—will increasingly be co-optimized with its control and learning algorithms.

4.3 Standardized Benchmarks and Evaluation. The field lacks unified datasets, simulation benchmarks, and performance metrics. Establishing standardized testing protocols for a humanoid robot—covering diverse terrains, disturbance profiles, and manipulation tasks—is crucial for objective comparison and accelerated progress.

4.4 Safety and Formal Verification. As humanoid robots move closer to human cohabitation, guaranteeing safe operation is paramount. Research into real-time safety filters, reachability analysis, and formally verified control barriers for humanoid robot locomotion will be essential for deployment.

In conclusion, the journey towards proficient humanoid robot locomotion in unstructured environments has evolved from basic stability controllers to sophisticated, hierarchical optimization and learning frameworks. Planning methods now integrate geometric, dynamic, and learned feasibility to chart a course, while control strategies leverage full-body dynamics and machine learning to execute it with robustness. The most promising path forward lies not in any single approach, but in the intelligent fusion of model-based priors for safety and efficiency with learning-based adaptability for generalization. As hardware matures and algorithms become more integrated and cognizant of semantics, the vision of a truly capable, general-purpose humanoid robot steadily transitions from science fiction to engineering reality.
