In recent years, the field of robotics has witnessed unprecedented growth, driven by breakthroughs in artificial intelligence, servo control, environmental perception, and new materials. Among these advancements, humanoid robots stand out as a quintessential form of embodied intelligence, gradually transitioning from experimental platforms to real-world applications. The widespread interest in humanoid robots stems from their inherent “human-like” advantages. Unlike traditional industrial robots, humanoid robots offer superior environmental adaptability, greater task versatility, and more natural human-robot interaction. In highly unstructured real-world environments, where architectural norms, tool scales, and spatial layouts are designed around the human form, a robot that mimics human structure and movement can integrate more seamlessly and perform diverse tasks. Moreover, humanoid robots are more readily accepted and trusted in social service scenarios, such as elderly care and educational companionship, an acceptance that is critical for public service applications.
Autonomous intelligence remains one of the most prominent directions in humanoid robot research. However, given that artificial intelligence has yet to overcome the bottleneck of general intelligence, fully autonomous systems still face significant limitations, particularly in high-complexity tasks, rapidly changing environments, incomplete perceptual information, and high-stakes scenarios where errors are costly. For instance, in space robotics, extravehicular operations often involve complex unstructured environments that are difficult to model accurately. Optical interference, such as stray light, strong shadow contrasts, and multiple reflections, can cause visual systems to fail in target recognition and localization, severely undermining the effectiveness of autonomous systems. To address these challenges, dual-arm teleoperation technology, which incorporates human experiential judgment and intent mapping, has emerged as a key pathway to practical deployment. Dual-arm teleoperation systems use master-slave mapping to translate a human operator's movements to the arms of humanoid robots in real time, enabling manipulation of objects in remote environments. With multimodal sensory feedback (e.g., visual, force, tactile) and immersive interaction technologies (e.g., virtual reality, augmented reality), operators gain a strong sense of presence, enhancing their understanding of environmental states and improving control precision.

Dual-arm teleoperation is not merely a stopgap for autonomous system shortcomings but a crucial step toward human-robot collaborative intelligence. Through teleoperation, humanoid robots can learn human operational strategies and behavior patterns in complex tasks, providing high-quality data for reinforcement learning, imitation learning, and other methods that enhance robotic intelligence. Furthermore, dual-arm teleoperation systems represent a direct technological simulation of human operational capabilities, involving high-dimensional coordinated control, motion planning, sensor fusion, time-delay communication, and human-robot interaction design. These systems are among the most challenging and integrative research areas in robotics, driving advancements in system architecture, intelligence, and interaction modalities.
Evolution of Humanoid Robots
Humanoid robots are designed to mimic human form and behavior, aiming to equip robots with the ability to adapt to unstructured and dynamic environments and perform diverse tasks with human-like decision-making and actions. The development of humanoid robots began in the late 1960s and has progressed through three distinct phases, characterized by advancements in structure, perception, and cognition.
| Phase | Time Period | Key Features | Representative Robots |
|---|---|---|---|
| Initial Phase | 1970s-1990s | Focus on bipedal locomotion and basic motion control using rigid structures | WABOT series |
| Perception Phase | 2000s-2010s | Integration of sensors for environmental perception and simple human-robot interaction | ASIMO, NAO |
| Cognitive Phase | 2010s-Present | Embodiment of advanced AI, dynamic control, and task autonomy | Atlas, Optimus, Walker X |
In the initial phase, research focused on replicating human skeletal structures and achieving basic locomotion capabilities. Early work, such as the WABOT series from Waseda University, demonstrated bipedal walking and arm movements. The WABOT-1, for example, was the first full-scale humanoid robot with basic motion abilities, while WABOT-2 enhanced manipulation skills, enabling tasks like playing an electronic keyboard. Concurrently, institutions like MIT’s Leg Lab explored dynamic gait control in robots like Spring Flamingo, emphasizing stable walking and jumping without full humanoid form. This era saw widespread research into joint actuation, materials, and stability algorithms.
The perception phase, emerging in the 21st century, leveraged improvements in computational power, image processing, and sensor integration. Humanoid robots began to incorporate cameras, microphones, and force sensors for environmental awareness. Honda’s ASIMO series exemplified this, featuring stable walking, obstacle avoidance, voice recognition, and face tracking. Similarly, the NAO robot from Aldebaran Robotics became a staple in education and research due to its compact design and interactive capabilities. Although perception was largely based on predefined models, these systems laid the groundwork for higher-level intelligence.
In the cognitive phase, advances in AI have propelled humanoid robots toward embodied intelligence, with capabilities for deep cognitive-decision-control synergy. Boston Dynamics’ Atlas robot, initially hydraulically driven, demonstrated remarkable dynamic control through actions like backflips and obstacle jumping. More recently, Tesla’s Optimus integrates large language models (LLMs) to foster cognitive general intelligence, while electrically driven versions like the new Atlas reduce costs and improve efficiency. Chinese innovations, such as Ubtech’s Walker X and Unitree’s H1, highlight trends toward high-performance, low-cost humanoid robots with open development interfaces, accelerating research and application.
Key Technologies in Humanoid Robots
To achieve human-like intelligence and operation, humanoid robots rely on an integrated framework of perception, cognition, and control technologies. These components work in concert to enable information acquisition, task execution, and adaptation.
Environmental Perception
Environmental perception systems are fundamental for humanoid robots to gather external data through multimodal sensors and semantic processing. Typical components include cameras, RGB-D depth sensors, LiDAR, inertial measurement units (IMUs), and force-tactile sensors on the hands and body. Vision allows humanoid robots to recognize objects and understand spatial structures, supporting navigation, obstacle avoidance, and target localization. For example, the HumanoidPano framework uses 360° spherical vision transformers fused with LiDAR point clouds to create semantic bird’s-eye view maps, addressing self-occlusion and limited field-of-view issues in complex scenes. Force and tactile perception ensure safety and compliance during environmental contact or human interaction, such as in grasping or collaborative tasks. Recent developments, like flexible electronic skin based on electrical impedance tomography (EIT), enable millimeter-scale localization for joint bending and contact pressure, enhancing dexterous control. Multimodal sensor fusion is critical for robustness in unstructured environments, where single sensors may fail. The integration of visual, force, and tactile data allows humanoid robots to structurally “understand” their surroundings, providing reliable input for cognition and control.
The perception process can be modeled using sensor fusion equations. For instance, the combined perception output $P$ from multiple sensors can be expressed as:
$$ P = \sum_{i=1}^{n} w_i S_i $$
where $S_i$ represents the input from sensor $i$, and $w_i$ is the weight assigned based on reliability, often determined through Bayesian filtering or neural networks.
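As a minimal illustration of this weighted fusion, the Python sketch below combines hypothetical position estimates from three sensors using inverse-variance weights, a common Bayesian-style choice for $w_i$ (the sensor names and noise values here are illustrative assumptions, not taken from any specific system):

```python
import numpy as np

def fuse_sensors(estimates, variances):
    """Weighted fusion P = sum_i w_i * S_i of independent sensor estimates.

    Weights are normalized inverse variances, the minimum-variance linear
    combination for independent Gaussian measurements.
    """
    estimates = np.asarray(estimates, dtype=float)
    weights = 1.0 / np.asarray(variances, dtype=float)
    weights /= weights.sum()
    return weights @ estimates, weights

# Hypothetical 1-D target-position estimates (meters) and their variances.
readings  = [1.02, 0.97, 1.10]   # camera, LiDAR, IMU-based odometry
noise_var = [0.010, 0.005, 0.050]

fused, w = fuse_sensors(readings, noise_var)
print(f"fused estimate: {fused:.3f} m, weights: {np.round(w, 3)}")
```

In practice the weights would come from a Bayesian filter or a learned fusion network rather than fixed variances, but the weighted-sum structure is the same.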
Cognition and Decision-Making
Cognition and decision-making systems form the core of robotic intelligence, bridging perception and action through task understanding and planning. With the advent of cross-modal large models, humanoid robots can now map language to actions more effectively. For example, RT-2 integrates vision-language knowledge with robotic control in an end-to-end model, enabling tasks like picking and sorting in unseen environments. The ELLMER framework combines GPT-4 with visual-force feedback loops to autonomously plan multi-stage subtasks from natural language instructions, maintaining consistency in dynamic settings.
Strategy networks, trained via deep reinforcement learning or imitation learning, allow humanoid robots to adapt to uncertain tasks. These networks are often trained in simulation with domain randomization and fine-tuned online for real-world transfer. The policy optimization can be formulated as a reinforcement learning problem, where the objective is to maximize the expected cumulative reward $R$:
$$ R = \mathbb{E} \left[ \sum_{t=0}^{T} \gamma^t r_t \right] $$
Here, $r_t$ is the reward at time $t$, $\gamma$ is the discount factor, and $T$ is the time horizon. This approach enables humanoid robots to generate and adjust action sequences autonomously, moving toward general-purpose platforms.
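To make the objective concrete, here is a minimal Python sketch (with made-up per-step rewards) that computes the discounted return for a single episode; a training loop would estimate the expectation of this quantity over many rollouts and maximize it with respect to the policy parameters:

```python
def discounted_return(rewards, gamma=0.99):
    """Discounted cumulative reward R = sum_t gamma^t * r_t for one episode."""
    total = 0.0
    for t, r in enumerate(rewards):
        total += (gamma ** t) * r
    return total

# Hypothetical per-step rewards from one simulated rollout.
episode_rewards = [0.0, 0.1, 0.1, 0.5, 1.0]
print(f"return: {discounted_return(episode_rewards):.4f}")
```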
Motion Control
Motion control translates cognitive outcomes into precise, coordinated actions. Humanoid robots typically have over 30 degrees of freedom (DOF), requiring sophisticated control for stability maintenance, posture regulation, and path planning. Model predictive control (MPC) is widely used for whole-body motion planning and stability. For instance, a bio-inspired three-layer architecture incorporating deep residual modeling and reinforcement learning into MPC allows humanoid robots to perform dynamic multi-contact actions like jumping and sliding, even with low update frequencies. Researchers at MIT employed the alternating direction method of multipliers (ADMM) to approximate solutions for nonlinear MPC, reducing computation time and enhancing robustness in scenarios with thousands of control variables.
For fine manipulation in dual-arm tasks, compliance control and force feedback are essential. Impedance control and force-position hybrid strategies enable modal switching based on contact states. With millinewton-level tactile sensing and millisecond-level control cycles, humanoid robots can perform high-precision tasks like assembly and handovers safely and smoothly. The impedance control law is often given by:
$$ F = K_p (x_d - x) + K_d (\dot{x}_d - \dot{x}) $$
where $F$ is the force output, $K_p$ and $K_d$ are proportional and derivative gains, $x_d$ and $x$ are desired and actual positions, and $\dot{x}_d$ and $\dot{x}$ are desired and actual velocities. This ensures adaptability in human-robot interaction.
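A minimal Python sketch of this control law follows; the Cartesian gains and states are illustrative placeholders, and a real controller would evaluate this at millisecond rates against measured end-effector states:

```python
import numpy as np

def impedance_force(x_d, x, v_d, v, Kp, Kd):
    """Impedance law F = Kp (x_d - x) + Kd (v_d - v)."""
    return Kp @ (x_d - x) + Kd @ (v_d - v)

# Illustrative 3-D Cartesian gains (N/m and N*s/m); values are placeholders.
Kp = np.diag([500.0, 500.0, 300.0])
Kd = np.diag([40.0, 40.0, 25.0])

x_desired = np.array([0.40, 0.00, 0.30])   # desired position (m)
x_actual  = np.array([0.39, 0.01, 0.28])   # measured position (m)
v_desired = np.zeros(3)
v_actual  = np.array([0.00, 0.02, -0.01])  # measured velocity (m/s)

F = impedance_force(x_desired, x_actual, v_desired, v_actual, Kp, Kd)
print("commanded force (N):", np.round(F, 2))
```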
In summary, the synergy of perception, cognition, and control technologies underpins the functionality of humanoid robots. Perception systems provide environmental input, cognition systems enable intelligent decision-making, and control systems execute tasks, together forming a robust framework for real-world applications.
Dual-Arm Teleoperation in Humanoid Robots
Dual-arm teleoperation has become a pivotal technology for humanoid robots, especially in scenarios where full autonomy is not yet feasible. By incorporating human judgment and dexterity, it enhances task success, safety, and efficiency, while also serving as a data source for machine learning.
Application Scenarios and Drivers
High-risk, low-tolerance applications drive the demand for dual-arm teleoperation. In nuclear decommissioning, such as at Fukushima, teleoperated dual-arm systems perform tasks like debris removal in confined, high-radiation spaces. In space missions, projects like Surface Avatar demonstrate teleoperation of humanoid robots for sample collection and equipment deployment on extraterrestrial surfaces, emphasizing shared control where humans make decisions and robots execute with varying autonomy. Similarly, in power grid maintenance, teleoperation systems enable safe manipulation of high-voltage components, with shared control improving success rates and reducing operation times. Offshore energy platforms also benefit, as seen with NASA’s Valkyrie robot, which is designed for remote inspection and maintenance, reducing human exposure to hazardous conditions.
Even in semi-structured environments like hospitals or factories, teleoperation compensates for autonomous system limitations. In surgical robotics, for example, dual-arm teleoperation with force feedback allows precise tissue manipulation in sensitive areas, outperforming fully manual or autonomous approaches. Beyond operational needs, teleoperation frameworks like RoboCopilot use human demonstrations to train policy networks through interactive imitation learning, accelerating the transition from teleoperation to autonomy.
| Application Domain | Key Challenges | Teleoperation Benefits | Shared Control Features |
|---|---|---|---|
| Nuclear Decommissioning | High radiation, space constraints | Remote manipulation and obstacle clearance | Human judgment with robotic execution |
| Space Exploration | Communication delays, unstructured terrain | Flexible task execution and equipment handling | Dynamic autonomy adjustment |
| Power Grid Maintenance | High voltage, environmental variability | Reduced accident risk and improved precision | Intent recognition and path planning |
| Offshore Energy | Harsh conditions, limited access | Remote inspection and reduced human presence | Collaborative task allocation |
| Medical Surgery | Precision requirements, dynamic anatomy | Enhanced dexterity and safety | Force feedback and autonomy blending |
Key Technologies in Dual-Arm Teleoperation
Dual-arm teleoperation involves several core technologies: human-robot mapping, multimodal feedback, and master-slave control strategies, each addressing challenges in transparency, stability, and efficiency.
Human-Robot Mapping
Human-robot mapping translates operator movements into robot control commands, dealing with DOF matching and kinematic transformations. Common methods include motion capture (e.g., optical or IMU-based), wearable exoskeletons, and intent prediction via electromyography or electroencephalography. In humanoid robots, isomorphic skeleton mapping is prevalent, where human and robot topologies are aligned for high-fidelity motion replication. Systems like HERMES extend this to whole-body control, using balance feedback to maintain stability. Adaptive learning approaches, such as the TWIST framework, pre-train pose-tracking policies in simulation and refine them with reinforcement learning, achieving low-latency trajectory mapping. On the hardware side, low-cost platforms like SPARK-Remote demonstrate effective teleoperation with force compensation, making the technology more accessible.
The mapping function can be represented as a transformation $T$ between human joint angles $\theta_h$ and robot joint angles $\theta_r$:
$$ \theta_r = T(\theta_h) $$
where $T$ may involve kinematic models or neural networks for adaptive control.
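As a concrete (and deliberately simple) instance of $T$, the hypothetical Python sketch below applies a per-joint affine retargeting from human to robot joint angles and clamps the result to the robot's joint limits; a learned mapping such as a neural network would replace the linear map for adaptive control:

```python
import numpy as np

def map_joints(theta_h, scale, offset, lower, upper):
    """Affine retargeting theta_r = scale * theta_h + offset, clamped to limits."""
    return np.clip(scale * theta_h + offset, lower, upper)

# Hypothetical 4-DOF arm: human joint angles (rad) from motion capture.
theta_human = np.array([0.3, -0.8, 1.2, 0.1])
scale   = np.array([1.0, 0.9, 1.0, 1.1])    # per-joint scale factors
offset  = np.array([0.0, 0.05, -0.1, 0.0])  # per-joint offsets (rad)
lims_lo = np.array([-2.0, -1.5, -2.5, -1.0])
lims_hi = np.array([ 2.0,  1.5,  2.5,  1.0])

print("robot joint targets (rad):",
      map_joints(theta_human, scale, offset, lims_lo, lims_hi))
```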
Multimodal Perceptual Feedback
Multimodal feedback enhances operator immersion through visual, force, tactile, auditory, and postural channels. Visual feedback uses 3D reconstruction and multi-view displays to render remote environments in VR or AR. Force and tactile feedback, provided via desktop devices or exoskeletons, offer impedance responses that improve precision and reduce errors in dynamic tasks. For example, force feedback allows operators to sense object hardness and friction, akin to proprioception. Recent research expands this to “whole-body bilateral” systems, integrating operator psychological and physiological states into the control loop. By monitoring EEG signals for cognitive metrics like attention and fatigue, systems can dynamically adjust human-robot control weights, optimizing performance based on operator state.
The feedback force $F_f$ in haptic devices is often computed using a spring-damper model:
$$ F_f = K (x_r - x_o) + B (\dot{x}_r - \dot{x}_o) $$
where $K$ and $B$ are stiffness and damping coefficients, $x_r$ and $x_o$ are robot and operator positions, and $\dot{x}_r$ and $\dot{x}_o$ are their velocities.
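A toy Python rendering of this model is shown below, with placeholder stiffness and damping values; on a real haptic device this computation runs every control cycle with positions streamed from the robot:

```python
def haptic_feedback(x_r, x_o, v_r, v_o, K=800.0, B=12.0):
    """Spring-damper feedback F_f = K (x_r - x_o) + B (v_r - v_o)."""
    return K * (x_r - x_o) + B * (v_r - v_o)

# Placeholder 1-D example: robot end-effector vs. operator handle.
F = haptic_feedback(x_r=0.52, x_o=0.50, v_r=0.00, v_o=0.03)
print(f"feedback force: {F:.2f} N")  # pulls the operator toward the robot's state
```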
Master-Slave Cooperative Control
Control strategies ensure stability and efficiency by balancing transparency and operator workload. Hierarchical architectures parse high-level intent and execute low-level actions, with shared control allocating tasks between humans and robots based on context. In high-latency scenarios, robots handle local path optimization while humans provide strategic input. For instance, in surgical suturing, transformer models recognize operator intent, and confidence-based fusion dynamically blends human and autonomous control. Whole-body control (WBC) strategies address the coupling between arm movements and stability, using task-prioritized optimization to compute control variables that maintain balance during manipulation. The WBC problem can be formulated as a quadratic programming optimization:
$$ \min_{\dot{q}} \| J \dot{q} - \dot{x}_d \|^2 $$
subject to $A \dot{q} \leq b$, where $\dot{q}$ is the joint-velocity vector, $J$ is the task Jacobian, $\dot{x}_d$ is the desired task velocity, and $A$ and $b$ encode constraints such as balance and joint-velocity limits; the joint torques $\tau$ that realize the optimized motion are then computed through the robot's dynamics.
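Under this velocity-level formulation, the QP can be solved with an off-the-shelf solver. The toy Python sketch below uses a hypothetical 2-task, 4-joint Jacobian and simple joint-velocity bounds, with SciPy's SLSQP standing in for the dedicated real-time QP solvers used on actual robots:

```python
import numpy as np
from scipy.optimize import minimize

# Hypothetical task Jacobian (2 task dims x 4 joints) and desired velocity.
J = np.array([[0.5, 0.3, 0.1, 0.0],
              [0.0, 0.2, 0.4, 0.3]])
xdot_d = np.array([0.10, -0.05])         # desired task-space velocity (m/s)

# Encode |qdot_i| <= 1 rad/s as A qdot <= b.
A = np.vstack([np.eye(4), -np.eye(4)])
b = np.ones(8)

objective = lambda qd: np.sum((J @ qd - xdot_d) ** 2)
cons = {"type": "ineq", "fun": lambda qd: b - A @ qd}   # SLSQP expects fun(x) >= 0

res = minimize(objective, x0=np.zeros(4), method="SLSQP", constraints=[cons])
print("joint velocities (rad/s):", np.round(res.x, 4))
print("residual task error:", np.round(J @ res.x - xdot_d, 6))
```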
Overall, dual-arm teleoperation integrates mapping, feedback, and control to bridge human expertise and robotic execution, advancing humanoid robots toward collaborative intelligence.
Conclusion and Future Perspectives
As research progresses, humanoid robots are increasingly deployed in diverse real-world scenarios. Their structural generality and interactive naturalness provide distinct advantages in unstructured environments, with dual-arm teleoperation serving as a critical enabler for practical application. Teleoperation not only leverages human judgment to enhance success and safety in the current autonomy-limited landscape but also generates valuable data for machine learning, accelerating the evolution of robotic intelligence.
Looking ahead, several challenges must be addressed to advance dual-arm teleoperation in humanoid robots. First, the fusion of autonomy and teleoperation requires more adaptive shared control frameworks that dynamically allocate control authority based on task complexity and risk. Second, multimodal feedback and human perception need standardization to systematically integrate visual, force, and psychological factors, improving presence and efficiency. Finally, engineering solutions must focus on generalizable, modular teleoperation platforms that are cost-effective and flexible, facilitating widespread adoption.
Future research should explore AI-driven intent recognition and predictive control to reduce latency and enhance synergy. The integration of large language models and embodied AI could enable more natural human-robot dialogue and task understanding. Additionally, advancements in materials and actuation may lead to lighter, more dexterous humanoid robots capable of finer manipulation. As these technologies mature, humanoid robots with dual-arm teleoperation are poised to become indispensable in complex environments, extending human capabilities and fostering new applications in industry, healthcare, and beyond.
In summary, the journey of humanoid robots is marked by continuous innovation in perception, cognition, and control. Dual-arm teleoperation represents a transformative step, blending human ingenuity with robotic precision to unlock the full potential of embodied intelligence. Through collaborative efforts, we can overcome existing barriers and usher in an era where humanoid robots seamlessly integrate into our daily lives, performing tasks that were once deemed impossible.