Dual-Arm Teleoperation for Humanoid Robots

In recent years, the field of robotics has experienced unprecedented growth, driven by breakthroughs in artificial intelligence, servo control, environmental perception, and new materials. As a typical representative of embodied intelligence, humanoid robots are gradually transitioning from experimental platforms to real-world applications, becoming a hallmark of intelligent technology development. The widespread attention to humanoid robots stems from their inherent “human-like” advantages. Compared to traditional industrial robots, humanoid robots exhibit higher environmental adaptability, greater task versatility, and more natural human-robot interaction capabilities. In highly unstructured real-world environments, societal norms, tool scales, and spatial layouts are designed around the human form, meaning that a robot whose structure and movement patterns approximate a human’s can more easily integrate into these environments and perform diverse tasks. Moreover, humanoid robots are more readily accepted and trusted in social service scenarios, which is particularly critical in public service applications such as elderly care, education, and companionship.

Autonomous intelligence remains one of the most prominent directions in humanoid robot research. However, given that artificial intelligence has not yet overcome the bottleneck of general intelligence, fully relying on autonomous systems still has significant limitations, especially in practical applications with high task complexity, rapidly changing environments, incomplete perceptual information, and high error costs. In such scenarios, the stability and adaptability of existing autonomous systems often fall short of requirements. For instance, in space robotics, extravehicular operations frequently involve complex unstructured environments. These environments are not only difficult to model accurately but are also subject to optical interference such as stray light, strong shadow contrast, and multiple reflections, leading to frequent failures in target recognition and localization by visual systems, thereby severely limiting the effectiveness of autonomous systems. To address this, the introduction of dual-arm teleoperation technology, which incorporates human experiential judgment and intent mapping, has become a key pathway to practical deployment of humanoid robots. Dual-arm teleoperation systems map human operators’ movement intentions to the humanoid robot’s arms in real-time through master-slave mapping, enabling manipulation of target objects in remote environments. With the assistance of multimodal perceptual feedback (e.g., visual, force, tactile) and immersive interaction technologies (e.g., virtual reality, augmented reality), operators can achieve a strong sense of presence, significantly enhancing their understanding of environmental states and improving control precision.

Dual-arm teleoperation technology is not merely a temporary solution to compensate for the shortcomings of autonomous systems but also a critical step toward human-robot collaborative intelligence for humanoid robots. Through teleoperation systems, robots can learn human operational strategies and behavior patterns in complex tasks, providing high-quality samples for subsequent reinforcement learning and imitation learning, thereby enhancing the robot’s own intelligence level. Furthermore, dual-arm teleoperation systems themselves represent the most direct technological simulation of human operational capabilities. They involve core issues such as high-dimensional coordinated control, motion planning, perceptual fusion, time-delay communication, and human-robot interaction design, making them one of the most challenging and integrative research directions in robotics. Their development not only supports the deployment of humanoid robots in typical engineering applications but also drives continuous progress in the overall robot system structure, intelligence level, and interaction methods.

Humanoid robots are a class of bionic general-purpose robots designed primarily to imitate human form and behavior. Research in this area aims to endow robots with the ability to adapt to unstructured and dynamically complex environments, performing diverse tasks with human-like decision-making and behavior patterns. The field began in the late 1960s, and after nearly half a century of technological accumulation, its development can be summarized into three typical stages. The initial stage focused on imitating human body structure and achieving basic motor capabilities. Researchers commonly used rigid structures to build robots with human skeletal proportions, striving to break through technical bottlenecks in bipedal gait and standing stability. Early work originated from Waseda University in Japan, which developed a series of humanoid robots, including the WABIAN and WABOT series. WABOT-1 was the first full-scale humanoid robot with basic motor capabilities, featuring bipedal walking, standing control, and arm swinging. The subsequent WABOT-2 significantly improved the dexterity of its mechanical hands while maintaining coordinated body movement, enabling complex actions such as playing an electronic keyboard with both hands in response to simple commands. The Leg Laboratory at MIT conducted systematic research on bipedal and multi-legged robots from the 1980s to the early 1990s, with representative results including GeekBot and Spring Flamingo. These robots did not aim for a complete humanoid structure but instead focused on exploring control methods for stable walking, running, jumping, and other dynamic gait behaviors. During the same period, countries such as South Korea and Germany joined humanoid robot research, with work spanning joint drive methods, rigid-soft composite materials, and gait and stability control algorithms.

Development Stages of Humanoid Robots

| Stage | Time Period | Key Features | Representative Robots |
| --- | --- | --- | --- |
| Initial | 1970s-1990s | Focus on bipedal locomotion and basic stability | WABOT-1, WABOT-2 |
| Intermediate | 2000s-2010s | Integration of perception and simple human-robot interaction | ASIMO, NAO |
| Advanced | 2020s-present | Embodied intelligence, dynamic control, and AI integration | Atlas, Optimus, Walker X |

Entering the 21st century, with significant improvements in hardware computing power, image processing technology, and sensor integration, the research goals of humanoid robots expanded beyond mechanical structure and basic motion control to include perception of the external environment and preliminary human-robot interaction. Robot systems began integrating various perceptual modules such as cameras, microphones, and force sensors, enabling the perception and simple understanding of surrounding environmental information through sensor networks. The most representative achievement of this stage was the ASIMO series of humanoid robots developed by Honda. ASIMO not only achieved stable bipedal walking and obstacle avoidance but also introduced speech recognition, image recognition, and face tracking functions, allowing simple language interaction with humans and dynamic adjustment of behavior strategies based on environmental changes. The modular structure, centralized control system, and human-robot interaction mechanisms adopted by ASIMO provided important references for subsequent humanoid robot system design. During the same period, the NAO robot launched by Aldebaran Robotics in France, with its compact structure, friendly interaction, and open programming, was widely used in educational and research scenarios. It featured speech recognition, face recognition, and multiple action modes, becoming an important bridge for the public to understand humanoid robots. Humanoid robots in this stage were no longer just “motion machines” but began to possess environmental understanding and task response capabilities. Although their perceptual abilities were primarily based on predefined models and lacked deep semantic understanding and situational reasoning, they laid the foundation for subsequent research on more advanced intelligent behaviors.

In the past decade, the rapid development of artificial intelligence technology has continuously propelled humanoid robots toward the goal of becoming embodied intelligent agents. Humanoid robots have begun to possess deep synergistic capabilities in cognition, decision-making, and control, enabling them to complete complex processes such as task understanding, behavior planning, and autonomous operation in dynamic environments. One representative achievement of this stage is the Atlas series of robots developed by Boston Dynamics. Unlike previous robots that primarily relied on rigid structures and preset trajectory control, the early version of Atlas used hydraulic drive combined with an inertial navigation system, exhibiting highly flexible dynamic control capabilities. It could perform a series of high-difficulty maneuvers such as backflips, obstacle crossing, jumping rotations, and object throwing and catching, with action speed and smoothness approaching human performance. In 2021, Tesla publicly announced its Optimus robot, seeking to deeply integrate large language models (LLMs) with humanoid robots and promoting the transition of robot systems from tool-based automation to cognitive general intelligence agents. Its architectural approach represents the industry’s active exploration of “embodied large models.” Since then, humanoid robots have begun to approach a commercial inflection point characterized by high performance, low cost, and mass production. In 2024, Boston Dynamics launched the all-electric Atlas, replacing the hydraulic system with sealed joints, lightweight materials, and high-efficiency drives, significantly reducing operational costs. During this period, the Chinese humanoid robot industry rapidly emerged, with representative enterprises and research results achieving continuous breakthroughs. Ubtech’s Walker X robot possesses whole-body high-degree-of-freedom coordinated control capabilities, and its industrial version, Walker S1, completed multi-robot collaborative material-handling demonstrations in electronics factories. Unitree Robotics, drawing on extensive experience with high-dynamic quadruped robots, recently launched the Unitree H1 humanoid robot equipped with self-developed high-speed joint modules, reducing the overall cost to below $100,000. Unitree also opened secondary development interfaces for the robot, promoting the application of humanoid robots in university research and robotics education.

To achieve human-like intelligence and operational capabilities, humanoid robots must not only have a humanoid body structure but also build a complete technological system of perception, cognition, and control. This system involves the coordinated operation of multiple highly integrated technological modules, primarily including environmental perception, cognition and decision-making, and motion control. Together, they support the entire process from information acquisition to task execution for humanoid robots.

Environmental Perception Technology for Humanoid Robots

The environmental perception system is the primary means for humanoid robots to acquire external information, with its core lying in the integration of multimodal sensors and the semantic processing of perceptual data. Typical perceptual components include cameras, RGB-D depth cameras, LiDAR, inertial measurement units (IMUs), as well as force and tactile sensors installed on the mechanical hands and throughout the body. Visual perception enables the robot to recognize objects and understand the spatial structure of the surrounding environment, forming the basis for tasks such as navigation, obstacle avoidance, and target localization. The recently proposed HumanoidPano framework integrates 360° spherical vision transformers with LiDAR point cloud alignment, constructing a bird’s-eye view (BEV) semantic map, effectively addressing the issues of self-occlusion and limited field of view in humanoid robots, providing a structured perspective for autonomous navigation in complex scenes. On the other hand, force and tactile perception ensure safety and compliance when the robot interacts with the environment or humans, such as dynamically adjusting force during grasping, carrying, or collaborative operations. In recent research, a flexible electronic skin based on electrical impedance tomography (EIT) developed by the Czech Technical University can simultaneously sense touch pressure and joint bending, achieving millimeter-level positioning (error < 6 mm) in joint coverage areas, providing high-resolution tactile support for compliant control and human-robot collaboration in contact operations. Furthermore, multimodal perceptual fusion has become a key technology for enhancing the environmental understanding capability of humanoid robots. In unstructured or visually challenging real-world environments, a single sensor often cannot stably provide complete information, making the fusion of perceptual data from different modalities a necessary means to ensure perceptual robustness. These improvements in perceptual capabilities enable humanoid robots not only to “see” the world but also to structurally “understand” external information, thereby providing reliable support for robot cognition and control.
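Where a single sensor fails, fusion supplies redundancy. As a minimal, self-contained illustration of the fusion idea (not any cited system), the sketch below blends two IMU modalities — drifting-but-smooth gyroscope integration and noisy-but-drift-free accelerometer gravity sensing — with a complementary filter; the gain `alpha` and the sensor values are illustrative assumptions.

```python
import numpy as np

def complementary_filter(pitch_prev, gyro_rate, accel, dt, alpha=0.98):
    """Fuse gyroscope and accelerometer readings into one pitch estimate.

    The gyroscope integrates smoothly but drifts over time; the
    accelerometer is drift-free but noisy. Blending the two with weight
    `alpha` keeps the strengths of both modalities.
    """
    # Short-term estimate: integrate the angular rate.
    pitch_gyro = pitch_prev + gyro_rate * dt
    # Long-term reference: gravity direction from the accelerometer.
    ax, ay, az = accel
    pitch_accel = np.arctan2(-ax, np.hypot(ay, az))
    return alpha * pitch_gyro + (1.0 - alpha) * pitch_accel

# Example: one fusion step with illustrative sensor values.
pitch = 0.0
pitch = complementary_filter(pitch, gyro_rate=0.05, accel=(0.1, 0.0, 9.8), dt=0.01)
print(f"fused pitch estimate: {pitch:.4f} rad")
```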

Key Technologies in Humanoid Robot Perception

| Technology | Description | Applications |
| --- | --- | --- |
| Visual Perception | Uses cameras and depth sensors for object recognition and spatial mapping | Navigation, obstacle avoidance |
| Force/Tactile Sensing | Measures contact forces and surface properties | Grasping, manipulation |
| Multimodal Fusion | Combines data from multiple sensors for robust perception | Complex environment understanding |

Cognition and Decision-Making for Humanoid Robots

The cognition and decision-making system, as the core of a robot’s intelligence level, connects perception and execution, undertaking functions such as task semantic understanding and task action planning. In task semantic understanding, the robot can construct a spatial map with semantic labels based on visual and language inputs, thereby supporting robot navigation and interactive operations. In recent years, the introduction of cross-modal large models has significantly enhanced the mapping ability between language and actions. For example, the RT-2 model proposed by Google DeepMind integrates vision-language knowledge with robot control tasks end-to-end, achieving for the first time the ability to handle both language understanding and underlying motion control tasks in a single model, enabling the robot to complete tasks such as picking, sorting, and combining in unseen environments and under unfamiliar instructions. The ELLMER framework demonstrates another approach to multimodal fusion. By integrating the GPT-4 large language model with a vision-force feedback loop, ELLMER can autonomously plan multi-stage sub-tasks based on natural language instructions and continuously integrate multimodal perceptual information in dynamic environments, maintaining consistency in behavior generation and response accuracy.

In addition to language models, the learning and generalization capabilities of policy networks are also key to enabling robot systems to adapt to varying tasks. In behavior generation, modern humanoid robots widely use deep reinforcement learning, imitation learning, and other methods to construct policy networks, endowing them with the ability to adjust strategies and select actions when facing uncertain tasks. Current mainstream methods typically involve iterative training in simulation environments, supplemented by auxiliary means such as domain randomization and online fine-tuning, to more reliably achieve policy transfer from virtual environments to real robot platforms. Such control strategies can not only efficiently execute known tasks but also possess the ability to adaptively generate and adjust action sequences under unknown task conditions, serving as key technological support for advancing humanoid robots toward general-purpose operational platforms.
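As a sketch of the sim-to-real recipe described above — iterate in simulation while re-randomizing the physics each episode — the snippet below shows the domain-randomization pattern. The parameter ranges and the `sim`/`policy` interfaces are hypothetical placeholders, not any specific framework's API.

```python
import random

def randomize_physics():
    """Sample a new set of physical parameters for one training episode.

    Ranges are illustrative; in practice they are tuned so that the real
    robot's dynamics fall inside the randomized distribution.
    """
    return {
        "ground_friction": random.uniform(0.4, 1.2),
        "link_mass_scale": random.uniform(0.8, 1.2),   # +/- 20% mass error
        "motor_latency_s": random.uniform(0.0, 0.03),  # unmodeled delay
        "sensor_noise_std": random.uniform(0.0, 0.02),
    }

def train(policy, sim, n_episodes=10_000):
    """Iterate simulated episodes, re-randomizing dynamics each time,
    so the learned policy cannot overfit a single simulator instance."""
    for _ in range(n_episodes):
        sim.set_params(**randomize_physics())  # hypothetical simulator API
        policy.update(sim.rollout(policy))     # hypothetical RL update

print(randomize_physics())  # one sampled dynamics configuration
```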

Motion Control Technology for Humanoid Robots

After achieving perception and understanding, humanoid robots must convert cognitive results into specific actions through high-precision, highly coordinated motion control technology. Humanoid robots typically have more than 30 degrees of freedom, and the movements of different parts of the body are highly coupled. Their control systems must therefore deliver high-precision, low-latency action execution while maintaining center-of-mass stability, posture coordination, and path planning. In recent years, model predictive control (MPC) has been widely used for motion planning and stability control in high-degree-of-freedom robots. Ishihara et al. proposed a bio-inspired three-layer learning architecture that embeds deep residual modeling and reinforcement learning into MPC strategy generation, while introducing long- and short-latency response mechanisms mimicking the human nervous system, enabling the robot system to stably complete dynamic multi-contact actions such as jumping and sliding even at low strategy update frequencies. The method proposed by the MIT team starts from balancing optimization accuracy against control efficiency, using the alternating direction method of multipliers (ADMM) to approximately solve the quadratic programming problem in nonlinear model predictive control (NMPC), significantly reducing solution time and improving system robustness. This method stably controlled the MIT Humanoid robot to achieve complex dynamic actions such as cross-over stepping recovery at a frequency of 90 Hz, under conditions including more than 2,000 control variables and a 32-step prediction horizon.

In fine tasks such as dual-arm manipulation, compliant control and force feedback mechanisms have become the core means of ensuring operational precision and safety. Current mainstream systems use impedance control and force-position hybrid control strategies, automatically switching control modes based on contact status during real-time control. Through millinewton-level tactile perception and millisecond-level control cycles, the robot can ensure both safety and compliance when performing high-precision actions such as insertion, assembly, and handover, significantly enhancing its adaptability when interacting with humans or other objects.
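As a minimal illustration of the impedance idea, the sketch below renders a virtual spring-damper between the end-effector's actual and desired Cartesian poses, so contact forces stay bounded instead of being fought by a stiff position loop. The gain values and the 3-DOF translational setting are illustrative assumptions, not parameters from any cited system.

```python
import numpy as np

def impedance_force(x, x_d, v, v_d, K, D):
    """Cartesian impedance law: a virtual spring-damper between the
    end-effector pose x and the desired pose x_d."""
    return K @ (x_d - x) + D @ (v_d - v)

# Illustrative gains: compliant in z (the contact direction), stiff in x/y.
K = np.diag([800.0, 800.0, 150.0])   # stiffness, N/m
D = np.diag([60.0, 60.0, 25.0])      # damping, N*s/m

f_cmd = impedance_force(
    x=np.array([0.40, 0.00, 0.30]), x_d=np.array([0.40, 0.00, 0.28]),
    v=np.zeros(3), v_d=np.zeros(3), K=K, D=D)
print(f_cmd)  # commanded Cartesian force, later mapped to joint torques via J^T
```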

Overall, the ability of humanoid robots from information acquisition to task execution relies on the synergistic cooperation of high-quality perception, high-level cognition, and high-precision control. The environmental perception system provides the entry point for understanding the external environment, the cognitive decision-making system constructs the intelligent center from task understanding to behavior selection, and the motion control system translates task objectives into specific operational actions. Together, they form the key technological system supporting humanoid robots for real-world tasks.

The motion control for a humanoid robot can be formulated using model predictive control. Let the state vector be $x_t$ and control input $u_t$. The optimization problem is:

$$ \min_{u_0, \ldots, u_{N-1}} \sum_{k=0}^{N-1} \left( x_k^T Q x_k + u_k^T R u_k \right) + x_N^T P x_N $$

subject to:

$$ x_{k+1} = f(x_k, u_k) $$
$$ g(x_k, u_k) \leq 0 $$

where $Q$, $R$, and $P$ are weighting matrices, $N$ is the prediction horizon, and $f$ represents the robot dynamics.
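When the dynamics are linearized ($x_{k+1} = A x_k + B u_k$) and the inequality constraints are inactive, this problem reduces to a finite-horizon LQR that can be solved exactly by a backward Riccati recursion. The sketch below shows that unconstrained special case on a toy double-integrator model — an illustrative stand-in, not the MIT NMPC formulation, which requires a constrained QP/ADMM solver.

```python
import numpy as np

def finite_horizon_lqr(A, B, Q, R, P, N):
    """Backward Riccati recursion for the unconstrained linear case of the
    MPC problem above: returns feedback gains with u_k = -K_k x_k."""
    gains, P_k = [], P
    for _ in range(N):
        S = R + B.T @ P_k @ B
        K = np.linalg.solve(S, B.T @ P_k @ A)
        P_k = Q + A.T @ P_k @ (A - B @ K)
        gains.append(K)
    return gains[::-1]  # reorder so gains[0] corresponds to k = 0

# Toy double-integrator model (e.g., center-of-mass position/velocity).
dt = 0.01
A = np.array([[1.0, dt], [0.0, 1.0]])
B = np.array([[0.0], [dt]])
Q = np.diag([100.0, 1.0]); R = np.array([[0.1]]); P = Q

K0 = finite_horizon_lqr(A, B, Q, R, P, N=32)[0]
x0 = np.array([0.05, 0.0])          # 5 cm CoM offset
print("first control move:", -K0 @ x0)
```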

Dual-Arm Teleoperation for Humanoid Robots

Before humanoid robots attain full autonomous capabilities, many high-risk, high-complexity, low-error-tolerance application scenarios already present hard requirements for remote robot operation. Dual-arm teleoperation, as a bridge connecting human experience and human-like execution, is gradually becoming one of the key technologies for the practical deployment of humanoid robots.

One typical application demand comes from nuclear facility operations. In the decommissioning of the Fukushima Daiichi Nuclear Power Plant, to retrieve damaged fuel debris inside the reactor and remove high-radiation obstacles, a Japanese research team designed a dual-arm teleoperation system capable of completing tasks such as obstacle removal and grasping in confined spaces. Through remote control of dual-arm actuators, the system demonstrated good performance under strong radiation, strong interference, and space constraints, verifying the unique advantages of the dual-arm structure in complex environments inaccessible to humans.

In space missions, the Surface Avatar project jointly carried out by the European Space Agency (ESA) and the German Aerospace Center (DLR) aims to explore how humans can flexibly and efficiently control extraterrestrial robot teams through teleoperation systems to complete complex operational tasks while ensuring personnel safety. This project established a remote control link between the International Space Station and ground experimental platforms, enabling astronauts to operate multiple robots including the humanoid robot Rollin’ Justin to complete tasks such as rock sample collection, equipment deployment, and environmental survey. The core goal of the Surface Avatar project is not only to verify the feasibility of space robot teleoperation but also to build a human-robot shared control system: humans are responsible for task judgment and decision-making, robots flexibly execute based on autonomy levels, and when necessary, human operators can fully take over the robot’s action execution through control devices with force feedback.

Similar technological demands frequently appear in infrastructure maintenance tasks. Taking live maintenance of power transmission and distribution systems as an example: in high-voltage electric field environments of 10 kV and above, manual close-range operations are prone to accidents such as electric shock and arc discharge, while fully autonomous systems struggle to cope with sudden disturbances and complex topological structures. To address this, a team from Southeast University designed a teleoperation robot system based on heterogeneous variable mapping control, mapping the actions of a desktop-scale controller to the dual-arm end-effectors of an outdoor operation platform in real time, supplemented by deep learning for operator intent recognition and combined with artificial potential field methods to construct obstacle-avoidance paths, achieving smooth and precise grasping and manipulation of targets such as cables and suspended objects. Under shared control mode, the task success rate increased from 75% to 90%, and the average operation time decreased from 51.3 seconds to 39.0 seconds. This research demonstrates that in outdoor unstructured environments with limited line of sight, complex lighting, and unpredictable wind disturbances, human-robot shared control can not only reduce the operational burden but also significantly improve task completion efficiency and system robustness.
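To make the artificial-potential-field idea concrete, here is a minimal sketch: the end-effector target descends an attractive gradient toward the goal while obstacles within an influence radius add repulsive pushes. All gains, radii, and coordinates are illustrative, not the Southeast University system's parameters.

```python
import numpy as np

def apf_step(p, goal, obstacles, k_att=1.0, k_rep=0.5, rho0=0.3, step=0.02):
    """One gradient step on the artificial potential field.

    The attractive term pulls the end-effector toward the goal; each
    obstacle within the influence radius rho0 adds a repulsive push that
    grows as the distance shrinks.
    """
    force = k_att * (goal - p)
    for obs in obstacles:
        d = np.linalg.norm(p - obs)
        if 1e-6 < d < rho0:
            force += k_rep * (1.0 / d - 1.0 / rho0) / d**2 * (p - obs) / d
    return p + step * force / (np.linalg.norm(force) + 1e-9)

p = np.array([0.0, 0.0, 0.3])
goal = np.array([0.5, 0.2, 0.3])
obstacles = [np.array([0.25, 0.10, 0.30])]   # e.g., a cable to steer around
for _ in range(100):
    p = apf_step(p, goal, obstacles)
print("position after 100 steps:", p)
```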

The offshore energy sector also places high expectations on teleoperation. NASA collaborated with Woodside Energy in Australia to develop advanced teleoperation functions for the Valkyrie robot, with plans to deploy the humanoid robot on offshore energy platforms to replace engineers in inspection and maintenance tasks. This solution would significantly reduce the frequency of manual boarding, lower the probability of accidents in high-risk offshore operations, and provide a demonstration path for future remote automated operation and maintenance of deep-water oil and gas platforms.

Even in semi-structured environments such as hospitals and factories, fully relying on autonomous systems still faces challenges. Limited by operational precision, dynamic occlusion, temporary interference, and related issues, the robustness of deep models is often insufficient. In such cases, embedding human operators into the robot control loop through teleoperation to achieve human-robot collaboration becomes an effective compensation mechanism for maintaining system stability. In the medical surgery field, teleoperated robots are designed to improve operational precision, extend surgeons' operational range, and improve their ergonomic experience. Taking the head and neck surgery assistive robot developed by the Southeast University team as an example, this system uses dual-arm teleoperation and force feedback devices to reconstruct intraoperative tactile sensation, enabling surgeons to precisely retract tissues in narrow, variable, and highly sensitive surgical cavities while avoiding the risk of inadvertently cutting through tissue. This research shows that teleoperation systems using a shared control mode are superior to purely manual or fully autonomous solutions, offering greater engineering feasibility in terms of safety and operational efficiency.

In addition to operational demands, teleoperation systems are also becoming an important data source for advancing robot intelligence. Every segment of a human operator's motion trajectory contains rich information about operational intent and strategy. The recently proposed RoboCopilot framework introduces an online demonstration mechanism for human-robot collaboration, dynamically filtering high-value demonstration data during the teleoperation process and continuously optimizing the humanoid robot's policy network through interactive imitation learning, ultimately forming operational skills transferable to new objects and new scenes. Compared to the traditional “train-then-deploy” paradigm, this method turns teleoperation into a real-time policy-learning process, significantly shortening the technical cycle from manual operation to autonomous execution.

Tasks such as nuclear decommissioning, space operations, live maintenance, offshore management, and medical assistance collectively constitute huge practical demands for dual-arm teleoperation of humanoid robots. Simultaneously, the teleoperation process in turn provides interpretable, clearly structured training data for machine learning, accelerating the evolution of robot autonomous capabilities. This dual value makes dual-arm teleoperation no longer just a temporary solution in the transition stage but is becoming a key technological ladder for humanoid robots to truly leave the laboratory and integrate into the real world.

Key Technologies in Dual-Arm Teleoperation

Human-Robot Mapping Mechanisms

Human-robot mapping is the entry point of the entire teleoperation chain, tasked with efficiently and accurately converting human operators' actions, postures, or intentions into control commands for the robot's dual arms. This process covers not only the acquisition and encoding of motion information but also key issues such as degree-of-freedom matching and kinematic transformation. Current mainstream mapping methods include posture mapping based on motion capture (e.g., optical Mocap, inertial IMUs), motion measurement based on wearable devices (e.g., exoskeleton suits), and more abstract intent prediction based on electromyography/electroencephalography signals. Among them, the skeleton mapping method based on master-slave isomorphism is most widely used in dual-arm teleoperation for morphologically consistent humanoid robots. By constructing a skeleton topological mapping between the human and the robot, combined with motion planning algorithms and redundancy-constraint solvers, natural human actions can be reproduced with high fidelity on the robot's execution end. With MIT's HERMES system as a representative example, researchers have further extended this structural mapping mechanism to the whole-body action dimension. In this system, the operator wears dual-arm motion capture equipment and a “balance feedback belt,” driving the robot's motion in real time with their own whole-body posture. The system converts the robot's center-of-mass offset into force feedback applied to the operator's waist, allowing the operator to adjust posture instinctively and, through the mapped linkage, keep the robot balanced.
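As a minimal sketch of one common master-slave mapping pattern — workspace-scaled, clutch-indexed position mapping, a simplification rather than HERMES's whole-body scheme — the class below accumulates scaled master displacements onto the robot's end-effector target. The scale factor and interface are illustrative assumptions.

```python
import numpy as np

class ClutchedMapper:
    """Incremental master-to-slave position mapping with workspace scaling.

    While the clutch is engaged, master displacements are scaled and added
    to the robot's end-effector target; releasing the clutch lets the
    operator re-center their arm without moving the robot (indexing).
    """
    def __init__(self, scale=1.5):
        self.scale = scale
        self.master_ref = None   # master pose captured when clutch engaged
        self.slave_ref = None    # slave target at that same moment

    def engage(self, master_pos, slave_pos):
        self.master_ref, self.slave_ref = master_pos.copy(), slave_pos.copy()

    def map(self, master_pos):
        return self.slave_ref + self.scale * (master_pos - self.master_ref)

mapper = ClutchedMapper(scale=1.5)
mapper.engage(master_pos=np.zeros(3), slave_pos=np.array([0.4, 0.0, 0.3]))
print(mapper.map(np.array([0.05, 0.0, -0.02])))  # 5 cm hand motion -> 7.5 cm robot motion
```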

In recent years, adaptive learning-based human-robot mapping methods have also developed rapidly. For example, the TWIST framework proposed by Li et al. pre-trains posture tracking strategies in simulated environments and combines them with reinforcement learning for online fine-tuning, constructing a trajectory mapping network that can adapt to changes in human operation. Experiments show that this system can stably track and reconstruct complex upper-limb postures with an average delay of 0.9 seconds. At the hardware level, Desingh et al. developed SPARK-Remote, an economical, generalizable heterogeneous teleoperation platform. Based on a 1:2 scaled mechanism and low-cost force feedback devices, the researchers built a remote dual-arm control platform costing only about $200 per arm, and significantly reduced the emergency-stop (e-stop) rate through force-control compensators and tactile feedback mechanisms, bringing task performance close to offline operation levels.

Multimodal Perceptual Feedback

Multimodal perceptual feedback is key to enhancing teleoperation immersion and interaction efficiency. To give operators a control experience with a genuine “sense of presence,” the system needs to synchronously feed back the robot's status and environmental responses across multiple perceptual channels such as vision, force, and touch. Visual feedback combines environmental data collected by the robot with processing such as 3D reconstruction and multi-viewpoint view generation, ultimately visualizing the remote operation scene on terminals such as head-mounted VR displays or monitors. For force and tactile feedback, the system typically provides real-time impedance responses to the operator through desktop or handheld force feedback devices, or wearable exoskeletons. Related research shows that appropriate force feedback not only improves operational precision and execution efficiency but also significantly reduces error rates and operators' muscle tension, which is especially crucial when facing dynamic targets, subtle manipulations, or invisible contact surfaces. By introducing real-time force-tactile feedback mechanisms on the master side, operators can more intuitively perceive the hardness, friction, and contact stability of target objects, forming a control experience akin to proprioception.
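On the force channel, one widely used bilateral pattern sends master positions to the slave and returns measured slave wrenches to the master after scaling and saturation, so the haptic device never exceeds its safe output range. A minimal sketch with illustrative values:

```python
import numpy as np

def render_feedback(slave_wrench, scale=0.4, f_max=6.0):
    """Scale the measured slave-side contact force down to the master
    device and saturate it so the commanded feedback stays within the
    device's safe output range (scale and limit are illustrative)."""
    f = scale * np.asarray(slave_wrench)
    norm = np.linalg.norm(f)
    return f if norm <= f_max else f * (f_max / norm)

# A 20 N contact at the slave becomes an 8 N command, capped at 6 N.
print(render_feedback([20.0, 0.0, 0.0]))
```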

In recent research, the multimodal perceptual feedback component has gradually evolved from traditional visual-tactile fusion into a whole-body sensing and feedback system covering multidimensional signals such as vision, force, vibration, sound, posture, and physiology. Darvish et al. characterize this system as “whole-body bilateral,” emphasizing that the operator should not only see the remote environment and feel the end-effector forces but also have their own posture and cognitive load sensed in real time and mapped back to the robot controller, thereby maintaining the robot's dynamic stability and operational transparency under high-degree-of-freedom redundancy. In a concrete instance, Liu et al. incorporated the operator's psychological state and physiological load into the teleoperation control loop, proposing a shared control method based on the teleoperator's operational state. The system collects electroencephalogram (EEG) signals in real time through wearable EEG devices, extracts five cognitive indicators (attention, arousal, frustration, fatigue, boredom), and maps this physiological information through a neural network into a continuous operational state score (SoT). This score dynamically adjusts the human-robot control weight distribution in teleoperation: when the operator is focused and engaged, the system grants them higher control autonomy; when fatigue, frustration, or reduced attention is detected, the system automatically increases the proportion of autonomous control to ensure smooth task execution.
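A minimal sketch of the weight-adjustment logic just described: the five cognitive indicators are mapped to an SoT score that then sets the human control weight. Liu et al. use a trained neural network for the mapping; a fixed logistic combination stands in here so the example is self-contained, and all weights and bounds are illustrative assumptions.

```python
import numpy as np

def sot_score(attention, arousal, frustration, fatigue, boredom, w=None):
    """Map five cognitive indicators (each normalized to [0, 1]) to a
    scalar state-of-teleoperator score in (0, 1). A fixed linear-logistic
    combination stands in for the trained network."""
    w = w if w is not None else np.array([1.5, 0.8, -1.2, -1.5, -0.6])
    z = w @ np.array([attention, arousal, frustration, fatigue, boredom])
    return 1.0 / (1.0 + np.exp(-z))

def human_control_weight(sot, lo=0.2, hi=0.9):
    """Grant more authority to the human when the score is high, and
    shift authority to the robot as it drops."""
    return lo + (hi - lo) * sot

s = sot_score(attention=0.9, arousal=0.7, frustration=0.1, fatigue=0.2, boredom=0.1)
print(f"SoT={s:.2f}, human weight={human_control_weight(s):.2f}")
```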

Master-Slave Collaborative Control Strategies

Control strategies are the core guarantee of stable teleoperation system operation. Their main goal is to ensure operational transparency and stability while reducing the operator's mental load and improving task execution efficiency. Because of information asymmetry, operational delays, and differences in control precision between human operators and robots, the system must adopt a well-designed hierarchical master-slave control architecture to connect high-level intent parsing with low-level action execution. Typically, the upper layer of the control system is responsible for identifying operator intent, adjusting human-robot control weights, and planning strategic goals, while the lower layer executes specific operations such as end-effector trajectory generation and force-position hybrid control. In scenarios with high network latency or limited bandwidth, the system can adopt a shared control architecture, in which high-level task strategy is decided by the human while local path optimization and action execution are completed autonomously by the robot. This control strategy retains human judgment while leveraging the robot's computational advantages in local perception and control. On this basis, Hu et al. proposed a hybrid shared control framework based on intent recognition and confidence fusion to support complex dual-arm teleoperated suturing tasks. The system first uses a Transformer model to recognize surgical actions from the operator's continuous action sequences, constructing high-level behavioral intent; it then introduces a confidence predictor based on vision-force fusion, modeling the reliability of different channels' data with Beta distributions, and uses this to compute a dynamic human-robot control weight λ(t). The robot end then blends its own action goals with operator inputs in proportion to this weight, achieving a smooth transition from fully manual to fully autonomous. In user experiments, this method significantly improved suturing success rates and control efficiency while effectively reducing operators' subjective cognitive load.
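The sketch below illustrates the blending step under stated assumptions: each sensing channel's reliability is summarized by the mean of a Beta distribution, the fused confidence sets λ(t), and the two commands are mixed proportionally. Both the fusion rule (simple averaging) and the direction of the mapping (lower confidence shifts authority toward the human) are illustrative readings, not Hu et al.'s exact formulation.

```python
import numpy as np

def channel_confidence(alpha, beta):
    """Mean of a Beta(alpha, beta) reliability model for one sensing
    channel; successes/failures observed online update alpha and beta."""
    return alpha / (alpha + beta)

def blend_commands(u_human, u_robot, conf_vision, conf_force):
    """Compute the dynamic control weight lambda(t) from fused channel
    confidences and mix operator and autonomous commands accordingly."""
    lam = 1.0 - 0.5 * (conf_vision + conf_force)  # low confidence -> more human
    return lam * np.asarray(u_human) + (1.0 - lam) * np.asarray(u_robot), lam

cv = channel_confidence(alpha=8, beta=2)   # vision channel has been reliable
cf = channel_confidence(alpha=3, beta=3)   # force channel still uncertain
u, lam = blend_commands([0.1, 0.0, 0.02], [0.08, 0.01, 0.0], cv, cf)
print(f"lambda={lam:.2f}, blended command={u}")
```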

On the other hand, shared control is no longer limited to the end-effector trajectory level but extends to multi-contact force distribution and whole-body posture coordination. When a humanoid robot's dual-arm actions are strongly coupled with whole-body postures such as gait, waist twist, and torso tilt, dual-arm mapping often leads to center-of-mass offset and foot instability, seriously affecting task execution. Whole-body control (WBC) is one feasible way to solve this problem. This strategy treats dual-arm operation as part of the whole-body coordinated control problem, completing dual-arm manipulation actions without compromising center-of-mass stability through multi-task optimization. WBC typically uses whole-body torques and ground contact forces as control variables and, based on a task priority mechanism, assigns goals such as center-of-mass projection, torso posture, and end-effector trajectory to different priority levels. Subject to satisfying high-priority constraints, it optimizes lower-priority tasks layer by layer and computes, in real time via a quadratic programming solver, the optimal control inputs satisfying all constraints, synchronously allocating degrees of freedom across both arms and the lower body to achieve real-time coordination between manipulation and balance.
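As a minimal kinematic illustration of task prioritization — the core mechanism inside WBC, shown here without torques, contact forces, or the QP solver — the sketch below satisfies a high-priority balance task exactly and resolves a lower-priority end-effector task in its null space. The Jacobians are random stand-ins for illustration.

```python
import numpy as np

def prioritized_dq(J1, v1, J2, v2):
    """Two-level task prioritization by null-space projection: satisfy the
    high-priority task (e.g., center-of-mass / balance) exactly, then use
    the remaining joint-space freedom for the low-priority task (e.g., the
    arm end-effector trajectory)."""
    J1_pinv = np.linalg.pinv(J1)
    dq1 = J1_pinv @ v1                        # balance task, met exactly
    N1 = np.eye(J1.shape[1]) - J1_pinv @ J1   # null-space projector of task 1
    dq2 = np.linalg.pinv(J2 @ N1) @ (v2 - J2 @ dq1)
    return dq1 + N1 @ dq2

rng = np.random.default_rng(0)
J_com, J_ee = rng.normal(size=(2, 10)), rng.normal(size=(3, 10))
dq = prioritized_dq(J_com, np.zeros(2), J_ee, np.array([0.05, 0.0, 0.02]))
print("CoM task error:", np.linalg.norm(J_com @ dq))   # ~0: balance preserved
```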

Comparison of Dual-Arm Teleoperation Key Technologies

| Technology | Description | Challenges |
| --- | --- | --- |
| Human-Robot Mapping | Translates human motion to robot commands | DOF matching, latency |
| Multimodal Feedback | Provides visual, force, tactile cues to operator | Immersion, data fusion |
| Shared Control | Dynamically allocates control between human and robot | Stability, intent recognition |

The construction of dual-arm teleoperation systems encompasses multiple key technological links, from human-robot mapping and multimodal perceptual feedback to master-slave collaborative control. The system must not only ensure accurate mapping of operator actions but also achieve real-time perception of and interactive feedback from the remote environment, and, more importantly, achieve intent understanding, collaborative scheduling, and task planning at the cognitive level. Dual-arm teleoperation is a typical interdisciplinary problem: alongside theoretical research, it must confront practical application environments and solve multidimensional challenges such as time-delay compensation, dynamic stability, and human factors engineering. Dual-arm teleoperation therefore not only demonstrates unique value in the engineering deployment of humanoid robots but is also becoming a key technological path toward advanced intelligent collaboration.

Conclusion and Outlook

With the continued deepening of research, humanoid robots are gradually leaving the laboratory and moving toward practical applications across a variety of scenarios. Compared to traditional industrial robots, humanoid robots demonstrate significant advantages in unstructured environments and diverse tasks owing to their structural versatility and natural interaction, and dual-arm teleoperation technology has become an important link in their practical deployment. On one hand, teleoperation can fully integrate human judgment and operational capabilities into the robot system while autonomous intelligence remains immature, significantly improving task success rates, stability, and safety; on the other hand, the teleoperation process itself is becoming an important data source for robot policy learning and model training, accelerating the advance of humanoid robots toward a new stage of intelligent collaboration.

In the future, dual-arm teleoperation for humanoid robots still faces several key challenges. First, the fusion mechanism between autonomy and teleoperation is not yet mature: how to dynamically allocate human-robot control authority based on factors such as task complexity and risk level, and how to build a more adaptive shared control framework, are key issues for achieving efficient collaboration. Second, multimodal feedback and human perception have not yet formed a standardized system: the operator's visual and force channels, psychological state, and operational preferences still lack systematic integration, limiting the sense of presence and the efficiency of teleoperation. Finally, from an engineering perspective, the generality and deployment flexibility of teleoperation platforms still need improvement; building low-cost, modular, and structurally universal teleoperation platforms will be the basic precondition for the large-scale real-world application of humanoid robot teleoperation technology. With continued breakthroughs in related technologies, humanoid robots with dual-arm teleoperation capabilities are expected to play a role in ever broader practical scenarios, gradually becoming an important technological form for replacing and extending human operational capabilities in complex environments.
