Design and Intelligent Planning of a Space Bionic Soft Robot

In the realm of robotics, the pursuit of machines that can navigate and operate in unstructured environments has led to a significant shift towards bio-inspired designs. Traditional rigid-body robots, while effective in controlled settings, often struggle with limited degrees of freedom and poor adaptability to complex, dynamic surroundings. This is particularly evident in space applications, such as on-orbit servicing, where robots must handle unforeseen obstacles and perform delicate tasks. As a researcher in this field, I have focused on developing a novel bionic robot that overcomes these limitations by drawing inspiration from nature. The result is the GS-Bot, a flexible robot inspired by the skeletal structure of an inchworm and the muscular flexibility of a snake. This bionic robot leverages shape memory alloy (SMA) actuators to achieve high deformation capabilities and multi-degree-of-freedom motion, making it ideal for space missions. In this article, I will detail the design, implementation, and intelligent planning simulation of this bionic robot, emphasizing its potential for autonomous operation in challenging environments.

The concept of soft robotics has gained traction because these machines can mimic biological organisms, offering enhanced flexibility and resilience. Unlike conventional robots, soft robots can bend, stretch, and twist, enabling them to conform to irregular surfaces and navigate tight spaces. This is crucial for on-orbit servicing, where robots may encounter debris, confined modules, or delicate instruments. My work on the GS-Bot stems from the observation that many biological systems, such as inchworms and snakes, exhibit remarkable locomotion and manipulation skills through a combination of joint articulation and body curvature. By emulating these traits, I aimed to create a bionic robot that not only withstands external shocks but also performs complex deformations with precision. The use of SMA as an actuator material was a key decision, as it provides substantial force output and rapid response times, essential for real-time control in space scenarios.

The design of the GS-Bot revolves around a modular structure composed of interconnected bones and joints. Each joint consists of two skeletons made from photosensitive resin, connected by three SMA springs that serve as artificial muscles. The skeleton linkage is designed as a “ball-and-socket” joint, constrained by rubber tubes to limit the range of motion, mimicking the articulation seen in biological systems. This configuration allows for both twisting and bending motions, enabling the robot to achieve a wide array of poses. The SMA springs, made from nickel-titanium alloy, contract when heated and return to their original shape when cooled, providing the driving force for movement. To understand the behavior of these actuators, I conducted extensive testing on their thermal and mechanical properties. The results showed that SMA deformation is highly sensitive to voltage changes, with higher voltages leading to faster and more pronounced contractions. For instance, at 6 V, the SMA springs began significant deformation within 10 seconds, stabilizing after 20 seconds. This characteristic is critical for designing control systems that can precisely modulate the robot’s posture.

The system architecture of the GS-Bot is divided into three main components: the upper and lower computer controllers, the hardware circuitry, and the robot prototype. The lower computer, based on a micro control unit (MCU), communicates with the upper computer via serial communication to transmit control data. The hardware circuit delivers current to the SMA actuators based on timed electrical signals, enabling coordinated movement of the robot’s joints. The prototype itself features multiple segments, each capable of independent actuation, and is equipped with a miniature camera at its tip for visual feedback. This camera integrates the YOLO (You Only Look Once) object detection algorithm, allowing the robot to perceive its environment and identify targets. A visual control interface was developed to monitor and command the robot in real-time, providing a user-friendly platform for experimentation. Through this interface, operators can activate individual SMA springs, observe deformation, and access target detection data, such as coordinates and dimensions of objects in the camera’s field of view.

To validate the maneuverability of the GS-Bot, I performed physical experiments under controlled conditions. The robot’s base was fixed to a platform, and power was supplied to the SMA actuators. Over a period of 30 seconds, the robot underwent significant morphological changes, with most deformation occurring in the latter 10 seconds, consistent with the SMA’s response characteristics. The robot demonstrated an ability to twist and bend in multiple directions, showcasing its potential for navigating complex environments. The visual interface confirmed successful control, as seen in the real-time display of camera feeds and target detection outputs. For example, clicking the “Test SMA 1” button initiated contraction of the corresponding spring, causing the robot to bend towards a specified direction. This experiment not only proved the robot’s controllability but also highlighted the integration of sensory feedback for autonomous operation. The GS-Bot’s performance underscores the advantages of bionic robot designs in achieving high degrees of freedom and adaptability.

Building on the physical prototype, I developed a simulation model to explore intelligent planning and control strategies for the GS-Bot. Given the nonlinear dynamics of SMA-driven motion, I simplified the robot to a two-dimensional planar robotic arm model for initial studies. This model consists of two links representing the robot’s segments, with joint angles controlling the arm’s position and orientation. The base point is fixed at coordinates $O(x_0, y_0)$, and the links have lengths $l_1$ and $l_2$, with endpoints $E_1(x_1, y_1)$ and $E_2(x_2, y_2)$. The angles $\alpha$ and $\beta$ denote the rotations of the links relative to the horizontal axis, governing the arm’s posture. The coordinates are derived as follows:

$$
(x_1, y_1) = (l_1 \cos \alpha, l_1 \sin \alpha) + (x_0, y_0)
$$

$$
(x_2, y_2) = (l_2 \cos \beta, l_2 \sin \beta) + (x_1, y_1)
$$
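
As a quick check of the two equations above, consider illustrative values that are not taken from the source: base at the origin, $l_1 = l_2 = 1$, $\alpha = 90^\circ$, and $\beta = 0^\circ$. Substituting gives:

$$
(x_1, y_1) = (1 \cdot \cos 90^\circ, 1 \cdot \sin 90^\circ) + (0, 0) = (0, 1)
$$

$$
(x_2, y_2) = (1 \cdot \cos 0^\circ, 1 \cdot \sin 0^\circ) + (0, 1) = (1, 1)
$$

Because $\beta$ is measured from the horizontal axis rather than from the first link, the second link points along the x-axis regardless of the value of $\alpha$.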

In the simulation environment, a target object is placed at coordinates $G(x_G, y_G)$, and the goal is for the arm’s end-effector $E_2$ to reach this target autonomously. To achieve this, I employed a Q-learning algorithm, a reinforcement learning method that enables the robot to learn optimal actions through trial and error. The algorithm operates by defining states, actions, and rewards, with the robot acting as an agent that interacts with the environment. At time $t$, the agent takes an action $a_t$ based on the current state $s_t$, receives a reward $r_t$, and transitions to a new state $s_{t+1}$. The policy function $\pi(a_t | s_t)$ determines the probability of selecting action $a_t$ in state $s_t$, and the objective is to maximize the cumulative discounted reward over time.

The reward function $r$ is designed to guide the agent toward the target. If the end-effector $E_2$ enters the target region, it receives a reward of 1; if it remains there for 10 consecutive time steps, it earns a reward of 10, and the episode terminates. Otherwise, the reward is negative and inversely proportional to the Euclidean distance $d$ between $E_2$ and the target, expressed as $f(d)$. Mathematically, the reward function is defined as:

$$
r =
\begin{cases}
1, & \text{if } t_a < 10 \text{ and } (x_2, y_2) \in M \\
10, & \text{if } t_a \geq 10 \text{ and } (x_2, y_2) \in M \\
f(d), & \text{otherwise}
\end{cases}
$$

where $M$ represents the target region conditions:

$$
M = \left\{ (x_2, y_2) : x_G - \frac{G_w}{2} < x_2 < x_G + \frac{G_w}{2}, \ y_G - \frac{G_h}{2} < y_2 < y_G + \frac{G_h}{2} \right\}
$$

Here, $G_w$ and $G_h$ are the width and height of the target, and $t_a$ is the number of consecutive time steps the end-effector has spent inside the target region, matching the 10-step condition described above. The episode termination condition $F$ is given by:

$$
F =
\begin{cases}
\text{True}, & \text{if } r = 10 \\
\text{False}, & \text{if } T' = T'_m
\end{cases}
$$

where $T'_m$ is the maximum number of steps per episode. The action space consists of joint angle adjustments $a \in (-180^\circ, 180^\circ)$, and the learning rate $L$ scales how much each action changes the joint angle, written $R$ below:

$$
R_{t+1} \leftarrow R_t + L \cdot a_t
$$
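
The reward, target region, and angle update defined above can be sketched in a few lines of Python. This is a minimal illustration, not the code used in the study. The names `Target`, `distance_penalty`, `step_reward`, and `update_angle` are hypothetical. The source does not give the form of $f(d)$, so the sketch assumes $f(d) = -d$ purely as a placeholder. It also assumes that $t_a$ is incremented on each step spent inside $M$ and reset when the end-effector leaves.

```python
import math
from dataclasses import dataclass


@dataclass
class Target:
    x: float  # x_G, centre of the target
    y: float  # y_G
    w: float  # G_w, target width
    h: float  # G_h, target height

    def contains(self, px: float, py: float) -> bool:
        """Membership test for the region M (strict inequalities)."""
        return abs(px - self.x) < self.w / 2 and abs(py - self.y) < self.h / 2


def distance_penalty(d: float) -> float:
    """Stand-in for f(d). ASSUMPTION: f(d) = -d; the source does not define it."""
    return -d


def step_reward(end, target, steps_in_target):
    """Return (reward, reached, steps_in_target) for end-effector position `end`."""
    px, py = end
    if target.contains(px, py):
        steps_in_target += 1  # t_a, counted while inside M
        if steps_in_target >= 10:
            return 10.0, True, steps_in_target  # r = 10, so F is True
        return 1.0, False, steps_in_target
    # Outside M: reset the counter and apply the distance-based penalty.
    d = math.hypot(px - target.x, py - target.y)
    return distance_penalty(d), False, 0


def update_angle(angle: float, action: float, rate: float = 0.01) -> float:
    """Angle update R_{t+1} = R_t + L * a_t, with L = 0.01 as in the text."""
    return angle + rate * action
```

A training loop built on this sketch would also stop the episode once the step counter reaches $T'_m$, which is kept outside `step_reward` here so that the function mirrors only the reward cases.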

This formulation allows the agent to explore the environment and gradually converge to an optimal policy. The Q-learning algorithm uses a neural network to approximate the action-value function $Q_\pi(s, a)$, which estimates the expected cumulative reward for taking action $a$ in state $s$. The network architecture includes hidden layers with units that process state information and output Q-values for each action. During training, the agent balances exploration and exploitation by using an $\epsilon$-greedy strategy, where it selects random actions with probability $\epsilon$ and greedy actions otherwise. Over multiple episodes, the agent learns to minimize the distance to the target and achieve stable performance.
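
The $\epsilon$-greedy rule and the one-step Q-learning target can be illustrated as follows. The sketch assumes a discrete set of candidate actions whose Q-values arrive as a plain list. The source does not say how the continuous angle range is discretised or how the network is trained, so the network is left out. The function names are hypothetical, and the rule follows the definition in the text: a random action with probability $\epsilon$, the greedy action otherwise.

```python
import random


def epsilon_greedy(q_values, epsilon, rng=random):
    """Return a random action index with probability epsilon, else the greedy one."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_values))
    return max(range(len(q_values)), key=q_values.__getitem__)


def td_target(reward, next_q_values, gamma=0.9, terminal=False):
    """One-step Q-learning target: r + gamma * max over a' of Q(s', a')."""
    if terminal:
        return reward  # no bootstrapping once the episode has ended
    return reward + gamma * max(next_q_values)
```

In a neural-network setting, `td_target` would supply the regression target for the Q-value of the action actually taken, with `gamma` set to the discount factor of the experiment.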

To quantify the simulation parameters, I established a set of values that govern the training process. These parameters are summarized in the table below, which highlights key aspects such as the maximum episodes, steps, learning rate, and discount factor. This structured approach ensures reproducibility and allows for systematic optimization of the bionic robot's learning capabilities.

| Parameter | Symbol | Value |
| --- | --- | --- |
| Maximum Episodes | $E_m$ | 500 |
| Maximum Steps per Episode | $T'_m$ | 2000 |
| Discount Factor | $\gamma$ | 0.9 |
| Learning Rate | $L$ | 0.01 |
| Random Exploration Probability | $\epsilon$ | 0.95 |
| Hidden Layer Units | $U$ | 1024 |
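
For reproducibility, the values in the table can be collected in a single immutable configuration object. The sketch below is one possible arrangement. The class and field names are hypothetical, and only the numeric values come from the table.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainingConfig:
    max_episodes: int = 500                # E_m
    max_steps_per_episode: int = 2000      # T'_m
    discount_factor: float = 0.9           # gamma
    learning_rate: float = 0.01            # L
    exploration_probability: float = 0.95  # epsilon
    hidden_units: int = 1024               # U


config = TrainingConfig()
# A variant for the hidden-unit comparison only overrides one field.
small_network = TrainingConfig(hidden_units=128)
```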

The simulation results demonstrated the effectiveness of the Q-learning approach for the GS-Bot model. Over the course of 500 episodes, the agent’s cumulative reward and step count per episode converged to stable values, indicating successful learning. As shown in the training curves, after approximately 100 episodes, the cumulative reward stabilized between 30 and 50, and the step count reduced to around 100-150 steps per episode. This signifies that the agent learned to reach the target efficiently, minimizing unnecessary movements. The convergence behavior is a testament to the robustness of the reward function and the algorithm’s ability to handle the nonlinear dynamics of the bionic robot. Furthermore, I investigated the impact of neural network architecture on training performance by varying the number of hidden units. Comparative trials with 128, 1024, and 2048 units revealed that 1024 units yielded the highest success rate, averaging 95.4% over five runs, while 128 units led to underfitting and 2048 units caused overfitting. This analysis underscores the importance of model tuning in reinforcement learning applications for soft robotics.

To provide a clearer comparison of the training outcomes, I have compiled the success rates for different hidden layer configurations in the table below. This data reinforces the notion that an appropriately sized network is crucial for balancing complexity and generalization in bionic robot control systems.

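| Hidden Units | Run 1 | Run 2 | Run 3 | Run 4 | Run 5 | Average |
| --- | --- | --- | --- | --- | --- | --- |
| 128 | 89% | 83.8% | 89% | 96% | 92.6% | 90.08% |
| 1024 | 97.6% | 97% | 97.4% | 90% | 95% | 95.4% |
| 2048 | 92% | 90% | 89% | 91% | 95.6% | 91.52% |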
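
The averages in the table can be reproduced directly from the per-run success rates. The short check below uses only the figures reported above. The variable names are illustrative.

```python
# Success rates (%) per run, copied from the table.
runs = {
    128: [89.0, 83.8, 89.0, 96.0, 92.6],
    1024: [97.6, 97.0, 97.4, 90.0, 95.0],
    2048: [92.0, 90.0, 89.0, 91.0, 95.6],
}

# Mean success rate per configuration, rounded to two decimals.
averages = {units: round(sum(rates) / len(rates), 2) for units, rates in runs.items()}
best = max(averages, key=averages.get)

print(averages)  # {128: 90.08, 1024: 95.4, 2048: 91.52}
print(best)      # 1024
```

With only five runs per configuration, these means are best read as indicative rather than as a statistically firm ranking.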

The simulation also extended to a three-link model of the GS-Bot, illustrating the scalability of the approach. In this configuration, an additional segment was incorporated, increasing the degrees of freedom and enabling more complex maneuvers. The Q-learning algorithm successfully adapted to this expanded state space, with the agent learning to coordinate multiple joints to reach the target. This scalability is vital for real-world applications where bionic robots may require numerous segments for tasks such as grasping or traversing obstacles. The mathematical framework for the three-link model builds upon the two-link equations, with additional angles and coordinates. For instance, if a third link of length $l_3$ is added with angle $\gamma$ (a joint angle here, not the discount factor $\gamma$ from the parameter table), the end-effector coordinates become:

$$
(x_3, y_3) = (l_3 \cos \gamma, l_3 \sin \gamma) + (x_2, y_2)
$$

where $(x_2, y_2)$ is derived from the previous links. This hierarchical structure allows for seamless extension to n-link models, providing a foundation for advanced planning in high-dimensional spaces.
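
The chained structure of these equations lends itself to a short loop over links. The following sketch computes the endpoint of every link for an arbitrary planar chain. It assumes, as in the equations above, that each angle is measured from the horizontal axis rather than from the preceding link. The function name and the example values are illustrative, not taken from the source.

```python
import math


def forward_kinematics(base, lengths, angles_deg):
    """Return the endpoint of each link, in order, for a planar n-link chain.

    Each angle is measured from the horizontal axis, so every endpoint is the
    previous endpoint plus (l * cos(angle), l * sin(angle)).
    """
    if len(lengths) != len(angles_deg):
        raise ValueError("need exactly one angle per link")
    x, y = base
    points = []
    for length, angle in zip(lengths, angles_deg):
        theta = math.radians(angle)
        x += length * math.cos(theta)
        y += length * math.sin(theta)
        points.append((x, y))
    return points


# Three unit links pointing up, right, then down from the origin.
endpoints = forward_kinematics((0.0, 0.0), [1.0, 1.0, 1.0], [90.0, 0.0, -90.0])
```

The last element of the returned list is the end-effector position, so the same function covers the two-link and three-link cases without modification.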

Beyond simulation, the integration of the GS-Bot’s physical and computational systems presents opportunities for real-time autonomous operation. The visual feedback from the camera, combined with the YOLO algorithm, enables the robot to detect and track objects in its environment. This sensory input can be fed into the Q-learning framework, allowing the robot to learn from actual interactions rather than simulated ones. For example, the reward function could be modified to incorporate visual cues, such as the proximity to a target identified by the camera. This fusion of perception and control is a key step toward deploying bionic robots in unstructured settings like space stations or planetary surfaces. Additionally, the use of SMA actuators offers energy efficiency and compactness, as they require only electrical heating for activation and can be embedded within the robot’s body. However, challenges remain, such as managing heat dissipation and ensuring precise temperature control for consistent deformation.

In terms of broader implications, the GS-Bot represents a paradigm shift in robotic design, emphasizing flexibility and bio-inspiration over rigid mechanics. The bionic robot concept aligns with trends in soft robotics, where materials science and machine learning converge to create adaptable machines. For space applications, this adaptability is crucial, as missions often involve unknown variables and limited human intervention. The ability of the GS-Bot to deform and twist allows it to navigate through tight passages, manipulate irregular objects, and withstand vibrations or impacts that might damage traditional robots. Moreover, the intelligent planning simulation demonstrates that reinforcement learning can effectively address the control complexities inherent in soft robots, paving the way for more autonomous systems. As I continue this research, future work will focus on enhancing the robot’s durability, improving SMA response times, and implementing more sophisticated learning algorithms, such as deep reinforcement learning, to handle multi-objective tasks.

In conclusion, the development of the GS-Bot bionic robot showcases the potential of nature-inspired designs for advancing robotics in challenging environments. Through a combination of innovative structural design, SMA actuation, and intelligent planning via Q-learning, this robot achieves high deformability and autonomous control. The physical experiments confirm its maneuverability, while the simulation results validate the efficacy of reinforcement learning for planning tasks. As the field of soft robotics evolves, bionic robots like the GS-Bot will play an increasingly important role in expanding the capabilities of machines beyond traditional limits. This work not only contributes to the technical knowledge base but also inspires further exploration into bio-mimetic systems that blur the line between artificial and natural intelligence.