The prevalence of luminal diseases necessitates advanced medical interventions. Embodied AI robots, particularly continuum or flexible robots, offer a promising solution due to their inherent compliance, allowing them to navigate the tortuous and delicate pathways of human lumens. However, this very flexibility presents a significant challenge: during intervention, the robot’s body can easily interact with and become obstructed by the lumen walls. Excessive interaction forces pose a risk of tissue damage or even perforation. Endowing these flexible robots with embodied intelligence—the capacity for self-perception, understanding of interaction states, and intelligent response—is therefore paramount for safe and effective operations. While much research in surgical robot perception focuses on external vision, true embodied intelligence requires deep awareness of the robot’s own morphological state. This article proposes a comprehensive framework for embodied morphological perception in flexible intervention robots, encompassing signal acquisition, processing, state understanding, and behavioral decision-making.

At the core of this embodied AI robot’s self-awareness is Fiber Bragg Grating (FBG) sensing. An FBG reflects a specific wavelength of light, which shifts proportionally to applied strain and temperature changes. For shape sensing, a multi-core optical fiber is used, with several FBG arrays inscribed along its length. Consider a fiber with one central core and three peripheral cores spaced 120° apart, each containing a series of FBG sensors. The strain measured at the $i$-th sensor node on the $j$-th core, $\epsilon_{j,i}$, relates to the local curvature $\kappa_i$, bending direction $\theta_i$, and axial strain $\epsilon_{a,i}$ as follows for the peripheral cores ($j=1,2,3$):
$$ \epsilon_{j,i} = \kappa_i r \cos\left(\theta_i - \frac{2\pi}{3}(j-1)\right) + \epsilon_{a,i} $$
where $r$ is the radial distance from the central axis to the peripheral cores. The wavelength shift $\Delta\lambda_{j,i}$ is linearly related to strain. By solving the system of equations from the three peripheral cores, the local curvature and direction can be derived, independent of the axial strain:
$$ \kappa_i = \frac{1}{k_{\epsilon} r} \sqrt{ \left( \frac{2\Delta\tilde{w}_{1,i} - \Delta\tilde{w}_{2,i} - \Delta\tilde{w}_{3,i}}{3} \right)^2 + \left( \frac{\Delta\tilde{w}_{2,i} - \Delta\tilde{w}_{3,i}}{\sqrt{3}} \right)^2 } $$
$$ \theta_i = \operatorname{atan2}\left( \sqrt{3}\,(\Delta\tilde{w}_{2,i} - \Delta\tilde{w}_{3,i}),\; 2\Delta\tilde{w}_{1,i} - \Delta\tilde{w}_{2,i} - \Delta\tilde{w}_{3,i} \right) $$
Here, $\Delta\tilde{w}_{j,i} = \Delta\lambda_{j,i} / \lambda_{j,i}$ is the normalized wavelength shift, $k_{\epsilon}$ is the strain coefficient, and the two-argument arctangent resolves the bending direction over the full $(-\pi, \pi]$ range. This provides the fundamental “sense of touch” for the embodied AI robot, generating a discrete set of curvature-direction pairs $s_i = [\kappa_i, \theta_i]^T$ along its body.
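As a concrete illustration, the curvature-direction recovery above can be sketched in a few lines of NumPy. The values of $r$ and $k_{\epsilon}$ below are placeholders for illustration, not a real fiber calibration:

```python
import numpy as np

def curvature_direction(dw, r=37.5e-3, k_eps=0.78):
    """Recover local curvature and bending direction at one node from the
    normalized wavelength shifts of the three peripheral cores.

    dw    : sequence of 3 normalized shifts [dw1, dw2, dw3] (dimensionless)
    r     : core-to-axis radial distance in mm (illustrative value)
    k_eps : strain coefficient (illustrative value)
    """
    dw1, dw2, dw3 = dw
    # Differential combinations cancel the common axial-strain term:
    cx = (2 * dw1 - dw2 - dw3) / 3          # k_eps * kappa * r * cos(theta)
    cy = (dw2 - dw3) / np.sqrt(3)           # k_eps * kappa * r * sin(theta)
    kappa = np.hypot(cx, cy) / (k_eps * r)  # local curvature (1/mm)
    theta = np.arctan2(cy, cx)              # bending direction (rad)
    return kappa, theta
```

The two-argument arctangent recovers $\theta_i$ in any quadrant, which a plain arctangent of the ratio cannot do on its own.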
Raw FBG signals are susceptible to noise and transient spikes, which can corrupt the shape estimation. For robust perception, the embodied AI robot must filter this sensory data. We propose an Improved Moving Average Filter (IMAF) that progressively incorporates the influence of neighboring arc states. For a window of size $W = 2V + 1$ centered at node $i$, the filter operates through an iterative convergence process indexed by $m$. Let $a_j^m$ be the intermediate state of node $j$ at convergence step $m$. The update rule is:
$$ a_j^{m+1} = \alpha a_j^m + \frac{1-\alpha}{2}(a_{j-1}^m + a_{j+1}^m) $$
with initial condition $a_j^1 = s_j$. The parameter $\alpha$ controls the influence of a node’s own state versus its neighbors. After $M$ convergence steps, the final filtered state $\mathbf{s}_i^{IMAF}$ for node $i$ is assigned based on its position: interior nodes take the converged value from the central index of the window, while endpoints use a weighted average with a factor $\beta$. This IMAF method effectively smooths noise while preserving genuine shape features critical for the embodied AI robot’s understanding of its configuration.
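A minimal sketch of the update rule follows. Two simplifications relative to the description above are assumptions of this sketch: the iteration is applied over the whole node array rather than per window, and endpoints are handled by edge replication instead of the paper's $\beta$-weighted rule:

```python
import numpy as np

def imaf(s, alpha=0.6, M=5):
    """Simplified sketch of the Improved Moving Average Filter (IMAF).

    s     : (n, 2) array of node states [kappa_i, theta_i]
    alpha : weight on a node's own state vs. its two neighbors
    M     : number of convergence steps (a^1 = s, then M-1 updates)
    """
    a = np.asarray(s, dtype=float).copy()
    for _ in range(M - 1):
        # Replicate boundary states so endpoints have two neighbors.
        padded = np.pad(a, ((1, 1), (0, 0)), mode="edge")
        # a^{m+1}_j = alpha * a^m_j + (1-alpha)/2 * (a^m_{j-1} + a^m_{j+1})
        a = alpha * a + (1 - alpha) / 2 * (padded[:-2] + padded[2:])
    return a
```

Each step is a weighted local averaging, so repeated application diffuses isolated spikes while leaving slowly varying shape features largely intact.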
To reconstruct the complete 3D shape from the filtered discrete states $\mathbf{s}_i^{IMAF}$, the embodied AI robot employs a piecewise constant curvature assumption. The shape is divided into $N$ segments. The state for each segment $k$, $\mathbf{s}_k = [\kappa_k, \theta_k]^T$, is obtained via spline interpolation of the filtered node states. The transformation of the local frame from the base $\{O_k^b\}$ to the tip $\{O_k^e\}$ of a segment with length $l$ is given by:
$$ {}^b_e\mathbf{T}_k = \mathbf{T}_{rot}(z, \theta_k) \cdot \mathbf{T}_{trans}(z, \frac{\sin(l\kappa_k)}{\kappa_k}) \cdot \mathbf{T}_{trans}(x, \frac{1-\cos(l\kappa_k)}{\kappa_k}) \cdot \mathbf{T}_{rot}(y, l\kappa_k) \cdot \mathbf{T}_{rot}(z, -\theta_k) $$
For the straight segment case ($\kappa_k = 0$), the transformation simplifies to a pure translation: ${}^b_e\mathbf{T}_k = \mathbf{T}_{trans}(z, l)$. The position $\mathbf{p}_k$ of the end of the $k$-th segment in its base frame is the translation vector of ${}^b_e\mathbf{T}_k$. Concatenating these transformations for all segments $k=1$ to $N$ yields the full shape of the embodied AI robot’s body in the global coordinate system, providing it with a geometric model of itself in space.
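The segment transformation and its concatenation can be transcribed directly from the formulas above using homogeneous $4\times4$ matrices; this is a sketch, with the straight-segment branch handled by a small curvature threshold:

```python
import numpy as np

def _rot(axis, a):
    """Homogeneous rotation about a principal axis ('y' or 'z')."""
    c, s = np.cos(a), np.sin(a)
    T = np.eye(4)
    if axis == "z":
        T[:2, :2] = [[c, -s], [s, c]]
    elif axis == "y":
        T[0, 0], T[0, 2], T[2, 0], T[2, 2] = c, s, -s, c
    return T

def _trans(axis, d):
    """Homogeneous translation along a principal axis."""
    T = np.eye(4)
    T["xyz".index(axis), 3] = d
    return T

def segment_transform(kappa, theta, l):
    """Constant-curvature transform from a segment's base to its tip."""
    if abs(kappa) < 1e-9:                   # straight segment: pure translation
        return _trans("z", l)
    return (_rot("z", theta)
            @ _trans("z", np.sin(l * kappa) / kappa)
            @ _trans("x", (1 - np.cos(l * kappa)) / kappa)
            @ _rot("y", l * kappa)
            @ _rot("z", -theta))

def reconstruct_shape(states, l):
    """Concatenate segment transforms; return (N+1, 3) backbone points."""
    T = np.eye(4)
    pts = [T[:3, 3].copy()]
    for kappa, theta in states:
        T = T @ segment_transform(kappa, theta, l)
        pts.append(T[:3, 3].copy())
    return np.array(pts)
```

As a sanity check, a chain of in-plane segments whose arc angles $l\kappa_k$ sum to $\pi/2$ should place the tip at $(R, 0, R)$ for bend radius $R = 1/\kappa$.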
Perceiving shape is only the first step; the embodied AI robot must understand what that shape implies about its interaction with the environment. We introduce a spatiotemporal method for blockage detection. As the robot advances, the position of each point $k$ on its body changes over time. The spatial deviation $e_k(t)$ at time $t$ over an interval $\Delta t$ is:
$$ e_k(t) = \| \mathbf{p}_k(t) - \mathbf{p}_k(t - \Delta t) \| $$
Under free motion, $e_k(t)$ is approximately equal to the distance inserted during $\Delta t$. When a blockage occurs at a point $o$, points between the base and the blockage ($k < o$) continue to move, while points at and distal to the blockage ($k \geq o$) experience restricted motion. Therefore, the blockage point $o$ is identified as:
$$ o = \min \{\, k \;:\; e_k(t) < e_0 \,\} $$
where $e_0$ is a positive threshold. This allows the embodied AI robot to not only detect that an obstruction has occurred but also to localize it along its body, a crucial understanding for making informed decisions.
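The detector reduces to a thresholded scan over the spatial deviations. A sketch, using 0-based node indexing and positions expressed in consistent units:

```python
import numpy as np

def detect_blockage(p_now, p_prev, e0):
    """Return the index of the most proximal restricted node, or None.

    p_now, p_prev : (N, 3) backbone point arrays at times t and t - dt
    e0            : positive deviation threshold (same units as positions)
    """
    # Spatial deviation e_k(t) = ||p_k(t) - p_k(t - dt)|| for every node.
    e = np.linalg.norm(p_now - p_prev, axis=1)
    restricted = np.flatnonzero(e < e0)
    # Blockage point: smallest k whose deviation falls below the threshold.
    return int(restricted[0]) if restricted.size else None
```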
Armed with this understanding, the embodied AI robot must decide how to act. A simple, constant forward motion in the face of blockage can lead to dangerous force build-up. Inspired by the techniques of expert human operators who slightly retract and readvance tools to relieve friction, we propose an intermittent intervention strategy guided by a novel Instantaneous Global Inconsistency Index (IGII). The intervention velocity $v(t)$ is:
$$ v(t) = \begin{cases} v_f, & t \notin T_o \\ v_b(t), & t \in T_o \end{cases} $$
where $v_f$ is a constant forward speed, $T_o$ is the retraction interval that begins when a blockage is detected, and $v_b(t)$ is a computed retraction speed. The IGII, denoted $C_{IGII}(t)$, is derived from the classical Global Inconsistency Index (GII), which quantifies the efficiency of force transmission along a flexible structure. For instantaneous assessment, we define it as a function of the curvatures up to the blockage point $o$:
$$ C_{IGII}(t) = \prod_{k=1}^{o} \cos(l \kappa_k(t)) $$
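The index is a direct product over the proximal segment curvatures; a minimal transcription:

```python
import numpy as np

def igii(kappas, l):
    """Instantaneous Global Inconsistency Index up to the blockage point.

    kappas : curvatures kappa_1 .. kappa_o of the segments from the base
             to the blockage (1/mm)
    l      : segment length (mm)
    """
    # C_IGII = prod_k cos(l * kappa_k); equals 1 for a perfectly straight body.
    return float(np.prod(np.cos(l * np.asarray(kappas, dtype=float))))
```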
A higher $C_{IGII}$ (closer to 1) indicates a straighter, more efficient configuration. The retraction speed $v_b(t)$ is then designed to promote an increase in $C_{IGII}$, effectively steering the robot’s body toward a lower-energy, less buckled state during retraction. It is proportional to the sensitivity of the shape to changes in curvature modulated by the IGII gradient:
$$ v_b = k_{IGII} \sum_{k=1}^{o} \left( \frac{d \|\mathbf{p}_k\|}{d \kappa_k} \frac{d \kappa_k}{d C_{IGII}} \right) $$
where $k_{IGII}$ is a compliance factor. This strategy enables the embodied AI robot to autonomously and compliantly negotiate obstructed lumens.
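One way to evaluate $v_b$ is sketched below, under two stated assumptions: $\|\mathbf{p}_k\|$ is taken as the chord length of a constant-curvature segment, $(2/\kappa_k)\sin(l\kappa_k/2)$, and $d\kappa_k / dC_{IGII}$ is evaluated as the reciprocal of the partial derivative $\partial C_{IGII} / \partial \kappa_k$. The paper's exact derivative terms may differ:

```python
import numpy as np

def retraction_speed(kappas, l, k_igii=1.0):
    """Illustrative evaluation of the IGII-guided retraction speed v_b.

    kappas : curvatures of segments 1..o (up to the blockage point), 1/mm
    l      : segment length, mm
    k_igii : compliance factor
    """
    ks = np.asarray(kappas, dtype=float)
    cos_lk = np.cos(l * ks)
    C = np.prod(cos_lk)                      # C_IGII for the current shape
    v = 0.0
    for k, kappa in enumerate(ks):
        if abs(kappa) < 1e-9:
            continue                         # straight segment: both terms vanish
        # d||p_k||/d kappa_k for the chord (2/kappa) sin(l*kappa/2):
        dp_dk = (l / kappa) * np.cos(l * kappa / 2) \
            - (2 / kappa**2) * np.sin(l * kappa / 2)
        # dC_IGII/d kappa_k = -l sin(l*kappa_k) * prod_{j != k} cos(l*kappa_j):
        dC_dk = -l * np.sin(l * kappa) * C / cos_lk[k]
        if abs(dC_dk) > 1e-12:
            v += dp_dk / dC_dk               # (d||p_k||/dk) * (dk/dC_IGII)
    return k_igii * v
```

Both derivative factors are negative for a bent segment, so their ratio is positive and the resulting retraction speed drives the body toward a straighter, higher-$C_{IGII}$ configuration.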
The performance of the embodied morphological perception system was validated through simulations and physical experiments. Simulations compared the IMAF method against standard Moving Average (MA) and Median Filtering (MF); the IMAF demonstrated superior noise rejection and smoother output. The table below summarizes the Root Mean Square Error (RMSE) of curvature ($\kappa$) estimation for two test shapes.
| Method | Shape 1: $\kappa$ RMSE ($\times 10^{-3}$ mm$^{-1}$) | Shape 2: $\kappa$ RMSE ($\times 10^{-3}$ mm$^{-1}$) |
|---|---|---|
| Moving Average (MA) | 10.3 ± 0.1 | 11.8 ± 0.14 |
| Median Filter (MF) | 11.7 ± 0.14 | 12.9 ± 0.17 |
| Improved MAF (IMAF) | 7.1 ± 0.05 | 7.2 ± 0.05 |
Physical experiments used a flexible robot equipped with a multi-core FBG fiber (18 sensing nodes) navigating templates with spiral and elliptical grooves. The IMAF-processed shape sensing achieved endpoint errors of 2.1 mm and 2.5 mm (1.2% and 1.5% of length) for the two shapes, confirming high accuracy. Blockage detection was successfully demonstrated in curved channels, with the algorithm correctly identifying the obstructed segment based on spatiotemporal deviation $e_k(t)$. Finally, the intermittent intervention strategy was tested against constant-speed insertion in narrowing lumen phantoms. The results, summarized below, show the embodied AI robot’s strategy significantly reduces interaction forces.
| Intervention Method | Lumen Width | Average Force (N) | Peak Force (N) |
|---|---|---|---|
| Constant Speed | 20 mm | 0.088 | 0.18 |
| Intermittent (Ours) | 20 mm | 0.076 | 0.12 |
| Constant Speed | 10 mm | 0.06 | 0.10 |
| Intermittent (Ours) | 10 mm | 0.034 | 0.06 |
In conclusion, this work presents a holistic embodied AI robot framework for safe lumen intervention. By integrating FBG-based shape sensing, advanced filtering (IMAF), spatiotemporal blockage understanding, and a bio-inspired intermittent control strategy, the flexible robot transitions from mere shape perception to actionable morphological cognition. This allows it to sense its own configuration, interpret interaction constraints with the environment, and execute compliant, intelligent navigation decisions. The developed methods mark a significant step toward autonomous and safer minimally invasive surgical robots capable of handling the uncertainties of real anatomical lumens. Future work will extend this embodied perception paradigm to actively steerable robots and more dynamic, deformable environments.
