Advanced Dynamic Tracking for Underwater Robotic End Effectors

Underwater operations such as deep-sea exploration, marine resource development, underwater rescue, and subsea infrastructure construction are complex and hazardous. Underwater robots are the primary tools for these missions, and their usefulness depends on precise tracking of their end effectors. The underwater environment, with its poor visibility and turbulent currents, makes this difficult. Traditional working methods, such as direct operation by divers, carry high risk and are limited by human physiology, which restricts both the duration and the depth of operations, so they cannot meet the growing demands of these applications. Research into dynamic tracking of an underwater robot’s end effector position therefore matters for advancing underwater operation technology, expanding the use of marine resources, and keeping underwater work safe. Knowing precisely where the end effector is at all times is fundamental to complex manipulation in these unpredictable environments.

Our work proposes a method for the dynamic tracking of an underwater robotic end effector’s position, built as a three-stage pipeline. The process begins with a hybrid positioning technique, combining time-difference and direction-of-arrival measurements, to obtain a fix on the end effector. This positional estimate is then refined over time by a multi-target tracking filter that runs separately in each of several frequency sub-bands. Finally, the results from these parallel tracking channels are fused by Generalized Covariance Intersection into a single dynamic trajectory for the end effector. The method targets two shortcomings of existing techniques: their susceptibility to environmental interference and to data inconsistency.

1. Hybrid Online Positioning for the End Effector

The first and crucial step in our dynamic tracking framework is accurately determining the initial and subsequent positions of the underwater robot’s end effector. Relying on a single sensing modality is often insufficient in complex subsea environments. Therefore, we employ a hybrid positioning scheme that synergistically combines Time Difference of Arrival (TDOA) and Direction of Arrival (DOA) methods. This fusion leverages the complementary strengths of each technique to yield a more reliable location estimate than either could provide alone.

1.1 TDOA-Based Positioning Principle

We consider an underwater wireless sensor array network consisting of $ n $ sonar nodes. Let $ h_i $ denote the $ i $-th node with known spatial coordinates $ (x_i, y_i, z_i) $. The distance $ D_i $ from a target end effector located at an arbitrary point $ (x, y, z) $ to node $ h_i $ is given by the Euclidean distance:

$$D_i = \sqrt{(x - x_i)^2 + (y - y_i)^2 + (z - z_i)^2}.$$

All sensors operate in a passive listening mode. Using the time when node $ h_1 $ detects the acoustic signal from the end effector as a reference, the time delay difference $ \tau_{1i} $ between node $ h_i $ and the reference node $ h_1 $ can be estimated using generalized cross-correlation algorithms. This time difference corresponds to a range difference:

$$D_i = D_1 + \Delta D_{1i} = D_1 + c \cdot \tau_{1i},$$

where $ c $ is the speed of sound in water, $ D_1 $ is the unknown distance to the reference node, and $ \Delta D_{1i} $ is the measured range difference. This relationship leads to a system of nonlinear equations for $ i = 1, 2, \dots, n $:

$$
\begin{cases}
(x - x_1)^2 + (y - y_1)^2 + (z - z_1)^2 = D_1^2, \\
(x - x_2)^2 + (y - y_2)^2 + (z - z_2)^2 = (D_1 + \Delta D_{12})^2, \\
\vdots \\
(x - x_n)^2 + (y - y_n)^2 + (z - z_n)^2 = (D_1 + \Delta D_{1n})^2.
\end{cases}
$$
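The delay differences $ \tau_{1i} $ that enter this system are often estimated with the PHAT-weighted form of generalized cross-correlation. The following is a minimal sketch of that idea, not the implementation used in this work. The function name `gcc_phat`, the 100 kHz sampling rate, the nominal sound speed of 1500 m/s, and the synthetic white-noise signal are all assumptions made for illustration.

```python
import numpy as np

def gcc_phat(sig, ref, fs):
    """Estimate the delay (in seconds) of `sig` relative to `ref` with GCC-PHAT."""
    n = len(sig) + len(ref)                      # zero-pad to avoid circular wrap-around
    cross = np.fft.rfft(sig, n=n) * np.conj(np.fft.rfft(ref, n=n))
    cross /= np.abs(cross) + 1e-12               # PHAT weighting: keep only the phase
    cc = np.fft.irfft(cross, n=n)
    max_shift = n // 2
    # Reorder so that index `max_shift` corresponds to zero lag.
    cc = np.concatenate((cc[-max_shift:], cc[:max_shift + 1]))
    return (np.argmax(np.abs(cc)) - max_shift) / fs

# Synthetic check: node h_i hears the same waveform 25 samples after node h_1.
fs = 100_000.0                                   # assumed sampling rate (Hz)
rng = np.random.default_rng(0)
ref = rng.standard_normal(1024)                  # signal at the reference node h_1
delay_samples = 25
sig = np.concatenate((np.zeros(delay_samples), ref))

tau_1i = gcc_phat(sig, ref, fs)                  # estimated delay difference (s)
c = 1500.0                                       # assumed nominal sound speed (m/s)
delta_D = c * tau_1i                             # range difference Delta D_1i (m)
```

With these assumed values the recovered delay corresponds to a range difference of 0.375 m. Real hydrophone data would also need band-pass filtering and sub-sample peak interpolation, which the sketch omits.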

By linearizing this system (e.g., subtracting the first equation from the others), we can reformulate it into a matrix form suitable for least-squares estimation. The linearized form can be expressed as:

$$ \mathbf{R} \boldsymbol{\omega} = \mathbf{E}, $$

where $ \mathbf{R} $ is the coefficient matrix derived from sensor coordinates, $ \boldsymbol{\omega} $ is the state vector containing the target position $ (x, y, z) $ and $ D_1 $, and $ \mathbf{E} $ is the observation vector constructed from the range difference measurements $ \Delta D_{1i} $. Because the system is overdetermined and measurement noise makes it inconsistent, we seek a solution in the least-squares sense:

$$\min \lVert \mathbf{R} \boldsymbol{\omega} - \mathbf{E} \rVert_2^2.$$

The least-squares solution is given by:

$$\hat{\boldsymbol{\omega}} = (\mathbf{R}^T \mathbf{R})^{-1} \mathbf{R}^T \mathbf{E}.$$

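This yields a unique estimate of the end effector’s position whenever $ \mathbf{R} $ has full column rank, which requires at least as many independent equations as the four unknowns in $ \boldsymbol{\omega} $.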
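As a concrete illustration of the linearization, subtracting the first equation from equation $ i $ gives one row per non-reference node: $ (x_i - x_1)x + (y_i - y_1)y + (z_i - z_1)z + \Delta D_{1i} D_1 = \tfrac{1}{2}(K_i - K_1 - \Delta D_{1i}^2) $, with $ K_i = x_i^2 + y_i^2 + z_i^2 $. The sketch below assembles these rows and solves them; it assumes this particular subtraction scheme, and the function name, node layout, and target are hypothetical. It calls `numpy.linalg.lstsq` rather than forming $ (\mathbf{R}^T \mathbf{R})^{-1} $ explicitly, which gives the same least-squares solution with better numerical behaviour.

```python
import numpy as np

def tdoa_least_squares(nodes, delta_d):
    """Linearized TDOA solve.

    nodes   : (n, 3) node coordinates, row 0 is the reference node h_1
    delta_d : (n-1,) range differences D_i - D_1 for the remaining nodes
    returns : omega = [x, y, z, D_1]
    """
    nodes = np.asarray(nodes, dtype=float)
    delta_d = np.asarray(delta_d, dtype=float)
    K = np.sum(nodes**2, axis=1)                        # K_i = x_i^2 + y_i^2 + z_i^2
    # Each row: [x_i - x_1, y_i - y_1, z_i - z_1, Delta D_1i]
    R = np.hstack((nodes[1:] - nodes[0], delta_d[:, None]))
    E = 0.5 * (K[1:] - K[0] - delta_d**2)
    omega, *_ = np.linalg.lstsq(R, E, rcond=None)
    return omega

# Hypothetical geometry: six nodes and one target, noise-free measurements.
nodes = np.array([[0.0, 0.0, 0.0], [10.0, 0.0, -1.0], [0.0, 10.0, -2.0],
                  [10.0, 10.0, 0.0], [0.0, 0.0, -8.0], [10.0, 0.0, -9.0]])
target = np.array([3.0, 4.0, -5.0])
dist = np.linalg.norm(target - nodes, axis=1)
delta_d = dist[1:] - dist[0]

omega_hat = tdoa_least_squares(nodes, delta_d)          # [x, y, z, D_1]
```

In this noise-free case the estimate reproduces the target and $ D_1 $ to numerical precision. Note that the sketch treats $ D_1 $ as a free unknown and does not enforce its dependence on $ (x, y, z) $.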

1.2 DOA-Based Positioning Principle

Simultaneously, the distributed sensor nodes can estimate the direction of the incoming acoustic signal from the end effector. If node $ h_i $ measures the azimuth angle $ \alpha_i $ and the elevation angle $ \theta_i $ to the target (with $ \theta_i $ measured from the vertical axis, as the second constraint implies), the geometric constraints are:

$$
\begin{cases}
\tan \alpha_i = \dfrac{y - y_i}{x - x_i}, \\[10pt]
\tan \theta_i = \dfrac{\sqrt{(x - x_i)^2 + (y - y_i)^2}}{z - z_i}.
\end{cases}
$$

These equations can also be rearranged into a linear form with respect to the target coordinates $ (x, y, z) $. After manipulation, we obtain a linear system:

$$ \mathbf{I} \mathbf{p} = \mathbf{N}, $$

where $ \mathbf{I} $ is the coefficient matrix depending on $ \alpha_i $ and $ \theta_i $, $ \mathbf{p} = [x, y, z]^T $, and $ \mathbf{N} $ is a vector of constants derived from sensor positions and angle measurements. The least-squares solution is:

$$\hat{\mathbf{p}} = (\mathbf{I}^T \mathbf{I})^{-1} \mathbf{I}^T \mathbf{N}.$$
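One way to obtain such a linear system is to multiply out the tangents: the azimuth constraint becomes $ \sin\alpha_i (x - x_i) - \cos\alpha_i (y - y_i) = 0 $, and, using the horizontal range $ \cos\alpha_i (x - x_i) + \sin\alpha_i (y - y_i) $, the second constraint becomes $ \cos\theta_i [\cos\alpha_i (x - x_i) + \sin\alpha_i (y - y_i)] - \sin\theta_i (z - z_i) = 0 $. The sketch below assumes this rearrangement, which the text does not spell out, and uses hypothetical names and geometry. The matrix `A` plays the role of $ \mathbf{I} $ and `b` the role of $ \mathbf{N} $.

```python
import numpy as np

def doa_least_squares(nodes, azimuth, theta):
    """Solve the linearized DOA system for p = [x, y, z].

    theta follows the convention tan(theta) = horizontal range / (z - z_i).
    """
    nodes = np.asarray(nodes, dtype=float)
    ca, sa = np.cos(azimuth), np.sin(azimuth)
    ct, st = np.cos(theta), np.sin(theta)
    rows_az = np.column_stack((sa, -ca, np.zeros_like(ca)))   # azimuth constraints
    rows_th = np.column_stack((ct * ca, ct * sa, -st))        # second-angle constraints
    A = np.vstack((rows_az, rows_th))
    # Right-hand side: each row dotted with the position of the node it belongs to.
    b = np.concatenate((np.sum(rows_az * nodes, axis=1),
                        np.sum(rows_th * nodes, axis=1)))
    p, *_ = np.linalg.lstsq(A, b, rcond=None)
    return p

# Hypothetical geometry with noise-free angles.
nodes = np.array([[0.0, 0.0, 0.0], [10.0, 0.0, -1.0], [0.0, 10.0, -2.0],
                  [10.0, 10.0, 0.0], [0.0, 0.0, -8.0]])
target = np.array([3.0, 4.0, -5.0])
diff = target - nodes
azimuth = np.arctan2(diff[:, 1], diff[:, 0])
theta = np.arctan2(np.hypot(diff[:, 0], diff[:, 1]), diff[:, 2])

p_hat = doa_least_squares(nodes, azimuth, theta)
```

With exact angles the estimate coincides with the target. With noisy angles the rows are only approximately satisfied, and the least-squares solution averages over the bearing lines.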

1.3 Fused TDOA-DOA Positioning

The key innovation in our positioning stage is the fusion of the TDOA and DOA measurement models. While the state vectors $ \boldsymbol{\omega} $ and $ \mathbf{p} $ differ, we can transform the DOA equations to be compatible with the TDOA state representation. After appropriate transformation, we combine the two sets of linearized equations into a single, augmented system:

$$
\begin{bmatrix}
\mathbf{R} \\
\tilde{\mathbf{I}}
\end{bmatrix}
\boldsymbol{\omega} =
\begin{bmatrix}
\mathbf{E} \\
\tilde{\mathbf{N}}
\end{bmatrix},
$$

where $ \tilde{\mathbf{I}} $ and $ \tilde{\mathbf{N}} $ are the transformed DOA coefficient matrix and observation vector, respectively. The fused least-squares solution, which gives the estimated position $ (x, y, z) $ of the end effector, is:

$$\hat{\boldsymbol{\omega}}_{fused} = \left( \mathbf{R}^T\mathbf{R} + \tilde{\mathbf{I}}^T\tilde{\mathbf{I}} \right)^{-1} \left( \mathbf{R}^T\mathbf{E} + \tilde{\mathbf{I}}^T\tilde{\mathbf{N}} \right).$$
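The fused solution can be sketched numerically as follows. The transformation used here is an assumption: because the DOA rows do not involve $ D_1 $, the DOA coefficient matrix is padded with a zero column so that it acts on $ \boldsymbol{\omega} = [x, y, z, D_1]^T $. Only azimuth rows are included to keep the example short, and all names and the geometry are hypothetical.

```python
import numpy as np

def fuse_tdoa_doa(R, E, I_doa, N_doa):
    """Solve the stacked TDOA/DOA normal equations for omega = [x, y, z, D_1]."""
    # Assumed transformation: pad a zero column because DOA rows do not involve D_1.
    I_tilde = np.hstack((I_doa, np.zeros((I_doa.shape[0], 1))))
    lhs = R.T @ R + I_tilde.T @ I_tilde
    rhs = R.T @ E + I_tilde.T @ N_doa
    return np.linalg.solve(lhs, rhs)

# Hypothetical geometry: five nodes (row 0 is the reference) and one target.
nodes = np.array([[0.0, 0.0, 0.0], [10.0, 0.0, -1.0], [0.0, 10.0, -2.0],
                  [10.0, 10.0, 0.0], [0.0, 0.0, -8.0]])
target = np.array([3.0, 4.0, -5.0])
diff = target - nodes
dist = np.linalg.norm(diff, axis=1)

# TDOA block from noise-free range differences relative to node 0.
delta_d = dist[1:] - dist[0]
K = np.sum(nodes**2, axis=1)
R = np.hstack((nodes[1:] - nodes[0], delta_d[:, None]))
E = 0.5 * (K[1:] - K[0] - delta_d**2)

# DOA block from noise-free azimuths: sin(a)(x - x_i) - cos(a)(y - y_i) = 0.
alpha = np.arctan2(diff[:, 1], diff[:, 0])
I_doa = np.column_stack((np.sin(alpha), -np.cos(alpha), np.zeros(len(nodes))))
N_doa = np.sum(I_doa * nodes, axis=1)

omega_fused = fuse_tdoa_doa(R, E, I_doa, N_doa)      # [x, y, z, D_1]
```

In this unweighted form every row counts equally, even though the TDOA rows and the DOA rows carry different units and noise levels. A practical implementation would likely scale each block by its measurement noise before stacking.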

This hybrid approach improves positioning robustness. TDOA provides accurate range-difference information, while DOA provides strong angular information. Their combination mitigates the weaknesses of each method when used in isolation (e.g., TDOA’s sensitivity to synchronization errors, or the growth of DOA position error with target distance), leading to a more accurate fix on the end effector’s location. The characteristics of the individual and fused methods are summarized below.

Positioning Method	Primary Measurement	Key Strength	Potential Weakness	Role in End Effector Tracking
TDOA	Time Delay Differences	High range-difference accuracy when multipath is weak	Requires precise synchronization; degraded by severe multipath	Provides relative distance constraints between the end effector and sensor array.
DOA	Angles (Azimuth & Elevation)	Good directional information; less reliant on absolute timing	Accuracy decreases at low SNR or with small aperture arrays	Provides absolute bearing lines pointing to the end effector.
Fused TDOA-DOA (Proposed)	Both Time Delays and Angles	Improved robustness and accuracy; mitigates individual weaknesses	Increased computational complexity	Delivers a more reliable and accurate 3D position estimate for initializing and aiding the end effector tracker.

2. Dynamic Tracking via Multi-Band Filtering and Fusion

The hybrid positioning module provides discrete position estimates. To achieve smooth, continuous, and predictive dynamic tracking of the end effector’s trajectory, we employ a sophisticated filtering and fusion framework. This framework is designed to handle the uncertainty in measurements, the potential for multiple reflecting paths (clutter), and the need to leverage the broadband nature of acoustic signals.

2.1 State and Measurement Models for the End Effector

We model the kinematic state of the end effector at time $ t $. For tracking purposes, a common approach is to track both position and velocity in the bearing coordinate. Let the state vector be $ \mathbf{x}_t = [\beta_t, \dot{\beta}_t]^T $, where $ \beta_t $ is the bearing (or a relevant positional component) and $ \dot{\beta}_t $ is its rate of change. The motion of the end effector is assumed to follow a constant velocity model perturbed by Gaussian noise:

$$\mathbf{x}_t = \mathbf{F}_{t-1} \mathbf{x}_{t-1} + \mathbf{w}_{t-1},$$

where $ \mathbf{F}_{t-1} $ is the state transition matrix and $ \mathbf{w}_{t-1} $ is zero-mean Gaussian process noise with covariance $ \mathbf{Q}_{t-1} $. For a sampling interval $ T $, these are typically:

$$\mathbf{F}_{t-1} = \begin{bmatrix} 1 & T \\ 0 & 1 \end{bmatrix}, \quad \mathbf{Q}_{t-1} = \sigma_q^2 \begin{bmatrix} T^4/4 & T^3/2 \\ T^3/2 & T^2 \end{bmatrix},$$

where $ \sigma_q^2 $ governs the intensity of the assumed maneuver noise.

The measurement from our hybrid positioning system is the estimated bearing or a direct positional component. Therefore, the measurement model is linear:

$$z_t = \mathbf{H}_t \mathbf{x}_t + v_t,$$

where $ \mathbf{H}_t = [1, 0] $ for bearing measurement, and $ v_t $ is zero-mean Gaussian measurement noise with variance $ \sigma_r^2 $.
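The matrices above are straightforward to build in code. The sketch below uses the fact that $ \mathbf{Q} = \sigma_q^2 \mathbf{G}\mathbf{G}^T $ with $ \mathbf{G} = [T^2/2, \; T]^T $, which expands to the matrix given earlier. The function name and the numerical values of $ \sigma_q $ and the state are assumptions; $ T = 0.01 $ s simply corresponds to a 100 Hz update rate.

```python
import numpy as np

def cv_model(T, sigma_q):
    """Constant-velocity model for the state [bearing, bearing rate]."""
    F = np.array([[1.0, T],
                  [0.0, 1.0]])
    G = np.array([[0.5 * T**2],
                  [T]])
    Q = sigma_q**2 * (G @ G.T)       # equals sigma_q^2 [[T^4/4, T^3/2], [T^3/2, T^2]]
    H = np.array([[1.0, 0.0]])       # only the bearing is measured
    return F, Q, H

T = 0.01                             # sampling interval (s), i.e. 100 Hz
sigma_q = 0.5                        # assumed maneuver-noise level
F, Q, H = cv_model(T, sigma_q)

x = np.array([0.30, 0.05])           # assumed state: bearing (rad), rate (rad/s)
x_next = F @ x                       # noise-free one-step prediction
z_next = H @ x_next                  # noise-free predicted measurement
```

One prediction step advances the bearing by $ T \dot{\beta}_t $ and leaves the rate unchanged, so the predicted measurement here is 0.3005 rad.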

2.2 Improved Gaussian Mixture PHD Filter for Multi-Band Tracking

To track the end effector robustly in the presence of clutter and missed detections, we utilize a multi-target tracking filter, specifically a Gaussian Mixture Probability Hypothesis Density (GM-PHD) filter. The PHD filter propagates the first-order statistical moment (the intensity function) of the multi-target state, which avoids explicit measurement-to-track association and keeps the computation tractable. The standard GM-PHD filter, however, can suffer from performance degradation when new targets (or in our case, new significant signal paths corresponding to the end effector) appear spontaneously.

Our key improvement is the integration of a density-based clustering algorithm (like DBSCAN) into the birth process of the GM-PHD filter. The acoustic signal from the end effector is decomposed into $ Q $ frequency sub-bands. Each sub-band’s positioning module (using the fused TDOA-DOA) outputs a set of position measurements at time $ t $, denoted $ Z_{t,q} $ for sub-band $ q $. Some of these are true measurements of the end effector, while others may be noise or clutter.

Prediction Step: Suppose at time $ t-1 $, the posterior intensity function $ v_{t-1}(\mathbf{x}) $, representing the estimated states of potential targets (the main end effector path and persistent multipath), is a Gaussian mixture:

$$v_{t-1}(\mathbf{x}) = \sum_{j=1}^{J_{t-1}} w_{t-1}^{(j)} \mathcal{N}(\mathbf{x}; \mathbf{m}_{t-1}^{(j)}, \mathbf{P}_{t-1}^{(j)}).$$

The predicted intensity for surviving targets is also a Gaussian mixture:

$$v_{t|t-1}^{surv}(\mathbf{x}) = \sum_{j=1}^{J_{t-1}} w_{t|t-1}^{(j)} \mathcal{N}(\mathbf{x}; \mathbf{m}_{t|t-1}^{(j)}, \mathbf{P}_{t|t-1}^{(j)}),$$

where $ w_{t|t-1}^{(j)} = p_{S} w_{t-1}^{(j)} $, $ \mathbf{m}_{t|t-1}^{(j)} = \mathbf{F}_{t-1} \mathbf{m}_{t-1}^{(j)} $, and $ \mathbf{P}_{t|t-1}^{(j)} = \mathbf{F}_{t-1} \mathbf{P}_{t-1}^{(j)} \mathbf{F}_{t-1}^T + \mathbf{Q}_{t-1} $, with $ p_S $ being the survival probability.
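The prediction of the surviving components amounts to one Kalman prediction per Gaussian plus a weight scaling. A minimal sketch follows; the function name, the two example components, and the noise level are assumptions.

```python
import numpy as np

def phd_predict(weights, means, covs, F, Q, p_s):
    """Predict the surviving Gaussian components of a GM-PHD intensity."""
    w_pred = p_s * np.asarray(weights, dtype=float)   # w_{t|t-1} = p_S * w_{t-1}
    m_pred = [F @ m for m in means]                   # m_{t|t-1} = F m_{t-1}
    P_pred = [F @ P @ F.T + Q for P in covs]          # P_{t|t-1} = F P F^T + Q
    return w_pred, m_pred, P_pred

T = 0.01
F = np.array([[1.0, T], [0.0, 1.0]])
Q = 0.25 * np.array([[T**4 / 4, T**3 / 2], [T**3 / 2, T**2]])

# Two assumed components: the direct path and a weaker persistent multipath.
weights = [0.8, 0.2]
means = [np.array([0.30, 0.05]), np.array([0.72, 0.00])]
covs = [np.diag([1e-4, 1e-2]), np.diag([4e-4, 1e-2])]

w_pred, m_pred, P_pred = phd_predict(weights, means, covs, F, Q, p_s=0.99)
```

The number of components is unchanged by this step; only the birth and update steps add new ones.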

Adaptive Birth Intensity: Instead of using a fixed birth intensity model, we dynamically estimate it from the measurements $ Z_{t,q} $. The clustering algorithm is applied to the union of measurements from all sub-bands. Measurements that cluster together are likely to originate from a new, persistent source (like a strong new multipath component or a re-acquired end effector). Each cluster $ k $ generates a birth Gaussian component with mean $ \mathbf{m}_{\gamma,t}^{(k)} $ (derived from the cluster centroid), a predefined covariance $ \mathbf{P}_{\gamma} $, and a weight $ w_{\gamma,t}^{(k)} $ proportional to the cluster size. The total birth intensity is:

$$\gamma_t(\mathbf{x}) = \sum_{k=1}^{K_\gamma} w_{\gamma,t}^{(k)} \mathcal{N}(\mathbf{x}; \mathbf{m}_{\gamma,t}^{(k)}, \mathbf{P}_{\gamma}).$$

This adaptive mechanism allows the tracker to spontaneously initiate tracks for the end effector or its significant signal reflections without prior knowledge, greatly enhancing robustness in dynamic acoustic environments.
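To make the birth construction concrete, the sketch below groups pooled scalar bearing measurements and turns each sufficiently large group into a birth component. It is a simplified stand-in for DBSCAN, not DBSCAN itself: in one dimension it sorts the measurements, splits them wherever the gap exceeds `eps`, and keeps groups with at least `min_pts` members. The parameter values, the zero initial bearing rate, and the weight scale are assumptions.

```python
import numpy as np

def birth_components(measurements, eps, min_pts, weight_scale, P_birth):
    """Build birth Gaussians (weight, mean, covariance) from pooled measurements."""
    z = np.sort(np.asarray(measurements, dtype=float))
    if z.size == 0:
        return []
    # Start a new group wherever neighbouring measurements are more than eps apart.
    breaks = np.where(np.diff(z) > eps)[0] + 1
    births = []
    for cluster in np.split(z, breaks):
        if cluster.size >= min_pts:                    # small groups are treated as clutter
            mean = np.array([cluster.mean(), 0.0])     # centroid, assumed zero initial rate
            births.append((weight_scale * cluster.size, mean, P_birth))
    return births

# Pooled bearing measurements (rad) from all sub-bands at one time step (made up).
pooled = [0.30, 0.31, 0.29, 0.305, 1.20, -0.80, 0.72, 0.73, 0.71]
P_birth = np.diag([1e-3, 1e-2])                        # predefined birth covariance
births = birth_components(pooled, eps=0.03, min_pts=3,
                          weight_scale=0.01, P_birth=P_birth)
```

Here two birth components result, centred near 0.301 rad and 0.72 rad, while the isolated measurements at 1.20 and -0.80 rad are discarded. For position measurements in three dimensions, a full DBSCAN implementation would replace the gap-based grouping.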

Update Step: The predicted intensity $ v_{t|t-1}(\mathbf{x}) = v_{t|t-1}^{surv}(\mathbf{x}) + \gamma_t(\mathbf{x}) $ is then updated with the measurement set $ Z_{t,q} $ for each sub-band $ q $ independently. The standard PHD corrector equation is applied, resulting in an updated Gaussian mixture $ v_{t,q}(\mathbf{x}) $ for each sub-band. Each component in $ v_{t,q}(\mathbf{x}) $ carries a unique label, allowing us to associate estimates of the same physical end effector across different frequency sub-bands $ q $.
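For a scalar bearing measurement, the standard GM-PHD corrector for one sub-band can be sketched as below. The detection probability `p_d` and the clutter intensity `clutter` are not specified in the text and are assumed here, as are the function name and example values. Label bookkeeping across sub-bands is omitted.

```python
import numpy as np

def phd_update(w_pred, m_pred, P_pred, Z, H, sigma_r, p_d, clutter):
    """Standard GM-PHD corrector for scalar measurements in one sub-band."""
    R = np.array([[sigma_r**2]])
    # Missed-detection terms keep every predicted component with reduced weight.
    w_upd = [(1.0 - p_d) * w for w in w_pred]
    m_upd, P_upd = list(m_pred), list(P_pred)
    # Per-component Kalman quantities, shared by all measurements.
    comps = []
    for m, P in zip(m_pred, P_pred):
        S = H @ P @ H.T + R                            # innovation variance
        K = P @ H.T / S[0, 0]                          # Kalman gain
        comps.append(((H @ m)[0], S[0, 0], K[:, 0], (np.eye(len(m)) - K @ H) @ P))
    for z in Z:
        ws, ms = [], []
        for w, m, (eta, s, k, _) in zip(w_pred, m_pred, comps):
            resid = z - eta
            q = np.exp(-0.5 * resid**2 / s) / np.sqrt(2.0 * np.pi * s)
            ws.append(p_d * w * q)                     # unnormalized detection weight
            ms.append(m + k * resid)
        norm = clutter + sum(ws)                       # clutter plus total likelihood
        w_upd += [w / norm for w in ws]
        m_upd += ms
        P_upd += [c[3] for c in comps]
    return np.array(w_upd), m_upd, P_upd

H = np.array([[1.0, 0.0]])
w, m, P = phd_update(w_pred=[1.0], m_pred=[np.array([0.30, 0.0])],
                     P_pred=[np.diag([0.01, 0.01])], Z=[0.31], H=H,
                     sigma_r=0.05, p_d=0.9, clutter=0.1)
```

Each measurement adds one updated copy of every predicted component, so pruning and merging are normally applied afterwards to keep the mixture size bounded.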

2.3 Fusion of Sub-Band Tracks via Generalized Covariance Intersection

At this stage, we have $ Q $ separate but correlated estimates (Gaussian mixtures) for the state of the end effector, one from each frequency sub-band. Simply taking the average is suboptimal because the estimation errors across sub-bands are correlated (e.g., due to common environmental disturbances). We employ the Generalized Covariance Intersection (GCI) criterion, also known as Chernoff fusion or exponential mixture density fusion, to optimally fuse these estimates.

Let $ v_{t,q}(\mathbf{x}) $ be the posterior intensity (a Gaussian mixture) from sub-band $ q $. The fused global intensity $ v_t^{fused}(\mathbf{x}) $ is obtained by taking the weighted geometric mean:

$$v_t^{fused}(\mathbf{x}) \propto \prod_{q=1}^{Q} \left[ v_{t,q}(\mathbf{x}) \right]^{\omega_q},$$

where $ \omega_q $ are fusion weights satisfying $ \sum_{q=1}^{Q} \omega_q = 1 $. These weights can be set based on the perceived reliability of each sub-band (e.g., inverse of estimated error variance). For Gaussian mixtures, this fusion rule can be approximated efficiently. The main effect is that the fused estimate maintains consistency (its covariance is never over-optimistic) even when the cross-correlations between sub-band errors are unknown.
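For single Gaussian components the weighted geometric mean has a closed form, which is the classical covariance intersection rule: $ \mathbf{P}^{-1} = \sum_q \omega_q \mathbf{P}_q^{-1} $ and $ \mathbf{m} = \mathbf{P} \sum_q \omega_q \mathbf{P}_q^{-1} \mathbf{m}_q $. The sketch below implements only this single-Gaussian case with made-up inputs; fusing full Gaussian mixtures requires an additional approximation that is not shown.

```python
import numpy as np

def gci_fuse_gaussians(means, covs, weights):
    """Fuse Gaussian estimates by the weighted geometric mean (covariance intersection)."""
    info = sum(w * np.linalg.inv(P) for w, P in zip(weights, covs))
    info_vec = sum(w * np.linalg.inv(P) @ m
                   for w, m, P in zip(weights, means, covs))
    P_fused = np.linalg.inv(info)
    return P_fused @ info_vec, P_fused

# Two sub-band estimates of [bearing, rate] with equal covariance and equal weights.
P_band = np.diag([4e-4, 1e-2])
means = [np.array([0.30, 0.04]), np.array([0.32, 0.06])]
m_fused, P_fused = gci_fuse_gaussians(means, [P_band, P_band], [0.5, 0.5])
```

In this example the fused mean is the average of the two inputs and the fused covariance equals the common input covariance. It does not shrink as it would if the two estimates were wrongly treated as independent, which illustrates the conservative behaviour described above.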

After fusion, the components of $ v_t^{fused}(\mathbf{x}) $ with weights above a predefined threshold $ T_{prune} $ are extracted. The mean $ \mathbf{m}_t^{(j)} $ of each such significant component, along with its associated track label, constitutes the final dynamic tracking output for the end effector at time $ t $:

$$\mathcal{X}_t^{end\,effector} = \{ \mathbf{m}_t^{(j)}, \ell_t^{(j)} \, | \, w_t^{(j)} > T_{prune} \}.$$

This multi-band tracking and fusion architecture provides significant advantages. Different frequency sub-bands experience varying degrees of attenuation, noise, and multipath. By tracking independently in each band and fusing, our method effectively diversifies against band-specific interference, leading to a smoother and more accurate overall trajectory for the end effector.

Processing Stage	Core Algorithm/Technique	Input	Output	Benefit for End Effector Tracking
Per-Sub-Band Tracking	Improved GM-PHD Filter with Adaptive Birth	Position estimates from a single frequency sub-band.	A Gaussian mixture representing possible end effector states (position, velocity) in that band.	Robustly handles clutter and missed detections within a specific band; adapts to new signal paths.
Cross-Sub-Band Fusion	Generalized Covariance Intersection (GCI)	Gaussian mixtures from all Q sub-band trackers.	A single, fused Gaussian mixture intensity function.	Optimally combines information from all bands while accounting for unknown correlations, improving estimate consistency and accuracy.
State Extraction	Thresholding & Component Selection	Fused Gaussian mixture intensity.	Final estimated state(s) and trajectory point(s) for the end effector.	Provides a clear, discrete output of the most probable end effector location and motion state.

3. Experimental Validation and Performance Analysis

To validate the effectiveness of our proposed dynamic tracking method for the underwater robotic end effector, we conducted a series of controlled experiments and numerical simulations. The performance was evaluated against established baseline methods.

3.1 Experimental Setup and Parameters

A water tank environment measuring 300 m × 300 m was utilized. The water conditions were maintained at a temperature of 20 ± 2 °C with a turbidity of 5.0 NTU. Three simulated current generators were deployed to create variable flow patterns, mimicking realistic underwater disturbances. The sensor network comprised 8 hydrophone nodes with precisely calibrated positions. The end effector was mounted on a 6-degree-of-freedom robotic arm submerged in the tank, executing predefined trajectories. Key algorithmic parameters were set as follows: number of frequency sub-bands $ Q = 10 $, sampling frequency $ f_s = 100 $ Hz, GM-PHD survival probability $ p_S = 0.99 $, GCI fusion weights $ \omega_q = 1/Q $ (equal weighting), and pruning threshold $ T_{prune} = 10^{-5} $.

3.2 Positioning Accuracy Comparison

We first evaluated the core hybrid positioning module. Five distinct static positions of the end effector were selected as ground truth targets. Our proposed fused TDOA-DOA method was compared against two contemporary methods: an Improved Grey Wolf Optimizer (IGWO) based method and a Mono-Stereo Switching method. The positioning error was calculated as the Euclidean distance between the estimated and true end effector position. The results are summarized below.

End Effector Target ID	Ground Truth Position (x, y, z) (m)	Proposed Fused Method Error (m)	IGWO Method Error (m)	Mono-Stereo Switch Method Error (m)
1	(50.0, 102.0, -10.0)	0.12	2.85	4.12
2	(170.0, 234.0, -15.5)	0.08	0.31	1.05
3	(215.0, 236.0, -12.0)	0.15	1.97	5.01
4	(289.0, 245.0, -8.0)	0.11	1.23	0.96
5	(47.0, 251.0, -20.0)	0.09	1.55	0.11
Average Error		0.11	1.58	2.25

The results show that our fused TDOA-DOA approach is the most accurate at every target. Its error stays between 0.08 m and 0.15 m (0.11 m on average), whereas the IGWO and Mono-Stereo Switching methods average 1.58 m and 2.25 m and vary widely from target to target (0.31 m to 2.85 m and 0.11 m to 5.01 m, respectively). This positioning accuracy forms a reliable foundation for the dynamic tracking stage.

3.3 Dynamic Tracking Performance

We then assessed the complete dynamic tracking pipeline. The end effector was commanded to follow a complex 3D trajectory involving linear motions, curves, and pauses, all while simulated currents were active. We compared the tracking output of our method against two recent tracking-focused approaches: a DFI-Integrated Network method and a Laguerre Function-based adaptive control method. The tracking performance was quantified using the Root Mean Square Error (RMSE) between the estimated and ground truth trajectory over the entire run, and the Track Fragmentation Count (TFC), which measures how often the track on the end effector is lost and re-initiated.
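The two metrics can be computed as sketched below. The RMSE follows its usual definition over 3D position errors. For the TFC, the text only says that it counts how often the track is lost and re-initiated, so the rule used here (count every transition from lost back to tracked after the first acquisition) is an assumption, as are the function names and sample data.

```python
import math

def rmse(estimates, truth):
    """Root mean square of the Euclidean position errors over a run."""
    sq_errors = [sum((e - t) ** 2 for e, t in zip(est, tru))
                 for est, tru in zip(estimates, truth)]
    return math.sqrt(sum(sq_errors) / len(sq_errors))

def track_fragmentation_count(tracked):
    """Count re-initiations: lost -> tracked transitions after the first acquisition."""
    count, acquired, prev = 0, False, False
    for flag in tracked:
        if flag and not prev and acquired:
            count += 1
        acquired = acquired or flag
        prev = flag
    return count

# Made-up two-sample run: the second estimate is off by 1 m in z.
error = rmse([(0.0, 0.0, 0.0), (1.0, 1.0, 1.0)],
             [(0.0, 0.0, 0.0), (1.0, 1.0, 2.0)])

# Made-up per-frame flags: True while a track on the end effector exists.
tfc = track_fragmentation_count([True, True, False, False, True, False, True, True])
```

For the sample data the RMSE is about 0.707 m and the track is re-initiated twice.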

Tracking Method	Average RMSE (m)	Maximum RMSE (m)	Track Fragmentation Count	Qualitative Stability
Proposed Multi-Band GCI Method	0.18	0.45	2	Very Stable, smooth trajectory.
DFI-Integrated Network Method	0.82	2.15	12	Unstable, frequent jumps and drifts.
Laguerre Function-Based Method	0.51	1.38	7	Moderately stable, gradual deviation over time.

Our method improves both tracking accuracy and continuity: its average RMSE is 0.18 m, against 0.82 m for the DFI method and 0.51 m for the Laguerre-based method, and its track fragments only twice. The low RMSE and minimal fragmentation indicate that the multi-band GM-PHD filter, with its adaptive birth model, maintains track on the end effector through periods of increased noise or intermittent signal blockage. The GCI fusion keeps the trajectory smooth and consistent. In contrast, the DFI method, while potentially powerful in clear conditions, proves highly sensitive to the dynamic interference, leading to a fragmented and inaccurate track (12 fragmentations). The Laguerre-based method performs better than the DFI method but still accumulates error and loses track more often than our method (7 fragmentations versus 2).

3.4 Robustness Analysis: Tracking Curve Fluctuation Entropy

To further evaluate robustness, we introduced the metric of Tracking Curve Fluctuation Entropy (TCFE). This metric calculates the information entropy of the sequence of differences between consecutive estimated positions. A lower entropy value indicates a smoother, more predictable, and less erratic tracking output, signifying better filtering of spurious noise and interference. We computed the TCFE for segments of the trajectory with varying levels of induced current disturbance (Low, Medium, High).
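A possible computation of such an entropy is sketched below. The text does not specify how the differences are discretized or normalized, so the choices here (Euclidean step lengths, fixed-width histogram bins, base-2 Shannon entropy) are assumptions, and the sketch is not expected to reproduce the reported values.

```python
import math
from collections import Counter

def fluctuation_entropy(positions, bin_width):
    """Shannon entropy (bits) of the binned step lengths between consecutive positions."""
    steps = [math.dist(a, b) for a, b in zip(positions, positions[1:])]
    # Discretize the step lengths into fixed-width bins (assumed scheme).
    bins = Counter(math.floor(s / bin_width) for s in steps)
    total = len(steps)
    return -sum((c / total) * math.log2(c / total) for c in bins.values())

# Smooth track: every step has the same length, so the entropy is zero.
smooth = [(float(k), 0.0, 0.0) for k in range(6)]
h_smooth = fluctuation_entropy(smooth, bin_width=0.25)

# Erratic track: step lengths alternate between 1 and 2, giving 1 bit.
erratic = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (3.0, 0.0, 0.0),
           (4.0, 0.0, 0.0), (6.0, 0.0, 0.0)]
h_erratic = fluctuation_entropy(erratic, bin_width=0.25)
```

The smooth track yields an entropy of zero and the erratic one yields exactly one bit, matching the intuition that lower values indicate a more regular output.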

Disturbance Level	Current Speed Range (m/s)	TCFE, Proposed Method	TCFE, DFI Method	TCFE, Laguerre Method
Low	0.05 – 0.15	0.05	0.21	0.12
Medium	0.16 – 0.30	0.07	0.48	0.23
High	0.31 – 0.50	0.09	0.72	0.41

Our method maintains a low TCFE (≤ 0.09) across all disturbance levels, indicating a smooth and stable output. The entropy values for the comparison methods rise markedly with disturbance, and the DFI method’s output becomes highly disordered (entropy of 0.72) under strong currents. These results support the claim that our framework’s multi-band diversity and fusion mechanism suppress the impact of environmental fluctuations on the end effector’s tracked path.

4. Conclusion

This work has presented a method for the dynamic tracking of an underwater robotic end effector’s position. The method rests on three connected components: a hybrid TDOA-DOA positioning technique for accurate location estimates, an improved GM-PHD filter with adaptive birth for multi-band tracking in clutter, and a Generalized Covariance Intersection fusion strategy for integrating information across frequency bands. In the tank experiments, the proposed approach achieved higher positioning accuracy, lower dynamic tracking error, and greater robustness against environmental interference than the baseline methods considered, as measured by positioning error, RMSE, track fragmentation, and fluctuation entropy. Reliable and accurate real-time tracking of the end effector is a critical enabler for advanced underwater manipulation, inspection, and intervention tasks, and a step toward more autonomous and effective underwater robotic systems.