Pose Estimation for Claw-Type End Effectors Using Tactile Force Information

A fundamental and widespread application of robotic technology lies in manipulation tasks performed by robotic arms. A core function within this domain is the precise grasping and capturing of targets using an end effector. However, due to factors such as inherent manipulator performance limitations, calibration errors, and the demands of multi-task operations, a robotic arm cannot guarantee perfectly accurate, zero-deviation capture actions. Therefore, end effectors must possess a certain degree of compliance and tolerance. Furthermore, it cannot be assumed that the end effector will be in the exact same initial pose for every capture attempt. Consequently, the online estimation of a target object’s pose relative to the end effector is not merely beneficial but often necessary for successful, robust operation. Research into pose estimation methodologies can be broadly categorized into non-contact and contact-based approaches.

Non-contact estimation occurs before physical interaction between the end effector and the target. It typically relies on sensors like vision systems or laser rangefinders, combined with known geometric features of the target, to infer its pose. While effective in controlled environments, these methods can increase system complexity, require additional hardware, and may be susceptible to lighting conditions or visual obstructions. In contrast, contact-based estimation utilizes information generated at the moment of physical interaction, such as tactile data, contact forces, and positional feedback. This approach is particularly advantageous when the robotic system is already equipped with a wrist-mounted force/torque sensor, as it adds no further hardware complexity. The estimated information is not limited to the relative pose; it can also include contact state, local surface features, and other parameters, which, when combined with prior knowledge of the object’s geometry, enable full pose estimation.

This article focuses on a contact-based pose estimation method developed specifically for a simultaneous three-fingered end effector. The proposed method leverages contact force information measured during the capture process. After analyzing the unique geometric constraint features of this end effector design, we detail a sequential estimation pipeline. This includes identifying the contacting finger, estimating the contact point location and surface normal, and finally, computing the target’s full pose. We present two core algorithms: one for initial estimation using the minimal information from two distinct contacts, and another for refining the estimate by optimally integrating redundant contact data from multiple interactions. The effectiveness of the method is demonstrated through simulation, confirming its utility for this class of claw-type end effectors and highlighting its general applicability to similar designs.

1. The Simultaneous Three-Fingered End Effector

1.1 Mechanical System Description

The subject of this study is a specialized simultaneous three-fingered end effector designed for robust capture tasks. As depicted in the figure, this end effector features three capture fingers arranged symmetrically around its central axis. A key characteristic is their synchronized motion, driven by a single actuator through an internal mechanism. This design ensures a consistent and enveloping grasp. The target interface, or “grapple fixture,” possesses a wedged, triple-lobed geometry that actively guides the fingers into the central capture area during misaligned approaches, enhancing the system’s tolerance to initial pose errors.

Through kinematic analysis of the mechanism, the position of any fingertip \(j\) can be expressed within the end effector’s coordinate frame \(O X_e Y_e Z_e\) as a function of the input actuation angle \(\phi\):

$$ {^eT_j} = \begin{bmatrix} {^eT_{jx}} \\ {^eT_{jy}} \\ {^eT_{jz}} \end{bmatrix} = \begin{bmatrix} f^e_{T_{jx}}(\phi) \\ f^e_{T_{jy}}(\phi) \\ f^e_{T_{jz}}(\phi) \end{bmatrix}, \quad (j = \text{I, II, III}) $$

This functional relationship is crucial for later stages of the estimation process, allowing us to map the actuator’s state to the 3D position of the fingertips.
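As an illustrative sketch of this mapping, the snippet below implements one possible set of fingertip kinematics. The constants `R0`, `L`, `H0`, the azimuth angles, and the closing-motion law are hypothetical placeholders: the true functions \(f^e_{T_j}(\phi)\) depend on the actual linkage geometry of the end effector.

```python
import numpy as np

# Hypothetical linkage constants (NOT from the real mechanism):
R0, L, H0 = 0.15, 0.10, 0.05                  # base radius, link length, axial offset [m]
AZIMUTHS = np.deg2rad([90.0, 210.0, 330.0])   # fingers I, II, III, 120 deg apart about X_e

def fingertip_position(j, phi):
    """Fingertip center ^eT_j of finger j (0, 1, 2) for actuation angle phi.

    The central axis is taken as X_e, so the three fingers share one
    axial coordinate and one radial distance in the Y_e-Z_e plane.
    """
    rho = R0 - L * np.sin(phi)    # synchronized radial closing motion (illustrative)
    a = AZIMUTHS[j]
    return np.array([H0 + L * np.cos(phi), rho * np.cos(a), rho * np.sin(a)])
```

Because a single actuator drives all three fingers, one scalar \(\phi\) determines all three fingertip positions simultaneously, which is what the estimation pipeline later exploits.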

1.2 Task Assumptions and Problem Scope

To formulate a tractable estimation problem amidst the complexities of a dynamic capture sequence, we make the following simplifying assumptions:

  1. During collision, only single-point contact occurs at any given time.
  2. The effects of friction at the point of contact are neglected.
  3. Non-linear factors in the end effector’s drive system, such as backlash and joint flexibility, are not considered.
  4. The analysis focuses on the initial “enveloping” phase, where a significant pose deviation exists.
  5. The grapple fixture is assumed to be approximately upright during the operation.

These assumptions allow us to concentrate on the geometric essence of the contact interaction, where the contact force vector is aligned with the surface normal at the point of contact.

2. Core Pose Estimation Methodology

2.1 Contact Finger Identification

The first step in the estimation pipeline is to determine which of the three fingers is in contact with the target. The grapple fixture’s lobed design creates distinct planar constraint surfaces. When contact occurs, it is always between a fingertip and one of these predefined constraint surfaces. Under the frictionless assumption, the reaction force \(F\) exerted by the fixture on the end effector is perpendicular to the contacted surface.

By projecting the 3D fixture geometry onto a plane, the angular position of each constraint surface is known. The direction of the contact force vector, measured in the end effector’s frame (\(^eF\)), can be computed. Given the relatively high precision of the manipulator’s wrist control, the orientation misalignment between the end effector and the fixture is small. Therefore, the angle \(\theta = \operatorname{atan2}(^eF_y, ^eF_z)\) calculated from the measured force (the four-quadrant arctangent is required to cover the full \((-180^\circ, 180^\circ]\) range spanned by the surface angles) reliably indicates the specific constraint surface involved, which directly maps to the contacting finger. The logic for this mapping is summarized below:

Contacted Constraint Surface | Force Vector Angle \(\theta\) (deg) | Identified Finger
Surface a                    | 50 or 130                           | Finger I
Surface b                    | -70 or 170                          | Finger II
Surface c                    | 10 or -110                          | Finger III
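The identification step can be sketched as a nearest-angle lookup against the table above. The function name and tolerance handling here are our own; the six nominal angles are taken from the mapping table.

```python
import math

# Nominal force-angle signatures (deg) per finger, from the mapping table:
SURFACE_ANGLES = {
    "I":   (50.0, 130.0),    # surface a
    "II":  (-70.0, 170.0),   # surface b
    "III": (10.0, -110.0),   # surface c
}

def identify_finger(Fy, Fz):
    """Identify the contacting finger from measured force components.

    atan2 covers the full (-180, 180] degree range; the nominal angle
    with the smallest wrapped angular difference wins.
    """
    theta = math.degrees(math.atan2(Fy, Fz))
    best, best_err = None, float("inf")
    for finger, angles in SURFACE_ANGLES.items():
        for a in angles:
            err = abs((theta - a + 180.0) % 360.0 - 180.0)  # wrap to [-180, 180)
            if err < best_err:
                best, best_err = finger, err
    return best
```

Because orientation misalignment is assumed small, the measured angle stays close to one nominal value, so the nearest-angle rule is unambiguous in practice.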

2.2 Contact Point and Normal Vector Estimation

Once the contacting finger \(j\) is identified, its center point \(T_j\) in the end effector frame is known from the kinematic model \(^eT_j(\phi)\). The line collinear with the contact force \(F\) and passing through the contact point \(C\) is known as the external force vector line. Given the spherical fingertip geometry and the frictionless assumption, the contact force vector must pass through both the contact point \(C\) and the sphere’s center (the fingertip center \(T_j\)). Therefore, the contact point location can be estimated by moving from the fingertip center opposite to the force direction by the sphere’s radius \(r\):

$$ ^eC = {^eT_j} - r \frac{^eF}{\|^eF\|} $$

Furthermore, the unit normal vector \(n\) of the contacted constraint surface is aligned with the force direction. Thus, it is estimated in the end effector frame as:

$$ ^en = \frac{^eF}{\|^eF\|} $$

These quantities, \(^eC\) and \(^en\), are then transformed into the world or base coordinate frame \(O X_o Y_o Z_o\) to yield \(^oC\) and \(^on\), which are used for the subsequent global pose estimation.
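The two formulas above amount to a few lines of vector arithmetic; a minimal sketch follows (function and argument names are ours):

```python
import numpy as np

def contact_point_and_normal(T_j, F, r):
    """Estimate contact point ^eC and surface normal ^en in the end-effector frame.

    T_j : fingertip (sphere) center ^eT_j from the kinematic model
    F   : measured contact force ^eF
    r   : spherical fingertip radius
    """
    F = np.asarray(F, dtype=float)
    n = F / np.linalg.norm(F)                   # ^en = ^eF / ||^eF||
    C = np.asarray(T_j, dtype=float) - r * n    # ^eC = ^eT_j - r * ^en
    return C, n
```

The returned quantities are then rotated and translated into the base frame with the manipulator's forward kinematics before the global estimation step.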

2.3 Pose Estimation from Minimal Contact Information

A full pose estimate becomes possible after at least two distinct contacts on different constraint surfaces. Consider two such contacts, indexed 1 and 2, providing estimates \(\{ ^on_1, ^oC_1 \}\) and \(\{ ^on_2, ^oC_2 \}\).

Each constraint surface, when translated inward by its known perpendicular distance \(d_c\) to the central axis of the grapple fixture, defines a plane that contains this axis. The equation of this axis plane derived from the first contact is:

$$ ^on_{1x} x + ^on_{1y} y + ^on_{1z} z + p_1 + d_c = 0 $$
where \( p_1 = -( ^on_{1x} {^oC_{1x}} + ^on_{1y} {^oC_{1y}} + ^on_{1z} {^oC_{1z}} ) \).

Similarly, from the second contact we get:
$$ ^on_{2x} x + ^on_{2y} y + ^on_{2z} z + p_2 + d_c = 0 $$
where \( p_2 = -( ^on_{2x} {^oC_{2x}} + ^on_{2y} {^oC_{2y}} + ^on_{2z} {^oC_{2z}} ) \).

Since \( ^on_1 \times ^on_2 \neq 0 \) for contacts on different surfaces, these two planes intersect in a line, which is precisely the estimated central axis \(s\) of the grapple fixture. The direction of this line is given by:
$$ ^os = \frac{^on_2 \times ^on_1}{\| ^on_2 \times ^on_1 \|} $$
(The order ensures the correct outward direction).

The origin \(I\) of the grapple fixture’s coordinate frame \(O X_i Y_i Z_i\) lies along this central axis within a known zone relative to the fingertips. Its \(Z_o\)-coordinate can be approximated. Substituting this into the two plane equations allows solving for the \(X_o\) and \(Y_o\) coordinates of point \(I\), fully defining the axis line.
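A compact sketch of this two-contact axis solve is given below: the axis direction comes from the cross product, and, with the \(Z_o\)-coordinate \(z_0\) of \(I\) fixed, the two plane equations reduce to a 2×2 linear system in \(I_x, I_y\). Names are ours; the offset \(d_c\) is the known surface-to-axis distance.

```python
import numpy as np

def estimate_axis(n1, C1, n2, C2, d_c, z0):
    """Estimate the fixture's central axis from two contacts.

    Returns the unit axis direction ^os and a point I = (x, y, z0) on it.
    """
    s = np.cross(n2, n1)
    s = s / np.linalg.norm(s)            # ^os = (n2 x n1) / ||n2 x n1||
    p1 = -np.dot(n1, C1)                 # plane offsets p_k = -n_k . C_k
    p2 = -np.dot(n2, C2)
    # n_kx * x + n_ky * y = -(n_kz * z0 + p_k + d_c), k = 1, 2
    A = np.array([[n1[0], n1[1]],
                  [n2[0], n2[1]]])
    b = -np.array([n1[2] * z0 + p1 + d_c,
                   n2[2] * z0 + p2 + d_c])
    x, y = np.linalg.solve(A, b)
    return s, np.array([x, y, z0])
```

The 2×2 system is well conditioned exactly when the two contacts lie on different constraint surfaces, i.e. when \(^on_1 \times {^on_2} \neq 0\).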

To determine the full orientation, we construct an intermediate frame from the first contact. Its \(Z\)-axis is \(s\), and its \(X\)-axis is \(^on_1\). This frame is related to the actual grapple frame \(O X_i Y_i Z_i\) by a known rotation \(-\theta_a\) around \(Z\). This provides one orientation estimate \(^oR_{ia}\). A second estimate \(^oR_{ic}\) comes from the second contact. The final orientation is computed by fusing these two estimates, primarily by averaging their \(Y\)-axis vectors and renormalizing. The final estimated pose \(P_{est}\) is a 6-dimensional vector containing the position \(^oI\) and the orientation expressed in RPY angles \((\psi, \theta, \phi)\):
$$ P_{est} = [^oI_x, \, ^oI_y, \, ^oI_z, \, \psi, \, \theta, \, \phi]^T $$

3. Estimation Update with Redundant Contact Information

During the capture process, more than two contacts (\(N \geq 3\)) often occur. This redundancy provides an opportunity to refine the pose estimate, making it more robust to sensor noise and kinematic model inaccuracies.

3.1 Central Axis Estimation Refinement

Ideally, all \(N\) translated axis planes \(c’_k\) should intersect along a single line (the central axis). In practice, they will not. We formulate the axis estimation as an optimization problem: find two points \(A_1\) and \(A_2\) on the axis within the fixture’s effective length that minimize the sum of squared distances to all \(N\) planes.

The distance from a candidate point \(A\) to a plane defined by normal \(^on_k\) is:
$$ D_{Ak} = | \, ^on_{kx} A_x + ^on_{ky} A_y + ^on_{kz} A_z + p_k + d_c \, | $$
where \( p_k = -( ^on_{kx} {^oC_{kx}} + ^on_{ky} {^oC_{ky}} + ^on_{kz} {^oC_{kz}} ) \).

Fixing the \(Z\) coordinates of \(A_1\) and \(A_2\) based on the fixture’s geometry, we solve for their \(X\) and \(Y\) coordinates by minimizing the sum of squared distances \(J = \sum_{k=1}^{N} D_{Ak}^2\) for each point. This is a standard least-squares problem. The updated axis direction vector \(s’\) is then \( \overrightarrow{A_2A_1} / \| \overrightarrow{A_2A_1} \| \), and the frame origin \(I\) is updated to coincide with \(A_1\).
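For a fixed \(Z\)-coordinate, dropping the absolute value and minimizing \(J = \sum_k D_{Ak}^2\) is an ordinary linear least-squares problem in \((A_x, A_y)\). A sketch of the per-point solve (function and argument names are ours):

```python
import numpy as np

def refine_axis_point(normals, contacts, d_c, z0):
    """Least-squares point on the central axis at height z0.

    normals  : (N, 3) array of surface normals ^on_k
    contacts : (N, 3) array of contact points ^oC_k
    Minimizes sum_k (n_kx*x + n_ky*y + n_kz*z0 + p_k + d_c)^2 over (x, y).
    """
    N = np.asarray(normals, dtype=float)
    C = np.asarray(contacts, dtype=float)
    p = -np.sum(N * C, axis=1)              # p_k = -n_k . C_k
    A = N[:, :2]                            # coefficients of (x, y)
    b = -(N[:, 2] * z0 + p + d_c)           # move known terms to the RHS
    xy, *_ = np.linalg.lstsq(A, b, rcond=None)
    return np.array([xy[0], xy[1], z0])
```

Running this at two heights \(z_0\) yields \(A_1\) and \(A_2\), from which the refined direction \(s'\) and origin follow as described above.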

3.2 Comprehensive Pose Update

Each contact \(k\) provides an estimate for the grapple fixture’s orientation. We create a local frame at the projected point \(C’_k\) (the contact point shifted by \(-d_c ^on_k\)) with its \(Z\)-axis aligned with the updated axis \(s’\) and its \(Y\)-axis aligned with the vector from the axis footpoint to \(C’_k\). This local frame is related to the grapple frame by a known rotation \(\varphi_k\) about \(Z\). This yields \(N\) orientation estimates \(^oR_{ik}\).

All these estimates share the same \(Z\)-axis (\(s’\)). The final, refined \(Y\)-axis of the grapple frame is computed as the normalized average of all \(N\) estimated \(Y\)-axis vectors:
$$ ^oy_i = \frac{\sum_{k=1}^{N} {^oy_{ik}} }{\| \sum_{k=1}^{N} {^oy_{ik}} \| } $$
The \(X\)-axis is then given by \(^ox_i = {^oy_i} \times {^os’}\). This process optimally combines all available tactile information into a single, accurate pose estimate of the target relative to the end effector.
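The fusion step itself is short; a sketch under the assumption that all \(N\) estimates already share the \(Z\)-axis \(s'\) (names are ours):

```python
import numpy as np

def fuse_orientation(y_estimates, s_axis):
    """Fuse N Y-axis estimates sharing a common Z-axis s_axis.

    Returns the rotation matrix [x | y | z] of the grapple frame
    expressed in the base frame.
    """
    y = np.sum(y_estimates, axis=0)
    y = y / np.linalg.norm(y)        # normalized average of the ^oy_ik
    x = np.cross(y, s_axis)          # ^ox_i = ^oy_i x ^os'
    return np.column_stack([x, y, s_axis])
```

Averaging unit vectors and renormalizing is a reasonable fusion rule here because the individual \(Y\)-axis estimates differ only by small noise-induced rotations about the shared axis.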

4. Simulation and Validation

4.1 Simulation Setup

A simulation environment was constructed to validate the estimation framework. The end effector was modeled with its precise kinematics, and the grapple fixture was placed with a defined initial pose deviation. The simulation protocol mimicked a realistic capture sequence: the end effector starts with fingers retracted, moves until a first contact is detected, then re-plans its motion to induce a second, distinct contact. After the second contact, a gentle grasping motion is simulated to potentially generate a third or fourth contact, providing redundant data. The estimation algorithms were applied after each contact event.

4.2 Simulation Results and Analysis

Pose error was quantified using two metrics: the Cartesian distance \(\Delta d\) between the estimated and true fixture origin, and the rotation angle \(\Delta \Theta\) of the relative orientation error. Multiple simulation runs were conducted under different initial conditions: positional error only, orientation error only, and combined errors. The termination condition for the iterative update process was defined as changes in successive estimates falling below thresholds (\(\Delta d_{th} = 0.3\) mm, \(\Delta \Theta_{th} = 0.3^\circ\)) or reaching a maximum number of contacts (e.g., 10).
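Both metrics follow from standard formulas: \(\Delta d\) is a Euclidean distance, and \(\Delta\Theta\) is the angle of the relative rotation, recoverable from the trace of the error rotation matrix. A sketch (function names are ours):

```python
import numpy as np

def pose_error(I_est, I_true, R_est, R_true):
    """Position error [same units as input] and rotation-angle error [deg]."""
    dd = np.linalg.norm(np.asarray(I_est) - np.asarray(I_true))
    R_err = np.asarray(R_est).T @ np.asarray(R_true)   # relative orientation error
    # angle of a rotation matrix: cos(theta) = (trace(R) - 1) / 2, clipped for safety
    c = np.clip((np.trace(R_err) - 1.0) / 2.0, -1.0, 1.0)
    return dd, np.degrees(np.arccos(c))
```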

A representative subset of results is shown in the table below, comparing the estimate from the minimal two-contact method with the final refined estimate after processing all redundant contacts.

Case                | Initial Error (\(d\) / \(\Theta\)) | Two-Contact Estimate Error (\(\Delta d\) / \(\Delta \Theta\)) | Redundant-Update Estimate Error (\(\Delta d\) / \(\Delta \Theta\))
1: Position Only    | 20 mm / 0°                         | 3.83 mm / 2.89°                                               | 1.37 mm / 1.09°
2: Orientation Only | 0 mm / 6°                          | 1.16 mm / 3.62°                                               | 0.21 mm / 1.21°
3: Combined         | 20 mm / 5°                         | 1.05 mm / 3.21°                                               | 1.95 mm / 1.82°

The results consistently show that the two-contact method provides a reasonable but imperfect initial estimate. The subsequent integration of redundant contact information significantly reduces both positional and rotational errors. A typical error convergence plot for a combined error initial condition (\(P_i = [5.5, 10, -15]\) mm, \([2, 4, 3]\) deg) shows error dropping sharply after the second contact and then asymptotically converging as more contacts are processed. The residual error is typically small enough to be handled by a subsequent compliant control phase, such as impedance control, during the final alignment and locking sequence of the end effector.

5. Conclusion

This article presented a novel, contact-based pose estimation framework for a simultaneous three-fingered end effector. The method exploits geometric constraints and force feedback without requiring additional exteroceptive sensors. The core pipeline involves identifying the contact finger, estimating the local contact geometry, and then solving for the global pose using intersecting plane geometry. A significant contribution is the formulation of a least-squares update rule that optimally combines information from multiple, potentially redundant contacts to refine the estimate and improve robustness against noise.

Simulation studies confirm the method’s effectiveness, demonstrating its ability to converge to an accurate pose estimate from an initially misaligned state. The framework is specifically designed for the described three-fingered end effector but is grounded in general principles of geometric reasoning and optimization. Therefore, it possesses a degree of generality and can be adapted to other claw-type or multi-fingered end effector designs that interact with geometrically constrained fixtures. Future work will focus on experimental validation on a physical robotic arm platform, investigating the effects of real-world disturbances like friction and structural flexibility, and further improving estimation accuracy for high-precision docking applications.
