Autonomous Grasping for Under-Actuated Dexterous Robotic Hands via Grasp Type Detection

In the field of robotics, achieving stable and reliable grasping with dexterous robotic hands remains a significant challenge, especially when dealing with diverse objects in unstructured environments. My research focuses on under-actuated dexterous robotic hands, which offer a balance between simplicity and adaptability, making them suitable for autonomous grasping tasks. This work presents a method inspired by human grasping strategies, utilizing deep learning for grasp type detection to simplify planning and enhance the performance of under-actuated dexterous robotic hands. The core idea is to leverage the complementary nature of data-driven approaches and the inherent adaptability of under-actuated designs, enabling robust grasping without complex control schemes.

Dexterous robotic hands, particularly under-actuated variants, have garnered attention due to their ability to conform to object shapes with fewer actuators than degrees of freedom. This reduces control complexity while maintaining grasping versatility. However, planning precise grasps for such hands often requires extensive sensing and computation. To address this, I propose a vision-based system that predicts grasp types and poses from RGB images, allowing the dexterous robotic hand to execute adaptive grasps with minimal prior information. The method integrates deep learning for high-level recognition and traditional image processing for low-level pose estimation, all tailored to the capabilities of under-actuated dexterous robotic hands.

My approach begins with the classification of grasp types based on human hand postures. I define four primary grasp types relevant to planar grasping scenarios: cylindrical enveloping, spherical enveloping, fine pinch, and wide pinch. These categories are derived from the dichotomy between power grasps (involving full hand contact) and precision grasps (involving fingertip contact), adapted to the constraints of under-actuated dexterous robotic hands. Each type corresponds to specific hand configurations and control parameters, as summarized in Table 1.

Table 1: Grasp Type Classification for Dexterous Robotic Hands
| Grasp Type | Category | Object Thickness (mm) | Object Width (mm) | Hand Configuration |
| --- | --- | --- | --- | --- |
| Cylindrical Enveloping | Power Grasp | >30 | Varies | Small thumb abduction, full finger wrapping |
| Spherical Enveloping | Power Grasp | >30 | Varies | Large thumb abduction, full finger wrapping |
| Fine Pinch | Precision Grasp | <30 | <30 | Small thumb-index separation, fingertip contact |
| Wide Pinch | Precision Grasp | <30 | >30 | Large thumb-finger separation, fingertip contact |
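
As a rough illustration of how the dimension thresholds in Table 1 could drive grasp selection, the sketch below encodes them as a rule. This is illustrative only: in the actual pipeline the grasp type is predicted by the detection network, and the cylindrical/spherical distinction depends on learned object appearance rather than a boolean shape flag.

```python
def select_grasp_type(thickness_mm: float, width_mm: float, is_round: bool) -> str:
    """Map object dimensions to a grasp type using the Table 1 thresholds.

    Illustrative rule only; the deployed system predicts the type with a
    deep network, and `is_round` stands in for the appearance cue that
    separates cylindrical from spherical enveloping.
    """
    if thickness_mm > 30:
        # Power grasps: full finger wrapping; thumb abduction differs.
        return "power2" if is_round else "power1"
    # Precision grasps: fingertip contact; width selects the separation.
    return "precision1" if width_mm < 30 else "precision2"
```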

To train a deep learning model for grasp type detection, I constructed a dataset comprising 80 common daily objects, each annotated with grasp type and a bounding box representing the grasping region. The dataset includes 1,344 RGB images captured with a Kinect v2 depth camera, with objects placed in various positions and orientations on a desktop. The distribution of objects across grasp types is shown in Table 2. This dataset enables the model to learn mappings from object appearances to suitable grasp types for dexterous robotic hands.

Table 2: Dataset Statistics for Grasp Type Detection
| Grasp Type | Number of Objects | Number of Images | Annotation Label |
| --- | --- | --- | --- |
| Cylindrical Enveloping | 17 | 272 | power1 |
| Spherical Enveloping | 22 | 352 | power2 |
| Fine Pinch | 14 | 224 | precision1 |
| Wide Pinch | 27 | 432 | precision2 |
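
For reference, each annotation pairs a grasp-type class with a bounding box over the grasping region. The helper below shows one plausible encoding, assuming the standard Darknet/YOLO label convention (normalized box coordinates, one text file per image) and an assumed class ordering; the dataset's actual file layout may differ.

```python
def to_darknet_label(cls: int, box, img_w: int, img_h: int) -> str:
    """Convert a pixel-space box (x1, y1, x2, y2) to a Darknet label line.

    Assumes the YOLO convention "<class> <xc> <yc> <w> <h>" with coordinates
    normalized to [0, 1]. Class indices 0-3 for power1/power2/precision1/
    precision2 are an assumption for illustration.
    """
    x1, y1, x2, y2 = box
    xc = (x1 + x2) / 2 / img_w
    yc = (y1 + y2) / 2 / img_h
    w = (x2 - x1) / img_w
    h = (y2 - y1) / img_h
    return f"{cls} {xc:.6f} {yc:.6f} {w:.6f} {h:.6f}"

# Example: a fine-pinch (precision1) object in a 1920x1080 Kinect v2 frame.
print(to_darknet_label(2, (880, 420, 1040, 560), 1920, 1080))
```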

The deep learning model employs the YOLOv3 architecture for object detection, which simultaneously predicts grasp type and grasping region from input images. YOLOv3 is chosen for its speed and accuracy in real-time applications, making it suitable for robotic systems. The model is trained on 1,103 images and tested on 241 images, achieving an overall accuracy of 98.70% on known objects. The performance breakdown per grasp type is detailed in Table 3. Importantly, the model generalizes well to unseen objects, with an accuracy of 82.70% on a separate set of 24 unknown objects, demonstrating its utility for dexterous robotic hands in novel scenarios.

Table 3: Deep Learning Model Performance on Grasp Type Detection
| Grasp Type | Accuracy on Test Set (%) | Precision | Recall |
| --- | --- | --- | --- |
| Cylindrical Enveloping (power1) | 99.50 | 0.99 | 0.98 |
| Spherical Enveloping (power2) | 99.50 | 0.98 | 0.99 |
| Fine Pinch (precision1) | 96.60 | 0.97 | 0.96 |
| Wide Pinch (precision2) | 99.30 | 0.99 | 0.99 |
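
As a sketch of how such a detector can be deployed, the following uses OpenCV's DNN module to run Darknet-format YOLOv3 weights and return the most confident grasp-type detection. The file names, class ordering, and thresholds are placeholders, not the trained model's actual artifacts.

```python
import cv2
import numpy as np

# Placeholder file names for the trained grasp-type detector.
net = cv2.dnn.readNetFromDarknet("yolov3-grasp.cfg", "yolov3-grasp.weights")
classes = ["power1", "power2", "precision1", "precision2"]  # assumed ordering

def detect_grasp(image, conf_thresh=0.5, nms_thresh=0.4):
    """Return (grasp_type, pixel box) for the most confident detection."""
    h, w = image.shape[:2]
    blob = cv2.dnn.blobFromImage(image, 1 / 255.0, (416, 416),
                                 swapRB=True, crop=False)
    net.setInput(blob)
    outputs = net.forward(net.getUnconnectedOutLayersNames())

    boxes, confidences, class_ids = [], [], []
    for out in outputs:
        for det in out:                    # det = [cx, cy, bw, bh, obj, cls...]
            scores = det[5:]
            cls = int(np.argmax(scores))
            conf = float(scores[cls])
            if conf > conf_thresh:
                cx, cy = det[0] * w, det[1] * h
                bw, bh = det[2] * w, det[3] * h
                boxes.append([int(cx - bw / 2), int(cy - bh / 2),
                              int(bw), int(bh)])
                confidences.append(conf)
                class_ids.append(cls)

    keep = cv2.dnn.NMSBoxes(boxes, confidences, conf_thresh, nms_thresh)
    if len(keep) == 0:
        return None
    best = max(np.array(keep).flatten(), key=lambda i: confidences[i])
    return classes[class_ids[best]], boxes[best]
```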

Once the grasp type and region are identified, image processing techniques estimate the grasp angle. The Canny edge detector is applied to the region of interest, followed by morphological closing (dilation and then erosion) to fill gaps and smooth contours. A minimum bounding rectangle is then fitted to the object’s edges, and the orientation of its long side is taken as the grasp angle. This process is robust to variation in object shape and color, providing a reliable pose estimate for the dexterous robotic hand. The steps can be described mathematically as follows: let $I(x,y)$ be the image region; edge detection yields a binary map $E(x,y)$ by thresholding gradients. Morphological closing with a kernel $K$ consolidates the shape: $M(x,y) = (E \oplus K) \ominus K$, where $\oplus$ and $\ominus$ denote dilation and erosion, respectively. The grasp angle $\theta$ is computed from the principal axis of the bounding rectangle around $M(x,y)$.
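
A minimal OpenCV implementation of this angle-estimation pipeline might look as follows; the Canny thresholds and kernel size are illustrative choices, not the tuned values from the system.

```python
import cv2
import numpy as np

def estimate_grasp_angle(roi_bgr, kernel_size=5):
    """Estimate the planar grasp angle from the detected region of interest.

    Implements the pipeline described above: Canny edges, morphological
    closing M = (E dilated by K) eroded by K, then a minimum-area bounding
    rectangle whose long side gives the grasp angle theta.
    """
    gray = cv2.cvtColor(roi_bgr, cv2.COLOR_BGR2GRAY)
    edges = cv2.Canny(gray, 50, 150)                            # E(x, y)
    kernel = np.ones((kernel_size, kernel_size), np.uint8)      # K
    closed = cv2.morphologyEx(edges, cv2.MORPH_CLOSE, kernel)   # M(x, y)

    contours, _ = cv2.findContours(closed, cv2.RETR_EXTERNAL,
                                   cv2.CHAIN_APPROX_SIMPLE)
    if not contours:
        return None
    largest = max(contours, key=cv2.contourArea)
    (cx, cy), (rw, rh), angle = cv2.minAreaRect(largest)

    # minAreaRect's angle convention varies across OpenCV versions; rotate
    # by 90 degrees when needed so theta refers to the rectangle's long side.
    theta = angle if rw >= rh else angle + 90.0
    return (cx, cy), theta
```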

Control and planning for the under-actuated dexterous robotic hand are designed to leverage its adaptive nature. The hand has five fingers driven by six motors via tendon-sheath mechanisms, allowing under-actuated motion where finger joints conform to object shapes passively. The control strategy involves three modes: position control for pre-grasp configuration, velocity control for adaptive closing, and current control for force-based stopping. The pre-grasp parameters vary by grasp type; for example, cylindrical enveloping uses a small thumb abduction angle, while spherical enveloping uses a larger one. The adaptability of the dexterous robotic hand compensates for inaccuracies in the vision system, as fingers automatically adjust contact points based on object geometry.
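
The sequencing of the three control modes can be sketched as below. The driver object, its method names, and all numeric parameters are hypothetical stand-ins for the real tendon-motor interface.

```python
# Hypothetical pre-grasp parameters per grasp type; numeric values are
# placeholders, not the hand's calibrated settings.
PRE_GRASP = {
    "power1":     {"thumb_abduction_deg": 10},  # cylindrical enveloping
    "power2":     {"thumb_abduction_deg": 60},  # spherical enveloping
    "precision1": {"thumb_separation_mm": 20},  # fine pinch
    "precision2": {"thumb_separation_mm": 60},  # wide pinch
}

def execute_grasp(hand, grasp_type, close_speed=0.2, i_th=0.8):
    """Three-mode grasp sequence: position -> velocity -> current.

    `hand` is a hypothetical driver; set_positions, set_velocities,
    read_currents, and stop_motor stand in for the real tendon-motor API.
    """
    # 1. Position control: move to the grasp-type-specific pre-grasp pose.
    hand.set_positions(PRE_GRASP[grasp_type])
    # 2. Velocity control: close the fingers; the under-actuated joints
    #    conform passively to the object as contacts are made.
    hand.set_velocities(close_speed)
    # 3. Current control: stop each motor once its current (a proxy for
    #    tendon force) exceeds the threshold I_th.
    stopped = set()
    while len(stopped) < hand.num_motors:
        for m, current in enumerate(hand.read_currents()):
            if m not in stopped and current > i_th:
                hand.stop_motor(m)  # hold position at the contact force
                stopped.add(m)
```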

To execute grasps, the robotic arm (UR3e) positions the dexterous robotic hand above the target object. The grasp point and angle from vision are transformed to the robot’s coordinate frame. The transformation involves two steps: from image coordinates to camera coordinates, and then to the robot base frame. Given the image coordinates $(u, v)$ and depth $z_c$ from the depth camera, the camera coordinates $(x_c, y_c, z_c)$ are computed using the camera intrinsic matrix $K$:

$$ \begin{bmatrix} x_c \\ y_c \\ z_c \end{bmatrix} = z_c K^{-1} \begin{bmatrix} u \\ v \\ 1 \end{bmatrix}, \quad \text{where } K = \begin{bmatrix} f_x & 0 & u_0 \\ 0 & f_y & v_0 \\ 0 & 0 & 1 \end{bmatrix}. $$

Then, using the hand-eye calibration matrix, the robot base coordinates $(x, y, z)$ are obtained:

$$ \begin{bmatrix} x \\ y \\ z \\ 1 \end{bmatrix} = \begin{bmatrix} R & T \\ 0 & 1 \end{bmatrix} \begin{bmatrix} x_c \\ y_c \\ z_c \\ 1 \end{bmatrix}, $$

where $R$ is the rotation matrix and $T$ is the translation vector. The grasp angle $\theta$ directly sets the robot’s end-effector orientation around the z-axis. This seamless integration allows the dexterous robotic hand to approach and grasp objects autonomously.
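
A compact NumPy version of this two-step transform is given below; the intrinsic and extrinsic values are placeholders that would come from camera and hand-eye calibration in practice.

```python
import numpy as np

# Placeholder calibration values: K from camera calibration, (R, T) from
# hand-eye calibration. None of these numbers are the real system's.
K = np.array([[365.0, 0.0, 256.0],
              [0.0, 365.0, 212.0],
              [0.0, 0.0, 1.0]])
R = np.eye(3)                      # rotation camera -> robot base (assumed)
T = np.array([0.4, 0.0, 0.8])      # translation in metres (assumed)

def pixel_to_base(u, v, z_c):
    """Map a pixel (u, v) with depth z_c to robot-base coordinates.

    Implements the two-step transform from the text:
    [x_c, y_c, z_c]^T = z_c * K^{-1} [u, v, 1]^T, followed by the
    homogeneous hand-eye transform into the base frame.
    """
    p_cam = z_c * np.linalg.inv(K) @ np.array([u, v, 1.0])
    return R @ p_cam + T

print(pixel_to_base(300, 240, 0.75))  # grasp point in the base frame
```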

Experiments were conducted with a setup consisting of a UR3e robotic arm equipped with the under-actuated dexterous robotic hand and a Kinect v2 camera mounted overhead. A total of 24 objects (12 known and 12 unknown) were grasped in 120 trials, with each object attempted five times from different poses. The results, shown in Table 4, indicate an average success rate of 90.80%, with known objects achieving 93.30% and unknown objects 88.30%. Failures were primarily caused by objects that were too heavy, too slippery, or had off-center mass distributions that destabilized the grasp, but the dexterous robotic hand demonstrated robustness across diverse shapes and sizes.

Table 4: Grasping Experiment Results with Dexterous Robotic Hand
| Grasp Type | Number of Trials | Successful Grasps | Success Rate (%) |
| --- | --- | --- | --- |
| Cylindrical Enveloping | 30 | 29 | 96.70 |
| Spherical Enveloping | 30 | 27 | 90.00 |
| Fine Pinch | 30 | 26 | 86.70 |
| Wide Pinch | 30 | 27 | 90.00 |
| Overall (Known Objects) | 60 | 56 | 93.30 |
| Overall (Unknown Objects) | 60 | 53 | 88.30 |
| Total | 120 | 109 | 90.80 |

The success of this method hinges on the synergy between deep learning and the mechanical design of the dexterous robotic hand. The vision system provides high-level guidance, while the under-actuated mechanism handles low-level uncertainties. This can be formalized through the force equilibrium at the contacts, which ties the contact forces $F_i$ to the object geometry. For an under-actuated dexterous robotic hand, the tendon forces $f_t$ generate the joint torques $\tau$ through a transmission matrix $A$: $\tau = A f_t$. The equilibrium condition for grasping is given by:

$$ \sum_{i=1}^{n} J_i^T F_i = \tau, $$

where $J_i$ is the Jacobian matrix at contact point $i$, and $n$ is the number of contacts. The adaptability ensures that $F_i$ automatically adjust to satisfy this equation, even if the predicted grasp pose has errors. Thus, the dexterous robotic hand can achieve stable grasps without precise force control.
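
To make the equilibrium concrete, the toy computation below stacks the transposed contact Jacobians and solves for contact forces consistent with the tendon-induced torques. All matrices are made-up illustrative values; the minimum-norm solution from the pseudo-inverse loosely mimics how passive compliance distributes force among contacts.

```python
import numpy as np

# Toy two-contact planar finger: 2 joints driven by 1 tendon.
A = np.array([[0.010],
              [0.008]])            # transmission matrix (moment arms, m)
f_t = np.array([20.0])             # tendon force (N)
tau = A @ f_t                      # joint torques tau = A f_t

J1 = np.array([[0.05, 0.00],
               [0.00, 0.03]])      # contact Jacobian at contact 1 (toy)
J2 = np.array([[0.04, 0.02],
               [0.01, 0.05]])      # contact Jacobian at contact 2 (toy)

# Solve J1^T F1 + J2^T F2 = tau for F = [F1; F2]. The system is
# under-determined (4 unknowns, 2 equations); the pseudo-inverse picks
# the minimum-norm force distribution.
Jt = np.hstack([J1.T, J2.T])       # 2 x 4 stacked transposed Jacobians
F = np.linalg.pinv(Jt) @ tau
print("contact forces:", F, " residual:", Jt @ F - tau)
```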

In comparison to prior works, this method offers a balanced approach for dexterous robotic hands. Many existing systems rely on complex planning or full actuation, but my approach simplifies the pipeline by using grasp type detection. For instance, some methods require 3D object models or extensive tactile feedback, whereas here, a single RGB image suffices for the dexterous robotic hand to perform effectively. The use of under-actuation also reduces cost and control overhead, making it practical for real-world applications.

Future work will extend this method to multi-object scenarios and dynamic environments. The current system assumes isolated objects on a desktop, but cluttered scenes pose additional challenges for dexterous robotic hands. I plan to integrate semantic segmentation and depth sensing to improve grasp selection in crowded settings. Moreover, augmenting the deep learning model with reinforcement learning could allow the dexterous robotic hand to learn from trial and error, further improving adaptability. Another direction is to mount tactile sensors on the fingers, providing feedback for slip detection and fine manipulation.

In conclusion, I have presented an autonomous grasping method for under-actuated dexterous robotic hands based on grasp type detection. By combining deep learning for recognition and image processing for pose estimation, the system enables robust grasping of various objects with minimal planning. The dexterous robotic hand leverages its under-actuated design to compensate for vision inaccuracies, resulting in high success rates. This work demonstrates the potential of data-driven approaches to enhance the capabilities of dexterous robotic hands, paving the way for more intelligent and versatile robotic systems. The integration of vision, control, and adaptive mechanics underscores the importance of holistic design in advancing dexterous robotic hand technology.

The mathematical foundations of this approach can be further elaborated through optimization frameworks. For example, the grasp quality $Q$ can be defined as a function of the contact points $p_i$ and the object’s mass distribution $m$. Using the grasp matrix $G$, which maps contact forces to resultant wrenches, we have:

$$ Q = \det(G G^T), $$

which grows with the volume of the grasp wrench ellipsoid (the ellipsoid volume is proportional to $\sqrt{\det(G G^T)}$). The dexterous robotic hand aims to maximize $Q$ within the constraints of its kinematics. In practice, the deep learning model approximates this by predicting grasp types that historically yield high $Q$ values. This data-driven optimization reduces the computational burden, aligning with the efficiency goals for dexterous robotic hands.
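
A small numeric example, assuming frictionless planar point contacts that transmit pure forces, shows how $Q$ is computed from a grasp matrix:

```python
import numpy as np

def planar_grasp_matrix(contacts, centroid):
    """Grasp matrix G mapping planar contact forces to the net wrench
    (f_x, f_y, tau_z) about the object centroid. Point contacts transmit
    pure forces here; positions are illustrative."""
    cols = []
    for p in contacts:
        rx, ry = p[0] - centroid[0], p[1] - centroid[1]
        cols.append([1.0, 0.0, -ry])   # unit force along x and its moment
        cols.append([0.0, 1.0, rx])    # unit force along y and its moment
    return np.array(cols).T            # 3 x 2n

# Two antipodal fingertip contacts 40 mm apart (toy example).
G = planar_grasp_matrix([(0.02, 0.0), (-0.02, 0.0)], (0.0, 0.0))
Q = np.linalg.det(G @ G.T)
print(f"grasp quality Q = det(G G^T) = {Q:.2e}")
```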

Additionally, the control laws for the dexterous robotic hand can be expressed in state-space form. Let $q$ be the joint angles, $\dot{q}$ the joint velocities, and $u$ the motor inputs. The dynamics are:

$$ M(q) \ddot{q} + C(q, \dot{q}) \dot{q} + g(q) = \tau(u) + J^T F, $$

where $M$ is the inertia matrix, $C$ accounts for Coriolis forces, $g$ is gravity, and $F$ are contact forces. For under-actuated systems, $\tau(u)$ has fewer independent components than $q$, but the passive dynamics help stabilize grasps. The current control strategy uses a threshold $I_{\text{th}}$ to stop motors when $I(t) > I_{\text{th}}$, where $I(t)$ is the motor current proportional to torque. This ensures the dexterous robotic hand applies sufficient force without damaging objects.
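
The stop criterion can be illustrated with a toy current model in which the motor current stays near zero during free closing and ramps with tendon torque after contact; all constants here are invented for illustration.

```python
import numpy as np

K_TAU = 2.0   # current per unit torque (A per N*m), assumed
I_TH = 0.8    # stop threshold I_th in amperes, assumed

def closing_current(t, t_contact=0.5, stiffness=4.0):
    """Toy current draw while closing: near zero in free motion, then
    rising roughly linearly with tendon torque after contact."""
    torque = max(0.0, stiffness * (t - t_contact))  # contact torque ramp
    return K_TAU * torque

for t in np.arange(0.0, 1.5, 0.1):
    if closing_current(t) > I_TH:
        print(f"stop at t = {t:.1f} s (I > I_th)")
        break
```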

To summarize, the key innovations of this work include: a dedicated dataset for grasp type detection, a real-time deep learning pipeline, and a control scheme that harnesses under-actuation. The dexterous robotic hand serves as a testbed for these ideas, showing that simplicity in design does not preclude sophistication in function. As robotics continues to evolve, dexterous robotic hands will play a crucial role in human-robot interaction and automation, and methods like this will be essential for their widespread adoption.
