Design of Visual System for Chinese Chess Robots: A Deep Learning Approach

In recent years, driven by big data and powerful computational capabilities, artificial intelligence technology has advanced rapidly, and deep learning has been widely applied across many fields. Among these applications, intelligent gaming devices have attracted significant attention. For a robot that plays Chinese chess, a traditional board game in China, the visual system depends on two key technologies: chess piece positioning and chess piece recognition. As a researcher in robotics and computer vision, I have focused on developing an efficient visual system for Chinese chess robots, aiming to enhance their entertainment value and functionality. The visual system is what allows the robot to interact smoothly with human players, and this work contributes to the broader field of intelligent robotics.

Traditional methods for chess piece recognition often rely on handcrafted features, such as character connectivity counts, template matching, or morphological processing. However, these approaches are limited by the arbitrary rotation of pieces, the diversity of fonts, and dense character strokes. For instance, template matching requires extensive pre-processing and is sensitive to lighting variations, while neural networks such as AlexNet have achieved moderate accuracy but are not specifically designed for rotational invariance. In my research, I address these challenges by proposing a deep learning-based character recognition method that incorporates deformable convolutions and an improved Inception-V3 network. This approach improves accuracy and is light enough for a robot's on-board visual processing.

Algorithmic Flow of the Visual System

The visual system for Chinese chess robots consists of two main components: chess piece positioning and chess piece recognition. The overall algorithmic flow processes images captured by a camera, extracts the piece positions and identities, and transmits them to a game-solving subsystem. The key steps are summarized in the table below:

Step	Description	Key Techniques
1. Image Acquisition	Capture board image using a fixed camera setup.	Fixed positioning of board, robot, and camera.
2. Pre-processing	Convert RGB to HSV color space for robustness.	Color space transformation, noise reduction.
3. Chess Piece Positioning	Segment pieces based on color, apply morphological operations, and detect circles.	HSV thresholding, Hough transform.
4. Chess Piece Recognition	Extract piece regions and classify using a deep learning model.	Deformable convolution, improved Inception-V3.
5. Data Transmission	Send coordinates and recognition results to game solver.	Real-time communication protocols.

This flow keeps processing efficient, which is essential for real-time play. By fixing the relative positions of the board, robot, and camera, we eliminate the need for dynamic board localization, reducing computational overhead and speeding up the system. The pre-processing step converts the RGB image to HSV color space to mitigate illumination effects, because the hue (H) and saturation (S) components are less sensitive to lighting changes. The positioning algorithm uses the distinct colors of the chess pieces, typically red and green, to segment them from the background. The recognition phase then employs a convolutional neural network (CNN) tailored to rotational variations, since pieces are placed on the board at arbitrary orientations.

Chess Piece Positioning

Chess piece positioning is the first critical step in the visual system. Because the board’s position is fixed, we can focus directly on segmenting the pieces by color. Chinese chess pieces are circular, and their characters are usually red or green. To achieve robust segmentation, we convert the image from RGB to HSV color space. The HSV model separates color information into hue (H), saturation (S), and value (V), which makes lighting variations easier to handle. In OpenCV's 8-bit representation the H component is stored in the range 0 to 179 (the hue angle in degrees, halved), and we define specific thresholds for red and green pieces. For red pieces, H typically falls between 150 and 180 (the upper limit simply covers the top of the range), while for green pieces it ranges from 35 to 80. The S component is restricted to between 40 and 255 so that only vivid colors are kept. Mathematically, the segmentation can be expressed as:

$$ \text{Binary}(x,y) = \begin{cases}
255 & \text{if } H(x,y) \in [150, 180] \cup [35, 80] \text{ and } S(x,y) \in [40, 255] \\
0 & \text{otherwise}
\end{cases} $$
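The thresholding rule above can be sketched in a few lines of NumPy. This is a minimal illustration rather than the system's actual code: the function name `piece_mask` and the four-pixel test planes are made up for the example, and the H and S planes are assumed to already follow OpenCV's 8-bit HSV convention.

```python
import numpy as np

# Threshold values quoted in the text (OpenCV-style 8-bit HSV).
RED_H = (150, 180)
GREEN_H = (35, 80)
SAT = (40, 255)

def piece_mask(h: np.ndarray, s: np.ndarray) -> np.ndarray:
    """Return a uint8 mask: 255 where a pixel is red or green AND saturated enough."""
    in_red = (h >= RED_H[0]) & (h <= RED_H[1])
    in_green = (h >= GREEN_H[0]) & (h <= GREEN_H[1])
    saturated = (s >= SAT[0]) & (s <= SAT[1])
    return np.where((in_red | in_green) & saturated, 255, 0).astype(np.uint8)

# Hand-made example: four pixels given directly as H and S planes.
h = np.array([[160, 50], [160, 100]], dtype=np.uint8)
s = np.array([[200, 200], [10, 200]], dtype=np.uint8)
mask = piece_mask(h, s)
# Top row: red and green pixels pass. Bottom row: one is too pale, one is the wrong hue.
print(mask)
```

In an OpenCV pipeline the H and S planes would typically come from `cv2.cvtColor(image, cv2.COLOR_BGR2HSV)`; the hue test is grouped before the saturation test so that both colors must also satisfy the saturation bound.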

After segmentation, morphological operations such as dilation and erosion are applied to refine the piece contours. This helps in removing noise and filling gaps, resulting in smoother boundaries. Subsequently, circle detection is performed using the Hough transform. Since the image is binary, the Hough transform operates efficiently, detecting circles based on the parametric equation:

$$ (x - a)^2 + (y - b)^2 = r^2 $$

where (a, b) is the center and r is the radius. The detection process identifies all circular regions corresponding to chess pieces, and their center coordinates are extracted for further processing. This positioning step forms the foundation for accurate recognition, since each detected circle defines the image region passed to the classifier.
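To make the voting idea behind the circle equation concrete, here is a small NumPy sketch of a Hough accumulator for a single known radius. It is an illustration under stated assumptions, not the system's implementation: the function `hough_circle_center` is hypothetical, the radius is assumed known (plausible with a fixed camera, but not stated in the text), and the input is assumed to be a circle outline rather than a filled blob.

```python
import numpy as np

def hough_circle_center(binary: np.ndarray, radius: int, n_angles: int = 360):
    """Vote for circle centres of one known radius and return the best (a, b)."""
    acc = np.zeros(binary.shape, dtype=np.int32)
    ys, xs = np.nonzero(binary)  # coordinates of all foreground pixels
    thetas = np.linspace(0.0, 2.0 * np.pi, n_angles, endpoint=False)
    for t in thetas:
        # Each pixel (x, y) on a circle votes for centres a = x - r cos t, b = y - r sin t.
        a = np.rint(xs - radius * np.cos(t)).astype(int)
        b = np.rint(ys - radius * np.sin(t)).astype(int)
        ok = (a >= 0) & (a < binary.shape[1]) & (b >= 0) & (b < binary.shape[0])
        np.add.at(acc, (b[ok], a[ok]), 1)  # unbuffered add handles repeated indices
    b_best, a_best = np.unravel_index(np.argmax(acc), acc.shape)
    return int(a_best), int(b_best)

# Synthetic test image: the outline of a circle with centre (a=30, b=25), radius 10.
img = np.zeros((64, 64), dtype=np.uint8)
angles = np.linspace(0.0, 2.0 * np.pi, 720, endpoint=False)
img[np.rint(25 + 10 * np.sin(angles)).astype(int),
    np.rint(30 + 10 * np.cos(angles)).astype(int)] = 255

center = hough_circle_center(img, radius=10)
print(center)  # expected to land on or next to the true centre
```

In practice a library routine such as OpenCV's `cv2.HoughCircles` would normally be used, with a radius range instead of a single value; the text does not say which implementation was chosen.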

Chess Piece Recognition

Chess piece recognition is the most challenging part due to the arbitrary rotation of pieces, diverse fonts, and dense character strokes. Traditional machine learning methods often struggle with these variations, prompting the adoption of deep learning. In my work, I propose a CNN-based model that integrates an improved Inception-V3 architecture with deformable convolutions. This design enhances the model’s ability to learn geometric transformations, making it suitable for a robot that encounters unpredictable piece orientations.

The core of the recognition model is the Inception module, which uses multiple convolutional kernels of different sizes to capture features at several receptive-field scales. However, standard Inception modules can extract redundant features across their parallel paths, because every path sees the full input. To address this, I introduce a grouped convolution approach within the Inception module: the input channels are partitioned, and each path processes only its own group, which reduces parameters and computational cost. This modification is particularly beneficial for real-time use on robot hardware, where resources may be limited. The grouped Inception module can be represented as:

$$ \text{Output} = \text{Concat}(\text{Conv}_{1\times1}(\text{Group}_1), \text{Conv}_{3\times3}(\text{Group}_2), \text{Conv}_{5\times5}(\text{Group}_3)) $$

where Group_i denotes the partitioned input features. Additionally, I replace the standard ReLU activation function with LeakyReLU to prevent dead neurons and improve gradient flow:

$$ \text{LeakyReLU}(x) = \begin{cases}
x & \text{if } x > 0 \\
\alpha x & \text{otherwise}
\end{cases} $$

with α set to 0.01.
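As a small worked example of the activation above, the following sketch evaluates LeakyReLU and its derivative with α = 0.01 on a few hand-picked inputs. The helper names are illustrative only; the point is that negative inputs keep a small non-zero gradient instead of the zero gradient that ReLU would give.

```python
def leaky_relu(x: float, alpha: float = 0.01) -> float:
    """LeakyReLU as defined above: identity for x > 0, alpha * x otherwise."""
    return x if x > 0 else alpha * x

def leaky_relu_grad(x: float, alpha: float = 0.01) -> float:
    """Derivative with respect to x: 1 on the positive side, alpha elsewhere."""
    return 1.0 if x > 0 else alpha

values = [-2.0, -0.5, 0.0, 1.5]
activated = [leaky_relu(v) for v in values]
gradients = [leaky_relu_grad(v) for v in values]

for v, a, g in zip(values, activated, gradients):
    print(f"x={v:5.1f}  LeakyReLU={a:7.3f}  gradient={g:.2f}")
```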

To cope with the arbitrary rotation of pieces, I incorporate deformable convolutions. Standard 2D convolutions sample features on a regular grid, whereas deformable convolutions add learnable offsets to the sampling locations, allowing the model to adapt to object deformations. For a given position p_0 on the feature map, the output of a deformable convolution is defined as:

$$ y(p_0) = \sum_{p_n \in R} w(p_n) \cdot x(p_0 + p_n + \Delta p_n) $$

where R is the regular sampling grid (for a 3×3 kernel, the nine positions from (-1, -1) to (1, 1)), w represents the convolution weights, and Δp_n are the learned offsets. This enables the network to capture character shape variations effectively, which is crucial for accurately identifying rotated pieces. The offsets are predicted by an additional convolutional layer that outputs 2N channels, where N = |R| is the number of sampling points and each point receives one x offset and one y offset. Because the shifted sampling locations are generally fractional, the input is read with bilinear interpolation.
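The sampling rule can be illustrated for a single channel and a single output position with plain NumPy. This is a sketch under simplifying assumptions: the offsets are fixed by hand rather than predicted by a convolutional layer, and the names `bilinear`, `GRID`, and `deform_conv_at` are invented for the example.

```python
import numpy as np

def bilinear(x: np.ndarray, py: float, px: float) -> float:
    """Bilinearly interpolate the 2-D map x at fractional (py, px); zero outside."""
    h, w = x.shape
    y0, x0 = int(np.floor(py)), int(np.floor(px))
    value = 0.0
    for yy in (y0, y0 + 1):
        for xx in (x0, x0 + 1):
            if 0 <= yy < h and 0 <= xx < w:
                weight = (1.0 - abs(py - yy)) * (1.0 - abs(px - xx))
                value += weight * x[yy, xx]
    return float(value)

# Regular 3x3 grid R as (dy, dx) pairs, so N = 9 sampling points.
GRID = [(dy, dx) for dy in (-1, 0, 1) for dx in (-1, 0, 1)]

def deform_conv_at(x, weights, offsets, p0):
    """y(p0) = sum_n w(p_n) * x(p0 + p_n + delta_p_n) for one input channel."""
    total = 0.0
    for (dy, dx), w_n, (oy, ox) in zip(GRID, weights, offsets):
        total += w_n * bilinear(x, p0[0] + dy + oy, p0[1] + dx + ox)
    return total

x = np.arange(25, dtype=float).reshape(5, 5)  # x[r, c] = 5 * r + c
w = np.full(9, 1.0 / 9.0)                      # simple averaging kernel
zero = np.zeros((9, 2))                        # zero offsets: ordinary convolution
shift = np.tile([0.0, 0.5], (9, 1))            # move every sample half a pixel right

y_standard = deform_conv_at(x, w, zero, (2, 2))
y_shifted = deform_conv_at(x, w, shift, (2, 2))
print(y_standard, y_shifted)
```

With zero offsets the result is the ordinary 3×3 average around the centre pixel; shifting every sample by half a pixel moves the average by half a column step, which shows how fractional offsets change what the kernel sees. Framework implementations such as `torchvision.ops.DeformConv2d` perform the same sampling across all channels and positions.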

The overall model architecture consists of several layers, as summarized in the table below:

Layer	Type	Parameters	Output Size
Input	Image	–	64×64×3
1	Standard Convolution	3×3 kernel, 32 filters	62×62×32
2	Deformable Convolution	3×3 kernel, 64 filters	60×60×64
3	Improved Inception Module	Grouped convolutions	30×30×128
4	Improved Inception Module	Grouped convolutions	15×15×256
5	Global Average Pooling	–	1×1×256
6	Fully Connected	14 units (for 14 classes)	14
7	Softmax	–	14
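The spatial sizes in the table can be checked with the standard convolution output-size formula. The sketch below assumes stride 1 and no padding for the two 3×3 convolutions (which matches the 64 → 62 → 60 progression) and assumes that each Inception module halves the resolution; how that downsampling is implemented is not stated in the text.

```python
def conv_out(size: int, kernel: int, stride: int = 1, padding: int = 0) -> int:
    """Output length of a convolution along one spatial dimension."""
    return (size + 2 * padding - kernel) // stride + 1

s1 = conv_out(64, 3)  # layer 1: 3x3 standard convolution     -> 62
s2 = conv_out(s1, 3)  # layer 2: 3x3 deformable convolution   -> 60
s3 = s2 // 2          # layer 3: assumed 2x downsampling      -> 30
s4 = s3 // 2          # layer 4: assumed 2x downsampling      -> 15

print([s1, s2, s3, s4])  # [62, 60, 30, 15]
```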

Using global average pooling ahead of a single 14-unit classification layer, rather than a stack of large fully connected layers, keeps the model size under 3 MB, making it lightweight and suitable for deployment on resource-constrained robot hardware. The model is trained using the Adam optimizer to minimize the cross-entropy loss:

$$ L = -\sum_{i=1}^{N} y_i \log(\hat{y}_i) $$

where y_i is the true label and ŷ_i is the predicted probability. This approach achieves high accuracy while remaining efficient, which is what an interactive game-playing robot requires.
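As a worked example of the loss, the sketch below computes softmax probabilities and the cross-entropy for one sample. It assumes the common reading in which the sum runs over classes and y is a one-hot label vector; the logits are made-up numbers, and the small `eps` term is only there to avoid taking the logarithm of zero.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of scores."""
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def cross_entropy(one_hot, probs, eps=1e-12):
    """L = -sum_i y_i * log(y_hat_i) for a single sample."""
    return -sum(y * math.log(p + eps) for y, p in zip(one_hot, probs))

logits = [2.0, 0.5, -1.0]               # hypothetical scores for three classes
probs = softmax(logits)
loss = cross_entropy([1, 0, 0], probs)  # true class is index 0

# A model that guesses uniformly over 14 classes would score log(14).
uniform_loss = cross_entropy([1] + [0] * 13, [1 / 14] * 14)
print(round(loss, 4), round(uniform_loss, 4))
```

The uniform-guess value gives a useful baseline: a training loss well below log(14) means the model is doing better than chance on the 14 piece classes.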

Experiments and Results

To evaluate the proposed visual system, I conducted extensive experiments using a custom dataset. The dataset comprises 33,274 images of Chinese chess pieces, collected from both physical captures and online sources. It includes multiple piece materials, fonts, and colors (red, green, and black), with 14 classes accounting for character and color combinations (e.g., “red horse” and “green horse”). Data augmentation techniques such as rotation, cropping, scaling, and brightness adjustment were applied to increase diversity and simulate the conditions the robot encounters in real play. Examples of augmented images include rotations at arbitrary angles and variations in illumination, which help the model generalize better.
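Rotation augmentation of the kind described above can be sketched with NumPy alone. This is a minimal nearest-neighbour version for illustration; the function `rotate_nearest` and the toy 4×4 array are hypothetical, and the actual augmentation pipeline (library, interpolation mode, border handling) is not specified in the text.

```python
import numpy as np

def rotate_nearest(img: np.ndarray, angle_deg: float) -> np.ndarray:
    """Rotate an (H, W) or (H, W, C) image about its centre, nearest neighbour."""
    h, w = img.shape[:2]
    cy, cx = (h - 1) / 2.0, (w - 1) / 2.0
    t = np.deg2rad(angle_deg)
    ys, xs = np.mgrid[0:h, 0:w]
    # Inverse mapping: for every output pixel, locate the source pixel to copy.
    src_x = np.cos(t) * (xs - cx) + np.sin(t) * (ys - cy) + cx
    src_y = -np.sin(t) * (xs - cx) + np.cos(t) * (ys - cy) + cy
    sx, sy = np.rint(src_x).astype(int), np.rint(src_y).astype(int)
    valid = (sx >= 0) & (sx < w) & (sy >= 0) & (sy < h)
    out = np.zeros_like(img)  # pixels that map outside the source stay black
    out[valid] = img[sy[valid], sx[valid]]
    return out

rng = np.random.default_rng(0)
piece = np.arange(16, dtype=np.uint8).reshape(4, 4)  # stand-in for a piece crop
augmented = rotate_nearest(piece, rng.uniform(0.0, 360.0))
print(augmented.shape)
```

Drawing a fresh angle for every training sample exposes the network to pieces at arbitrary orientations, which is the situation it faces on a real board.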

The training was performed on a system with Windows OS, CUDA 9.0, cuDNN 7.0, and an NVIDIA GTX 1060 GPU. The model was trained for 50 epochs with a batch size of 32, using the Adam optimizer with a learning rate of 0.001. For testing, 8,250 samples were held out, and performance metrics included precision, recall, and F1-score. These metrics are defined as:

$$ \text{Precision} = \frac{TP}{TP + FP} $$
$$ \text{Recall} = \frac{TP}{TP + FN} $$
$$ \text{F1-score} = \frac{2 \times \text{Precision} \times \text{Recall}}{\text{Precision} + \text{Recall}} $$

where TP, FP, and FN denote true positives, false positives, and false negatives, respectively. The results for each class are summarized in the table below:

Class	Precision	Recall	F1-score
Red General	1.0000	1.0000	1.0000
Red Advisor	1.0000	1.0000	1.0000
Red Elephant	1.0000	0.9999	1.0000
Red Horse	1.0000	1.0000	1.0000
Red Chariot	0.9998	1.0000	1.0000
Red Cannon	1.0000	1.0000	1.0000
Red Soldier	1.0000	0.9999	1.0000
Green General	1.0000	0.9999	1.0000
Green Advisor	1.0000	1.0000	1.0000
Green Elephant	1.0000	1.0000	1.0000
Green Horse	1.0000	1.0000	1.0000
Green Chariot	0.9999	1.0000	1.0000
Green Cannon	1.0000	1.0000	1.0000
Green Soldier	1.0000	1.0000	1.0000
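The per-class metrics follow directly from the TP, FP, and FN counts, as the short sketch below shows. The counts used here are hypothetical and are not taken from the reported experiment; they only illustrate how a handful of missed detections moves recall and F1 while precision stays at 1.

```python
def precision_recall_f1(tp: int, fp: int, fn: int):
    """Compute precision, recall, and F1 from raw counts, guarding against 0/0."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1

# Hypothetical counts for one class: 598 correct, 0 false alarms, 2 misses.
p, r, f1 = precision_recall_f1(tp=598, fp=0, fn=2)
print(f"precision={p:.4f} recall={r:.4f} f1={f1:.4f}")
```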

The overall accuracy reached 99.99%, with near-perfect F1-scores across all classes. In real-time testing, the system positioned and recognized pieces with confidence scores above 90%, suggesting that it remains robust under live operating conditions.

Furthermore, I compared the proposed model with other CNN architectures to highlight its superiority. The table below presents the accuracy and model size for different approaches:

Model	Accuracy (%)	Model Size (MB)
LeNet-5	72.53	65.2
AlexNet	97.00	81.1
VGG16	95.52	98.0
Proposed Method	99.99	2.3

The proposed method outperforms the compared CNNs by a clear margin, reaching 99.99% accuracy against 97.00% for the next-best model (AlexNet) at 2.3 MB instead of 81.1 MB. This efficiency is critical for embedding the visual system in a robot, where computational resources are often limited. The combination of deformable convolutions and grouped Inception modules addresses the rotation problem effectively while keeping the model small.

Conclusion

In this work, I have designed a visual system for Chinese chess robots that handles both piece positioning and recognition. The system uses color-based segmentation and circle detection for accurate positioning, while a deep learning model incorporating deformable convolutions and an improved Inception-V3 network achieves high recognition accuracy. The experimental results show that the proposed method attains 99.99% accuracy, outperforming the compared approaches while remaining lightweight enough for real-time applications. This contributes to the development of intelligent game-playing robots and enhances their capabilities in interactive entertainment. Future work may extend the system to other board games or integrate it with more complex robotic platforms.