In the era of “Made in China 2025,” industrial robotics has seen rapid advancement, with the RV reducer standing as a critical core component due to its high stiffness, load capacity, and compact design. However, the practical operation of RV reducers often involves harsh environments where vibration signals are contaminated by random noise, posing significant challenges for fault diagnosis. Traditional methods, such as manual feature extraction combined with machine learning, struggle under noisy conditions. To address this, I propose an Anti-Noise Network (ANNet), a convolutional neural network model designed specifically for robust fault mode identification in RV reducers amidst noise interference. This model innovatively combines signal stacking, input Dropout, and multi-scale kernel modules to enhance feature learning and fusion. Through extensive experiments, I demonstrate that ANNet outperforms existing algorithms, particularly under strong noise, achieving 10%–20% higher accuracy. In this article, I will detail the methodology, experimental validation, and intrinsic reasons behind ANNet’s anti-noise capabilities.

The RV reducer, or rotary vector reducer, is pivotal in robotics for motion transmission, but its complex structure, which includes planetary gears, cycloidal gears, and pins, makes it prone to failures such as wear, cracks, or composite faults. Vibration signal analysis is a common diagnostic tool, yet noise from motors, bearings, or external sources often corrupts these signals, masking fault characteristics. Deep learning approaches, especially convolutional neural networks (CNNs), have shown promise in automated feature extraction, but they typically assume clean data. In reality, the RV reducer operates under variable loads and speeds, leading to signal-to-noise ratio (SNR) fluctuations. My goal is to develop a model whose training mimics this noisy environment, so that it learns resilient features. ANNet achieves this by directly interfering with input signals via Dropout and employing multi-scale convolutions to capture diverse signal patterns. This approach not only improves diagnostic accuracy but also reduces reliance on expert knowledge for kernel selection.
The core of my methodology begins with signal stacking, where one-dimensional vibration signals from the RV reducer are transformed into two-dimensional grayscale images. This conversion preserves temporal and structural information, enabling CNNs to process spatial patterns. For a signal sequence of length \( m \times n \), I split it into \( m \) consecutive segments, each containing \( n \) data points, and stack them row-wise to form an \( m \times n \) image. In my implementation, I set \( m = 32 \) and \( n = 32 \), resulting in 32×32 images. This representation allows the model to exploit local correlations in the RV reducer’s vibration data, which often contain periodic fault signatures. The transformation can be expressed as:
$$ \text{Image}_{i,j} = x_{(i-1) \times n + j}, \quad \text{for } i=1,\dots,m, \ j=1,\dots,n, $$
where \( x \) is the original vibration signal. This step is crucial for preparing data for subsequent CNN layers, as it aligns with the standard input format for image-based networks.
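A minimal sketch of the stacking step, assuming the signal has already been cut to exactly \( m \times n \) samples; the function name `stack_signal` and the placeholder samples are illustrative, and any scaling to grayscale intensities is left out:

```python
def stack_signal(x, m=32, n=32):
    """Stack a 1-D signal of length m*n into an m-by-n image, row by row."""
    if len(x) != m * n:
        raise ValueError(f"expected {m * n} samples, got {len(x)}")
    # Row i (0-based) holds samples x[i*n : (i+1)*n], which matches
    # Image[i, j] = x[(i-1)*n + j] in the 1-based notation of the text.
    return [list(x[i * n:(i + 1) * n]) for i in range(m)]


signal = list(range(1, 32 * 32 + 1))  # placeholder samples 1..1024, not rig data
image = stack_signal(signal)
print(len(image), len(image[0]))   # 32 32
print(image[0][:3], image[1][:3])  # [1, 2, 3] [33, 34, 35]
```

Each row is one contiguous stretch of the signal, so samples that are adjacent in time stay adjacent within a row, while samples \( n \) steps apart end up vertically adjacent.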
Next, I introduce input Dropout, a key innovation in ANNet. Unlike conventional Dropout applied to hidden layers, this operation randomly zeros elements in the input signal matrix, simulating random noise interference. Given an input image \( X \in \mathbb{R}^{m \times n} \), the Dropout operation outputs \( X' = R \odot X \), where \( \odot \) denotes element-wise multiplication, and \( R \) is a binary matrix whose elements are drawn from a Bernoulli distribution with keep probability \( p \). During training, \( p \) is sampled uniformly from a range whose upper limit is fixed at 0.9 and whose lower limit \( l \) starts at 0.1 and increases linearly with the iteration count, so that the model is exposed to varying interference levels. Mathematically:
$$ r_{i,j} \sim \text{Bernoulli}(p), \quad p \sim \text{Uniform}(l, 0.9), \quad l = 0.1 + (0.9 - 0.1) \times \frac{s}{S}, $$
where \( s \) is the current iteration and \( S = 30,000 \) is the total iterations. This dynamic range ensures that the RV reducer model experiences diverse noise patterns, enhancing generalization. The Dropout rate \( 1-p \) represents the fraction of signal points masked, akin to salt-and-pepper noise in images. By corrupting input data, ANNet learns to rely on robust features rather than spurious correlations, which is vital for real-world RV reducer applications where sensor noise is inevitable.
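The sampling scheme can be sketched as follows, assuming a fresh \( p \) is drawn for every call and that the masked values are not rescaled (the formula above has no \( 1/p \) factor); `lower_bound` and `input_dropout` are illustrative names, not part of any library:

```python
import random


def lower_bound(step, total_steps=30_000, lo=0.1, hi=0.9):
    """Lower limit l of the keep-probability range at a given iteration."""
    return lo + (hi - lo) * step / total_steps


def input_dropout(image, step, total_steps=30_000, rng=random):
    """Return (masked image, p), i.e. X' = R * X with r_ij ~ Bernoulli(p)."""
    p = rng.uniform(lower_bound(step, total_steps), 0.9)  # keep probability
    masked = [[v if rng.random() < p else 0.0 for v in row] for row in image]
    return masked, p


rng = random.Random(0)  # fixed seed only to make the sketch reproducible
image = [[1.0] * 32 for _ in range(32)]
masked, p = input_dropout(image, step=15_000, rng=rng)
kept = sum(v != 0.0 for row in masked for v in row) / (32 * 32)
print(f"p = {p:.2f}, kept fraction = {kept:.2f}")
```

Halfway through training the lower limit is 0.5, so \( p \) falls between 0.5 and 0.9 and the kept fraction of the 1,024 points should land close to \( p \). The mask is meant for training only; at test time the input is passed through unchanged.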
The architecture of ANNet consists of multiple blocks, each featuring multi-scale convolutional kernels. A single block integrates three parallel convolutional layers with kernel sizes of 15×15, 7×7, and 3×3, all using zero-padding and stride 1 to maintain spatial dimensions. Each convolution is followed by batch normalization (BN) and ReLU activation. The outputs are then concatenated along the channel dimension, allowing feature fusion across scales. For an input feature map \( F_{\text{in}} \), the block computes:
$$ \text{Re1} = \text{ReLU}(\text{BN}(\text{Conv}_{15\times15}(F_{\text{in}}))), $$
$$ \text{Re2} = \text{ReLU}(\text{BN}(\text{Conv}_{7\times7}(F_{\text{in}}))), $$
$$ \text{Re3} = \text{ReLU}(\text{BN}(\text{Conv}_{3\times3}(F_{\text{in}}))), $$
$$ \text{Re} = \text{concat}(\text{Re1}, \text{Re2}, \text{Re3}), $$
where concat denotes channel-wise concatenation. This design enables the model to capture both global patterns (via large kernels) and local details (via small kernels) from the RV reducer signals, which is beneficial for detecting faults that manifest at different frequency ranges. The complete ANNet includes five such blocks, with channel depths increasing from 48 to 768, as summarized in Table 1.
| Layer | Output Tensor | Parameters |
|---|---|---|
| Input Signal | 32×32×1 | – |
| Dropout1 | 32×32×1 | p ~ U(0.1,0.9) |
| Block 1 | 32×32×48 | Kernels: 15×15, 7×7, 3×3 (16 each) |
| Block 2 | 32×32×96 | Kernels: 15×15, 7×7, 3×3 (32 each) |
| Block 3 | 32×32×192 | Kernels: 15×15, 7×7, 3×3 (64 each) |
| Block 4 | 32×32×384 | Kernels: 15×15, 7×7, 3×3 (128 each) |
| Block 5 | 32×32×768 | Kernels: 15×15, 7×7, 3×3 (256 each) |
| Global Average Pooling | 1×1×768 | Pool size: 32×32 |
| Dropout2 | 1×1×768 | Rate: 0.5 |
| Fully Connected | 5 | Softmax activation |
After the fifth block, a global average pooling layer reduces the feature map to 1×1×768, followed by another Dropout (rate 0.5) on the flattened features to prevent overfitting. Finally, a fully connected layer with softmax outputs probabilities for the five fault classes. The model is trained using the Adam optimizer with an initial learning rate of 0.001, decayed linearly, and a batch size of 16. No weight regularization is applied, as the Dropout operations suffice for regularization.
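The shape bookkeeping in Table 1 can be checked with a short sketch. It assumes standard dense convolutions (every filter spans all input channels), which the text does not state explicitly, and it counts convolution weights only, leaving out biases, batch normalization, and the final classifier:

```python
KERNEL_SIZES = (15, 7, 3)
FILTERS_PER_BRANCH = (16, 32, 64, 128, 256)  # Blocks 1-5, from Table 1


def same_padding(kernel_size):
    """Zero-padding per side that preserves size at stride 1 (odd kernels)."""
    return (kernel_size - 1) // 2


def output_size(size, kernel_size, stride=1):
    """Standard convolution output-size formula."""
    return (size + 2 * same_padding(kernel_size) - kernel_size) // stride + 1


def block_conv_weights(in_channels, filters):
    """Convolution weights of one block under the dense-convolution assumption."""
    return sum(k * k * in_channels * filters for k in KERNEL_SIZES)


channels, total = 1, 0
for index, filters in enumerate(FILTERS_PER_BRANCH, start=1):
    weights = block_conv_weights(channels, filters)
    total += weights
    channels = filters * len(KERNEL_SIZES)  # three branches concatenated
    sides = {output_size(32, k) for k in KERNEL_SIZES}
    print(f"Block {index}: 32x32x{channels}, sides {sides}, {weights:,} weights")
print(f"total convolution weights: {total:,}")
```

With padding \( (k-1)/2 \) on each side, all three branches return 32×32 maps, which is what makes channel-wise concatenation possible. Under the stated assumption the weight count is dominated by the 15×15 branch of the last two blocks.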
To validate ANNet, I conducted experiments on a dedicated RV reducer test rig, as shown earlier. The setup includes a motor running at 400 rpm, a magnetic powder brake applying 40 N·m load, and a vibration sensor sampling at 2 kHz along the axial direction. Five health states were considered: normal, planetary gear fault, cycloidal gear fault, composite fault of planetary gear and pin, and composite fault of cycloidal gear and pin. Composite faults are particularly challenging due to overlapping signatures. Vibration signals were segmented and converted to 32×32 images, yielding datasets described in Table 2.
| Dataset Type | Noise Level | Samples per Class | Total Samples |
|---|---|---|---|
| Training | None (clean) | 4,500 | 22,500 |
| Test Set 1 | None (clean) | 500 | 2,500 |
| Test Set 2 | 15 dB SNR | 500 | 2,500 |
| Test Set 3 | 12 dB SNR | 500 | 2,500 |
| Test Set 4 | 9 dB SNR | 500 | 2,500 |
| Test Set 5 | 6 dB SNR | 500 | 2,500 |
| Test Set 6 | 3 dB SNR | 500 | 2,500 |
Noise was added as Gaussian white noise to test sets, with SNR defined as \( \text{SNR} = 10 \log_{10}(P_{\text{signal}} / P_{\text{noise}}) \), where lower dB indicates stronger noise. During training, ANNet uses only clean data with input Dropout, while testing involves noisy data without Dropout on inputs. This mimics real-world scenarios where the RV reducer encounters unseen noise.
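A minimal sketch of how a test signal can be corrupted to a target SNR under the definition above, assuming signal power is taken as the mean of the squared samples; the sine wave is a placeholder, not data from the test rig, and the function names are illustrative:

```python
import math
import random


def add_white_noise(signal, snr_db, rng=random):
    """Add zero-mean Gaussian noise so that the expected SNR equals snr_db."""
    p_signal = sum(v * v for v in signal) / len(signal)  # mean signal power
    p_noise = p_signal / (10 ** (snr_db / 10))           # from the SNR definition
    sigma = math.sqrt(p_noise)
    return [v + rng.gauss(0.0, sigma) for v in signal]


def measured_snr_db(clean, noisy):
    """SNR actually realised by one noise draw."""
    p_signal = sum(c * c for c in clean)
    p_noise = sum((n - c) ** 2 for c, n in zip(clean, noisy))
    return 10 * math.log10(p_signal / p_noise)


rng = random.Random(0)  # fixed seed only for reproducibility
# Placeholder signal: a 50 Hz sine sampled at 2 kHz.
clean = [math.sin(2 * math.pi * 50 * t / 2000) for t in range(32 * 32 * 10)]
for snr_db in (15, 12, 9, 6, 3):
    noisy = add_white_noise(clean, snr_db, rng)
    print(snr_db, round(measured_snr_db(clean, noisy), 2))
```

At 3 dB the noise power is about half the signal power, which gives a sense of how heavily the weakest test set is corrupted.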
I compared ANNet against three established CNN-based methods: a standard CNN (similar to LeNet), ResNet (with residual blocks), and TICNN (Training Interference CNN, adapted with 2D kernels). All models used the same input size and were trained for 30,000 iterations. Each test was repeated 20 times, and average accuracy was reported. The results, plotted in Figure 1, show ANNet’s superiority across noise levels, especially under 3 dB noise where it surpasses others by 10–20%.
The accuracy trends can be summarized with a noise robustness measure \( R \), defined as the accuracy drop relative to clean data: \( R = A_{\text{clean}} - A_{\text{noise}} \), with accuracies expressed as fractions, so that a smaller \( R \) means greater robustness (this \( R \) is unrelated to the Dropout mask matrix introduced earlier). For ANNet, \( R \) remains low, e.g., at 3 dB, \( R \approx 0.15 \), whereas for ResNet, \( R \approx 0.35 \). This highlights ANNet’s resilience. The performance difference stems from ANNet’s design: input Dropout acts as a noise simulator, and multi-scale kernels extract complementary features. In contrast, standard CNNs lack interference mechanisms, and ResNet may overfit to clean features, failing under noise. TICNN applies Dropout to first-layer kernels, which is less effective than direct input corruption.
To further analyze, I conducted ablation studies. First, removing input Dropout from ANNet led to a significant accuracy drop, particularly at 3 dB, where accuracy decreased by about 10%. This confirms that interfering with inputs is crucial for teaching the model to ignore noise. Second, replacing multi-scale kernels with single-scale ones (e.g., only 7×7) reduced accuracy, as shown in Table 3. The 7×7 kernel performed worst under strong noise, while 3×3 and 15×15 kernels offered better local and global feature retention, respectively. ANNet’s fusion of all scales yields optimal results.
| Kernel Configuration | Accuracy (%) | Standard Deviation |
|---|---|---|
| Single-scale: 3×3 only | 82.3 | ±1.5 |
| Single-scale: 7×7 only | 78.6 | ±1.8 |
| Single-scale: 15×15 only | 83.1 | ±1.4 |
| Multi-scale (ANNet default) | 89.7 | ±1.2 |
The effectiveness of input Dropout can be understood through information theory. By randomly masking signal points, the model is forced to rely on redundant information, increasing feature robustness. The keep probability \( p \) sets the masking rate \( 1-p \), and its uniform sampling ensures exposure to various corruption levels. For an RV reducer signal with energy \( E \), the expected energy after Dropout is \( E' = pE \), but the model learns to reconstruct features from partial data, akin to denoising autoencoders. However, ANNet skips explicit reconstruction, directly classifying corrupted inputs, which speeds up training.
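The energy relation holds in expectation rather than for every individual mask. A short derivation, assuming the energy is defined as the sum of squared samples, \( E = \sum_{i,j} x_{i,j}^2 \), and that the mask is drawn independently of the signal:

$$ \mathbb{E}[E'] = \mathbb{E}\Big[\sum_{i,j} (r_{i,j}\, x_{i,j})^2\Big] = \sum_{i,j} \mathbb{E}\big[r_{i,j}^2\big]\, x_{i,j}^2 = p \sum_{i,j} x_{i,j}^2 = pE, $$

where the third step uses \( r_{i,j}^2 = r_{i,j} \) for a binary variable and \( \mathbb{E}[r_{i,j}] = p \). For a fixed \( p \), any single mask yields an energy that fluctuates around \( pE \).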
Multi-scale kernels address the multi-component nature of RV reducer vibrations. Faults in planetary gears, cycloidal gears, or pins produce frequency components spanning different scales. A large kernel (15×15) captures low-frequency trends, while small kernels (3×3) detect high-frequency anomalies. The concatenation operation fuses these features, which can be expressed as a combined feature map \( F_{\text{combined}} = [F_{15}, F_{7}, F_{3}] \), where each \( F \) represents features from a kernel size. This fusion enhances the model’s ability to diagnose composite faults, where multiple fault types coexist in the RV reducer.
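To make the fusion step concrete, here is a deliberately simplified sketch of one multi-scale block: a single input channel, one filter per scale, fixed averaging kernels instead of learned weights, and no batch normalization. All function names are illustrative:

```python
def conv2d_same(image, kernel):
    """Stride-1 2-D convolution (cross-correlation) with zero-padding."""
    h, w, k = len(image), len(image[0]), len(kernel)
    pad = (k - 1) // 2  # keeps the output at h x w for odd k
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            acc = 0.0
            for a in range(k):
                for b in range(k):
                    y, x = i + a - pad, j + b - pad
                    if 0 <= y < h and 0 <= x < w:  # outside the image counts as zero
                        acc += kernel[a][b] * image[y][x]
            out[i][j] = acc
    return out


def relu(feature_map):
    return [[max(0.0, v) for v in row] for row in feature_map]


def multi_scale_block(image, kernels):
    """Apply each kernel to the same input and stack the results as channels."""
    return [relu(conv2d_same(image, kernel)) for kernel in kernels]


def mean_kernel(k):
    """Placeholder averaging kernel; a trained network learns these weights."""
    return [[1.0 / (k * k)] * k for _ in range(k)]


image = [[float((i * 32 + j) % 7) for j in range(32)] for i in range(32)]
features = multi_scale_block(image, [mean_kernel(k) for k in (15, 7, 3)])
print(len(features), len(features[0]), len(features[0][0]))  # 3 32 32
```

Because every branch preserves the 32×32 size, the three outputs can be stacked directly; the 15×15 branch sees a 15-sample stretch of each of 15 neighbouring rows, while the 3×3 branch only sees immediate neighbours.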
In practice, the RV reducer operates under varying loads and speeds, which affect vibration patterns. While this study focuses on fixed conditions, ANNet’s anti-noise design suggests potential for generalization. Future work could incorporate domain adaptation techniques to handle such variations. Additionally, the model’s computational cost should be kept in view: with standard dense convolutions, the kernels listed in Table 1 amount to roughly 37 million weights, about three quarters of them in Block 5, so memory and compute budgets need to be checked before embedded deployment in robotic systems.
In conclusion, I have presented ANNet, a novel CNN model for fault diagnosis of RV reducers under noise interference. By integrating input Dropout and multi-scale convolutional kernels, ANNet achieves state-of-the-art accuracy, especially in strong noise environments. The model’s success stems from its ability to simulate noise during training and extract robust multi-scale features. This approach reduces reliance on manual feature engineering and enhances the reliability of RV reducer health monitoring. As industrial robotics continues to evolve, such intelligent diagnostic tools will be essential for predictive maintenance and operational safety.
The mathematical formulations and experimental results underscore ANNet’s superiority. For instance, the overall accuracy \( A \) as a function of noise level \( \text{SNR} \) can be modeled as \( A(\text{SNR}) = A_0 - \alpha e^{-\beta \cdot \text{SNR}} \), where \( A_0 \) is the clean accuracy and \( \alpha, \beta \) are positive constants. For ANNet, \( \alpha \) is smaller, indicating a smaller accuracy loss as the noise grows stronger. This robustness is critical for real-world applications where the RV reducer is subject to unpredictable disturbances. I believe that ANNet sets a new benchmark for noise-resistant fault diagnosis and can be extended to other rotating machinery beyond the RV reducer.
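One way the constants of this model could be estimated is through its log-linear form, \( \ln(A_0 - A) = \ln \alpha - \beta \cdot \text{SNR} \), fitted by ordinary least squares. The sketch below is only a self-check of that procedure: the accuracies are generated from the model itself with made-up constants and are not measurements from the experiments.

```python
import math


def fit_degradation(snr_db, accuracy, clean_accuracy):
    """Least-squares fit of ln(A0 - A) = ln(alpha) - beta * SNR."""
    xs = list(snr_db)
    ys = [math.log(clean_accuracy - a) for a in accuracy]  # requires A < A0
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    alpha = math.exp(mean_y - slope * mean_x)
    return alpha, -slope


# Synthetic check with hypothetical constants (not experimental values).
A0, TRUE_ALPHA, TRUE_BETA = 0.99, 0.3, 0.2
snrs = [15, 12, 9, 6, 3]
accs = [A0 - TRUE_ALPHA * math.exp(-TRUE_BETA * s) for s in snrs]
alpha, beta = fit_degradation(snrs, accs, A0)
print(round(alpha, 3), round(beta, 3))  # 0.3 0.2
```

In this form \( \alpha \) is the accuracy loss extrapolated to 0 dB and \( \beta \) sets how quickly the loss shrinks as the SNR improves, so two models are best compared on both constants.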
Further insights come from visualizing feature maps. In ANNet, early layers learn edge-like patterns from vibration images, while deeper layers combine them into fault-specific motifs. The Dropout operation encourages distributed representations, preventing over-reliance on any single signal point. This is analogous to ensemble learning, where multiple “sub-models” are trained on corrupted versions of data. The multi-scale block acts as a parallel ensemble, with each kernel size contributing a unique perspective on the RV reducer’s health state.
From an engineering perspective, the RV reducer’s durability is paramount, and early fault detection can prevent costly downtime. ANNet’s high accuracy under noise means it can be deployed in noisy industrial settings without requiring extensive signal preprocessing. The signal stacking method also simplifies data preparation, as raw vibration data can be directly fed into the model after segmentation. This end-to-end approach streamlines the diagnostic pipeline for RV reducers.
In summary, the key contributions of this work are: (1) proposing input Dropout for direct noise simulation in RV reducer fault diagnosis, (2) designing a multi-scale kernel module for comprehensive feature extraction, and (3) demonstrating superior performance via rigorous experiments. The ANNet model represents a significant step toward robust, deep learning-based condition monitoring for critical components like the RV reducer. As noise remains a pervasive challenge in industrial environments, such innovations will drive the advancement of intelligent maintenance systems.
