Intelligent Robot Abnormality Detection in Libraries: A Robust Data Mining Framework

Modern library services are increasingly augmented by intelligent robot assistants. These systems integrate artificial intelligence, the Internet of Things (IoT), and sensor technologies to handle tasks such as automated book retrieval, inventory management, shelf organization, and user guidance, perceiving and responding to the library environment as they work. Over its operational lifecycle, however, an intelligent robot may enter various abnormal states, ranging from mechanical wear and sensor degradation to software glitches and environmental interference. Left undetected, these abnormalities can interrupt service, damage library materials, or degrade the user experience. A library is also a complex and variable operating environment, with fluctuating human traffic, changing light conditions, and potential electromagnetic interference from electronic devices. This complexity often produces incomplete or noisy sensor data, which makes reliable fault diagnosis difficult. An automated detection system that can accurately identify the abnormal states of a library intelligent robot under such non-ideal conditions is therefore essential for service continuity and operational safety.

Traditional threshold-based or rule-based monitoring systems often lack the adaptability to handle the multivariate and correlated nature of sensor data from an intelligent robot. They struggle with false alarms in noisy environments and may fail to detect incipient faults. Data mining techniques, which focus on extracting hidden patterns and knowledge from large datasets, offer a powerful alternative. Among these, the Naive Bayes algorithm is particularly noted for its simplicity, computational efficiency, and inherent robustness to irrelevant features and noisy data. Its probabilistic foundation allows it to make inferences even when some data attributes are missing, a common scenario in real-world deployments. This property makes it well suited to constructing a stable and reliable anomaly detection system for a library intelligent robot operating under complex conditions. This article details the design and implementation of an automated detection system for an intelligent robot’s abnormal states, using a data mining approach centered on the Naive Bayes classifier to achieve high accuracy and resilience against data incompleteness.

System Architecture for Intelligent Robot Anomaly Detection

The proposed system is designed as a distributed, embedded solution that operates in real time on the intelligent robot platform. Its primary function is to continuously monitor the robot’s health, diagnose anomalies, and initiate fail-safe procedures. The overall architecture integrates hardware sensing, data processing, intelligent analysis, and human-machine interaction modules.

Hardware Composition and Data Acquisition

At the core of the monitoring layer is a custom-designed Operational Status Collector. This unit is responsible for gathering multimodal sensor data that comprehensively reflects the state of the intelligent robot. A high-performance STM32 series microcontroller acts as the local hub, managing data collection from the following suite of sensors:

Primary Sensors in the **Intelligent Robot** Status Collector
Sensor Type	Model/Series	Measured Parameter	Purpose in Anomaly Detection
Vibration Sensor	356A16	Acceleration (g)	Detect mechanical imbalance, bearing wear, or unexpected collisions.
Temperature Sensor	DS18B20	Temperature (°C)	Monitor motor overheating, electronic component failure, or environmental extremes.
Optical Encoder	ENC-534	Position & Speed (pulses)	Identify locomotion errors, wheel slippage, or blocked movement.
Torque Sensor	FS Series	Torque (N·m)	Assess load anomalies, jamming in manipulators, or excessive force.
Inertial Measurement Unit (IMU)	Integrated (Gyro+Accel)	Pose, Angular Rate	Detect tipping, unstable navigation, or abnormal orientation.

The collector’s workflow is systematic. Upon initialization, it reads its unique ID. When polled by the central controller, it sequentially samples all sensors, packages the data (including the ID and timestamp), and transmits it via a communication interface (e.g., CAN or UART) to the main system’s Embedded Controller. A dedicated Power Management Unit monitors lithium battery levels, which is itself a critical parameter for detecting low-power abnormal states. A voice alarm unit provides immediate audible alerts for critical faults.
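The packaging step in this workflow can be illustrated with a minimal sketch. The article does not specify a frame layout, so the byte order, field widths, and field names below are all hypothetical and chosen only for illustration.

```python
import struct

# Hypothetical frame layout (not specified in the article):
# collector ID (uint16), timestamp in ms (uint64), six float32 readings,
# little-endian with no padding.
FRAME_FORMAT = "<HQ6f"
FIELD_NAMES = ("vibration_g", "temperature_c", "encoder_pulses",
               "torque_nm", "gyro_z_dps", "battery_v")


def pack_frame(collector_id, timestamp_ms, readings):
    """Serialize one polling cycle into a fixed-size binary frame."""
    values = [float(readings[name]) for name in FIELD_NAMES]
    return struct.pack(FRAME_FORMAT, collector_id, timestamp_ms, *values)


def unpack_frame(frame):
    """Decode a frame back into (collector_id, timestamp_ms, readings)."""
    collector_id, timestamp_ms, *values = struct.unpack(FRAME_FORMAT, frame)
    return collector_id, timestamp_ms, dict(zip(FIELD_NAMES, values))
```

A fixed-size frame keeps parsing on the Embedded Controller trivial. A real CAN deployment would likely have to split such a frame across several messages, which this sketch does not attempt.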

Central Processing and Control Logic

The Embedded Controller serves as the brain of the detection system. Its key components and their functions are outlined below:

Information Processing Module: Receives raw sensor data and performs essential preprocessing. This includes noise filtering (e.g., using a moving average or band-pass filter for vibration signals), normalization, and handling of missing values through interpolation or flagging.
Core Detection Algorithm (Naive Bayes): Implements the data mining model that classifies the preprocessed sensor feature vector into one of several predefined abnormal or normal states.
Decision & Control Interface: Upon detection of a critical anomaly, this module can generate a command to safely halt the intelligent robot. This command is sent via an I/O interface to the robot’s primary motion controller (e.g., a Yaskawa or Panasonic servo system), instructing it to stop the motors and prevent further damage.
Auxiliary Modules: These include a File Operation module for logging data, a Wireless Communication module (e.g., Huawei EC1308) for remote monitoring, a Debug Terminal for system maintenance, and an Information Display module to present statuses to librarians or technicians.
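The preprocessing steps listed for the Information Processing Module can be sketched as follows. This is a minimal illustration using only the Python standard library. The function names, the window size, and the use of `None` to mark a missing sample are assumptions made for this sketch, not details from the system itself.

```python
def fill_missing(samples):
    """Linearly interpolate None gaps and return (filled, flags).

    flags[i] is True where the original sample was missing, so later
    stages can still treat that reading as unreliable.
    """
    flags = [s is None for s in samples]
    known = [i for i, s in enumerate(samples) if s is not None]
    if not known:
        raise ValueError("no valid samples to interpolate from")
    filled = list(samples)
    for i, missing in enumerate(flags):
        if not missing:
            continue
        left = max((k for k in known if k < i), default=None)
        right = min((k for k in known if k > i), default=None)
        if left is None:            # gap at the start: copy next valid value
            filled[i] = samples[right]
        elif right is None:         # gap at the end: hold last valid value
            filled[i] = samples[left]
        else:
            frac = (i - left) / (right - left)
            filled[i] = samples[left] + frac * (samples[right] - samples[left])
    return filled, flags


def moving_average(signal, window=5):
    """Trailing moving average; early samples use a shorter window."""
    smoothed, running_sum = [], 0.0
    for i, value in enumerate(signal):
        running_sum += value
        if i >= window:
            running_sum -= signal[i - window]
        smoothed.append(running_sum / min(i + 1, window))
    return smoothed


def min_max_normalize(samples, low, high):
    """Scale to [0, 1] using known sensor limits, clipping out-of-range values."""
    span = high - low
    return [min(1.0, max(0.0, (s - low) / span)) for s in samples]
```

Returning the missing-value flags alongside the interpolated signal lets the classifier choose between using the filled value and ignoring that feature.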

Data Mining Foundation: The Naive Bayes Classifier

The selection of the Naive Bayes algorithm is pivotal for the system’s robustness. Its “naive” assumption of conditional independence between features given the class label simplifies computation and, counterintuitively, often yields excellent performance in practice, especially when data is scarce or noisy. This characteristic is ideal for an intelligent robot operating in a dynamic library where sensor readings may be intermittently unreliable.

Let the processed feature vector from the intelligent robot sensors at time t be represented as:
$$ \mathbf{X} = \{ x_1, x_2, x_3, \ldots, x_n \} $$
where each $ x_i $ corresponds to a specific feature (e.g., vibration RMS value, average temperature, encoder deviation).

The system is trained to recognize a set of m possible states:
$$ \mathbf{Y} = \{ y_1, y_2, \ldots, y_m \} $$
This set includes the “normal” state and various abnormal states such as “overheat,” “vibration_anomaly,” “low_battery,” etc.

The goal of the detection system is to find the state $ y_j $ that has the highest posterior probability given the observed features $\mathbf{X}$. Using Bayes’ theorem:
$$ P(y_j | \mathbf{X}) = \frac{P(\mathbf{X} | y_j) \cdot P(y_j)}{P(\mathbf{X})} $$
For classification, the denominator $ P(\mathbf{X}) $ is the same for every candidate state and can be dropped. Therefore, we compute:
$$ \hat{y} = \arg \max_{y_j \in Y} \left[ P(y_j) \cdot P(\mathbf{X} | y_j) \right] $$
Where:

$ P(y_j) $ is the prior probability of state $ y_j $, estimated from the training data as:
$$ P(y_j) = \frac{N_{y_j}}{N_{total}} $$
with $ N_{y_j} $ being the count of training samples for state $ y_j $.
$ P(\mathbf{X} | y_j) $ is the likelihood of observing the feature set $\mathbf{X}$ given state $ y_j $. Applying the naive conditional independence assumption:
$$ P(\mathbf{X} | y_j) = \prod_{i=1}^{n} P(x_i | y_j) $$

The estimation of $ P(x_i | y_j) $ depends on the nature of the feature $ x_i $:

For discrete/categorical features (e.g., a binned vibration level):
$$ P(x_i | y_j) = \frac{N_{x_i, y_j}}{N_{y_j}} $$
where $ N_{x_i, y_j} $ is the number of times feature $ x_i $ appears in samples of class $ y_j $.
For continuous features (most sensor readings), we typically assume a Gaussian (Normal) distribution:
$$ P(x_i | y_j) = \frac{1}{\sqrt{2\pi\sigma_{i,j}^2}} \exp\left(-\frac{(x_i - \mu_{i,j})^2}{2\sigma_{i,j}^2}\right) $$
Here, $ \mu_{i,j} $ and $ \sigma_{i,j} $ are the mean and standard deviation of feature $ x_i $ over all training samples belonging to state $ y_j $.
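The two likelihood estimates above can be written as small functions. This is a sketch. The `alpha` and `n_values` parameters are an addition not found in the original formulation: with the default `alpha=0.0` the discrete estimate is exactly the relative frequency given above, while a positive `alpha` applies additive (Laplace) smoothing so that a value never seen in a class does not force the whole product to zero.

```python
import math


def categorical_likelihood(count_value_in_class, count_class,
                           n_values=1, alpha=0.0):
    """Estimate P(x_i | y_j) for a discrete feature from counts.

    alpha=0.0 gives the plain relative frequency N_{x_i,y_j} / N_{y_j}.
    alpha>0 adds smoothing over n_values possible feature values
    (an assumption of this sketch, not part of the original method).
    """
    return (count_value_in_class + alpha) / (count_class + alpha * n_values)


def gaussian_likelihood(x, mean, std):
    """Gaussian density of x given the class-conditional mean and std."""
    variance = std ** 2
    exponent = -((x - mean) ** 2) / (2.0 * variance)
    return math.exp(exponent) / math.sqrt(2.0 * math.pi * variance)
```

The Gaussian term is a density rather than a probability, so it can exceed 1 when the standard deviation is small. This does not affect the arg max comparison between states.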

Thus, the final intelligent robot abnormality detector is implemented as:
$$ \hat{y}(\mathbf{X}) = \arg \max_{y_j \in Y} \left[ P(y_j) \cdot \prod_{i=1}^{n} P(x_i | y_j) \right] $$
In practice, to avoid underflow from multiplying many small probabilities, the calculation is performed in the log domain:
$$ \hat{y}(\mathbf{X}) = \arg \max_{y_j \in Y} \left[ \log P(y_j) + \sum_{i=1}^{n} \log P(x_i | y_j) \right] $$
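The log-domain decision rule can be sketched in a few lines of standard-library Python, assuming all features are continuous and modeled as Gaussian. The training values, feature order (temperature in °C, vibration RMS in g), and the `var_floor` guard against zero variance are illustrative assumptions, not data or settings from the experiments.

```python
import math
from collections import defaultdict

# Hypothetical toy training set: [temperature_c, vibration_rms_g]
TRAIN_X = [[40, 0.20], [42, 0.25], [41, 0.22],
           [75, 0.24], [80, 0.20], [78, 0.26],
           [43, 1.10], [41, 1.30], [42, 1.20]]
TRAIN_Y = ["normal"] * 3 + ["overheat"] * 3 + ["vibration_anomaly"] * 3


def fit_gaussian_nb(samples, labels, var_floor=1e-9):
    """Estimate log priors and per-class, per-feature mean and variance."""
    grouped = defaultdict(list)
    for x, y in zip(samples, labels):
        grouped[y].append(x)
    model = {}
    for y, rows in grouped.items():
        n = len(rows)
        columns = list(zip(*rows))
        means = [sum(col) / n for col in columns]
        variances = [max(sum((v - m) ** 2 for v in col) / n, var_floor)
                     for col, m in zip(columns, means)]
        model[y] = (math.log(n / len(samples)), means, variances)
    return model


def log_scores(model, x):
    """log P(y_j) + sum_i log P(x_i | y_j) for every state y_j."""
    scores = {}
    for y, (log_prior, means, variances) in model.items():
        score = log_prior
        for value, m, var in zip(x, means, variances):
            score += -0.5 * math.log(2.0 * math.pi * var)
            score -= (value - m) ** 2 / (2.0 * var)
        scores[y] = score
    return scores


def predict(model, x):
    """Return the state with the highest log-domain score."""
    scores = log_scores(model, x)
    return max(scores, key=scores.get)
```

Calling `predict(fit_gaussian_nb(TRAIN_X, TRAIN_Y), [79, 0.23])` selects the `"overheat"` state on this toy data. Working with sums of logs means a very unlikely feature value contributes a large negative term instead of collapsing the product to zero.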

System Implementation and Experimental Validation

To validate the proposed system, a prototype was developed and tested with a library intelligent robot platform. The robot’s specifications are summarized below:

Specifications of the Test Library **Intelligent Robot**
Parameter	Value	Parameter	Value
Power Voltage	48 V DC	Navigation Speed	0.5 m/s
Maximum Payload	200 kg	Battery Capacity	10 Ah
Positioning Accuracy	≤ 5 cm	Noise Level	≤ 50 dB
Continuous Operation	8 hours	Max Climbing Angle	15°

The system was trained on a dataset comprising both normal operational data and data recorded during induced faults corresponding to common abnormal states. The target abnormal states for detection are defined as follows:

Defined Abnormal States for the Library **Intelligent Robot**
State ID	Abnormal State	Key Indicative Sensor(s)
1	Motor Overheat	Temperature (DS18B20 near motors)
2	Locomotion Error	Encoder (ENC-534), IMU
3	Low Battery	Power Management Unit
4	Excessive Vibration	Vibration Sensor (356A16)
5	Communication Loss	Network/Wireless Module
6	Mechanical Jam/Fault	Torque Sensor (FS), Encoder
7	Navigation Drift	IMU, Encoder, Position Data
8	System Software Error	Internal Logs, Heartbeat Signals

The data preprocessing stage is crucial. Raw sensor data, such as the unfiltered vibration signal, contains significant noise. Applying digital filters (e.g., a Butterworth low-pass filter) cleans the signal and reveals the underlying trends needed for accurate feature extraction (e.g., RMS, kurtosis, peak frequency). These extracted features form the vector $\mathbf{X}$ for the Naive Bayes classifier.
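The three example features named above can be computed from an already filtered window of vibration samples as sketched below. The sketch uses only the standard library and a direct discrete Fourier transform, which is quadratic in the window length. It does not implement the Butterworth filter, and the function names are illustrative.

```python
import cmath
import math


def rms(signal):
    """Root mean square amplitude of the window."""
    return math.sqrt(sum(v * v for v in signal) / len(signal))


def kurtosis(signal):
    """Fourth standardized moment (non-excess); returns 0.0 for a flat signal."""
    n = len(signal)
    mean = sum(signal) / n
    m2 = sum((v - mean) ** 2 for v in signal) / n
    if m2 == 0:
        return 0.0
    m4 = sum((v - mean) ** 4 for v in signal) / n
    return m4 / (m2 ** 2)


def peak_frequency(signal, sample_rate_hz):
    """Frequency in Hz of the strongest non-DC component (direct DFT)."""
    n = len(signal)
    best_bin, best_mag = 0, -1.0
    for k in range(1, n // 2 + 1):
        coeff = sum(v * cmath.exp(-2j * math.pi * k * t / n)
                    for t, v in enumerate(signal))
        if abs(coeff) > best_mag:
            best_bin, best_mag = k, abs(coeff)
    return best_bin * sample_rate_hz / n


def extract_features(signal, sample_rate_hz):
    """Assemble one feature vector from a filtered vibration window."""
    return [rms(signal), kurtosis(signal), peak_frequency(signal, sample_rate_hz)]
```

On an embedded target an FFT routine would normally replace the direct DFT. The version here is kept dependency-free for readability.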

Performance Metrics and Robustness Testing

The primary metric for evaluating the binary and multi-class classification performance of the intelligent robot anomaly detector is the Matthews Correlation Coefficient (MCC). MCC is considered a balanced measure even when class sizes are disparate and is defined as:
$$ \mathrm{MCC} = \frac{TP \times TN - FP \times FN}{\sqrt{(TP+FP)(TP+FN)(TN+FP)(TN+FN)}} $$
where TP, TN, FP, FN are True Positives, True Negatives, False Positives, and False Negatives, respectively. MCC values range from -1 (total disagreement) to +1 (perfect prediction).
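For the binary case, the MCC definition translates directly into code. The sketch below returns 0.0 when the denominator is zero. That is a common convention, not something stated in the definition above, and the confusion-matrix counts in the usage note are made up for illustration.

```python
import math


def matthews_corrcoef(tp, tn, fp, fn):
    """Matthews Correlation Coefficient from binary confusion-matrix counts."""
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denominator == 0:
        # Assumed convention: undefined MCC is reported as 0.0.
        return 0.0
    return (tp * tn - fp * fn) / denominator
```

For example, `matthews_corrcoef(90, 85, 15, 10)` is roughly 0.75, a perfect detector with no false positives or false negatives scores 1.0, and a detector that gets every sample wrong scores -1.0.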

The system’s robustness was tested under three challenging environmental conditions simulating real library scenarios, with varying proportions of artificially introduced, randomly missing data:

Lighting Fluctuation Environment: Simulates sudden changes affecting optical sensors.
Acoustic Noise Interference Environment: Simulates high background noise potentially affecting vibration and acoustic emission analysis.
Strong Electromagnetic Interference (EMI) Environment: Simulates interference from other devices that can corrupt sensor readings and communications.

The experimental results, measuring the MCC of the detection system under increasing proportions of missing data in the input feature vector $\mathbf{X}$, are presented below. The system’s ability to handle missing data is inherent to the Naive Bayes framework, as the likelihood calculation for a missing feature $x_i$ can be simply omitted (treated as a probability of 1) for that feature, relying on the remaining evidence.
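The omission rule follows from marginalizing the unobserved features out of the likelihood. As a sketch, let $ O \subseteq \{1, \ldots, n\} $ denote the indices of the features actually observed (a symbol introduced here, not part of the original notation), and assume that whether a value is missing carries no information about the state. Under the naive independence assumption:
$$ P(\mathbf{X}_O | y_j) = \prod_{i \in O} P(x_i | y_j) \prod_{i \notin O} \int P(x_i | y_j) \, dx_i = \prod_{i \in O} P(x_i | y_j) $$
because each class-conditional density integrates to 1. The log-domain decision rule then sums over the observed features only:
$$ \hat{y}(\mathbf{X}_O) = \arg \max_{y_j \in Y} \left[ \log P(y_j) + \sum_{i \in O} \log P(x_i | y_j) \right] $$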

Detection Robustness: MCC under Missing Data and Environmental Stress
Missing Data Proportion	Lighting Fluctuation (MCC)	Noise Interference (MCC)	Strong EMI (MCC)
0% (Baseline)	0.982	0.978	0.975
5%	0.967	0.960	0.955
10%	0.958	0.945	0.938
15%	0.949	0.932	0.923
20%	0.942	0.925	0.916

The results demonstrate the stability of the data mining-based detection system. Even under the most strenuous condition (Strong EMI) with 20% of the input features missing, the MCC remains above 0.91, indicating a highly reliable detection capability. Performance under lighting fluctuation and noise interference is higher still at every missing-data level. This resilience is attributed to the probabilistic nature of the Naive Bayes algorithm, which does not require a complete feature set to make a prediction. Instead, it uses the available, reliable evidence from the other sensors to infer the state of the intelligent robot. This property is critical for an intelligent robot that must function autonomously in a non-laboratory setting where sensor malfunctions or transient interference are inevitable.

Discussion and Conclusion

The designed and implemented system presents a comprehensive solution for the health monitoring of a library intelligent robot. By integrating a multimodal sensor array with a data mining core based on the Naive Bayes classifier, the system achieves automated, real-time detection of multiple abnormal states. The architecture not only diagnoses faults but also integrates with the intelligent robot’s control system to initiate safe shutdown procedures, thereby preventing cascading failures.

The key innovation lies in the explicit design for robustness against data incompleteness, a common yet frequently overlooked challenge in real-world robotic applications. The experimental validation under simulated harsh conditions (variable lighting, acoustic noise, and EMI) supports this robustness: the Naive Bayes detector’s MCC remains above 0.9 even with up to 20% of the input data randomly missing. By contrast, more complex models may overfit or become unstable with incomplete inputs. This makes the system particularly suitable for the long-term, unattended operation of an intelligent robot in public spaces like libraries.

Future work may focus on expanding the system’s capabilities. This could include:

Implementing online incremental learning to allow the Naive Bayes model to adapt slowly to gradual changes in the intelligent robot’s performance due to aging (concept drift).
Integrating more advanced feature extraction techniques from the time-frequency domain (e.g., using wavelet transforms on vibration data) to detect more subtle incipient faults.
Developing a hierarchical diagnostic model where the Naive Bayes classifier acts as a first-level alarm, triggering more detailed, model-based diagnostics for root cause analysis.
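As one possible starting point for the incremental-learning direction, the Gaussian parameters of a single feature in a single state can be updated one sample at a time with Welford's algorithm, without storing past data. This is a sketch of a standard technique, not a component of the implemented system, and the class name is hypothetical.

```python
class RunningGaussian:
    """Running mean and population variance of one feature for one state."""

    def __init__(self):
        self.count = 0
        self.mean = 0.0
        self._m2 = 0.0  # sum of squared deviations from the current mean

    def update(self, x):
        """Fold one new labeled sample into the estimates (Welford update)."""
        self.count += 1
        delta = x - self.mean
        self.mean += delta / self.count
        self._m2 += delta * (x - self.mean)

    @property
    def variance(self):
        """Population variance of the samples seen so far (0.0 if none)."""
        return self._m2 / self.count if self.count else 0.0
```

This update weights all past samples equally. Tracking gradual drift would likely need a forgetting factor or a sliding window on top of it, a design choice left open here.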

In conclusion, this data mining-based framework provides a reliable, efficient, and practical foundation for ensuring the operational integrity and safety of intelligent robot systems in libraries and similar dynamic human-centric environments.