As modern societies evolve and family structures change, many families face significant pressure from demanding work and life schedules, often leaving children without adequate companionship or opportunities for parent-child interaction. Children have diverse, individualized learning and emotional needs that traditional educational resources and limited family time frequently cannot meet in full. To address this need for personalization, a companion robot must be capable of adapting its interaction and educational modes. With the advancement of technology, the mobile Internet of Things (IoT) has become deeply integrated into daily life, enabling smarter management and greater convenience. A child companion robot that leverages technologies such as voice recognition, emotion detection, and interactive design can communicate and interact with children, providing companionship, education, and entertainment. It helps strengthen a child's learning, emotional, and social skills and fills the companionship gap in everyday life. Such a robot can serve as a companion, listener, and educator for children when parents are occupied, which is significant for both child development and familial emotional bonds.
This article details the design and development of an intelligent child companion robot system based on mobile IoT technology. The system is supported by IoT frameworks, utilizing stable communication methods to achieve “interconnection and intercommunication” among all functional modules within the robot system. It incorporates features such as interactive play, voice conversation, nursery rhyme playback, picture book reading, and knowledge learning. Through engaging and fun interactions, it aims to deliver knowledge and guide learning. The primary goal is to mitigate the companionship and education challenges arising from parents’ busy work lives, thereby promoting the holistic and healthy growth of children.
System Architecture of the Companion Robot
The designed intelligent child companion robot system consists of two main components: the companion robot terminal (the physical robot) and a mobile management platform (a smartphone application). The system architecture is illustrated in the following figure, showcasing the integration of hardware and software components.

The companion robot terminal uses an STM32 microcontroller as its main controller. For motion control, an OpenMV vision module works in tandem with motor drivers to operate two 5V DC motors, enabling autonomous line-following. The voice interaction system is implemented with an LD3320 speech recognition module and a SYN6288 speech synthesis module to support natural dialogue. Early-learning scenarios (nursery rhymes, stories) are handled by a DFPlayer Mini audio module. An ESP8266 Wi-Fi module establishes a Socket network connection between the robot terminal and the mobile management platform, enabling data exchange across these heterogeneous platforms. The mobile platform provides wireless control over the robot's motion state and its educational resource playback functions.
The mobile management platform is developed in the Android Studio IDE. It comprises several functional modules: User Registration & Login, Robot Control, Early-Learning Education, and System Settings. After successful registration, users log in with their credentials to access the system. The Robot Control interface establishes a Socket connection with the robot terminal, allowing the user to command the robot's movement (forward, backward, turn), control nursery rhyme playback, adjust volume, and select operational modes. The Early-Learning Education module provides resources such as picture books, nursery rhymes, and general knowledge facts to help children develop reading habits and improve knowledge literacy.
Hardware Design of the Companion Robot Terminal
The hardware structure of the companion robot terminal is built around the STM32 microcontroller, which orchestrates all peripheral modules. A boost converter steps up the input 5V to 12V to power the L298N motor driver, which in turn drives the two 5V DC motors for locomotion.
In autonomous line-following mode, the OpenMV camera captures images of the predefined path (e.g., a 2cm wide black line). The image is processed by the OpenMV, and the resulting guidance data is sent to the STM32, which controls the motors accordingly, requiring no human intervention.
For wireless control mode, the ESP8266 module creates a Wi-Fi access point. The mobile app (client) connects to this AP, forming a Client/Server (C/S) network. Control commands from the app are sent via Socket communication to the ESP8266, which relays them to the STM32 for execution.
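For illustration, the sketch below shows how the STM32 could configure an ESP8266 running the standard Espressif AT firmware as an access point and TCP server on port 8080 (the port used in the tests later in this article). The SSID, password, and the `esp_send_cmd()` helper are assumptions for the example, not details from the original design.

```c
/* Illustrative sketch: configure the ESP8266 (AT firmware) as a Wi-Fi AP
 * and TCP server so the Android app can connect as a Socket client.
 * esp_send_cmd() is a hypothetical helper that writes the string to the
 * ESP8266 UART and waits for an "OK" response; SSID/password are examples. */
#include <stdbool.h>

bool esp_send_cmd(const char *cmd);   /* assumed to be implemented elsewhere */

bool esp8266_start_server(void)
{
    return esp_send_cmd("AT+CWMODE=2\r\n")                           /* SoftAP mode */
        && esp_send_cmd("AT+CWSAP=\"RobotAP\",\"12345678\",5,3\r\n") /* SSID, password, channel, WPA2 */
        && esp_send_cmd("AT+CIPMUX=1\r\n")                           /* allow multiple connections */
        && esp_send_cmd("AT+CIPSERVER=1,8080\r\n");                  /* TCP server on port 8080 */
}
```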
The audio subsystem uses the DFPlayer Mini module, which communicates with the STM32 via UART serial communication. It reads MP3/WAV files stored on a microSD card, organized into folders (e.g., 01 for nursery rhymes, 02 for picture books). The module receives 10-byte serial commands from the microcontroller to perform actions like play, pause, next track, and volume control.
The voice interaction subsystem comprises two key chips:
- LD3320 Speech Recognition: This chip employs non-specific person speech recognition technology. It receives audio via its built-in microphone, compares the input against a pre-loaded list of keyword phonemes internally, and outputs the recognition result to the STM32 via an SPI interface.
- SYN6288 Speech Synthesis: This module receives text data or control commands from the STM32 via UART and converts it into audible speech, which is then amplified and played through a speaker.
A visual feedback module consisting of multi-colored LEDs is integrated into the robot's enclosure. These LEDs illuminate upon successful Wi-Fi configuration, indicating a ready state, and can blink in response to received commands, enhancing the interactive experience. The primary hardware components are summarized in the table below.
| Hardware Module | Key Component / IC | Primary Function |
|---|---|---|
| Main Controller | STM32 Microcontroller | Central processing unit; coordinates all modules. |
| Motion & Vision | OpenMV Camera Module | Captures and processes visual data for line-following. |
| Motor Drive | L298N Driver + 5V DC Motors | Provides locomotion for the robot. |
| Wireless Communication | ESP8266 Wi-Fi Module | Establishes network for mobile app control (Socket). |
| Speech Recognition | LD3320 Chip | Recognizes spoken keywords from the child. |
| Speech Synthesis | SYN6288 Chip | Generates spoken responses from text. |
| Audio Playback | DFPlayer Mini Module | Plays stored audio files (songs, stories). |
| Power Management | Boost Converter Circuit | Converts 5V to 12V for motor driver. |
| User Feedback | Multi-color LEDs | Provides visual status and interaction cues. |
Software Design and Algorithm Implementation
1. Motion Control Functions
The companion robot operates in two distinct motion modes: Wireless Remote Control and Autonomous Line-Following.
A. Wireless Remote Control Mode:
The mobile app acts as a client, and the robot terminal as a server. The control algorithm on the app is straightforward. A circular control pad (D-pad) is divided into four directional sectors and a central “STOP” button. Each sector is mapped to a specific character command:
- ‘G’: Forward
- ‘B’: Backward
- ‘L’: Turn Left (Counter-clockwise)
- ‘R’: Turn Right (Clockwise)
- ‘S’: Stop
When a user touches a sector, the corresponding character is placed into a data array. This array is written to the Socket output stream and transmitted over Wi-Fi to the ESP8266 on the robot. The STM32 parses the incoming data and controls the L298N driver accordingly. For example, upon receiving ‘G’, it sets both motors to spin forward.
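A minimal sketch of the receiving side is shown below, assuming a hypothetical `motor_set(left, right)` helper that drives the L298N with signed PWM duty cycles; the speed values are illustrative, not the original firmware's.

```c
/* Illustrative command dispatcher on the STM32: maps the single-character
 * protocol ('G','B','L','R','S') to differential motor commands.
 * motor_set(left, right) is a hypothetical helper driving the L298N with
 * signed duty cycles in the range [-100, 100]. */
void motor_set(int left, int right);   /* assumed elsewhere */

void handle_command(char cmd)
{
    switch (cmd) {
    case 'G': motor_set( 60,  60); break;   /* forward  */
    case 'B': motor_set(-60, -60); break;   /* backward */
    case 'L': motor_set(-40,  40); break;   /* turn left (counter-clockwise) */
    case 'R': motor_set( 40, -40); break;   /* turn right (clockwise) */
    case 'S': motor_set(  0,   0); break;   /* stop     */
    default:  /* ignore unknown bytes */    break;
    }
}
```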
B. Autonomous Line-Following Mode:
This mode allows the companion robot to navigate a predefined black track without user input. The algorithm implemented on the OpenMV and STM32 involves several steps:
- Image Acquisition & Preprocessing: The OpenMV captures frames of the ground. Each frame is converted to grayscale, filtered to reduce noise, and undergoes thresholding to create a binary image where the black track is white (high pixel value) and the background is black.
- Region of Interest (ROI) Segmentation: The processed image is divided into five distinct zones for robust tracking: three horizontal strips in the middle of the frame and two near the top. This layout helps in handling intersections.
- Line Detection and Center Calculation: Within each ROI, the algorithm scans for the largest contiguous blob of white pixels (the track). It finds the left and right edges of this blob and calculates its center coordinate (x, y) for that region. A line is then fitted through the center points of the detected blobs across ROIs to represent the target path. The center of this fitted line, $X_{line}$, is computed.
- Deviation Calculation and Steering Correction: The key to steering is calculating the deviation between the center of the camera’s field of view and the center of the detected track. The horizontal deviation $\Delta d_x$ is given by:
$$ \Delta d_x = R_c \cdot (X_0 - X_{line}) $$
where $X_0$ is the horizontal center coordinate of the camera's preprocessed image frame, and $R_c$ is a weighting factor assigned to the target ROI to prioritize certain zones (e.g., the central zone might carry a higher weight for stability).
- Motor Control Logic: The sign and magnitude of $\Delta d_x$ determine the corrective action. If $\Delta d_x$ is positive and large, the robot steers right; if negative and large, it steers left. If the magnitude is small, the robot moves straight ahead. This error signal is fed into proportional control logic on the STM32, which adjusts the PWM signals to the left and right motors differentially (see the sketch after this list).
- Intersection and Terminal Detection: At a cross or T intersection, the detection of significant white blobs in specific adjacent ROIs triggers a predefined turning routine. The finish line is marked by a wider pattern (e.g., 4 cm wide black dashes). The algorithm counts the number of dash blobs detected in the central and upper ROIs; if the count exceeds a set threshold $N_{threshold}$, it determines that the robot has reached the destination.
$$ \text{If } (Count_{blobs} > N_{threshold}) \rightarrow \text{Terminate motion.} $$
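For concreteness, the following sketch shows one way the deviation calculation and proportional steering described above could run on the STM32, assuming the OpenMV has already extracted the fitted line center $X_{line}$ and the dash-blob count and reported them over the serial link. The constants ($R_c = 0.91$ from the test section, the dead band, the gain) and the `motor_set()` helper are illustrative assumptions, not the original firmware.

```c
/* Illustrative proportional line-following step, assuming the OpenMV sends
 * the fitted track center x_line and the number of detected dash blobs.
 * All constants and helper names are assumptions for illustration. */
#define X_CENTER      160    /* X0: horizontal center of the preprocessed frame */
#define ROI_WEIGHT    0.91f  /* Rc: weight of the target ROI */
#define DEAD_BAND     5.0f   /* |deviation| below this -> drive straight */
#define KP            0.8f   /* proportional gain */
#define BASE_SPEED    60     /* base PWM duty (percent) */
#define N_THRESHOLD   6      /* dash-blob count marking the finish line */

void motor_set(int left, int right);                 /* assumed elsewhere */

void line_follow_step(int x_line, int blob_count)
{
    if (blob_count > N_THRESHOLD) {                  /* finish line reached */
        motor_set(0, 0);
        return;
    }

    float deviation = ROI_WEIGHT * (float)(X_CENTER - x_line);

    if (deviation > DEAD_BAND) {                     /* large positive error: steer right (per the text's convention) */
        int corr = (int)(KP * deviation);
        motor_set(BASE_SPEED + corr, BASE_SPEED - corr);
    } else if (deviation < -DEAD_BAND) {             /* large negative error: steer left */
        int corr = (int)(KP * -deviation);
        motor_set(BASE_SPEED - corr, BASE_SPEED + corr);
    } else {
        motor_set(BASE_SPEED, BASE_SPEED);           /* small error: go straight */
    }
}
```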
2. Voice Interaction Function
The voice interaction for the companion robot is designed to be robust against accidental activation in noisy environments. It employs a two-tier keyword system:
- Primary Keyword (Wake-up): The system constantly listens for a specific wake-up word, e.g., “Kamil” (卡缪). Only when this is correctly recognized by the LD3320 module does the robot activate its full listening mode. The STM32 then commands the SYN6288 to respond with a confirmation like “I’m here, master. Please give your command.”
- Secondary Keywords (Commands): After the wake-up, the system listens for a set of secondary command keywords (e.g., “Tell me a story,” “What’s the time?”, “Play a song”). Upon recognizing one, the robot executes the associated action or provides a synthesized verbal reply.
- Exit Command: A specific keyword like “Goodbye” makes the robot say a farewell phrase and reverts its listening state back to waiting for the primary wake-up word only.
This design minimizes false positives: casual background noise is unlikely to contain both the precise primary keyword and a valid secondary keyword in sequence, making the companion robot's interaction more reliable and natural for the child.
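The two-tier logic amounts to a small state machine. The sketch below is an illustrative C version, assuming the LD3320 driver delivers each recognized keyword as an index to a callback and that `syn6288_say()` wraps the SYN6288 text-to-speech command; neither name is taken from the original firmware.

```c
/* Illustrative two-tier keyword state machine. Keyword indices and helper
 * functions (syn6288_say, play_story, ...) are assumptions for illustration. */
typedef enum { STATE_IDLE, STATE_AWAKE } voice_state_t;

enum { KW_WAKEUP = 0, KW_STORY, KW_TIME, KW_SONG, KW_GOODBYE };

void syn6288_say(const char *text);   /* assumed TTS wrapper */
void play_story(void);
void play_song(void);
void report_time(void);

static voice_state_t state = STATE_IDLE;

void on_keyword(int kw)
{
    if (state == STATE_IDLE) {
        if (kw == KW_WAKEUP) {                       /* primary (wake-up) keyword only */
            syn6288_say("I'm here, master. Please give your command.");
            state = STATE_AWAKE;
        }
        return;                                      /* ignore everything else while idle */
    }

    switch (kw) {                                    /* secondary command keywords */
    case KW_STORY:   play_story();  break;
    case KW_SONG:    play_song();   break;
    case KW_TIME:    report_time(); break;
    case KW_GOODBYE: syn6288_say("Goodbye!");
                     state = STATE_IDLE;             /* back to wake-word-only listening */
                     break;
    default:         break;
    }
}
```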
3. Early-Learning Interaction
The early-learning function is managed through the DFPlayer Mini module. Audio files are categorized and stored in numbered folders on the microSD card (01: Nursery Rhymes, 02: Picture Books, 03: Knowledge Encyclopedia). Control is achieved via structured 10-byte UART commands from the STM32. A typical command frame structure is:
7E FF 06 0F 00 [Param1] [Param2] [Checksum_H] [Checksum_L] EF
Where:
- Param1: the folder number (01, 02, or 03).
- Param2: the track number within that folder (e.g., 01 for the first song).
For instance, the command 7E FF 06 0F 00 01 01 FE EA EF instructs the module to play track 01 from folder 01 (the first nursery rhyme). Commands from the mobile app are translated into these serial frames by the STM32, allowing seamless playback control of diverse educational content and making the companion robot a versatile learning aid.
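As a sketch of how such a frame might be assembled on the STM32: the checksum below is computed as the two's complement of the sum of bytes 2 through 7 (version through the low parameter byte), which is the convention commonly documented for the DFPlayer Mini; `uart_write()` is a hypothetical helper.

```c
/* Illustrative DFPlayer Mini frame builder: plays a given track in a given
 * folder (command 0x0F). uart_write() is a hypothetical helper that sends
 * the buffer over the UART connected to the DFPlayer. */
#include <stdint.h>

void uart_write(const uint8_t *buf, int len);   /* assumed elsewhere */

void dfplayer_play_folder_track(uint8_t folder, uint8_t track)
{
    uint8_t frame[10] = {
        0x7E, 0xFF, 0x06, 0x0F, 0x00,           /* start, version, length, cmd, no ACK */
        folder, track,                           /* Param1 = folder, Param2 = track */
        0x00, 0x00,                              /* checksum placeholder */
        0xEF                                     /* end byte */
    };

    /* Checksum = two's complement of the sum of bytes 2..7 (version..params). */
    uint16_t sum = 0;
    for (int i = 1; i <= 6; i++)
        sum += frame[i];
    uint16_t checksum = (uint16_t)(0 - sum);
    frame[7] = (uint8_t)(checksum >> 8);
    frame[8] = (uint8_t)(checksum & 0xFF);

    uart_write(frame, 10);                       /* e.g. folder 01, track 01 -> ... FE EA EF */
}
```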
Mobile Management Platform Development
The Android application serves as the primary user interface for parents and children to interact with the companion robot. Its development followed a modular architecture.
1. User Authentication Module: Implements a secure login and registration system using local SQLite database storage for user credentials. Features include password visibility toggle and input validation.
2. Robot Control Module – The D-pad Interface: The core control interface is a custom-view circular D-pad. The drawing and touch-handling algorithm involves:
- Calculating the center point $(C_x, C_y)$ of the available screen area.
- Drawing a large outer ring divided into four 90-degree arcs (sectors) corresponding to Forward, Backward, Left, and Right.
- Drawing a smaller central circle labeled “STOP”.
- Implementing an OnTouchListener to determine the touch point $(T_x, T_y)$.
- Calculating the distance $d$ from the touch point to the center:
$$ d = \sqrt{(T_x - C_x)^2 + (T_y - C_y)^2} $$
- If $d$ is less than the central circle’s radius, the “STOP” command (‘S’) is triggered.
- If $d$ is greater, the angle $\theta$ of the touch point relative to the center is calculated:
$$ \theta = \operatorname{atan2}(T_y - C_y,\ T_x - C_x) $$
This angle, adjusted for the screen coordinate system, determines which directional sector was pressed; the corresponding character (‘G’, ‘B’, ‘L’, ‘R’) is then sent via the Socket output stream, as illustrated in the sketch below.
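Although the app itself is an Android application, the sector-selection geometry is language-neutral; the following C sketch illustrates it under the assumption of screen coordinates in which the y-axis grows downward (so 90° points toward the bottom of the screen). The function name and sector boundaries are illustrative.

```c
/* Illustrative D-pad sector selection in screen coordinates (y grows
 * downward). Returns the command character for a touch at (tx, ty), given
 * the pad center (cx, cy) and the radius of the inner "STOP" circle. */
#include <math.h>

char dpad_command(float tx, float ty, float cx, float cy, float stop_radius)
{
    float dx = tx - cx, dy = ty - cy;
    float d  = sqrtf(dx * dx + dy * dy);

    if (d < stop_radius)
        return 'S';                                    /* central STOP circle */

    float deg = atan2f(dy, dx) * (180.0f / 3.14159265f); /* -180..180, 0 = right, 90 = down */

    if (deg >= -45.0f  && deg <  45.0f)  return 'R';   /* right sector            */
    if (deg >=  45.0f  && deg < 135.0f)  return 'B';   /* bottom sector: backward */
    if (deg >= -135.0f && deg < -45.0f)  return 'G';   /* top sector: forward     */
    return 'L';                                        /* remaining sector: left  */
}
```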
3. Early-Learning Module: This module has two main parts:
- Picture Book Database: Uses SQLite to manage a collection of daily reading materials. A table `book` is created with the columns `id` (INTEGER PRIMARY KEY), `author` (TEXT), `name` (TEXT), and `readtime` (REAL). Users can browse, search, and select books; a query like `SELECT * FROM book;` fetches all entries for display.
- Audio Player Interface: A separate screen mimics a standard media player with circular buttons for Play/Pause, Previous, Next, and a mode switcher to toggle between “Nursery Rhymes” and “Children’s Stories”. This interface sends high-level commands to the robot, which are then executed via the DFPlayer module.
System Integration and Testing
Comprehensive testing was conducted to validate the functionality and performance of the companion robot system.
1. Motion Function Test
A. Wireless Control Test: The robot’s Wi-Fi AP was activated (indicated by LEDs). The mobile app connected to the IP 192.168.4.1:8080. Each directional command on the D-pad was pressed sequentially. The robot’s response was observed and the command-to-action latency was measured. The results from a sample test run are tabulated below, showing consistent low-latency response, which is crucial for a responsive companion robot.
| Step | Command (Action) | Observed Response Time (s) |
|---|---|---|
| 1 | Forward (‘G’) | 1.0 |
| 2 | Left Turn (‘L’) | 1.0 |
| 3 | Right Turn (‘R’) | 1.0 |
| 4 | Backward (‘B’) | 2.0 |
| 5 | Stop (‘S’) | 2.0 |
B. Autonomous Line-Following Test: The robot was placed on a track with straight lines, crosses, and a dashed finish line. The OpenMV algorithm successfully calculated deviations. For instance, in a straight run, measured values were $X_{line}=156$, $X_0=160$, $R_c=0.91$, yielding $\Delta d_x = 0.91 \times (160-156) = 3.64$, a small error leading to minor steering correction. At a cross intersection, $X_{line}$ was found to be 160 (equal to $X_0$), correctly triggering the turn routine. At the finish line with 10 dashes against a threshold $N_{threshold}=6$, the condition $10 > 6$ was true, causing the robot to stop successfully.
2. Voice Interaction and Early-Learning Test
A. Voice Interaction: The primary keyword “Kamil” was spoken, and the robot reliably responded with its wake-up phrase. Subsequent secondary commands like “What is your name?” were correctly recognized and answered via speech synthesis, as verified by monitoring the STM32’s debug UART output connected to a PC serial terminal.
B. Early-Learning: Commands sent from the mobile app’s learning module were received by the robot. The STM32 parsed these commands and sent the appropriate 10-byte sequences to the DFPlayer Mini. The module correctly played the requested audio files from the designated folders (nursery rhymes, picture book narrations, knowledge facts), confirming full integration of the educational pipeline within the companion robot.
Conclusion
This project successfully designed and developed a functional prototype of an intelligent child companion robot based on mobile IoT technology. The system combines an appealing exterior design with multi-modal interaction through sound and light. The two-tier keyword scheme for voice recognition significantly improves recognition accuracy in everyday environments. The C/S network architecture, using Socket communication over Wi-Fi, provides a stable and responsive channel for mobile app control. The autonomous navigation capability, powered by computer vision algorithms for line-following and terminal detection, adds an element of independent operation. Furthermore, the structured audio system offers a rich set of early-learning resources that can be controlled intuitively from the mobile platform. This companion robot addresses a genuine social need by providing educational support and emotional companionship to children when parental attention is limited. As IoT and AI technologies continue to evolve, the intelligence and interactivity of such robotic companions will deepen, holding great potential to positively impact child development and family dynamics.
