In recent years, rapid advances in robot technology have propelled embodied intelligence to new heights, enabling intelligent agents to perceive their surroundings, acquire information, and execute tasks autonomously. Embodied intelligent robots, which integrate artificial intelligence into physical platforms, are increasingly deployed in diverse fields such as construction sites, security patrols, emergency response, and industrial automation. A critical enabler for these applications is Simultaneous Localization and Mapping (SLAM), the foundational technology that allows a robot to navigate unknown environments by estimating its pose and building a map in real time. Traditional SLAM algorithms assume a static world, yet real scenes are often dynamic: moving objects such as pedestrians, vehicles, and animals degrade localization accuracy and leave residual artifacts in the map. This review surveys research on 3D LiDAR SLAM in dynamic environments, focusing on methods for detecting and removing dynamic objects, strategies for handling varying degrees of dynamics, evaluation metrics, datasets, and future directions. I will also examine how advances in robot technology are addressing these challenges, emphasizing the integration of semantic segmentation, ray tracing, and visibility-based approaches to improve robustness.

The core of this review lies in the methodologies for dynamic object removal from LiDAR point clouds. Dynamic objects can severely compromise a SLAM system by corrupting pose estimation and leaving ghosting artifacts in the map. Based on their detection principle, I categorize these methods into three main types: semantic segmentation-based, ray tracing-based, and visibility-based approaches. Semantic segmentation methods leverage clustering and deep learning to identify and eliminate dynamic objects such as pedestrians and vehicles; networks such as FlowNet3D and SalsaNext, for instance, segment point clouds by estimating scene flow or through encoder-decoder structures, but they depend on labeled datasets and may generalize poorly. Ray tracing techniques instead use voxel structures such as OctoMap to count laser hits and pass-throughs per cell, flagging transiently occupied cells as dynamic; this is effective but computationally expensive (a sketch of the counting principle follows Table 1). Visibility-based methods exploit a simple geometric test: a point observed closer along a laser beam than a corresponding static surface is likely dynamic, whether it is a scan point occluding the known map or a stale map point that newer beams now pass through. Algorithms such as Removert and RF-LIO apply this test with multi-resolution range images to distinguish and remove dynamic elements, though they may misclassify ground points or fail in occluded scenarios. Throughout this discussion, I will highlight how robot technology is evolving to integrate these methods into practical SLAM frameworks.
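To make the visibility test concrete, below is a minimal sketch of the map-cleaning direction of the idea: flagging stale map points that current beams pass through. This is not the published implementation of Removert or RF-LIO; the projection resolution, vertical field of view, and the `margin` threshold are illustrative assumptions, and both point sets are assumed to be expressed in the current scan's sensor frame.

```python
import numpy as np

def spherical_projection(points, h_res, v_res, v_fov):
    """Project 3D points (sensor frame) onto a range-image grid."""
    r = np.linalg.norm(points, axis=1)
    yaw = np.arctan2(points[:, 1], points[:, 0])
    pitch = np.arcsin(np.clip(points[:, 2] / np.maximum(r, 1e-9), -1.0, 1.0))
    u = ((yaw + np.pi) / (2.0 * np.pi) * h_res).astype(int) % h_res
    v = ((v_fov[1] - pitch) / (v_fov[1] - v_fov[0]) * v_res).astype(int)
    return u, np.clip(v, 0, v_res - 1), r

def flag_dynamic_map_points(map_pts, scan_pts, h_res=1800, v_res=64,
                            v_fov=(np.radians(-25.0), np.radians(3.0)),
                            margin=0.4):
    """Mask over map_pts: True where the current scan measures a noticeably
    larger range through the same pixel, i.e. the beam passed through the
    stored point's former location, so that point has likely moved."""
    su, sv, sr = spherical_projection(scan_pts, h_res, v_res, v_fov)
    scan_img = np.full((v_res, h_res), np.inf)
    np.minimum.at(scan_img, (sv, su), sr)  # keep nearest return per pixel
    mu, mv, mr = spherical_projection(map_pts, h_res, v_res, v_fov)
    scan_r = scan_img[mv, mu]
    # Compare only against pixels the scan actually observed.
    return np.isfinite(scan_r) & (scan_r - mr > margin)
```

In practice, Removert iterates such comparisons across multiple range-image resolutions, reverting static points that a single coarse pass would wrongly remove.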
To provide a structured overview, I summarize the key dynamic point cloud removal methods in Table 1, which outlines each method's innovation and limitations. The table serves as a reference for the trade-offs in computational efficiency, accuracy, and applicability to dynamic environments.
| Year | Author | Innovation | Limitations |
|---|---|---|---|
| 2009 | Petrovskaya et al. | 2D bounding box modeling | Poor applicability to pedestrians and cyclists |
| 2010 | Shackleton et al. | 3D grid-based spatial segmentation | Unsuitable for mobile 3D LiDAR sensors |
| 2012 | Litomisky et al. | VFH for dynamic cluster separation | Prone to missing outlier dynamic points |
| 2018 | Ruchti et al. | Neural network for dynamic probability estimation | Cannot detect untrained objects |
| 2019 | Liu et al. | End-to-end scene flow estimation | Lacks integration of motion information |
| 2019 | Cortinhal et al. | Uncertainty-aware semantic segmentation | Depends on manually labeled training data |
| 2019 | Milioto et al. | Distance image and CNN fusion | Low accuracy for small targets |
| 2020 | Zhou et al. | Asymmetric residual blocks and dimension decomposition | Unable to achieve real-time performance |
| 2021 | Wang et al. | Spatial attention mechanism for segmentation | Pose deviation in the Z-direction |
| 2022 | Kim et al. | Fusion of motion and semantic features | Difficulty detecting small dynamic objects |
| 2022 | Mersch et al. | Sparse 4D convolution for spatiotemporal features | Relies on high-quality labeled datasets |
| 2022 | Sun et al. | No semantic information required for optimization | Performance drop with additional datasets |
| 2022 | Li et al. | Multi-scale interaction network | Cannot integrate multiple temporal information |
| 2024 | Han et al. | Polar cylindrical balanced random sampling | Performance degradation with distance |
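Returning to the ray-tracing family discussed above, the following is a minimal sketch of the hit-versus-pass-through counting principle behind OctoMap-style removal. It is not OctoMap's actual implementation: the oversampled beam stepping, the voxel size, and the `free_ratio` threshold are illustrative assumptions, and each scan is assumed to be a pair of a world-frame sensor origin and world-frame hit points.

```python
import numpy as np
from collections import defaultdict

def beam_voxels(origin, endpoint, voxel):
    """Voxels traversed by one beam, via oversampled stepping; production
    systems such as OctoMap use exact ray traversal instead."""
    vec = endpoint - origin
    n = max(int(np.linalg.norm(vec) / (0.5 * voxel)), 1)
    for t in np.arange(n) / n:  # samples stop short of the endpoint
        yield tuple(np.floor((origin + t * vec) / voxel).astype(int))

def transient_voxels(scans, voxel=0.2, free_ratio=0.8):
    """Count hits vs. pass-throughs per voxel over many scans; voxels that
    were mostly traversed as free space held only transient (dynamic) hits."""
    hits, passes = defaultdict(int), defaultdict(int)
    for origin, points in scans:
        for p in points:
            for v in beam_voxels(origin, p, voxel):
                passes[v] += 1
            hits[tuple(np.floor(p / voxel).astype(int))] += 1
    return {v for v, h in hits.items()
            if passes.get(v, 0) / (passes.get(v, 0) + h) > free_ratio}
```

The per-beam traversal also makes the cost profile explicit: work grows with beam length and shrinks with voxel size, which is why these methods are accurate but expensive.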
In the context of robot technology, how a SLAM framework handles a dynamic object depends on its degree of mobility. I classify objects into four categories: high-dynamic (e.g., moving vehicles), low-dynamic (e.g., temporarily stationary pedestrians), semi-dynamic (e.g., chairs or parked cars that move between sessions), and static (e.g., buildings). Correspondingly, SLAM strategies include online real-time processing for high-dynamic objects, offline post-processing for low-dynamic ones, and lifelong SLAM for semi-dynamic objects that change across sessions. Online methods such as RF-LIO and Dynamic-LIO tightly couple LiDAR with inertial measurement units (IMUs) to remove dynamic points during scan matching, reducing pose drift in real time. Offline approaches such as ERASOR and Removert leverage historical data to refine static maps, albeit with higher latency. Lifelong SLAM, exemplified by frameworks like LT-mapper, continuously updates maps to adapt to environmental change, ensuring long-term consistency; a compact summary of this class-to-strategy mapping is sketched below. These strategies underscore the importance of robot technology in enabling autonomous navigation through dynamic scenes.
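As a restatement of this taxonomy, the mapping from mobility class to processing strategy can be written down directly; the class names and strategy descriptions below merely echo the text above and are not an established API.

```python
from enum import Enum, auto

class Dynamics(Enum):
    HIGH = auto()    # moving vehicles, walking pedestrians
    LOW = auto()     # temporarily stationary pedestrians
    SEMI = auto()    # parked cars, furniture: change between sessions
    STATIC = auto()  # buildings, road surfaces

STRATEGY = {
    Dynamics.HIGH:   "online removal during scan matching (e.g., RF-LIO, Dynamic-LIO)",
    Dynamics.LOW:    "offline post-processing of the map (e.g., ERASOR, Removert)",
    Dynamics.SEMI:   "lifelong / multi-session map updating (e.g., LT-mapper)",
    Dynamics.STATIC: "retain as long-term map structure",
}
```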
Evaluation metrics are crucial for assessing the performance of dynamic SLAM algorithms. Common indicators include Absolute Trajectory Error (ATE) and Relative Pose Error (RPE) for localization accuracy, as well as precision, recall, Preservation Rate (PR), and Rejection Rate (RR) for map quality. For example, ATE measures the root mean square error between estimated and ground truth trajectories, calculated as:
$$ATE = \sqrt{\frac{1}{M} \sum_{i=1}^{M} \|\Delta x_i\|^2}$$
where $\Delta x_i = x_i - \Delta R_i \hat{x}_i'$, with $M$ the number of states, $x_i$ the true pose, $\hat{x}_i'$ the estimated pose, and $\Delta R_i$ the rotation aligning the estimated trajectory to the ground-truth frame. Similarly, RPE evaluates drift over trajectory segments, while PR and RR are defined as:
$$PR = \frac{P_{ss}}{P_{is}} \times 100\%$$
$$RR = \left(1 - \frac{P_{id}}{P_{sd}}\right) \times 100\%$$
where $P_{ss}$ is the number of static points preserved in the final map, $P_{is}$ the number of static points in the initial map, $P_{id}$ the number of dynamic points remaining after removal, and $P_{sd}$ the total number of dynamic points. These formulas provide the quantitative basis for validating dynamic SLAM in robot technology.
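These metrics translate directly into code. The sketch below assumes the two trajectories are already associated state-by-state and that the alignment rotation $\Delta R_i$ has already been applied to the estimates (e.g., via a Umeyama fit); the function names are my own.

```python
import numpy as np

def ate_rmse(gt_xyz, est_xyz):
    """ATE: RMSE of per-state position error, assuming est_xyz is already
    aligned to the ground-truth frame (the Delta R_i in the text)."""
    diff = np.asarray(gt_xyz) - np.asarray(est_xyz)
    return float(np.sqrt(np.mean(np.sum(diff ** 2, axis=1))))

def preservation_rate(p_ss, p_is):
    """PR: percentage of the initial static points kept in the final map."""
    return p_ss / p_is * 100.0

def rejection_rate(p_id, p_sd):
    """RR: percentage of dynamic points successfully removed."""
    return (1.0 - p_id / p_sd) * 100.0
```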
Datasets play a vital role in benchmarking dynamic SLAM algorithms. I summarize commonly used datasets in Table 2, which include KITTI, Semantic-KITTI, NCLT, and others, providing diverse scenarios for testing robot technology in dynamic environments. These datasets offer labeled point clouds and trajectories, facilitating the development of robust SLAM systems.
| Name | Year | Scene | Characteristics |
|---|---|---|---|
| KITTI | 2012 | Outdoor | Multi-traffic environment for robot performance assessment |
| NCLT | 2016 | Indoor + Outdoor | Dynamic objects and long-term changes in complex settings |
| Semantic-KITTI | 2019 | Outdoor | Rich environmental context with semantic labels |
| UrbanLoco | 2020 | Outdoor | Large-scale urban localization in dense scenes |
| UrbanNav | 2021 | Outdoor | Precise positioning with low-cost sensors in urban canyons |
| DOALS | 2021 | Indoor | Dynamic pedestrian changes and related objects |
| Dynablox | 2023 | Indoor + Outdoor | Challenging elements like dynamic objects and weather variations |
| Flatbed | 2024 | Indoor + Outdoor | Multi-sensor data including LiDAR and cameras |
Looking ahead, the future of dynamic LiDAR SLAM in robot technology is poised for significant advancements through deep learning integration, multi-sensor fusion, and lightweight, scalable designs. Deep learning methods, such as 3D object detection networks, can improve the accuracy of dynamic object removal by directly processing point clouds before registration, reducing interference in high-dynamic environments like construction sites. However, challenges remain in detecting small targets, such as workers, which require enhanced feature extraction techniques. Multi-sensor fusion, combining LiDAR with cameras and IMUs, will enhance robustness in complex terrains. For instance, fusing visual data with LiDAR point clouds can provide texture information and improve loop closure, addressing limitations of single-sensor systems. Lightweight algorithms are essential for resource-constrained platforms, focusing on efficient memory usage and real-time performance. Moreover, lifelong SLAM approaches must evolve to handle multi-session mapping, enabling robots to adapt to long-term environmental changes. These trends emphasize the role of robot technology in pushing the boundaries of autonomous navigation.
In conclusion, dynamic LiDAR SLAM is a critical component of embodied intelligent robots, enabling precise localization and mapping in real-world dynamic settings. Through semantic segmentation, ray tracing, and visibility-based removal, along with strategies tailored to each object's degree of dynamics, robot technology continues to overcome challenges such as residual artifacts and pose error. Evaluation metrics and datasets provide the foundation for benchmarking, while future directions point toward deeper integration of learning-based methods, multi-sensor fusion, and lightweight, scalable designs. As robot technology advances, these innovations will empower robots to operate autonomously in increasingly complex and dynamic environments, driving progress across applications. This review underscores the importance of ongoing research to refine SLAM algorithms so that they meet the demands of modern robotics.