Driving systems often rely on high-definition (HD) maps for precise environmental information, which is crucial for planning and navigation. While current HD map constructors perform well under ideal conditions, their resilience to real-world challenges, e.g., adverse weather and sensor failures, is not well understood, raising safety concerns. This work introduces MapBench, the first comprehensive benchmark designed to evaluate the robustness of HD map construction methods against various sensor corruptions. Our benchmark encompasses a total of 29 types of corruptions that affect camera and LiDAR sensors. Extensive evaluations across 31 HD map constructors reveal significant performance degradation of existing methods under adverse weather conditions and sensor failures. We identify effective strategies for enhancing robustness, including approaches that leverage multi-modal fusion, advanced data augmentation, and architectural techniques. These insights provide a pathway for developing more reliable HD map construction methods, which are essential for the advancement of autonomous driving technology. The benchmark toolkit, affiliated code, and model checkpoints have been made publicly accessible.
The definitions of the camera and LiDAR sensor corruptions in the proposed MapBench. Our benchmark encompasses a total of 16 single-sensor corruption types for HD map construction, which can be categorized into exterior, interior, and sensor failure scenarios. In addition, we define 13 multi-sensor corruptions by combining the camera and LiDAR sensor failure types, giving 29 corruption types in total.
Complete list of 31 HD map constructors evaluated in MapBench. Basic information for each model includes input modality, BEV encoder, backbone, training epochs, and performance on the clean nuScenes validation set. “L” and “C” represent LiDAR and camera, respectively. “Effi-B0”, “R50”, “PP”, and “Sec” are short for EfficientNet-B0, ResNet50, PointPillars, and SECOND, respectively. † denotes results reproduced with the released model. ‡ denotes results obtained with a model we trained ourselves after modifying the public code. ped., div., and bou. are short for pedestrian-crossing, divider, and boundary, respectively.
Qualitative assessment of the state-of-the-art camera-LiDAR fusion-based HD map construction method under combined camera and LiDAR sensor corruptions.
The map construction accuracy of state-of-the-art HD map constructors under each of the three severity levels (Easy, Moderate, and Hard) in different camera and LiDAR sensor corruption scenarios.
Benchmarking HD map constructors. Methods are split into groups based on input modality, BEV encoder, backbone, and training epochs. “L” and “C” represent LiDAR and camera, respectively. “Effi-B0”, “R50”, “PP”, and “SEC” are short for EfficientNet-B0, ResNet50, PointPillars, and SECOND, respectively. AP denotes performance on the clean nuScenes validation set. The subscripts b., p., and d. are short for boundary, pedestrian crossing, and divider, respectively.
The results of the MapTR model under different model configurations and multi-sensor corruption types.
The datasets and benchmarks are released under the Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.