March 11, 2025

Hugging Face Releases L2D: World's Largest Multimodal Driving Dataset

LeRobot and Driving School: A Massive Dataset for Autonomous Driving

The development of artificial intelligence (AI) for autonomous driving requires large, high-quality datasets. While open-source datasets have driven significant progress in image and language models, robotics and autonomous driving lag behind. To close this gap, Yaak, in collaboration with the LeRobot team at Hugging Face, has released the "Learning to Drive" (L2D) dataset. L2D is the world's largest multimodal dataset of its kind and aims to advance spatial intelligence in the automotive sector, in particular by supporting the LeRobot training pipeline and models.

Compared to existing datasets such as WAYMO, NuScenes, MAN, ZOD, and COMMA, some of which lack lidar and radar data, L2D stands out for its scale and its inclusion of driving instructions. The dataset comprises over one million episodes with a total duration of more than 5,000 hours and a volume of over 90 terabytes. It was collected over three years using identical sensor suites in 60 driving school cars across 30 German cities.

A distinctive feature of L2D is the separation of expert and student data. Expert data comes from driving instructors with many years of experience, while student data was recorded from learner drivers with varying levels of proficiency. Both groups cover all driving scenarios required to obtain an EU driver's license (German variant), such as overtaking, roundabouts, and railroad crossings. The expert drives are error-free, whereas the student drives contain typical driving errors. In future releases of the dataset (from R3 onward), the student data will be further enriched with natural-language descriptions of those errors.

Data was acquired using six RGB cameras for a 360° view, GPS for positioning, an IMU for vehicle dynamics, and the vehicle's CAN interface for signals such as speed, accelerator and brake pedal position, steering angle, turn signals, and gear. All streams were synchronized to the front-left camera and downsampled to 10 Hz, with data points interpolated where necessary to increase accuracy.
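Aligning heterogeneous sensor streams onto one clock can be sketched as follows. This is a minimal, hypothetical illustration (not Yaak's pipeline): it assumes each stream is a time-sorted list of `(timestamp, value)` pairs and resamples it onto a 10 Hz grid anchored at a reference time, with linear interpolation between samples.

```python
from bisect import bisect_left

def resample_to_10hz(samples, t_start, t_end):
    """Linearly interpolate (timestamp, value) samples onto a 10 Hz grid.

    `samples` is a hypothetical time-sorted list of (t_seconds, value)
    pairs; the grid is anchored at t_start, standing in for the
    front-left camera clock used as the reference in L2D.
    """
    ts = [t for t, _ in samples]
    grid = []
    t = t_start
    while t <= t_end + 1e-9:
        i = bisect_left(ts, t)
        if i == 0:
            grid.append((t, samples[0][1]))      # clamp before first sample
        elif i == len(ts):
            grid.append((t, samples[-1][1]))     # clamp after last sample
        else:
            (t0, v0), (t1, v1) = samples[i - 1], samples[i]
            w = (t - t0) / (t1 - t0)             # linear interpolation weight
            grid.append((t, v0 + w * (v1 - v0)))
        t = round(t + 0.1, 6)                    # advance one 10 Hz step
    return grid
```

A CAN speed signal logged at an irregular rate, for example, would come out as one value per 100 ms tick, ready to be stacked with the camera frames.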

L2D follows the official German driving task catalog and includes a unique task ID and natural-language instructions for each episode. The LeRobot task for all episodes is: "Follow the waypoints while observing traffic rules and regulations." The instructions were generated automatically from the vehicle position (GPS) using the Open Source Routing Machine (OSRM), OpenStreetMap (OSM), and a large language model (LLM), and resemble the instructions given by common navigation devices. The waypoints are computed by map-matching the GPS data against the OSM graph and serve as landmarks.
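The map-matching step can be illustrated with a toy nearest-node snap. This is a deliberate simplification of proper map matching (L2D uses OSRM against the OSM graph); `osm_nodes` is a hypothetical dict of node IDs to coordinates, and consecutive duplicates are collapsed so each waypoint appears once.

```python
import math

def snap_waypoints(gps_track, osm_nodes):
    """Snap a GPS track to its nearest graph nodes to derive waypoints.

    Toy stand-in for OSRM map matching. `osm_nodes` maps a hypothetical
    node_id -> (lat, lon); the returned list keeps the order of the track
    with consecutive repeats removed.
    """
    def dist(a, b):
        # Equirectangular approximation; adequate over short distances.
        dlat = a[0] - b[0]
        dlon = (a[1] - b[1]) * math.cos(math.radians((a[0] + b[0]) / 2))
        return math.hypot(dlat, dlon)

    waypoints = []
    for point in gps_track:
        nearest = min(osm_nodes, key=lambda n: dist(point, osm_nodes[n]))
        if not waypoints or waypoints[-1] != nearest:
            waypoints.append(nearest)
    return waypoints
```

Real map matching additionally constrains the snap to the road topology (so a point near a parallel street is not matched across it), which is exactly what OSRM's matching service provides.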

To make it easier to find relevant episodes, the GPS data was enriched with information from OpenStreetMap, such as turn information, route characteristics, and restrictions. This semantic enrichment enables multimodal search for episodes based on natural-language descriptions of the driving task: an LLM translates the natural-language query into route tasks and retrieves matching episodes.
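Once the LLM has turned a query into route tasks, the retrieval itself reduces to tag matching. A minimal sketch, assuming each episode carries a set of OSM-derived tags (the tag names and the `episodes` structure are illustrative, not L2D's actual schema):

```python
def find_episodes(query_tags, episodes):
    """Return ids of episodes whose OSM-derived tags cover all query tags.

    In L2D the natural-language query is first translated into route
    tasks by an LLM; here a plain set of tags stands in for that step.
    `episodes` maps a hypothetical episode_id -> set of semantic tags
    such as "roundabout" or "railroad_crossing".
    """
    wanted = set(query_tags)
    return sorted(ep for ep, tags in episodes.items() if wanted <= tags)
```

A query like "roundabout in the rain" would thus map to the tag set `{"roundabout", "rain"}` and match only episodes carrying both tags.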

L2D is provided in the LeRobotDataset v2.1 format to ensure compatibility with current and future LeRobot models. The dataset is intended to promote the development of end-to-end learning models for autonomous driving that can predict actions directly from sensor data. In contrast to existing datasets that focus on intermediate tasks such as object recognition and motion planning, L2D enables the training of models based on pre-trained image and language models.

The release of L2D is happening in phases. With each new version, additional information is added to the episodes, such as natural language instructions, task IDs, route information, and descriptions of suboptimal driving maneuvers.

To encourage the growth of L2D beyond the planned R4 version, the AI community is invited to participate in the search for new scenarios and the expansion of the dataset. Starting in summer 2025, it will also be possible to test AI models trained with L2D in real vehicles with a safety driver.
