2025-03-11 · HuggingFace

LeRobot goes to driving school: World’s largest open-source self-driving dataset

research

Source: HuggingFace Date: 2025-03-11 URL: https://huggingface.co/blog/lerobot-goes-to-driving-school

Summary

Dataset release: L2D (Learning to Drive) from Yaak, the world’s largest open-source self-driving dataset. R4 target: 1M episodes, 5,000+ hours, 90+ TB, from 60 EVs across 30 German cities with 6 HD cameras + full vehicle state. Unique features: both expert and student (suboptimal) trajectories, natural language instructions per episode, OpenStreetMap waypoint integration, LLM-powered semantic search. R0 (100 episodes) available now; staggered release to R4. Native LeRobotDataset v3.0 format.
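For a sense of scale, the R4 targets can be turned into per-episode and per-hour figures. This is a back-of-the-envelope sketch, not anything published by Yaak or HuggingFace: it treats the stated lower bounds (1M episodes, 5,000+ hours, 90+ TB) as exact, uses decimal units (1 TB = 1000 GB), and the function name `r4_scale` is hypothetical.

```python
# Rough scale figures derived from the stated R4 targets.
# Assumption: "5,000+ hours" and "90+ TB" are treated as exact values,
# so every result below is approximate.
EPISODES = 1_000_000
HOURS = 5_000
TERABYTES = 90
CAMERAS = 6


def r4_scale(episodes=EPISODES, hours=HOURS, terabytes=TERABYTES, cameras=CAMERAS):
    """Return average per-episode and per-hour figures (decimal units)."""
    gb_per_hour = terabytes * 1000 / hours
    return {
        "seconds_per_episode": hours * 3600 / episodes,
        "gb_per_hour": gb_per_hour,
        "mb_per_episode": terabytes * 1_000_000 / episodes,
        # Attributes all storage to the cameras, ignoring vehicle state
        # and metadata, so this slightly overstates the per-camera share.
        "gb_per_camera_hour": gb_per_hour / cameras,
    }


scale = r4_scale()
for name, value in scale.items():
    print(f"{name}: {value:g}")
```

Under these assumptions an average episode is about 18 seconds long and about 90 MB in size, which suggests short, task-scoped clips and not full drives.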

Implications

Thread: open-weights ecosystem health / HF as open-source ML hub. L2D is a genuine landmark in open autonomous driving data: at R4 its 5,000+ hours would be roughly 150x the driving time of comma2k19 (about 33 hours), and it adds action labels and language instructions that neither Waymo nor nuScenes provides. The expert+student trajectory split is particularly valuable for imitation learning research that wants to learn from failure modes, not just expert demonstrations. The closed-loop testing on real vehicles (planned for Summer 2025) would make this the only open dataset with verifiable real-world performance feedback. LeRobotDataset format integration means the HF robotics training stack works on AV data out of the box.
