2025-09-16 · HuggingFace

`LeRobotDataset:v3.0`: Bringing large-scale datasets to `lerobot`




Source: HuggingFace · Date: 2025-09-16 · URL: https://huggingface.co/blog/lerobot-datasets-v3

Summary

Library update releasing LeRobotDataset v3.0, a new dataset format that packs many episodes into each file (Parquet for tabular data, chunked MP4 for video) instead of writing one file per episode. This enables datasets with millions of episodes and native streaming from the HF Hub without a local download. Migrating from v2.1 takes a one-line command. Available in the lerobot v0.3.x pre-release; stable in v0.4.0.
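The core idea of packing episodes can be sketched with a small index that maps each episode to a shared file and a row range inside it. This is a minimal illustration under stated assumptions: `EpisodeEntry`, `pack_episodes`, `locate`, and the `max_rows_per_file` cap are hypothetical names, not LeRobot's API or its actual on-disk schema.

```python
from dataclasses import dataclass


@dataclass
class EpisodeEntry:
    """Hypothetical index row: where one episode lives inside a shared file."""
    episode_index: int
    file_index: int
    row_start: int  # first row of the episode within that file (inclusive)
    row_end: int    # one past the episode's last row (exclusive)


def pack_episodes(episode_lengths, max_rows_per_file):
    """Assign consecutive episodes to shared files.

    A new file is started when the next episode would push the current file
    past max_rows_per_file. An episode longer than the cap still gets a file
    to itself rather than being split (a simplifying assumption).
    """
    index = []
    file_index, rows_in_file = 0, 0
    for ep, length in enumerate(episode_lengths):
        if rows_in_file > 0 and rows_in_file + length > max_rows_per_file:
            file_index, rows_in_file = file_index + 1, 0
        index.append(EpisodeEntry(ep, file_index, rows_in_file, rows_in_file + length))
        rows_in_file += length
    return index


def locate(index, episode_index):
    """Return (file_index, row_start, row_end) for one episode."""
    entry = index[episode_index]
    return entry.file_index, entry.row_start, entry.row_end


if __name__ == "__main__":
    lengths = [300, 250, 400, 120, 500]  # frames per episode (made-up example)
    idx = pack_episodes(lengths, max_rows_per_file=1000)
    print(len({e.file_index for e in idx}), "files for", len(lengths), "episodes")
    print(locate(idx, 3))
```

With the example lengths, five episodes land in two files instead of five, and reading episode 3 means slicing rows 0 to 120 of file 1. The file count grows with total data size rather than with episode count, which is what makes million-episode datasets practical.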

Implications

Thread: open-weights ecosystem health / HF as open-source ML hub. LeRobot's continued iteration on its dataset format signals that the robotics community is hitting real scale limits: a one-file-per-episode layout breaks down once datasets approach the scale of vision and language corpora, since millions of small files strain file systems. Streaming support matters most for labs without petabyte-scale storage. Using Apache Parquet for tabular data and chunked MP4 for video ports proven conventions from the broader ML data ecosystem to robotics. Watch whether v3.0 becomes the standard that other robotics dataset projects converge on, much as Parquet became the default format for text datasets on the HF Hub.
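Chunked MP4 implies that one video file holds several episodes back to back, so reading a frame means seeking to an offset within the shared file. A minimal sketch, assuming each episode records a start and end timestamp inside its file; `VideoSpan`, `concat_spans`, and `frame_timestamp` are hypothetical names for illustration, not LeRobot's API.

```python
from dataclasses import dataclass


@dataclass
class VideoSpan:
    """Hypothetical per-episode record for one camera stream."""
    file_index: int
    from_timestamp: float  # seconds where the episode starts in the shared MP4
    to_timestamp: float    # seconds where the episode ends (exclusive)


def concat_spans(episode_frame_counts, fps):
    """Lay episodes end to end in a single video file and record each span."""
    spans, t = [], 0.0
    for n in episode_frame_counts:
        duration = n / fps
        spans.append(VideoSpan(0, t, t + duration))
        t += duration
    return spans


def frame_timestamp(span, frame_idx, fps):
    """Timestamp to seek to inside the shared file for an episode's frame."""
    t = span.from_timestamp + frame_idx / fps
    if not (span.from_timestamp <= t < span.to_timestamp):
        raise IndexError(f"frame {frame_idx} is outside this episode")
    return t


if __name__ == "__main__":
    spans = concat_spans([90, 60], fps=30)  # two episodes, 3 s and 2 s long
    print(spans[1])
    print(frame_timestamp(spans[1], 15, fps=30))
```

Here frame 15 of the second episode maps to second 3.5 of the shared file. A real decoder would also have to handle keyframe placement and floating-point drift over long files, which this sketch ignores.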
