2025-02-10 · HuggingFace

Open R1: Update #2

models · research

Source: HuggingFace · Date: 2025-02-10 · URL: https://huggingface.co/blog/open-r1/update-2

Summary

Research/dataset release: Open R1 Update #2 ships OpenR1-Math-220k, a set of 220k filtered math reasoning traces drawn from 800k traces generated on 512 H100s at about 180k traces/day via SGLang. OpenR1-Qwen-7B, trained on this data, comes within one point of DeepSeek-Distill-Qwen-7B on MATH-500 (90.6 vs 91.6) and ties it on AIME25 (both 40). Key insight: combining rule-based and LLM-based filtering (Llama-3.3-70B recovered 28k additional correct traces missed by the rule-based check) yields 55% problem coverage from 400k raw problems.
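The two-stage filtering idea can be illustrated with a minimal sketch. This is not the Open R1 implementation: the `Trace` type, `normalize`, `rule_based_correct`, `filter_traces`, `coverage`, and `toy_judge` are all hypothetical names, and `toy_judge` is a stand-in for an LLM judge call (the summary names Llama-3.3-70B), here replaced by a simple numeric-equivalence check so the example runs offline.

```python
from dataclasses import dataclass
from fractions import Fraction
from typing import Callable, List, Optional, Tuple


@dataclass
class Trace:
    """Hypothetical record for one generated reasoning trace."""
    problem_id: str
    final_answer: str  # answer extracted from the trace
    reference: str     # ground-truth answer for the problem


def normalize(ans: str) -> str:
    # Crude string normalization; real math verifiers are far more thorough.
    return ans.strip().rstrip(".").replace("$", "").replace(" ", "").lower()


def rule_based_correct(trace: Trace) -> bool:
    # Stage 1: cheap exact match after normalization.
    return normalize(trace.final_answer) == normalize(trace.reference)


def toy_judge(trace: Trace) -> bool:
    # Stand-in for an LLM judge (assumption): accepts numerically equal
    # answers that differ in surface form, e.g. "1/2" vs "0.5".
    try:
        return Fraction(trace.final_answer) == Fraction(trace.reference)
    except (ValueError, ZeroDivisionError):
        return False


def filter_traces(
    traces: List[Trace],
    judge: Optional[Callable[[Trace], bool]] = None,
) -> Tuple[List[Trace], int]:
    """Keep traces passing the rule check; let the judge rescue the rest."""
    kept: List[Trace] = []
    recovered = 0
    for t in traces:
        if rule_based_correct(t):
            kept.append(t)
        elif judge is not None and judge(t):
            # Stage 2: only rule-rejected traces reach the judge.
            kept.append(t)
            recovered += 1
    return kept, recovered


def coverage(kept: List[Trace], total_problems: int) -> float:
    # Fraction of problems with at least one kept trace.
    return len({t.problem_id for t in kept}) / total_problems


traces = [
    Trace("p1", "42", "42"),    # passes the rule-based check
    Trace("p2", "1/2", "0.5"),  # rejected by rules, rescued by the judge
    Trace("p3", "7", "8"),      # wrong under both checks
]
kept, recovered = filter_traces(traces, judge=toy_judge)
print(len(kept), recovered, round(coverage(kept, total_problems=3), 2))
```

Running the judge only on traces the rule check rejects keeps the expensive stage small, which is the design choice the sketch is meant to show.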

Implications

Thread: open-weights ecosystem health / model release cadence. The key result is that a 7B model trained on openly generated data can roughly match DeepSeek’s proprietary distillation on MATH-500 and AIME25, which supports the “open pipeline as equal alternative” thesis. SGLang delivering 2x vLLM throughput for data generation is an operational finding relevant to any team running large-scale synthetic data pipelines. Community findings (LIMO reaching strong math performance with only 817 samples) challenge the assumption that bigger datasets always win. Watch whether Open R1’s open pipeline becomes the standard for reproducing proprietary reasoning training recipes.
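For teams sizing a similar pipeline, the figures quoted in this note support a rough back-of-the-envelope estimate. The sketch below assumes the 180k traces/day rate applies to the full 512-GPU cluster running SGLang, and that the 2x factor over vLLM holds uniformly; both are simplifying assumptions, not measured claims.

```python
# Figures taken from the summary above.
TOTAL_TRACES = 800_000
TRACES_PER_DAY = 180_000   # assumed: whole cluster, with SGLang
NUM_GPUS = 512
SGLANG_SPEEDUP = 2.0       # assumed uniform 2x over vLLM


def days_needed(total: int, per_day: float) -> float:
    """Wall-clock days to generate `total` traces at a fixed daily rate."""
    return total / per_day


days_sglang = days_needed(TOTAL_TRACES, TRACES_PER_DAY)
days_baseline = days_needed(TOTAL_TRACES, TRACES_PER_DAY / SGLANG_SPEEDUP)
per_gpu_hour = TRACES_PER_DAY / NUM_GPUS / 24

print(f"SGLang: {days_sglang:.1f} days")                # SGLang: 4.4 days
print(f"2x slower baseline: {days_baseline:.1f} days")  # 2x slower baseline: 8.9 days
print(f"Per GPU: {per_gpu_hour:.1f} traces/hour")       # Per GPU: 14.6 traces/hour
```

Under these assumptions the full 800k-trace run takes a little over four days, and a 2x slower engine would double that.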
