2025-01-28 · HuggingFace

Open-R1: a fully open reproduction of DeepSeek-R1

models · research

Source: HuggingFace · Date: 2025-01-28 · URL: https://huggingface.co/blog/open-r1

Summary

Research initiative announcement. HuggingFace launches Open-R1, a community effort to reconstruct the pieces missing from the DeepSeek-R1 release: training code, reasoning datasets, hyperparameters, and scaling laws. The roadmap has three steps: (1) distill reasoning datasets from R1 and use them to replicate the R1-Distill models, (2) replicate the pure RL pipeline behind R1-Zero, and (3) demonstrate full multi-stage training (base → SFT → RL). The announcement reports no benchmark numbers; it is a project kickoff and a call for community contributions.
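To make the ordering of the roadmap concrete, here is a minimal sketch that writes the three steps down as plain data. It is an illustration only, not code from the Open-R1 repository: the `Stage` class, its field names, and the `describe` helper are hypothetical, and the "base model" starting point for step 2 is an assumption based on how DeepSeek describes R1-Zero rather than something stated in the summary above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Stage:
    """One roadmap step. All field names are hypothetical, chosen for this sketch."""
    name: str         # label for the step
    starts_from: str  # what the step takes as input
    method: str       # training technique named in the roadmap
    produces: str     # what the step is meant to reproduce


# The three steps from the announcement, in order.
ROADMAP = [
    Stage("step 1", "reasoning data distilled from DeepSeek-R1",
          "distillation", "R1-Distill-style models"),
    # Assumption: R1-Zero applies RL directly to a base model.
    Stage("step 2", "base model",
          "pure RL", "R1-Zero-style model"),
    Stage("step 3", "base model",
          "SFT, then RL", "multi-stage R1-style model"),
]


def describe(roadmap):
    """Return one human-readable line per stage."""
    return [
        f"{s.name}: {s.starts_from} -> {s.method} -> {s.produces}"
        for s in roadmap
    ]


for line in describe(ROADMAP):
    print(line)
```

Each step reproduces a different artifact, so in principle the steps can be worked on separately; the sketch does not capture dependencies between them, such as whether step 3 reuses the data produced in step 1.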

Implications

Open-weights ecosystem health. Open-R1 is the highest-profile example of the “reproduce the missing pieces” pattern, which defines how the open-weights community responds to partial releases. DeepSeek released weights but not training code, and within days HF had mobilized a coordinated reproduction effort. The pattern now looks like a dependable feature of the ecosystem: a high-profile partial release is likely to be completed by the community.

Transformers library trajectory. Because Open-R1 uses TRL and the HF training stack as its reproduction substrate, the tooling for training reasoning models is being validated and stress-tested in public. Gaps or limitations discovered during Open-R1 are likely to drive improvements to TRL and the broader post-training ecosystem.
