2025-03-11 · HuggingFace

Open R1: Update #3

modelsresearch

read at source ↗ huggingface.co

Open R1: Update #3

Source: HuggingFace Date: 2025-03-11 URL: https://huggingface.co/blog/open-r1/update-3

Summary

Model and dataset release: Open R1 Update 3 ships OlympicCoder-7B and 32B (Qwen2.5 Coder base, fine-tuned on 100K CodeForces-CoTs), outperforming Claude 3.7 Sonnet on IOI problems. Ships IOI 2020–2024 benchmark with full test cases and evaluation code. Five training findings: sample packing hurts reasoning (long CoT gets clipped), lr=4e-5 optimal (+10pts/doubling on LiveCodeBench), editorials don’t help, <think> prefill required for consistent CoT, 8-bit optimizer + FSDP unlocks 22K context for 32B.

Implications

Thread: open-weights ecosystem health. The training lessons are the lasting value here: sample packing degrading reasoning performance is non-obvious and likely affected many GRPO fine-tuning attempts. The <think> prefill requirement (without it, out-of-domain queries revert to short responses) is a practical deployment detail that many teams will have missed. Learning rate 4e-5 vs. the typical 2e-5 yielding +10 points is a large enough gain that it will propagate into community recipes quickly. OlympicCoder-32B competitive with frontier models on IOI continues the pattern of open-weight models closing the gap on specialized reasoning tasks at a fraction of the parameter count.

← all signals