2025-08-14 · HuggingFace

Kimina-Prover-RL

modelsresearchinfrastructure

Kimina-Prover-RL

Source: HuggingFace Date: 2025-08-14 URL: https://huggingface.co/blog/AI-MO/kimina-prover-rl

Summary

Model release + training framework: Kimina-Prover-RL releases two open-source formal theorem proving models (1.7B and 0.6B) trained with GRPO via Verl on Lean 4. The 1.7B model achieves 76.23% Pass@32 on MiniF2F (vs. 72.95% previous baseline), improving to 77.87% with error correction. Training takes 48 hours on 8xH100. Key components: structured reasoning-then-generation format (one <think> block + one Lean code block), format reward system, DrGRPO for length-bias correction, and Kimina-Lean-Server for high-throughput parallel proof verification. Full pipeline open-sourced as a Verl fork.

Implications

Open-weights ecosystem health. 76%+ Pass@32 at 1.7B parameters on MiniF2F makes formal math proving accessible at a scale that can run on a single high-end GPU. The error correction turn (Lean feedback → retry) is a simple but effective agentic loop that non-expert teams can now replicate using the published code.

Model release cadence — math/reasoning thread. Kimina-Prover-RL exemplifies the post-DeepSeek-R1 pattern: take GRPO, apply it to a verifiable domain (Lean proofs have formal correctness as a ground truth reward), and release the full recipe. This pattern will continue to produce domain-specific reasoning model releases wherever there is a deterministic verifier — code execution, math solvers, formal verification systems.

← all signals