DeepMath: A lightweight math reasoning Agent with smolagents
Source: HuggingFace Date: 2025-12-04 URL: https://huggingface.co/blog/intel-deepmath
Summary
Model release from Intel: DeepMath (4B), a math reasoning agent built on Qwen3-4B Thinking and smolagents and fine-tuned with GRPO. Key approach: the model generates concise Python code snippets for execution instead of verbose text reasoning and folds the results back into its reasoning trace. It achieves up to 66% shorter output traces while maintaining or improving accuracy on MATH500, AIME2025, HMMT, and HLE. Critical finding: GRPO training on the agent framework yields both shorter outputs and better accuracy, whereas the agent alone shows mixed results.
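To make the "execute a snippet, fold the result back into the trace" idea concrete, here is a minimal sketch in plain Python. It is not DeepMath's or smolagents' actual implementation: the `<code>` and `<output>` delimiters and the helper names `run_snippet` and `fold_results` are hypothetical, and the snippet runs through a bare `exec` with no sandboxing, which a real agent would need.

```python
import contextlib
import io
import re

# Hypothetical delimiter for model-emitted code; the real agent format may differ.
CODE_BLOCK = re.compile(r"<code>(.*?)</code>", re.DOTALL)


def run_snippet(source: str) -> str:
    """Execute a snippet and return whatever it printed."""
    buffer = io.StringIO()
    with contextlib.redirect_stdout(buffer):
        # Illustration only: untrusted code must run in a sandbox, not bare exec.
        exec(source, {})
    return buffer.getvalue().strip()


def fold_results(trace: str) -> str:
    """Insert an observation right after each code snippet found in the trace."""
    def with_output(match: re.Match) -> str:
        output = run_snippet(match.group(1))
        return f"{match.group(0)}\n<output>{output}</output>"

    return CODE_BLOCK.sub(with_output, trace)


trace = (
    "Sum of the first 100 squares: "
    "<code>print(sum(k * k for k in range(1, 101)))</code>"
)
folded = fold_results(trace)
print(folded)
```

The computation costs one short line of code plus a one-line observation, where a text-only trace would spell out the arithmetic step by step. That is the intuition behind the shorter traces reported above.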
Implications
Thread: open-weights ecosystem health / agentic patterns. DeepMath demonstrates that GRPO plus code-first agentic reasoning is a viable training approach for math agents, not just a deployment pattern. The up-to-66% trace reduction matters for inference cost and latency at scale. The “agent alone = mixed; agent + GRPO = improved” finding is important: tool-augmented inference needs a training-time reward signal to realize its potential; it is not a free upgrade over base reasoning. The smolagents + vLLM + TRL GRPO combination serves as a reference open-source stack for training code-act reasoning agents.
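For readers unfamiliar with how GRPO turns rewards into a training signal, the sketch below shows the group-relative advantage computation at the heart of the method: several completions are sampled for one prompt, and each is scored relative to the group's mean reward. The reward values are hypothetical (1.0 for a correct final answer, 0.0 otherwise) and are not DeepMath's actual reward design. The use of a population standard deviation and the `eps` constant are also assumptions, since implementations differ on such details.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-6):
    """Normalize each reward against the mean and spread of its own group."""
    mu = mean(rewards)
    sigma = pstdev(rewards)  # assumption: population std; some libraries use sample std
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical rewards for four sampled solutions to the same problem.
rewards = [1.0, 0.0, 1.0, 0.0]
advantages = group_relative_advantages(rewards)
print(advantages)
```

Completions that beat the group average get a positive advantage and are reinforced, and the rest are pushed down. This is how a reward signal can teach the model when code execution actually helps, something prompting the agent alone cannot do.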