2025-12-04 · HuggingFace

DeepMath: A lightweight math reasoning Agent with smolagents

agents · models · infrastructure


Source: HuggingFace · Date: 2025-12-04 · URL: https://huggingface.co/blog/intel-deepmath

Summary

Model release from Intel: DeepMath (4B), a math reasoning agent built on Qwen3-4B Thinking and smolagents and fine-tuned with GRPO. Key approach: instead of reasoning at length in text, the model writes concise Python snippets that are executed, and their results are folded back into the trace. It achieves up to 66% shorter output traces while maintaining or improving accuracy on MATH500, AIME2025, HMMT, and HLE. Critical finding: GRPO training within the agent framework yields both shorter outputs and better accuracy, whereas the agent framework alone gives mixed results.
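A minimal sketch of the code-first loop described above, under stated assumptions: `fake_model` is a hypothetical stand-in for the fine-tuned LLM, snippets are assumed to be wrapped in `<code>` and `</code>` tags, and `run_snippet` uses a bare `exec` with no sandboxing. None of these names come from smolagents or DeepMath; the sketch only shows how each snippet's printed output is appended to the trace in place of verbose text reasoning.

```python
import contextlib
import io
import re

# Assumption: the model wraps each snippet in <code>...</code> tags.
CODE_RE = re.compile(r"<code>(.*?)</code>", re.DOTALL)


def run_snippet(code: str) -> str:
    """Execute a snippet and return what it printed.

    Bare exec() with no sandboxing: illustration only, never use on untrusted output.
    """
    buf = io.StringIO()
    with contextlib.redirect_stdout(buf):
        exec(code, {})
    return buf.getvalue().strip()


def fake_model(trace: str) -> str:
    """Hypothetical stand-in for the LLM: writes one snippet, then answers."""
    if "Output:" not in trace:
        return "Compute it directly.\n<code>print(sum(k * k for k in range(1, 101)))</code>"
    last_output = trace.rsplit("Output:", 1)[1].strip()
    return f"Final answer: {last_output}"


def code_act_loop(question: str, model=fake_model, max_steps: int = 4) -> str:
    """Alternate model steps and code execution, folding outputs into the trace."""
    trace = question
    for _ in range(max_steps):
        step = model(trace)
        trace += "\n" + step
        match = CODE_RE.search(step)
        if match is None:  # no snippet: treat the step as the final answer
            break
        trace += "\nOutput: " + run_snippet(match.group(1))
    return trace


print(code_act_loop("What is 1^2 + 2^2 + ... + 100^2?"))
```

The resulting trace ends with `Output: 338350` followed by `Final answer: 338350`: one line of executed code stands in for what would otherwise be a long hand calculation in text. A real agent should run snippets in a sandbox rather than with `exec`.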

Implications

Thread: open-weights ecosystem health / agentic patterns. DeepMath shows that GRPO combined with code-first agentic reasoning is a viable training approach for math agents, not just a deployment pattern. The up-to-66% trace reduction matters for inference cost and latency at scale. The “agent alone = mixed; agent + GRPO = improved” finding is important: tool-augmented inference needs a training-time reward signal to realize its potential; it is not a free upgrade over base reasoning. The combination of smolagents, vLLM, and TRL's GRPO trainer is a reference open-source stack for training code-act reasoning agents.
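To make "training-time reward signal" concrete, here is a small sketch of the group-relative advantage at the core of GRPO: several completions are sampled per prompt, and each completion's reward is standardized against the rest of its group. The binary correctness rewards in the example are an assumption for illustration, not DeepMath's actual reward; TRL's GRPO trainer computes advantages internally, so this only shows the arithmetic.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards: list[float], eps: float = 1e-4) -> list[float]:
    """GRPO-style advantages: standardize each reward within its sampled group."""
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std over the group's completions
    return [(r - mu) / (sigma + eps) for r in rewards]


# Hypothetical rewards: 1.0 = correct final answer, 0.0 = wrong, for 4 completions.
print([round(a, 3) for a in group_relative_advantages([1.0, 0.0, 0.0, 1.0])])
# → [1.0, -1.0, -1.0, 1.0]

# Every completion scored the same: no completion is pushed up or down.
print(group_relative_advantages([1.0, 1.0, 1.0]))
# → [0.0, 0.0, 0.0]
```

A group with mixed outcomes produces advantages of roughly +1 and -1, steering the policy toward the completions that scored better; a group where all completions score the same produces all zeros and gives no reward-driven learning signal.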
