Introducing OpenAI o3 and o4-mini
Source: OpenAI Date: 2025-04-16 URL: https://openai.com/index/introducing-o3-and-o4-mini
Summary
In April 2025, OpenAI launched o3 and o4-mini, the third generation of its reasoning model series. o3 was positioned as the most capable reasoning model available, with substantial improvements over o1 and o3-mini on math, science, and coding benchmarks. o4-mini offered strong reasoning performance at significantly lower cost and latency, targeting high-volume use cases where o3’s inference costs were prohibitive. Both models integrated tool use (code interpreter, web browsing, file analysis) natively within their reasoning chains.
Implications
Tool-integrated reasoning. The o3/o4-mini launch was the first time OpenAI’s reasoning models could call tools mid-chain — searching the web, running code, reading documents — as part of the extended thinking process rather than as separate API calls. This is architecturally significant: it makes the reasoning chain an agent loop, not just extended deliberation.
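The agent-loop framing can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not OpenAI's implementation or API: `Step`, `TOOLS`, `scripted_model`, and `run_chain` are hypothetical names, and the "model" is a scripted stand-in that simply reacts to the transcript. The point it shows is structural: tool results are appended to the same transcript the model reasons over, so each observation can shape the next thought.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple


@dataclass
class Step:
    """One step emitted by the (hypothetical) reasoning model."""
    thought: str
    tool: Optional[str] = None        # name of a tool to call, if any
    tool_input: Optional[str] = None  # argument passed to that tool
    answer: Optional[str] = None      # set only on the final step


# Hypothetical tool registry. Real tools would search the web, run code, etc.
TOOLS: Dict[str, Callable[[str], str]] = {
    "calculator": lambda expr: str(sum(int(x) for x in expr.split("+"))),
}


def scripted_model(transcript: List[str]) -> Step:
    """Stand-in for a model: picks the next step from the transcript so far."""
    observations = [t for t in transcript if t.startswith("observation:")]
    if not observations:
        # No tool result yet, so ask for one in the middle of the chain.
        return Step(thought="I need the sum first.",
                    tool="calculator", tool_input="2+3")
    result = observations[-1].split(":", 1)[1].strip()
    return Step(thought="I have the sum.", answer=f"The total is {result}.")


def run_chain(model: Callable[[List[str]], Step], question: str,
              max_steps: int = 5) -> Tuple[str, List[str]]:
    """Interleave thoughts and tool calls in a single transcript."""
    transcript = [f"question: {question}"]
    for _ in range(max_steps):
        step = model(transcript)
        transcript.append(f"thought: {step.thought}")
        if step.answer is not None:
            return step.answer, transcript
        if step.tool is not None:
            # The tool result re-enters the chain as an observation.
            result = TOOLS[step.tool](step.tool_input)
            transcript.append(f"observation: {result}")
    raise RuntimeError("step budget exhausted without an answer")


if __name__ == "__main__":
    answer, transcript = run_chain(scripted_model, "What is 2 + 3?")
    print(answer)
    for line in transcript:
        print(line)
```

In the older tool-call-then-reason pattern, the caller would invoke the tool up front and pass the result in as part of the prompt. Here the loop itself decides when a tool is needed, which is also why `max_steps` matters: every extra iteration adds latency.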
The o4-mini efficiency play. Delivering reasoning-model quality at mini-tier prices was o4-mini’s key competitive move. Anthropic’s Claude 3.7 Sonnet and Google’s Gemini 2.0 Flash Thinking were competing in the same cost-performance tier. The race to make “good reasoning” cheap was moving faster than expected.
Benchmark ceiling pressure. o3 achieved near-perfect scores on competition math benchmarks (AIME 2024/2025) and very high scores on GPQA Diamond, which contributed to OpenAI retiring SWE-bench Verified as a meaningful differentiator (February 2026). The o3/o4-mini launch accelerated the benchmark saturation that GPT-5.5 was designed to address.
Watch: Whether native tool use in reasoning chains enables qualitatively new agent behaviors vs. the prior tool-call-then-reason pattern, and how latency profiles affect real-world agent deployments.