2025-04-17 · Google

Introducing Gemini 2.5 Flash

pricing · models · infrastructure

Source: DeepMind
Date: 2025-04-17
URL: https://deepmind.google/blog/introducing-gemini-2-5-flash/

Summary

Google launched Gemini 2.5 Flash as its “first fully hybrid reasoning model,” with controllable thinking budgets (0–24,576 tokens) that let developers trade off quality, latency, and cost. Even with thinking disabled, 2.5 Flash outperforms 2.0 Flash. Google positions it as second only to 2.5 Pro on LMArena Hard Prompts and claims it is the “most cost-efficient thinking model” by price-to-performance. It is available in preview in AI Studio, Vertex AI, and the Gemini API.

Implications

Controllable thinking budgets are the integrator-facing product. The ability to set a thinking budget anywhere from 0 to 24,576 tokens per call is the API surface that separates this from prior thinking models where thinking was typically a binary on/off switch. That’s a real engineering interface: production systems can set thinking depth by task type, routing cheap queries to a zero budget and complex queries to the full budget.
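A minimal sketch of that routing idea in Python. The task classes, the per-class budgets, and the helper names (`pick_thinking_budget`, `build_generation_config`) are hypothetical; only the 0 to 24,576 range comes from the announcement. The `thinkingConfig` / `thinkingBudget` field names are assumed to match the Gemini API's JSON request shape and should be checked against the current documentation before use.

```python
MIN_BUDGET = 0
MAX_BUDGET = 24_576  # upper bound quoted in the announcement

# Hypothetical mapping from task class to thinking budget (tokens).
# Real values would come from evaluation, not from this table.
BUDGET_BY_TASK = {
    "lookup": 0,                        # cheap query: no thinking
    "summarize": 1_024,
    "code_review": 8_192,
    "multi_step_planning": MAX_BUDGET,  # complex query: full budget
}


def pick_thinking_budget(task_type: str, default: int = 1_024) -> int:
    """Return the budget for a task class, clamped to the allowed range."""
    budget = BUDGET_BY_TASK.get(task_type, default)
    return max(MIN_BUDGET, min(MAX_BUDGET, budget))


def build_generation_config(task_type: str) -> dict:
    """Build the per-call config fragment (field names are an assumption)."""
    return {"thinkingConfig": {"thinkingBudget": pick_thinking_budget(task_type)}}


if __name__ == "__main__":
    for task in ("lookup", "multi_step_planning", "unknown_task"):
        print(task, build_generation_config(task))
```

Keeping the budget decision in one small function means the routing table can be retuned per task class without touching the call sites.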

“Hybrid reasoning” as the 2.5 positioning. Flash being “hybrid” (capable of both direct response and extended thinking in one model) collapses the distinction between a fast model and a reasoning model. That’s the right architecture for production agentic systems that need to switch between quick responses and deliberate reasoning mid-task.

The “most cost-efficient thinking model” claim needs verification. The price-to-performance claims against contemporaneous thinking models such as o3-mini and Claude 3.7 Sonnet are asserted but not independently validated at this preview stage. While the model remains in preview, production cost benchmarks aren’t available yet.

Watch:

Whether thinking budget API patterns stabilize in production — do developers find a “sweet spot” budget for common task classes?
2.5 Flash vs. Claude 3.5 Haiku in real-world latency and quality evaluations by integrators
GA release timing and pricing structure for Vertex AI enterprise customers
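
One way to make the “sweet spot” question in the first watch item concrete is to sweep a few budgets per task class and pick the smallest one whose quality is close to the best observed. The sketch below assumes an integrator already has per-budget evaluation scores; the `BudgetResult` type, the `sweet_spot` helper, the tolerance value, and all numbers in `EXAMPLE` are hypothetical and are not measurements of any model.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BudgetResult:
    budget: int       # thinking budget in tokens
    quality: float    # eval score in [0, 1], higher is better
    latency_s: float  # median latency in seconds


def sweet_spot(results, tolerance=0.02):
    """Return the smallest-budget result within `tolerance` of the best quality."""
    if not results:
        raise ValueError("no results to choose from")
    best = max(r.quality for r in results)
    good_enough = [r for r in results if r.quality >= best - tolerance]
    return min(good_enough, key=lambda r: r.budget)


# Made-up numbers for illustration only.
EXAMPLE = [
    BudgetResult(budget=0, quality=0.71, latency_s=0.8),
    BudgetResult(budget=1_024, quality=0.80, latency_s=1.9),
    BudgetResult(budget=8_192, quality=0.86, latency_s=4.5),
    BudgetResult(budget=24_576, quality=0.87, latency_s=9.0),
]

if __name__ == "__main__":
    print(sweet_spot(EXAMPLE))
```

With these illustrative numbers the default tolerance selects the 8,192-token budget; loosening the tolerance moves the choice toward cheaper budgets, which is the trade-off integrators would be tuning.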
