2026-03-03 · Google

Gemini 3.1 Flash-Lite: Built for intelligence at scale

models · infrastructure


Source: DeepMind · Date: 2026-03-03 · URL: https://deepmind.google/blog/gemini-3-1-flash-lite-built-for-intelligence-at-scale/

Summary

Google launched Gemini 3.1 Flash-Lite at $0.25 per million input tokens and $1.50 per million output tokens, with 2.5x faster time to first token (TTFT) and 45% higher output throughput than its predecessor. The model scores an Arena.ai Elo of 1432, 86.9% on GPQA Diamond, and 76.8% on MMMU-Pro, outperforming larger prior-generation models including Gemini 2.5 Flash. Early adopters include Latitude (narrative games) and Cartwheel (3D animation).
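To make the pricing concrete, here is a minimal cost sketch using only the two list prices quoted above. The function names and the example token counts are illustrative assumptions, and the sketch ignores anything the summary does not mention (caching discounts, batch pricing, free tiers).

```python
# List prices from the summary above, in USD per million tokens.
INPUT_USD_PER_MTOK = 0.25
OUTPUT_USD_PER_MTOK = 1.50


def request_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Cost of one request at the quoted list prices (hypothetical helper)."""
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    total = input_tokens * INPUT_USD_PER_MTOK + output_tokens * OUTPUT_USD_PER_MTOK
    return total / 1_000_000


def workload_cost_usd(requests: int, input_tokens: int, output_tokens: int) -> float:
    """Cost of `requests` identical requests (hypothetical helper)."""
    return requests * request_cost_usd(input_tokens, output_tokens)


if __name__ == "__main__":
    # Assumed workload shape: 2,000 input tokens and 500 output tokens per request.
    print(f"per request: ${request_cost_usd(2_000, 500):.5f}")
    print(f"per million requests: ${workload_cost_usd(1_000_000, 2_000, 500):,.2f}")
```

Under that assumed shape, a request costs about $0.00125, or roughly $1,250 per million requests. Output tokens account for 60% of the bill despite being a fifth of the volume, which is why output-heavy workloads feel the $1.50 rate more than the $0.25 rate.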

Implications

Outperforming Gemini 2.5 Flash while costing less is the commodity-intelligence signal. When a newer, cheaper, faster model beats a larger prior-generation model on academic benchmarks, the older model’s pricing becomes hard to defend. With Flash-Lite 3.1, Google is saying that the intelligence floor has risen faster than the cost floor, and that the gap between “good enough” and “frontier” is narrowing in the low-cost tier.

GPQA Diamond 86.9% from a cost-optimized model is the benchmark that matters here. GPQA Diamond tests graduate-level science reasoning, not trivia. A score that high from a Flash-class model challenges the assumption that strong results on reasoning benchmarks require Pro-tier models and pricing. The implication for buyers: evaluate Flash-Lite on GPQA-style reasoning tasks first, then reach for Pro only if Flash-Lite falls short.
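The buyer workflow described above (try the cheap tier first, escalate only if it fails) can be sketched as a small selection routine. This is a minimal sketch, not a vendor API: models are plain callables, scoring is exact-match, and the names `accuracy` and `choose_tier`, the tier labels, and the threshold are all hypothetical.

```python
from typing import Callable, Sequence, Tuple

# Hypothetical types: a model maps a question to an answer string,
# and an eval item is a (question, expected_answer) pair.
Model = Callable[[str], str]
EvalItem = Tuple[str, str]


def accuracy(model: Model, items: Sequence[EvalItem]) -> float:
    """Fraction of items the model answers with an exact match."""
    if not items:
        raise ValueError("evaluation set is empty")
    correct = sum(1 for question, expected in items
                  if model(question).strip() == expected)
    return correct / len(items)


def choose_tier(cheap_model: Model, items: Sequence[EvalItem], threshold: float,
                cheap_name: str = "flash-lite", fallback_name: str = "pro") -> str:
    """Keep the cheap tier if it clears the threshold, otherwise escalate."""
    if accuracy(cheap_model, items) >= threshold:
        return cheap_name
    return fallback_name


if __name__ == "__main__":
    # Stub standing in for a real model call; it gets 3 of 4 items right.
    canned = {"q1": "A", "q2": "B", "q3": "C", "q4": "A"}
    items = [("q1", "A"), ("q2", "B"), ("q3", "C"), ("q4", "D")]
    print(choose_tier(canned.__getitem__, items, threshold=0.70))
    print(choose_tier(canned.__getitem__, items, threshold=0.80))
```

The threshold is the real decision here: it should come from the accuracy the application actually needs, not from a published benchmark score.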

2.5x faster TTFT is the latency story that matters for interactive applications. Voice, real-time translation, and agentic chains are latency-bound, not throughput-bound: the user, or the next step in the chain, is waiting for the first token of each response. TTFT therefore directly affects perceived responsiveness. Flash-Lite 3.1 is positioned against OpenAI’s GPT-4o mini and Anthropic’s Haiku on both price and latency, not just benchmark scores.
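For readers who want to check a latency claim like this on their own traffic, TTFT can be measured by timing the first chunk of any streamed response. The sketch below is provider-agnostic and makes no real API calls: `measure_stream` accepts any iterable of chunks, and `fake_stream` is a hypothetical stand-in for a streaming client.

```python
import time
from typing import Iterable, Iterator, Tuple


def measure_stream(chunks: Iterable[str]) -> Tuple[float, float, int]:
    """Return (ttft_seconds, total_seconds, chunk_count) for a streamed response.

    The clock starts just before the first chunk is requested, so for a lazy
    stream the measured TTFT includes the wait before anything arrives.
    """
    start = time.perf_counter()
    ttft = None
    count = 0
    for _ in chunks:
        if ttft is None:
            ttft = time.perf_counter() - start  # time to first chunk
        count += 1
    total = time.perf_counter() - start
    if ttft is None:
        raise ValueError("stream produced no chunks")
    return ttft, total, count


def fake_stream(first_delay: float, gap: float, n: int) -> Iterator[str]:
    """Hypothetical stand-in for a streaming client: waits, then yields n chunks."""
    time.sleep(first_delay)
    for i in range(n):
        if i:
            time.sleep(gap)
        yield f"chunk{i}"


if __name__ == "__main__":
    ttft, total, count = measure_stream(fake_stream(0.05, 0.01, 5))
    print(f"TTFT {ttft * 1000:.0f} ms, total {total * 1000:.0f} ms, {count} chunks")
```

Reporting TTFT separately from total time matters because the two respond to different improvements: a faster first token helps voice and chat turn-taking, while higher throughput mainly shortens long generations.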

Watch:

How Latitude’s narrative game quality changes with Flash-Lite 3.1 vs. its prior model: a real-world quality signal from a latency-sensitive, high-volume consumer application
Whether GPQA Diamond 86.9% holds on independent evaluation (HELM, LMSYS) or reflects benchmark-specific tuning
Flash-Lite 3.1 adoption rate relative to Flash-Lite 2.5: the speed of migration indicates how much enterprises trust the quality claim
