2026-02-19 · Google

Gemini 3.1 Pro: A smarter model for your most complex tasks


Source: DeepMind
Date: 2026-02-19
URL: https://deepmind.google/blog/gemini-3-1-pro-a-smarter-model-for-your-most-complex-tasks/

Summary

Google launched Gemini 3.1 Pro in preview, scoring 77.1% on ARC-AGI-2, more than double the 3 Pro score on that benchmark. The model targets complex reasoning tasks: multi-API synthesis, animated SVG and 3D code generation, and multi-modal data synthesis. It is available to Pro and Ultra subscribers in the Gemini app and NotebookLM; pricing was not disclosed. The release is framed as an iterative improvement on 3 Pro rather than a new generation.

Implications

The reasoning improvement story is the 77.1% ARC-AGI-2 score and the “more than double 3 Pro” claim. ARC-AGI-2 tests novel logical reasoning on patterns the model hasn’t seen, and it is specifically designed to resist benchmark contamination. More than doubling performance on that benchmark within a single point release points to a genuine reasoning advance rather than benchmark optimization. If independently verified, it means 3.1 Pro’s reasoning generalizes to genuinely new problem types, not just trained pattern classes.
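The “more than double” framing puts an implied ceiling on the 3 Pro score, which is worth making explicit. The following is a minimal sketch of that arithmetic, using only the 77.1% figure from the launch post; the function name is hypothetical, and the result is an upper bound inferred from the claim, not a score Google reported.

```python
def implied_baseline_ceiling(new_score: float, multiple: float = 2.0) -> float:
    """Upper bound on the earlier score if new_score is more than `multiple` times it.

    Hypothetical helper for illustration: it only inverts the "more than double"
    claim and says nothing about the actual 3 Pro result.
    """
    return new_score / multiple


# 77.1% is the ARC-AGI-2 score quoted for Gemini 3.1 Pro in the launch post.
ceiling = implied_baseline_ceiling(77.1)
print(f"Implied 3 Pro ARC-AGI-2 score: below {ceiling:.2f}%")
print(f"Implied minimum gain: more than {77.1 - ceiling:.2f} percentage points")
```

Under that reading, 3 Pro scored below roughly 38.6% on ARC-AGI-2, so the point release added more than about 38 percentage points.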

The preview release strategy is Google saying: validate this before we commit. “Released in preview to validate these updates” is honest hedging: it suggests Google is confident in the benchmark improvement but wants production usage data before GA. That’s appropriate for a 2x reasoning claim. Watch whether the GA release maintains or walks back the ARC-AGI-2 number.

The missing pricing is an unanswered question for enterprise buyers. 3 Pro is priced; 3.1 Pro gets no public pricing in the launch post. That’s unusual. It means either subscriber-only access (no API pricing) or pricing that is still being set. Either way, buyers can’t model cost until Google publishes it, which delays enterprise adoption decisions.

Watch:

GA release timeline and API pricing — the preview-only availability limits the addressable market and the independent evaluation surface
ARC-AGI-2 score verification by LMSYS, HELM, and the ARC Prize team — 77.1% is a very specific claim that will be tested
NotebookLM integration quality with 3.1 Pro reasoning — is the improvement perceptible for complex document synthesis tasks?
