2025-10-25 · Google

T5Gemma: A new collection of encoder-decoder Gemma models

Source: DeepMind
Date: 2025-10-25
URL: https://deepmind.google/blog/t5gemma-a-new-collection-of-encoder-decoder-gemma-models/

Summary

Google released T5Gemma, a collection of encoder-decoder models adapted from Gemma 2 decoder-only weights via continued pretraining with UL2 or PrefixLM objectives. T5Gemma 9B-9B outperforms Gemma 2 9B by more than 9 points on GSM8K and 4 points on DROP. The instruction-tuned T5Gemma 2B-2B gains nearly 12 points on MMLU over Gemma 2 2B and lifts GSM8K from 58.0% to 70.7%. The collection is available in T5-sized variants plus 2B and 9B versions based on Gemma 2.
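
To make the PrefixLM objective mentioned above concrete, here is a minimal sketch of how one training example could be built: a sequence is cut at a random point, the prefix is fed to the encoder, and the remainder becomes the decoder target. The function name `prefix_lm_example`, the whitespace tokenization, and the uniform split are illustrative assumptions, not details taken from the T5Gemma release.

```python
import random


def prefix_lm_example(tokens, rng):
    """Split one token sequence into an (encoder input, decoder target) pair.

    Hypothetical helper for illustration; the real pipeline is not described
    in the source.
    """
    if len(tokens) < 2:
        raise ValueError("need at least two tokens to split")
    # Pick a split point that leaves both sides non-empty.
    split = rng.randint(1, len(tokens) - 1)
    return tokens[:split], tokens[split:]


rng = random.Random(0)  # fixed seed so the sketch is repeatable
tokens = "the encoder reads the prefix and the decoder writes the rest".split()
enc_input, dec_target = prefix_lm_example(tokens, rng)

# The two pieces always reassemble into the original sequence.
print(enc_input, "->", dec_target)
```

The encoder can attend over the whole prefix at once, while the decoder still predicts the target left to right.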

Implications

Encoder-decoder from decoder-only is the methodological contribution. Demonstrating that pretrained decoder weights can be adapted into encoder-decoder architectures without full retraining is a practical finding for the open-source ML community — it means Gemma 2 finetuners can get encoder-decoder capabilities without starting from scratch.
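
The adaptation idea can be sketched with plain dictionaries standing in for weight tensors. This is a toy illustration under stated assumptions, not Google's implementation: the layer layout (`self_attn`, `ffn`), the function name `init_encoder_decoder`, and the choice to seed cross-attention from self-attention are all assumptions made for the example. After such an initialization, the model would still need the continued pretraining described in the summary.

```python
import copy


def init_encoder_decoder(decoder_only_layers):
    """Build encoder and decoder layer params from decoder-only layer params.

    Each input layer is a dict with "self_attn" and "ffn" entries
    (a hypothetical layout chosen for this sketch).
    """
    encoder, decoder = [], []
    for layer in decoder_only_layers:
        # Encoder layer: reuse the self-attention and feed-forward weights.
        # Only the attention mask changes (causal -> bidirectional), and a
        # mask is not a weight, so nothing else needs to be copied.
        encoder.append({
            "self_attn": copy.deepcopy(layer["self_attn"]),
            "ffn": copy.deepcopy(layer["ffn"]),
        })
        # Decoder layer: same reuse, plus cross-attention, which has no
        # counterpart in a decoder-only model.
        # ASSUMPTION: seed cross-attention from the self-attention weights.
        decoder.append({
            "self_attn": copy.deepcopy(layer["self_attn"]),
            "cross_attn": copy.deepcopy(layer["self_attn"]),
            "ffn": copy.deepcopy(layer["ffn"]),
        })
    return encoder, decoder


# Two toy "pretrained" layers with made-up numbers.
pretrained = [
    {"self_attn": {"wq": [0.1, 0.2], "wo": [0.3, 0.4]},
     "ffn": {"w1": [0.5], "w2": [0.6]}}
    for _ in range(2)
]
encoder, decoder = init_encoder_decoder(pretrained)
```

Deep copies keep the encoder and decoder parameters independent, so later training can move them apart.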

A 9-point GSM8K gain from architectural change alone would be significant. If the gain comes from adapting the architecture rather than from more data or a larger model, a 9-point improvement in math reasoning is a strong result. The adaptation does include continued pretraining with UL2 or PrefixLM objectives, so the comparison is not a pure architecture swap. Even so, the result suggests that the encoder-decoder architecture has advantages for structured reasoning tasks that pure autoregressive decoding misses.

SuperGLUE frontier for classification/extraction. Encoder-decoder models traditionally dominate classification and structured prediction tasks where the input context needs to be processed bidirectionally. T5Gemma filling this niche means Gemma’s open-weights ecosystem now covers both autoregressive generation (decoder) and discriminative tasks (encoder-decoder) — a more complete alternative to commercial APIs.
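
The difference between bidirectional and autoregressive processing comes down to the attention mask. The sketch below is a generic illustration, not T5Gemma code, and the helper names are assumptions: it builds both masks for a 4-token input and counts how many positions each token can see.

```python
def causal_mask(n):
    """mask[i][j] is True when position i may attend to position j (j <= i)."""
    return [[j <= i for j in range(n)] for i in range(n)]


def bidirectional_mask(n):
    """Every position may attend to every position, as in an encoder."""
    return [[True] * n for _ in range(n)]


def visible_counts(mask):
    """Number of positions each token can attend to."""
    return [sum(row) for row in mask]


n = 4
causal_visible = visible_counts(causal_mask(n))        # [1, 2, 3, 4]
bidir_visible = visible_counts(bidirectional_mask(n))  # [4, 4, 4, 4]
print(causal_visible, bidir_visible)
```

Under the causal mask the first token sees only itself, whereas the encoder's bidirectional mask lets every input token use the full context, which is why the fit for classification and extraction is natural.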

Watch:

Community adoption for NER, classification, and extraction tasks — the encoder-decoder fit for these is natural
Whether T5Gemma 9B-9B becomes a popular fine-tuning base for structured prediction across Kaggle and research use cases
Potential integration as the backbone for classification layers in Gemma-based production pipelines
