2025-03-12 · HuggingFace

Welcome Gemma 3: Google's all new multimodal, multilingual, long context open LLM

models · research



Source: HuggingFace · Date: 2025-03-12 · URL: https://huggingface.co/blog/gemma3

Summary

Model release: Google Gemma 3 ships in four sizes: 1B (text-only) and 4B, 12B, and 27B (all multimodal). Context length is 32k tokens for 1B and 128k for 4B/12B/27B. Supports 140+ languages.

Architecture: SigLIP image encoder with “pan and scan” for high-resolution images; RoPE base frequency raised from 10k to 1M for long context; sliding window attention.

Benchmarks (27B IT): LMSys Elo 1339 (comparable to o1-preview), MMLU-Pro 67.5, MATH 69.0, GPQA Diamond 42.4. Google claims Gemma-3-4B-IT beats Gemma-2-27B-IT and Gemma-3-27B-IT beats Gemini 1.5-Pro.
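The two long-context changes named under Architecture can be illustrated with a small sketch. It is not Gemma 3's implementation: it uses the standard RoPE frequency formula and a plain causal sliding window, and the head dimension (128), the toy sequence length, and the toy window size are assumptions chosen for illustration. The first loop shows how far the slowest-rotating RoPE pair gets before completing a full turn at each base; the second prints which earlier tokens each query may attend to under a sliding window.

```python
import math


def rope_wavelengths(base: float, head_dim: int) -> list[float]:
    """Wavelength, in token positions, of each rotary frequency pair."""
    # Standard RoPE: pair i rotates by base ** (-2i / head_dim) radians per token,
    # so one full turn takes 2 * pi * base ** (2i / head_dim) tokens.
    return [
        2 * math.pi * base ** (2 * i / head_dim)
        for i in range(head_dim // 2)
    ]


def sliding_window_mask(seq_len: int, window: int) -> list[list[bool]]:
    """Causal mask: each query sees itself and the previous window - 1 tokens."""
    return [
        [0 <= q - k < window for k in range(seq_len)]
        for q in range(seq_len)
    ]


HEAD_DIM = 128          # assumed for illustration, not taken from the summary
CONTEXT = 128 * 1024    # the 128k context length quoted above

for base in (10_000.0, 1_000_000.0):
    longest = max(rope_wavelengths(base, HEAD_DIM))
    print(f"base={base:>11,.0f}  longest wavelength ~ {longest:,.0f} tokens  "
          f"exceeds 128k: {longest > CONTEXT}")

# Toy mask: 6 tokens, window of 3 (rows are queries, columns are keys).
for row in sliding_window_mask(seq_len=6, window=3):
    print("".join("x" if allowed else "." for allowed in row))
```

Under these assumptions, the longest wavelength at a base of 10k falls short of a 128k context while the one at 1M comfortably exceeds it, which is the usual intuition for raising the base when extending context.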

Implications

Model release cadence. If the claim that Gemma-3-4B beats the prior-generation 27B on benchmarks holds under independent evaluation, it is further evidence of the compression trend across model generations: comparable benchmark scores at a fraction of the parameter count. With multimodal capability now shipping at 4B, vision-language pipelines become deployable on a single consumer GPU for the first time in this family.
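A rough back-of-the-envelope check of the single-GPU point, as a sketch only. It counts weight storage alone, using the nominal parameter counts from the model names (the exact counts differ) and ignoring activations, the KV cache, and runtime overhead. The helper name `weight_memory_gb` is made up for this example.

```python
def weight_memory_gb(num_params: float, bits_per_param: int) -> float:
    """Memory for the weights alone, in decimal gigabytes."""
    return num_params * bits_per_param / 8 / 1e9


# Nominal sizes taken from the model names, not exact parameter counts.
NOMINAL_PARAMS = {"4B": 4e9, "27B": 27e9}

for name, params in NOMINAL_PARAMS.items():
    for bits in (16, 8, 4):
        print(f"{name} at {bits}-bit: {weight_memory_gb(params, bits):.1f} GB")
```

Comparing these figures against a given card's VRAM gives a first estimate of what fits; real headroom is smaller once the cache and activations are included.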

Open-weights ecosystem health. Day-zero Transformers support plus Ollama, llama.cpp, and MLX availability means Gemma 3 enters the full deployment ecosystem on launch day. The 128k context window at the 4B and 12B sizes is particularly significant: it enables long-document tasks at a locally deployable model size, where they previously required 70B+ models or commercial APIs.
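One reason a 128k window can be practical at small sizes is that sliding window layers only need to cache the most recent tokens. The sketch below estimates KV-cache size under that idea. Every number in it is a hypothetical configuration chosen for round arithmetic (30 layers, 4 KV heads, head dimension 256, a 1024-token window, one global layer in every six, 2-byte values); none of these values come from the summary above.

```python
def kv_cache_bytes(layers: int, kv_heads: int, head_dim: int,
                   cached_tokens: int, bytes_per_value: int = 2) -> int:
    """Bytes needed to cache keys and values for `cached_tokens` positions."""
    # The factor 2 covers the separate key and value tensors in every layer.
    return 2 * layers * kv_heads * head_dim * cached_tokens * bytes_per_value


# Hypothetical configuration, chosen for round numbers.
LAYERS, KV_HEADS, HEAD_DIM = 30, 4, 256
CONTEXT = 128 * 1024
WINDOW = 1024        # assumed sliding window size
GLOBAL_EVERY = 6     # assumed: one global layer in every six

global_layers = LAYERS // GLOBAL_EVERY
local_layers = LAYERS - global_layers

# Baseline: every layer caches the full context.
all_global = kv_cache_bytes(LAYERS, KV_HEADS, HEAD_DIM, CONTEXT)

# Interleaved: global layers cache everything, local layers only the window.
interleaved = (
    kv_cache_bytes(global_layers, KV_HEADS, HEAD_DIM, CONTEXT)
    + kv_cache_bytes(local_layers, KV_HEADS, HEAD_DIM, min(WINDOW, CONTEXT))
)

GIB = 1024 ** 3
print(f"all layers global:        {all_global / GIB:.2f} GiB")
print(f"interleaved local/global: {interleaved / GIB:.2f} GiB")
```

In this toy setup the cache is dominated by the few global layers, so the ratio of local to global layers largely determines how much memory a long context costs.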
