2026-04-02 · HuggingFace

Welcome Gemma 4: Frontier multimodal intelligence on device

Source: HuggingFace · Date: 2026-04-02 · URL: https://huggingface.co/blog/gemma4

Summary

Model release: Google DeepMind’s Gemma 4 family, in four sizes (E2B at 2.3B parameters, E4B at 4.5B, a 26B MoE with 4B active, and a 31B dense model), under Apache 2.0. Multimodal inputs: text, images, video, and audio, with audio supported on E2B and E4B. Key benchmarks: the 31B model scores 89.2% on AIME 2026, 84.3% on GPQA Diamond, and 80.0% on LiveCodeBench v6, with a Codeforces ELO of 2150. Context windows: 128k tokens for the small models and 256k for the large ones. Also included: native JSON output without grammar constraints, multimodal thinking, and function calling. Day-0 support across transformers, llama.cpp, MLX, transformers.js, and mistral.rs.

Implications

Open-weights ecosystem health. Gemma 4 31B at 89.2% on AIME 2026 is a landmark: an open-weights model approaching frontier math-reasoning scores previously claimed only by closed systems. The Apache 2.0 license, which permits commercial use, raises the stakes for every open-weights competitor with a more restrictive license. The 26B MoE, which runs with only 4B active parameters per token, is a strong local-deployment efficiency story.
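The efficiency point can be made concrete with back-of-envelope arithmetic. The sketch below is illustrative only: the parameter counts are the ones quoted in the summary, while the bytes-per-parameter figures are generic assumptions about common precisions, not published Gemma 4 numbers. It ignores KV cache, activations, and runtime overhead. The function names are hypothetical helpers written for this note.

```python
# Back-of-envelope estimates for the two larger Gemma 4 models.
# Parameter counts (in billions) are taken from the summary above.
# Bytes-per-parameter values are generic assumptions, not Gemma-specific.

BYTES_PER_PARAM = {"bf16": 2.0, "int8": 1.0, "int4": 0.5}

# name -> (total parameters, parameters active per token), in billions
MODELS = {
    "26B MoE": (26.0, 4.0),
    "31B dense": (31.0, 31.0),
}


def weight_memory_gb(total_params_b: float, precision: str) -> float:
    """Approximate weight storage in GB (1 GB = 1e9 bytes).

    Uses TOTAL parameters: an MoE still has to keep every expert
    available, even though only some are used for each token.
    """
    return total_params_b * BYTES_PER_PARAM[precision]


def active_fraction(total_params_b: float, active_params_b: float) -> float:
    """Share of parameters that participate in computing one token."""
    return active_params_b / total_params_b


if __name__ == "__main__":
    for name, (total, active) in MODELS.items():
        frac = active_fraction(total, active)
        sizes = ", ".join(
            f"{p}: {weight_memory_gb(total, p):.1f} GB" for p in BYTES_PER_PARAM
        )
        print(f"{name}: {frac:.0%} of weights active per token; {sizes}")
```

Under these assumptions the 26B MoE needs roughly 13 GB for weights at 4-bit precision, close to the 31B dense model's 15.5 GB, but touches only about 15% of its parameters per token. The saving is therefore mainly in per-token compute rather than in storage.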

Model release cadence. Day-0 deployment support across five inference runtimes (including browser/WebGPU via transformers.js and Apple Silicon via MLX) has become the expected baseline for a Google Gemma launch. The full-stack shipping pattern (model + fine-tuning recipes + inference backends + HuggingChat) is now a template other vendors must match.

HF as open-source ML hub. Gemma 4 reinforces HF’s role as the distribution layer for Google’s open releases — weights, tutorials, and integration examples all land on HF first, making HF the effective homepage for the open-weights Gemma ecosystem.
