2026-04-02 · Google

Gemma 4: Byte for byte, the most capable open models

models · infrastructure

Source: DeepMind · Date: 2026-04-02 · URL: https://deepmind.google/blog/gemma-4-byte-for-byte-the-most-capable-open-models/

Summary

Google released Gemma 4 in four variants: 2B and 4B (edge), a 26B Mixture of Experts (#6 among open-source models on Arena AI), and a 31B Dense (#3 among open-source models on Arena AI, claimed to outperform models 20x its size). Capabilities: advanced multi-step reasoning, native function calling, structured JSON output, multimodal input (video/image/audio for the smaller models), 128K–256K context windows, and 140+ languages. License: Apache 2.0. Available on Hugging Face, NVIDIA NIM, Ollama, and Google Cloud. Prior Gemma generations have logged 400M+ downloads.

Implications

The headline competitive result is the 31B Dense model's #3 rank on the Arena AI open-source leaderboard, the industry-standard ranking for open models. Placing above Llama 4 and Mistral’s entries at comparable sizes positions Gemma 4 31B as the efficiency leader in the sub-70B weight class. The 400M+ downloads of prior generations had already established Gemma as Google's default open-weight base model.

Shipping MoE at 26B in an open model is a structural choice. Mixture of Experts provides larger total model capacity while activating only a fraction of the parameters for each token, the same architecture choice Meta made for Llama 4’s MoE variants. The 26B MoE's #6 rank suggests Google is shipping its MoE methodology into the open-source ecosystem rather than keeping it proprietary.
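
The capacity-versus-active-parameters trade-off can be sketched in a few lines. This is a minimal illustration of top-k expert routing in general, not Gemma 4's actual router: the function names (`route_top_k`, `moe_param_counts`) and every number below are hypothetical, chosen only to show why an MoE model's total parameter count can be much larger than what runs for a single token.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def route_top_k(router_logits, k):
    """Pick the k highest-scoring experts for one token.

    Returns (expert_index, weight) pairs. Weights are a softmax over the
    selected experts only, so they sum to 1.
    """
    ranked = sorted(range(len(router_logits)),
                    key=lambda i: router_logits[i], reverse=True)
    chosen = ranked[:k]
    weights = softmax([router_logits[i] for i in chosen])
    return list(zip(chosen, weights))


def moe_param_counts(shared, per_expert, num_experts, k):
    """Return (total, active-per-token) parameter counts."""
    total = shared + num_experts * per_expert
    active = shared + k * per_expert
    return total, active


# Hypothetical figures in billions of parameters, NOT Gemma 4's real config.
total_b, active_b = moe_param_counts(shared=4, per_expert=2, num_experts=16, k=2)

# Hypothetical router scores for one token across four experts.
routing = route_top_k([0.1, 2.0, -1.0, 1.5], k=2)
```

With these made-up numbers the model stores 36B parameters, but only 8B take part in any single token's forward pass, and the token is sent to experts 1 and 3 with the larger weight on expert 1.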

With 400M+ downloads as a baseline, Gemma 4 adoption is likely to be fast: the existing Gemma ecosystem (fine-tuners, integrators, platforms) can upgrade rather than start fresh. Apache 2.0 licensing plus availability on Hugging Face, NVIDIA NIM, and Ollama amounts to broad commercial deployment coverage, reaching most of the production open-model market.

Watch:

Whether the “outperforms models 20x its size” claim holds on third-party evals vs. 400B-class models such as Llama 3.1 405B and Llama 4 Maverick
NVIDIA NIM integration adoption, since NIM is the enterprise GPU inference channel
Gemma 4 multimodal variants for edge devices: video, audio, and image input on the 2B/4B models is a significant capability for IoT and mobile use cases
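
The first watch item can be sanity-checked with simple arithmetic before any third-party evals arrive. The sketch below uses only the figures quoted in this note (31B, 20x, and a 405B comparison point); the helper name `size_ratio` is illustrative, and parameter count is treated as a crude proxy for model size.

```python
GEMMA_DENSE_B = 31      # Gemma 4 31B Dense, from the summary
CLAIMED_MULTIPLE = 20   # "outperforms models 20x its size"
COMPARISON_B = 405      # the 405B comparison point named in the watch list


def size_ratio(larger_b, smaller_b):
    """How many times larger one model is than another, by parameter count."""
    return larger_b / smaller_b


# Model size implied by taking the 20x claim literally.
implied_size_b = GEMMA_DENSE_B * CLAIMED_MULTIPLE

# How much larger a 405B model actually is than the 31B Dense.
ratio_vs_405b = size_ratio(COMPARISON_B, GEMMA_DENSE_B)
```

A 405B model is roughly 13x the size of the 31B Dense, so a literal 20x claim implies a comparison against models of around 620B parameters. Third-party evals should make clear which models the claim actually covers.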
