2025-06-26 · HuggingFace

Gemma 3n fully available in the open-source ecosystem!

models · research · infrastructure


Source: HuggingFace Date: 2025-06-26 URL: https://huggingface.co/blog/gemma3n

Summary

Model release: full open-source ecosystem integration for Gemma 3n. Two sizes: E2B (5B actual / 2B effective parameters, 2GB VRAM) and E4B (8B actual / 4B effective parameters, 3GB VRAM). Architecture innovations: MatFormer (a nested transformer from which smaller sub-models can be extracted), Per-Layer Embeddings (per-layer embedding parameters can stay in CPU memory instead of accelerator memory), KV Cache Sharing (2x faster prefill), a MobileNet-v5-300 vision encoder (up to 60 FPS on Pixel), and a USM-based audio encoder. E4B is the first sub-10B model to score above 1300 on LMArena. Inputs: text, images, audio, and video, across 35+ languages. Full day-0 support in Transformers, MLX, llama.cpp, Transformers.js, and Ollama.
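The MatFormer nesting can be illustrated with a toy feed-forward block. This is a minimal sketch under stated assumptions, not Gemma 3n's implementation: tiny dimensions, random (untrained) weights, plain Python lists, and hypothetical helper names (`ffn`, `extract_submodel`, `param_count`). It shows only the structural property: the smaller FFN is the first `k` hidden units of the larger one, so extracting it is a weight slice rather than a separately trained model.

```python
"""Toy MatFormer-style nested FFN (illustrative only, not Gemma 3n code)."""
import math
import random
from typing import List, Tuple

Matrix = List[List[float]]


def gelu(x: float) -> float:
    # tanh approximation of GELU
    return 0.5 * x * (1.0 + math.tanh(math.sqrt(2.0 / math.pi) * (x + 0.044715 * x ** 3)))


def ffn(x: List[float], w_in: Matrix, w_out: Matrix) -> List[float]:
    """Two-layer FFN. w_in is d_model x d_ff, w_out is d_ff x d_model."""
    d_ff = len(w_in[0])
    hidden = [gelu(sum(x[i] * w_in[i][j] for i in range(len(x)))) for j in range(d_ff)]
    d_model = len(w_out[0])
    return [sum(hidden[j] * w_out[j][k] for j in range(d_ff)) for k in range(d_model)]


def extract_submodel(w_in: Matrix, w_out: Matrix, k: int) -> Tuple[Matrix, Matrix]:
    """Hypothetical helper: keep only the first k hidden units (the nested smaller FFN)."""
    return [row[:k] for row in w_in], [row[:] for row in w_out[:k]]


def param_count(w_in: Matrix, w_out: Matrix) -> int:
    return sum(len(r) for r in w_in) + sum(len(r) for r in w_out)


random.seed(0)
d_model, d_ff_full, d_ff_small = 4, 16, 8
w_in = [[random.gauss(0, 0.5) for _ in range(d_ff_full)] for _ in range(d_model)]
w_out = [[random.gauss(0, 0.5) for _ in range(d_model)] for _ in range(d_ff_full)]
x = [random.gauss(0, 1) for _ in range(d_model)]

# Nested sub-model: a slice of the full block's weights.
small_in, small_out = extract_submodel(w_in, w_out, d_ff_small)
y_small = ffn(x, small_in, small_out)

# Equivalent to running the full block with hidden units >= d_ff_small switched off.
masked_out = [row if j < d_ff_small else [0.0] * d_model for j, row in enumerate(w_out)]
y_masked = ffn(x, w_in, masked_out)

print("full params:", param_count(w_in, w_out), "sub-model params:", param_count(small_in, small_out))
```

In MatFormer training the nested prefixes are optimized jointly so that each slice is a usable model on its own; with random weights, as here, only the equivalence between slicing and masking is meaningful.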

Implications

Thread: open-weights ecosystem health / model release cadence. Gemma 3n’s “effective” parameter framing (5B actual parameters running with roughly the memory footprint of a 2B model) points to the key innovation: Google is bringing multimodal model quality to memory footprints previously reserved for text-only models. The 2-3GB VRAM requirement puts full multimodal capability within reach of mobile and low-end GPU hardware. The MatFormer nested architecture (E2B is a sub-model of E4B) is a flexible deployment primitive: one training run yields multiple deployment targets. Day-0 support across MLX, llama.cpp, and Ollama suggests the release was coordinated with the ecosystem ahead of time.
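The gap between actual and effective parameter counts can be made concrete with back-of-envelope arithmetic. This is a hedged sketch with assumptions that are not from the source: it counts weights only (no KV cache or activations), treats every parameter beyond the effective count as held in host (CPU) memory (for example, per-layer embedding parameters), uses 1 GB = 1e9 bytes, and uses hypothetical helper names (`weight_gb`, `accelerator_vs_host`).

```python
"""Weights-only memory estimate for actual vs effective parameter counts (illustrative)."""
from typing import Tuple

# Assumed storage cost per parameter, in bytes.
BYTES_PER_PARAM = {"bf16": 2.0, "int8": 1.0, "int4": 0.5}


def weight_gb(params_billion: float, dtype: str) -> float:
    # params_billion * 1e9 params * bytes/param / 1e9 bytes per GB
    return params_billion * BYTES_PER_PARAM[dtype]


def accelerator_vs_host(actual_b: float, effective_b: float, dtype: str) -> Tuple[float, float]:
    """Assumption: effective params live on the accelerator, the remainder on the host."""
    on_accel = weight_gb(effective_b, dtype)
    on_host = weight_gb(actual_b - effective_b, dtype)
    return on_accel, on_host


# Parameter counts taken from the summary above (in billions).
for name, actual_b, eff_b in [("E2B", 5.0, 2.0), ("E4B", 8.0, 4.0)]:
    for dtype in BYTES_PER_PARAM:
        accel, host = accelerator_vs_host(actual_b, eff_b, dtype)
        print(f"{name} {dtype}: {accel:.1f} GB on accelerator, {host:.1f} GB offloaded")
```

The published 2GB and 3GB figures depend on precision and runtime details that this summary does not state, so the sketch makes no attempt to reproduce them; it only shows why keeping the non-effective parameters off the accelerator shrinks the footprint in proportion to the effective count.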
