Welcome Llama 4 Maverick & Scout on Hugging Face
Source: Hugging Face
Date: 2025-04-05
URL: https://huggingface.co/blog/llama4-release
Summary
Model release: Meta’s Llama 4 Maverick (~400B total parameters, 17B active, 128 experts) and Llama 4 Scout (~109B total parameters, 17B active, 16 experts) are available on the Hugging Face Hub. Both are natively multimodal mixture-of-experts (MoE) models trained on up to 40T tokens covering 200 languages. Scout supports a 10M-token context window; Maverick supports 1M tokens. Key benchmarks, Maverick vs. Llama 3.1 405B: GPQA Diamond 69.8% vs. 49.0%, MMLU Pro 80.5% vs. 73.4%, LiveCodeBench 43.4% vs. 27.7%. Both models are supported in Transformers v4.51.0 and later and in TGI, with int4 and FP8 quantization options.
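To make the quantization options concrete, here is a rough back-of-the-envelope sketch of weight-only memory at different precisions. It uses the approximate parameter totals quoted in this summary and assumes plain bits-per-weight storage; quantization scales, activations, and the KV cache are ignored, so real footprints will be higher. The function and dictionary names are illustrative, not part of any library.

```python
# Approximate parameter totals from the summary (billions of parameters).
TOTAL_PARAMS_B = {"Llama 4 Scout": 109, "Llama 4 Maverick": 400}

# Bits stored per weight for each format.
# Assumption: ignores quantization scales, zero points, and any layers kept in higher precision.
BITS_PER_WEIGHT = {"bf16": 16, "fp8": 8, "int4": 4}


def weight_memory_gb(params_billions: float, bits_per_weight: int) -> float:
    """Return approximate weight-only memory in GB (1 GB = 1e9 bytes)."""
    # billions of params * (bits / 8) bytes per param = billions of bytes = GB
    return params_billions * bits_per_weight / 8


if __name__ == "__main__":
    for model, params_b in TOTAL_PARAMS_B.items():
        for fmt, bits in BITS_PER_WEIGHT.items():
            gb = weight_memory_gb(params_b, bits)
            print(f"{model:18s} {fmt:5s} ~{gb:6.1f} GB of weights")
```

Under these assumptions the weights alone shrink by 4x going from bf16 to int4, which is the main reason the lower-precision formats matter for fitting these models on fewer accelerators.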
Implications
Model release cadence. Llama 4 is a significant architecture shift: MoE layers combined with iRoPE (attention layers without positional encoding interleaved with RoPE layers) and chunked attention. Scout’s 10M-token and Maverick’s 1M-token context windows are production-relevant numbers, not theoretical maximums. Because only 17B parameters are active per token, per-token compute is closer to that of a mid-size dense model than the parameter totals suggest, although all expert weights must still be held in memory.
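The active-parameter argument can be checked with simple arithmetic. The sketch below uses the parameter counts from the summary and the common rule of thumb of roughly 2 FLOPs per active parameter per generated token for a forward pass. That rule is an assumption here: it ignores attention cost, which grows with context length, as well as routing overhead. All names are illustrative.

```python
# Parameter counts in billions, taken from the summary figures.
MODELS = {
    "Llama 4 Scout": {"total": 109, "active": 17},
    "Llama 4 Maverick": {"total": 400, "active": 17},
    "Llama 3.1 405B (dense)": {"total": 405, "active": 405},
}


def active_fraction(total_b: float, active_b: float) -> float:
    """Share of the model's parameters used for any single token."""
    return active_b / total_b


def forward_gflops_per_token(active_b: float) -> float:
    """Rough forward-pass cost per token in GFLOPs.

    Assumption: ~2 FLOPs per active parameter; attention and routing are ignored.
    """
    return 2 * active_b


if __name__ == "__main__":
    dense = forward_gflops_per_token(MODELS["Llama 3.1 405B (dense)"]["active"])
    for name, p in MODELS.items():
        frac = active_fraction(p["total"], p["active"])
        gflops = forward_gflops_per_token(p["active"])
        print(
            f"{name:24s} active {frac:6.1%}  ~{gflops:4.0f} GFLOPs/token  "
            f"(dense 405B costs {dense / gflops:4.1f}x this)"
        )
```

On this estimate both Llama 4 models spend about 34 GFLOPs per token, roughly 24 times less than a dense 405B model, even though Maverick stores about as many parameters as that dense model.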
Open-weights ecosystem health. Two fully open-weights models that beat Llama 3.1 405B across reasoning, coding, and knowledge benchmarks while being cheaper to serve are the clearest signal yet that MoE has become the dominant architecture for open-weights frontier models. The Xet storage backend, which enables ~25% deduplication across the model family, is a quiet infrastructure improvement that matters for Hub storage costs at scale.
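To illustrate what chunk-level deduplication across a model family means, here is a toy sketch. It is not the Xet implementation: fixed-size chunks, SHA-256 hashing, and the tiny byte strings standing in for checkpoint files are all assumptions made for illustration. The idea is only that identical chunks shared between related files need to be stored once.

```python
import hashlib

CHUNK_SIZE = 4  # bytes; unrealistically small, chosen so the example is easy to follow


def chunk_hashes(data: bytes, chunk_size: int = CHUNK_SIZE) -> list[str]:
    """Split data into fixed-size chunks and return a content hash per chunk."""
    return [
        hashlib.sha256(data[i : i + chunk_size]).hexdigest()
        for i in range(0, len(data), chunk_size)
    ]


def dedup_ratio(files: dict[str, bytes], chunk_size: int = CHUNK_SIZE) -> float:
    """Fraction of chunks that need not be stored because an identical chunk exists."""
    total = 0
    unique: set[str] = set()
    for data in files.values():
        hashes = chunk_hashes(data, chunk_size)
        total += len(hashes)
        unique.update(hashes)
    return 1 - len(unique) / total if total else 0.0


if __name__ == "__main__":
    # Hypothetical stand-ins for two related checkpoints sharing most of their bytes.
    family = {
        "base.bin": b"AAAABBBBCCCCDDDD",
        "instruct.bin": b"AAAABBBBCCCCEEEE",
    }
    print(f"deduplicated share of chunks: {dedup_ratio(family):.1%}")
```

In this toy family three of the eight chunks are duplicates, so 37.5% of the chunks can be skipped; the ~25% figure reported for the Llama 4 family is the same kind of measurement at real scale.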