2025-04-29 · HuggingFace

Welcoming Llama Guard 4 on Hugging Face Hub

securitymodelsinfrastructure

Welcoming Llama Guard 4 on Hugging Face Hub

Source: HuggingFace Date: 2025-04-29 URL: https://huggingface.co/blog/llama-guard-4

Summary

Model release: Llama Guard 4, a 12B dense multimodal safety classifier pruned from Llama 4 Scout (removing MoE routing experts, retaining shared expert). Runs on single 24GB GPU. Covers 14 hazard types + code interpreter abuse; supports text-only and image+text (up to 5 images), multilingual. Multi-image recall improves +20% over Guard 3. Also releases Llama Prompt Guard 2 (86M and 22M) for prompt injection/jailbreak detection.

Implications

Thread: open-weights ecosystem health / model release cadence. Meta releasing safety models as open weights is strategically important: it gives smaller companies a production-grade content moderation stack without building from scratch or paying API fees. The pruning technique (removing MoE routing, keeping shared expert) is an interesting efficiency approach that could generalize — watch whether other labs use it to extract dense specialists from MoE models. Llama Prompt Guard 2 at 22M parameters for injection detection is small enough to run as a pre-filter on every request without meaningful latency overhead.

← all signals