2024-09-25 · HuggingFace

Llama can now see and run on your device - welcome Llama 3.2

agents · models

Source: HuggingFace · Date: 2024-09-25 · URL: https://huggingface.co/blog/llama32

Summary

Model release: Llama 3.2 from Meta, 10 models across two tiers. Multimodal vision models (11B and 90B): trained on 6B image-text pairs, 128k context, MMMU 50.7%/60.3% (11B/90B, with CoT), DocVQA 88.4%/90.1% (11B/90B). Small on-device text models (1B and 3B): trained on 9T tokens, 128k context, 8 languages. 3B Instruct matches 8B Instruct on IFEval (77.01 vs. 76.49) and is specifically noted as suitable for agentic applications. Memory requirements, depending on precision (INT4 at the low end, BF16/FP16 at the high end): 0.75-2.5GB for 1B, 1.75-6.5GB for 3B. The vision models carry an EU licensing restriction.
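
The memory ranges above can be sanity-checked with a weight-only estimate: parameter count times bytes per parameter. The sketch below is illustrative, not the blog's method. The parameter counts (about 1.24B and 3.21B) and the three precisions are assumptions added here, and the estimate ignores KV cache, activations, and runtime overhead, so real usage will be higher.

```python
# Weight-only memory estimate: parameters x bytes per parameter.
# ASSUMPTION: approximate parameter counts, not taken from the summary above.
PARAMS = {"1B": 1.24e9, "3B": 3.21e9}

# ASSUMPTION: storage cost per parameter for three common precisions.
BYTES_PER_PARAM = {"bf16": 2.0, "fp8": 1.0, "int4": 0.5}


def weight_memory_gb(params: float, bytes_per_param: float) -> float:
    """Return weight storage in GB (10^9 bytes); excludes KV cache and activations."""
    return params * bytes_per_param / 1e9


def footprint_table() -> dict:
    """Build {model: {precision: GB}} for every model/precision pair, rounded to 2 places."""
    return {
        model: {
            fmt: round(weight_memory_gb(n, b), 2)
            for fmt, b in BYTES_PER_PARAM.items()
        }
        for model, n in PARAMS.items()
    }


if __name__ == "__main__":
    for model, row in footprint_table().items():
        print(model, row)
```

Under these assumptions the BF16 estimates come out close to the quoted upper bounds (2.5GB and 6.5GB), while the INT4 estimates fall somewhat below the quoted lower bounds. One plausible reading is that the quoted figures include overhead beyond raw 4-bit weights.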

Implications

Open-weights ecosystem health. Llama 3.2 brought competitive vision capability to the open-weights tier while pushing a 3B text model to 8B-level instruction following on IFEval. The 1B and 3B memory footprints are small enough to fit in the RAM of phones and edge devices, which makes quantized on-device deployment practical for mobile and edge use cases.

Model release cadence. Meta’s pattern with Llama 3.2, shipping multimodal and small on-device models in the same release, set an expectation for simultaneous cloud and edge coverage that subsequent releases from other labs have had to match. The EU licensing restriction on the vision models was a notable early signal of the regulatory friction that open-weights releases would face.
