2024-07-23 · HuggingFace

Llama 3.1 - 405B, 70B & 8B with multilinguality and long context

Source: HuggingFace · Date: 2024-07-23 · URL: https://huggingface.co/blog/llama31

Summary

Model release: Meta Llama 3.1 in 8B, 70B, and 405B sizes. It is the first Llama family with 128K context, multilingual support across 8 languages (English, German, French, Italian, Portuguese, Hindi, Spanish, Thai), and tool calling. The 405B model is positioned explicitly for synthetic data generation and LLM-as-judge workloads, and the updated license permits using model outputs to improve other LLMs. Benchmarks for Llama 3.1 405B: MMLU 85.2, BIG-Bench Hard 85.9, TriviaQA 91.8. Memory: the 405B weights require 810GB of VRAM at FP16 or 405GB at FP8. The KV cache at 128K context for 405B adds 123GB on its own.
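
The weight-memory figures above follow from parameter count times bytes per parameter. A minimal sketch of that arithmetic, assuming decimal gigabytes (1 GB = 10^9 bytes) and counting weights only (no KV cache, activations, or framework overhead); the name `weight_memory_gb` is illustrative and not from the source:

```python
# Bytes needed to store one parameter at each precision.
BYTES_PER_PARAM = {"fp16": 2, "fp8": 1}


def weight_memory_gb(params_billions: float, precision: str) -> float:
    """Approximate weight memory in decimal GB (weights only).

    (params_billions * 1e9 params) * bytes_per_param / 1e9 bytes-per-GB
    simplifies to params_billions * bytes_per_param.
    """
    return params_billions * BYTES_PER_PARAM[precision]


if __name__ == "__main__":
    for size in (8, 70, 405):
        for precision in ("fp16", "fp8"):
            gb = weight_memory_gb(size, precision)
            print(f"{size}B @ {precision}: {gb:.0f} GB")
```

For the 405B model this gives 810 GB at FP16 and 405 GB at FP8, matching the figures quoted in the summary.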

Implications

Model release cadence. The 405B is explicitly designed for synthetic data generation. Meta's license now permits using outputs to improve other LLMs, a deliberate reversal of the typical commercial-license clause that forbids it. This positions Llama 3.1 405B as infrastructure for the open-weights training ecosystem itself, not just an end-user model.

Open-weights ecosystem health. The 128K context and multilingual support close two major gaps vs GPT-4 Turbo at the time of release. At FP16, the 405B weights (810GB) exceed the 640GB of an 8xH100 (80GB) node, so FP16 inference still requires a multi-node GPU cluster. At FP8, the 405GB of weights fits on a single 8xH100 node, meaning most well-resourced labs can now run frontier-class open-weights inference without cloud dependency.
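
The single-node claim can be checked with quick arithmetic. The sketch below assumes 80 GB per H100 and 8 GPUs per node (both assumptions, not stated in the source), and reuses the 123GB KV-cache figure from the summary as the cost of one full 128K context. It ignores activations and runtime overhead, so real headroom would be smaller. The helper names are illustrative.

```python
import math

H100_MEMORY_GB = 80   # assumption: 80 GB H100 variant
GPUS_PER_NODE = 8     # assumption: standard 8-GPU node
NODE_MEMORY_GB = H100_MEMORY_GB * GPUS_PER_NODE  # 640 GB total


def nodes_needed(required_gb: float) -> int:
    """Minimum number of nodes whose combined memory covers required_gb."""
    return math.ceil(required_gb / NODE_MEMORY_GB)


def headroom_gb(weights_gb: float, kv_cache_gb: float = 0.0) -> float:
    """Memory left on one node after weights and KV cache are placed."""
    return NODE_MEMORY_GB - weights_gb - kv_cache_gb


if __name__ == "__main__":
    print("FP16 weights (810 GB):", nodes_needed(810), "nodes")
    print("FP8 weights (405 GB):", nodes_needed(405), "node")
    print("FP8 + 128K KV cache headroom:", headroom_gb(405, 123), "GB")
```

Under these assumptions, FP16 weights need two nodes, while FP8 weights plus one 128K-context KV cache leave roughly 112 GB spare on a single node.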
