Welcome to the Falcon 3 Family of Open Models!
Source: HuggingFace Date: 2024-12-17 URL: https://huggingface.co/blog/falcon3
Summary
Model release: Technology Innovation Institute's (TII) Falcon 3 family of base models: 1B, 3B, 7B, 10B, and Mamba-7B, all at or under the 10B scale and described as Llama-architecture compatible. The 7B was trained on 14T tokens across 1024 H100s; the 10B was produced by depth upscaling the 7B and training on 2T additional tokens; the 1B and 3B were produced via knowledge distillation. Reported benchmarks: Falcon3-10B scores 73.1 on MMLU, 83.0 on GSM8K, and 73.8 on MBPP; the 3B is claimed to outperform Llama-3.1-8B. Falcon3-Mamba-7B is claimed to deliver best-in-class SSM performance.
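The summary names knowledge distillation without describing the objective, and TII's exact recipe is not given here. As a rough illustration only, the sketch below shows the common soft-target form of distillation, in which a student is penalized by the KL divergence between temperature-softened teacher and student distributions. The function names, the temperature value, and the toy logits are all assumptions made for this example, not details from the release.

```python
import math


def softmax(logits, temperature=1.0):
    """Numerically stable softmax over a list of logits at a given temperature."""
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]


def distillation_loss(student_logits, teacher_logits, temperature=2.0):
    """KL(teacher || student) on temperature-softened distributions.

    The T^2 factor is the usual scaling that keeps gradient magnitudes
    comparable across temperatures. Hypothetical helper for illustration.
    """
    teacher = softmax(teacher_logits, temperature)
    student = softmax(student_logits, temperature)
    kl = sum(p * math.log(p / q) for p, q in zip(teacher, student) if p > 0)
    return kl * temperature ** 2


# Toy next-token logits over a 4-word vocabulary (made-up numbers).
teacher_logits = [4.0, 1.0, 0.5, -2.0]
matching_student = [4.0, 1.0, 0.5, -2.0]
diverging_student = [0.0, 3.0, 0.5, -2.0]

loss_match = distillation_loss(matching_student, teacher_logits)
loss_diverge = distillation_loss(diverging_student, teacher_logits)
```

A student that reproduces the teacher's logits incurs zero loss, while one that puts its mass on a different token is penalized. In practice this term is typically mixed with the ordinary next-token cross-entropy loss.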
Implications
Open-weights ecosystem health. Depth upscaling (duplicating layers from a smaller trained model, then continuing training) is a compute-efficient path to larger models: TII got a 10B model from a 7B base with 2T additional tokens, against the 14T used to train the 7B, and that is a methodology worth tracking. The 3B-outperforms-8B claim, if it holds on independent evaluation, continues the compression trend in which newer small models exceed older large ones.
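The layer-duplication step of depth upscaling can be sketched in a few lines. This is a minimal illustration under stated assumptions: the model is a plain list of layer objects, and the caller chooses which contiguous span to repeat. Which layers TII actually duplicated is not stated here, so `depth_upscale`, its `start`/`end` parameters, and the toy layers are hypothetical.

```python
from copy import deepcopy


def depth_upscale(layers, start, end):
    """Return a deeper stack by repeating layers[start:end] right after that span.

    The copies are independent objects initialized from the trained weights,
    so continued training can update them separately from the originals.
    """
    if not 0 <= start < end <= len(layers):
        raise ValueError("need 0 <= start < end <= len(layers)")
    duplicated = [deepcopy(layer) for layer in layers[start:end]]
    return layers[:end] + duplicated + layers[end:]


# Toy "model": each layer is a dict holding a name and some weights.
base_layers = [{"name": f"layer_{i}", "weights": [float(i)]} for i in range(4)]

# Repeat the middle two layers (an arbitrary choice for this example).
deeper_layers = depth_upscale(base_layers, start=1, end=3)
layer_names = [layer["name"] for layer in deeper_layers]
print(layer_names)
```

The upscaled stack starts from weights that already encode what the smaller model learned, which is why the continued-training budget can be much smaller than a from-scratch run. The duplication alone does not produce a better model; the continued training on additional tokens is what lets the new layers specialize.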
Model release cadence (regional models). TII is an Abu Dhabi research institute, and Falcon has kept up a consistent open-weights release cadence from outside the US/UK/China cluster. The Mamba variant and the day-one GGUF/GPTQ/AWQ quantization coverage reflect matured release practices: the infrastructure for distributing open-weights models has standardized around these formats.