Welcome Falcon Mamba: The first strong attention-free 7B model
Source: HuggingFace Date: 2024-08-12 URL: https://huggingface.co/blog/falconmamba
Summary
Model release from TII: Falcon Mamba 7B, the first attention-free 7B model competitive with Transformer-based peers. Its pure Mamba (SSM) architecture keeps a fixed-size recurrent state instead of a KV cache, so memory use and per-token generation time stay constant regardless of sequence length. On the Hugging Face Open LLM Leaderboard v2 it averages 15.04, against 15.28 for Gemma-7B and 14.50 for Mistral-7B. Fits on a single A10 24GB GPU. Available in base, instruction-tuned, and 4-bit quantized variants.
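To make the constant-memory claim above concrete, here is a rough back-of-the-envelope sketch comparing how a Transformer KV cache grows with context against a fixed SSM state. All layer counts, head sizes, and state dimensions below are illustrative assumptions chosen for a 7B-class model, not figures taken from the blog post, and the sketch ignores weights and activations.

```python
# Rough cache-size comparison: Transformer KV cache vs. fixed SSM state.
# Every default below is an illustrative assumption for a 7B-class model,
# not a published configuration.

def kv_cache_bytes(seq_len, n_layers=32, n_kv_heads=8, head_dim=128,
                   bytes_per_value=2):
    """Bytes needed to cache keys and values for seq_len tokens (batch of 1)."""
    # Factor of 2: one tensor for keys, one for values, in every layer.
    return 2 * n_layers * n_kv_heads * head_dim * seq_len * bytes_per_value


def ssm_state_bytes(n_layers=64, d_inner=8192, d_state=16, d_conv=4,
                    bytes_per_value=2):
    """Bytes for the recurrent state of a Mamba-style model (batch of 1).

    Each layer keeps a (d_inner x d_state) SSM state plus a short
    (d_inner x d_conv) convolution buffer. Neither depends on seq_len.
    """
    return n_layers * d_inner * (d_state + d_conv) * bytes_per_value


if __name__ == "__main__":
    mib = 1024 ** 2
    for seq_len in (1_024, 8_192, 65_536):
        print(f"{seq_len:>6} tokens: "
              f"KV cache {kv_cache_bytes(seq_len) / mib:8.1f} MiB, "
              f"SSM state {ssm_state_bytes() / mib:6.1f} MiB")
```

Under these assumptions the KV cache scales linearly with the number of tokens while the SSM state stays the same size, which is why VRAM use during generation is predictable for an attention-free model.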
Implications
Thread: open-weights ecosystem health / model release cadence. Falcon Mamba demonstrates that SSM models are no longer a research novelty: a 7B SSM is now competitive with 7B Transformers on standard benchmarks. The constant-memory property matters for two scenarios: extremely long-context inference (no quadratic attention scaling) and edge deployment (predictable VRAM regardless of context). Whether SSMs displace Transformers in production depends on whether they can match attention on long-document reasoning tasks, not just perplexity benchmarks. Watch TII’s roadmap for larger Falcon Mamba variants.