2025-11-21 · HuggingFace

Open ASR Leaderboard: Trends and Insights with New Multilingual & Long-Form Tracks



Source: HuggingFace · Date: 2025-11-21 · URL: https://huggingface.co/blog/open-asr-leaderboard

Summary

Research summary and leaderboard update: the Open ASR Leaderboard adds multilingual (5 languages) and long-form transcription tracks, covering 60+ models from 18 organizations on 11 datasets. Key findings: Conformer encoder + LLM decoder architectures (NVIDIA Canary-Qwen-2.5B, IBM Granite-Speech-3.3-8B) achieve the lowest WER; CTC/TDT decoders are 10–100x faster with minimal accuracy loss. For example, Parakeet CTC 1.1B reaches RTFx 2793 at WER 6.68, against RTFx 68 at WER 6.43 for Whisper Large v3. Long-form: closed-source systems lead, and Whisper Large v3 is the best open model. Multilingual: Whisper Large v3 remains a strong baseline across 99 languages.
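For readers unfamiliar with the accuracy metric, WER (word error rate) is the word-level edit distance between a reference transcript and a model's output, divided by the number of reference words. The sketch below is a minimal illustration of that definition only. It assumes whitespace tokenization and applies no text normalization, so it is not the leaderboard's actual scoring pipeline, and the function name word_error_rate is hypothetical.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length.

    Assumption: plain whitespace tokenization, no casing or punctuation
    normalization (real evaluation pipelines typically normalize first).
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference prefix
    # processed so far and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i]  # deleting all i reference words matches an empty hypothesis
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # One dropped word out of six reference words.
    print(word_error_rate("the cat sat on the mat", "the cat sat on mat"))
```

A WER of 6.43 in the summary corresponds to a value of 0.0643 from a function like this, expressed as a percentage.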

Implications

Open-weights ecosystem health. The 10–100x throughput advantage of CTC/TDT decoders comes at a small WER cost, which is a practical signal for production deployments. In the cited comparison, Parakeet CTC 1.1B runs at roughly 41x the throughput of Whisper Large v3 (RTFx 2793 vs 68) while giving up 0.25 absolute WER (6.68 vs 6.43), so for English speech tasks its compute cost is a small fraction of Whisper's at competitive accuracy. The LLM-decoder architectures dominate accuracy leaderboards but are impractical for real-time or high-volume use without dedicated GPU capacity.
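The throughput argument can be made concrete with a little arithmetic. RTFx is the inverse real-time factor, that is, seconds of audio transcribed per second of processing, so compute time for a batch is audio duration divided by RTFx. The sketch below uses only the RTFx and WER figures quoted in the summary; the 1,000-hour workload and the helper name compute_hours are illustrative assumptions, not figures from the leaderboard.

```python
# Figures quoted in the summary above.
PARAKEET_CTC_RTFX = 2793
WHISPER_LARGE_V3_RTFX = 68
PARAKEET_CTC_WER = 6.68
WHISPER_LARGE_V3_WER = 6.43


def compute_hours(audio_hours: float, rtfx: float) -> float:
    """Processing time needed for a given amount of audio at a given RTFx."""
    return audio_hours / rtfx


speedup = PARAKEET_CTC_RTFX / WHISPER_LARGE_V3_RTFX
wer_gap = PARAKEET_CTC_WER - WHISPER_LARGE_V3_WER

# Hypothetical workload: 1,000 hours of audio (an assumption for illustration).
AUDIO_HOURS = 1000
parakeet_hours = compute_hours(AUDIO_HOURS, PARAKEET_CTC_RTFX)
whisper_hours = compute_hours(AUDIO_HOURS, WHISPER_LARGE_V3_RTFX)

if __name__ == "__main__":
    print(f"speedup: {speedup:.1f}x")
    print(f"WER gap: {wer_gap:.2f} absolute")
    print(f"Parakeet CTC 1.1B: {parakeet_hours:.2f} h of processing")
    print(f"Whisper Large v3: {whisper_hours:.1f} h of processing")
```

Under these assumptions the same workload takes well under an hour of processing with Parakeet CTC 1.1B and more than fourteen hours with Whisper Large v3, on the hardware the leaderboard's RTFx numbers were measured on.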

HF as open-source ML hub. Expanding the Open ASR Leaderboard to multilingual and long-form tracks, with a published preprint, reinforces HF’s role as the venue for community-maintained evaluation infrastructure. Evaluating 60+ models from 18 organizations on a common benchmark limits the fragmentation of speech model evaluation across incomparable custom benchmarks.
