2025-10-01 · HuggingFace

Introducing RTEB: A New Standard for Retrieval Evaluation

protocols · models · research

Source: HuggingFace · Date: 2025-10-01 · URL: https://huggingface.co/blog/rteb

Summary

Benchmark release: RTEB (Retrieval Embedding Benchmark), a beta evaluation standard for retrieval embedding models from the MTEB maintainers. It covers 20 languages and four domains (legal, healthcare, finance, code) and is scored with NDCG@10. Key design: a hybrid open/private dataset split, with public datasets for transparency and private datasets held by the MTEB maintainers to detect overfitting. There are roughly 20 open datasets and 13 private ones. The announcement publishes no baseline scores; the leaderboard is live on HF Spaces.
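For readers unfamiliar with the metric, here is a minimal sketch of NDCG@10 for a single query. It assumes the linear-gain variant (the relevance grade itself is the gain) and a log2 rank discount; the function names and the example query are hypothetical, and the announcement does not specify RTEB's exact implementation.

```python
import math

def dcg_at_k(relevances, k=10):
    """Discounted cumulative gain over the first k relevance grades, in ranked order."""
    # rank is 0-based, so the top result is discounted by log2(2) = 1.
    return sum(rel / math.log2(rank + 2) for rank, rel in enumerate(relevances[:k]))

def ndcg_at_k(ranked_relevances, judged_relevances, k=10):
    """DCG of the system ranking divided by the DCG of the ideal ranking.

    ranked_relevances: grade of each retrieved document, in the order returned.
    judged_relevances: grades of all judged documents for the query (any order).
    """
    ideal = dcg_at_k(sorted(judged_relevances, reverse=True), k)
    if ideal == 0:
        return 0.0  # no relevant documents judged for this query
    return dcg_at_k(ranked_relevances, k) / ideal

# Hypothetical query: two relevant documents exist; the system ranks them 1st and 3rd.
score = ndcg_at_k([1, 0, 1], [1, 1])
print(round(score, 4))  # 0.9197
```

A benchmark-level score is typically the mean of this per-query value over all queries in a dataset.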

Implications

Open-weights ecosystem health. The private dataset component is a direct response to the teaching-to-the-test problem: as labs train against the public eval set, MTEB scores have become increasingly unreliable. Measuring generalization against data the model has never seen is the right design. The open question is whether the maintainers can keep the private split secure as the leaderboard grows.
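One way to read an open/private split is to compare a model's average score on each side. The sketch below is an illustration only: the function name, dataset names, and scores are made up, and how RTEB itself aggregates or flags results is not described here.

```python
from statistics import mean

def open_private_gap(open_scores, private_scores):
    """Mean score on open datasets minus mean score on private datasets."""
    return mean(open_scores.values()) - mean(private_scores.values())

# Placeholder dataset names and made-up NDCG@10 values, for illustration only.
open_scores = {"open_a": 0.80, "open_b": 0.60}
private_scores = {"private_a": 0.50, "private_b": 0.50}

gap = open_private_gap(open_scores, private_scores)
print(f"open/private gap: {gap:.2f}")
```

A large positive gap would be consistent with a model tuned to the public sets, though it could also reflect the private sets simply being harder, so the number is a prompt for scrutiny and not a verdict.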

HF as open-source ML hub. RTEB lives on HF Spaces and is maintained by the MTEB team (based at HF). As retrieval quality becomes the primary differentiator for embedding models used in RAG pipelines, hosting the authoritative benchmark makes HF the institutional home for this evaluation category, a meaningful position in the enterprise AI stack.
