Introducing the Ettin Reranker Family
infrastructure
Source: HuggingFace Date: 2026-05-19 URL: https://huggingface.co/blog/ettin-reranker
Summary
Hugging Face released six open CrossEncoder reranker models in the cross-encoder/ettin-reranker-*-v1 family, ranging from 17M to 1B parameters. All are built on ModernBERT, support an 8K-token context, and are licensed under Apache 2.0. The 32M model outperforms the 568M BAAI/bge-reranker-v2-m3 on MTEB retrieval benchmarks, and the 1B model matches a 1.54B teacher model at 2.4× that teacher's throughput. Training used distillation from mxbai-rerank-large-v2 on 143M query-document triples, and the full training recipe has been published for reproduction.
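The summary does not say which distillation objective was used. One common choice is to regress the student's score for each (query, document) pair onto the teacher's score with a mean-squared-error loss. The sketch below illustrates that idea under loud assumptions: a toy linear "student" and made-up numeric features stand in for a ModernBERT cross-encoder, and plain numbers stand in for mxbai-rerank-large-v2 scores. The names `student_score`, `distillation_mse`, and `distill` are hypothetical, and this is not the published Ettin recipe.

```python
from typing import List, Sequence, Tuple

# Hypothetical toy data: (features, teacher_score). In real reranker distillation the
# features would be a (query, document) pair encoded by the student cross-encoder and
# the score would come from the teacher model; both are plain numbers here.
Example = Tuple[Sequence[float], float]


def student_score(weights: Sequence[float], bias: float, features: Sequence[float]) -> float:
    """Toy linear 'student' mapping pair features to a relevance score."""
    return bias + sum(w * x for w, x in zip(weights, features))


def distillation_mse(weights: Sequence[float], bias: float, data: Sequence[Example]) -> float:
    """Mean squared error between student scores and teacher scores (assumed objective)."""
    return sum((student_score(weights, bias, x) - t) ** 2 for x, t in data) / len(data)


def distill(data: Sequence[Example], lr: float = 0.1, steps: int = 2000) -> Tuple[List[float], float]:
    """Fit the student to the teacher's scores with full-batch gradient descent on the MSE."""
    dim = len(data[0][0])
    n = len(data)
    weights = [0.0] * dim
    bias = 0.0
    for _ in range(steps):
        grad_w = [0.0] * dim
        grad_b = 0.0
        for x, t in data:
            err = student_score(weights, bias, x) - t  # student minus teacher
            for i in range(dim):
                grad_w[i] += 2.0 * err * x[i] / n
            grad_b += 2.0 * err / n
        # Step every parameter against its gradient.
        weights = [w - lr * g for w, g in zip(weights, grad_w)]
        bias -= lr * grad_b
    return weights, bias
```

The point of the design is that the student never sees human relevance labels directly: its only target is the teacher's score, which is what lets a much smaller model approach the teacher's ranking quality.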
Implications
- Open-weight ecosystem. The Ettin family sets new Pareto-efficient points on the reranker size/quality/speed curve. A 17M model that runs at 7,500 pairs/sec and beats the canonical 33M MiniLM baseline is a direct drop-in replacement for high-throughput production search, and is likely to be adopted quickly.
- Agent-layer convergence. Retrieve-then-rerank is the dominant pattern for grounding agents in large document corpora. Cheaper, faster, and more accurate rerankers reduce latency and cost in production RAG pipelines. The 8K context support is especially useful for agents that need to rank long documents or multi-turn conversation chunks.
- Token economics. Running reranking in-house with 17M–150M-parameter models can now be cheaper and more accurate than relying on hosted embedding APIs for the rerank step. Teams that have not yet separated retrieval from reranking should revisit their architecture: the quality gap between small self-hosted rerankers and much larger models has narrowed enough to justify the engineering investment.
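The retrieve-then-rerank pattern and the throughput figure above can be illustrated with a small, dependency-free sketch. A cheap first stage narrows the corpus to `k` candidates, and a pair scorer then reorders only those candidates. In a real pipeline the first stage would be BM25 or a bi-encoder, and `scorer` would wrap a cross-encoder, for example by calling `predict` on a sentence-transformers `CrossEncoder` with (query, document) pairs. Here both stages are replaced by token-overlap heuristics, and every function name is hypothetical. The `rerank_latency_ms` helper converts a pairs-per-second figure into reranking time per query, assuming cost grows linearly with the number of candidates and ignoring batching and hardware differences.

```python
from typing import Callable, List, Sequence, Set, Tuple

# A scorer takes (query, document) and returns a relevance score, as a cross-encoder does.
Scorer = Callable[[str, str], float]


def _tokens(text: str) -> Set[str]:
    return set(text.lower().split())


def first_stage(query: str, corpus: Sequence[str], k: int) -> List[str]:
    """Cheap, recall-oriented retrieval: rank by raw token overlap (stand-in for BM25 or a bi-encoder)."""
    q = _tokens(query)
    ranked = sorted(corpus, key=lambda doc: len(q & _tokens(doc)), reverse=True)
    return ranked[:k]


def jaccard_scorer(query: str, doc: str) -> float:
    """Hypothetical stand-in for a cross-encoder: Jaccard similarity of the token sets."""
    q, d = _tokens(query), _tokens(doc)
    union = q | d
    return len(q & d) / len(union) if union else 0.0


def rerank(query: str, candidates: Sequence[str], scorer: Scorer, top_n: int) -> List[Tuple[str, float]]:
    """Score every (query, candidate) pair and keep the top_n by score."""
    scored = [(doc, scorer(query, doc)) for doc in candidates]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_n]


def retrieve_then_rerank(
    query: str, corpus: Sequence[str], scorer: Scorer, k: int = 100, top_n: int = 10
) -> List[Tuple[str, float]]:
    """Retrieve k candidates cheaply, then rerank only those with the more expensive scorer."""
    return rerank(query, first_stage(query, corpus, k), scorer, top_n)


def rerank_latency_ms(num_candidates: int, pairs_per_sec: float) -> float:
    """Time to score num_candidates pairs at a given throughput, ignoring batching and I/O overhead."""
    return 1000.0 * num_candidates / pairs_per_sec
```

With the 7,500 pairs/sec figure quoted for the 17M model, `rerank_latency_ms(100, 7500)` gives roughly 13.3 ms to rerank 100 candidates for one query. The summary does not state the hardware behind that throughput, so treat the result as a back-of-envelope estimate rather than a benchmark.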