2025-03-26 · HuggingFace

Training and Finetuning Reranker Models with Sentence Transformers

Source: HuggingFace · Date: 2025-03-26 · URL: https://huggingface.co/blog/train-reranker

Summary

Tutorial on finetuning cross-encoder reranker models with Sentence Transformers. Its headline result is that a small model finetuned on in-domain data can outperform much larger general-purpose rerankers on that domain. A finetuned 396M-parameter ModernBERT reranker reaches an NDCG@10 of 79.42 on GooAQ, ahead of mixedbread’s 1.54B-parameter mxbai-rerank-large-v2 (75.40). Even the finetuned 150M-parameter base model (77.14) beats every general-purpose reranker under 1B parameters in the comparison. Training took ~30 minutes on 99k samples.
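For readers unfamiliar with the metric, here is a minimal sketch of how NDCG@10 is computed for a single query. The function names are illustrative and not taken from the blog post. The sketch assumes every relevant document appears in the candidate list, so the ideal ranking can be derived from the same list. Scores such as 79.42 are this value averaged over queries and multiplied by 100.

```python
import math


def dcg_at_k(relevances, k=10):
    """Discounted cumulative gain over the top-k items of a ranked list."""
    # Rank 1 is divided by log2(2) = 1, rank 2 by log2(3), and so on.
    return sum(rel / math.log2(rank + 2) for rank, rel in enumerate(relevances[:k]))


def ndcg_at_k(relevances, k=10):
    """DCG of the given ranking divided by the DCG of the ideal ranking."""
    # Assumption: the ideal ranking is a re-sort of the same candidates.
    ideal = dcg_at_k(sorted(relevances, reverse=True), k)
    return dcg_at_k(relevances, k) / ideal if ideal > 0 else 0.0


# Relevance labels in the order the reranker returned the candidates.
print(ndcg_at_k([1, 0, 0]))  # correct answer ranked first: 1.0
print(ndcg_at_k([0, 1, 0]))  # correct answer ranked second: about 0.63
```

With one correct answer per query, the score depends only on the rank at which the reranker places that answer, and falls to zero if the answer lands outside the top 10.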

Implications

Thread: transformers library trajectory / open-weights ecosystem health. This is the reranker companion to the embedding model training guide; together they complete the Sentence Transformers training story for retrieval pipelines. The finding that a finetuned 396M model beats a 1.54B general model is consistent with the multimodal embedding training post (finetuned 2B > base 8B): in both cases, domain finetuning beats scale on the target retrieval task. With CrossEncoderTrainer as a first-class citizen in Sentence Transformers, reranker training is now as accessible as embedding training, which significantly lowers the bar for production RAG pipelines.
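To give a sense of what that accessibility looks like, here is a rough sketch of a CrossEncoderTrainer run. It is not the blog's training script. The helper names (build_pairs, train_reranker), the base checkpoint answerdotai/ModernBERT-base, the output path, and the hyperparameters are all placeholder assumptions, and it assumes Sentence Transformers v4 or later with the datasets package installed.

```python
def build_pairs(queries, positives, negatives):
    """Flatten (query, positive, negative) triples into labeled pairs."""
    rows = {"query": [], "answer": [], "label": []}
    for query, pos, neg in zip(queries, positives, negatives):
        rows["query"] += [query, query]
        rows["answer"] += [pos, neg]
        rows["label"] += [1.0, 0.0]  # 1.0 = relevant, 0.0 = not relevant
    return rows


def train_reranker(rows, base_model="answerdotai/ModernBERT-base",
                   output_dir="models/reranker-sketch"):
    """Finetune a cross-encoder on labeled pairs (placeholder settings)."""
    # Imported lazily so build_pairs stays usable without these packages.
    from datasets import Dataset
    from sentence_transformers.cross_encoder import (
        CrossEncoder,
        CrossEncoderTrainer,
        CrossEncoderTrainingArguments,
    )
    from sentence_transformers.cross_encoder.losses import BinaryCrossEntropyLoss

    # num_labels=1 gives a single relevance score per (query, answer) pair.
    model = CrossEncoder(base_model, num_labels=1)
    args = CrossEncoderTrainingArguments(
        output_dir=output_dir,
        num_train_epochs=1,              # placeholder, not the blog's setting
        per_device_train_batch_size=16,  # placeholder, not the blog's setting
    )
    trainer = CrossEncoderTrainer(
        model=model,
        args=args,
        train_dataset=Dataset.from_dict(rows),
        loss=BinaryCrossEntropyLoss(model),
    )
    trainer.train()
    return model
```

Once trained, the returned model can score candidates with model.predict([(query, passage), ...]), and sorting a retriever's results by those scores gives the reranked list.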
