new ettin reranker family hits state-of-the-art at every size

source: hugging face blog: introducing the ettin reranker family

level: technical

six new sentence transformers cross-encoder rerankers are now available, ranging from 17 million to 1 billion parameters. they are built on the ettin modernbert encoders and trained with a distillation recipe using scores from mixedbread-ai/mxbai-rerank-large-v2. the models use a pointwise approach, taking a query-document pair and outputting a single relevance score. they support up to 8192 tokens of context and can be used with just a few lines of code.
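to make "pointwise" concrete, here is a minimal sketch of reranking with any scorer that maps (query, document) pairs to one score each. the names `rerank`, `PairScorer` and `toy_overlap_scorer` are hypothetical and not from the blog post, and the toy scorer is only a word-overlap stand-in so the example runs without downloading a model.

```python
from typing import Callable, List, Sequence, Tuple

# A pair scorer takes (query, document) pairs and returns one score per pair.
PairScorer = Callable[[Sequence[Tuple[str, str]]], Sequence[float]]


def rerank(query: str, documents: Sequence[str], score_pairs: PairScorer,
           top_k: int = 100) -> List[Tuple[float, str]]:
    """Score each (query, document) pair independently, then sort by score."""
    candidates = list(documents)[:top_k]          # keep at most top_k first-stage candidates
    pairs = [(query, doc) for doc in candidates]  # pointwise: one pair per document
    scores = score_pairs(pairs)
    ranked = sorted(zip(scores, candidates), key=lambda item: item[0], reverse=True)
    return [(float(score), doc) for score, doc in ranked]


def toy_overlap_scorer(pairs: Sequence[Tuple[str, str]]) -> List[float]:
    """Stand-in scorer (NOT a real reranker): fraction of query words found in the document."""
    scores = []
    for query, doc in pairs:
        query_words = set(query.lower().split())
        doc_words = set(doc.lower().split())
        scores.append(len(query_words & doc_words) / max(len(query_words), 1))
    return scores


docs = [
    "bananas are rich in potassium",
    "a reranker scores each query document pair",
    "the reranker reorders retrieved documents",
]
ranked = rerank("how does a reranker score a document", docs, toy_overlap_scorer)
for score, doc in ranked:
    print(f"{score:.3f}  {doc}")
```

in sentence transformers, `CrossEncoder.predict` also takes a list of (query, document) pairs and returns one score per pair, so a loaded model's `predict` method should fit the `score_pairs` argument. the model ids of the new rerankers are not given in this summary, so none is shown here.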

on the mteb retrieval benchmark, the 1b model nearly matches its 1.54b teacher, scoring 0.6114 ndcg@10 versus 0.6115. the 150m model beats qwen3-reranker-0.6b by 0.005 ndcg@10, and the 17m model outperforms the 33m ms-marco-minilm-l12-v2 by 0.051. all models were evaluated by pairing them with six different embedding models and reranking the top 100 candidates each one retrieved. the smallest models offer strong quality for their size, making them practical drop-in replacements for older rerankers.
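for readers unfamiliar with the metric, here is a minimal sketch of ndcg@10 in its common linear-gain form. the function names are hypothetical, and the exact evaluation code behind the reported numbers is not described in this summary. as a simplification, the ideal ranking is built only from the labels of the documents in the list, whereas a full evaluation would use every judged document for the query.

```python
import math
from typing import Sequence


def dcg_at_k(relevances: Sequence[float], k: int) -> float:
    """Discounted cumulative gain: sum of rel / log2(rank + 1), with ranks starting at 1."""
    return sum(rel / math.log2(rank + 1)
               for rank, rel in enumerate(relevances[:k], start=1))


def ndcg_at_k(ranked_relevances: Sequence[float], k: int = 10) -> float:
    """DCG of the given ranking divided by the DCG of the best possible ordering."""
    ideal = sorted(ranked_relevances, reverse=True)
    ideal_dcg = dcg_at_k(ideal, k)
    return dcg_at_k(ranked_relevances, k) / ideal_dcg if ideal_dcg > 0 else 0.0


# Relevance labels in ranked order (1 = relevant, 0 = not relevant).
before_rerank = [0, 1, 0, 1]   # relevant documents sit at ranks 2 and 4
after_rerank = [1, 1, 0, 0]    # a reranker moved them to ranks 1 and 2

print(ndcg_at_k(before_rerank), ndcg_at_k(after_rerank))
```

a ranking that places every relevant document first scores 1.0, and pushing relevant documents down the list lowers the score. gaps like 0.005 or 0.051 are differences on this 0-to-1 scale.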

the architecture uses a cls pooling head with four dense layers, which outperformed mean pooling in tests. speed improvements come from flash attention 2 and sequence unpadding, yielding 1.7x to 8.3x faster inference over default loading. the training data is a subset of lightonai/embeddings-pre-training mixed with a reranked subset of lightonai/embeddings-fine-tuning. the full training recipe and data are released, allowing others to reproduce or adapt the models.
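the difference between the two pooling strategies can be sketched in plain python. this is an illustration with hypothetical names and toy 2-dimensional vectors, not the released implementation. the four dense layers that follow pooling are left out because their widths and activations are not described here.

```python
from typing import List, Sequence

Vector = List[float]


def cls_pool(token_embeddings: Sequence[Vector]) -> Vector:
    """Use the first token's embedding (the [CLS] position) as the sequence summary."""
    return list(token_embeddings[0])


def mean_pool(token_embeddings: Sequence[Vector], attention_mask: Sequence[int]) -> Vector:
    """Average the embeddings of non-padding tokens only (mask value 1 = real token)."""
    dim = len(token_embeddings[0])
    kept = [vec for vec, keep in zip(token_embeddings, attention_mask) if keep]
    return [sum(vec[i] for vec in kept) / len(kept) for i in range(dim)]


# Four token positions; the last one is padding and must not affect the mean.
tokens = [[1.0, 0.0], [0.0, 2.0], [2.0, 4.0], [9.0, 9.0]]
mask = [1, 1, 1, 0]

cls_vector = cls_pool(tokens)
mean_vector = mean_pool(tokens, mask)
print(cls_vector, mean_vector)
```

with cls pooling, the scoring head sees only the vector at the first position, so the encoder has to gather query-document evidence there through attention. mean pooling spreads the summary across every real token instead.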

why it matters: these rerankers improve retrieval quality at lower cost, enabling faster and more accurate search pipelines for ai applications.
