level: technical
ibm released granite embedding multilingual r2, two new embedding models under the apache 2.0 license. the compact 97m-parameter model scores 60.3 on mteb multilingual retrieval, beating all open sub-100m multilingual embedders. the full-size 311m model scores 65.2, ranking second among open models under 500m parameters. both support over 200 languages, with enhanced retrieval quality for 52 languages and nine programming languages. they handle 32,768-token contexts, a 64x jump over the 512-token limit of the previous r1 models, and the 311m version adds matryoshka embeddings for flexible dimension reduction.
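a minimal retrieval sketch via sentence-transformers; note the model id below is a placeholder assumption, so check the actual repository names on the release page:

```python
from sentence_transformers import SentenceTransformer

# hypothetical hugging face id for the r2 model -- verify against the model card
model = SentenceTransformer("ibm-granite/granite-embedding-multilingual-r2")

queries = ["how do i reset my router password?"]
docs = [
    "Hold the reset button for ten seconds to restore factory settings.",
    "El manual describe la instalación del firmware.",  # spanish passage
]

# encode queries and documents, then score with cosine similarity
q_emb = model.encode(queries)
d_emb = model.encode(docs)
print(model.similarity(q_emb, d_emb))  # (1, 2) score matrix
```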
the models are built on modernbert, replacing the older xlm-roberta encoder. this brings rotary position embeddings for long contexts, alternating local-global attention for speed, and flash attention 2 support. the 97m model uses a pruned 180k-token vocabulary, while the 311m keeps a 262k-token one. training combined knowledge distillation from multiple teacher models, contrastive fine-tuning on multilingual retrieval pairs, and model merging. for the 97m model, the vocabulary was pruned first and distillation from the larger teachers then recovered language coverage at the smaller size.
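as a rough illustration of those two training signals, here is a toy sketch; the loss forms, dimensions, and temperature are generic assumptions, not ibm's actual recipe:

```python
import torch
import torch.nn.functional as F

def distillation_loss(student_emb, teacher_emb):
    # pull the student's embedding toward the teacher's direction
    s = F.normalize(student_emb, dim=-1)
    t = F.normalize(teacher_emb, dim=-1)
    return (1.0 - (s * t).sum(dim=-1)).mean()

def contrastive_loss(query_emb, doc_emb, temperature=0.05):
    # in-batch negatives: the positive for query i is the document at index i
    q = F.normalize(query_emb, dim=-1)
    d = F.normalize(doc_emb, dim=-1)
    logits = q @ d.T / temperature
    labels = torch.arange(q.size(0), device=q.device)
    return F.cross_entropy(logits, labels)

# toy usage with random vectors standing in for model outputs
q, d, t = torch.randn(8, 768), torch.randn(8, 768), torch.randn(8, 768)
loss = distillation_loss(q, t) + contrastive_loss(q, d)
```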
benchmarks show the 97m model outperforming larger models such as multilingual-e5-base and gte-multilingual-base, and beating the widely used paraphrase-multilingual-minilm-l12-v2 by 23.7 points on multilingual retrieval. the 311m model leads the longembed benchmark with a score of 71.7, a direct payoff of the 32k context. code retrieval improved by 19.7 points for the 97m model relative to its r1 counterpart. the 311m model's matryoshka embeddings allow dimension reduction with minimal quality loss: truncating to 256 dimensions costs only 0.5 points on multilingual retrieval. both models work with sentence-transformers, langchain, and other frameworks, and ship onnx and openvino weights for cpu inference.
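a sketch of matryoshka truncation through sentence-transformers' truncate_dim option (the option is real; the model id is again a placeholder assumption):

```python
from sentence_transformers import SentenceTransformer

MODEL_ID = "ibm-granite/granite-embedding-multilingual-r2"  # placeholder id

# matryoshka training makes the leading dimensions meaningful on their own,
# so truncate_dim=256 yields 256-dim vectors with little retrieval loss
model = SentenceTransformer(MODEL_ID, truncate_dim=256)
emb = model.encode(["un exemple de phrase"])
print(emb.shape)  # (1, 256)
```

for cpu inference, recent sentence-transformers versions (3.2+) can load bundled onnx weights by passing backend="onnx" to the same constructor.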
why it matters: smaller, faster multilingual embeddings with strong retrieval quality reduce compute costs and enable broader language support in ai applications.