Paper TLDR: How we trained zerank-1 with the zELO method

Sep 19, 2025


By popular demand, here is an executive TLDR of the paper zELO: ELO-inspired Training Method for Rerankers and Embedding Models.

1. What is a reranker?

A reranker is a cross-encoder that takes a query and candidate documents and reorders them for accuracy. It’s the step that makes RAG actually useful: the reranker decides which 5-10 documents an LLM sees.
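In code, the reranking step looks roughly like the sketch below: a scorer sees the query and each document jointly and the top-k survivors go to the LLM. The `score` callable here is a stand-in for a real cross-encoder forward pass, and the toy word-overlap scorer is purely illustrative.

```python
# Minimal sketch of the reranking step: score each (query, document)
# pair jointly, then keep the k highest-scoring documents.

def rerank(query, documents, score, k=10):
    """Return the k documents the scorer deems most relevant to the query."""
    scored = [(score(query, doc), doc) for doc in documents]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for _, doc in scored[:k]]

# Toy scorer: word overlap stands in for a learned cross-encoder.
def toy_score(query, doc):
    return len(set(query.split()) & set(doc.split()))

docs = ["reranker training with elo", "cooking pasta", "elo ratings for chess"]
print(rerank("elo reranker", docs, toy_score, k=2))
# ['reranker training with elo', 'elo ratings for chess']
```

A real deployment swaps `toy_score` for a model call; the surrounding logic stays the same.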

2. Why not triplet loss + human annotations?

Traditional rerankers are trained on queries with mined positive and negative results. The goal is to find negatives that look relevant but aren’t, so the model learns fine-grained distinctions. But as mining improves, many of those “negatives” are actually relevant (false negatives), which confuses the model and degrades performance.
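To see why false negatives hurt, consider the standard hinge-style triplet objective (sketched below; this is the approach being critiqued, not the paper's method). When a mined "negative" is secretly relevant, the loss penalizes the model for a judgment that is actually correct.

```python
# Hinge-style triplet loss on relevance scores: push the positive's score
# above the mined negative's score by a fixed margin.

def triplet_loss(s_pos, s_neg, margin=1.0):
    return max(0.0, margin - (s_pos - s_neg))

# Healthy case: positive clearly above negative -> zero loss.
print(triplet_loss(s_pos=2.0, s_neg=0.5))    # 0.0
# False-negative case: the "negative" is actually relevant, so the loss
# wrongly pushes two correct high scores apart.
print(triplet_loss(s_pos=2.0, s_neg=1.75))   # 0.75
```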

3. Pairwise with LLMs

Instead of human-annotated triplets, we use LLMs to compare document pairs. Pairwise comparisons are more robust, scale cheaply, and align well with human intuition.
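A pairwise judgment can be elicited with a simple prompt, as in the hedged sketch below. The prompt wording and the `llm` callable are illustrative assumptions, not the paper's actual setup.

```python
# Sketch of pairwise judging with an LLM. The LLM is any callable that maps
# a prompt string to a reply string.

PROMPT = """Query: {query}

Document A: {doc_a}

Document B: {doc_b}

Which document better answers the query? Reply with exactly "A" or "B"."""

def judge_pair(llm, query, doc_a, doc_b):
    """Return 1.0 if the LLM picks document A as the winner, else 0.0."""
    reply = llm(PROMPT.format(query=query, doc_a=doc_a, doc_b=doc_b))
    return 1.0 if reply.strip().upper().startswith("A") else 0.0

# Stub LLM for demonstration purposes only.
print(judge_pair(lambda p: "A", "elo reranker", "doc one", "doc two"))  # 1.0
```

In practice one would also randomize A/B order to cancel position bias.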

4. From pairwise to pointwise: Elo

We model outcomes with the Bradley–Terry / Elo system: documents “battle,” scores accumulate, and we get calibrated continuous relevance values.
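Concretely, Bradley–Terry fits a latent strength to each document so that the probability document i beats document j is p_i / (p_i + p_j). The sketch below uses the standard MM (Zermelo) iteration to recover pointwise scores from a list of battle outcomes; it is a minimal illustration, not the paper's exact estimator, and it assumes every document has at least one win and one loss (otherwise the MLE needs a prior).

```python
import math

# Minimal Bradley-Terry fit via the MM / Zermelo iteration:
#   p_i <- wins_i / sum_j ( n_ij / (p_i + p_j) )
# Log-strengths serve as calibrated, Elo-like pointwise relevance scores.

def bradley_terry(n_docs, outcomes, iters=200):
    """outcomes: list of (winner, loser) index pairs. Returns log-strengths."""
    wins = [0.0] * n_docs
    games = {}  # unordered pair (i, j) -> number of comparisons played
    for w, l in outcomes:
        wins[w] += 1.0
        key = (min(w, l), max(w, l))
        games[key] = games.get(key, 0.0) + 1.0
    p = [1.0] * n_docs
    for _ in range(iters):
        denom = [0.0] * n_docs
        for (i, j), n in games.items():
            d = n / (p[i] + p[j])
            denom[i] += d
            denom[j] += d
        p = [wins[i] / denom[i] if denom[i] else p[i] for i in range(n_docs)]
        total = sum(p)
        p = [x / total for x in p]  # normalize for numerical stability
    return [math.log(x) for x in p]

# Doc 0 mostly wins, doc 2 mostly loses; scores come out in that order.
outcomes = [(0, 1), (0, 1), (0, 2), (0, 2), (1, 2), (1, 2), (2, 0)]
scores = bradley_terry(3, outcomes)
print(scores[0] > scores[1] > scores[2])  # True
```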

5. Tackling scale

Naively, one query with k = 100 candidates requires k² = 10k pairwise inferences. Even our small fine-tuned pairwise model is too expensive to run at that scale.

6. Sparse sampling

We solved this with random cycles sampling: O(n) comparisons instead of O(n²). Only ~400 pairs per query are needed, instead of 10k, with little accuracy loss.
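One plausible way to realize this (the paper's exact sampling scheme may differ; this is an illustrative sketch): each round, shuffle the documents and compare each one with its successor in the resulting cycle. Each round costs n comparisons, so r rounds over n = 100 candidates cost r·n pairs — e.g. 4 rounds give the ~400 pairs mentioned above, versus 4,950 for all unordered pairs.

```python
import random

# Sparse random-cycle sampling: r shuffled cycles over n documents yield
# r * n comparisons (O(n)) instead of ~n^2/2 for the full pairwise matrix.

def random_cycle_pairs(n_docs, rounds, seed=0):
    rng = random.Random(seed)
    pairs = []
    for _ in range(rounds):
        order = list(range(n_docs))
        rng.shuffle(order)
        # Close the loop: each document battles its neighbor in the cycle.
        pairs.extend((order[i], order[(i + 1) % n_docs]) for i in range(n_docs))
    return pairs

pairs = random_cycle_pairs(n_docs=100, rounds=4)
print(len(pairs))  # 400
```

Because each cycle touches every document, the comparison graph stays connected, which is what the Bradley–Terry fit needs.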

7. Cross-query calibration

Elo is relative within one query. We estimate and subtract cross-query biases to align scores across corpora, so the reranker generalizes across verticals and tasks.
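The sketch below illustrates the idea with the simplest possible bias estimate — each query's mean score. The paper's bias estimation is likely more involved; this only shows the shape of the operation: estimate a per-query offset, subtract it, and all queries land on a shared scale.

```python
# Cross-query calibration sketch: within-query Elo scores are only relative,
# so subtract an estimated per-query offset (here, the mean) to put every
# query's scores on one shared scale.

def calibrate(scores_by_query):
    """scores_by_query: {query_id: [elo, ...]} -> same shape, bias-subtracted."""
    calibrated = {}
    for qid, scores in scores_by_query.items():
        bias = sum(scores) / len(scores)  # illustrative per-query offset
        calibrated[qid] = [s - bias for s in scores]
    return calibrated

raw = {"q1": [5.0, 3.0, 1.0], "q2": [105.0, 103.0, 101.0]}
print(calibrate(raw))
# {'q1': [2.0, 0.0, -2.0], 'q2': [2.0, 0.0, -2.0]}
```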

8. Training setup

We LoRA fine-tuned Qwen-4B and Qwen-1.7B on queries from healthcare, finance, legal, manufacturing, STEM, and code. Ablation studies show that mixing diverse domains yields the strongest performance within each vertical.

9. Performance (See full benchmarks here)

• Outperforms BM25, OpenAI embeddings, and hybrid search by more than 15% NDCG@10

• Outperforms all other API-based rerankers by more than 5% NDCG@10 in every domain

10. Availability

• Accessible via API + AWS Marketplace

• Open weights on HuggingFace

• Latency: 129 ms p50 for 75 kB payloads (fastest API-based reranker we’re aware of)

• Cost: $0.025 / 1M tokens (cheapest API-based reranker we’re aware of)

Don’t miss out on future research:

Read more: Benchmarks, Paper, Technical Blog Post

Join our Discord: ZeroEntropy Discord, or Context Engineers General Discord

Get in touch: Email, LinkedIn

Join our Slack: ZeroEntropy Slack


Our retrieval engine runs autonomously with the accuracy of a human-curated system.

Contact us for a custom enterprise solution with custom pricing.
