Paper TLDR: How we trained zerank-1 with the zELO method

Sep 19, 2025


By popular demand, here is an executive TLDR of the paper zELO: ELO-inspired Training Method for Rerankers and Embedding Models.

1. What is a reranker?

A reranker is a cross-encoder that takes a query and candidate documents and reorders them for accuracy. It’s the step that makes RAG actually useful: the reranker decides which 5-10 documents an LLM sees.
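In code, the reranking step looks roughly like the sketch below: a scorer sees the query and each document jointly and the top-k survivors go to the LLM. The `score` callable here is a stand-in for a real cross-encoder forward pass, and the toy word-overlap scorer is purely illustrative.

```python
# Minimal sketch of the reranking step: score each (query, document)
# pair jointly, then keep the k highest-scoring documents.

def rerank(query, documents, score, k=10):
    """Return the k documents the scorer deems most relevant to the query."""
    scored = [(score(query, doc), doc) for doc in documents]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for _, doc in scored[:k]]

# Toy scorer: word overlap stands in for a learned cross-encoder.
def toy_score(query, doc):
    return len(set(query.split()) & set(doc.split()))

docs = ["reranker training with elo", "cooking pasta", "elo ratings for chess"]
print(rerank("elo reranker", docs, toy_score, k=2))
# ['reranker training with elo', 'elo ratings for chess']
```

A real deployment swaps `toy_score` for a model call; the surrounding logic stays the same.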

2. Why not triplet loss + human annotations?

Traditional rerankers are trained on queries with mined positive and negative results. The goal is to find negatives that look relevant but aren’t, so the model learns fine-grained distinctions. But as mining improves, many of those “negatives” are actually relevant (false negatives), which confuses the model and degrades performance.
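To see why false negatives hurt, consider the standard hinge-style triplet objective (sketched below; this is the approach being critiqued, not the paper's method). When a mined "negative" is secretly relevant, the loss penalizes the model for a judgment that is actually correct.

```python
# Hinge-style triplet loss on relevance scores: push the positive's score
# above the mined negative's score by a fixed margin.

def triplet_loss(s_pos, s_neg, margin=1.0):
    return max(0.0, margin - (s_pos - s_neg))

# Healthy case: positive clearly above negative -> zero loss.
print(triplet_loss(s_pos=2.0, s_neg=0.5))    # 0.0
# False-negative case: the "negative" is actually relevant, so the loss
# wrongly pushes two correct high scores apart.
print(triplet_loss(s_pos=2.0, s_neg=1.75))   # 0.75
```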

3. Pairwise with LLMs

Instead of human-annotated triplets, we use LLMs to compare document pairs. Pairwise comparisons are more robust, scale cheaply, and align well with human intuition.
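A pairwise judgment can be elicited with a simple prompt, as in the hedged sketch below. The prompt wording and the `llm` callable are illustrative assumptions, not the paper's actual setup.

```python
# Sketch of pairwise judging with an LLM. The LLM is any callable that maps
# a prompt string to a reply string.

PROMPT = """Query: {query}

Document A: {doc_a}

Document B: {doc_b}

Which document better answers the query? Reply with exactly "A" or "B"."""

def judge_pair(llm, query, doc_a, doc_b):
    """Return 1.0 if the LLM picks document A as the winner, else 0.0."""
    reply = llm(PROMPT.format(query=query, doc_a=doc_a, doc_b=doc_b))
    return 1.0 if reply.strip().upper().startswith("A") else 0.0

# Stub LLM for demonstration purposes only.
print(judge_pair(lambda p: "A", "elo reranker", "doc one", "doc two"))  # 1.0
```

In practice one would also randomize A/B order to cancel position bias.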

4. From pairwise to pointwise: Elo

We model outcomes with the Bradley–Terry / Elo system: documents “battle,” scores accumulate, and we get calibrated continuous relevance values.
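Concretely, Bradley–Terry fits a latent strength to each document so that the probability document i beats document j is p_i / (p_i + p_j). The sketch below uses the standard MM (Zermelo) iteration to recover pointwise scores from a list of battle outcomes; it is a minimal illustration, not the paper's exact estimator, and it assumes every document has at least one win and one loss (otherwise the MLE needs a prior).

```python
import math

# Minimal Bradley-Terry fit via the MM / Zermelo iteration:
#   p_i <- wins_i / sum_j ( n_ij / (p_i + p_j) )
# Log-strengths serve as calibrated, Elo-like pointwise relevance scores.

def bradley_terry(n_docs, outcomes, iters=200):
    """outcomes: list of (winner, loser) index pairs. Returns log-strengths."""
    wins = [0.0] * n_docs
    games = {}  # unordered pair (i, j) -> number of comparisons played
    for w, l in outcomes:
        wins[w] += 1.0
        key = (min(w, l), max(w, l))
        games[key] = games.get(key, 0.0) + 1.0
    p = [1.0] * n_docs
    for _ in range(iters):
        denom = [0.0] * n_docs
        for (i, j), n in games.items():
            d = n / (p[i] + p[j])
            denom[i] += d
            denom[j] += d
        p = [wins[i] / denom[i] if denom[i] else p[i] for i in range(n_docs)]
        total = sum(p)
        p = [x / total for x in p]  # normalize for numerical stability
    return [math.log(x) for x in p]

# Doc 0 mostly wins, doc 2 mostly loses; scores come out in that order.
outcomes = [(0, 1), (0, 1), (0, 2), (0, 2), (1, 2), (1, 2), (2, 0)]
scores = bradley_terry(3, outcomes)
print(scores[0] > scores[1] > scores[2])  # True
```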

5. Tackling scale

Naively, one query with k = 100 candidates requires k² = 10k pairwise inferences. Even our small fine-tuned pairwise model is too expensive to run at that scale.

6. Sparse sampling

We solved this with random cycles sampling: O(n) comparisons instead of O(n²). Only ~400 pairs per query are needed, instead of 10k, with little accuracy loss.
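One plausible way to realize this (the paper's exact sampling scheme may differ; this is an illustrative sketch): each round, shuffle the documents and compare each one with its successor in the resulting cycle. Each round costs n comparisons, so r rounds over n = 100 candidates cost r·n pairs — e.g. 4 rounds give the ~400 pairs mentioned above, versus 4,950 for all unordered pairs.

```python
import random

# Sparse random-cycle sampling: r shuffled cycles over n documents yield
# r * n comparisons (O(n)) instead of ~n^2/2 for the full pairwise matrix.

def random_cycle_pairs(n_docs, rounds, seed=0):
    rng = random.Random(seed)
    pairs = []
    for _ in range(rounds):
        order = list(range(n_docs))
        rng.shuffle(order)
        # Close the loop: each document battles its neighbor in the cycle.
        pairs.extend((order[i], order[(i + 1) % n_docs]) for i in range(n_docs))
    return pairs

pairs = random_cycle_pairs(n_docs=100, rounds=4)
print(len(pairs))  # 400
```

Because each cycle touches every document, the comparison graph stays connected, which is what the Bradley–Terry fit needs.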

7. Cross-query calibration

Elo is relative within one query. We estimate and subtract cross-query biases to align scores across corpora, so the reranker generalizes across verticals and tasks.
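The sketch below illustrates the idea with the simplest possible bias estimate — each query's mean score. The paper's bias estimation is likely more involved; this only shows the shape of the operation: estimate a per-query offset, subtract it, and all queries land on a shared scale.

```python
# Cross-query calibration sketch: within-query Elo scores are only relative,
# so subtract an estimated per-query offset (here, the mean) to put every
# query's scores on one shared scale.

def calibrate(scores_by_query):
    """scores_by_query: {query_id: [elo, ...]} -> same shape, bias-subtracted."""
    calibrated = {}
    for qid, scores in scores_by_query.items():
        bias = sum(scores) / len(scores)  # illustrative per-query offset
        calibrated[qid] = [s - bias for s in scores]
    return calibrated

raw = {"q1": [5.0, 3.0, 1.0], "q2": [105.0, 103.0, 101.0]}
print(calibrate(raw))
# {'q1': [2.0, 0.0, -2.0], 'q2': [2.0, 0.0, -2.0]}
```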

8. Training setup

We LoRA fine-tuned Qwen-4B and Qwen-1.7B on queries from healthcare, finance, legal, manufacturing, STEM, and code. Ablation studies show that mixing diverse domains yields the strongest performance within each vertical.

9. Performance (See full benchmarks here)

• Outperforms BM25, OpenAI embeddings, and hybrid search by more than 15% NDCG@10

• Outperforms all other API-based rerankers by more than 5% NDCG@10 in every domain

10. Availability

• Accessible via API + AWS Marketplace

• Open weights on HuggingFace

• Latency: 129 ms p50 for 75 kB payloads (fastest API-based reranker we’re aware of)

• Cost: $0.025 / 1M tokens (cheapest API-based reranker we’re aware of)

Don’t miss out on future research:

Read more: Benchmarks, Paper, Technical Blog Post

Join our Discord: ZeroEntropy Discord, or Context Engineers General Discord

Get in touch: Email, LinkedIn

Join our Slack: ZeroEntropy Slack


Our retrieval engine runs autonomously with the accuracy of a human-curated system.

Contact us for a custom enterprise solution with custom pricing.
