Introducing zerank-2: The Most Accurate Multilingual Instruction-Following Reranker
Nov 18, 2025
Today, we're releasing zerank-2: the world's best reranker, purpose-built to address some of the most important problems in information retrieval.
Rerankers are a crucial part of what makes search and RAG pipelines actually work in production, yet even industry-standard rerankers (including Cohere's rerank-3.5) fail to capture nuanced relevance for real-world queries.
zerank-2 outperforms every other reranker on both accuracy and latency — at half the price.
It excels at multilingual and cross-lingual data, follows user (and agent) instructions precisely, and is robust to those complex aggregation queries common in enterprise AI workflows.
Additionally, we've normalized our relevance scores and added a confidence statistic, allowing for more consistent interpretation of reranker output across all kinds of queries and domains.
Real Stories of How Rerankers Break in Production
From early stage startups to Fortune 50 AI teams, we kept hearing similar production failure modes:
"It works okay in English, but performance tanks for multilingual queries"
"We need to set a relevance threshold, but don't know how to interpret scores consistently."
"It can't understand our domain terminology or specific use case without slow LLM query rewrites"
"Prompting it with instructions breaks it entirely"
"Documents that would provide helpful context to our agent get scored too low, just because they don't directly answer the question."
Many rerankers overfit to public benchmarks and fail to generalize to these real production issues.
Introducing zerank-2
Today, we're releasing zerank-2, a state-of-the-art cross-encoder reranker built specifically to solve some of the most common production failures.
zerank-2 was trained with our new zELO training pipeline, which converts pairwise preferences into absolute Elo scores. You can read more about our methodology here.
The model is already available through our API and on HuggingFace.
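Here is a minimal sketch of a rerank call over HTTP. The endpoint path, payload fields, and response shape below are assumptions for illustration; the zerank-2 docs are authoritative:

```python
import os
import requests

# Illustrative sketch: the endpoint path and field names here are
# assumptions, not the authoritative API -- see the zerank-2 docs.
API_URL = "https://api.zeroentropy.dev/v1/models/rerank"  # assumed path

response = requests.post(
    API_URL,
    headers={"Authorization": f"Bearer {os.environ['ZEROENTROPY_API_KEY']}"},
    json={
        "model": "zerank-2",
        "query": "How do I rotate API keys?",
        "documents": [
            "Rotating credentials is covered in the security guide.",
            "Our quarterly earnings grew 12% year over year.",
        ],
    },
)
response.raise_for_status()

# Assumed response shape: a list of {index, relevance_score} entries.
for result in response.json()["results"]:
    print(result["index"], result["relevance_score"])
```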


What zerank-2 solves
1. Native Instruction-Following
Providing instructions to a reranker can significantly boost accuracy in most situations.
With zerank-2, you can now append specific instructions, lists of abbreviations, business context, or user-specific memories to influence how results get reranked.
The table below shows how, depending on the business context passed into zerank-2, it correctly disambiguates polysemous queries: it discerns that achievement in the IMO (International Mathematical Olympiad) is relevant for hiring at an AI startup, while experience working with the IMO (International Maritime Organization) matters more to a maritime logistics company.
| Example IF Query | Context: Fast-growing AI startup | Context: Maritime logistics company |
|---|---|---|
| Candidate: IMO Gold Medalist | 0.54 | 0.46 |
| Candidate: Worked with the IMO on compliance | 0.46 | 0.60 |
zerank-2 also gives appropriate scores to documents that don't directly answer the user's query but provide useful context, which improves the diversity and quality of the final response.
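A minimal sketch of instruction-following, under the assumption that instructions and business context are simply appended to the query text (the exact mechanism may differ; see the docs), using the same assumed endpoint as above:

```python
import os
import requests

# Assumption: instructions/context are prepended to the query string.
context = "We are a fast-growing AI startup hiring research engineers."
query = f"Instructions: {context}\nQuery: Is this candidate a strong fit?"

documents = [
    "Candidate: IMO Gold Medalist, 2021.",
    "Candidate: Worked with the IMO on maritime compliance audits.",
]

resp = requests.post(
    "https://api.zeroentropy.dev/v1/models/rerank",  # assumed path
    headers={"Authorization": f"Bearer {os.environ['ZEROENTROPY_API_KEY']}"},
    json={"model": "zerank-2", "query": query, "documents": documents},
)
resp.raise_for_status()

# Assumed response shape: {index, relevance_score} entries.
for r in resp.json()["results"]:
    print(documents[r["index"]], r["relevance_score"])
```

Swapping `context` to a maritime-logistics description should flip the ranking, mirroring the table above.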

2. True Multilingual Parity
Most rerankers exhibit a strong performance gap between languages, ranking the very same document lower once it is translated out of English. That gap widens even more on non-English to non-English tasks.
We trained zerank-2 to be robust across 100+ languages, with near-English performance on major languages, even on challenging scripts (Chinese, Arabic) and code-switched queries (Spanglish, Hinglish).
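As a quick cross-lingual sanity check, here is a sketch using the HuggingFace checkpoint. The model id `zeroentropy/zerank-2` and the assumption that it loads as a standard sentence-transformers CrossEncoder are ours; verify against the model card:

```python
from sentence_transformers import CrossEncoder

# Assumption: the checkpoint id and that it loads as a plain CrossEncoder.
model = CrossEncoder("zeroentropy/zerank-2")

query = "¿Cuál es la capital de Japón?"  # Spanish query
docs = [
    "Tokyo is the capital of Japan.",  # correct, in English
    "東京は日本の首都です。",            # correct, in Japanese
    "Berlin is the capital of Germany.",  # incorrect
]

# A truly multilingual reranker should score both correct answers
# similarly, regardless of which language they are written in.
scores = model.predict([(query, d) for d in docs])
for doc, score in zip(docs, scores):
    print(f"{score:.3f}  {doc}")
```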



3. Score Bias Adjustment and New Confidence Score
Most rerankers' scores are "relatively" correct yet "absolutely" meaningless: a score of 0.7 might indicate 90% relevance for one query, while a 0.7 for another might mean 30%.
Worse, some rerankers, like Voyage rerank-3.5, output scores clustered around 0.5 regardless of the document's true relevance. That makes it almost impossible to set a threshold your agent or workflow can trust to filter out low-quality results.
We fixed it.
Through careful calibration across query types and domains, a zerank-2 score of 0.8 actually means ~80% relevance, consistently and predictably.
zerank-2 also outputs a new Confidence Score, a measure of how certain the model is about its own relevance estimate.
The graphs above show this in action: Voyage and Cohere scores are scattered across the space, so no single threshold works, while zerank-2 correlates far more robustly and linearly with ground-truth scores.
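Calibrated scores make threshold-based filtering straightforward. A sketch of what that can look like, where the `relevance_score` and `confidence` field names are assumptions carried over from the earlier examples:

```python
# With calibrated scores, one threshold means the same thing on every query.
RELEVANCE_THRESHOLD = 0.5  # ~50% relevance, consistent across domains
MIN_CONFIDENCE = 0.7       # drop scores the model itself is unsure about

def filter_results(results: list[dict]) -> list[dict]:
    """Keep results that are both relevant and confidently scored."""
    return [
        r for r in results
        if r["relevance_score"] >= RELEVANCE_THRESHOLD
        and r.get("confidence", 1.0) >= MIN_CONFIDENCE
    ]

print(filter_results([
    {"index": 0, "relevance_score": 0.82, "confidence": 0.90},  # kept
    {"index": 1, "relevance_score": 0.81, "confidence": 0.40},  # low confidence
    {"index": 2, "relevance_score": 0.31, "confidence": 0.95},  # low relevance
]))
```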
4. SQL-Style Queries and Aggregation Queries
It was surprising to discover just how many unstructured queries from our clients actually resemble structured SQL. Not only are rerankers fragile on these queries, they often degrade performance relative to doing nothing at all.
Even a mere "ORDER BY" on quantitative values confuses every reranker, returning "ordered" results that are often worse than those of the first-pass embedding retrieval, as in the sketch below:
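An illustration of the failure mode, using the same assumed endpoint and response shape as the earlier examples (the documents here are made up for the sketch):

```python
import os
import requests

# An aggregation-style ("ORDER BY") query: a correct ranking must sort
# the documents by a quantitative field, not by lexical similarity.
query = "List our data centers ordered by power capacity, highest first."
documents = [
    "The Dallas facility has a power capacity of 12 MW.",
    "Our Frankfurt data center runs at 40 MW.",
    "The Singapore site was commissioned in 2019 with 25 MW.",
]

resp = requests.post(
    "https://api.zeroentropy.dev/v1/models/rerank",  # assumed path
    headers={"Authorization": f"Bearer {os.environ['ZEROENTROPY_API_KEY']}"},
    json={"model": "zerank-2", "query": query, "documents": documents},
)
resp.raise_for_status()

# A reranker robust to ORDER BY intent should rank Frankfurt (40 MW)
# above Singapore (25 MW) above Dallas (12 MW).
for r in sorted(resp.json()["results"], key=lambda r: -r["relevance_score"]):
    print(documents[r["index"]])
```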


Get Started
Available now via the ZeroEntropy API:
Drop-in replacement for zerank-1 or any existing reranker you run in prod, with a one-line code change (see the snippet after this list).
Documentation: zerank-2 docs
Pricing: $0.025/1M tokens, which is 50% cheaper than all other commercial rerankers.
Get in touch: Discord community or contact@zeroentropy.dev
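For reference, the swap itself can look like this minimal sketch, where only the model id changes (payload field names are assumptions, as above):

```python
# One-line migration: point the model field at zerank-2.
payload = {
    "model": "zerank-2",  # was: "zerank-1" (or another reranker's model id)
    "query": "How do I rotate API keys?",
    "documents": ["Rotating credentials is covered in the security guide."],
}
```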