Lightning-Fast Reranking with zerank-1

Jul 22, 2025

Speed is the secret ingredient that makes great AI feel instant

What is a reranker and why you need one

A reranker is a cross-encoder neural model that takes a short list of candidate documents from a fast first-stage search (BM25, vector search, or hybrid) and rescores each one with full query–document context. This second pass boosts precision in your top-k results, so your LLM or user sees the most relevant snippets first.
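To make the two-stage flow concrete, here is a minimal sketch of retrieve-then-rerank. Everything in it is illustrative: `first_stage`, `rerank`, and `toy_pair_score` are hypothetical names, and the Jaccard-overlap scorer is only a stand-in for a real cross-encoder such as zerank-1, not how that model scores documents.

```python
from typing import Callable, List, Tuple


def first_stage(query: str, corpus: List[str], k: int) -> List[str]:
    """Cheap, recall-oriented retrieval: rank by number of shared tokens."""
    q_tokens = set(query.lower().split())
    scored = [(len(q_tokens & set(doc.lower().split())), doc) for doc in corpus]
    # Sort on the score only; ties keep their original corpus order.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [doc for _, doc in scored[:k]]


def toy_pair_score(query: str, doc: str) -> float:
    """Stand-in for a cross-encoder: Jaccard overlap of query and document tokens."""
    q, d = set(query.lower().split()), set(doc.lower().split())
    union = q | d
    return len(q & d) / len(union) if union else 0.0


def rerank(
    query: str,
    candidates: List[str],
    score_fn: Callable[[str, str], float],
    top_n: int,
) -> List[Tuple[float, str]]:
    """Second pass: score every (query, document) pair and keep the best top_n."""
    scored = [(score_fn(query, doc), doc) for doc in candidates]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:top_n]


corpus = [
    "reranking improves search precision",
    "the weather is sunny today",
    "search precision depends on reranking and many other unrelated factors in a long pipeline",
    "cooking pasta at home",
]
query = "reranking search precision"

candidates = first_stage(query, corpus, k=3)  # wide, cheap shortlist
top = rerank(query, candidates, toy_pair_score, top_n=2)  # narrow, precise pass
for score, doc in top:
    print(f"{score:.2f}  {doc}")
```

In a real pipeline, `score_fn` would call the reranking model, while the first stage stays cheap enough to scan the whole corpus.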

Benchmark results

Model	NDCG@10	Latency, 12 KB payload (ms)	Latency, 150 KB payload (ms)
Jina rerank m0	0.7279	1,381.5 ± 2,082.2	4,543.8 ± 2,984.9
Cohere rerank 3.5	0.7091	171.5 ± 106.8	459.2 ± 87.9
ZeroEntropy zerank-1	0.7683	149.7 ± 53.1	314.4 ± 94.6
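For readers unfamiliar with the metric, NDCG@10 rewards placing relevant documents near the top of the first ten results. The sketch below uses the common linear-gain formulation; the post does not say which NDCG variant or relevance labels were used for the table, so treat this as an illustration of the metric rather than the benchmark's exact procedure. The function names are hypothetical.

```python
import math
from typing import Sequence


def dcg_at_k(relevances: Sequence[float], k: int = 10) -> float:
    """Discounted cumulative gain: each hit is discounted by log2(rank + 1)."""
    return sum(
        rel / math.log2(rank + 1)
        for rank, rel in enumerate(relevances[:k], start=1)
    )


def ndcg_at_k(relevances: Sequence[float], k: int = 10) -> float:
    """DCG divided by the DCG of the ideal (descending) ordering of the same labels.

    Assumes `relevances` covers every relevant document for the query, so the
    ideal ordering can be built by sorting this list.
    """
    ideal = dcg_at_k(sorted(relevances, reverse=True), k)
    return dcg_at_k(relevances, k) / ideal if ideal > 0 else 0.0


# Relevance labels in the order the reranker returned the documents.
print(round(ndcg_at_k([1, 0, 1]), 4))  # relevant doc pushed to rank 3: about 0.92
print(ndcg_at_k([1, 1, 0]))            # ideal ordering: 1.0
```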

zerank-1 is:

~13 % lower latency than Cohere 3.5 on small payloads (149.7 ms vs 171.5 ms)
~32 % lower latency on large payloads (314.4 ms vs 459.2 ms)
About 9× faster than Jina on 12 KB payloads and 14× faster on 150 KB payloads

All while delivering the highest NDCG@10 of the group.
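The comparisons above can be reproduced from the table's central latency figures (assumed here to be means). The short sketch below computes both the percentage reduction in latency and the speedup ratio, since "X % faster" can be read either way; the helper names are hypothetical.

```python
# Central latency figures from the benchmark table, in milliseconds.
latency_ms = {
    "Jina rerank m0": {"12 KB": 1381.5, "150 KB": 4543.8},
    "Cohere rerank 3.5": {"12 KB": 171.5, "150 KB": 459.2},
    "ZeroEntropy zerank-1": {"12 KB": 149.7, "150 KB": 314.4},
}


def latency_reduction_pct(baseline: float, candidate: float) -> float:
    """Share of the baseline latency that the candidate saves, in percent."""
    return (baseline - candidate) / baseline * 100.0


def speedup(baseline: float, candidate: float) -> float:
    """How many times faster the candidate is than the baseline."""
    return baseline / candidate


ours = latency_ms["ZeroEntropy zerank-1"]
for model in ("Cohere rerank 3.5", "Jina rerank m0"):
    for payload in ("12 KB", "150 KB"):
        base = latency_ms[model][payload]
        print(
            f"vs {model} @ {payload}: "
            f"{latency_reduction_pct(base, ours[payload]):.1f} % lower latency, "
            f"{speedup(base, ours[payload]):.2f}x speedup"
        )
```

Against Cohere this gives 12.7 % and 31.5 % lower latency; against Jina the speedups come out to roughly 9.2× and 14.5×.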

Why speed matters

Whether you’re powering an enterprise search portal or a conversational voice agent, every millisecond counts. A few examples:

RAG apps: Users expect sub-second results. Slow reranking means cold leads and frustrated employees.
Voice AI agents: Jitter in your pipeline breaks the illusion of a human-like dialogue. Quick reranking keeps the conversation flowing.
E-commerce search bars: Shoppers rarely look past the top ~10 results, so those results must be highly accurate, and every wasted millisecond raises the risk that they leave.
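One way to check whether a reranking step fits your latency budget is to time it repeatedly and look at both the average and the spread, since jitter matters as much as the mean for voice and chat use cases. The harness below is a minimal sketch: `measure_latency_ms` is a hypothetical helper, the 200 ms budget is an arbitrary example value, and `fake_rerank` only sleeps in place of a real reranker call.

```python
import statistics
import time
from typing import Callable, Tuple


def measure_latency_ms(
    fn: Callable[[], object], runs: int = 20, warmup: int = 2
) -> Tuple[float, float]:
    """Return (mean, sample standard deviation) of fn's wall-clock latency in ms."""
    if runs < 2:
        raise ValueError("need at least 2 runs to compute a standard deviation")
    for _ in range(warmup):
        fn()  # discard warm-up calls (connection setup, caches, and so on)
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        samples.append((time.perf_counter() - start) * 1000.0)
    return statistics.mean(samples), statistics.stdev(samples)


def fake_rerank() -> None:
    """Placeholder for a real reranker call; sleeps for about 5 ms."""
    time.sleep(0.005)


budget_ms = 200.0  # example budget, not a figure from the benchmark
mean_ms, std_ms = measure_latency_ms(fake_rerank)
print(f"{mean_ms:.1f} ms ± {std_ms:.1f} (within budget: {mean_ms <= budget_ms})")
```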

When to use a reranker

Tight LLM contexts: Surface the few most relevant documents so your prompt stays under token limits.
Precision-critical workflows: Legal search, medical Q&A or compliance use cases where every bit of relevance matters.
Cost-sensitive scale: Lower inference time means lower compute bills at 100 M+ monthly calls.
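For the tight-context case, a simple approach is to walk the reranked documents best-first and keep each one that still fits the prompt budget. This is a minimal sketch under stated assumptions: `pack_context` and `approx_tokens` are hypothetical helpers, and the whitespace token count is a crude placeholder for your LLM's actual tokenizer.

```python
from typing import List, Tuple


def approx_tokens(text: str) -> int:
    """Crude whitespace token count; swap in the LLM's own tokenizer in practice."""
    return len(text.split())


def pack_context(ranked: List[Tuple[float, str]], max_tokens: int) -> List[str]:
    """Greedily keep the highest-scoring documents that fit within max_tokens."""
    chosen: List[str] = []
    used = 0
    for _, doc in sorted(ranked, key=lambda pair: pair[0], reverse=True):
        cost = approx_tokens(doc)
        if used + cost <= max_tokens:
            chosen.append(doc)
            used += cost
    return chosen


# (reranker score, document text) pairs; the scores are made up for illustration.
ranked = [
    (0.91, "alpha beta gamma"),
    (0.40, "delta epsilon zeta eta theta"),
    (0.75, "iota kappa"),
]
print(pack_context(ranked, max_tokens=5))
# ['alpha beta gamma', 'iota kappa']
```

The greedy pass skips a document that does not fit but keeps checking lower-ranked ones, which favors filling the budget over strict rank order. Stop at the first miss instead if order matters more.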

Try zerank-1 today

Experience sub-200 ms reranking on small payloads with top-tier accuracy.

Give your search, agent or RAG pipeline the speed boost it needs.
