Lightning-Fast Reranking with zerank-1

Jul 22, 2025

Speed is the secret ingredient that makes great AI feel instant

What is a reranker and why you need one

A reranker is a cross-encoder neural model that takes a short list of candidate documents from a fast first-stage search (BM25, vector search or hybrid) and rescores them with full query–document context. This second-pass step dramatically boosts precision in your top-k results, ensuring your LLM or user sees the most relevant snippets first.
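The two-stage flow described above can be sketched in a few lines of Python. This is a minimal illustration, not zerank-1's API: `first_stage`, `rerank` and `toy_score` are hypothetical names, the first stage is a plain term-overlap count standing in for BM25 or vector search, and `toy_score` is a placeholder for a real cross-encoder that would score each query and document together.

```python
from typing import Callable, List, Tuple


def first_stage(query: str, corpus: List[str], k: int) -> List[str]:
    """Cheap candidate retrieval: rank documents by number of shared terms."""
    q_terms = set(query.lower().split())
    scored = [(len(q_terms & set(doc.lower().split())), doc) for doc in corpus]
    # Drop documents with no overlap, then keep the k best (stable sort).
    scored = [s for s in scored if s[0] > 0]
    scored.sort(key=lambda s: s[0], reverse=True)
    return [doc for _, doc in scored[:k]]


def rerank(
    query: str,
    candidates: List[str],
    score_fn: Callable[[str, str], float],
    top_n: int,
) -> List[Tuple[float, str]]:
    """Second pass: score every (query, document) pair and sort best-first."""
    scored = [(score_fn(query, doc), doc) for doc in candidates]
    scored.sort(key=lambda s: s[0], reverse=True)
    return scored[:top_n]


def toy_score(query: str, doc: str) -> float:
    """Hypothetical stand-in for a cross-encoder: the fraction of document
    terms that also appear in the query. A real reranker would be called here."""
    q_terms = set(query.lower().split())
    d_terms = doc.lower().split()
    return sum(t in q_terms for t in d_terms) / len(d_terms)


corpus = [
    "reset your password from the account settings page",
    "password policy requires twelve characters",
    "how to reset a forgotten password",
    "shipping times for international orders",
]
query = "reset password"

candidates = first_stage(query, corpus, k=3)        # fast, recall-oriented
top = rerank(query, candidates, toy_score, top_n=2)  # slower, precision-oriented
for score, doc in top:
    print(f"{score:.3f}  {doc}")
```

The point of the split is that the expensive scorer only ever sees the handful of candidates the cheap stage lets through, which is why reranker latency is measured per request rather than per corpus.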

Benchmark results

Model                  NDCG@10   Latency (12 KB)         Latency (150 KB)
Jina rerank m0         0.7279    1 381.5 ms ± 2 082.2    4 543.8 ms ± 2 984.9
Cohere rerank 3.5      0.7091    171.5 ms ± 106.8        459.2 ms ± 87.9
ZeroEntropy zerank-1   0.7683    149.7 ms ± 53.1         314.4 ms ± 94.6
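For readers unfamiliar with the accuracy column, NDCG@10 compares the discounted gain of the returned ranking against the best possible ordering of the same results. The sketch below uses the common linear-gain form, DCG@k = Σ rel_i / log2(i + 1); the post does not say which NDCG variant or which relevance labels the benchmark used, so treat this as an illustration of the metric rather than a reproduction of the benchmark. It also assumes every relevant document appears somewhere in the scored list.

```python
import math
from typing import Sequence


def dcg_at_k(relevances: Sequence[float], k: int) -> float:
    """Discounted cumulative gain over the first k results (ranks start at 1)."""
    return sum(
        rel / math.log2(rank + 1)
        for rank, rel in enumerate(relevances[:k], start=1)
    )


def ndcg_at_k(relevances: Sequence[float], k: int = 10) -> float:
    """DCG of the given order divided by the DCG of the ideal order.

    Assumption: the ideal order is built from the same list, i.e. no relevant
    document is missing from the results being scored.
    """
    ideal = dcg_at_k(sorted(relevances, reverse=True), k)
    return dcg_at_k(relevances, k) / ideal if ideal > 0 else 0.0


# Relevance labels in the order a system returned the documents (made-up data).
returned_order = [1, 3, 0, 2]
print(f"NDCG@10 = {ndcg_at_k(returned_order):.4f}")
```

A score of 1.0 means the results are already in the ideal order; placing highly relevant documents lower in the list pulls the score down.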

zerank-1 is:

  • ~13 % lower latency than Cohere rerank 3.5 on small (12 KB) payloads (149.7 ms vs 171.5 ms)

  • ~31 % lower latency on large (150 KB) payloads (314.4 ms vs 459.2 ms)

  • 9× faster than Jina rerank m0 on 12 KB payloads and 14× faster on 150 KB payloads

All while delivering the highest NDCG@10 of the group.
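The comparisons above follow directly from the mean latencies in the benchmark table. A short check, using only those published means (the helper names `latency_reduction` and `speedup` are ours, for illustration):

```python
# Mean latencies in milliseconds, copied from the benchmark table.
latency_ms = {
    "Jina rerank m0": {"12 KB": 1381.5, "150 KB": 4543.8},
    "Cohere rerank 3.5": {"12 KB": 171.5, "150 KB": 459.2},
    "ZeroEntropy zerank-1": {"12 KB": 149.7, "150 KB": 314.4},
}


def latency_reduction(baseline: float, candidate: float) -> float:
    """Fraction of the baseline latency that the candidate saves."""
    return (baseline - candidate) / baseline


def speedup(baseline: float, candidate: float) -> float:
    """How many times faster the candidate is than the baseline."""
    return baseline / candidate


for size in ("12 KB", "150 KB"):
    zerank = latency_ms["ZeroEntropy zerank-1"][size]
    cohere = latency_ms["Cohere rerank 3.5"][size]
    jina = latency_ms["Jina rerank m0"][size]
    print(
        f"{size}: {latency_reduction(cohere, zerank):.1%} lower latency than Cohere, "
        f"{speedup(jina, zerank):.1f}x faster than Jina"
    )
```

These figures compare means only. The reported standard deviations are large relative to the means for some models, so individual requests can vary considerably.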

Why speed matters

Whether you’re powering an enterprise search portal or a conversational voice agent, every millisecond counts. A few examples:

  • RAG apps: Users expect sub-second results. Slow reranking means cold leads and frustrated employees.

  • Voice AI agents: Jitter in your pipeline breaks the illusion of a human-like dialogue. Quick reranking keeps the conversation flowing.

  • E-commerce search bars: Users rarely look past the top ~10 results, so those results need to be very accurate, and every wasted millisecond raises the chance that they churn.

When to use a reranker

  • Tight LLM contexts: Surface the few most relevant documents so your prompt stays under token limits.

  • Precision-critical workflows: Legal search, medical Q&A or compliance use cases where every bit of relevance matters.

  • Cost-sensitive scale: Lower inference time means lower compute bills at 100 M+ monthly calls.
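As an illustration of the tight-context case, reranked results can be packed greedily into a fixed token budget before building the prompt. This is a sketch under stated assumptions: `pack_context` and `approx_tokens` are hypothetical helpers, the whitespace split is a crude stand-in for a real tokenizer, and the input is assumed to be sorted best-first by reranker score.

```python
from typing import Callable, List, Tuple


def approx_tokens(text: str) -> int:
    """Crude token estimate (whitespace split). Swap in your model's tokenizer."""
    return len(text.split())


def pack_context(
    ranked_docs: List[Tuple[float, str]],
    max_tokens: int,
    count_tokens: Callable[[str], int] = approx_tokens,
) -> List[str]:
    """Greedily keep the highest-scored documents that fit in the budget.

    ranked_docs is assumed to be sorted best-first. A document that does not
    fit is skipped, so a shorter, lower-ranked one may still be included.
    """
    selected: List[str] = []
    used = 0
    for _score, doc in ranked_docs:
        cost = count_tokens(doc)
        if used + cost > max_tokens:
            continue
        selected.append(doc)
        used += cost
    return selected


ranked = [
    (0.91, "refund requests are accepted within thirty days"),
    (0.74, "refunds for digital goods follow a separate and considerably longer policy document"),
    (0.52, "contact support to start a refund"),
]
print(pack_context(ranked, max_tokens=14))
```

Skipping oversized documents rather than stopping at the first one that does not fit is a design choice; stopping early keeps strict rank order at the cost of leaving budget unused.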

Try zerank-1 today

Experience sub-200 ms reranking with top-tier accuracy.

Give your search, agent or RAG pipeline the speed boost it needs.

Get started with

Our retrieval engine runs autonomously with the accuracy of a human-curated system.

GitHub

Discord

Slack

Enterprise

Contact us for a custom enterprise solution with custom pricing
