Latency Performance Assessment of zerank-2

Dec 9, 2025

TL;DR

zerank-2 delivers consistent, low-latency performance under realistic production conditions. In our testing, 97.3% of requests completed under 500ms with zero failures. This document presents our latency measurements and explains how to properly benchmark reranker performance.

Why Proper Latency Testing Matters

When evaluating reranker latency, it's critical that your testing reflects actual production usage patterns. Real user traffic doesn't arrive at uniform intervals. It comes in bursts and clusters. Testing with sequential requests or artificial patterns will give you misleading results that don't predict real-world performance.

Our tests use Poisson arrival patterns because they model the random, bursty nature of production traffic. This approach reveals how systems behave under realistic load conditions, including queueing effects and concurrent request handling.
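To make the idea concrete, here is a minimal sketch of how Poisson arrivals can be generated: in a Poisson process, the gaps between consecutive requests are exponentially distributed with mean 1/rate. The function name `poisson_arrival_times` and its parameters are illustrative and are not taken from our test harness.

```python
import random

def poisson_arrival_times(rate_per_s, duration_s, seed=None):
    """Return increasing arrival timestamps (in seconds) for a Poisson process.

    Inter-arrival gaps are drawn from an exponential distribution with
    mean 1 / rate_per_s, which is what makes the traffic bursty.
    """
    rng = random.Random(seed)
    times = []
    t = rng.expovariate(rate_per_s)
    while t < duration_s:
        times.append(t)
        t += rng.expovariate(rate_per_s)
    return times

# Example: roughly 5 requests/second over a 60-second window.
arrivals = poisson_arrival_times(rate_per_s=5, duration_s=60, seed=0)
gaps = [b - a for a, b in zip(arrivals, arrivals[1:])]
print(f"{len(arrivals)} requests, shortest gap {min(gaps):.4f}s, longest gap {max(gaps):.4f}s")
```

The spread between the shortest and longest gap is the point: some requests land almost on top of each other, which is what exercises queueing and concurrent handling.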

Testing Methodology

All tests were conducted using:

Poisson arrival patterns at 1-10 requests/second
60-second test duration
50 documents per request
Payload size ≤2KB per document
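A test with these parameters could be driven by an open-loop load generator along the following lines. This is a sketch under stated assumptions, not the harness behind the numbers in this post: `fake_rerank` is a hypothetical stand-in that only sleeps, and you would replace it with a real call to the reranker you are measuring. The important property is that each request fires at its scheduled arrival time, whether or not earlier requests have finished.

```python
import asyncio
import random
import time

async def fake_rerank(query, documents):
    """Hypothetical stand-in for a real reranker call; it only sleeps."""
    await asyncio.sleep(0.01)
    return list(range(len(documents)))

async def run_load_test(call, rate_per_s, duration_s, docs_per_request=50, seed=None):
    """Fire requests at Poisson arrival times and record per-request latency."""
    rng = random.Random(seed)
    documents = ["x" * 100] * docs_per_request  # placeholder payload
    latencies = []
    failures = 0

    async def one_request():
        nonlocal failures
        start = time.perf_counter()
        try:
            await call("example query", documents)
            latencies.append(time.perf_counter() - start)
        except Exception:
            failures += 1

    tasks = []
    test_start = time.perf_counter()
    next_arrival = rng.expovariate(rate_per_s)
    while next_arrival < duration_s:
        # Wait until the scheduled arrival time, not until the last reply.
        delay = next_arrival - (time.perf_counter() - test_start)
        if delay > 0:
            await asyncio.sleep(delay)
        tasks.append(asyncio.create_task(one_request()))
        next_arrival += rng.expovariate(rate_per_s)

    await asyncio.gather(*tasks)
    return latencies, failures

# Short demo run; a real test would use duration_s=60 and 1-10 requests/second.
latencies, failures = asyncio.run(
    run_load_test(fake_rerank, rate_per_s=40, duration_s=0.5, seed=0)
)
print(f"{len(latencies)} completed, {failures} failed")
```

A closed-loop test, which waits for each response before sending the next request, would hide queueing delays because the client slows down whenever the server does.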

Performance Results

zerank-2 Latency Distribution

Latency Threshold	Requests Exceeding Threshold
>75ms	100.0%
>100ms	100.0%
>150ms	50.5%
>200ms	21.2%
>250ms	11.3%
>500ms	2.7%
>750ms	1.4%
>1s	0.9%
>3s	0.0%
>5s	0.0%
>10s	0.0%
>30s	0.0%
Failed	0.0%

Comparative Performance

Threshold	zerank-2	Cohere rerank-3.5	Jina reranker m0	Voyage rerank-2.5
>150ms	50.5%	34.3%	100.0%	80.5%
>500ms	2.7%	14.3%	70.8%	10.9%
>1s	0.9%	11.6%	57.4%	9.7%
>10s	0.0%	6.4%	55.7%	9.2%
Failed	0.0%	0.0%	55.7%	9.2%

Key Metrics

Zero failures across all test conditions
97.3% of requests completed under 500ms
99.1% of requests completed under 1 second
100% of requests completed under 3 seconds

zerank-2 maintains consistent performance across the entire latency distribution, with no requests exceeding 3 seconds.
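The tables above report the share of requests exceeding each threshold, and the "completed under" figures are the complement (for example, 100% − 2.7% = 97.3% under 500ms). Below is a sketch of how such a table could be computed from raw measurements. The function name and the sample latencies are made up for illustration, and counting failed requests as exceeding every threshold is an assumption of this sketch.

```python
def exceedance_table(latencies_s, failures, thresholds_s):
    """Return {threshold: percent of requests slower than it}, plus 'failed'.

    Assumption: a failed request counts as exceeding every threshold.
    """
    total = len(latencies_s) + failures
    table = {}
    for threshold in thresholds_s:
        slow = sum(1 for x in latencies_s if x > threshold) + failures
        table[threshold] = 100.0 * slow / total
    table["failed"] = 100.0 * failures / total
    return table

# Made-up sample latencies in seconds, for illustration only.
sample = [0.12, 0.16, 0.18, 0.30, 0.60, 0.14, 0.13, 0.22, 0.11, 0.90]
table = exceedance_table(sample, failures=0, thresholds_s=[0.15, 0.5, 1.0])

for key, pct in table.items():
    print(f">{key}s" if key != "failed" else "Failed", f"{pct:.1f}%")

# Share completed under 500ms is the complement of the >500ms row.
under_500ms = 100.0 - table[0.5]
```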

Important Note on Rate Limits

When testing zerank-2, keep in mind that our API enforces rate limits to ensure fair resource allocation. If your usage exceeds 2,000,000 bytes per minute, requests will be moved to a slower processing queue, which will negatively impact the latency you observe.

For accurate latency testing, ensure your test traffic stays within these limits. If your production needs require higher rate limits, please contact us or join our Slack to discuss custom arrangements.
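Before running a test, it can help to estimate how many bytes per minute your planned traffic will send. The sketch below does that arithmetic. The 2,000,000-byte limit comes from the text above. The helper names and the 500-byte average document size are hypothetical, and the sketch assumes only document payload bytes count toward the limit, which may not match how the API actually measures usage.

```python
BYTE_LIMIT_PER_MINUTE = 2_000_000  # limit stated in the text above

def bytes_per_minute(rate_per_s, docs_per_request, bytes_per_doc):
    """Estimated payload bytes sent per minute at a steady average rate."""
    return rate_per_s * 60 * docs_per_request * bytes_per_doc

def max_rate_per_s(docs_per_request, bytes_per_doc, limit=BYTE_LIMIT_PER_MINUTE):
    """Highest average request rate that stays within the byte limit."""
    return limit / (60 * docs_per_request * bytes_per_doc)

# Hypothetical workload: 50 documents of about 500 bytes each at 1 request/second.
planned = bytes_per_minute(rate_per_s=1, docs_per_request=50, bytes_per_doc=500)
ceiling = max_rate_per_s(docs_per_request=50, bytes_per_doc=500)

print(f"planned: {planned:,} bytes/min (limit {BYTE_LIMIT_PER_MINUTE:,})")
print(f"max sustainable rate: {ceiling:.2f} requests/second")
```

Because Poisson traffic is bursty, a test whose average rate sits close to the ceiling may still cross the limit in some minutes, so leaving headroom is a reasonable precaution.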
