Latency Performance Assessment of zerank-2

Dec 9, 2025

TL;DR

zerank-2 delivers consistent, low-latency performance under realistic production conditions. In our testing, 97.3% of requests completed under 500ms with zero failures. This document presents our latency measurements and explains how to properly benchmark reranker performance.

Why Proper Latency Testing Matters

When evaluating reranker latency, it's critical that your testing reflects actual production usage patterns. Real user traffic doesn't arrive at uniform intervals. It comes in bursts and clusters. Testing with sequential requests or artificial patterns will give you misleading results that don't predict real-world performance.

Our tests use Poisson arrival patterns because they model the random, bursty nature of production traffic. This approach reveals how systems behave under realistic load conditions, including queueing effects and concurrent request handling.
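As a rough illustration of what a Poisson arrival pattern looks like in practice, the sketch below draws exponentially distributed gaps between requests, which is the standard way to simulate a Poisson process. The function name `poisson_arrival_times` and the example rate of 5 requests/second are illustrative assumptions, not part of our test harness.

```python
import random


def poisson_arrival_times(rate_per_sec, duration_sec, seed=None):
    """Return increasing arrival offsets (in seconds) for a Poisson process.

    Hypothetical helper for illustration only.
    """
    rng = random.Random(seed)
    t = 0.0
    arrivals = []
    while True:
        # Gaps between events in a Poisson process are exponentially distributed.
        t += rng.expovariate(rate_per_sec)
        if t >= duration_sec:
            break
        arrivals.append(t)
    return arrivals


# Example: an average of 5 requests/second over a 60-second window.
arrivals = poisson_arrival_times(rate_per_sec=5, duration_sec=60, seed=0)
gaps = [later - earlier for earlier, later in zip(arrivals, arrivals[1:])]
print(f"{len(arrivals)} arrivals, shortest gap {min(gaps):.4f}s, longest gap {max(gaps):.2f}s")
```

The spread between the shortest and longest gap is the point: some requests land almost on top of each other while others follow a quiet period, which a fixed-interval loop never reproduces.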

Testing Methodology

All tests were conducted using:

  • Poisson arrival patterns at 1-10 requests/second

  • 60-second test duration

  • 50 documents per request

  • Payload size ≤2KB per document
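One way to drive a test with the parameters above is an open-loop load generator: each request is fired at its scheduled Poisson arrival time whether or not earlier requests have finished, so queueing and concurrency effects show up in the measured latency. The following is a minimal sketch under stated assumptions, not our actual harness. `run_load_test` and `send_request` are hypothetical names, and `fake_request` stands in for a real rerank call.

```python
import asyncio
import random
import time


async def run_load_test(send_request, rate_per_sec, duration_sec, seed=None):
    """Fire requests at Poisson arrival times without waiting for earlier ones.

    `send_request` is a hypothetical async callable that performs one request.
    Returns one entry per request: latency in seconds, or None on failure.
    """
    rng = random.Random(seed)
    loop = asyncio.get_running_loop()

    async def timed_call():
        started = time.perf_counter()
        try:
            await send_request()
        except Exception:
            return None  # record the failure instead of aborting the run
        return time.perf_counter() - started

    tasks = []
    test_start = loop.time()
    next_at = rng.expovariate(rate_per_sec)
    while next_at < duration_sec:
        delay = test_start + next_at - loop.time()
        if delay > 0:
            await asyncio.sleep(delay)
        # Schedule the request and move on; do not await it here.
        tasks.append(asyncio.create_task(timed_call()))
        next_at += rng.expovariate(rate_per_sec)
    return await asyncio.gather(*tasks)


async def fake_request():
    # Stand-in for a real API call (assumption): pretend it takes about 10 ms.
    await asyncio.sleep(0.01)


latencies = asyncio.run(
    run_load_test(fake_request, rate_per_sec=50, duration_sec=0.5, seed=1)
)
print(f"sent {len(latencies)} requests")
```

For a real run you would replace `fake_request` with a coroutine that sends 50 documents to the reranker, and use a rate between 1 and 10 requests/second with a 60-second duration.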

Performance Results

zerank-2 Latency Distribution

Latency Threshold    Requests Exceeding Threshold
>75ms                100.0%
>100ms               100.0%
>150ms               50.5%
>200ms               21.2%
>250ms               11.3%
>500ms               2.7%
>750ms               1.4%
>1s                  0.9%
>3s                  0.0%
>5s                  0.0%
>10s                 0.0%
>30s                 0.0%
Failed               0.0%
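A table like this can be derived from raw per-request latencies by counting, for each threshold, the share of requests that took longer. The sketch below assumes failed requests are recorded as `None` and counted as exceeding every threshold. That convention is an assumption for illustration, and the sample latencies are synthetic, not measurements.

```python
def exceedance_table(latencies_ms, thresholds_ms):
    """Return ({threshold: percent exceeding}, percent failed).

    Assumption: a failed request is stored as None and counts as
    exceeding every threshold.
    """
    total = len(latencies_ms)
    exceeding = {}
    for threshold in thresholds_ms:
        over = sum(1 for x in latencies_ms if x is None or x > threshold)
        exceeding[threshold] = 100.0 * over / total
    failed = sum(1 for x in latencies_ms if x is None)
    return exceeding, 100.0 * failed / total


# Synthetic sample: nine successful requests and one failure.
sample_ms = [110, 120, 130, 140, 160, 180, 220, 260, 600, None]
exceeding, failed_pct = exceedance_table(sample_ms, thresholds_ms=[150, 500])

for threshold, pct in exceeding.items():
    print(f">{threshold}ms  {pct:.1f}%")
print(f"Failed  {failed_pct:.1f}%")
```

Reporting exceedance percentages at fixed thresholds, rather than a single average, keeps the tail visible: a small fraction of very slow requests can hide behind a healthy-looking mean.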

Comparative Performance

Threshold    zerank-2    Cohere rerank-3.5    Jina reranker m0    Voyage rerank-2.5
>150ms       50.5%       34.3%                100.0%              80.5%
>500ms       2.7%        14.3%                70.8%               10.9%
>1s          0.9%        11.6%                57.4%               9.7%
>10s         0.0%        6.4%                 55.7%               9.2%
Failed       0.0%        0.0%                 55.7%               9.2%

Key Metrics

  • Zero failures across all test conditions

  • 97.3% of requests completed under 500ms

  • 99.1% of requests completed under 1 second

  • 100% of requests completed under 3 seconds

zerank-2 maintains consistent performance across the entire latency distribution, with no requests exceeding 3 seconds.

Important Note on Rate Limits

When testing zerank-2, keep in mind that our API enforces rate limits to ensure fair resource allocation. If your usage exceeds 2,000,000 bytes per minute, requests will be moved to a slower processing queue, which will negatively impact the latency you observe.

For accurate latency testing, ensure your test traffic stays within these limits. If your production needs require higher rate limits, please contact us or join our Slack to discuss custom arrangements.
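Before running a test, it can help to estimate how many bytes per minute the planned traffic will send. The sketch below is a back-of-the-envelope check only. It counts document payload bytes alone, which is an assumption, since how the API counts bytes (query text, JSON overhead) is not specified here. The 500-byte average document size is a hypothetical figure.

```python
RATE_LIMIT_BYTES_PER_MIN = 2_000_000  # limit stated in this post


def bytes_per_minute(requests_per_sec, docs_per_request, bytes_per_doc):
    """Estimated document bytes sent per minute (payload only, an assumption)."""
    return requests_per_sec * 60 * docs_per_request * bytes_per_doc


def max_requests_per_sec(docs_per_request, bytes_per_doc,
                         limit=RATE_LIMIT_BYTES_PER_MIN):
    """Highest average request rate that keeps the estimate within the limit."""
    return limit / (60 * docs_per_request * bytes_per_doc)


# Hypothetical plan: 50 documents per request, about 500 bytes each.
for rate in (1, 2):
    usage = bytes_per_minute(rate, docs_per_request=50, bytes_per_doc=500)
    status = "within" if usage <= RATE_LIMIT_BYTES_PER_MIN else "over"
    print(f"{rate} req/s -> {usage:,} bytes/min ({status} limit)")

ceiling = max_requests_per_sec(docs_per_request=50, bytes_per_doc=500)
print(f"estimated ceiling: {ceiling:.2f} req/s")
```

If the estimate lands near or above the limit, either reduce the request rate or document size for the test, or arrange a higher limit first, so that the slower queue does not distort the latencies you measure.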
