Summary
My AskAI replaced its existing reranker with ZeroEntropy’s zerank‑1 across production traffic.
Results: faster responses at scale, a measurable lift in answer quality, and lower cost. After an A/B rollout in production with strong significance, My AskAI migrated 100 percent of rerank requests to ZeroEntropy.
Highlights
End‑to‑end migration of rerank traffic to ZeroEntropy
p50 173 ms, p90 240 ms, p99 352 ms over 113,878 real requests
Significant improvement on My AskAI’s answer‑quality metric, including a drop in “I don’t know” responses for several large customers
25% cost reduction and the ability to scale candidate‑set size with predictable latency growth
“After running an A/B test in production, after only a few days we saw a statistically significant result. Along with the cost and latency improvements, this was a no-brainer decision.”
— Alex Rainey, CTO, My AskAI
Company
My AskAI provides AI customer‑support agents that integrate with tools like Zendesk, Intercom, Gorgias, and Freshdesk. The product resolves, on average, 75% of all customer support tickets and can also gracefully escalate to human agents. They offer enterprise‑grade security with full GDPR compliance. On top of that, they’re one of the most cost‑effective solutions on the market, charging just $0.10 per support ticket handled.

Problem
My AskAI’s existing reranker introduced latency variance and tail latency spikes, limiting how many candidate chunks they could safely score per query. The team wanted to push throughput and improve answer quality without raising costs.
Constraints
Production traffic measured in tens of thousands of queries per day
Latency budgets for live support workflows
Need for straightforward integration and predictable scaling behavior
Approach
My AskAI ran an A/B test in production: the existing reranker vs ZeroEntropy zerank‑1. The experiment measured latency distributions, error rates, and internal success metrics such as the “I don’t know” rate.
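One common way to judge whether a difference in a rate such as “I don’t know” responses is statistically significant is a two‑proportion z‑test. The sketch below is an illustration of that general technique, not My AskAI's actual analysis; the function name and all counts are hypothetical.

```python
import math


def two_proportion_z_test(fail_a, n_a, fail_b, n_b):
    """Return (z, two-sided p-value) for H0: both arms share the same rate.

    fail_* is the number of "I don't know" responses in each arm,
    n_* is the total number of responses in that arm.
    """
    p_a = fail_a / n_a
    p_b = fail_b / n_b
    # Pooled rate under the null hypothesis that the arms do not differ.
    pooled = (fail_a + fail_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / se
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Hypothetical counts for illustration only (not My AskAI's data).
z, p = two_proportion_z_test(fail_a=1200, n_a=10000, fail_b=1050, n_b=10000)
print(f"z = {z:.2f}, p = {p:.4f}")
```

A positive z here means arm A (the existing reranker in this made‑up example) produced the higher “I don’t know” rate. The test assumes independent responses and a pooled rate strictly between 0 and 1.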
Integration
ZeroEntropy is a drop‑in cross‑encoder reranker that sits after first‑stage retrieval. Migration involved swapping the rerank call in My AskAI’s retrieval pipeline.
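A swap like this is easiest when the pipeline treats the reranker as a pluggable scoring function. The following is a minimal sketch under that assumption; `ScoreFn`, `RankedChunk`, `rerank`, and `keyword_overlap_scorer` are hypothetical names, and the toy scorer stands in for a real cross‑encoder call (it is not zerank‑1 or any provider's API).

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence

# A reranker maps (query, documents) to one relevance score per document.
ScoreFn = Callable[[str, Sequence[str]], List[float]]


@dataclass
class RankedChunk:
    index: int   # position in the first-stage candidate list
    text: str
    score: float


def rerank(query: str, candidates: Sequence[str],
           score_fn: ScoreFn, top_k: int) -> List[RankedChunk]:
    """Score first-stage candidates and keep the top_k highest-scoring ones."""
    scores = score_fn(query, candidates)
    if len(scores) != len(candidates):
        raise ValueError("score_fn must return one score per candidate")
    ranked = [RankedChunk(i, text, score)
              for i, (text, score) in enumerate(zip(candidates, scores))]
    ranked.sort(key=lambda chunk: chunk.score, reverse=True)
    return ranked[:top_k]


def keyword_overlap_scorer(query: str, documents: Sequence[str]) -> List[float]:
    """Toy stand-in for a hosted reranker: counts shared lowercase terms."""
    terms = set(query.lower().split())
    return [float(len(terms & set(doc.lower().split()))) for doc in documents]


chunks = ["billing info", "how to reset your password", "password rules"]
for chunk in rerank("reset password", chunks, keyword_overlap_scorer, top_k=2):
    print(chunk.index, chunk.score, chunk.text)
```

With this shape, changing providers means passing a different `score_fn`; the retrieval code before it and the prompt assembly after it stay untouched.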
Scalability expectation
For a fixed model and hardware profile, rerank latency grows roughly O(N) with the number of candidate documents. This informed My AskAI’s plan to increase the candidate cap from 50 to 100 while watching p95 and p99.
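Roughly linear growth means latency can be modeled as a fixed overhead plus a per‑candidate cost. The sketch below fits that line by ordinary least squares and extrapolates to a larger candidate cap. The measurements and the SLO value are synthetic placeholders, not figures from My AskAI, and the function names are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def predict_latency_ms(intercept, slope, n_candidates):
    """Projected latency for a given candidate count under the linear model."""
    return intercept + slope * n_candidates


# Synthetic tail-latency measurements at several candidate counts
# (illustrative values only).
candidate_counts = [10, 20, 30, 40, 50]
tail_latency_ms = [120.0, 170.0, 220.0, 270.0, 320.0]

intercept, slope = fit_line(candidate_counts, tail_latency_ms)
projected = predict_latency_ms(intercept, slope, 100)

slo_ms = 600.0  # hypothetical latency budget
print(f"projected tail latency at 100 candidates: {projected:.0f} ms")
print("within budget" if projected <= slo_ms else "over budget")
```

An extrapolation like this is only a planning estimate. Batching limits, token length per chunk, and network overhead can bend the line, which is why the measured p95 and p99 remain the deciding signal.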
Results
Latency in production
Over 113,878 requests
| Metric | Latency |
|---|---|
| p50 | 173 ms |
| p90 | 240 ms |
| p99 | 352 ms |
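For readers who want to compute the same kind of summary from their own request logs, here is a minimal sketch using the nearest‑rank percentile definition. The article does not say which percentile method was used for the table above, so the method is an assumption, and the sample data is synthetic.

```python
import math


def percentile(samples, pct):
    """Nearest-rank percentile: the smallest sample such that at least
    pct percent of all samples are less than or equal to it."""
    if not samples:
        raise ValueError("samples must not be empty")
    if not 0 < pct <= 100:
        raise ValueError("pct must be in (0, 100]")
    ordered = sorted(samples)
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[rank - 1]


# Synthetic latencies in milliseconds, for illustration only.
latencies_ms = [float(ms) for ms in range(1, 101)]

for pct in (50, 90, 99):
    print(f"p{pct}: {percentile(latencies_ms, pct):.0f} ms")
```

Other definitions interpolate between neighboring samples, so small differences from a monitoring tool's numbers are expected, especially at p99 on small sample sets.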
Quality
"Our key metric is AI resolution and AI CSAT" says Alex Rainey, CTO of My AskAI, "both of these were ~3% higher (absolute change). These may seem small, but we have a highly optimized AI support agent system, so gains like this are rare and usually come with a significant latency or cost impact."
Cost
A 25% cost reduction compared to My AskAI's prior provider; ZeroEntropy rerank pricing is $0.025 per million tokens.
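Token‑based pricing makes the spend easy to estimate. The sketch below applies the per‑million‑token price quoted above, assuming the figure is in US dollars, to a hypothetical monthly token volume. It also shows the arithmetic behind a percentage reduction. The function names and the volume are illustrative, not My AskAI's numbers.

```python
def rerank_cost_usd(total_tokens, price_per_million_usd):
    """Cost of reranking total_tokens at a flat per-million-token price."""
    return total_tokens / 1_000_000 * price_per_million_usd


def implied_prior_cost_usd(new_cost_usd, reduction):
    """Prior spend implied by a fractional reduction (0.25 means 25% lower)."""
    return new_cost_usd / (1 - reduction)


PRICE_PER_MILLION_USD = 0.025      # price quoted in the article, USD assumed
monthly_tokens = 2_000_000_000     # hypothetical volume, illustration only

new_cost = rerank_cost_usd(monthly_tokens, PRICE_PER_MILLION_USD)
prior_cost = implied_prior_cost_usd(new_cost, reduction=0.25)
print(f"new monthly cost: ${new_cost:.2f}")
print(f"implied prior monthly cost: ${prior_cost:.2f}")
```

The second function only inverts the percentage. It says nothing about how the prior provider priced its service or how many tokens were reranked before the migration.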
Decision
My AskAI moved all rerank requests to ZeroEntropy.
“After running an A/B test in production, after only a few days we saw a statistically significant result. Along with the cost and latency improvements, this was a no-brainer decision.”
— Alex Rainey, Co‑founder, My AskAI
Why ZeroEntropy
Speed and tail control
Consistent p50–p99 improvements made it possible to increase candidate set size without breaching SLOs. By reranking more documents, users got richer context and more accurate AI responses.
Accuracy
Cross‑encoder scoring trained with zELO pairwise ranking delivered a measurable lift, with a simple swap of an API call.
Cost efficiency
zerank-1's competitive pricing significantly lowered My AskAI's cost, even while doubling the number of tokens reranked.
Roadmap fit
Instruction‑following reranking and customer‑specific finetuning are planned.
Takeaways for technical leaders
If reranker tail latency limits your top k, a faster and cheaper cross‑encoder can immediately boost relevancy and accuracy.
A simple A/B test in production, monitored on p95 or p99 latency and a single north‑star quality metric, is sufficient to migrate with confidence.
Cost wins often follow speed wins when pricing is token‑based.
About ZeroEntropy
ZeroEntropy provides rerankers, embeddings, and an end‑to‑end retrieval engine. The zerank‑1 reranker is available via API, through our partner Baseten, and soon in the AWS Marketplace.
Get started by creating an API Key.
Contact founders@zeroentropy.dev for enterprise terms.
