My AskAI Improves Support Agent Latency and Accuracy with ZeroEntropy

Sep 4, 2025

Summary

My AskAI replaced its existing reranker with ZeroEntropy’s zerank‑1 across production traffic.

Results: faster responses at scale, a measurable lift in answer quality, and lower cost.

After a production A/B test showed a statistically significant improvement, My AskAI migrated 100% of rerank requests to ZeroEntropy.

“We ran an A/B test in production, and after only a few days we saw a statistically significant accuracy bump. Along with the cost and latency improvements, this was a no-brainer decision.”
— Alex Rainey, CTO, My AskAI

Highlights:

End‑to‑end migration of rerank traffic to ZeroEntropy
p50 173 ms, p90 240 ms, p99 352 ms over 113,878 real requests
Significant improvement on My AskAI’s answer‑quality metric, including a drop in “I don’t know” responses for several large customers
25% cost reduction and the ability to scale candidate‑set size with predictable latency growth

Company

My AskAI provides AI customer‑support agents that integrate with tools like Zendesk, Intercom, Gorgias, and Freshdesk. The product resolves, on average, 75% of all customer support tickets and can gracefully escalate the rest to human agents. The company offers enterprise-grade security with full GDPR compliance. It is also one of the most cost-effective solutions on the market, charging just $0.10 per support ticket handled.

Problem

My AskAI’s existing reranker introduced latency variance and tail latency spikes, limiting how many candidate chunks they could safely score per query. The team wanted to push throughput and improve answer quality without raising costs.

Constraints

Production traffic measured in tens of thousands of queries per day
Latency budgets for live support workflows
Need for straightforward integration and predictable scaling behavior

Approach

My AskAI ran an A/B test in production: its existing reranker vs ZeroEntropy zerank‑1. The experiment measured latency distributions, error rates, and internal success metrics such as the “I don’t know” rate.
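
The article does not say which statistical test My AskAI used. As a minimal sketch, assuming the “I don’t know” rate is compared between the two arms with a two-proportion z-test, the check could look like the following. The function name and all counts are hypothetical placeholders, not figures from the experiment.

```python
import math


def two_proportion_z_test(events_a, n_a, events_b, n_b):
    """Two-sided z-test for a difference between two proportions.

    events_* are counts of "I don't know" responses, n_* are total
    requests in each arm. Returns (z, p_value).
    """
    p_a = events_a / n_a
    p_b = events_b / n_b
    # Pooled proportion under the null hypothesis of equal rates.
    pooled = (events_a + events_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / se
    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Hypothetical counts: arm A is the existing reranker, arm B is zerank-1.
z, p_value = two_proportion_z_test(600, 10_000, 500, 10_000)
print(f"z = {z:.2f}, p = {p_value:.4f}")
```

A small p-value suggests the difference in rates is unlikely to be noise. In practice the team would also want to fix the sample size or test duration in advance rather than stopping as soon as the result looks significant.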

Integration

zerank‑1 is a drop‑in cross‑encoder reranker that sits after first‑stage retrieval. Migration involved swapping the rerank call in My AskAI’s retrieval pipeline.

from zeroentropy import ZeroEntropy

# With no arguments, the client reads the API key from the environment
# (ZEROENTROPY_API_KEY).
zclient = ZeroEntropy()

# Score each candidate document against the query with zerank-1.
response = zclient.models.rerank(
    model="zerank-1",
    query="What's the cancellation policy for my booking?",
    documents=[
        "Reservations are fully refundable if canceled at least 24h before the check-in date.",
        "Cancellation policies vary depending on the type of reservation.",
        "Flexible Rate bookings may be canceled up to 6pm on the day prior to arrival",
    ],
)
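
The original does not show what happens to the rerank output. The sketch below assumes each result carries the index of an input document and a relevance score, and keeps the top k documents for the answer prompt. `ScoredIndex`, `top_k_documents`, and the scores are hypothetical stand-ins for illustration, not the SDK's actual types or real model output.

```python
from typing import NamedTuple


class ScoredIndex(NamedTuple):
    """Hypothetical stand-in for one rerank result."""
    index: int              # position of the document in the request
    relevance_score: float  # higher means more relevant


def top_k_documents(documents, scored, k):
    """Return the k highest-scoring documents, best first."""
    ranked = sorted(scored, key=lambda s: s.relevance_score, reverse=True)
    return [documents[s.index] for s in ranked[:k]]


documents = [
    "Reservations are fully refundable if canceled at least 24h before the check-in date.",
    "Cancellation policies vary depending on the type of reservation.",
    "Flexible Rate bookings may be canceled up to 6pm on the day prior to arrival",
]
# Made-up scores for illustration only.
scored = [ScoredIndex(0, 0.91), ScoredIndex(1, 0.42), ScoredIndex(2, 0.77)]

context = top_k_documents(documents, scored, k=2)
```

Keeping the selection step separate from the API call makes it easy to change k later, for example when the candidate cap is raised.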

Scalability expectation

For a fixed model and hardware profile, rerank latency grows roughly linearly, O(N), with the number N of candidate documents. This informed My AskAI’s plan to increase the candidate cap from 50 to 100 while watching p95 and p99.
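
One way to use the linear-growth expectation is to fit a line to latency measured at a few candidate counts and extrapolate to the new cap before rolling it out. This is only a sketch: the measurements below are hypothetical placeholders, and real latency also includes network and queueing effects that a straight line will not capture.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical tail-latency measurements (ms) at three candidate counts.
candidate_counts = [10, 25, 50]
tail_latency_ms = [120.0, 180.0, 280.0]

slope, intercept = fit_line(candidate_counts, tail_latency_ms)

# Projected tail latency if the candidate cap is raised to 100.
projected_ms = slope * 100 + intercept
print(f"~{slope:.1f} ms per extra candidate, ~{projected_ms:.0f} ms at N=100")
```

If the projection stays inside the latency budget, the cap can be raised gradually while the measured p95 and p99 are compared against the fitted line.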

Results

Latency in production

Over 113,878 requests

Metric	Latency
p50	173 ms
p90	240 ms
p99	352 ms
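
The article does not state how the percentiles were computed. As an illustration, assuming raw per-request latencies are available, the nearest-rank method below produces figures of this kind. The sample values are synthetic and unrelated to the production numbers in the table.

```python
import math


def percentile(samples, pct):
    """Nearest-rank percentile: smallest value with at least pct% of samples at or below it."""
    ordered = sorted(samples)
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


# Synthetic latencies in milliseconds, for illustration only.
latencies_ms = [120, 135, 150, 160, 175, 190, 210, 240, 300, 420]

p50 = percentile(latencies_ms, 50)
p90 = percentile(latencies_ms, 90)
p99 = percentile(latencies_ms, 99)
```

With only ten samples the p99 is simply the maximum, which is why tail percentiles are meaningful only over large request counts such as the one reported here.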

Quality

"Our key metric is AI resolution and AI CSAT," says Alex Rainey, CTO of My AskAI. "Both of these were ~3% higher (absolute change). These may seem small, but we have a highly optimized AI support agent system, so gains like this are rare and usually come with a significant latency or cost impact."

Cost

My AskAI cut its reranking cost by 25% compared with its prior provider. ZeroEntropy rerank pricing is $0.025 per million tokens.
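
With token-based pricing, cost scales with the number of tokens sent for reranking. The sketch below works through that arithmetic using the 0.025 per million tokens price quoted above. The traffic figures and the function name are hypothetical, and how tokens are counted per request is an assumption rather than ZeroEntropy's documented billing rule.

```python
PRICE_PER_MILLION_TOKENS = 0.025  # price quoted in the article


def daily_rerank_cost(queries_per_day, docs_per_query, tokens_per_pair):
    """Estimated daily cost, assuming every query-document pair is billed by its token count."""
    total_tokens = queries_per_day * docs_per_query * tokens_per_pair
    return total_tokens / 1_000_000 * PRICE_PER_MILLION_TOKENS


# Hypothetical workload: 10,000 queries per day, 400 tokens per query-document pair.
cost_at_50 = daily_rerank_cost(10_000, docs_per_query=50, tokens_per_pair=400)
cost_at_100 = daily_rerank_cost(10_000, docs_per_query=100, tokens_per_pair=400)
print(f"50 candidates: {cost_at_50:.2f}/day, 100 candidates: {cost_at_100:.2f}/day")
```

Under this model, doubling the candidate cap doubles the token volume and therefore the rerank bill, so a lower per-token price is what leaves room to rerank more documents.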

Decision

My AskAI moved all rerank requests to ZeroEntropy.

“After running an A/B test in production, after only a few days we saw a statistically significant result. Along with the cost and latency improvements, this was a no-brainer decision.”
— Alex Rainey, Co‑founder, My AskAI

Why ZeroEntropy

Speed and tail control
Consistent p50–p99 improvements made it possible to increase candidate set size without breaching SLOs. By reranking more documents, users got richer context and more accurate AI responses.
Accuracy
Cross‑encoder scoring trained with zELO pairwise ranking delivered a measurable lift, with a simple swap of an API call.
Cost efficiency
zerank-1's competitive pricing significantly lowered My AskAI's cost, even while doubling the number of tokens reranked.
Roadmap fit
Instruction‑following reranking and customer‑specific finetuning are planned.

Takeaways for technical leaders

If reranker tail latency limits your top‑k, a faster and cheaper cross‑encoder can immediately boost relevance and accuracy.
A simple A/B test in production, monitored on p95 or p99 latency and a single north‑star quality metric, is sufficient to migrate with confidence.
Cost wins often follow speed wins when pricing is token‑based.

About ZeroEntropy

ZeroEntropy provides rerankers, embeddings, and an end‑to‑end retrieval engine. The zerank‑1 reranker is available via API, through our partner Baseten, and soon in the AWS Marketplace.

Get started by creating an API Key.

Contact founders@zeroentropy.dev for enterprise terms.
