My AskAI Improves Support Agent Latency and Accuracy with ZeroEntropy

Sep 4, 2025

SHARE

Summary

My AskAI replaced its existing reranker with ZeroEntropy’s zerank‑1 across production traffic.

Results: faster responses at scale, a measurable lift in answer quality, and lower cost. After an A/B rollout in production with strong significance, My AskAI migrated 100 percent of rerank requests to ZeroEntropy.

Highlights

  • End‑to‑end migration of rerank traffic to ZeroEntropy

  • p50 173 ms, p90 240 ms, p99 352 ms over 113,878 real requests

  • Significant improvement on My AskAI’s answer‑quality metric, including a drop in “I don’t know” responses on several large customers

  • 25% cost reduction and the ability to scale candidate‑set size with predictable latency growth


“After running an A/B test in production, after only a few days we saw a statistically significant result. Along with the cost and latency improvements, this was a no-brainer decision.”

— Alex Rainey, CTO, My AskAI

Company

My AskAI provides AI customer‑support agents that integrate with tools like Zendesk, Intercom, Gorgias, and Freshdesk. The product resolves, on average, 75% of all customer support tickets and can also gracefully escalate to human agents. They offer enterprise grade security with full GDPR compliance. On top of that, they’re one of the most cost effective solutions in the market, charging just $0.10 per support ticket handled.

Problem

My AskAI’s existing reranker introduced latency variance and tail latency spikes, limiting how many candidate chunks they could safely score per query. The team wanted to push throughput and improve answer quality without raising costs.

Constraints

  • Production traffic measured in tens of thousands of queries per day

  • Latency budgets for live support workflows

  • Need for straightforward integration and predictable scaling behavior

Approach

My AskAI ran an A/B in production: existing reranker vs ZeroEntropy zerank‑1. The experiment measured latency distributions, error rates, and internal success metrics such as the “I don’t know” rate.

Integration

ZeroEntropy is a drop‑in cross‑encoder reranker that sits after first‑stage retrieval. Migration involved swapping the rerank call in My AskAI’s retrieval pipeline.

from zeroentropy import ZeroEntropy

zclient = ZeroEntropy()
response = zclient.models.rerank(
    model="zerank-1",
    query="What’s the cancellation policy for my booking?",
    documents=
    [
        "Reservations are fully refundable if canceled at least 24h before the check-in date. ",
        "Cancellation policies vary depending on the type of reservation.",
        "Flexible Rate bookings may be canceled up to 6pm on the day prior to arrival ",
    ],
)

Scalability expectation

For a fixed model and hardware profile, rerank latency grows roughly O(N) with the number of candidate documents. This informed My AskAI’s plan to increase the candidate cap from 50 to 100 while watching p95 and p99.

Results

Latency in production

Over 113,878 requests

Metric

Latency

p50

173 ms

p90

240 ms

p99

352 ms

Quality

"Our key metric is AI resolution and AI CSAT" says Alex Rainey, CTO of My AskAI, "both of these were ~3% higher (absolute change). These may seem small, but we have a highly optimized AI support agent system, so gains like this are rare and usually come with a significant latency or cost impact."

Cost

A 25% cost reduction compared to My AskAI's prior provider; ZeroEntropy rerank pricing is 0.025 per million tokens.

Decision

MyAskAI moved all rerank requests to ZeroEntropy

“After running an A/B test in production, after only a few days we saw a statistically significant result. Along with the cost and latency improvements, this was a no-brainer decision.”
— Alex Rainey, Co‑founder, MyAskAI

Why ZeroEntropy

  1. Speed and tail control

    Consistent p50–p99 improvements made it possible to increase candidate set size without breaching SLOs. By reranking more documents, users got richer context and more accurate AI responses.


  2. Accuracy

    Cross‑encoder scoring trained with zELO pairwise ranking delivered a measurable lift, with a simple swap of an API call.


  3. Cost efficiency

    zerank-1's competitive pricing significantly lowered MyAskAI's cost, even while doubling the number tokens reranked.


  4. Roadmap fit

    Instruction‑following reranking and customer‑specific finetuning are planned.

Takeaways for technical leaders

  • If reranker tail latency limits your top k, a faster and cheaper cross‑encoder can immediately boost relevancy and accuracy.

  • A simple A/B in production, monitored on p95 or p99 and a single north‑star quality metric, is sufficient to make a confident migration

  • Cost wins often follow speed wins when pricing is token‑based

About ZeroEntropy

ZeroEntropy provides rerankers, embeddings, and an end‑to‑end retrieval engine. The zerank‑1 reranker is available via API, through our partner Baseten, and soon in the AWS Marketplace.

Get started by creating an API Key.

Contact founders@zeroentropy.dev for enterprise terms.

Get started with

ZeroEntropy Animation Gif
ZeroEntropy Animation Gif

Our retrieval engine runs autonomously with the 

accuracy of a human-curated system.

Our retrieval engine runs autonomously with the 

accuracy of a human-curated system.

Our retrieval engine runs autonomously with the 

accuracy of a human-curated system.

Contact us for a custom enterprise solution with custom pricing

Contact us for a custom enterprise solution with custom pricing

Contact us for a custom enterprise solution with custom pricing

RELATED ARTICLES
Abstract image of a dark background with blurry teal, blue, and pink gradients.