
Mem0 Improves Memory Retrieval Accuracy with ZeroEntropy

Sep 18, 2025

“Reranking is a crucial step of retrieval, and zerank-1 was the first reranker we tried that was actually accurate, but also fast and calibrated.”
— Deshraj Yadav, CTO at Mem0

Company and Highlights

Mem0 is the universal memory layer trusted by 50,000+ developers to power AI agents across healthcare, enterprise, education, and more. Memory is the key intelligence layer that lets AI recall facts, learn over time, and deliver personalization.

Mem0 migrated its production rerank traffic to ZeroEntropy’s zerank-1, now a critical component of its retrieval stack. With ZeroEntropy, Mem0 gets better-calibrated scores, consistent latency distributions, and high throughput, and now processes over 1B tokens per day with predictable performance.

Throughput: around 1B tokens per day
Production latency: p50 75 ms, p90 125 ms, p99 238 ms
Predictable scaling across candidate set sizes
Simple API swap for integration
SOC 2 and HIPAA compliance

Problem

Mem0 powers AI agents across industries at scale. These agents rely on Mem0’s memory layer to surface the right facts in real time. As usage grew, Mem0’s previous reranker became a bottleneck. Two problems kept surfacing:

Noisy retrievals across verticals. Inconsistent scoring made it difficult to set thresholds that worked equally well for healthcare assistants, enterprise copilots, and consumer chatbots. What looked relevant in one domain often failed in another.
Unpredictable latency. At high load, tail latencies spiked, breaking the seamless, real-time experience users expect from AI agents.

For a product that is critical inside thousands of AI applications, brittle memory was not an option. Mem0 needed a reranker that could handle billions of tokens a day with enterprise-grade reliability.

Approach

Mem0 first tested ZeroEntropy’s zerank-1 reranker locally in a sandbox environment, using the open weights available on Hugging Face:

Benchmarked against internal metrics for accuracy and calibration stability.
Evaluated impact on downstream customer use cases (retrieval accuracy, personalization fidelity, token savings).

After confirming superior accuracy metrics, Mem0 integrated ZeroEntropy’s API for production scale. Migration required a single API swap within Mem0’s retrieval and memory compression pipeline.
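A single-swap migration like this is easiest when the pipeline depends on a small reranker interface rather than on one provider. The sketch below is illustrative only: `Reranker`, `KeywordOverlapReranker`, and `retrieve_memories` are hypothetical names, the scoring is a toy word-overlap stand-in, and nothing here reflects Mem0’s code or ZeroEntropy’s actual API.

```python
from dataclasses import dataclass
from typing import Protocol


@dataclass(frozen=True)
class Scored:
    text: str
    score: float


class Reranker(Protocol):
    """Hypothetical interface: any backend that scores candidates for a query."""

    def rerank(self, query: str, candidates: list[str]) -> list[Scored]: ...


class KeywordOverlapReranker:
    """Toy stand-in backend. A real one would call a model or a hosted API."""

    def rerank(self, query: str, candidates: list[str]) -> list[Scored]:
        query_words = set(query.lower().split())
        scored = [
            Scored(c, len(query_words & set(c.lower().split())) / max(len(query_words), 1))
            for c in candidates
        ]
        # Highest score first; sorted() is stable, so ties keep input order.
        return sorted(scored, key=lambda s: s.score, reverse=True)


def retrieve_memories(query: str, candidates: list[str],
                      reranker: Reranker, top_k: int = 2) -> list[str]:
    """The pipeline only sees the interface, so swapping backends is one argument."""
    return [s.text for s in reranker.rerank(query, candidates)[:top_k]]


memories = ["user likes green tea daily", "meeting at noon", "user likes coffee"]
print(retrieve_memories("user likes green tea", memories, KeywordOverlapReranker()))
```

Under this design, moving to a different reranker means passing a different object that implements `rerank`; the rest of the retrieval code stays unchanged.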

Results

Latency in Production

In production, zerank-1 serves Mem0’s rerank traffic at p50 75 ms, p90 125 ms, and p99 238 ms.
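For readers who want to track the same kind of numbers, here is a minimal sketch of how p50, p90, and p99 can be computed from recorded request latencies using the nearest-rank method. The `percentile` helper and the sample values are assumptions made for illustration; they are not Mem0’s measurements or tooling.

```python
def percentile(samples_ms: list[float], p: int) -> float:
    """Nearest-rank percentile for an integer p in 1..100."""
    if not samples_ms:
        raise ValueError("need at least one latency sample")
    if not 1 <= p <= 100:
        raise ValueError("p must be between 1 and 100")
    ordered = sorted(samples_ms)
    # Integer ceiling of p * n / 100, which avoids floating-point rounding.
    rank = (p * len(ordered) + 99) // 100
    return ordered[rank - 1]


# Hypothetical latencies in milliseconds: mostly fast, with a slow tail.
samples = [70.0] * 90 + [120.0] * 9 + [240.0]

for p in (50, 90, 99):
    print(f"p{p}: {percentile(samples, p)} ms")
```

Tail percentiles such as p99 are the ones that reveal spikes under load, which is why they are worth reporting alongside the median.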

Calibration & Accuracy

Scores were stable across domains, making thresholding simpler and improving retrieval consistency.
Higher relevance fidelity translated into stronger personalization and context recall.
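To make the thresholding point concrete: when scores are calibrated, one cutoff can be shared across domains instead of being tuned per vertical. The sketch below assumes hypothetical scores in the range 0 to 1 and a made-up `select_memories` helper; the example data and the 0.5 cutoff are invented for illustration.

```python
def select_memories(scored_by_domain: dict[str, list[tuple[str, float]]],
                    threshold: float) -> dict[str, list[str]]:
    """Keep candidates whose score meets one shared threshold in every domain."""
    return {
        domain: [text for text, score in scored if score >= threshold]
        for domain, scored in scored_by_domain.items()
    }


# Hypothetical reranker output, invented for this example.
example = {
    "healthcare": [("patient is allergic to penicillin", 0.91),
                   ("clinic parking info", 0.12)],
    "enterprise": [("Q3 budget owner is Dana", 0.84),
                   ("office snack list", 0.08)],
}

print(select_memories(example, threshold=0.5))
```

If scores drifted between domains, the same 0.5 cutoff would keep noise in one vertical and drop relevant memories in another, which is the failure described in the Problem section.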

Scale

Mem0 now processes over 1B tokens per day through ZeroEntropy rerankers with consistent SLO adherence.
Predictable O(N) scaling allows Mem0 to increase candidate sets without breaching latency budgets.
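Two quick calculations help put these figures in context. The first converts 1B tokens per day into an average per-second rate. The second shows how linear scaling lets a team size a candidate set against a latency budget. The linear model and its parameters (`fixed_ms`, `per_candidate_ms`) are assumptions chosen for illustration, not measured values from Mem0 or ZeroEntropy.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def average_tokens_per_second(tokens_per_day: int) -> float:
    """Average rate only; real traffic will have peaks above this."""
    return tokens_per_day / SECONDS_PER_DAY


def max_candidates(budget_ms: int, fixed_ms: int, per_candidate_ms: int) -> int:
    """Largest N with fixed_ms + N * per_candidate_ms <= budget_ms,
    assuming latency grows linearly in the number of candidates."""
    if per_candidate_ms <= 0:
        raise ValueError("per_candidate_ms must be positive")
    return max((budget_ms - fixed_ms) // per_candidate_ms, 0)


print(round(average_tokens_per_second(1_000_000_000)))  # about 11574 tokens/s
# Hypothetical costs: 25 ms fixed overhead, 2 ms per candidate, 125 ms budget.
print(max_candidates(budget_ms=125, fixed_ms=25, per_candidate_ms=2))  # 50
```

Because the cost per candidate is assumed constant, doubling the budget headroom doubles the candidate set that fits, which is what makes capacity planning predictable.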

Decision

Mem0 migrated production rerank traffic to ZeroEntropy, making our reranker a critical part of the memory infrastructure trusted by 50,000+ developers and enterprises worldwide.

“ZeroEntropy made it possible for us to deliver high-accuracy retrieval for our memory retrieval pipeline at scale.”
— Deshraj Yadav, CTO at Mem0

Why ZeroEntropy

Latency & Tail Control: Stable p50–p99 latencies even at high throughput.
Calibration: Scores stay calibrated across diverse domains and workloads.
Cost Efficiency: Token-based pricing aligned with Mem0’s usage model.
Drop-In Integration: Minimal engineering lift for production rollout.

Why it matters

As AI agents spread across verticals, memory and retrieval become mission-critical. Mem0 chose ZeroEntropy because only accurate, calibrated, and low-latency rerankers can power personalized AI at this scale.

Get started with

Our retrieval engine runs autonomously with the accuracy of a human-curated system.


