On The Geometric Limit of Dense Single Vector Embeddings

Sep 6, 2025

Embeddings are not all you need

Single-vector embeddings are great for fast recall. They are not enough for correctness at the decision boundary. New results from Google DeepMind make the reason precise, and our own experiments on LIMIT show how a cross-encoder reranker fixes it.

The core geometric limit

Think of a document as a superposition of many tiny facts; call them nuggets. A 3072-dimensional embedding must place that document at a single point. Now ask a query that targets one nugget among millions. You want every document sharing that nugget to surface as a nearest neighbor. In general you cannot orient the document cloud so that every possible nugget-defined neighborhood is realized by a single dot product in dimension d. Some top-k sets are simply not realizable under cosine similarity in d dimensions.

DeepMind’s LIMIT paper proves this family of impossibility results and instantiates a dataset where even simple queries expose the failure. They show that for any fixed d, there exist combinations of documents that no query vector can select as the exact top-k under cosine. On LIMIT, state-of-the-art embedding models underperform sharply, even though the language in the queries is trivial. 

Hugging Face’s dataset card summarizes this behavior succinctly, noting that SOTA embeddings score under 20 percent recall at 100 on the full LIMIT benchmark and cannot solve the tiny 46-doc LIMIT-small variant. 

An intuition in one line

Place four documents on a 1D number line with values v1 > v2 > v3 > v4. For any positive query u, dot products preserve order, so top-2 is {v1, v2}. For any negative u, the order reverses, so top-2 is {v4, v3}. Sets like {v1, v3} never appear as top-2 for any u. This impossibility persists in higher dimensions with more complex combinations.
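If you want to see this concretely, here is a tiny brute-force check of the 1D case. The document values are illustrative; any strictly ordered v1 > v2 > v3 > v4 behaves the same way:

```python
# Sweep query scalars u and record every top-2 set a 1D dot-product retriever can produce.
docs = {"v1": 4.0, "v2": 3.0, "v3": 2.0, "v4": 1.0}  # illustrative values, v1 > v2 > v3 > v4

realizable = set()
for u in (x / 10 for x in range(-50, 51) if x != 0):
    scores = {name: u * v for name, v in docs.items()}
    top2 = frozenset(sorted(scores, key=scores.get, reverse=True)[:2])
    realizable.add(top2)

print(realizable)  # only {v1, v2} and {v3, v4}; {v1, v3} never appears
```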

Why reranking is the remedy

A cross-encoder reranker scores pairs (query, candidate) directly. It is not bound by a single point in d dimensions, so it can model arbitrary interactions between query instructions and chunk content. In practice, the winning recipe is:

  1. Use dense or hybrid first-stage retrieval for speed and coverage.

  2. Rerank the top N with a cross-encoder for precision.

LIMIT makes this especially clear because the queries are simple. When first-stage recall hits a ceiling from geometry, reranking unlocks the relevant combinations that dense retrieval cannot realize. 
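As a minimal sketch of stage two, here is what the handoff looks like with the sentence-transformers CrossEncoder API. The query, documents, and the public checkpoint name are illustrative, not the exact models from our experiment below:

```python
from sentence_transformers import CrossEncoder

# First-stage retrieval (dense or hybrid) has already produced a candidate list.
query = "Who likes quokkas?"
candidates = [
    "Jon likes quokkas and apples.",
    "Mary likes skiing and quokkas.",
    "The report covers Q3 revenue.",
]

# Any cross-encoder checkpoint can slot in here; this public one is just an example.
reranker = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")
pair_scores = reranker.predict([(query, doc) for doc in candidates])

# Reorder candidates by pairwise relevance; keep the top N for the final answer step.
reranked = [doc for _, doc in sorted(zip(pair_scores, candidates), reverse=True)]
print(reranked)
```

Because the reranker reads query and candidate together, it can recover top-k sets that no single query vector can carve out of the index.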

Our experiment on LIMIT

Setup. We ran the official LIMIT data release in MTEB format, full set with 50k documents and 1000 queries, following the repository instructions. First stage used FAISS over OpenAI text-embedding-3-small vectors. We retrieved top 100 per query, then reranked with zerank-1 and computed recall at k. Dataset, code pointers, and format are in the public repo and cards. 
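For reference, a minimal sketch of the first stage, assuming document and query embeddings have already been computed and saved as float32 NumPy arrays (the file names are placeholders, not part of the LIMIT release):

```python
import faiss
import numpy as np

# doc_vecs: (num_docs, d) embeddings, e.g. from text-embedding-3-small.
# query_vecs: (num_queries, d) embeddings for the LIMIT queries.
doc_vecs = np.load("doc_embeddings.npy").astype("float32")      # placeholder path
query_vecs = np.load("query_embeddings.npy").astype("float32")  # placeholder path

# Cosine similarity equals inner product on L2-normalized vectors.
faiss.normalize_L2(doc_vecs)
faiss.normalize_L2(query_vecs)

index = faiss.IndexFlatIP(doc_vecs.shape[1])
index.add(doc_vecs)

# Top-100 handoff per query to the cross-encoder reranker.
scores, ids = index.search(query_vecs, 100)
```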

Baseline recall with embeddings only (text-embedding-3-small, direct cosine on the index):

Metric      Recall
Recall@1    0.0135
Recall@5    0.0285
Recall@10   0.0325
Recall@20   0.0435

After reranking the top 100 with zerank-1:

Metric      Recall
Recall@1    0.131
Recall@5    0.283
Recall@10   0.625
Recall@20   0.835

Notes: full LIMIT split, FAISS cosine search, top-100 handoff to cross-encoder. The LIMIT paper and dataset cards independently report that single-vector embeddings struggle on LIMIT, which aligns with our baseline. 
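The numbers above come from a standard recall-at-k computation. A minimal version, with our own variable names rather than anything from the LIMIT codebase, looks like this:

```python
def recall_at_k(ranked_ids, relevant_ids, k):
    """Fraction of a query's relevant documents that appear in its top-k list."""
    if not relevant_ids:
        return 0.0
    return len(set(ranked_ids[:k]) & set(relevant_ids)) / len(relevant_ids)

def mean_recall_at_k(run, qrels, k):
    """Average recall@k over all queries.

    run: query id -> ranked doc ids (best first)
    qrels: query id -> relevant doc ids
    """
    return sum(recall_at_k(run[q], qrels[q], k) for q in qrels) / len(qrels)
```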

What this means for real systems

  1. Do not chase dimension alone. Increasing d helps until geometry bites again. LIMIT shows a structural ceiling for single-vector top-k sets. 

  2. Adopt a two-stage pipeline. Dense or multi-vector retrieval for speed, cross-encoder reranking for correctness.

  3. Cover multiple neighborhoods. Use query rewriting, metadata filters, and multiple retrieval heads when data stratifies into clusters. The reranker becomes the arbiter that aggregates across sources.

  4. Evaluate beyond generic leaderboards. Use LIMIT and task-specific tests: MTEB is great for breadth; LIMIT is great for probing geometric edge cases.

Practical recipe

  • Index with a strong embedding model, optionally hybrid with BM25 (see the fusion sketch after this list).

  • Retrieve top 100 to 200 for head queries, 300 to 500 for tail queries.

  • Rerank with a cross-encoder like zerank-1.

  • Calibrate thresholds on held-out traffic, not synthetic prompts.

  • Track recall at k on adversarial sets like LIMIT in CI.
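For the hybrid indexing bullet, one common way to merge BM25 and dense candidate lists before the reranker is reciprocal rank fusion. This is a generic heuristic sketch, not ZeroEntropy's production fusion logic:

```python
# Reciprocal rank fusion (RRF): reward documents that rank well in any list.
def rrf_merge(rank_lists, k=60, top_n=200):
    scores = {}
    for ranks in rank_lists:                      # each list: doc ids, best first
        for r, doc_id in enumerate(ranks):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + r + 1)
    return sorted(scores, key=scores.get, reverse=True)[:top_n]

# Toy example: fuse a BM25 list and a dense list, then hand the result to the reranker.
print(rrf_merge([["d3", "d1", "d7"], ["d1", "d9", "d3"]], top_n=3))  # ['d1', 'd3', 'd9']
```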

Why this matters now

Modern enterprise queries are instruction-heavy and multi-intent. A single vector must live in several neighborhoods at once. Geometry resists. Cross-encoder reranking removes that bottleneck, which is why it delivers large recall gains on LIMIT even when first-stage recall looks stuck.

If you want to reproduce our setup or run LIMIT in your own stack, start from the paper and repo, then load the dataset from Hugging Face. 

References

  • Weller, Boratko, Naim, and Lee. "On the Theoretical Limitations of Embedding-Based Retrieval." arXiv, Aug 28, 2025.

  • LIMIT dataset and code, Google DeepMind, GitHub.

  • LIMIT and LIMIT-small datasets, Hugging Face.

  • MTEB (Massive Text Embedding Benchmark) background.

Are you reranking your candidates yet? If you want updates on multi-query embedding research or a drop-in zerank-1 eval on your data, reach out at founders@zeroentropy.dev.
