On The Geometric Limit of Dense Single Vector Embeddings

Sep 6, 2025

Embeddings are not all you need

Single-vector embeddings are great for fast recall. They are not enough for correctness at the decision boundary. New results from Google DeepMind make the reason precise, and our own experiments on LIMIT show how a cross-encoder reranker fixes it.

The core geometric limit

Think of a document as a superposition of many tiny facts; call them nuggets. A 3072-dimensional embedding must place that document at a single point. Now ask a query that targets one nugget among millions. You want every document sharing that nugget to surface as a nearest neighbor. In general you cannot orient the document cloud so that every possible nugget-defined neighborhood is realized by a single dot product in dimension d. Some top-k sets are simply not realizable under cosine similarity in d dimensions.

DeepMind’s LIMIT paper proves this family of impossibility results and instantiates a dataset where even simple queries expose the failure. They show that for any fixed d, there exist combinations of documents that no query vector can select as the exact top-k under cosine. On LIMIT, state-of-the-art embedding models underperform sharply, even though the language in the queries is trivial. 

Hugging Face’s dataset card summarizes this behavior succinctly, noting that SOTA embeddings score under 20 percent recall at 100 on the full LIMIT benchmark and cannot solve the tiny 46-doc LIMIT-small variant. 

An intuition in one line

Place four documents on a 1D number line with values v1 > v2 > v3 > v4. For any positive query u, dot products preserve order, so top-2 is {v1, v2}. For any negative u, the order reverses, so top-2 is {v4, v3}. Sets like {v1, v3} never appear as top-2 for any u. This impossibility persists in higher dimensions with more complex combinations.
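If you want to see this concretely, here is a tiny brute-force check of the 1D case. The document values are illustrative; any strictly ordered v1 > v2 > v3 > v4 behaves the same way:

```python
# Sweep query scalars u and record every top-2 set a 1D dot-product retriever can produce.
docs = {"v1": 4.0, "v2": 3.0, "v3": 2.0, "v4": 1.0}  # illustrative values, v1 > v2 > v3 > v4

realizable = set()
for u in (x / 10 for x in range(-50, 51) if x != 0):
    scores = {name: u * v for name, v in docs.items()}
    top2 = frozenset(sorted(scores, key=scores.get, reverse=True)[:2])
    realizable.add(top2)

print(realizable)  # only {v1, v2} and {v3, v4}; {v1, v3} never appears
```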

Why reranking is the remedy

A cross-encoder reranker scores pairs (query, candidate) directly. It is not bound by a single point in d dimensions, so it can model arbitrary interactions between query instructions and chunk content. In practice, the winning recipe is:

  1. Use dense or hybrid first-stage retrieval for speed and coverage.

  2. Rerank the top N with a cross-encoder for precision.

LIMIT makes this especially clear because the queries are simple. When first-stage recall hits a ceiling from geometry, reranking unlocks the relevant combinations that dense retrieval cannot realize. 
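As a minimal sketch of stage two, here is what the handoff looks like with the sentence-transformers CrossEncoder API. The query, documents, and the public checkpoint name are illustrative, not the exact models from our experiment below:

```python
from sentence_transformers import CrossEncoder

# First-stage retrieval (dense or hybrid) has already produced a candidate list.
query = "Who likes quokkas?"
candidates = [
    "Jon likes quokkas and apples.",
    "Mary likes skiing and quokkas.",
    "The report covers Q3 revenue.",
]

# Any cross-encoder checkpoint can slot in here; this public one is just an example.
reranker = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")
pair_scores = reranker.predict([(query, doc) for doc in candidates])

# Reorder candidates by pairwise relevance; keep the top N for the final answer step.
reranked = [doc for _, doc in sorted(zip(pair_scores, candidates), reverse=True)]
print(reranked)
```

Because the reranker reads query and candidate together, it can recover top-k sets that no single query vector can carve out of the index.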

Our experiment on LIMIT

Setup. We ran the official LIMIT data release in MTEB format, full set with 50k documents and 1000 queries, following the repository instructions. First stage used FAISS over OpenAI text-embedding-3-small vectors. We retrieved top 100 per query, then reranked with zerank-1 and computed recall at k. Dataset, code pointers, and format are in the public repo and cards. 
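For reference, a minimal sketch of the first stage, assuming document and query embeddings have already been computed and saved as float32 NumPy arrays (the file names are placeholders, not part of the LIMIT release):

```python
import faiss
import numpy as np

# doc_vecs: (num_docs, d) embeddings, e.g. from text-embedding-3-small.
# query_vecs: (num_queries, d) embeddings for the LIMIT queries.
doc_vecs = np.load("doc_embeddings.npy").astype("float32")      # placeholder path
query_vecs = np.load("query_embeddings.npy").astype("float32")  # placeholder path

# Cosine similarity equals inner product on L2-normalized vectors.
faiss.normalize_L2(doc_vecs)
faiss.normalize_L2(query_vecs)

index = faiss.IndexFlatIP(doc_vecs.shape[1])
index.add(doc_vecs)

# Top-100 handoff per query to the cross-encoder reranker.
scores, ids = index.search(query_vecs, 100)
```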

Baseline recall with embeddings only (text-embedding-3-small, direct cosine on the index):

Metric      Recall
Recall@1    0.0135
Recall@5    0.0285
Recall@10   0.0325
Recall@20   0.0435

After reranking the top 100 with zerank-1:

Metric      Recall
Recall@1    0.131
Recall@5    0.283
Recall@10   0.625
Recall@20   0.835

Notes: full LIMIT split, FAISS cosine search, top-100 handoff to cross-encoder. The LIMIT paper and dataset cards independently report that single-vector embeddings struggle on LIMIT, which aligns with our baseline. 
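The numbers above come from a standard recall-at-k computation. A minimal version, with our own variable names rather than anything from the LIMIT codebase, looks like this:

```python
def recall_at_k(ranked_ids, relevant_ids, k):
    """Fraction of a query's relevant documents that appear in its top-k list."""
    if not relevant_ids:
        return 0.0
    return len(set(ranked_ids[:k]) & set(relevant_ids)) / len(relevant_ids)

def mean_recall_at_k(run, qrels, k):
    """Average recall@k over all queries.

    run: query id -> ranked doc ids (best first)
    qrels: query id -> relevant doc ids
    """
    return sum(recall_at_k(run[q], qrels[q], k) for q in qrels) / len(qrels)
```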

What this means for real systems

  1. Do not chase dimension alone. Increasing d helps until geometry bites again. LIMIT shows a structural ceiling for single-vector top-k sets. 

  2. Adopt a two-stage pipeline. Dense or multi-vector retrieval for speed, cross-encoder reranking for correctness.

  3. Cover multiple neighborhoods. Use query rewriting, metadata filters, and multiple retrieval heads when data stratifies into clusters. The reranker becomes the arbiter that aggregates across sources.

  4. Evaluate beyond generic leaderboards. Use LIMIT and task-specific tests: MTEB is great for breadth; LIMIT is great for probing geometric edge cases.

Practical recipe

  • Index with a strong embedding model, optionally hybrid with BM25 (see the fusion sketch after this list).

  • Retrieve top 100 to 200 for head queries, 300 to 500 for tail queries.

  • Rerank with a cross-encoder like zerank-1.

  • Calibrate thresholds on held-out traffic, not synthetic prompts.

  • Track recall at k on adversarial sets like LIMIT in CI.
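For the hybrid indexing bullet, one common way to merge BM25 and dense candidate lists before the reranker is reciprocal rank fusion. This is a generic heuristic sketch, not ZeroEntropy's production fusion logic:

```python
# Reciprocal rank fusion (RRF): reward documents that rank well in any list.
def rrf_merge(rank_lists, k=60, top_n=200):
    scores = {}
    for ranks in rank_lists:                      # each list: doc ids, best first
        for r, doc_id in enumerate(ranks):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + r + 1)
    return sorted(scores, key=scores.get, reverse=True)[:top_n]

# Toy example: fuse a BM25 list and a dense list, then hand the result to the reranker.
print(rrf_merge([["d3", "d1", "d7"], ["d1", "d9", "d3"]], top_n=3))  # ['d1', 'd3', 'd9']
```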

Why this matters now

Modern enterprise queries are instruction-heavy and multi-intent. A single vector must live in several neighborhoods at once. Geometry resists. Cross-encoder reranking removes that bottleneck, which is why it delivers large recall gains on LIMIT even when first-stage recall looks stuck.

If you want to reproduce our setup or run LIMIT in your own stack, start from the paper and repo, then load the dataset from Hugging Face. 

References

  • Weller, Boratko, Naim, and Lee. "On the Theoretical Limitations of Embedding-Based Retrieval." arXiv, Aug 28, 2025.

  • LIMIT dataset and code, Google DeepMind, GitHub.

  • LIMIT and LIMIT-small datasets, Hugging Face.

  • MTEB (Massive Text Embedding Benchmark) background.

Are you reranking your candidates yet? If you want updates on multi-query embedding research or a drop-in zerank-1 eval on your data, reach out at founders@zeroentropy.dev.
