MRR (Mean Reciprocal Rank)

Also known as: mean reciprocal rank

TL;DR

Mean Reciprocal Rank is the average of 1/rank across queries, where rank is the position of the first relevant document. Heavily front-loaded — only the top result really matters.

Figure: MRR (Mean Reciprocal Rank). Average of 1/rank across queries. Rank one wins big.

  Query                            Rank of first relevant   1/rank
  q1 · "vector db comparisons"     1                        1.000
  q2 · "openai api pricing"        2                        0.500
  q3 · "rerank vs llm-as-judge"    4                        0.250
  q4 · "what is mteb"              8                        0.125
  q5 · "esoteric synonym search"   missed                   0.000

MRR = (1.000 + 0.500 + 0.250 + 0.125 + 0.000) / 5 = 0.375

Second panel: reward shape, plotting 1/rank against the rank of the first relevant document. For each query, take the rank of the FIRST relevant document, then average the reciprocals. q5 missed and contributes 0.

MRR (Mean Reciprocal Rank) scores a ranking by the position of its first relevant document. For each query:

  • Find the rank of the first relevant document.
  • Take the reciprocal: 1/rank.

Then average across queries. Position 1 contributes 1.0; position 2, 0.5; position 5, 0.2; no relevant doc, 0.
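The two steps above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function names are hypothetical, ranks are assumed to be 1-based, and `None` is assumed to mean that no relevant document was retrieved for that query.

```python
from typing import Optional, Sequence


def reciprocal_rank(rank: Optional[int]) -> float:
    """Return 1/rank for a 1-based rank, or 0.0 when no relevant doc was found."""
    if rank is None:
        return 0.0
    if rank < 1:
        raise ValueError("ranks are 1-based")
    return 1.0 / rank


def mean_reciprocal_rank(first_relevant_ranks: Sequence[Optional[int]]) -> float:
    """Average the per-query reciprocal ranks."""
    if not first_relevant_ranks:
        raise ValueError("need at least one query")
    total = sum(reciprocal_rank(r) for r in first_relevant_ranks)
    return total / len(first_relevant_ranks)


# Five illustrative queries: first relevant doc at ranks 1, 2, 4, 8, and one miss.
ranks = [1, 2, 4, 8, None]
mrr = mean_reciprocal_rank(ranks)
print(f"MRR = {mrr:.3f}")  # MRR = 0.375
```

Misses are counted as 0 rather than dropped, so a query with no relevant result still lowers the average.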

When to use MRR

MRR is most useful when the user only cares about the first relevant result. Classical use cases:

Top-1 consumers where MRR is the right metric
  • Question answering — one correct answer; if it’s at the top, you’re done
  • Navigational search — user is looking for a specific page
  • Customer support chatbots — one document answers the question
  • RAG with single-passage prompts where only the top result feeds the LLM
  • Tool-routing agents that pick exactly one tool per turn

For corpora where multiple documents are relevant per query and the user is browsing through them, a list-aware metric such as MAP or NDCG@K is a better fit because it credits multiple relevant docs at the top, not just the first one.

For a query with exactly one relevant doc at rank r, average precision is just 1/r, the reciprocal rank. So MAP and MRR coincide query-by-query in this regime, and a benchmark with mostly single-relevant queries (like much of MS MARCO) collapses the two metrics. This is why papers that headline MRR on MS MARCO are also implicitly headlining MAP, and why arguing about which metric is “better” is moot for that specific dataset shape.
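The coincidence of average precision and reciprocal rank on single-relevant queries can be checked numerically. The sketch below assumes binary labels (1 = relevant) listed in ranked order and assumes every relevant document appears somewhere in the list; the function names and example lists are illustrative only.

```python
from typing import Sequence


def reciprocal_rank(relevance: Sequence[int]) -> float:
    """1/rank of the first relevant doc, 0.0 if none is relevant."""
    for position, rel in enumerate(relevance, start=1):
        if rel:
            return 1.0 / position
    return 0.0


def average_precision(relevance: Sequence[int]) -> float:
    """Mean of precision@k taken at each rank k that holds a relevant doc."""
    hits = 0
    precision_sum = 0.0
    for position, rel in enumerate(relevance, start=1):
        if rel:
            hits += 1
            precision_sum += hits / position
    return precision_sum / hits if hits else 0.0


single = [0, 0, 0, 1, 0]  # exactly one relevant doc, at rank 4
multi = [0, 1, 0, 1, 1]   # three relevant docs, first at rank 2

print(average_precision(single), reciprocal_rank(single))  # 0.25 0.25
print(average_precision(multi) > reciprocal_rank(multi))   # True
```

With one relevant document the two numbers are identical; with several, average precision also depends on where the later relevant documents land, so the metrics separate.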

MRR vs NDCG@K vs Recall@K

  • MRR: only first relevant doc, very front-loaded.
  • NDCG@K: all relevant docs in top-K, position-discounted.
  • Recall@K: any relevant doc in top-K, position-blind.

Report all three plus per-K curves. Each answers a different question; pick the one that matches your downstream consumer before optimizing.
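To see how the three metrics can disagree, here is a small sketch that scores two hypothetical rankings of the same query. Several choices are assumptions rather than anything fixed by this page: recall is computed as the fraction of all relevant docs found in the top K, NDCG uses linear gain with a log2 discount, and the ideal ranking is built from the labels of the retrieved list (so all relevant docs are assumed to be retrieved).

```python
import math
from typing import Sequence


def reciprocal_rank(rels: Sequence[float]) -> float:
    """1/rank of the first doc with a positive label, 0.0 if there is none."""
    for position, rel in enumerate(rels, start=1):
        if rel > 0:
            return 1.0 / position
    return 0.0


def recall_at_k(rels: Sequence[float], k: int, total_relevant: int) -> float:
    """Fraction of all relevant docs that appear in the top k (position-blind)."""
    return sum(1 for rel in rels[:k] if rel > 0) / total_relevant


def dcg_at_k(rels: Sequence[float], k: int) -> float:
    """Discounted cumulative gain with linear gain and a log2 position discount."""
    return sum(rel / math.log2(pos + 1) for pos, rel in enumerate(rels[:k], start=1))


def ndcg_at_k(rels: Sequence[float], k: int) -> float:
    """DCG divided by the DCG of the same labels sorted best-first."""
    ideal = dcg_at_k(sorted(rels, reverse=True), k)
    return dcg_at_k(rels, k) / ideal if ideal > 0 else 0.0


# Hypothetical query with 3 relevant docs, all retrieved within the top 10.
spread_out = [1, 0, 0, 0, 0, 0, 0, 0, 1, 1]
front_loaded = [1, 1, 1, 0, 0, 0, 0, 0, 0, 0]

for name, rels in [("spread_out", spread_out), ("front_loaded", front_loaded)]:
    print(
        name,
        f"RR={reciprocal_rank(rels):.3f}",
        f"Recall@5={recall_at_k(rels, 5, total_relevant=3):.3f}",
        f"NDCG@10={ndcg_at_k(rels, 10):.3f}",
    )
```

Both rankings get a reciprocal rank of 1.0 because the first hit is at position 1, while Recall@5 and NDCG@10 favor the front-loaded ranking.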

Limitations

  • Binary by construction: MRR doesn’t use graded relevance. A “perfectly relevant” doc at position 1 and a “marginally relevant” doc at position 1 both contribute 1.0.
  • Sensitive to a single document per query: each query’s score hinges on where one document lands. A query whose first relevant doc sits at position 50 contributes only 0.02, and on a small evaluation set a few such queries drag the mean down noticeably. Trim outliers or report the median reciprocal rank for a more robust read.
  • Doesn’t reward depth: your model could put one perfect result first and complete garbage at positions 2-10 and MRR wouldn’t notice. NDCG@10 would.

Go further

When should I prefer MRR over NDCG@10?

When your downstream consumer reads exactly one result — a QA system pulling the top passage, a routing agent picking one tool, a navigational search where the user clicks the first hit. NDCG is for ranked-list consumers; MRR is for top-1 consumers.

How is MRR sensitive to outliers?

Reciprocal rank is nearly flat past position 10 (0.10 at rank 10, 0.02 at rank 50), so once the relevant doc is buried the per-query contribution is near zero wherever it lands. The sensitivity is at the top: a single query where the doc lands at position 1 (1.0) versus position 2 (0.5) is a half-point swing in that query’s score. Trim long-tail queries or report median-RR alongside MRR for stability.
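A quick way to see this is to compare the mean and the median of the per-query reciprocal ranks on a small, made-up evaluation set. The ranks below are invented for illustration, and `reciprocal_ranks` is a hypothetical helper; `mean` and `median` come from Python's standard `statistics` module.

```python
from statistics import mean, median
from typing import List, Optional, Sequence


def reciprocal_ranks(ranks: Sequence[Optional[int]]) -> List[float]:
    """Map 1-based ranks to 1/rank; None (no relevant doc) maps to 0.0."""
    return [0.0 if r is None else 1.0 / r for r in ranks]


baseline = reciprocal_ranks([1, 1, 2, 3, 50])
top_slip = reciprocal_ranks([2, 1, 2, 3, 50])     # first query slips from rank 1 to 2
deep_slip = reciprocal_ranks([1, 1, 2, 3, 100])   # buried doc slips from rank 50 to 100

print(f"baseline   mean={mean(baseline):.3f} median={median(baseline):.3f}")
print(f"top slip   mean={mean(top_slip):.3f} median={median(top_slip):.3f}")
print(f"deep slip  mean={mean(deep_slip):.3f} median={median(deep_slip):.3f}")
```

On this toy set, the one-position slip at the top moves the mean by 0.1, while doubling the depth of the buried document moves it by 0.002. The median stays at 0.5 in all three cases, which is why it makes a steadier companion number.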

Does MRR work with graded relevance?

Not directly — MRR is binary by construction. If you have graded labels, NDCG@K is the natural fit. You can hack a graded MRR by treating only top-grade docs as 'relevant', but you're throwing away signal that NDCG would use.
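The thresholding workaround can be sketched as follows. The 0 to 3 grade scale, the `min_grade` parameter, and the function name are all assumptions made for this example, not part of any standard definition.

```python
from typing import Sequence


def reciprocal_rank_at_grade(grades: Sequence[int], min_grade: int) -> float:
    """1/rank of the first doc whose grade is at least min_grade, else 0.0."""
    for position, grade in enumerate(grades, start=1):
        if grade >= min_grade:
            return 1.0 / position
    return 0.0


# Hypothetical graded labels in ranked order (0 = irrelevant, 3 = perfect).
grades = [1, 3, 0, 2]

lenient = reciprocal_rank_at_grade(grades, min_grade=1)  # any relevant doc counts
strict = reciprocal_rank_at_grade(grades, min_grade=3)   # only top-grade docs count

print(lenient, strict)  # 1.0 0.5
```

Either threshold collapses the grades to a yes/no label, so the difference between a grade-1 and a grade-2 document never reaches the score. That is the signal NDCG would keep.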
