Hybrid Search as first-stage retrieval
If you’re building AI systems like RAG pipelines or AI agents, you’re probably familiar with the two main retrieval concepts, keyword search and semantic search:
Keyword Search (BM25): lightning-fast inverted-index lookups, perfect for exact matches when you know what you're looking for (“try except syntax Python”), but recall drops when phrasing shifts (“how to catch errors in Python”).
Semantic Similarity (Vector ANN): nearest-neighbor search over precomputed embeddings. Much better at conceptual queries, since vectors capture the semantic meaning of the content rather than any particular keywords.
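To make the keyword-search gap concrete, here is a toy sketch (not BM25 itself, just a crude exact-token overlap scorer) showing that a paraphrased query barely matches a document it should retrieve:

```python
def token_overlap(query: str, doc: str) -> int:
    """Count query tokens that appear verbatim in the document.
    A deliberately crude stand-in for keyword matching."""
    doc_tokens = set(doc.lower().split())
    return sum(1 for t in query.lower().split() if t in doc_tokens)

doc = "try except syntax python"
print(token_overlap("try except syntax python", doc))       # 4: exact phrasing matches
print(token_overlap("how to catch errors in python", doc))  # 1: only "python" overlaps
```

Real BM25 adds term-frequency and document-length weighting, but the failure mode is the same: when the phrasing shifts, lexical overlap (and therefore recall) collapses, which is exactly what semantic search compensates for.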
With turbopuffer, you can combine both retrieval methods in a hybrid setup and use Reciprocal Rank Fusion to boost recall.
But recall alone isn’t enough. You might retrieve the correct answer out of millions of documents, but if it sits at position 67, your LLM (or your users) will never see it.
Why add a reranker?
A reranker is a cross-encoder neural network that refines search results by reading both the query and each document together to evaluate how relevant they truly are.
By adding a reranker after fast retrieval (e.g., keyword, vector, or hybrid search), you let a context-aware model re-score and reorder the candidate results, surfacing the most relevant documents, reducing noise, and helping downstream models or users avoid missing the “needle in the haystack.”
Putting it all together
We can put all these components together to create a two-step search flow that delivers both speed and accuracy: turbopuffer handles fast retrieval, and ZeroEntropy refines the results to produce high-quality, context-aware answers.
In other words, turbopuffer finds the hay; ZeroEntropy finds the needle.
STEP 1: Retriever (turbopuffer) — the first-stage engine that stores documents and fetches a candidate set quickly:
Keyword search (BM25): Searches for exact words and phrases, which is great for precise terms and filters.
Semantic (Vector ANN): Searches by meaning using embeddings, which is helpful for synonyms and paraphrases.
Hybrid (Reciprocal Rank Fusion): This merges the BM25 and ANN ranked lists into a single, stronger candidate list.
STEP 2: Reranker (ZeroEntropy): a cross-encoder that reads the query and each candidate together and assigns a relevance score, returning a better-ordered list.
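The Reciprocal Rank Fusion in Step 1 scores each document as the sum of 1 / (k + rank) over every ranked list it appears in, with k = 60 as the common default. A minimal sketch with made-up rankings shows the key property: a document that places well in both lists can outscore one that tops only a single list.

```python
def rrf(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Fuse multiple ranked lists of document IDs with Reciprocal Rank Fusion."""
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

# Hypothetical rankings, loosely echoing the corpus used later in the tutorial
bm25_ranking = ["newton", "cavendish", "coulomb"]
ann_ranking = ["cavendish", "gpe", "newton"]
print(rrf([bm25_ranking, ann_ranking]))
# → ['cavendish', 'newton', 'gpe', 'coulomb']
```

Cavendish, ranked 2nd and 1st, edges out Newton, ranked 1st and 3rd, because RRF rewards consistent placement across both retrieval methods rather than a single top spot.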
Tutorial
Let’s jump into the implementation details!
See the full Colab notebook here.
0. Environment setup
Install the core libraries needed for this tutorial:
turbopuffer — for fast document retrieval
zeroentropy — for reranking and relevance scoring
sentence-transformers — for generating text embeddings
python-dotenv — for securely loading API keys and environment variables
!pip -q install turbopuffer zeroentropy sentence-transformers python-dotenv
1. Configure API Keys
If you don't have keys yet:
turbopuffer: create an API key in your dashboard and set TURBOPUFFER_API_KEY.
ZeroEntropy: create an API key and set ZEROENTROPY_API_KEY.
You can paste them into the cell below while testing in Colab; for production, use environment variables.
import os
from dotenv import load_dotenv
load_dotenv()
# ↓ Optionally set directly for quick testing in Colab (replace the placeholders)
# os.environ['TURBOPUFFER_API_KEY'] = 'tpuf_...'
# os.environ['ZEROENTROPY_API_KEY'] = 'ze_...'
assert os.getenv('TURBOPUFFER_API_KEY') is not None, 'Please set TURBOPUFFER_API_KEY'
assert os.getenv('ZEROENTROPY_API_KEY') is not None, 'Please set ZEROENTROPY_API_KEY'
print('Keys detected ✅')
2. Create a turbopuffer namespace and write documents
We generate embeddings with sentence-transformers/all-MiniLM-L6-v2 (384 dims), then upsert text (BM25-enabled) + vector into turbopuffer.
from turbopuffer import Turbopuffer
from sentence_transformers import SentenceTransformer
import uuid, time
# Init SDK
tpuf = Turbopuffer(api_key=os.environ['TURBOPUFFER_API_KEY'], region="gcp-us-central1")
ns_name = f'ze_tpuf_demo_{int(time.time())}'
ns = tpuf.namespace(ns_name)
# Tiny corpus
docs = [
{
"id": str(uuid.uuid4()),
"title": "Henry Cavendish",
"text": "In 1798, Cavendish used a torsion balance to measure the Earth's density and effectively determine the gravitational constant G. His experiment measured the tiny gravitational attraction between lead spheres."
},
{
"id": str(uuid.uuid4()),
"title": "Isaac Newton",
"text": "Newton formulated the law of universal gravitation, F = G m1 m2 / r^2, explaining that gravitational force obeys an inverse-square dependence on distance."
},
{
"id": str(uuid.uuid4()),
"title": "Charles-Augustin de Coulomb",
"text": "Coulomb used a torsion balance to study electrostatic forces, establishing Coulomb's law for charges, which also follows an inverse-square relationship."
},
{
"id": str(uuid.uuid4()),
"title": "Gravitational Potential Energy",
"text": "The potential energy between two masses is U = -G m1 m2 / r; near Earth's surface U ≈ m g h."
}
]
# Build vectors (local, no external API)
model = SentenceTransformer('sentence-transformers/all-MiniLM-L6-v2')
embs = model.encode([d['text'] for d in docs], normalize_embeddings=True)
# Upsert into Turbopuffer: text (BM25) + vector
rows = []
for d, v in zip(docs, embs):
rows.append({
'id': d['id'],
'vector': v.tolist(),
'title': d['title'],
'text': d['text'],
})
res = ns.write(
upsert_rows=rows,
distance_metric='cosine_distance',
# Explicit schema to enable BM25 on `text`
schema={
'text': {'type': 'string', 'full_text_search': True},
'title': {'type': 'string', 'full_text_search': True}
}
)
print('Upserted:', res.rows_upserted, 'rows to', ns_name)
3. Hybrid search (BM25 + ANN) and Reciprocal Rank Fusion (RRF)
We run two queries against turbopuffer and fuse the rankings client-side using RRF.
from typing import List, Dict
query = "Who first measured the gravitational constant — not the person who formulated the law of universal gravitation?"
top_k = 5
# ANN vector search
vec = model.encode([query], normalize_embeddings=True)[0].tolist()
ann = ns.query(
rank_by=("vector", "ANN", vec),
top_k=top_k,
include_attributes=["title","text"]
).rows
# BM25 search
bm25 = ns.query(
rank_by=("text", "BM25", query),
top_k=top_k,
include_attributes=["title","text"]
).rows
def rrf_with_scores(list_of_rankings: List[List[Dict]], k: int = 60):
"""Return (fused_list, rrf_scores_dict)."""
scores = {}
for ranking in list_of_rankings:
for rank, item in enumerate(ranking, start=1):
scores[item.id] = scores.get(item.id, 0.0) + 1.0 / (k + rank)
id2item = {item.id: item for ranking in list_of_rankings for item in ranking}
ordered_ids = sorted(scores, key=scores.get, reverse=True)
fused = [id2item[i] for i in ordered_ids]
return fused, scores
# Fuse and print only RRF results
fused_results, rrf_scores = rrf_with_scores([ann, bm25])
fused_results = fused_results[:top_k]
for i, r in enumerate(fused_results, 1):
print(f"Rank {i}. {r.title} — rrf_score={rrf_scores[r.id]:.6f}")
4. Rerank with ZeroEntropy
Then we pass the fused candidate list to ZeroEntropy for high-accuracy reranking.
import os
from zeroentropy import ZeroEntropy
def zeroentropy_rerank_or_unranked(results, query, model="zerank-1-small", k=None, return_scores=False):
"""Return results reordered by ZeroEntropy; optionally include scores."""
if not os.getenv("ZEROENTROPY_API_KEY"):
print("Warning: ZEROENTROPY_API_KEY not set, returning unranked results")
return results
ze = ZeroEntropy(api_key=os.environ["ZEROENTROPY_API_KEY"])
documents = [getattr(r, "text", str(r)) for r in results] # list[str]
out = ze.models.rerank(
query=query,
documents=documents,
model=model, # "zerank-1" or "zerank-1-small"
top_n=k or len(documents),
)
# out.results is sorted by relevance (desc)
if return_scores:
return [(results[r.index], r.relevance_score) for r in out.results]
else:
return [results[r.index] for r in out.results]
# Use it
reranked_with_scores = zeroentropy_rerank_or_unranked(
fused_results, query, model="zerank-1-small", return_scores=True
)
for rank, (doc, score) in enumerate(reranked_with_scores, 1):
print(f"Rank {rank}. {doc.title} — {score:.4f}")
Results
Hybrid Search without ZeroEntropy reranker:
Rewards overlap on “gravity”/“inverse square,” mistakenly ranking Newton #1.
1) Newton 0.0328, 2) Cavendish 0.0323, 3) Coulomb 0.0317, 4) GPE 0.0156
Hybrid Search with ZeroEntropy reranker:
Reads query + candidate together, understands negation and measurement, correctly ranking Cavendish #1.
1) Cavendish 0.9531, 2) Newton 0.4428, 3) GPE 0.3901, 4) Coulomb 0.2304
Why ZeroEntropy wins
RRF blends keyword and semantic similarity, so anything about “gravity” and “inverse square” tends to float to the top, even Newton, who is explicitly excluded by the question.
ZeroEntropy evaluates the full meaning of the query with each candidate. It recognizes that the user wants the person who measured G (Cavendish) and not the person who formulated the law (Newton), so it reorders the list correctly.
Takeaway
A two-step search flow delivers both speed and accuracy: turbopuffer finds the hay; ZeroEntropy finds the needle.