Inverted Index

Also known as: postings list, term index, keyword index

TL;DR

An inverted index maps each term to the list of documents (and positions) where it appears. The classical data structure behind keyword search — sub-millisecond lookups over billions of documents and the substrate every BM25 implementation builds on.

An inverted index is the data structure that makes keyword search fast. For every term in the corpus vocabulary, it stores a postings list — the documents that contain that term, plus per-document metadata (term frequency, positions, field).

INVERTED INDEXFrom documents to a term-to-postings map.D1DOC 1the1cat2sat3on4the5mat6D2DOC 2the1dog2sat3on4the5rug6D3DOC 3cats1and2dogs3TOKENIZE · LOWERCASE · STREAM POSTINGSTERMPOSTINGS · doc_id : positionstheD1:[1,5]D2:[1,5]catD1:[2]satD1:[3]D2:[3]onD1:[4]D2:[4]matD1:[6]dogD2:[2]rugD2:[6]catsD3:[1]andD3:[2]dogsD3:[3]THREE DOCUMENTS

A toy example. Indexing three documents:

  • D1: “the cat sat on the mat”
  • D2: “the dog sat on the rug”
  • D3: “cats and dogs”

Yields postings:

  • cat → [(D1, tf=1, pos=[1])]
  • sat → [(D1, tf=1), (D2, tf=1)]
  • dog → [(D2, tf=1)]
  • the → [(D1, tf=2), (D2, tf=2)]

A query for cat sat reads the two postings lists, intersects them on doc ID, and returns D1. No document is ever scanned.

Why this is the substrate of lexical retrieval

Every classical retrieval algorithm — Boolean, TF-IDF, — is a scoring function applied to postings-list intersections. A single inverted index serves Boolean filters, BM25 ranking, phrase search, and faceted search simultaneously, because the index just delivers postings; the scoring function is a thin layer above.

Modern open-source implementations store postings in compressed blocks (variable-byte, Roaring bitmaps, FOR-delta), so a billion-doc index fits in tens of GB and decodes at GB/s. That is what gives its sub-millisecond budget over corpora that wouldn’t fit in vector RAM.

What’s in a posting

Beyond the doc ID, each posting typically carries:

  • Term frequency (tf) — used by and BM25 scoring.
  • Positions — for phrase and proximity queries.
  • Field — title vs body vs URL, so you can boost matches in titles.
  • Payload — arbitrary per-occurrence data (sentiment, language, custom weights).

Storing positions roughly doubles index size; many production systems index positions only for fields that actually need phrase queries.

Lucene encodes each postings list as a sequence of compressed blocks of 128 doc-IDs. The doc-IDs are delta-encoded (only the difference from the previous ID is stored), then run through Frame-of-Reference + PFOR-Delta compression, which packs each block into the minimum number of bits needed for its largest delta. Term frequencies are stored in a parallel block, positions in another. The result is roughly 2 to 4 bits per posting on web-scale corpora, low enough that postings for hot terms stay in OS page cache. Tantivy and Quickwit use the same block layout with minor codec tweaks; modern engines diverge mostly on impact-ordering vs doc-ID-ordering, not on compression.

Hybrid stacks ( ) run an inverted-index query and a query in parallel, then fuse the two ranked lists. The inverted index handles rare exact tokens — model numbers, proper nouns, identifiers — that embeddings systematically underweight. The ANN handles paraphrases and synonyms the index can’t see. Two failure modes, two complementary indexes.

What modern lexical engines change

Beyond compression, recent inverted-index engines add:

  • Block-Max WAND — early-termination of postings traversal once no remaining doc can beat the K-th score.
  • Impact-ordered postings — sort each list by contribution rather than doc ID, so top-K can stop after a few percent of the postings.
  • Tiered storage — hot terms in RAM, cold in NVMe, frozen on object storage.

These are why a single-machine BM25 over 100M docs still serves p99 of about 30ms.

Production inverted-index engines
  • Lucene — the reference implementation, embedded in Elasticsearch and OpenSearch; battle-tested across two decades of production search.
  • Tantivy — Rust port with comparable compression and faster ingestion; used by Quickwit and Meilisearch internals.
  • Vespa — disk-oriented engine designed around tiered storage and structured data; Yahoo’s heritage retrieval stack.
  • Pisa — research-grade engine that pushes Block-Max WAND and impact-ordered postings to their measurement-theoretic limits.
  • DuckDB FTS and SQLite FTS5 — embedded inverted indexes for analytics and small-corpus use; surprisingly capable up to a few million docs.
Go further

Why is the inverted index so much faster than a forward scan?

A forward scan visits every document and asks 'do you contain term T?'. An inverted index already has the answer pre-computed — it jumps straight to the postings list for T. For a 100-million-document corpus that's the difference between hours and microseconds per query.

How does the index handle phrase queries and proximity?

Each posting stores the term's positions inside the document, not just a doc ID. To match the phrase 'reciprocal rank fusion' you intersect the postings lists for all three terms and keep only documents where their positions are consecutive. Proximity scoring (terms within N tokens) is the same idea with a wider window.

What does a learned-sparse model change about the index?

Models like SPLADE expand a query or document into a weighted bag of terms — many of which never appeared in the original text. The inverted-index machinery is unchanged; the postings list just gets fed model-generated terms with model-generated weights instead of raw token counts.

ZeroEntropy
The best AI teams build with ZeroEntropy models
Follow us on
GitHubTwitterSlackLinkedInDiscord