HNSW

Also known as: Hierarchical Navigable Small World, HNSW graph, hnswlib

TL;DR

HNSW (Hierarchical Navigable Small World) is the dominant graph-based ANN algorithm. A multi-layer proximity graph supports log-time approximate search by greedy walks at each layer.

HNSW (Hierarchical Navigable Small World) is the graph-based algorithm that powers most modern vector databases. It builds a multi-layer proximity graph: each layer is a graph over a subset of the corpus, with the top layer sparse and each lower layer denser. A query enters at the top, greedily walks toward its nearest neighbor, drops a layer, and repeats. The result is logarithmic search complexity and a state-of-the-art recall/latency curve, at the cost of memory overhead and a non-trivial graph-build phase.

HIERARCHICAL NAVIGABLE SMALL WORLDLong jumps at the top, local refinement at the bottom.LAYER 2sparse · long-rangeLAYER 1middleLAYER 0dense · all vectorsqNODEVISITEDTOP-KQUERY

How the graph is built

Each vector is inserted at a layer drawn from an exponentially decaying distribution: most vectors live only in layer 0 (the dense bottom), a few percent reach layer 1, fewer still reach layer 2, and so on up to a top layer with maybe a dozen nodes. Within each layer, the new node is connected to its M nearest existing neighbors, with edge selection biased toward diverse directions to keep the graph navigable.

HNSW combines two prior ideas. Skip lists give log-time search by keeping a probabilistic hierarchy of pointers — a sparse top level for big jumps, dense lower levels for refinement. Navigable Small World graphs (NSW) are flat proximity graphs that support greedy nearest-neighbor walks with surprisingly few hops, but degrade when the corpus grows because long-range navigation gets stuck. HNSW stacks NSW layers in a skip-list-like hierarchy: top layers handle long-range jumps, bottom layer handles local refinement. The probabilistic layer assignment is what makes the analysis tractable and the structure parameter-free at insertion time.

Build is single-pass, parallelizable across insertions, and incremental — no global retraining when you add new vectors.

How search works

A query starts at a designated entry point in the top layer. At each layer, the algorithm performs a greedy beam search:

  1. Look at the candidate’s neighbors. Compute distance to the query for each.
  2. Move to the closest unvisited neighbor.
  3. Stop when no neighbor is closer than the current candidate.
  4. Drop down a layer, using the current candidate as entry, and repeat.

At layer 0 the algorithm uses a dynamic candidate list of size efSearch, returning the top-k closest from that list. The efSearch knob is the recall/latency dial: bigger efSearch explores more nodes per query, getting higher recall at the cost of more distance computations.

The three knobs

Tuning parameters
  • M — bidirectional edges per node. Controls graph degree and memory. Typical: 16-48. Higher = better recall, more RAM.
  • efConstruction — beam width during graph build. Controls graph quality and build time. Typical: 200-400. Higher = denser, more carefully chosen edges.
  • efSearch — beam width at query time. The recall/latency dial. Typical: 64-512 for first-pass retrieval. Higher = more recall, more latency.

The advice that holds across deployments: start with M=16, efConstruction=200, sweep efSearch from 64 to 512 measuring recall@10, pick the smallest efSearch that hits your recall target. That’s most of HNSW tuning.

Memory and build cost

HNSW’s costs are real. Graph memory is roughly M * 2 * 8 bytes per vector (two pointers per edge in some encodings) plus M * 8 bytes for layer-0 lists, on top of the vectors themselves. A 10-million-vector corpus with d=768, fp32 vectors, M=32 takes ~30GB for vectors and another ~5GB for the graph. Build time is roughly distance computations — minutes to hours for tens of millions of vectors on a single machine.

Where HNSW dominates and where it doesn’t

HNSW is the right default for in-memory retrieval up to ~100M vectors. The recall/latency curve beats every other published algorithm in that range, and the implementations are mature: hnswlib, FAISS-HNSW, pgvector, Qdrant, Weaviate, Vespa, Milvus all ship it.

It struggles at billion-scale, where vector storage alone exceeds reasonable RAM and the graph overhead is multiplicative. There with wins because it doesn’t store a graph and quantizes the vectors. The hybrid pattern — HNSW over PQ-compressed vectors with a full-precision rescore — closes much of that gap and is increasingly common.

Go further

What does the M parameter actually control?

M is the number of bidirectional edges per node in the graph (the graph's max degree). Higher M means denser graph, better recall, more memory. Typical values are 16-48; 16 is the hnswlib default. Memory cost is roughly d×4 bytes (vectors at fp32) + ~8×M bytes per vector for layer-0 graph edges, plus a small higher-layer overhead. At d=768, M=32 that is ~3KB for the vector and ~256B for the graph. Doubling M from 16 to 32 roughly doubles graph memory but adds only a few percent of recall.

Can HNSW handle deletes well?

Not natively. Deletes are typically tombstoned (marked as removed), which leaves dead nodes in the graph that still get traversed. After enough churn, recall degrades and the index needs rebuilding. Some systems (Qdrant, Vespa) implement soft deletes with periodic compaction; pure hnswlib does not.

How does HNSW compare to IVF at billion scale?

HNSW typically beats IVF on recall/latency at fixed RAM, but the graph itself takes 30-50% extra memory on top of the vectors. At billion-scale where the constraint is RAM, IVF-PQ wins because it stores quantized vectors and no graph. HNSW with quantized vectors plus a rescore stage closes the gap.

ZeroEntropy
The best AI teams build with ZeroEntropy models
Follow us on
GitHubTwitterSlackLinkedInDiscord