
HNSW

Hierarchical Navigable Small World — a graph-based ANN algorithm that delivers state-of-the-art recall/speed trade-offs and powers most modern vector databases.

HNSW (Hierarchical Navigable Small World) is a graph-based approximate-nearest-neighbor algorithm that builds a multi-layer proximity graph over the indexed vectors and navigates that graph to answer queries in time that grows logarithmically with corpus size. The upper layers are sparse long-range shortcuts; the lower layers are dense local neighbourhoods. A search enters at the top, greedily walks toward the query, descends to finer layers, and produces a candidate set whose recall approaches the exact top-k at a small fraction of the compute cost. Introduced by Malkov and Yashunin in 2016, HNSW became the de facto default for graph-based ANN because of an unusually strong recall-versus-latency profile at the scales where most enterprise vector workloads live.
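The layered descent can be sketched in a few lines of Python. Everything here (the five points, the two-layer graph, the entry node) is an invented toy example, and real HNSW keeps an efSearch-sized candidate set rather than a single current node:

```python
import math

# Toy two-layer graph over five 2-D points; layer 1 holds sparse long-range
# shortcuts, layer 0 holds dense local links. All values are illustrative.
points = {0: (0.0, 0.0), 1: (5.0, 5.0), 2: (9.0, 9.0), 3: (9.5, 8.5), 4: (1.0, 4.0)}
layers = [
    {0: [1, 4], 1: [0, 2, 4], 2: [1, 3], 3: [2], 4: [0, 1]},  # layer 0 (bottom, dense)
    {0: [2], 2: [0]},                                          # layer 1 (top, sparse)
]

def dist(i, query):
    return math.dist(points[i], query)

def greedy_search(query, entry=0):
    """Enter at the top layer, greedily hop to whichever neighbour is closer
    to the query, then repeat the walk on the next layer down."""
    cur = entry
    for layer in reversed(layers):            # sparse top -> dense bottom
        improved = True
        while improved:
            improved = False
            for nb in layer.get(cur, []):
                if dist(nb, query) < dist(cur, query):
                    cur, improved = nb, True
    return cur

print(greedy_search((9.0, 8.0)))  # lands on point 3, the true nearest neighbour
```

The top-layer shortcut (0 → 2) covers most of the distance in one hop; layer 0 then refines to the exact neighbour, which is the whole point of the hierarchy.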

Three parameters control behaviour. **M** sets the maximum number of graph neighbours per node (typically 16-64; higher values improve recall but increase memory cost). **efConstruction** controls the build-time candidate-set size (typically 100-400; higher values yield better graph quality at the cost of longer index builds). **efSearch** controls the query-time candidate-set size (typically 50-400; higher values yield better recall at higher query latency). HNSW ships as the default or an option in Qdrant, Weaviate, Milvus, Pinecone (as a variant), Vespa, Elasticsearch's kNN index, pgvector's HNSW index type, and FAISS. The trade-off is that the index is memory-resident, which is why teams at billion-vector scale switch to IVF-PQ or DiskANN variants.
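A minimal sketch of what efSearch actually buys, on an invented toy graph (a grid of points standing in for HNSW's layer-0 links; the real algorithm runs this candidate-set search on every layer and builds the links incrementally):

```python
import heapq, math

# Toy layer-0 neighbourhood graph: a 15x15 grid of 2-D points, each linked to
# its grid neighbours. Purely illustrative; a real HNSW layer links each node
# to up to M nearby points.
N = 15
pts = [(x / (N - 1), y / (N - 1)) for x in range(N) for y in range(N)]
graph = {
    x * N + y: [(x + dx) * N + (y + dy)
                for dx, dy in ((-1, 0), (1, 0), (0, -1), (0, 1))
                if 0 <= x + dx < N and 0 <= y + dy < N]
    for x in range(N) for y in range(N)
}

def d(i, q):
    return math.dist(pts[i], q)

def search(q, k, ef, entry=0):
    """Best-first search keeping an ef-sized result set: larger ef explores
    more of the graph before stopping, trading latency for recall."""
    visited = {entry}
    candidates = [(d(entry, q), entry)]   # min-heap: closest unexpanded node
    best = [(-d(entry, q), entry)]        # max-heap: current top-ef results
    while candidates:
        dc, c = heapq.heappop(candidates)
        if len(best) >= ef and dc > -best[0][0]:
            break                          # nothing left can improve the top-ef
        for nb in graph[c]:
            if nb not in visited:
                visited.add(nb)
                if len(best) < ef or d(nb, q) < -best[0][0]:
                    heapq.heappush(candidates, (d(nb, q), nb))
                    heapq.heappush(best, (-d(nb, q), nb))
                    if len(best) > ef:
                        heapq.heappop(best)
    return [i for _, i in sorted((-nd, i) for nd, i in best)[:k]]

q = (0.31, 0.77)
exact = sorted(range(len(pts)), key=lambda i: d(i, q))[:10]
for ef in (10, 60, len(pts)):
    recall = len(set(search(q, 10, ef)) & set(exact)) / 10
    print(f"ef={ef}: recall@10={recall}")   # recall improves with ef; ef=225 is exhaustive here
```

Raising ef widens the frontier the search is willing to keep alive before declaring convergence, which is exactly why efSearch trades latency for recall and efConstruction trades build time for graph quality.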

For APAC mid-market teams deploying vector search, the practical tuning sequence is **start with M=16, efConstruction=200, efSearch=100**, measure recall@k against an exact-search baseline on a labelled sample, then raise efSearch until recall reaches the target (typically 95%+ for RAG use cases). Only then tune for latency; efSearch can usually be moved in either direction at query time without rebuilding the index. Memory budgeting matters: an HNSW index roughly doubles the memory footprint of the raw vectors, which is manageable at 10M × 768-dim (about 60GB) but becomes real money at 100M+.
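The recall measurement in that sequence is a one-liner; the ids below are made-up illustrations, not output from a real index:

```python
def recall_at_k(exact_ids, approx_ids, k):
    """Fraction of the exact top-k that the ANN result recovered."""
    return len(set(exact_ids[:k]) & set(approx_ids[:k])) / k

# Hypothetical result lists for one query, ordered best-first
exact = [7, 2, 9, 4, 1]    # from a brute-force exact search on the sample
approx = [7, 9, 2, 8, 4]   # from the HNSW index at the current efSearch
print(recall_at_k(exact, approx, 5))  # 0.8: four of the exact top-5 recovered
```

Average this over a few hundred labelled queries at each efSearch setting and the recall-versus-efSearch curve falls out directly.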

The non-obvious failure mode is **cold-start latency on disk-resident HNSW**. Some vector databases memory-map the HNSW graph, which is blazingly fast once in the page cache but produces seconds of latency on the first queries after a restart or failover. Teams measure steady-state P99 latency, get a great number, and ship to production, where a deploy or a Kubernetes pod reschedule evicts the page cache and users see multi-second responses for the first minute. Load-testing after a restart, keeping hot replicas warm, or pinning the graph in memory are the operational fixes. HNSW is fast only when its graph is in RAM.
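One way to script the keep-it-warm fix is a post-restart probe loop. `query_fn`, the probe set, and the latency budget are all placeholders for your actual client and SLO, so treat this as a sketch rather than a vendor recipe:

```python
import time

def warm_index(query_fn, probes, latency_budget_s=0.1, max_rounds=50):
    """Issue probe queries after a restart until high-percentile latency
    settles under budget, so the memory-mapped graph is paged in before the
    replica takes real traffic. `query_fn` stands in for your vector DB
    client's search call."""
    for _ in range(max_rounds):
        latencies = []
        for q in probes:
            t0 = time.perf_counter()
            query_fn(q)
            latencies.append(time.perf_counter() - t0)
        worst = sorted(latencies)[int(0.99 * (len(latencies) - 1))]
        if worst <= latency_budget_s:
            return True   # steady state reached; safe to admit traffic
    return False          # still cold after max_rounds; keep out of rotation

# usage (hypothetical client): warm_index(lambda q: client.search(q, top_k=10), sample_queries)
```

Wiring this into a readiness probe means a rescheduled pod never receives traffic while the graph is still on disk.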
