Category · 8 terms
Vector Databases & Embeddings
defined clearly.
Storing and searching meaning — embedding models, ANN indexes, and the vector DB landscape.
Approximate Nearest Neighbor (ANN)
A family of algorithms that find vectors close to a query vector without comparing it against every stored vector, trading a small loss in recall for query times that typically grow sublinearly with dataset size.
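To make the trade-off concrete, here is a minimal sketch that contrasts exact brute-force search with one simple approximate scheme, random-hyperplane locality-sensitive hashing (LSH). LSH is only one member of the ANN family, chosen here because it fits in a few lines. The class name `HyperplaneLSH` and its parameters are illustrative assumptions, not any library's API.

```python
import math
import random


def cosine_distance(a, b):
    """1 - cosine similarity; smaller means closer in direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)


def exact_nearest(query, vectors):
    """Exact search: score every stored vector, O(n) work per query."""
    return min(range(len(vectors)), key=lambda i: cosine_distance(query, vectors[i]))


class HyperplaneLSH:
    """Toy random-hyperplane LSH index (hypothetical, for illustration only)."""

    def __init__(self, dim, n_planes=4, seed=0):
        rng = random.Random(seed)
        # Each random hyperplane contributes one bit of the bucket signature.
        self.planes = [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_planes)]
        self.vectors = []
        self.buckets = {}

    def _signature(self, v):
        # Record which side of each hyperplane the vector falls on.
        return tuple(sum(p * x for p, x in zip(plane, v)) >= 0 for plane in self.planes)

    def add(self, v):
        self.vectors.append(v)
        self.buckets.setdefault(self._signature(v), []).append(len(self.vectors) - 1)

    def query(self, q):
        """Search only the query's own bucket; return an index or None."""
        candidates = self.buckets.get(self._signature(q), [])
        if not candidates:
            return None
        return min(candidates, key=lambda i: cosine_distance(q, self.vectors[i]))
```

The approximate query inspects a single bucket, so it can miss the true nearest neighbor when that neighbor hashes to a different bucket. That miss rate is the recall cost paid for doing less work per query.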
Cosine Similarity
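A measure of similarity between two vectors equal to the cosine of the angle between them, ranging from -1 (opposite directions) to 1 (same direction) and independent of vector length. It is the most common similarity measure for text embeddings.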
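As a small illustration, the definition translates directly into code: the dot product of the two vectors divided by the product of their lengths. The function name `cosine_similarity` is just a label chosen for this sketch.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between vectors a and b."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)
```

Vectors pointing the same way score close to 1 regardless of their lengths, perpendicular vectors score 0, and opposite vectors score -1.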
Dimensionality Reduction
Techniques (PCA, t-SNE, UMAP) that compress high-dimensional vectors into 2D or 3D for visualization, or shrink embeddings for storage and speed.
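For a feel of how this works, the sketch below implements only the simplest case: PCA reduced to its first principal component, found by power iteration on the covariance matrix. t-SNE and UMAP are nonlinear and are not shown. The function names are illustrative, and a real project would normally rely on a numerical library rather than this pure-Python version.

```python
import math


def top_principal_component(points, iterations=100):
    """Return (mean, unit vector) for the direction of greatest variance.

    Needs at least two points. Power iteration can stall if the start
    vector happens to be orthogonal to the top component.
    """
    n, dim = len(points), len(points[0])
    mean = [sum(p[j] for p in points) / n for j in range(dim)]
    centered = [[p[j] - mean[j] for j in range(dim)] for p in points]
    # Sample covariance matrix (dim x dim).
    cov = [
        [sum(row[i] * row[j] for row in centered) / (n - 1) for j in range(dim)]
        for i in range(dim)
    ]
    v = [1.0] * dim
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(dim)) for i in range(dim)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break  # no variance along v; keep the current vector
        v = [x / norm for x in w]
    return mean, v


def project_1d(points, mean, component):
    """Compress each point to a single coordinate along the component."""
    return [
        sum((p[j] - mean[j]) * component[j] for j in range(len(mean)))
        for p in points
    ]
```

Projecting onto the first few components keeps the directions with the most variance and discards the rest, which is what shrinks the vectors.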
Distance Metric
The function used to measure how far apart two vectors are — choice of metric must match how the embedding model was trained.
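The following toy example suggests why the choice matters: on vectors that are not length-normalized, Euclidean distance and cosine similarity can disagree about which candidate is nearest. The vectors and labels are made up purely for illustration.

```python
import math


def euclidean_distance(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def dot_product(a, b):
    return sum(x * y for x, y in zip(a, b))


def cosine_similarity(a, b):
    return dot_product(a, b) / (math.sqrt(dot_product(a, a)) * math.sqrt(dot_product(b, b)))


query = [1.0, 0.0]
candidates = {
    "same_direction_far": [3.0, 0.0],  # identical direction, larger magnitude
    "nearby_angled": [1.0, 0.5],       # close in space, different direction
}

# Distances: smaller is better. Similarities: larger is better.
by_euclidean = min(candidates, key=lambda k: euclidean_distance(query, candidates[k]))
by_cosine = max(candidates, key=lambda k: cosine_similarity(query, candidates[k]))
by_dot = max(candidates, key=lambda k: dot_product(query, candidates[k]))

print(by_euclidean)  # nearby_angled
print(by_cosine)     # same_direction_far
print(by_dot)        # same_direction_far
```

When all vectors are normalized to unit length, the three measures produce the same ranking, which is one reason many pipelines normalize embeddings before indexing.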
Embedding
A dense vector representation of data (text, image, audio) in a learned space where semantic similarity maps to geometric proximity.
HNSW
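Hierarchical Navigable Small World: a graph-based ANN algorithm that searches a multi-layer proximity graph, starting in sparse upper layers and descending to a dense bottom layer. It offers a strong recall/speed trade-off and is the default index in many modern vector databases.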
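The sketch below is a deliberately simplified illustration of the idea behind graph-based search, not an implementation of HNSW itself. It builds a single-layer neighbor graph by brute force and then walks it greedily toward the query. Real HNSW adds a hierarchy of layers, incremental insertion, neighbor-selection heuristics, and a wider candidate list on the bottom layer. All names here are hypothetical.

```python
import math


def euclidean(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def build_knn_graph(vectors, k=2):
    """Link each node to its k nearest neighbors, then make edges two-way.

    Built by brute force, which is acceptable only for a toy example.
    """
    graph = {}
    for i, v in enumerate(vectors):
        others = sorted(
            (j for j in range(len(vectors)) if j != i),
            key=lambda j: euclidean(v, vectors[j]),
        )
        graph[i] = others[:k]
    for i in list(graph):
        for j in list(graph[i]):
            if i not in graph[j]:
                graph[j].append(i)
    return graph


def greedy_search(graph, vectors, query, entry=0):
    """Hop to whichever neighbor is closest to the query until none improves."""
    current = entry
    current_dist = euclidean(query, vectors[current])
    while True:
        best, best_dist = current, current_dist
        for neighbor in graph[current]:
            d = euclidean(query, vectors[neighbor])
            if d < best_dist:
                best, best_dist = neighbor, d
        if best == current:
            return current  # local minimum: no neighbor is closer
        current, current_dist = best, best_dist
```

Each hop visits only a node's neighbors rather than the whole dataset, which is where the speedup comes from. The search can stop at a local minimum, which is why the result is approximate.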
Vector Database
A database optimized for storing and querying high-dimensional vectors via approximate nearest-neighbor search. It is a common storage backend for retrieval-augmented generation (RAG) and semantic search.
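To show the basic contract such a system exposes, here is a minimal in-memory sketch with two operations: upsert a vector under an ID, and query for the top-k most similar entries. It scans every entry by brute force, whereas a real vector database would use an ANN index and add persistence and filtering. `TinyVectorStore` and its method names are assumptions made for this example, not the API of any product.

```python
import math


def _normalize(vector):
    norm = math.sqrt(sum(x * x for x in vector))
    if norm == 0.0:
        raise ValueError("cannot index or query a zero vector")
    return [x / norm for x in vector]


class TinyVectorStore:
    """Hypothetical in-memory store; cosine similarity via unit vectors."""

    def __init__(self):
        self._items = {}

    def upsert(self, item_id, vector, metadata=None):
        # Store unit-length vectors so a dot product equals cosine similarity.
        self._items[item_id] = (_normalize(vector), metadata or {})

    def query(self, vector, top_k=3):
        """Return up to top_k (id, score, metadata) tuples, best first."""
        q = _normalize(vector)
        scored = [
            (sum(a * b for a, b in zip(q, v)), item_id, meta)
            for item_id, (v, meta) in self._items.items()
        ]
        scored.sort(key=lambda t: t[0], reverse=True)
        return [(item_id, score, meta) for score, item_id, meta in scored[:top_k]]
```

In a RAG pipeline the stored vectors would be embeddings of document chunks, and the query vector would be the embedding of the user's question.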
Word2Vec
A 2013 family of shallow neural models from Google that learns dense word embeddings by predicting a word's neighbors (skip-gram) or a word from its neighbors (CBOW). It popularized embeddings in NLP.
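"Predicting nearby words" starts with turning text into training examples. The sketch below shows only that first step for the skip-gram variant of Word2Vec: producing (center word, context word) pairs from a sliding window. The neural network that learns from these pairs is not shown, and the function name and `window` parameter are illustrative.

```python
def skipgram_pairs(tokens, window=2):
    """List (center, context) pairs for every word within `window` positions."""
    pairs = []
    for i, center in enumerate(tokens):
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                pairs.append((center, tokens[j]))
    return pairs


print(skipgram_pairs(["the", "cat", "sat"], window=1))
# [('the', 'cat'), ('cat', 'the'), ('cat', 'sat'), ('sat', 'cat')]
```

Words that appear in similar contexts end up with similar training pairs, which pushes their learned vectors close together.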