Category · 7 terms
RAG & Retrieval
defined clearly.
Grounding LLMs in your knowledge base: chunking, retrieval, reranking, citations.
BM25
A bag-of-words ranking function that scores documents by term frequency and inverse document frequency, with length normalization. The classical workhorse of keyword search.
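To make the scoring concrete, here is a minimal sketch of the Okapi BM25 formula in plain Python. The function name `bm25_scores`, the toy documents, and the defaults `k1=1.5` and `b=0.75` are illustrative assumptions; production engines differ in tokenization and in the exact IDF variant.

```python
import math
from collections import Counter


def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Score each tokenized document against a tokenized query (Okapi BM25 sketch)."""
    n_docs = len(docs)
    avg_len = sum(len(d) for d in docs) / n_docs
    # Document frequency: how many documents contain each term at least once.
    df = Counter(term for d in docs for term in set(d))
    scores = []
    for d in docs:
        tf = Counter(d)
        score = 0.0
        for term in query:
            if term not in tf:
                continue
            # Smoothed IDF: rarer terms contribute more; always positive.
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            # Length normalization: long documents need more hits for the same score.
            norm = tf[term] + k1 * (1 - b + b * len(d) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores


# Hypothetical toy corpus, tokenized by whitespace.
docs = [
    "the cat sat on the mat".split(),
    "dogs chase cats in the park".split(),
    "bm25 ranks documents for keyword search".split(),
]
scores = bm25_scores("keyword search".split(), docs)
best = max(range(len(docs)), key=scores.__getitem__)
```

Only the third document contains the query terms, so it is the only one with a non-zero score. `k1` controls how quickly repeated occurrences of a term stop adding to the score, and `b` controls how strongly document length is penalized.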
Chunking
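The process of splitting source documents into smaller passages for retrieval. Chunk size and boundaries determine what the retriever can match and what the LLM ends up seeing, which makes chunking an often underrated determinant of RAG quality.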
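One simple strategy is a fixed-size word window with overlap, sketched below. The helper name `chunk_words` and its `size` and `overlap` parameters are assumptions for illustration; real pipelines often split on sentences, headings, or token counts instead.

```python
def chunk_words(text, size=50, overlap=10):
    """Split text into windows of `size` words, each sharing `overlap` words with the previous one."""
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("need size > 0 and 0 <= overlap < size")
    words = text.split()
    step = size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + size]))
        # Stop once a window has reached the end of the text,
        # so no trailing chunk made purely of overlap is emitted.
        if start + size >= len(words):
            break
    return chunks


# Hypothetical input: the "words" are just the numbers 0 to 9.
sample = " ".join(str(i) for i in range(10))
chunks = chunk_words(sample, size=4, overlap=1)
# chunks == ["0 1 2 3", "3 4 5 6", "6 7 8 9"]
```

The overlap keeps a sentence that straddles a boundary retrievable from at least one chunk, at the cost of indexing some text twice.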
Context Stuffing
The anti-pattern of cramming as much retrieved content as possible into an LLM prompt, betting that more context yields better answers.
Hybrid Search
A retrieval strategy that combines lexical (BM25) and semantic (vector) search, fusing scores to capture both keyword precision and conceptual recall.
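BM25 scores and vector similarities live on different scales, so a common way to combine them is to fuse ranks rather than raw scores. The sketch below uses reciprocal rank fusion (RRF), which is one option among several (weighted sums of normalized scores are another). The function name, the document IDs, and the two input rankings are hypothetical; `k=60` is a commonly used default.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of document IDs into a single ranking."""
    fused = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            # Each list contributes 1 / (k + rank); higher ranks contribute more.
            fused[doc_id] += 1.0 / (k + rank)
    return sorted(fused, key=fused.get, reverse=True)


# Hypothetical outputs of a lexical (BM25) and a semantic (vector) retriever.
lexical = ["d3", "d1", "d2"]
semantic = ["d1", "d4", "d3"]
fused = reciprocal_rank_fusion([lexical, semantic])
# fused == ["d1", "d3", "d4", "d2"]
```

Documents that rank well in both lists (`d1`, `d3`) rise to the top, while documents found by only one retriever are kept but ranked lower.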
Information Retrieval
The discipline of finding relevant items (documents, passages, records) from a collection given a query — the foundation of search and RAG systems.
Reranking
A second-stage retrieval step that re-scores a candidate set with a more expensive but more accurate model — typically a cross-encoder.
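The two-stage shape can be sketched without committing to a particular model. In the example below, `rerank` accepts any scoring function over (query, passage) pairs; `toy_overlap_score` is a deliberately crude, hypothetical stand-in for a cross-encoder, used only so the sketch runs without external dependencies.

```python
def rerank(query, candidates, score_fn, top_k=3):
    """Re-score first-stage candidates with score_fn and keep the best top_k."""
    scored = [(score_fn(query, passage), passage) for passage in candidates]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [passage for _, passage in scored[:top_k]]


def toy_overlap_score(query, passage):
    """Stand-in scorer (NOT a cross-encoder): fraction of query words found in the passage."""
    q = set(query.lower().split())
    p = set(passage.lower().split())
    return len(q & p) / len(q) if q else 0.0


# Hypothetical candidate set returned by a cheap first-stage retriever.
candidates = [
    "Reset your password from the account settings page.",
    "Our office is closed on public holidays.",
    "To reset a forgotten password use the email recovery link.",
]
top = rerank("reset forgotten password", candidates, toy_overlap_score, top_k=2)
```

Because the scorer looks at the query and the passage together, it is applied only to a short candidate list; running it over the whole collection is what makes the approach too expensive as a first stage.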
Retrieval-Augmented Generation (RAG)
A pattern that grounds LLM responses in retrieved documents — the standard approach for building fact-anchored AI over proprietary knowledge bases.
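A minimal sketch of the retrieve-then-prompt flow follows. Everything in it is illustrative: the corpus, the word-overlap `retrieve` function (a placeholder for a real retriever), and the prompt wording are assumptions, and the final call to an LLM is left out.

```python
def retrieve(query, corpus, top_k=2):
    """Placeholder retriever: rank documents by shared words with the query."""
    q = set(query.lower().split())
    ranked = sorted(
        corpus,
        key=lambda doc: len(q & set(doc["text"].lower().split())),
        reverse=True,
    )
    return ranked[:top_k]


def build_prompt(query, passages):
    """Assemble a grounded prompt with numbered sources the model can cite."""
    sources = "\n".join(
        f"[{i}] ({p['id']}) {p['text']}" for i, p in enumerate(passages, start=1)
    )
    return (
        "Answer using only the sources below and cite them by number.\n\n"
        f"Sources:\n{sources}\n\n"
        f"Question: {query}"
    )


# Hypothetical knowledge base.
corpus = [
    {"id": "hr-policy", "text": "Employees accrue 20 vacation days per year"},
    {"id": "it-faq", "text": "VPN access requires a hardware token"},
    {"id": "travel", "text": "Book flights through the travel portal"},
]
question = "how many vacation days per year"
prompt = build_prompt(question, retrieve(question, corpus, top_k=1))
```

The resulting `prompt` string is what would be sent to the model. Numbering the sources gives the model something concrete to cite, which is how retrieved passages turn into checkable citations in the answer.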