AIMenta
intermediate · RAG & Retrieval

Chunking

The process of splitting source documents into smaller passages for retrieval — the most underrated determinant of RAG quality.

Chunking is the process of splitting source documents into smaller passages that become the retrievable units in a RAG system. Every other part of the RAG stack operates on chunks rather than full documents: the embedder embeds them, the index stores them, retrieval returns them, the LLM reasons over them. Chunking is consequently the most underrated determinant of RAG quality — a well-tuned chunking strategy can lift retrieval recall by 20-40 percentage points over a naive default, and a badly tuned one can defeat everything downstream regardless of how strong the embedder or reranker is.

The strategies fall into five families. **Fixed-token chunking** splits at N tokens with M-token overlap — simple, fast, semantically naive, often the right starting default. **Recursive character splitting** (the LangChain default) splits at paragraph, then sentence, then character boundaries — respects structure cheaply. **Sentence-level chunking** treats each sentence as a unit, then batches adjacent sentences into larger retrievable groups. **Semantic chunking** (2024+ technique) embeds candidate split points and cuts where semantic distance spikes — computationally more expensive but respects topic boundaries. **Structure-aware chunking** uses document structure (markdown headings, HTML sections, PDF layout, table rows, code blocks) as boundaries — essential for technical corpora where tables, code, and sections carry distinct semantics.
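The first family above can be sketched in a few lines. This is a minimal illustration of fixed-token chunking with overlap, not a production implementation: a real pipeline would count tokens with the embedder's own tokenizer, whereas here whitespace-split words stand in for tokens, and the function name and defaults are illustrative.

```python
def chunk_fixed(tokens, chunk_size=512, overlap=50):
    """Split a token sequence into fixed-size chunks with overlap.

    Each chunk shares its first `overlap` tokens with the tail of the
    previous chunk, so content near a cut point appears in two chunks.
    """
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + chunk_size])
        if start + chunk_size >= len(tokens):
            break  # last chunk reached the end of the document
    return chunks

# Usage: "tokens" here are whitespace words, a stand-in for real tokens.
words = open("doc.txt").read().split() if False else ["tok"] * 1200
chunks = chunk_fixed(words, chunk_size=512, overlap=50)
```

The overlap is what makes the strategy forgiving: a sentence severed at a boundary survives intact in the neighbouring chunk.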

For APAC mid-market teams, the practical default is **512-token fixed chunks with 50-token overlap as a starting baseline, then evaluate and tune per corpus**. For technical documentation, switch to structure-aware chunking that preserves table rows and code blocks as intact units. For legal or policy corpora, preserve clause boundaries. For multilingual or CJK corpora, token counts behave differently — 512 tokens is a shorter passage in Japanese or Chinese than in English, so raise the token budget proportionally. Evaluate chunking quality by measuring recall@k on a labelled query set, not by inspection.
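The "evaluate by recall@k, not inspection" advice reduces to a small metric function. A minimal sketch, assuming you already have a labelled query set (query → set of relevant chunk ids) and ranked retrieval results per query; the data shapes are assumptions, not a specific library's API.

```python
def recall_at_k(results, labels, k=5):
    """Mean recall@k over a labelled query set.

    results: dict mapping query -> ranked list of retrieved chunk ids
    labels:  dict mapping query -> set of relevant chunk ids (ground truth)
    Returns the average fraction of relevant chunks found in the top k.
    """
    scores = []
    for query, relevant in labels.items():
        top_k = set(results.get(query, [])[:k])
        scores.append(len(top_k & relevant) / len(relevant))
    return sum(scores) / len(scores)
```

Re-running this one number after each chunking change (size, overlap, strategy) is what "tune per corpus" means in practice.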

The non-obvious failure mode is **chunking across semantic boundaries**. A fixed-length splitter cuts through the middle of a table row and the retriever returns half the row with no header; it splits a code function mid-body; it separates a legal clause from its preamble. The symptom is retrieval that returns technically relevant chunks but semantically broken context, and the model then hallucinates around the gap or refuses. The fix is either structure-aware chunking upstream or post-retrieval expansion (when a chunk is retrieved, include its neighbouring chunks from the same document). The cost of the mitigation is trivial relative to the quality gain.
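The post-retrieval expansion mitigation can be sketched as follows, assuming each chunk id maps to a (document id, position) pair recorded at indexing time; the index structure and window size are assumptions for illustration.

```python
def expand_with_neighbours(hit_ids, chunk_index, window=1):
    """Expand retrieved chunk ids with adjacent chunks from the same document.

    hit_ids:     ranked list of retrieved chunk ids
    chunk_index: dict chunk_id -> (doc_id, position); assumes each
                 (doc_id, position) pair identifies exactly one chunk
    """
    by_position = {loc: cid for cid, loc in chunk_index.items()}
    expanded, seen = [], set()
    for cid in hit_ids:
        doc, pos = chunk_index[cid]
        # include the chunk itself plus `window` neighbours on each side
        for p in range(pos - window, pos + window + 1):
            nid = by_position.get((doc, p))
            if nid is not None and nid not in seen:
                seen.add(nid)
                expanded.append(nid)
    return expanded
```

A retrieved half-table-row then arrives alongside the chunk holding its header, at the cost of a slightly larger context window.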

Where AIMenta applies this

Service lines where this concept becomes a deliverable for clients.

Beyond this term

Where this concept ships in practice.

Encyclopedia entries name the moving parts. The links below show where AIMenta turns these concepts into engagements — across service pillars, industry verticals, and Asian markets.
