Semantic Chunking
Splitting documents by meaning rather than character count — dividing the Fellowship by role, not by height.
Semantic chunking is the practice of dividing documents into retrievable units based on their meaning and structure rather than arbitrary size boundaries. The most naive approach splits a document every N characters or tokens, regardless of whether the split falls in the middle of a sentence, concept, or argument. Chunks produced this way frequently begin or end mid-thought, which hurts in two ways. Retrieval systems match them to queries less accurately, because a chunk that starts partway through an explanation produces an embedding that only partly reflects the topic being explained. LLMs also use them less reliably, because a passage stripped of its surrounding explanation is less informative. Semantic chunking instead identifies natural breakpoints (paragraph boundaries, section headings, transition sentences, or topic shifts detected by an embedding model) and splits at those points, preserving the integrity of each unit of meaning.
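The contrast between mechanical and boundary-aware splitting can be sketched in a few lines. This is a minimal illustration, not a production splitter: the function names `fixed_size_chunks` and `paragraph_chunks`, the sample text, and the assumption that paragraphs are separated by blank lines are all chosen here for the example.

```python
import re


def fixed_size_chunks(text: str, size: int) -> list[str]:
    """Naive baseline: cut every `size` characters, ignoring meaning."""
    return [text[i:i + size] for i in range(0, len(text), size)]


def paragraph_chunks(text: str) -> list[str]:
    """Split on blank lines so each chunk is one whole paragraph.

    Assumes paragraphs are separated by at least one blank line.
    """
    return [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]


# Hypothetical two-paragraph document used only for illustration.
doc = (
    "Refunds are issued within 14 days. Contact support to start one.\n\n"
    "Shipping takes 3 to 5 business days. Tracking is sent by email."
)

naive = fixed_size_chunks(doc, 40)      # boundaries fall wherever the count lands
by_paragraph = paragraph_chunks(doc)    # boundaries follow the document's structure
```

With this sample, the first fixed-size chunk stops partway through a word, while each paragraph chunk covers one complete topic (refunds, then shipping).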
Several semantic chunking strategies exist, each with different tradeoffs. Paragraph-based chunking splits at paragraph boundaries: it is simple, fast, and effective for well-structured documents, but it produces highly variable chunk sizes. Sentence-based chunking with grouping combines adjacent sentences until a target token count is reached, then splits at a sentence boundary, which gives more consistent sizes while keeping every sentence intact. Embedding-based semantic chunking measures cosine similarity between adjacent sentences in embedding space and splits where similarity drops significantly, so topic transitions are detected from semantic distance rather than formatting markers. Hierarchical chunking maintains document structure explicitly, embedding both individual chunks and their parent sections so that retrieval can match at the granularity that suits the query.
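Two of the strategies above, sentence grouping and embedding-based splitting, can be sketched as follows. Several things here are assumptions made for illustration: the regex sentence splitter, whitespace-separated words as a stand-in for tokens, the fixed similarity threshold of 0.1, and above all `toy_embed`, a bag-of-words counter that stands in for a real embedding model so the example runs without dependencies.

```python
import math
import re
from collections import Counter


def split_sentences(text: str) -> list[str]:
    """Rough sentence splitter; a real pipeline would likely use an NLP library."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def group_sentences(sentences: list[str], max_tokens: int) -> list[str]:
    """Pack adjacent sentences into chunks of at most `max_tokens` words.

    A single sentence longer than `max_tokens` still becomes its own chunk.
    """
    chunks, current, count = [], [], 0
    for s in sentences:
        n = len(s.split())  # whitespace words as a stand-in for tokens
        if current and count + n > max_tokens:
            chunks.append(" ".join(current))
            current, count = [], 0
        current.append(s)
        count += n
    if current:
        chunks.append(" ".join(current))
    return chunks


def toy_embed(sentence: str) -> Counter:
    """Stand-in embedding: bag-of-words counts (an assumption, not a real model)."""
    return Counter(re.findall(r"[a-z0-9]+", sentence.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def embedding_chunks(sentences, embed=toy_embed, threshold=0.1):
    """Start a new chunk wherever adjacent-sentence similarity falls below `threshold`."""
    if not sentences:
        return []
    vectors = [embed(s) for s in sentences]
    chunks, current = [], [sentences[0]]
    for prev, cur, s in zip(vectors, vectors[1:], sentences[1:]):
        if cosine(prev, cur) < threshold:
            chunks.append(" ".join(current))
            current = []
        current.append(s)
    chunks.append(" ".join(current))
    return chunks


# Hypothetical text with one topic shift (refunds, then shipping).
text = (
    "Refunds are issued within 14 days. Refunds are sent to the original card. "
    "Shipping takes five business days. Shipping costs depend on weight."
)
sentences = split_sentences(text)
grouped = group_sentences(sentences, max_tokens=12)
by_topic = embedding_chunks(sentences)
```

On this sample, size-based grouping places a refund sentence and a shipping sentence in the same chunk, while the similarity-based split breaks between the two topics. In practice, a real embedding model would be passed in place of `toy_embed`, and the threshold is often derived from the distribution of similarities in the document rather than fixed in advance.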
For B2B teams building RAG applications over corporate documentation, technical content, and knowledge bases, semantic chunking is typically a meaningful quality improvement over naive character-count splitting. The improvement shows up in retrieval recall, the share of relevant information that is successfully retrieved for a given query, because chunks that preserve complete concepts embed more accurately and match queries more reliably than chunks that cut thoughts in half. The practical guidance: start with paragraph-based chunking as the baseline (it is simple to implement and usually a clear step up from fixed-character splitting), measure retrieval quality on representative test queries, and invest in more sophisticated semantic chunking only if retrieval quality remains the bottleneck in RAG pipeline performance.
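Measuring retrieval quality on test queries can be as simple as computing recall at k for each query and averaging. The sketch below assumes each test query has a hand-labeled set of relevant chunk ids; the function names `recall_at_k` and `mean_recall_at_k` and the chunk ids are hypothetical.

```python
def recall_at_k(retrieved: list[str], relevant: set[str], k: int) -> float:
    """Fraction of the relevant chunk ids that appear in the top-k retrieved ids."""
    if not relevant:
        return 0.0
    hits = len(set(retrieved[:k]) & relevant)
    return hits / len(relevant)


def mean_recall_at_k(runs: list[tuple[list[str], set[str]]], k: int) -> float:
    """Average recall@k over (retrieved ids, relevant ids) pairs, one per test query."""
    return sum(recall_at_k(retrieved, relevant, k) for retrieved, relevant in runs) / len(runs)


# Hypothetical results for two test queries: ranked retrieved ids and labeled relevant ids.
runs = [
    (["c1", "c7", "c3"], {"c1", "c3"}),  # both relevant chunks retrieved
    (["c9", "c2", "c4"], {"c2", "c5"}),  # one of two relevant chunks retrieved
]

score = mean_recall_at_k(runs, k=3)
```

Running the same test queries against each chunking strategy and comparing the resulting scores gives a direct, if coarse, way to tell whether a more sophisticated splitter is paying for itself.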
Related terms
- Retrieval-Augmented Generation (RAG): Giving your LLM access to the Restricted Section, so it answers from real knowledge instead of confident hallucination.
- Embedding: Numeric representation of meaning; Elvish rune encoding, but as floating-point vectors optimized for semantic search.
- Vector Database: The database that stores meaning as numbers; the Sorting Hat's filing system, indexed for semantic retrieval.
- Semantic Search: Finding results by meaning rather than keyword match; 'Accio relevant context' without knowing the exact incantation.
- Context Window: How much the AI holds in working memory; the Pensieve has infinite capacity, while LLMs are still catching up.