Why Chunking Can Make or Break Your RAG System – Practical Strategies & Code
This article explains how proper document chunking—choosing the right chunk size, overlap, and structure‑aware boundaries—directly impacts the relevance, factuality, and efficiency of Retrieval‑Augmented Generation pipelines, and provides multiple Python implementations ranging from simple fixed‑length splits to semantic and hybrid approaches.
Background
In Retrieval‑Augmented Generation (RAG) systems, even with powerful LLMs and well‑crafted prompts, missing context, factual errors, or incoherent stitching can occur. The real bottleneck often lies before data is stored: how the documents are chunked. Poor chunking breaks semantic boundaries, mixes noise, and presents fragmented pieces to the model, limiting performance.
What is Chunking?
Chunking is the process of breaking a large text into smaller, manageable segments, which makes embedding and retrieval more efficient and improves relevance.
Why Chunk Content?
Model context window limits: LLMs cannot process arbitrarily long inputs. Chunking that respects natural boundaries (titles, paragraphs, code blocks) keeps segments within the window without cutting important information mid-thought.
Signal‑to‑noise ratio: Chunks that are too large dilute the relevant passage with noise; chunks that are too small lack sufficient context. A well-chosen chunk size balances recall and precision.
Semantic continuity: Overlap windows preserve cues that span chunk boundaries, preventing the loss of definitions or conditions stated just before a split.
Chunking Strategies
Basic Chunking
Fixed‑length character splitting (e.g., 600 characters with 15% overlap) is simple and fast but often ignores document structure.
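For readers without LangChain, fixed-length splitting is a few lines of plain Python (a dependency-free sketch; the defaults mirror the 600-character / 90-character-overlap figures above, and `fixed_chunks` is a name of my own):

```python
def fixed_chunks(text: str, chunk_size: int = 600, overlap: int = 90) -> list[str]:
    """Fixed-length character splitting; overlap = 90 is 15% of 600."""
    step = chunk_size - overlap  # how far the window advances each chunk
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]
```

This makes the trade-off explicit: a larger overlap improves cross-chunk continuity but inflates the index and duplicates retrieved text.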
from langchain_text_splitters import CharacterTextSplitter

splitter = CharacterTextSplitter(separator="", chunk_size=600, chunk_overlap=90)
chunks = splitter.split_text(text)

Structure‑Aware Chunking
Uses headings, lists, code blocks, and tables as natural boundaries, then applies a small overlap between adjacent chunks.
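A minimal sketch of this idea for Markdown input (illustrative only; `structure_chunks` is my own name, and production code would also need to protect fenced code blocks and tables from being treated as boundaries):

```python
import re

def structure_chunks(md_text: str, overlap: int = 90) -> list[str]:
    """Split Markdown at heading boundaries, then prepend a small
    character overlap taken from the end of the previous section."""
    # Zero-width split before every heading line (# through ######).
    parts = re.split(r'(?m)^(?=#{1,6}\s)', md_text)
    sections = [p.strip() for p in parts if p.strip()]
    chunks = []
    for i, sec in enumerate(sections):
        prefix = sections[i - 1][-overlap:] if i > 0 else ""
        chunks.append((prefix + "\n" + sec) if prefix else sec)
    return chunks
```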
Sentence‑Level Chunking
First split by Chinese punctuation, then group sentences until a target chunk size is reached.
import re

def split_sentences_zh(text: str):
    # One sentence per match: text up to and including Chinese end
    # punctuation, plus any unterminated trailing fragment.
    pattern = re.compile(r'([^。!?;]*[。!?;]+|[^。!?;]+$)')
    return [m.group(0).strip() for m in pattern.finditer(text) if m.group(0).strip()]

def sentence_chunk(text, chunk_size=600, overlap=90):
    sents = split_sentences_zh(text)
    chunks, buf = [], ""
    for s in sents:
        if len(buf) + len(s) <= chunk_size:
            buf += s
        else:
            chunks.append(buf)
            # seed the next chunk with the tail of the previous one
            buf = (buf[-overlap:] if overlap > 0 and len(buf) > overlap else "") + s
    if buf:
        chunks.append(buf)
    return chunks

Recursive Character Chunking
Recursively splits by a hierarchy of separators (titles → paragraphs → lines → spaces → characters) while respecting a maximum chunk size.
from langchain_text_splitters import RecursiveCharacterTextSplitter

# Separator preference: Markdown headings → numbered headings →
# blank lines → newlines → spaces → individual characters.
separators = [r"\n#{1,6}\s", r"\n\d+(?:\.\d+)*\s", "\n\n", "\n", " ", ""]
splitter = RecursiveCharacterTextSplitter(separators=separators,
                                          chunk_size=700,
                                          chunk_overlap=100,
                                          is_separator_regex=True)
chunks = splitter.split_text(text)

Semantic Chunking
Embeds each sentence, computes novelty scores, and cuts when semantic similarity drops below a dynamic threshold.
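The cut rule can be demonstrated in isolation. The helper below is a deliberate simplification and not the article's implementation: novelty is reduced to the cosine similarity between adjacent sentences (ignoring window_size), the dynamic threshold is assumed to take the form mean − λ·std of those similarities, and `novelty_cuts` is a name of my own:

```python
import numpy as np

def novelty_cuts(emb: np.ndarray, lambda_std: float = 0.8) -> list[int]:
    """Indices where a new chunk should start, given L2-normalized
    sentence embeddings of shape (n_sentences, dim)."""
    sims = np.sum(emb[:-1] * emb[1:], axis=1)          # adjacent cosine sims
    threshold = sims.mean() - lambda_std * sims.std()  # dynamic threshold
    return [i + 1 for i, s in enumerate(sims) if s < threshold]
```

With embeddings that shift topic halfway through, the single low adjacent similarity falls below the threshold and yields one cut at the topic boundary; with a uniform document, no similarity falls below it and no cut is made.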
from sentence_transformers import SentenceTransformer

def semantic_chunk(text, model_name="BAAI/bge-m3", window_size=2,
                   min_chars=350, max_chars=1100, lambda_std=0.8,
                   overlap_chars=80):
    sents = split_sentences_zh(text)   # sentence splitter defined above
    model = SentenceTransformer(model_name)
    emb = model.encode(sents, normalize_embeddings=True)
    # compute novelty and split...
    return chunks

Hybrid Chunking
Combines coarse structure‑aware splitting with finer‑grained strategies (sentence, semantic, or recursive) based on chunk length and content type, adding optional small overlaps between adjacent chunks.
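One concrete, deliberately simplified reading of the hybrid strategy (the helper names are hypothetical and the coarse pass assumes Markdown headings; real systems would dispatch on content type as well as length):

```python
import re

def split_sections(md: str) -> list[str]:
    """Coarse pass: split at Markdown heading boundaries."""
    parts = re.split(r'(?m)^(?=#{1,6}\s)', md)
    return [p.strip() for p in parts if p.strip()]

def split_sentences(text: str) -> list[str]:
    """Fine pass: split on Chinese/Western sentence-ending punctuation."""
    pat = re.compile(r'[^。！？!?.;；]*[。！？!?.;；]+|[^。！？!?.;；]+$')
    return [m.group(0).strip() for m in pat.finditer(text) if m.group(0).strip()]

def hybrid_chunk(md: str, max_len: int = 600, overlap: int = 80) -> list[str]:
    """Sections that fit become one chunk; oversized sections fall back
    to sentence grouping with a small character overlap."""
    chunks = []
    for sec in split_sections(md):
        if len(sec) <= max_len:
            chunks.append(sec)
            continue
        buf = ""
        for s in split_sentences(sec):
            if len(buf) + len(s) <= max_len:
                buf += s
            else:
                if buf:
                    chunks.append(buf)
                buf = (buf[-overlap:] if len(buf) > overlap else "") + s
        if buf:
            chunks.append(buf)
    return chunks
```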
Conclusion
Effective chunking balances context completeness and information density. Proper chunk size and overlap, aligned with natural document boundaries, significantly improve retrieval relevance and answer factuality in RAG pipelines.
DeWu Technology
