🤖 AI/ML Intermediate ⏱️ 14 min

Text Chunking Strategies for RAG

Implement effective text chunking strategies for RAG pipelines with semantic boundaries, overlap, and metadata preservation.

By Victor Robin

Introduction

When building a Retrieval-Augmented Generation (RAG) pipeline, one of the most impactful decisions you’ll make is how to split your documents into chunks. Poor chunking leads to poor retrieval, which leads to poor answers—regardless of how powerful your LLM is.

The core problem: Embedding models have limited context windows (typically 512 tokens), and semantic similarity works best on focused, coherent text segments. If you chunk at arbitrary boundaries, you’ll break sentences mid-thought, separate related concepts, and lose the context needed for accurate retrieval.

What You’ll Learn

In this guide, we’ll implement five different chunking strategies in C#:

  1. Fixed-Size Chunking — Simple but naive approach
  2. Sentence-Based Chunking — Respects sentence boundaries
  3. Paragraph-Based Chunking — Uses natural document structure
  4. Semantic Chunking — Preserves markdown headings and document hierarchy
  5. Sliding Window Chunking — Overlapping chunks for better context preservation

By the end, you’ll have a complete ChunkingService that can intelligently split documents based on their structure and your retrieval requirements.

Architecture Overview

Before diving into strategy details, here’s how chunking fits into the broader RAG pipeline:

flowchart TB
    subgraph Ingestion["📥 Document Ingestion"]
        Doc["📄 Raw Document\n(PDF, Markdown, Text)"]
        OCR["🔍 OCR/Parser"]
        Text["📝 Extracted Text"]
    end

    subgraph Chunking["✂️ Chunking Service"]
        Strategy{"🎯 Strategy\nSelector"}
        Fixed["📏 Fixed-Size\n512 chars"]
        Sentence["💬 Sentence\nNatural breaks"]
        Semantic["🧠 Semantic\nBy headings"]
        Sliding["🔄 Sliding\nWith overlap"]
    end

    subgraph Output["📤 Chunk Output"]
        Chunks["📦 TextChunk[]\nwith metadata"]
        Embed["🧬 Embedding\nService"]
        Store[("🔮 Qdrant")]
    end

    Doc --> OCR --> Text
    Text --> Strategy
    Strategy --> Fixed
    Strategy --> Sentence
    Strategy --> Semantic
    Strategy --> Sliding
    
    Fixed --> Chunks
    Sentence --> Chunks
    Semantic --> Chunks
    Sliding --> Chunks
    
    Chunks --> Embed --> Store

    classDef primary fill:#7c3aed,color:#fff
    classDef secondary fill:#06b6d4,color:#fff
    classDef db fill:#f43f5e,color:#fff
    classDef warning fill:#fbbf24,color:#000

    class Chunking primary
    class Output secondary
    class Ingestion warning

Pipeline Flow:

  1. Ingestion: Documents are parsed/OCR’d into plain text
  2. Strategy Selection: Choose chunking approach based on document type
  3. Chunking: Split text into semantic units with metadata
  4. Embedding: Each chunk is converted to a vector
  5. Storage: Vectors stored in Qdrant for retrieval

Chunking Strategies Overview

Before diving into code, let’s visualize the fundamental differences between chunking approaches. Each strategy makes different tradeoffs between simplicity, consistency, and semantic coherence:

flowchart LR
    Doc["📄 Raw Document"] --> Strategy{"✂️ Strategy?"}
    
    Strategy -->|"Fixed"| Fixed["📏 Fixed-Size\n512 chars each"]
    Strategy -->|"Sentence"| Sentence["💬 Sentence\nNatural breaks"]
    Strategy -->|"Semantic"| Semantic["🧠 Semantic\nBy headings"]
    
    Fixed --> Overlap{"🔄 Overlap?"}
    Sentence --> Overlap
    Semantic --> Overlap
    
    Overlap -->|"Yes (50 chars)"| Chunks1["Chunk 1<br/>(Overlap)<br/>Chunk 2"]
    Overlap -->|"No"| Chunks2["Chunk 1<br/>Chunk 2"]
    
    Chunks1 --> Embed["🧠 Embed\n(Ollama)"]
    Chunks2 --> Embed
    Embed --> Store[("🔮 Qdrant")]

    classDef primary fill:#7c3aed,color:#fff
    classDef secondary fill:#06b6d4,color:#fff
    classDef db fill:#f43f5e,color:#fff
    classDef warning fill:#fbbf24,color:#000

    class Strategy,Embed primary
    class Store db

Fixed-size chunks are predictable but can cut thoughts in half. Sentence-based chunking respects linguistic boundaries but produces variable sizes. Semantic chunking is the gold standard for structured documents—it keeps related content together by following the document’s inherent organization.
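To see the difference concretely, here is a tiny standalone sketch (the sample text and the 40-character limit are invented for illustration):

    using System;
    
    var text = "RAG pipelines retrieve context. Chunking controls what gets retrieved. Bad chunks mean bad answers.";
    
    // Fixed-size at 40 characters cuts the second sentence short:
    Console.WriteLine(text.Substring(0, 40));
    
    // Sentence-based chunking keeps each thought intact:
    foreach (var sentence in text.Split(". "))
        Console.WriteLine(sentence);

The fixed-size chunk ends mid-sentence, so its embedding mixes two topics; each sentence-based chunk embeds a single complete thought.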

The Chunking Service Architecture

Our ChunkingService implements the strategy pattern, allowing callers to choose their chunking approach at runtime. This is useful when different document types benefit from different strategies—for example, structured markdown docs work best with semantic chunking, while plain text might need sentence-based chunking.

// Application/Services/ChunkingService.cs
using System.Text;
using System.Text.RegularExpressions;
using Microsoft.Extensions.Logging;
using Microsoft.Extensions.Options;

public sealed class ChunkingService : IChunkingService
{
    private readonly ChunkingOptions _options;
    private readonly ILogger<ChunkingService> _logger;
    
    public ChunkingService(
        IOptions<ChunkingOptions> options,
        ILogger<ChunkingService> logger)
    {
        _options = options.Value;
        _logger = logger;
    }
    
    public IReadOnlyList<TextChunk> ChunkDocument(
        string content,
        ChunkingStrategy strategy = ChunkingStrategy.Semantic)
    {
        return strategy switch
        {
            ChunkingStrategy.FixedSize => ChunkFixedSize(content),
            ChunkingStrategy.Sentence => ChunkBySentence(content),
            ChunkingStrategy.Paragraph => ChunkByParagraph(content),
            ChunkingStrategy.Semantic => ChunkSemantic(content),
            ChunkingStrategy.Sliding => ChunkSliding(content),
            _ => throw new ArgumentOutOfRangeException(nameof(strategy))
        };
    }

The service defaults to Semantic chunking since most of our documents are markdown with headers. The ChunkingOptions (defined later) controls chunk size limits and overlap amounts.
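As a usage sketch (assuming a standard ASP.NET Core host and an IChunkingService interface matching the class above; the "Chunking" configuration section name is an assumption, not something defined in this article):

    // Program.cs — hypothetical wiring
    builder.Services.Configure<ChunkingOptions>(
        builder.Configuration.GetSection("Chunking"));
    builder.Services.AddSingleton<IChunkingService, ChunkingService>();
    
    // In an ingestion handler, after the document has been parsed to text:
    IReadOnlyList<TextChunk> chunks =
        chunkingService.ChunkDocument(markdownText, ChunkingStrategy.Semantic);
    
    foreach (var chunk in chunks)
        Console.WriteLine($"[{chunk.Index}] ~{chunk.TokenEstimate} tokens ({chunk.Metadata?.Heading})");

Registering the service as a singleton is safe because it holds no per-request state—only the immutable options and a logger.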

Fixed-Size Chunking

The simplest approach: split text into equal-sized pieces. While naive, this works surprisingly well for homogeneous content where you need predictable chunk sizes.

When to use: Consistent-density text, when chunk size predictability matters, or as a baseline.

Tradeoffs: Will definitely break sentences and paragraphs mid-thought.

    private IReadOnlyList<TextChunk> ChunkFixedSize(string content)
    {
        var chunks = new List<TextChunk>();
        var charIndex = 0;
        var chunkIndex = 0;
        
        while (charIndex < content.Length)
        {
            var chunkLength = Math.Min(_options.MaxChunkSize, content.Length - charIndex);
            var chunkText = content.Substring(charIndex, chunkLength);
            
            chunks.Add(new TextChunk
            {
                Index = chunkIndex++,
                Content = chunkText.Trim(),
                StartPosition = charIndex,
                EndPosition = charIndex + chunkLength,
                TokenEstimate = EstimateTokens(chunkText)
            });
            
            // Stop once the tail has been consumed; otherwise subtracting
            // the overlap could move charIndex backwards and loop forever
            if (charIndex + chunkLength >= content.Length) break;
            
            // Move forward, but overlap with previous chunk
            charIndex += chunkLength - _options.Overlap;
        }
        
        return chunks;
    }

Notice we apply overlap even in fixed-size chunking. This creates redundancy but ensures that concepts split at chunk boundaries still appear completely in at least one chunk.
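The advance arithmetic is easy to sanity-check in isolation. With the defaults (MaxChunkSize = 512, Overlap = 50), each chunk starts 462 characters after the previous one; the document length below is made up for the demo:

    using System;
    
    const int maxChunkSize = 512;
    const int overlap = 50;
    const int documentLength = 1200;
    
    for (var start = 0; start < documentLength; start += maxChunkSize - overlap)
    {
        var length = Math.Min(maxChunkSize, documentLength - start);
        Console.WriteLine($"chunk: [{start}, {start + length})");
    }
    // chunk: [0, 512)
    // chunk: [462, 974)
    // chunk: [924, 1200)

Characters 462–511 appear in both the first and second chunks, so a concept straddling that boundary survives intact in at least one of them.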

Sentence-Based Chunking

A significant improvement: respect sentence boundaries. This ensures each chunk contains complete thoughts, making embeddings more semantically meaningful.

When to use: Narrative text, articles, or any content where sentence integrity matters.

Tradeoffs: Variable chunk sizes—some may be much smaller or larger than target.

    private IReadOnlyList<TextChunk> ChunkBySentence(string content)
    {
        var sentences = SplitIntoSentences(content);
        var chunks = new List<TextChunk>();
        var currentChunk = new StringBuilder();
        var chunkIndex = 0;
        var startPos = 0;
        var currentPos = 0;
        
        foreach (var sentence in sentences)
        {
            // Check if adding this sentence would exceed our limit
            if (currentChunk.Length + sentence.Length > _options.MaxChunkSize && 
                currentChunk.Length > 0)
            {
                // Finalize current chunk and start a new one
                chunks.Add(new TextChunk
                {
                    Index = chunkIndex++,
                    Content = currentChunk.ToString().Trim(),
                    StartPosition = startPos,
                    EndPosition = currentPos,
                    TokenEstimate = EstimateTokens(currentChunk.ToString())
                });
                
                currentChunk.Clear();
                startPos = currentPos;
            }
            
            currentChunk.Append(sentence).Append(' ');
            currentPos += sentence.Length + 1;
        }
        
        // Don't forget the last chunk!
        if (currentChunk.Length > 0)
        {
            chunks.Add(new TextChunk
            {
                Index = chunkIndex,
                Content = currentChunk.ToString().Trim(),
                StartPosition = startPos,
                EndPosition = currentPos,
                TokenEstimate = EstimateTokens(currentChunk.ToString())
            });
        }
        
        return chunks;
    }

The algorithm greedily accumulates sentences until adding another would exceed the max size, then starts a new chunk. Chunks stay within MaxChunkSize except in one edge case: a single sentence longer than the limit still becomes its own (oversized) chunk, since splitting mid-sentence would defeat the strategy's purpose.
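A quick trace of the greedy rule with made-up sentence lengths makes the behavior clear:

    using System;
    
    // MaxChunkSize = 512; four sentences of 200 characters each
    var sentenceLengths = new[] { 200, 200, 200, 200 };
    var current = 0;
    var chunk = 1;
    
    foreach (var length in sentenceLengths)
    {
        if (current + length > 512 && current > 0)
        {
            chunk++;          // finalize current chunk, start a new one
            current = 0;
        }
        current += length;
        Console.WriteLine($"{length}-char sentence -> chunk {chunk}");
    }
    // 200-char sentence -> chunk 1
    // 200-char sentence -> chunk 1
    // 200-char sentence -> chunk 2  (400 + 200 would exceed 512)
    // 200-char sentence -> chunk 2

The first chunk ends at 400 characters rather than padding to 512—sentence integrity wins over size uniformity.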

Paragraph-Based Chunking

Documents with clear paragraph structure (double newlines) can leverage those natural boundaries. This often aligns with topic shifts better than sentence-level chunking.

When to use: Well-structured documents with clear paragraph breaks.

Tradeoffs: Paragraphs vary widely in size—some may need further splitting.

    private IReadOnlyList<TextChunk> ChunkByParagraph(string content)
    {
        var paragraphs = content.Split(
            new[] { "\n\n", "\r\n\r\n" }, 
            StringSplitOptions.RemoveEmptyEntries);
        
        var chunks = new List<TextChunk>();
        var currentChunk = new StringBuilder();
        var chunkIndex = 0;
        var position = 0;
        var startPos = 0;
        
        foreach (var paragraph in paragraphs)
        {
            var trimmed = paragraph.Trim();
            if (string.IsNullOrEmpty(trimmed)) continue;
            
            // Handle oversized paragraphs by falling back to sentence chunking
            if (trimmed.Length > _options.MaxChunkSize)
            {
                // Flush current buffer first
                if (currentChunk.Length > 0)
                {
                    chunks.Add(CreateChunk(currentChunk.ToString(), chunkIndex++, startPos, position));
                    currentChunk.Clear();
                    startPos = position;
                }
                
                // Split the large paragraph using sentence chunking
                var subChunks = ChunkBySentence(trimmed);
                foreach (var sub in subChunks)
                {
                    sub.Index = chunkIndex++;
                    sub.StartPosition += position;
                    sub.EndPosition += position;
                    chunks.Add(sub);
                }
                
                position += trimmed.Length + 2;
                startPos = position;
                continue;
            }
            
            // Normal case: accumulate paragraphs
            if (currentChunk.Length + trimmed.Length + 2 > _options.MaxChunkSize && 
                currentChunk.Length > 0)
            {
                chunks.Add(CreateChunk(currentChunk.ToString(), chunkIndex++, startPos, position));
                currentChunk.Clear();
                startPos = position;
            }
            
            currentChunk.AppendLine(trimmed);
            currentChunk.AppendLine();
            position += trimmed.Length + 2;
        }
        
        if (currentChunk.Length > 0)
        {
            chunks.Add(CreateChunk(currentChunk.ToString(), chunkIndex, startPos, position));
        }
        
        return chunks;
    }

Notice the fallback to sentence chunking for oversized paragraphs. This hierarchical approach ensures we never produce chunks larger than the configured maximum.

Semantic Chunking

This is our most sophisticated strategy. It parses markdown structure and uses headers to create semantically coherent chunks. Each chunk retains its heading context, dramatically improving retrieval relevance.

When to use: Markdown documents, technical documentation, any content with headers.

Tradeoffs: More complex, assumes markdown structure.

    private IReadOnlyList<TextChunk> ChunkSemantic(string content)
    {
        // Parse the document's markdown structure
        var sections = ParseMarkdownSections(content);
        
        if (sections.Count == 0)
        {
            // No headers found - fall back to paragraph chunking
            return ChunkByParagraph(content);
        }
        
        var chunks = new List<TextChunk>();
        var chunkIndex = 0;
        
        foreach (var section in sections)
        {
            var sectionContent = section.Content;
            
            // Small sections become single chunks
            if (sectionContent.Length <= _options.MaxChunkSize)
            {
                chunks.Add(new TextChunk
                {
                    Index = chunkIndex++,
                    Content = sectionContent.Trim(),
                    StartPosition = section.StartPosition,
                    EndPosition = section.EndPosition,
                    TokenEstimate = EstimateTokens(sectionContent),
                    Metadata = new ChunkMetadata
                    {
                        Heading = section.Heading,
                        HeadingLevel = section.Level
                    }
                });
            }
            else
            {
                // Large sections: split by paragraph but preserve heading context
                var subChunks = ChunkByParagraph(sectionContent);
                foreach (var sub in subChunks)
                {
                    // Prepend the section heading to each sub-chunk
                    // This ensures retrieval knows the context
                    var contextualContent = section.Heading != null
                        ? $"## {section.Heading}\n\n{sub.Content}"
                        : sub.Content;
                    
                    chunks.Add(new TextChunk
                    {
                        Index = chunkIndex++,
                        Content = contextualContent,
                        StartPosition = section.StartPosition + sub.StartPosition,
                        EndPosition = section.StartPosition + sub.EndPosition,
                        TokenEstimate = EstimateTokens(contextualContent),
                        Metadata = new ChunkMetadata
                        {
                            Heading = section.Heading,
                            HeadingLevel = section.Level
                        }
                    });
                }
            }
        }
        
        return chunks;
    }

The key insight: when a section must be split, we prepend the heading to each sub-chunk. This way, when a user asks “How do I configure authentication?”, chunks from the “Authentication” section will have higher similarity even if the word “authentication” doesn’t appear in that specific paragraph.
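To illustrate the injection itself (the heading and paragraph below are invented):

    using System;
    
    var heading = "Authentication";
    var paragraph = "Set ClientId and ClientSecret in the app settings, " +
                    "then register the token handler in the request pipeline.";
    
    // The same contextualization the service applies to each sub-chunk:
    var contextualContent = $"## {heading}\n\n{paragraph}";
    Console.WriteLine(contextualContent);
    
    // The embedded text now contains "Authentication" even though the
    // paragraph itself never mentions the word.

The cost is a few dozen duplicated characters per sub-chunk; the benefit is that the embedding of every fragment carries its section's topic.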

Sliding Window Chunking

For dense technical content where context spans multiple paragraphs, overlapping chunks ensure no information falls through the cracks.

When to use: Technical documentation, code explanations, any content where adjacent sections are highly related.

Tradeoffs: Produces more chunks (more storage), but better recall.

    private IReadOnlyList<TextChunk> ChunkSliding(string content)
    {
        var chunks = new List<TextChunk>();
        var sentences = SplitIntoSentences(content);
        var windowStart = 0;
        var chunkIndex = 0;
        var sentencePositions = CalculateSentencePositions(sentences);
        
        while (windowStart < sentences.Count)
        {
            var currentChunk = new StringBuilder();
            var windowEnd = windowStart;
            
            // Accumulate sentences until max size
            while (windowEnd < sentences.Count && 
                   currentChunk.Length + sentences[windowEnd].Length < _options.MaxChunkSize)
            {
                currentChunk.Append(sentences[windowEnd]).Append(' ');
                windowEnd++;
            }
            
            if (currentChunk.Length > 0)
            {
                chunks.Add(new TextChunk
                {
                    Index = chunkIndex++,
                    Content = currentChunk.ToString().Trim(),
                    StartPosition = sentencePositions[windowStart],
                    EndPosition = sentencePositions[Math.Min(windowEnd, sentences.Count - 1)],
                    TokenEstimate = EstimateTokens(currentChunk.ToString())
                });
            }
            
            // Slide window - keep some overlap.
            // Cast to double: with integer division, Overlap / MaxChunkSize
            // would be 0 and the overlap would silently disappear.
            var overlapFraction = (double)_options.Overlap / _options.MaxChunkSize;
            var sentencesToSkip = Math.Max(1,
                (int)((windowEnd - windowStart) * (1 - overlapFraction)));
            windowStart += sentencesToSkip;
        }
        
        return chunks;
    }

The overlap fraction is _options.Overlap divided by _options.MaxChunkSize—roughly 10% with the defaults (50 / 512). Each chunk then shares about that fraction of its sentences with the previous chunk, creating redundancy that improves recall at the cost of storage.

Helper Methods

These utilities support the chunking strategies:

    private static IReadOnlyList<string> SplitIntoSentences(string text)
    {
        // Pattern: split after sentence-ending punctuation followed by whitespace and capital letter
        var pattern = @"(?<=[.!?])\s+(?=[A-Z])";
        return Regex.Split(text, pattern)
            .Where(s => !string.IsNullOrWhiteSpace(s))
            .ToList();
    }
    
    private static IReadOnlyList<MarkdownSection> ParseMarkdownSections(string content)
    {
        var sections = new List<MarkdownSection>();
        var headingPattern = @"^(#{1,6})\s+(.+)$";
        var lines = content.Split('\n');
        
        MarkdownSection? current = null;
        var contentBuilder = new StringBuilder();
        var position = 0;
        var startPosition = 0;
        
        foreach (var line in lines)
        {
            var match = Regex.Match(line, headingPattern);
            
            if (match.Success)
            {
                // Save previous section
                if (current != null)
                {
                    current.Content = contentBuilder.ToString();
                    current.EndPosition = position;
                    sections.Add(current);
                }
                else if (contentBuilder.Length > 0)
                {
                    // Preserve any preamble text before the first heading
                    sections.Add(new MarkdownSection
                    {
                        Content = contentBuilder.ToString(),
                        StartPosition = 0,
                        EndPosition = position
                    });
                }
                
                // Start new section
                contentBuilder.Clear();
                startPosition = position;
                current = new MarkdownSection
                {
                    Heading = match.Groups[2].Value,
                    Level = match.Groups[1].Value.Length,
                    StartPosition = startPosition
                };
            }
            
            contentBuilder.AppendLine(line);
            position += line.Length + 1;
        }
        
        // Don't forget the final section
        if (current != null)
        {
            current.Content = contentBuilder.ToString();
            current.EndPosition = position;
            sections.Add(current);
        }
        else if (contentBuilder.Length > 0)
        {
            // No headings found - treat entire content as one section
            sections.Add(new MarkdownSection
            {
                Content = contentBuilder.ToString(),
                StartPosition = 0,
                EndPosition = position
            });
        }
        
        return sections;
    }
    
    // Quick token estimation: ~4 characters per token on average
    private static int EstimateTokens(string text) => text.Length / 4;
    
    private static TextChunk CreateChunk(string content, int index, int start, int end) =>
        new()
        {
            Index = index,
            Content = content.Trim(),
            StartPosition = start,
            EndPosition = end,
            TokenEstimate = EstimateTokens(content)
        };
    
    private static int[] CalculateSentencePositions(IReadOnlyList<string> sentences)
    {
        var positions = new int[sentences.Count];
        var pos = 0;
        for (int i = 0; i < sentences.Count; i++)
        {
            positions[i] = pos;
            pos += sentences[i].Length + 1;
        }
        return positions;
    }
}

Data Types

Finally, here are the supporting types used throughout the service:

public sealed record TextChunk
{
    public int Index { get; set; }
    public required string Content { get; init; }
    public int StartPosition { get; set; }
    public int EndPosition { get; set; }
    public int TokenEstimate { get; init; }
    public ChunkMetadata? Metadata { get; init; }
}

public sealed record ChunkMetadata
{
    public string? Heading { get; init; }
    public int HeadingLevel { get; init; }
}
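
The MarkdownSection helper referenced by ParseMarkdownSections isn't shown above; here is a minimal definition consistent with its usage (mutable, since the parser fills in Content and EndPosition incrementally):

public sealed class MarkdownSection
{
    public string? Heading { get; set; }
    public int Level { get; set; }
    public string Content { get; set; } = string.Empty;
    public int StartPosition { get; set; }
    public int EndPosition { get; set; }
}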

public sealed record ChunkingOptions
{
    public int MaxChunkSize { get; init; } = 512;  // Characters, not tokens
    public int Overlap { get; init; } = 50;        // Overlap in characters
    public int MinChunkSize { get; init; } = 100;  // Avoid tiny chunks
}

public enum ChunkingStrategy
{
    FixedSize,
    Sentence,
    Paragraph,
    Semantic,
    Sliding
}

Choosing the Right Strategy

| Strategy   | Best For                        | Chunk Size    | Considerations                      |
|------------|---------------------------------|---------------|-------------------------------------|
| Fixed-Size | Simple text, consistent density | Predictable   | May break mid-thought               |
| Sentence   | Narrative text, articles        | Variable      | Respects linguistic units           |
| Paragraph  | Well-structured documents       | Very variable | Needs good paragraph structure      |
| Semantic   | Markdown/technical docs         | Variable      | Best quality for structured content |
| Sliding    | Dense technical content         | Overlapping   | More storage, better recall         |

Conclusion

Effective chunking is the foundation of a performant RAG system. The key principles:

  • Semantic coherence: Keep related content together using document structure
  • Size consistency: Fit within embedding model context windows (typically 512 tokens)
  • Context preservation: Include headers and use overlap to prevent information loss
  • Retrieval quality: Smaller, focused chunks produce more precise similarity matches

Start with semantic chunking for structured documents, then tune based on your retrieval metrics. Monitor which chunks get retrieved for your test queries—if they’re missing critical context, increase overlap or adjust your chunking strategy.

The ChunkingService we built supports all five strategies through a simple interface, making it easy to experiment and find what works best for your document corpus.
