RAG Chunking Strategies — Why Your Chunk Size Is Killing Retrieval Quality
By Sanjeev Sharma (@webcoderspeed1)
Introduction
Chunk size and strategy directly determine RAG quality. Fixed-size chunks destroy context boundaries, semantic chunking misses relationships, and naive splitting creates orphaned fragments. This guide covers production chunking techniques that maximize recall while minimizing latency.
- Fixed-Size vs Semantic Chunking
- Recursive Text Splitter With Overlap
- Sentence-Based and Paragraph-Based Chunking
- Document-Specific Strategies
- Chunk Metadata for Filtering
- Parent-Child Chunking
- Late Chunking With Long-Context Embeddings
- Evaluation With Recall@K and MRR
- Checklist
- Conclusion
Fixed-Size vs Semantic Chunking
Semantic chunking preserves meaning boundaries while fixed-size chunking trades quality for simplicity.
```typescript
class ChunkingStrategy {
  // Fixed-size chunking: simple, but often breaks mid-sentence
  fixedSizeChunk(text: string, chunkSize: number = 512, overlap: number = 64): string[] {
    const chunks: string[] = [];
    // Guard against overlap >= chunkSize, which would loop forever
    const step = Math.max(1, chunkSize - overlap);
    for (let i = 0; i < text.length; i += step) {
      chunks.push(text.slice(i, i + chunkSize));
    }
    return chunks;
  }

  // Sentence-boundary packing: a cheap approximation of semantic chunking
  // (true semantic chunking compares embeddings of adjacent sentences)
  semanticChunk(text: string): string[] {
    const sentences = text.match(/[^.!?]+[.!?]+/g) || [];
    const chunks: string[] = [];
    let currentChunk = '';
    const targetSize = 512;
    for (const sentence of sentences) {
      if ((currentChunk + sentence).length > targetSize) {
        if (currentChunk) chunks.push(currentChunk);
        currentChunk = sentence;
      } else {
        currentChunk += sentence;
      }
    }
    if (currentChunk) chunks.push(currentChunk);
    return chunks;
  }

  // Evaluate chunking quality
  evaluateChunks(chunks: string[]): {
    avgSize: number;
    minSize: number;
    maxSize: number;
    incompleteCount: number;
  } {
    const sizes = chunks.map((c) => c.length);
    const avgSize = sizes.reduce((a, b) => a + b, 0) / sizes.length;
    // A chunk is "incomplete" if it does not end at a sentence boundary
    const incomplete = chunks.filter((c) => !/[.!?]$/.test(c.trim())).length;
    return {
      avgSize: Math.round(avgSize),
      minSize: Math.min(...sizes),
      maxSize: Math.max(...sizes),
      incompleteCount: incomplete,
    };
  }
}

const chunker = new ChunkingStrategy();
const text = 'First sentence. Second sentence. Third sentence. Fourth sentence.';
const fixedChunks = chunker.fixedSizeChunk(text, 30, 8);
const semanticChunks = chunker.semanticChunk(text);
console.log('Fixed chunks quality:', chunker.evaluateChunks(fixedChunks));
console.log('Semantic chunks quality:', chunker.evaluateChunks(semanticChunks));
```
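The `semanticChunk` above packs sentences by size; embedding-driven semantic chunking instead starts a new chunk wherever the similarity between adjacent sentence embeddings drops below a threshold. A minimal sketch, with the embedding call injected as a plain function (a real implementation would call an embeddings API asynchronously):

```typescript
// Embedding-driven semantic chunking: break where adjacent sentences diverge.
// `embed` is injected so the splitting logic stays testable; in production it
// would be an async call to an embeddings API.
type EmbedFn = (text: string) => number[];

function cosine(a: number[], b: number[]): number {
  const dot = a.reduce((s, x, i) => s + x * b[i], 0);
  const norm = (v: number[]) => Math.sqrt(v.reduce((s, x) => s + x * x, 0));
  return dot / (norm(a) * norm(b));
}

function semanticChunkByEmbedding(
  text: string,
  embed: EmbedFn,
  breakpointSimilarity = 0.5
): string[] {
  const sentences = text.match(/[^.!?]+[.!?]+/g) ?? [text];
  const vectors = sentences.map((s) => embed(s));
  const chunks: string[] = [];
  let current = sentences[0];
  for (let i = 1; i < sentences.length; i++) {
    // Low similarity between neighboring sentences marks a topic boundary
    if (cosine(vectors[i - 1], vectors[i]) < breakpointSimilarity) {
      chunks.push(current.trim());
      current = sentences[i];
    } else {
      current += sentences[i];
    }
  }
  chunks.push(current.trim());
  return chunks;
}
```

The breakpoint threshold is a tuning knob: lower values break less often and produce larger chunks.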
Recursive Text Splitter With Overlap
Recursively split by increasingly specific delimiters while maintaining context overlap.
```typescript
class RecursiveTextSplitter {
  private separators = ['\n\n', '\n', '. ', ' ', ''];
  private chunkSize = 1024;
  private overlapSize = 128;

  split(text: string): string[] {
    return this.recursiveSplit(text, this.separators);
  }

  private recursiveSplit(text: string, separators: string[]): string[] {
    // Pick the coarsest separator that actually splits this text
    let sepIndex = separators.length - 1;
    for (let i = 0; i < separators.length; i++) {
      if (separators[i] === '') break;
      if (text.split(separators[i]).length > 1) {
        sepIndex = i;
        break;
      }
    }
    const separator = separators[sepIndex];
    const splits = text.split(separator);
    // Recurse with finer separators into any piece still over the size limit
    const pieces = splits.flatMap((p) =>
      p.length > this.chunkSize && sepIndex + 1 < separators.length
        ? this.recursiveSplit(p, separators.slice(sepIndex + 1))
        : [p]
    );
    return this.mergeSplits(pieces, separator);
  }

  private mergeSplits(splits: string[], separator: string): string[] {
    const chunks: string[] = [];
    let currentChunk = '';
    for (const split of splits) {
      const combined = currentChunk ? currentChunk + separator + split : split;
      if (combined.length < this.chunkSize) {
        currentChunk = combined.trim();
      } else {
        if (currentChunk) chunks.push(currentChunk);
        currentChunk = split;
      }
    }
    if (currentChunk) chunks.push(currentChunk);
    // Prepend the tail of the previous chunk as overlap for context continuity
    const chunksWithOverlap: string[] = [];
    for (let i = 0; i < chunks.length; i++) {
      let chunk = chunks[i];
      if (i > 0) {
        const overlap = chunks[i - 1].slice(-this.overlapSize);
        chunk = overlap + '\n' + chunk;
      }
      chunksWithOverlap.push(chunk);
    }
    return chunksWithOverlap;
  }
}

const splitter = new RecursiveTextSplitter();
const document = 'Long document content...\n\nWith multiple paragraphs. And sentences.';
const chunks = splitter.split(document);
```
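The splitters above measure size in characters, but embedding models enforce limits in tokens. A rough heuristic (about 4 characters per token for English prose; this ratio is an assumption that varies by tokenizer and language) can flag chunks that risk overflowing a model's window — use the model's real tokenizer when precision matters:

```typescript
// Rough character-to-token estimate (~4 chars/token for English text; an
// approximation, not a spec — use the model's actual tokenizer for exact counts)
function approxTokenCount(text: string): number {
  return Math.ceil(text.length / 4);
}

// Check every chunk against a token budget before sending to an embedder
function fitsTokenBudget(chunks: string[], maxTokens: number): boolean {
  return chunks.every((c) => approxTokenCount(c) <= maxTokens);
}
```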
Sentence-Based and Paragraph-Based Chunking
Align chunks to natural sentence and paragraph boundaries.
```typescript
class SentenceChunker {
  // Split by sentences while maintaining size limits. Note: this regex drops a
  // trailing sentence that lacks terminal punctuation.
  chunkBySentences(text: string, maxSize: number = 1024): string[] {
    const sentences = text.match(/[^.!?]+[.!?]+/g) || [];
    return this.packSentences(sentences, maxSize);
  }

  // Split by paragraphs, falling back to sentences when a paragraph is too large
  chunkByParagraphs(text: string, maxSize: number = 1024): string[] {
    const paragraphs = text.split(/\n\n+/);
    const chunks: string[] = [];
    for (const para of paragraphs) {
      if (para.length <= maxSize) {
        chunks.push(para);
      } else {
        const sentences = para.match(/[^.!?]+[.!?]+/g) || [];
        chunks.push(...this.packSentences(sentences, maxSize));
      }
    }
    return chunks;
  }

  // Greedily pack sentences into chunks of at most maxSize characters
  private packSentences(sentences: string[], maxSize: number): string[] {
    const chunks: string[] = [];
    let currentChunk = '';
    for (const sentence of sentences) {
      if ((currentChunk + ' ' + sentence).length <= maxSize) {
        currentChunk += (currentChunk ? ' ' : '') + sentence;
      } else {
        if (currentChunk) chunks.push(currentChunk);
        currentChunk = sentence;
      }
    }
    if (currentChunk) chunks.push(currentChunk);
    return chunks;
  }
}

const sentenceChunker = new SentenceChunker();
const multiParagraphText = 'Paragraph 1. Sentence 1. Sentence 2.\n\nParagraph 2. Sentence 3.';
const chunks = sentenceChunker.chunkByParagraphs(multiParagraphText);
```
Document-Specific Strategies
Different document types benefit from different chunking approaches.
```typescript
abstract class DocumentChunker {
  abstract chunk(content: string): Array<{ text: string; metadata: Record<string, unknown> }>;
}

class CodeChunker extends DocumentChunker {
  chunk(content: string): Array<{ text: string; metadata: Record<string, unknown> }> {
    const chunks: Array<{ text: string; metadata: Record<string, unknown> }> = [];
    const lines = content.split('\n');
    let currentChunk = '';
    let currentFunction = '';
    for (const line of lines) {
      // Heuristic: a top-level function/class definition starts a new chunk
      if (line.match(/^(function|class|const.*=.*=>|\w+\s*\()/)) {
        if (currentChunk) {
          chunks.push({
            text: currentChunk.trim(),
            metadata: { type: 'code', function: currentFunction },
          });
        }
        // Prefer the identifier after the keyword; fall back to the first word
        currentFunction =
          line.match(/(?:function|class|const)\s+(\w+)/)?.[1] ??
          line.match(/\w+/)?.[0] ??
          'unknown';
        currentChunk = line;
      } else {
        currentChunk += '\n' + line;
        if (currentChunk.length > 1024) {
          chunks.push({
            text: currentChunk.trim(),
            metadata: { type: 'code', function: currentFunction },
          });
          currentChunk = '';
        }
      }
    }
    if (currentChunk.trim()) {
      chunks.push({
        text: currentChunk.trim(),
        metadata: { type: 'code', function: currentFunction },
      });
    }
    return chunks;
  }
}

class MarkdownChunker extends DocumentChunker {
  chunk(content: string): Array<{ text: string; metadata: Record<string, unknown> }> {
    const chunks: Array<{ text: string; metadata: Record<string, unknown> }> = [];
    const sections = content.split(/^##\s+/m);
    for (const section of sections) {
      if (!section.trim()) continue; // skip the empty pre-heading split
      const [title, ...body] = section.split('\n');
      chunks.push({
        text: body.join('\n').trim(),
        metadata: { type: 'markdown', section: title.trim() },
      });
    }
    return chunks;
  }
}

class PDFChunker extends DocumentChunker {
  chunk(content: string): Array<{ text: string; metadata: Record<string, unknown> }> {
    // Split on page markers inserted by the PDF extractor
    const pages = content.split(/\n\[PAGE \d+\]\n/);
    return pages.map((page, idx) => ({
      text: page.trim(),
      metadata: { type: 'pdf', page: idx + 1 },
    }));
  }
}

const codeChunker = new CodeChunker();
const markdownChunker = new MarkdownChunker();
const pdfChunker = new PDFChunker();
const codeChunks = codeChunker.chunk('function hello() { return "world"; }');
const mdChunks = markdownChunker.chunk('## Introduction\nContent here.');
const pdfChunks = pdfChunker.chunk('Cover page text\n[PAGE 2]\nSecond page text');
```
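A small dispatcher ties the strategies together by routing each file to a chunking function based on its extension. The mapping below is an illustrative sketch with inline stand-ins for the chunker classes above:

```typescript
type ChunkFn = (content: string) => string[];

// Extension-to-strategy table. The entries are simplified stand-ins for the
// MarkdownChunker / PDFChunker classes in this section; extend as needed.
const strategies: Record<string, ChunkFn> = {
  md: (c) => c.split(/^##\s+/m).filter((s) => s.trim().length > 0),
  pdf: (c) => c.split(/\n\[PAGE \d+\]\n/).filter((s) => s.trim().length > 0),
};

function chunkDocument(filename: string, content: string): string[] {
  const ext = filename.split('.').pop()?.toLowerCase() ?? '';
  // Fall back to paragraph splitting for plain text and unknown types
  const fn = strategies[ext] ?? ((c: string) => c.split(/\n\n+/));
  return fn(content);
}
```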
Chunk Metadata for Filtering
Add rich metadata to enable pre-filtering before vector search.
```typescript
interface ChunkMetadata {
  source: string;
  section?: string;
  timestamp?: Date;
  author?: string;
  confidenceLevel?: 'high' | 'medium' | 'low';
  tags?: string[];
}

class ChunkWithMetadata {
  constructor(
    public id: string,
    public text: string,
    public metadata: ChunkMetadata,
    public embedding?: number[]
  ) {}

  static create(text: string, metadata: ChunkMetadata): ChunkWithMetadata {
    return new ChunkWithMetadata(`chunk_${Date.now()}_${Math.random()}`, text, metadata);
  }
}

class MetadataIndexer {
  private chunks: Map<string, ChunkWithMetadata> = new Map();

  addChunk(chunk: ChunkWithMetadata): void {
    this.chunks.set(chunk.id, chunk);
  }

  filterByMetadata(filters: Partial<ChunkMetadata>): ChunkWithMetadata[] {
    const results: ChunkWithMetadata[] = [];
    for (const chunk of this.chunks.values()) {
      let matches = true;
      if (filters.source && chunk.metadata.source !== filters.source) matches = false;
      if (filters.section && chunk.metadata.section !== filters.section) matches = false;
      if (filters.confidenceLevel && chunk.metadata.confidenceLevel !== filters.confidenceLevel) {
        matches = false;
      }
      if (filters.tags && !filters.tags.every((tag) => chunk.metadata.tags?.includes(tag))) {
        matches = false;
      }
      if (matches) results.push(chunk);
    }
    return results;
  }

  getChunksBySource(source: string): ChunkWithMetadata[] {
    return Array.from(this.chunks.values()).filter((c) => c.metadata.source === source);
  }

  getChunksByConfidence(level: 'high' | 'medium' | 'low'): ChunkWithMetadata[] {
    return Array.from(this.chunks.values()).filter((c) => c.metadata.confidenceLevel === level);
  }
}

const indexer = new MetadataIndexer();
const chunk = ChunkWithMetadata.create('Important information', {
  source: 'annual_report.pdf',
  section: 'Financial',
  confidenceLevel: 'high',
  tags: ['finance', 'revenue'],
});
indexer.addChunk(chunk);
const filtered = indexer.filterByMetadata({ confidenceLevel: 'high', source: 'annual_report.pdf' });
```
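Pre-filtering pays off when combined with vector scoring: filter on metadata first, then rank only the survivors. A sketch with simplified types (stand-ins for `ChunkWithMetadata`) and chunks assumed to already carry embeddings:

```typescript
// Metadata pre-filter + vector ranking: the cheap filter shrinks the candidate
// set before any similarity math runs. Types are simplified stand-ins.
interface ScoredChunk { text: string; tags: string[]; embedding: number[] }

function cosine(a: number[], b: number[]): number {
  const dot = a.reduce((s, x, i) => s + x * b[i], 0);
  const norm = (v: number[]) => Math.sqrt(v.reduce((s, x) => s + x * x, 0));
  return dot / (norm(a) * norm(b));
}

function filteredSearch(
  chunks: ScoredChunk[],
  queryEmbedding: number[],
  requiredTag: string,
  k = 3
): ScoredChunk[] {
  return chunks
    .filter((c) => c.tags.includes(requiredTag)) // cheap metadata filter first
    .map((c) => ({ c, score: cosine(queryEmbedding, c.embedding) }))
    .sort((a, b) => b.score - a.score)
    .slice(0, k)
    .map((x) => x.c);
}
```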
Parent-Child Chunking
Create small chunks for retrieval but maintain parent context for generation.
```typescript
class ParentChildChunker {
  chunkWithParent(
    text: string,
    smallChunkSize: number = 256,
    parentChunkSize: number = 1024
  ): Array<{ id: string; text: string; parentId: string; parentText: string }> {
    const parentChunks = this.createChunks(text, parentChunkSize);
    const results: Array<{ id: string; text: string; parentId: string; parentText: string }> = [];
    for (const parentChunk of parentChunks) {
      const parentId = `parent_${Date.now()}_${Math.random()}`;
      // Each parent is re-split into small children used for retrieval
      const children = this.createChunks(parentChunk, smallChunkSize);
      for (const child of children) {
        results.push({
          id: `child_${Date.now()}_${Math.random()}`,
          text: child,
          parentId,
          parentText: parentChunk,
        });
      }
    }
    return results;
  }

  private createChunks(text: string, size: number): string[] {
    const chunks: string[] = [];
    let current = '';
    const sentences = text.match(/[^.!?]+[.!?]+/g) || [];
    for (const sentence of sentences) {
      if ((current + sentence).length > size) {
        if (current) chunks.push(current);
        current = sentence;
      } else {
        current += sentence;
      }
    }
    if (current) chunks.push(current);
    return chunks;
  }
}

const parentChildChunker = new ParentChildChunker();
const chunks = parentChildChunker.chunkWithParent(
  'Document with multiple sentences. Each sentence is important. They form paragraphs together.',
  128,
  512
);
chunks.forEach((chunk) => {
  console.log(`Child: ${chunk.text.slice(0, 50)}...`);
  console.log(`Parent: ${chunk.parentText.slice(0, 100)}...`);
});
```
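The query-time half of the pattern: rank the small child chunks, then hand the model their deduplicated parents, since several children can share one parent. `scoreFn` here is a stand-in for a real vector-similarity scorer:

```typescript
// Query-time side of parent-child chunking: score children, return their
// deduplicated parents for generation. `scoreFn` stands in for vector scoring.
interface ChildChunk { text: string; parentId: string; parentText: string }

function retrieveParents(
  children: ChildChunk[],
  scoreFn: (childText: string) => number,
  k = 2
): string[] {
  const ranked = children
    .map((c) => ({ c, s: scoreFn(c.text) }))
    .sort((a, b) => b.s - a.s)
    .map((x) => x.c);
  const parents: string[] = [];
  const seen = new Set<string>();
  for (const child of ranked) {
    if (seen.has(child.parentId)) continue; // several children share a parent
    seen.add(child.parentId);
    parents.push(child.parentText);
    if (parents.length >= k) break;
  }
  return parents;
}
```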
Late Chunking With Long-Context Embeddings
Embed the full document with a long-context model so every chunk's representation carries document-wide context, deferring chunk boundaries until after embedding.
```typescript
class LateChunker {
  async embedFullDocument(text: string): Promise<{ embedding: number[]; chunks: string[] }> {
    // Long-context models let us embed the whole document at once
    // (text-embedding-3-large accepts roughly 8k tokens; anything longer
    // still needs a coarse pre-split)
    const response = await fetch('https://api.openai.com/v1/embeddings', {
      method: 'POST',
      headers: {
        'Content-Type': 'application/json',
        Authorization: `Bearer ${process.env.OPENAI_API_KEY}`,
      },
      body: JSON.stringify({
        model: 'text-embedding-3-large',
        input: text,
      }),
    });
    const data = (await response.json()) as { data: Array<{ embedding: number[] }> };
    const embedding = data.data[0].embedding;
    // Chunk boundaries are decided after embedding, at query time
    const chunks = this.chunkAtQueryTime(text, 256);
    return { embedding, chunks };
  }

  private chunkAtQueryTime(text: string, chunkSize: number): string[] {
    const sentences = text.match(/[^.!?]+[.!?]+/g) || [];
    const chunks: string[] = [];
    let current = '';
    for (const sentence of sentences) {
      if ((current + sentence).length > chunkSize) {
        if (current) chunks.push(current);
        current = sentence;
      } else {
        current += sentence;
      }
    }
    if (current) chunks.push(current);
    return chunks;
  }

  async retrieveWithLateChunking(
    query: string,
    documentEmbedding: number[],
    chunks: string[]
  ): Promise<string[]> {
    const queryEmbedding = await this.getQueryEmbedding(query);
    // The query-to-document similarity is identical for every chunk, so on its
    // own it cannot rank them. True late chunking derives per-chunk vectors
    // from the document's token embeddings; since this API returns only one
    // pooled vector, we weight the document score by each chunk's lexical
    // overlap with the query as a stand-in.
    const docScore = this.cosineSimilarity(queryEmbedding, documentEmbedding);
    const queryTerms = query.toLowerCase().split(/\s+/).filter(Boolean);
    const scored = chunks.map((chunk) => {
      const lower = chunk.toLowerCase();
      const overlap = queryTerms.filter((t) => lower.includes(t)).length / queryTerms.length;
      return { chunk, score: docScore * overlap };
    });
    return scored.sort((a, b) => b.score - a.score).slice(0, 5).map((item) => item.chunk);
  }

  private async getQueryEmbedding(query: string): Promise<number[]> {
    const response = await fetch('https://api.openai.com/v1/embeddings', {
      method: 'POST',
      headers: {
        'Content-Type': 'application/json',
        Authorization: `Bearer ${process.env.OPENAI_API_KEY}`,
      },
      body: JSON.stringify({
        // Must match the document model: mixing text-embedding-3-small and
        // -large produces vectors of different dimensionality
        model: 'text-embedding-3-large',
        input: query,
      }),
    });
    const data = (await response.json()) as { data: Array<{ embedding: number[] }> };
    return data.data[0].embedding;
  }

  private cosineSimilarity(vecA: number[], vecB: number[]): number {
    const dotProduct = vecA.reduce((sum, a, i) => sum + a * vecB[i], 0);
    const magnitudeA = Math.sqrt(vecA.reduce((sum, a) => sum + a * a, 0));
    const magnitudeB = Math.sqrt(vecB.reduce((sum, b) => sum + b * b, 0));
    return dotProduct / (magnitudeA * magnitudeB);
  }
}

const lateChunker = new LateChunker();
const fullDoc = 'Long document content that we want to embed in full context...';
const { embedding, chunks } = await lateChunker.embedFullDocument(fullDoc);
```
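What makes late chunking "late" is the pooling step: given per-token embeddings from a long-context model (a capability the OpenAI embeddings endpoint does not expose, though some models, such as Jina's late-chunking-oriented embeddings, do), each chunk's vector is the mean of its token vectors, computed with full-document context already baked in:

```typescript
// Mean-pool a span of per-token embeddings into a single chunk vector.
// `tokenEmbeddings` is assumed to come from a model that exposes token-level
// outputs; spans are [start, end) token indices, one per chunk.
function meanPool(tokenEmbeddings: number[][], start: number, end: number): number[] {
  const dim = tokenEmbeddings[0].length;
  const pooled = new Array<number>(dim).fill(0);
  for (let t = start; t < end; t++) {
    for (let d = 0; d < dim; d++) pooled[d] += tokenEmbeddings[t][d];
  }
  const n = end - start;
  return pooled.map((v) => v / n);
}

// Turn one token span per chunk into one context-aware vector per chunk
function lateChunkVectors(
  tokenEmbeddings: number[][],
  spans: Array<[number, number]>
): number[][] {
  return spans.map(([start, end]) => meanPool(tokenEmbeddings, start, end));
}
```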
Evaluation With Recall@K and MRR
Measure chunking effectiveness with retrieval metrics.
```typescript
class ChunkingEvaluator {
  // Recall@k, capped so that k < |relevant| scores the best achievable recall
  recallAtK(retrieved: string[], relevant: string[], k: number): number {
    const topK = retrieved.slice(0, k);
    const matches = topK.filter((item) => relevant.includes(item)).length;
    return matches / Math.min(k, relevant.length);
  }

  // Reciprocal rank of the first relevant hit for one query; averaging across
  // queries (done below) gives MRR
  meanReciprocalRank(retrieved: string[], relevant: string[]): number {
    for (let i = 0; i < retrieved.length; i++) {
      if (relevant.includes(retrieved[i])) {
        return 1 / (i + 1);
      }
    }
    return 0;
  }

  ndcg(retrieved: string[], relevant: string[], k: number): number {
    const topK = retrieved.slice(0, k);
    let dcg = 0;
    for (let i = 0; i < topK.length; i++) {
      const relevance = relevant.includes(topK[i]) ? 1 : 0;
      dcg += relevance / Math.log2(i + 2);
    }
    let idcg = 0;
    for (let i = 0; i < Math.min(relevant.length, k); i++) {
      idcg += 1 / Math.log2(i + 2);
    }
    return idcg === 0 ? 0 : dcg / idcg;
  }

  // Chunk each test document and score the resulting chunks against the
  // labeled relevant ones. In production the ranked list would come from
  // running retrieval over the chunks rather than from chunk order.
  evaluateChunkingStrategy(
    testCases: Array<{ document: string; relevantChunks: string[] }>,
    chunkingFn: (text: string) => string[]
  ): { avgRecall5: number; avgMRR: number; avgNDCG: number } {
    let totalRecall5 = 0;
    let totalMRR = 0;
    let totalNDCG = 0;
    for (const testCase of testCases) {
      const chunks = chunkingFn(testCase.document);
      totalRecall5 += this.recallAtK(chunks, testCase.relevantChunks, 5);
      totalMRR += this.meanReciprocalRank(chunks, testCase.relevantChunks);
      totalNDCG += this.ndcg(chunks, testCase.relevantChunks, 5);
    }
    const count = testCases.length;
    return {
      avgRecall5: totalRecall5 / count,
      avgMRR: totalMRR / count,
      avgNDCG: totalNDCG / count,
    };
  }
}

const evaluator = new ChunkingEvaluator();
const metrics = evaluator.evaluateChunkingStrategy(
  [
    { document: 'chunk1 chunk2 chunk4', relevantChunks: ['chunk1', 'chunk2'] },
    { document: 'chunk3 chunk5', relevantChunks: ['chunk3'] },
  ],
  (text) => text.split(' ')
);
console.log('Chunking metrics:', metrics);
```
Checklist
- Profile your document types and choose strategies per type
- Use sentence-boundary alignment for natural language documents
- Use function/class boundaries for code documents
- Implement parent-child chunking for better context
- Add rich metadata for pre-filtering and post-ranking
- Measure retrieval quality with recall@k and MRR
- Test semantic chunking vs fixed-size for your domain
- Use long-context embeddings to embed full documents when possible
- Validate chunking on golden datasets quarterly
- Monitor average chunk size distribution for consistency
Conclusion
Chunk strategy directly determines RAG quality. Start with semantic chunking based on natural boundaries (sentences, paragraphs, functions). Add parent-child structure for context, rich metadata for filtering, and measure quality with recall@k metrics. As you scale, experiment with late chunking using long-context embeddings for maximum performance.