When Perplexity or ChatGPT read your site, they don't read the whole page from start to finish: they automatically crop out a block of a few hundred words. If the most important information about you is spread across three different sections of the same page, no single block contains it all — and the AI moves on to a source that's easier to use. You're losing citations not because of what you wrote, but because of how it's organized. Restructuring your sections so that each one works on its own is the solution.
Perplexity cites your site. ChatGPT pulls up one of your articles. It seems to be working.
But what did the AI system actually read? Not your page. Not even a full section. It read a fragment — a block of text of 200-500 tokens, cropped algorithmically, compared against thousands of other fragments, and selected because at that moment it was the most relevant for that specific query.
The rest of your page, as far as the model is concerned, doesn’t exist.
I touched on this when introducing RAG and hybrid search. In this article I’ll explain in plain terms how chunking works, why it’s a physical constraint and not a choice, and what it means for anyone who wants to get found in AI answers.
First things first: it’s mechanics, not opinion
Chunking isn’t a stylistic choice by the teams that build RAG systems. It’s a physical constraint of the way language models work.
Models have a limited context window. You can’t feed in 50,000 tokens of web content every time a user asks a question: it would be slow, expensive, and often pointless. The solution is retrieval: pulling out only the relevant parts, keeping them small, and passing them to the model.
This is where the chunk comes from. Not as an abstract concept, but as a concrete operational unit that goes into the prompt.
Your content never ends up in front of the model in its entirety. It ends up inside the prompt as expanded context, one piece at a time — and only the pieces the system judged relevant to that query.
It follows that the quality of your content, as measured by an AI model, is not the average quality of the entire page. It’s the quality of the strongest chunk that page can produce on a specific question.
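To make the "strongest chunk" idea concrete, here is a minimal sketch. It assumes a toy relevance score (the share of query words that appear in a chunk), which is only a stand-in for the vector and BM25 scoring that real systems use. The function names `relevance`, `page_score`, and `top_k` are hypothetical.

```python
def relevance(query, chunk):
    """Toy score: fraction of distinct query words found in the chunk."""
    query_words = set(query.lower().split())
    chunk_words = set(chunk.lower().split())
    if not query_words:
        return 0.0
    return len(query_words & chunk_words) / len(query_words)


def page_score(query, chunks):
    """A page counts only as much as its best chunk for this query."""
    return max((relevance(query, c) for c in chunks), default=0.0)


def top_k(query, chunks, k=3):
    """Return the k highest-scoring chunks, i.e. what would reach the prompt."""
    return sorted(chunks, key=lambda c: relevance(query, c), reverse=True)[:k]
```

Note that `page_score` takes a maximum, not an average: three weak sections do not add up to one strong one.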
How chunks are cut
The concrete question is: where does the cut happen?
Minaee et al. (2025), in a study on retrieval architecture, describe the basic structure like this:
“Left: simplified version where a sequence of length n = 12 is split into l = 3 chunks of size m = 4.”
The example uses small numbers for clarity. In reality, a sequence of 1,200 tokens might become 4 chunks of 300 tokens, or 6 chunks of 200. Add an overlap of 50 tokens to those 200-token chunks and each cut advances only 150 tokens, so the same sequence yields 8 chunks, the last one shorter. The point is that the cut is systematic and parameterized, not “intelligent” in the sense of understanding what you’re saying.
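The mechanics of a parameterized cut fit in a few lines. This is a minimal sketch of a sliding window, assuming the text has already been turned into a list of tokens; `fixed_size_chunks` is a hypothetical name, not any library's API.

```python
def fixed_size_chunks(tokens, size, overlap=0):
    """Cut a token list into windows of `size`, each sharing `overlap` tokens with the previous one."""
    if size <= 0 or not 0 <= overlap < size:
        raise ValueError("need size > 0 and 0 <= overlap < size")
    stride = size - overlap  # how far each new window advances
    chunks = []
    start = 0
    while start < len(tokens):
        chunks.append(tokens[start:start + size])
        if start + size >= len(tokens):
            break  # this window already reached the end of the sequence
        start += stride
    return chunks


# The quoted example: n = 12 tokens, chunks of size m = 4.
chunks = fixed_size_chunks(list(range(12)), size=4)
```

With `size=4` the 12 tokens fall into 3 chunks, as in the quoted caption. Setting `overlap=1` shortens the stride to 3, so the same 12 tokens produce 4 windows and the last one holds only 3 tokens.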
There are three main strategies that RAG systems use in production:
Fixed-size chunking. Every n tokens — typically 300-500 — a cut is made. With no regard for syntactic or semantic structure. Simple to implement, fast, but brutal: it can split a sentence in half.
Structural chunking. The system uses the document’s natural separation points — section headings, paragraph breaks, specific HTML tags — as cut points. Smarter, but it depends on how clean and consistent your markup is.
Hybrid chunking with overlap. Cuts are made at structural points, but a fixed number of overlap tokens is kept between adjacent chunks to preserve the context at the edges. This mitigates the problem of a cut breaking up a line of reasoning.
Most production systems use a hybrid variant: they cut at section headings, with a maximum token limit per chunk and a configurable overlap. Your headings aren’t just for human readability — they’re cut signals for retrieval engines.
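A hybrid cut of this kind can be sketched as follows. The assumptions are mine: headings are marked with a leading `## ` (as in Markdown), whitespace-separated words stand in for tokens, and `hybrid_chunks` is a hypothetical name. Real pipelines differ in all three respects.

```python
import re


def hybrid_chunks(text, max_tokens=300, overlap=50):
    """Cut at '## ' headings, then window any oversized section with overlap."""
    if max_tokens <= 0 or not 0 <= overlap < max_tokens:
        raise ValueError("need max_tokens > 0 and 0 <= overlap < max_tokens")
    stride = max_tokens - overlap
    chunks = []
    # Zero-width split: each piece starts at a line beginning with '## '.
    for section in re.split(r"^(?=## )", text, flags=re.MULTILINE):
        words = section.split()  # assumption: words stand in for tokens
        start = 0
        while start < len(words):
            chunks.append(" ".join(words[start:start + max_tokens]))
            if start + max_tokens >= len(words):
                break  # section fully covered
            start += stride
    return chunks
```

A section that fits under `max_tokens` comes out as a single chunk with its heading attached. A longer one is windowed, and only its first window carries the heading, which is one more reason to name the subject in the body text as well.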
The metric you didn’t know about: chunk size as a quality signal
There’s an aspect of chunking that’s rarely discussed outside academic literature, and that has direct implications for anyone who wants to get found thanks to their content.
Hu et al. (2024) show it clearly in a study on evaluating retrieval-augmented generation:
“In an extreme case where the generated texts exactly match the reference texts, the number of chunks is only 1.”
What does this mean? When a text generated by a model matches the reference text perfectly, the system doesn’t need to assemble fragments from different sources: it finds everything in a single chunk.
From this follows something interesting: well-structured content — content that answers a specific question completely and self-sufficiently within a single section — tends to produce “stronger” chunks in the retrieval sense of the term. The system doesn’t have to look elsewhere because it already has everything right there.
The number of chunks needed to answer a query is, indirectly, a proxy for the quality of your content for that query. The fewer fragments needed, the more self-sufficient your chunk is.
The structural mistake almost everyone makes
I’ve analyzed the structure of dozens of pages of professional content — guides, pillar articles, landing pages with informative sections. The recurring problem isn’t the quality of the writing. It’s the dependency between sections.
Here’s how it works. You write a complete guide on a complex topic. Section 3 introduces a concept. Section 5 goes deeper. Section 7 applies it with examples. Every section, in isolation, is incomplete — because the author assumed the reader had already read the previous sections.
For a human reader scrolling the page from the top, it works. For a retrieval system that extracts section 5 without reading section 3, it’s a dead end.
Concrete examples of constructions that kill a chunk:
- “As we saw in the previous section…”
- “Picking up the concept introduced above…”
- “For those who already read the guide at the start…”
- “This ties into what was said about [term not redefined]”
Every time you use a construction like this, you’re producing a chunk that needs external context to make sense. The retrieval system doesn’t have that context. Your chunk gets discarded or, worse, feeds a partial, inaccurate answer that isn’t attributed to you.
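Constructions like these are easy to hunt for mechanically. Below is a rough sketch of a checker. The pattern list is my own illustrative, non-exhaustive guess at common back-references, and `find_back_references` is a hypothetical name.

```python
import re

# Hypothetical, non-exhaustive patterns for phrases that point outside the section.
BACK_REFERENCES = [
    r"\bas (we|I) (saw|said|discussed|mentioned)\b",
    r"\b(previous|earlier|preceding) (section|chapter|paragraph)\b",
    r"\b(introduced|described|mentioned|explained) (above|earlier|previously)\b",
    r"\bat the start\b",
]


def find_back_references(section_text):
    """Return every phrase in the section that leans on text outside it."""
    hits = []
    for pattern in BACK_REFERENCES:
        for match in re.finditer(pattern, section_text, flags=re.IGNORECASE):
            hits.append(match.group(0))
    return hits
```

An empty result doesn't prove the section is self-contained. It only means none of the listed phrases showed up, so treat a hit as a prompt to rewrite, not a miss as a pass.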
A structure that works for retrieval
The operational rule is simple to state, less simple to apply: every section must work as a self-contained mini-article.
This doesn’t mean repeating everything from the start in every section. It means that every section heading poses an implicit question, and that section answers it completely — without deferring to something else, without assuming prior reading, with the key context included.
A section that works as a chunk includes:
- The explicit subject: not “this mechanism”, but “chunking in RAG”. Not “the technique described above”, but the full name of the concept.
- The answer in the first sentence or first paragraph: retrieval pipelines tend to give more weight to the first sentences of a chunk. Don’t build up to the answer — open with the answer.
- Enough context to be understood without the page title: if your chunk is extracted and read on its own, does the reader understand what it’s about? Include the page’s context in the section, not just in the header.
- Your brand or your byline, if relevant: if the AI system extracts only this section, your name or your site must appear there, not just in the footer or the title.
The target size is 200-500 tokens per section. Below 200 tokens, the chunk often doesn’t have enough context to be relevant. Above 500, you risk the system arbitrarily cutting your answer in half.
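A quick way to see where a section falls relative to that band is sketched below. It assumes whitespace-separated words as a stand-in for tokens, which usually undercounts for English because subword tokenizers tend to produce more tokens than words. `size_verdict` is a hypothetical helper.

```python
def size_verdict(section_text, low=200, high=500):
    """Classify a section against the 200-500 band, counting words as a token proxy."""
    # Assumption: one whitespace-separated word ~ one token (a rough approximation).
    count = len(section_text.split())
    if count < low:
        return count, "too short"
    if count > high:
        return count, "too long"
    return count, "ok"
```

For a closer estimate, swap the word count for the tokenizer of the model you care about. The thresholds stay the same.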
How to get a sense of it (first step)
You don’t need professional tools to do an initial check. An isolated reading test is enough.
Take an important page on your site. Pick a section in the middle — not the first, not the last. Read it without looking at the page title and without reading the previous sections. Then answer these questions:
- What is this section about? (without looking at the context)
- Does it answer a specific and complete question?
- Does your brand or the name of your site appear anywhere?
- Does it contain at least one sentence that could be quoted verbatim as an answer to a query?
If any of the answers is no, you’ve found a weak chunk. Rewrite it as if it were the first and only paragraph a reader — or an AI system — will ever see.
It’s worth clarifying: this manual test gives you a sense of direction, not a precise measurement. Real retrieval systems use vector scoring, BM25, reranking — variables you can’t replicate by looking at a page in your browser. For a serious analysis of your chunk retrieval profile, you need professional tools and testing across multiple AI engines.
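Part of the isolated reading test can be roughed out in code before you do the real reading. This sketch covers only the mechanical questions (is the subject named, is the brand present, does the section open with a vague pointer). The names `VAGUE_OPENERS` and `isolated_reading_check` are hypothetical, and judging whether the section answers a complete question still takes a human.

```python
# Words that, as the very first word, usually point at something outside the section.
VAGUE_OPENERS = {"this", "that", "these", "those", "it", "they"}


def isolated_reading_check(section_text, brand, subject_terms):
    """Run three crude self-containment checks on one section."""
    text = section_text.lower()
    words = text.split()
    first_word = words[0].strip(".,:;!?\"'") if words else ""
    return {
        # Does the section name its subject explicitly at least once?
        "names_subject": any(term.lower() in text for term in subject_terms),
        # Does the brand or site name appear inside the section itself?
        "mentions_brand": brand.lower() in text,
        # Does it avoid opening with a pronoun that needs prior context?
        "opens_explicitly": bool(words) and first_word not in VAGUE_OPENERS,
    }
```

Any `False` in the result marks a section worth rereading by hand, in line with the sense-of-direction spirit of the manual test.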
The next step in the pipeline
Chunking is the preparation phase. But after the chunks are retrieved, the pipeline isn’t done. In the next article I’ll talk about reranking — the step where the system decides which chunks really deserve to make it into the prompt. And then in the piece on grounding and citation I’ll close the loop: how the model decides what to cite and who to attribute it to.
Structuring your content for chunk retrieval is work you do once, yet it keeps shifting the odds in your favor. Not for every query: model stochasticity is real and no one can guarantee a specific appearance. But those who produce self-contained, well-structured chunks see the odds grow across all relevant queries, not just some.