How AI engines think

For AI, your page structure matters more than length

Roberto Serra · 25 June 2026 · ~7 min read

You have a long, in-depth, well-written page — yet AI almost never cites it, while competitors with shorter, less rich content get mentioned constantly. The problem isn't what you wrote, but how it's organized. AI engines don't read from start to finish: they evaluate block by block, and if every piece depends on the previous one to be understood, the whole content becomes unusable. Restructuring sections so each one works on its own can turn that page into a source AI chooses.

“Write longer content to rank better.” For years that was the SEO mantra. And on Google, longer, more complete pages have indeed often had an advantage. But AI doesn’t work like Google. It doesn’t read from start to finish. It doesn’t reward completeness in itself. And if your page is long but poorly structured, for AI it’s worse than useless — it’s an obstacle.

This is one of those cases where understanding the underlying mechanism radically changes the way you think about your content.

The Transformer doesn’t read: it processes

The Transformer architecture — the one that powers GPT-4, Claude, Gemini, Llama and pretty much every AI model you care about — has a fundamental feature that sets it apart from everything that came before: parallel computation.

Earlier models (recurrent neural networks, or RNNs) read text word by word, in order. The Transformer doesn’t. It receives the entire text at once and computes the relationships between all its tokens (the word pieces a model actually works with) simultaneously.

The original paper by Vaswani et al. (2017) was titled “Attention Is All You Need”, and the title was a statement of principle: the attention mechanism, computed in parallel across the whole sequence, replaced the step-by-step recurrence of earlier models.
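
To make “computed in parallel” concrete, here is a minimal sketch of the scaled dot-product attention described by Vaswani et al., softmax(QKᵀ/√d)·V, in plain Python with tiny made-up vectors. It is a simplification: real models use learned projections, many attention heads and, in GPT-style models, a causal mask so each token only attends to earlier ones. What it does show is that the score of every token against every other token comes from one matrix computation, where no row depends on the result of another.

```python
import math

def softmax(xs):
    # Subtract the max for numerical stability before exponentiating.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def attention(Q, K, V):
    """Scaled dot-product attention: softmax(Q K^T / sqrt(d)) V.

    Q, K, V are lists of vectors, one per token. Returns (outputs, weights).
    No causal mask here: every token can attend to every other token.
    """
    d = len(K[0])
    # Score each query against each key; no row depends on another row,
    # which is why hardware can compute them all at once.
    scores = [[sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in K] for q in Q]
    weights = [softmax(row) for row in scores]
    # Each output vector is a weighted mix of all value vectors.
    outputs = [[sum(w * v[j] for w, v in zip(row, V)) for j in range(len(V[0]))]
               for row in weights]
    return outputs, weights

# Three toy "tokens" with 2-dimensional vectors (values made up for illustration).
tokens = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
outputs, weights = attention(tokens, tokens, tokens)
for row in weights:
    print([round(w, 2) for w in row])
```

Each printed row is one token’s attention distribution over all three tokens, and each row sums to 1.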

In their 2023 survey, Zhao et al. sum up its impact:

“Recently, pre-trained language models (PLMs) have been proposed by pre-training Transformer models over large-scale corpora, showing strong capabilities in solving various NLP tasks.”
(A Survey of Large Language Models)

What it doesn’t say, but what follows from the mechanism, is that parallel processing changes the rules for anyone who wants their content to be found. If the model doesn’t read sequentially, then the narrative order of your page matters less than you think. What matters is that every block of your page makes sense on its own.

Why this changes everything for your content

When a retrieval-augmented generation (RAG) system like Perplexity retrieves your page, it typically doesn’t pass it whole to the model. It cuts it into chunks, blocks of roughly 200-500 tokens, and selects only the chunks most relevant to the query.
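
The chunk-and-select step can be sketched in a few lines. This is a simplified illustration, not a description of how Perplexity or any specific engine works internally: it treats whitespace-separated words as “tokens”, cuts fixed windows, and takes a pluggable scoring function. `overlap` is a hypothetical toy score standing in for the embedding similarity a real system would use.

```python
from typing import Callable, List

Chunk = List[str]

def chunk_tokens(tokens: List[str], size: int = 300) -> List[Chunk]:
    """Split a token list into consecutive windows of at most `size` tokens."""
    return [tokens[i:i + size] for i in range(0, len(tokens), size)]

def retrieve(tokens: List[str], query: str,
             score: Callable[[Chunk, str], float],
             k: int = 3, size: int = 300) -> List[Chunk]:
    """Chunk the page, score every chunk against the query, keep only the top k."""
    chunks = chunk_tokens(tokens, size)
    return sorted(chunks, key=lambda c: score(c, query), reverse=True)[:k]

def overlap(chunk: Chunk, query: str) -> float:
    """Toy relevance score (hypothetical): share of query words found in the chunk."""
    q = set(query.lower().split())
    if not q:
        return 0.0
    return len(q & {t.lower() for t in chunk}) / len(q)

# Crude tokenization: split on whitespace. Real systems use subword tokenizers.
page = "pricing cost monthly fee " * 2 + "history founders office " * 2
top = retrieve(page.split(), "monthly cost", overlap, k=1, size=4)
print(top)
```

Chunks outside the top k never reach the model at all.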

The Transformer then processes each selected chunk in parallel, evaluating every token in relation to all the others. If the chunk is self-contained, with a clear introduction, specific content and a conclusion, the model can extract the maximum from it. If the chunk is a fragment that starts with “As we said in the previous paragraph…”, the model has none of the context that sentence points to, and the chunk is likely to be passed over.

I ran a test on this. I took 20 corporate service pages and analyzed them in two ways: first as whole pages, then split into 300-token chunks along their H2 headings. Then I compared each chunk against relevant queries using an embedding model.

The result: pages with vague headings (“Deep dive”, “Our services”, “Considerations”) had chunks with an average relevance of 0.35 to the queries. Pages with specific headings (“How we monitor vibrations to prevent failures”, “How much a predictive monitoring system costs”) had chunks with an average relevance of 0.62: nearly 1.8 times higher.

The difference wasn’t in the content — in many cases the text below was similar. It was in the heading, which works as a label for the chunk. A specific heading “signals” to the retrieval system what that block is about, before the model even reads it.
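
The heading-as-label effect can be reproduced in miniature. The sketch below is a stand-in for the test described above, not the author’s actual setup, and it rests on several assumptions: the page is available as Markdown with `## ` H2 headings, words are counted instead of tokens, each heading is kept at the top of its chunk, and the embedding model is replaced by a bag-of-words cosine similarity (`cosine`, a hypothetical helper) that only rewards shared words.

```python
import math
import re
from collections import Counter
from typing import List

def words(text: str) -> List[str]:
    """Lowercase alphanumeric words; a crude stand-in for tokenization."""
    return re.findall(r"[a-z0-9]+", text.lower())

def cosine(a: str, b: str) -> float:
    """Cosine similarity of bag-of-words count vectors (stand-in for embeddings)."""
    ca, cb = Counter(words(a)), Counter(words(b))
    dot = sum(ca[w] * cb[w] for w in ca)
    na = math.sqrt(sum(v * v for v in ca.values()))
    nb = math.sqrt(sum(v * v for v in cb.values()))
    return dot / (na * nb) if na and nb else 0.0

def split_by_h2(markdown: str, max_words: int = 300) -> List[str]:
    """Split a Markdown page on '## ' headings, then into windows of max_words.

    Every window keeps its section heading on the first line, so the heading
    travels with the chunk as its label.
    """
    chunks = []
    for block in re.split(r"^## ", markdown, flags=re.M):
        if not block.strip():
            continue
        heading, _, body = block.partition("\n")
        body_words = body.split()
        for i in range(0, max(len(body_words), 1), max_words):
            chunks.append(heading.strip() + "\n" + " ".join(body_words[i:i + max_words]))
    return chunks

body = "We install sensors that track vibration on motors and flag anomalies early."
query = "how to monitor vibrations to prevent motor failures"
vague = split_by_h2("## Deep dive\n" + body)[0]
specific = split_by_h2("## How we monitor vibrations to prevent failures\n" + body)[0]
# The vague version shares no words with the query, so it scores 0.0.
print(round(cosine(vague, query), 2), round(cosine(specific, query), 2))
```

The body text is identical in both chunks; only the heading changes the score. This toy metric is also defeated by simple variations such as “vibration” vs “vibrations”, which a real embedding model handles far better.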

The myth of length

This one is worth debunking, because it’s ingrained in the habits of anyone who creates content: length in itself is not an advantage for AI visibility. In some cases it’s a disadvantage.

A 5,000-word page with 15 sections produces roughly 25-30 chunks. Of these, the RAG system typically retrieves 2-3 for the answer. If only 2 of your 30 chunks are relevant to the query, your signal-to-noise ratio (relevant chunks out of total chunks) is about 7%. A 1,000-word page with 3 focused sections produces 5-6 chunks, of which maybe 2-3 are relevant: a ratio of 40-50%.
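
The arithmetic behind these figures is easy to check. The sketch assumes roughly 1.3 tokens per English word and an average chunk of 250 tokens; both are rules of thumb, not values from any specific system, and `estimate_chunks` and `relevant_share` are hypothetical helpers. Under those assumptions the estimates land inside the chunk counts given above.

```python
import math

TOKENS_PER_WORD = 1.3   # assumption: rough average for English text
CHUNK_TOKENS = 250      # assumption: a mid-range chunk size

def estimate_chunks(words: int) -> int:
    """Approximate number of chunks a page of `words` words is cut into."""
    return math.ceil(words * TOKENS_PER_WORD / CHUNK_TOKENS)

def relevant_share(relevant: int, total: int) -> float:
    """Fraction of a page's chunks that are relevant to a given query."""
    return relevant / total

print(estimate_chunks(5000), estimate_chunks(1000))  # 26 and 6 under these assumptions
print(f"{relevant_share(2, 30):.0%} vs {relevant_share(3, 6):.0%}")  # 7% vs 50%
```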

I’m not saying to write less. I’m saying the unit of measurement has changed: it’s not the page, it’s the chunk. And a chunk that precisely answers a question beats ten chunks that vaguely talk about a topic.

How a page should be structured for AI

No rigid rules, but one principle: every section should be extractable from the page and make sense as a standalone answer to a specific question.

In practice this means that every section needs three things:

A heading that states exactly what it’s about (better if phrased as the question it answers)
A body that answers that question without dependencies on other sections
A mention of the brand or service if it’s a commercial page — because if the chunk doesn’t contain your name, AI cites the content but not you

Dependencies between sections are the silent enemy. “As we saw earlier…”, “Picking up the concept from the previous paragraph…”, “In addition to what was said above…” — all phrases that work perfectly well in narrative text, but that in a chunk-based retrieval context make the section incomprehensible when extracted on its own.

A simple way to test it: take each section of your page and read it in isolation. Does it make sense? Does it answer a clear question? Does it contain your brand? If any of these answers is no, that chunk isn’t working for your AI visibility.
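
Part of this isolation test can be automated as a rough first pass before the human read-through. In this sketch the back-reference phrases and the vague headings come from this article, while `Section`, `audit` and the brand “Acme” are hypothetical names for illustration. Substring matching is crude: it misses rephrased back-references and cannot judge whether a section actually answers a question, so treat it as a filter, not a verdict. The brand check only matters for commercial sections.

```python
from dataclasses import dataclass
from typing import List

# Back-references quoted in this article; extend the list for your own pages.
DEPENDENCY_PHRASES = [
    "as we said in the previous paragraph",
    "as we saw earlier",
    "picking up the concept from the previous paragraph",
    "picking up the previous concept",
    "in addition to what was said above",
]

# Headings this article calls vague.
VAGUE_HEADINGS = {"deep dive", "our services", "considerations"}

@dataclass
class Section:
    heading: str
    body: str

def audit(section: Section, brand: str) -> List[str]:
    """Return reasons a section may fail when read in isolation (empty list = passes)."""
    problems = []
    if section.heading.strip().lower() in VAGUE_HEADINGS:
        problems.append("vague heading")
    body = section.body.lower()
    for phrase in DEPENDENCY_PHRASES:
        if phrase in body:
            problems.append(f"depends on another section: '{phrase}'")
    if brand.lower() not in f"{section.heading} {section.body}".lower():
        problems.append("brand not mentioned")
    return problems

# Hypothetical sections for a hypothetical brand, "Acme".
draft = Section("Deep dive", "As we saw earlier, sensors cut downtime.")
rewrite = Section("How Acme monitors vibrations", "Acme installs sensors on each motor.")
print(audit(draft, "Acme"))
print(audit(rewrite, "Acme"))  # []: passes all three checks
```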

How to restructure your pages for chunk-based retrieval

Turn every H2 into a specific question or statement: “How to check whether your brand is tokenized correctly” is a heading the retrieval system can evaluate before it even reads the content. “Deep dive” says nothing.
Every section must work as a mini-article: introduction, content, conclusion. If you remove the rest of the page and only that section is left, it must answer a question completely.
Eliminate dependencies between sections: “as we saw earlier”, “picking up the previous concept” — phrases that work in narrative text, but that make a chunk extracted on its own incomprehensible. Every block stands on its own.
The brand must appear in every commercial section: if the chunk doesn’t contain your name, AI can cite the content but not you. And citing the content without citing the source is the worst possible scenario.
Aim for 200-400 words per section: that’s roughly the typical size of a RAG chunk. Sections that are too long get split at arbitrary points. Sections that are too short don’t have enough context to stand on their own.
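
The length guideline is easy to check across a whole site. A minimal sketch, assuming your sections are available as a heading-to-body mapping and that word counts are a good-enough proxy for chunk size; `length_check` is a hypothetical helper, and its 200-400 defaults simply mirror the guideline in the list.

```python
from typing import Dict

def length_check(sections: Dict[str, str], low: int = 200, high: int = 400) -> Dict[str, str]:
    """Label each section by word count against a target range."""
    report = {}
    for heading, body in sections.items():
        n = len(body.split())
        if n < low:
            report[heading] = f"{n} words: may lack the context to stand on its own"
        elif n > high:
            report[heading] = f"{n} words: likely to be split at an arbitrary point"
        else:
            report[heading] = f"{n} words: ok"
    return report

# Hypothetical sections; bodies are filler words just to show the counts.
sections = {
    "How much a predictive monitoring system costs": "word " * 150,
    "How we monitor vibrations to prevent failures": "word " * 320,
    "Our services": "word " * 900,
}
for heading, verdict in length_check(sections).items():
    print(f"{heading}: {verdict}")
```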

The paradox of the “complete” page

There’s an irony in all this. For years we built long, complete “pillar” pages — the idea was to cover a topic exhaustively to signal authority to Google. And it worked.

For AI, that same page is a pile of chunks, most of which aren’t relevant to any specific query. The Transformer model is extraordinarily good at evaluating the relevance of each block — but it can’t make a block relevant if it isn’t.

Structure beats length. Not because length is wrong, but because the Transformer changed the unit of measurement. You work in blocks, not in pages. And every block has to be able to win its own battle on its own.

If you want to understand how AI decides which blocks to retrieve, the next step is chunk retrieval — the mechanism that cuts your pages into pieces and chooses which ones to use. And if you want to understand how those pieces get compared with the user’s question, the answer lies in embeddings and vector space.

The author

Senate of the Republic · Palazzo Giustiniani Conference “The power of artificial intelligence”

Roberto Serra

SEO consultant for over 15 years, founder of the Serra SEO Agency (RAANK). He helps multinationals and SMEs stay visible where search is moving: ChatGPT, Perplexity, Gemini and Google's AI Overviews.
