How AI engines think

If your page is too long, the AI cuts it and loses you

You've written a complete, detailed guide, but the AI never uses it as a source. The reason may be mechanical: AI systems have a hard limit on how much text they can process at once, and when your page exceeds it, the rest gets cut. If the most important information about you ends up in the discarded part, for the AI it's as if it didn't exist. Restructuring your pages so that the key information always comes first is a quick fix, and it can reopen the door to citations you're losing today.

You’ve written a complete 5,000-word guide. It covers everything: definitions, examples, case studies, FAQs, resources. For a human reader it’s an excellent resource. For the AI it could be a problem, because every model has a hard limit on the text it can process, and when your content exceeds it, the information gets cut.

The question isn’t whether your content is good. The real question is one you may never have asked: does the information about you survive the cut?

What the Context Window is and why it has a limit

Every AI model has a context window: the maximum number of tokens it can consider in a single interaction. At the time of writing, GPT-4 Turbo reaches 128K tokens, Claude up to 200K, Gemini 1M or more. Numbers that seem enormous. But in everyday practice, the context actually available for your content is much smaller.

The survey by Minaee et al. (2025) clarifies why this matters particularly for anyone seeking visibility:

“Context length is especially important for RAG, where large portions of text might be retrieved and injected into the prompt for generation.”
(Large Language Models: A Survey)

RAG stands for Retrieval-Augmented Generation, and it’s the mechanism that powers Perplexity, Bing Chat (now Microsoft Copilot), Google AI Overviews and any AI engine that searches for information before answering. Here’s what happens when a user asks a question: the system retrieves pieces of web pages (chunks) and injects them into the model’s context together with the system instructions and the user’s query, while leaving room for the answer to be generated. Your content competes for a slice of this shared window.

And that slice isn’t very big. Even if the model has 128K tokens of context, in practice the system may dedicate only a few hundred tokens to your page, sometimes even fewer, because it retrieves chunks from multiple sources at the same time.
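To make that arithmetic concrete, here is a minimal Python sketch of how a RAG engine might split its prompt budget. Every figure, and the idea of a fixed retrieval cap (`retrieval_budget`), is an assumption for illustration: engines like Perplexity don’t publish these numbers. `PromptBudget` and `tokens_per_source` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class PromptBudget:
    """Hypothetical token budget for one RAG answer; all figures are illustrative."""
    context_window: int    # model limit, e.g. 128_000
    system_tokens: int     # system instructions
    query_tokens: int      # the user's question
    output_reserve: int    # room kept free for the generated answer
    retrieval_budget: int  # assumed cap the engine sets on retrieved text

    def chunk_budget(self) -> int:
        """Tokens left for retrieved chunks: the engine's cap or what the window still allows, whichever is smaller."""
        free = self.context_window - self.system_tokens - self.query_tokens - self.output_reserve
        return max(0, min(free, self.retrieval_budget))


def tokens_per_source(budget: PromptBudget, n_sources: int) -> int:
    """Split the chunk budget evenly across sources (a simplification: real engines rank and weight them)."""
    if n_sources <= 0:
        raise ValueError("n_sources must be positive")
    return budget.chunk_budget() // n_sources


budget = PromptBudget(
    context_window=128_000,
    system_tokens=1_500,
    query_tokens=50,
    output_reserve=2_000,
    retrieval_budget=4_000,
)
print(tokens_per_source(budget, n_sources=10))  # 400
```

Under these assumptions, ten sources sharing a 4,000-token retrieval cap get 400 tokens each. The 128K window barely matters: it’s the engine’s own cap, split across competitors, that squeezes your page.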

The concrete problem: what gets cut

To understand the real impact, a passage from the RAG survey by Gao et al. (2024) is useful:

“Developing new RAG methods in the context of super-long contexts is one of the future research trends.”
(Retrieval-Augmented Generation for Large Language Models: A Survey)

Translated: managing very long contexts in RAG systems is still an open problem. Researchers themselves list it as a challenge, which should give you pause: you can’t assume that any system in production today handles very long pages well.

From a practical standpoint (and here I’m reasoning from the mechanism, not from direct experimental data), the consequence is this: if your content is long and rambling, the RAG system selects a chunk, and that chunk might not contain the information about you. Your brand, your specialization, your differentiator could end up in a piece that doesn’t get retrieved.
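The mechanism can be shown in miniature with a toy retriever. This is a sketch under stated assumptions, not how any real engine works: it splits the page into chunks on blank lines and ranks them by how many query words they share, whereas production systems use their own chunking and embedding-based similarity. “Rossi Legal” is a hypothetical firm.

```python
import re


def split_chunks(page: str) -> list[str]:
    """Split a page into chunks on blank lines (one paragraph per chunk)."""
    return [p.strip() for p in page.split("\n\n") if p.strip()]


def overlap_score(chunk: str, query: str) -> int:
    """Number of distinct query words that also appear in the chunk (stand-in for embedding similarity)."""
    query_terms = set(re.findall(r"\w+", query.lower()))
    chunk_terms = set(re.findall(r"\w+", chunk.lower()))
    return len(query_terms & chunk_terms)


def retrieve(page: str, query: str, top_k: int = 1) -> list[str]:
    """Return the top_k chunks by score; sorted() is stable, so ties keep page order."""
    chunks = split_chunks(page)
    return sorted(chunks, key=lambda c: overlap_score(c, query), reverse=True)[:top_k]


query = "corporate law firm Milan"
# Hypothetical page that opens with history and mentions the brand only at the end.
history_first = (
    "Corporate law in Italy has evolved over two centuries.\n\n"
    "Choosing a corporate law firm means weighing experience and fees.\n\n"
    "About us: Rossi Legal was founded in 1990."
)
# Same page with a one-sentence TL;DR on top.
tldr_first = "Rossi Legal is a corporate law firm in Milan specialising in M&A.\n\n" + history_first

print(retrieve(history_first, query))  # ['Choosing a corporate law firm means weighing experience and fees.']
print(retrieve(tldr_first, query))     # ['Rossi Legal is a corporate law firm in Milan specialising in M&A.']
```

In the first page the best-matching chunk is generic and the brand never reaches the model; adding a TL;DR that states brand, specialization and city turns the brand chunk into the best match.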

How I verified this effect

I took 15 pages from Italian B2B service companies — law firms, agencies, consultancies — all with “services” pages between 3,000 and 6,000 words. For each one I formulated 20 relevant queries and submitted them to Perplexity and ChatGPT.

The pattern was consistent: companies that had their brand and specialization within the first 300 tokens of the page were cited in 55-70% of the answers. Companies that had the same information distributed after the third paragraph were cited in 10-20% of cases.

It’s not surprising when you think about it. The RAG system tends to retrieve the beginning of the page as the first chunk. If in those first 300 tokens it finds a clear answer to the query, it uses it. If it finds a generic introduction, it moves on to the next source.

One detail struck me: two law firms in the same sector and the same city had pages of almost identical length. But one opened with a three-sentence “TL;DR” (the firm’s name, specialization and area), while the other started with the history of corporate law in Italy. The first was cited three times as often. Same expertise, same city, same page length: the difference lay entirely in the structure of the first 300 tokens.

Context is a shared resource: you’re not alone

An aspect many overlook: your content isn’t the only one in the context window. Modern AI engines are tackling this limit with increasingly sophisticated architectures.

As the survey by Xu et al. (2026) on AI agents documents:

“The common goal is to avoid placing the full burden of orchestration inside a single transient context window.”
(The Evolution of Tool Use in LLM Agents)

This is an architectural principle that AI system designers follow, and it has a direct implication for you: systems are evolving to spread the load across multiple interactions and multiple tools precisely because the context window is a bottleneck. As long as that limitation exists (and for now it does), your goal is to make sure your content still works when the system has little room for you.

The mistakes I see most often in Italian companies

The “encyclopedic” guide. Pages of 5,000-8,000 words that try to cover an entire topic. On Google they can work for featured snippets. For AI engines they’re a problem: often no single chunk contains a focused answer, and the brand is diluted in a sea of generic text. It may seem paradoxical, but for AI visibility a well-structured 1,500-word page often beats a rambling 5,000-word one.

The missing TL;DR. Very few Italian sites open with a summary paragraph that says who they are, what they do and why they’re relevant to the query. In my experience it’s the single most impactful element you can add, and the most overlooked. It’s not a summary for lazy readers: it’s the block of text the RAG system is most likely to retrieve first.

Dependencies between sections. “As we saw in the previous paragraph…”: if the RAG system extracts only this section, the sentence points to nothing and the model is likely to discard it. Every H2 section must stand on its own, because it might be the only piece of your page the AI reads.

FAQs at the bottom. Many companies put FAQs at the end of the page. Yet FAQs are often the section that best answers users’ queries, and they sit in the position the RAG system retrieves least often. If you have FAQs, consider moving the most important answers to the top.
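Cross-section dependencies are easy to catch mechanically. The sketch below scans each section for back-reference phrases; the phrase list `BACKREF_PATTERNS` is my own assumption, so adapt it to your language and house style (an Italian site would need Italian phrases such as “come abbiamo visto”).

```python
import re

# Hypothetical list of back-reference phrases; extend it for your own pages.
BACKREF_PATTERNS = [
    r"\bas (?:we|I) (?:saw|said|mentioned)\b",
    r"\bin the previous (?:paragraph|section)\b",
    r"\bas mentioned above\b",
]


def find_dependencies(sections: dict[str, str]) -> dict[str, list[str]]:
    """Return, for each section title, the back-reference phrases found in its text."""
    hits = {}
    for title, body in sections.items():
        found = [
            m.group(0)
            for pattern in BACKREF_PATTERNS
            for m in re.finditer(pattern, body, flags=re.IGNORECASE)
        ]
        if found:
            hits[title] = found
    return hits


sections = {
    "Services": "We advise on M&A.",
    "Fees": "As we saw in the previous paragraph, fees depend on scope.",
}
print(find_dependencies(sections))  # {'Fees': ['As we saw', 'in the previous paragraph']}
```

Any flagged section is one that could lose its meaning if the RAG system extracts it alone; rewrite it so it restates what it needs instead of pointing back.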

What to do concretely

  • The first 300 tokens are everything: the answer to the target query must be in the first 200-250 words of the page. Not the generic introduction — the answer. Who you are, what you do, what your offer is for that specific query.
  • Add an explicit TL;DR at the start of every important page: 3-4 sentences that contain the complete answer to the query, your brand and your differentiator. If the AI reads only that, it should be enough to cite you.
  • Inverted pyramid structure: the most important information first, the details later. The AI might read only the first 300-500 tokens — those tokens must work on their own too.
  • Every H2 section self-sufficient: don’t create dependencies between sections. Every block must work as an independent unit because the RAG might extract it in isolation from the rest.
  • Review long pages: if a page exceeds 2,000 words, consider whether it’s better to split it into several focused pages. A page that answers one query perfectly beats one that vaguely covers ten topics.
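To check your own pages against this list, a rough audit script helps. This sketch assumes the common rule of thumb that one token is about three quarters of an English word; real tokenizers vary by model and language (non-English text often needs more tokens per word), so treat the numbers as estimates. `audit_page` is a hypothetical helper, and its defaults (300 tokens, 2,000 words) simply mirror the checklist above.

```python
WORDS_PER_TOKEN = 0.75  # rule of thumb for English text; not a real tokenizer


def audit_page(text: str, brand: str, token_limit: int = 300, max_words: int = 2000) -> dict:
    """Rough checks mirroring the checklist: brand in the opening tokens, overall page length."""
    words = text.split()
    # Words that fit in token_limit under the rule of thumb (300 tokens -> 225 words).
    opening_words = int(token_limit * WORDS_PER_TOKEN)
    opening = " ".join(words[:opening_words])
    return {
        "total_words": len(words),
        "estimated_tokens": round(len(words) / WORDS_PER_TOKEN),
        "brand_in_opening": brand.lower() in opening.lower(),
        "consider_splitting": len(words) > max_words,
    }


# Hypothetical page: brand stated up front, then a long tail of generic text.
page = "Rossi Legal is a corporate law firm in Milan. " + "filler " * 3000
print(audit_page(page, brand="Rossi Legal"))
# {'total_words': 3009, 'estimated_tokens': 4012, 'brand_in_opening': True, 'consider_splitting': True}
```

A page can pass the first check and still fail the second: here the brand sits in the opening, but at roughly 3,000 words the page is a candidate for splitting into focused pages.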

The context window in the AI visibility chain

The context window closes the first block of the chain. Tokenization decides whether your brand is recognized. Positional encoding decides whether it’s seen based on where it sits. The attention mechanism decides how much weight it gets. The context window decides whether you reach the model or not — it’s the most brutal filter, because it doesn’t penalize, it eliminates.

Structuring your pages with the key information in the first 300 tokens isn’t a stylistic suggestion. It’s a technical requirement imposed by the architecture of the systems that decide whether your brand appears in AI answers.

The author

Roberto Serra

Roberto Serra at the Senate of the Republic (Palazzo Giustiniani), during the conference “The power of artificial intelligence”.

SEO consultant for over 15 years, founder of the Serra SEO Agency (RAANK). He helps multinationals and SMEs stay visible where search is moving: ChatGPT, Perplexity, Gemini and Google's AI Overviews.

As featured in: ANSA, Il Sole 24 Ore, Le Iene, Università di Cagliari, La Repubblica.