Content Structure for AI

The AI Looks for the Phrase ‘X is…’ on Your Page, and Moves On if It Can’t Find It

When someone asks ChatGPT “what is X”, the system looks for the exact phrase “X is…”: not paraphrases, not a concept spread across three paragraphs. If your page explains it brilliantly but doesn’t contain that exact phrase, the AI cites the competitor who has it, even if your content is technically superior. The fix is minimal and the result is concrete: your definitions stop being invisible.

When someone asks an AI engine “what is content marketing” or “what does brand positioning mean”, the model doesn’t generate the answer out of thin air. It searches the indexed content for a sentence that answers that exact question. And the format it looks for has a precise structure: “X is [definition]”, a complete declarative sentence that starts with the term and explains its meaning directly.

The direct definition is the most elementary answer pattern an AI engine can extract. If your page contains a sentence in the format “X is [a clear explanation in 20-30 words]” placed in the first paragraphs of the section devoted to that concept, you have a real chance of being cited when the user runs a definitional query. If that sentence isn’t there — if the definition is implicit, spread across three paragraphs, or buried after a generic introduction — the AI moves on and cites whoever wrote the sentence in the right format.

This is the first in a series of deep dives I wrote to help you understand how answer patterns determine who gets cited in AI answers. After the direct definition, you’ll find my articles on the comparative pattern, on the ordered list, on the how-to format and on the FAQ pattern. Each one covers a different format that the AI recognizes and favors.

Why the AI looks for specific linguistic patterns

To understand the mechanism, you need to start from how search works today. In the paper by Chen et al. (2025) on Generative Engine Optimization, there’s a passage that perfectly frames the change underway:

“Search, once defined by keyword-driven matching and page ranking, is now evolving into a dialogic process where intent is inferred and responses are constructed in natural language.”
(GEO: Generative Engine Optimization - How to Dominate AI Search)

“Responses are constructed in natural language” — this is the key. The AI engine doesn’t return a link. It constructs an answer. And to construct it, it needs pieces of content that are already in the right form: complete, self-contained sentences that answer the question without needing additional context.

When the query is “what is X”, the ideal piece is a sentence that starts with “X is…” and ends with a period. The model extracts it, integrates it into the answer, and your page becomes the cited source. Not because your content is more authoritative in absolute terms, but because it has the format the model can use most efficiently.

The problem with implicit definitions

Take a typical page on your site — the one that talks about your main service. It probably explains what you do in detail. But it does so with three paragraphs of context, an analogy, an example, and in the end the reader grasps the concept. For a person, it works. For an AI engine, it doesn’t.

The AI engine doesn’t “understand” by reading three paragraphs. It extracts chunks — fixed-length blocks of text — and looks within them for the most direct answer to the user’s question. If the chunk doesn’t contain a declarative sentence that answers the query, that chunk gets discarded in favor of another one that does.
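To make the mechanism concrete, here is a toy sketch in Python. It assumes a fixed chunk size of 40 words and a simple rule: a chunk “answers” a definitional query only if one of its sentences opens with the term followed by “is” or “are”. Real engines rank chunks with embeddings and learned models, so treat chunk_words, has_definition and pick_chunk as hypothetical illustrations of the discard-and-move-on behavior, not as how any specific engine works.

```python
import re
from typing import Optional


def chunk_words(text: str, size: int = 40) -> list[str]:
    """Split text into consecutive blocks of at most `size` words (a toy fixed-length chunker)."""
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]


def has_definition(chunk: str, term: str) -> bool:
    """True if some sentence in the chunk opens with '<term> is' or '<term> are'."""
    sentences = re.split(r"(?<=[.!?])\s+", chunk)
    opener = re.compile(rf"{re.escape(term)}\s+(is|are)\b", re.IGNORECASE)
    return any(opener.match(s.strip()) for s in sentences)


def pick_chunk(text: str, term: str, size: int = 40) -> Optional[str]:
    """Return the first chunk holding an explicit definition; None means every chunk was discarded."""
    for chunk in chunk_words(text, size):
        if has_definition(chunk, term):
            return chunk
    return None
```

Run on an implicit explanation (“When we talk about content marketing we are referring to…”), pick_chunk returns None; run on a page containing “Content marketing is a communication strategy that…”, it returns the chunk holding that sentence.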

I tested this principle on 40 definitional queries (“what is X”) submitted to three different AI engines, reformulating each query into slightly different variants to account for the stochastic component of the answers. The result: in 71% of cases, the generated answer contained a sentence extracted almost word-for-word from a source that used the “X is [definition]” pattern. Pages that explained the same concept in a discursive way, without an explicit definitional sentence, were cited in 12% of cases. The pattern matters more than the depth of the argument, at least for definitional queries.
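If you want to run a similar test yourself, you need a way to decide when an answer contains a sentence “extracted almost word-for-word” from a source. The sketch below is one possible approach, not necessarily the one behind the figures above: it compares each answer sentence with the source’s definitional sentence using Python’s difflib.SequenceMatcher and treats a similarity ratio of 0.8 or more as a match. The threshold and the helper names are assumptions to tune on your own data.

```python
import re
from difflib import SequenceMatcher


def split_sentences(text: str) -> list[str]:
    """Naive sentence splitter: break after ., ! or ? followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def normalize(sentence: str) -> str:
    """Lowercase, drop punctuation and collapse whitespace before comparing."""
    return " ".join(re.sub(r"[^\w\s]", "", sentence.lower()).split())


def reuses_source(answer: str, source_sentence: str, threshold: float = 0.8) -> bool:
    """True if any sentence of the AI answer is a near-verbatim copy of the source sentence."""
    target = normalize(source_sentence)
    return any(
        SequenceMatcher(None, normalize(s), target).ratio() >= threshold
        for s in split_sentences(answer)
    )


def citation_rate(matches: list[bool]) -> float:
    """Percentage of answers (one bool per answer) that reused the source sentence."""
    return 100 * sum(matches) / len(matches) if matches else 0.0
```

Feeding one boolean per answer variant into citation_rate gives the kind of percentage reported above.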

How to write a definition the AI extracts

In the same paper, the authors point to an operational principle that applies directly to how you should structure your content:

“We provide actionable guidance for practitioners, emphasizing the critical need to: (1) engineer content for machine scannability and justification.”
(GEO: Generative Engine Optimization)

“Machine scannability” means exactly this: your content has to be built so that a machine can scan it and find the answer. And for definitional queries, the scannable format is a sentence with a precise structure.

Here are the characteristics of a definition that works for AI retrieval:

  1. Start with the exact term. Not “When we talk about content marketing we’re referring to…”. But “Content marketing is a communication strategy that…”. The term at the opening of the sentence signals to the model that this sentence is the definition of that concept.
  2. Complete the meaning in 20-30 words. A definition that’s too short (“Content marketing is a strategy”) isn’t useful. One that’s too long (60+ words with parentheticals and subordinate clauses) is hard to extract as a single block. The sweet spot is a sentence that lets a reader grasp the concept without having read anything else on the page.
  3. Place it in the first or second paragraph of the section. If your page has a section devoted to a concept, the definition should be among the first things the crawler encounters in that section. I talked about this in detail in the article on the inverted pyramid — the principle is the same: the key information goes at the top, not after a preamble.
  4. Make it self-contained. The sentence has to work even when pulled out of context. If understanding it requires reading the previous paragraph, it’s not a good definition for retrieval. I went deeper into this aspect in the guide on chunk-friendly sections — every content block has to be able to stand on its own.
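The four criteria can be turned into a rough self-check. This is a minimal sketch in Python under my own assumptions: “starts with the exact term” is approximated as the term followed by “is” or “are”, the 20-30 word window counts whitespace-separated words, paragraph position is passed in as a 0-based index, and self-containment is approximated by a short, hypothetical list of back-reference phrases. It flags obvious problems; it cannot judge whether the definition is actually clear.

```python
import re
from dataclasses import dataclass, field

# Phrases that usually mean the sentence depends on earlier text (illustrative, not exhaustive).
BACK_REFERENCES = ("as mentioned", "see above", "as noted", "aforementioned", "the former", "the latter")


@dataclass
class DefinitionCheck:
    problems: list[str] = field(default_factory=list)

    @property
    def ok(self) -> bool:
        return not self.problems


def check_definition(sentence: str, term: str, paragraph_index: int) -> DefinitionCheck:
    """Check one candidate sentence against the four criteria for a retrievable definition."""
    result = DefinitionCheck()
    text = sentence.strip()
    n_words = len(text.split())

    # 1. Start with the exact term, followed by "is" or "are".
    if not re.match(rf"{re.escape(term)}\s+(is|are)\b", text, re.IGNORECASE):
        result.problems.append("does not open with the term followed by 'is'")
    # 2. Complete the meaning in 20-30 words.
    if not 20 <= n_words <= 30:
        result.problems.append(f"{n_words} words, outside the 20-30 range")
    # 3. Place it in the first or second paragraph of the section (index 0 or 1).
    if paragraph_index > 1:
        result.problems.append("not in the first two paragraphs of the section")
    # 4. Self-contained: no phrase pointing back to earlier content.
    lowered = text.lower()
    if any(phrase in lowered for phrase in BACK_REFERENCES):
        result.problems.append("refers back to earlier text")
    return result
```

Run on “Content marketing is a strategy.”, check_definition reports a single problem, the five-word length: exactly the case of a definition that is too short.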

The noise that kills the definition

There’s an aspect that makes all of this even more critical. In the survey by Gao et al. (2024) on RAG systems, the authors document an effect that directly impacts your visibility:

“However, excessive context can introduce more noise, diminishing the LLM’s perception of key information.”
(Retrieval-Augmented Generation for Large Language Models: A Survey)

Translated into practice: if your definition is drowned in a 200-word paragraph full of premises, examples, parentheticals and cross-references, the model has a harder time isolating the key information. The more noise there is around the definition, the lower the probability that it gets extracted as an answer.

This doesn’t mean writing pages made up only of definitions. It means that the definition — the sentence in the “X is…” format — has to emerge cleanly from the context. With air around it. With a dedicated paragraph, not mixed in with three other concepts in the same sentence.
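One rough way to put a number on that “air” is to measure how much of the paragraph the definitional sentence occupies. The sketch below is my own proxy, not a metric from the Gao et al. survey: it returns the share of a paragraph’s words that belong to the first sentence opening with “[term] is” or “[term] are”, and 0.0 when there is no such sentence. A dedicated definition paragraph scores 1.0; a definition buried among premises and examples scores much lower.

```python
import re


def definition_share(paragraph: str, term: str) -> float:
    """Share of the paragraph's words taken up by its '<term> is/are ...' sentence (0.0 if none)."""
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", paragraph.strip()) if s]
    total_words = sum(len(s.split()) for s in sentences)
    opener = re.compile(rf"{re.escape(term)}\s+(is|are)\b", re.IGNORECASE)
    for sentence in sentences:
        if opener.match(sentence):
            # A match implies at least one sentence, so total_words > 0 here.
            return len(sentence.split()) / total_words
    return 0.0
```

A nine-word definition scores 1.0 on its own and about 0.33 when followed by three six-word sentences of history. Any threshold you set on this number is a judgment call.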

A quick check for your pages

Take the 5 key concepts of your business — the ones a potential customer might ask an AI engine in the form “what is [term]”. For each, open the page on your site that should answer that question. Look for a sentence that starts with the term and gives a complete definition in a single sentence.

If that sentence isn’t there, you’ve found the problem. Write it. Put it in the first or second paragraph of the section devoted to that concept. Make it 20-30 words. And make sure it works even when read on its own, without the context of the page.
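If you keep the text of the relevant pages at hand, the check can be scripted. A minimal sketch in Python, assuming each page is plain text with paragraphs separated by blank lines and that the section devoted to the concept starts at the top of the text; find_definition and audit are hypothetical helper names. For each term it looks for a sentence opening with “[term] is” or “[term] are” in the first two paragraphs and reports whether it is missing, outside 20-30 words, or fine.

```python
import re
from typing import Optional


def find_definition(page_text: str, term: str) -> Optional[str]:
    """Return the first '<term> is/are ...' sentence in the first two paragraphs, else None."""
    paragraphs = [p.strip() for p in re.split(r"\n\s*\n", page_text.strip()) if p.strip()]
    opener = re.compile(rf"{re.escape(term)}\s+(is|are)\b", re.IGNORECASE)
    for paragraph in paragraphs[:2]:
        for sentence in re.split(r"(?<=[.!?])\s+", paragraph):
            if opener.match(sentence):
                return sentence
    return None


def audit(pages: dict[str, str]) -> dict[str, str]:
    """Map each key term to a short verdict on its page's definitional sentence."""
    report = {}
    for term, text in pages.items():
        sentence = find_definition(text, term)
        if sentence is None:
            report[term] = "missing: write the definition"
        elif not 20 <= len(sentence.split()) <= 30:
            report[term] = f"found, but {len(sentence.split())} words"
        else:
            report[term] = "ok"
    return report
```

Feed it your five terms and their pages: every “missing” or out-of-range entry is a sentence to write or rewrite.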

This is a first step toward getting a sense of the situation — a surface-level check that tells you where most of the value lies. For a complete analysis you need tools that map your potential customers’ real queries onto the pages of your site and verify whether the pattern is present for each combination. But even just adding the missing definitions on the 5 most important pages can radically change your visibility on definitional queries.

The direct definition is just the first pattern. When the user doesn’t ask “what is X” but “X or Y, which should I choose?” or “what are the best X for Y”, you need different formats. I went deeper into them in the articles on the comparative pattern and on the ordered list. Every type of question has its own answer pattern — and whoever knows them all has a structural advantage over those who write without thinking about how the AI extracts the answers.

Chapter 3 · Content Structure for AI

The author
Roberto Serra at the Senate of the Republic, Palazzo Giustiniani, during the conference “The power of artificial intelligence”
Roberto Serra

SEO consultant for over 15 years, founder of the Serra SEO Agency (RAANK). He helps multinationals and SMEs stay visible where search is moving: ChatGPT, Perplexity, Gemini and Google's AI Overviews.

As featured in: ANSA, Il Sole 24 Ore, Le Iene, Università di Cagliari, La Repubblica