You can have the best content in your industry — well written, in-depth, useful — and AI will still ignore it if it has the wrong structure. The models don't read the way a human does: they look for precise patterns, and if they don't find them they move on. Every article published without that structure is wasted visibility, while the competitor with mediocre but well-formatted content gets cited instead of you. Writing to be extracted and cited by AI is a specific skill — and whoever masters it first gains an advantage that's hard to close.

You have solid content, well written, with information your customers search for every day. And yet, when someone asks an AI engine the same question, your name doesn’t come up. A competitor does. Or a generic answer comes up, built from sources you’ve never heard of. And you’re left out.

The problem, in most cases, isn’t what you write. It’s how you write it. And above all, it’s how you structure it.

AI engines don’t read your pages the way a human reads them. They don’t start at the beginning, they don’t scroll calmly, they don’t pick up the nuances between one paragraph and the next. They process linguistic patterns and semantic structures. They cut pages into blocks. They evaluate each block in isolation. And they decide in fractions of a second whether that block answers the user’s question or whether it’s better to pull it from another source.

Content written well for a human can be invisible to AI if it has the wrong structure. And vice versa: content that’s less elegant but structured the right way ends up in the answers, gets cited, generates visibility.

This is the most hands-on pillar of the entire series I’ve written on visibility in AI answers. Here we’re not talking about theory or how the models work: we’re talking about what to do, concretely, to your content. Every section you find in this guide opens a block of deep-dives I’ve written to give you the practical tools to work on your pages and turn them into content that AI can read, extract and cite.

If you’ve already read my articles on how AI engines think and on authority and credibility, you know how the engine works and how trust is built. Now we move to the work on the ground: taking your existing content and rewriting it so it works.

What you find here isn’t theory. It’s the result of tests I ran on dozens of pages, analyzing how the different AI engines extract and cite content based on its structure. The rules I give you are operational: you can apply them today, to pages you’ve already published, and measure the difference.

How a page must be built so that AI finds it

The first thing you need to understand is that a page’s architecture isn’t an aesthetic detail. It’s the factor that determines whether your content gets found, extracted and used by the retrieval system, or whether it gets discarded before it’s even read in full.

I’ve verified this behavior on dozens of pages: the same answer rises or disappears from the citations simply by changing where the chunk boundary falls. The engine evaluates the block, not the page.

An answer buried halfway down the page never gets extracted

RAG (retrieval-augmented generation) systems, the ones that power the answers of ChatGPT, Perplexity, Gemini and the others, don’t process pages from beginning to end. They cut them into blocks of roughly 200-500 tokens and evaluate each block as a standalone unit. If your best answer sits in the eighth paragraph, the system might never get to it. If a section needs the previous one to make sense, that block gets discarded. If the section titles are generic, the model can’t tell what the sections are about and classifies them under the wrong topic.
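To see why the position of an answer matters, here’s a minimal sketch of the chunking step in Python. It assumes whitespace-separated words as a rough stand-in for tokens (real tokenizers count differently) and a fixed block size of 300, which is just a value inside the 200-500 range mentioned above. The function names are hypothetical, and real retrieval systems may split on headings or sentences rather than at fixed offsets.

```python
def chunk_by_tokens(text: str, max_tokens: int = 300) -> list[str]:
    """Split text into consecutive blocks of at most max_tokens whitespace tokens."""
    tokens = text.split()
    return [
        " ".join(tokens[start:start + max_tokens])
        for start in range(0, len(tokens), max_tokens)
    ]


def chunk_containing(chunks: list[str], phrase: str) -> int:
    """Return the index of the first chunk that contains phrase, or -1 if none does."""
    for index, chunk in enumerate(chunks):
        if phrase in chunk:
            return index
    return -1


# A page whose key answer sits after 700 words of introduction.
page = "intro " * 700 + "The answer is 42. " + "outro " * 100
chunks = chunk_by_tokens(page, max_tokens=300)
print(len(chunks), chunk_containing(chunks, "The answer is 42."))  # 3 2
```

In this toy page the answer lands in the third block, surrounded by filler. A phrase that straddles a block boundary would not be found at all, which is the same problem the paragraph above describes.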

I’ve written eight deep-dives dedicated to taking apart every element of page architecture that affects your visibility in AI answers. I start from a principle journalism has known for a century: the inverted pyramid. In the article How to get found by AI: put the answer in the first 150 tokens I explain why your page’s key answer has to be at the top, not after a three-paragraph introduction. It’s a change that on its own can transform how a page performs.
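A quick way to check the 150-token rule on your own page is sketched below. It’s an approximation under stated assumptions: tokens are counted as whitespace-separated words, the answer is matched as an exact string, and both function names are made up for this example.

```python
def answer_position(text: str, answer: str) -> int:
    """Return the whitespace-token index where answer starts, or -1 if it is absent."""
    char_index = text.find(answer)
    if char_index == -1:
        return -1
    return len(text[:char_index].split())


def answer_in_opening(text: str, answer: str, budget: int = 150) -> bool:
    """True when the whole answer ends within the first `budget` tokens."""
    start = answer_position(text, answer)
    if start == -1:
        return False
    return start + len(answer.split()) <= budget


# 100 two-word filler sentences push the answer to token 200.
page = "Filler sentence. " * 100 + "RAG is retrieval-augmented generation."
print(answer_position(page, "RAG is"))    # 200
print(answer_in_opening(page, "RAG is"))  # False
```

If the check returns False, the edit is usually a move rather than a rewrite: lift the answer sentence above the introduction and run the check again.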

But putting the answer at the top isn’t enough if the rest of the page is a monolithic block. Every section has to work as a self-contained unit that the system can extract and cite on its own. In my article If your sections can’t stand on their own, AI discards them I show you how to turn every section into a mini-article with question, answer and support, all within the budget of a chunk.

The heading hierarchy is the map the model consults to orient itself. If that map says “Deep dive” and “Find out more”, the system doesn’t understand what the sections are about. In the article AI doesn’t read your generic headings: it ignores them I explain how to write headings that work as a semantic index for retrieval.
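Generic headings are easy to flag automatically. The sketch below compares each heading against a small list of vague labels. The list itself is an assumption made for illustration (only “Deep dive” and “Find out more” come from the text above), so extend it with the labels your own site uses.

```python
# Hypothetical list of vague labels; only the first two come from the article.
GENERIC_HEADINGS = {
    "deep dive",
    "find out more",
    "introduction",
    "overview",
    "conclusion",
    "more info",
}


def find_generic_headings(headings: list[str]) -> list[str]:
    """Return the headings that say nothing about the section's topic."""
    return [
        heading
        for heading in headings
        if heading.strip().lower().rstrip(":.!?") in GENERIC_HEADINGS
    ]


headings = ["Deep dive", "How RAG systems cut a page into chunks", "Find out more:"]
print(find_generic_headings(headings))  # ['Deep dive', 'Find out more:']
```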

And speaking of indexes: a table of contents with anchor links at the top of the page works as a compressed map of the entire content. I’ve devoted a deep-dive to this mechanism: Your article doesn’t have a table of contents? AI is searching for answers in the dark. A few tokens that tell the system exactly where to find what.
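As a rough illustration, a table of contents with anchor links can be generated from the list of headings. The sketch below is one possible approach, not a standard: `slugify` and `build_toc` are hypothetical helpers, and the links only work if each section heading in the page carries the matching `id` attribute.

```python
import html
import re


def slugify(heading: str) -> str:
    """Turn a heading into a lowercase, hyphen-separated anchor id."""
    slug = re.sub(r"[^a-z0-9]+", "-", heading.lower())
    return slug.strip("-")


def build_toc(headings: list[str]) -> str:
    """Build an HTML list of anchor links, one per heading."""
    items = [
        f'  <li><a href="#{slugify(h)}">{html.escape(h)}</a></li>'
        for h in headings
    ]
    return "<ul>\n" + "\n".join(items) + "\n</ul>"


print(build_toc(["What is RAG?", "How AI chunks a page"]))
```

Because the link text is the heading itself, descriptive headings make the table of contents descriptive too. Generic headings produce a generic map.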

Then there’s a mistake I see on almost every site: the page’s first viewport taken up by a decorative banner, a hero image, a cookie wall. AI engines see none of that. They see text. And the first text they encounter is the one with the highest probability of being extracted. I cover this in You’re wasting your page’s first viewport on a decorative banner.

The page’s position in the site hierarchy matters too. Breadcrumbs tell the system whether the page is a general guide or a third-level deep-dive, and this information influences how the model classifies it. The article AI doesn’t know where your page sits without breadcrumbs explains the mechanism in detail.
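Breadcrumbs can also be declared as structured data with the schema.org `BreadcrumbList` type. The sketch below builds that object in Python, assuming you already have the trail as a list of name and URL pairs. The `example.com` URLs and the function name are placeholders, and how much weight any given AI engine gives this markup is not something the code can show.

```python
import json


def breadcrumb_jsonld(trail: list[tuple[str, str]]) -> dict:
    """Build a schema.org BreadcrumbList from (name, url) pairs, ordered from the root down."""
    return {
        "@context": "https://schema.org",
        "@type": "BreadcrumbList",
        "itemListElement": [
            {"@type": "ListItem", "position": position, "name": name, "item": url}
            for position, (name, url) in enumerate(trail, start=1)
        ],
    }


# Placeholder trail for a third-level deep-dive.
trail = [
    ("Home", "https://example.com/"),
    ("Guides", "https://example.com/guides/"),
    ("Content structure", "https://example.com/guides/content-structure/"),
]
print(json.dumps(breadcrumb_jsonld(trail), indent=2))
```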

A summary block of three or four sentences placed at the start of the page is the chunk with the highest probability of being extracted and cited. In Want AI to cite your article? Give it a TL;DR to copy I show you how to write it so it works as a ready-made answer the model can take almost word for word.

And finally, a topic almost no one considers: noise. Widgets, sidebars, footers repeated on every page dilute the useful signal of your content. In Your sidebar is polluting the content AI extracts I explain how to clean up your pages so the signal-to-noise ratio works in your favor.

Writing in the format the model expects

Page architecture determines whether your content gets found. The answer pattern determines whether it gets cited.

When a user asks an AI engine something, the system isn’t looking for an article. It’s looking for a block of text that answers the question in a format as close as possible to the one it has to generate. If the query is “what is X”, the system looks for a defining sentence. If the query is “X vs Y”, it looks for a structured comparison. If the query is “how to do X”, it looks for numbered steps.

If your content has the right answer in the wrong format, the model takes it, reworks it and attributes it to someone else. If your content has the right answer in the right format, the model extracts it and cites you.

I’ve written eight articles on eight different patterns that AI engines recognize and favor. They’re specific formats, each designed for a type of query.

Pro tip

Open every section with a direct definition in the “X is” format, followed by a 20-30 word explanation. It’s the pattern the model looks for on what-is queries, and it’s the block most likely to be cited.

The first is the most basic: the direct definition. A sentence in the format “X is [clear explanation in 20-30 words]” placed in the first paragraphs is the defining pattern AI looks for when someone asks “what is”.
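The definition pattern is simple enough to test with a regular expression. The sketch below is only a heuristic with assumed rules: it accepts “is” or “are” after the term, counts whitespace-separated words in the explanation, and uses a hypothetical function name.

```python
import re


def is_direct_definition(sentence: str, term: str,
                         min_words: int = 20, max_words: int = 30) -> bool:
    """True when sentence reads '<term> is <explanation>' with a 20-30 word explanation."""
    pattern = rf"\s*{re.escape(term)}\s+(?:is|are)\s+(.+)"
    match = re.match(pattern, sentence, flags=re.IGNORECASE | re.DOTALL)
    if match is None:
        return False
    explanation_words = len(match.group(1).split())
    return min_words <= explanation_words <= max_words


good = ("A chunk is a block of a few hundred tokens that a retrieval system cuts "
        "from a page and evaluates as a standalone unit when building an answer.")
print(is_direct_definition(good, "A chunk"))                   # True
print(is_direct_definition("A chunk is a block.", "A chunk"))  # False
```

The first sentence passes because its explanation is 25 words long. The second fails because two words of explanation give the model nothing to quote.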

Then there are comparisons. Queries with “vs” or “or” generate comparative answers with criteria, strengths and contextual recommendations. If there are pairs to compare in your industry and you don’t do it, the model assembles fragments from different sources. In If your industry has pairs to compare and you don’t do it, AI cites someone else I explain how to build the page that becomes the source of that comparison.

Ordered lists with explicit criteria are another format that retrieval recognizes and rewards. Not a random list: a ranking with visible logic. The article Your “best X for Y” lists are invisible to AI if they don’t have a clear criterion shows you the difference between a list that gets ignored and one that becomes the answer.

Step-by-step guides have their own structure: numbered steps, clear actions, expected result. If your guide is a wall of discursive text, the model discards it. In Are your guides a wall of text? AI can’t extract them as an answer I show you how to take a narrative piece apart and rebuild it in a format the system maps directly onto its own reasoning.

FAQs are a special case. The ones with one-line answers or with “find out more” as the answer simply don’t exist for AI. The engine doesn’t click. I cover this in Do your FAQs have one-line answers? They’re useless to AI.

The cause-effect pattern serves “why” queries. Describing a phenomenon isn’t enough: you need a logical chain that links cause, effect and solution. The article Does your content explain the “what” but not the “why”? shows you how to structure that chain.

Numerical data — percentages, figures, precise dates — are anchors of credibility that lower the model’s hallucination risk.

And finally the pros/cons balance. If your content only talks about the advantages, the model classifies it as promotional and prefers sources that show both sides. The deep-dive Do you only talk about the advantages? AI classifies you as promotional takes apart the mechanism and shows you how to write recommendations the system perceives as trustworthy.

Making your content easy to extract and cite

At this point you know how to build the page and how to write in the right format. The third level is the technical format: how you present the data in the code and markup directly affects the system’s ability to extract it without errors.

Pro tip

Take the comparisons, specs and pricing plans you’ve written in prose and convert them into an HTML table with row and column headers. Retrieval extracts the cells without having to reconstruct them from the prose.

This isn’t a developer matter. It’s a visibility matter. A clean HTML table is ten times more citable than the same comparison written in prose. A highlighted callout gets extracted before regular text. Schema markup tells AI exactly what the page contains without it having to guess from the text.

I’ve written eight deep-dives on eight formats that AI can read, extract and cite precisely. They’re technical changes within reach of anyone who runs a site: you don’t need developer skills, but you do need to know which formats retrieval favors and why.

HTML tables are the most underrated format. If you have comparisons, specs, pricing plans written in prose, you’re making retrieval’s job ten times harder. In Are your comparisons written in prose? In a table they’d be 10 times more citable I show you how to convert them and why it works.
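Here’s a minimal sketch of the conversion: a comparison held as plain data is turned into an HTML table with column and row headers. The plan names and prices are invented for the example, and `comparison_table` is a hypothetical helper, not part of any library.

```python
import html


def comparison_table(columns: list[str], rows: list[tuple[str, list[str]]]) -> str:
    """Render a table with <th scope="col"> column headers and <th scope="row"> row headers."""
    head = "".join(f'<th scope="col">{html.escape(c)}</th>' for c in columns)
    lines = ["<table>", f"  <thead><tr>{head}</tr></thead>", "  <tbody>"]
    for row_header, cells in rows:
        body = "".join(f"<td>{html.escape(cell)}</td>" for cell in cells)
        lines.append(
            f'    <tr><th scope="row">{html.escape(row_header)}</th>{body}</tr>'
        )
    lines += ["  </tbody>", "</table>"]
    return "\n".join(lines)


# Invented pricing plans, used only to show the structure.
table = comparison_table(
    ["Plan", "Price per month", "Users"],
    [("Basic", ["$10", "1"]), ("Team", ["$40", "5"])],
)
print(table)
```

Each cell ends up tied to exactly one row header and one column header, so a value like “$40” can be read together with “Team” and “Price per month” without any reconstruction from prose.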

Lists with semantic markup are the complement to tables for sequential data. But a list with one-word items — “Speed”, “Reliability”, “Support” — is noise to AI. In Do your list items just say “Speed”? They mean nothing to AI I explain how to write items the system can process as complete information.

Callouts and snippet boxes aren’t decoration. They’re structural signals that retrieval recognizes as high information-density zones.

FAQ and HowTo schema isn’t just for Google’s rich snippets. Generative AI engines get a different benefit from it, for a reason I explain here.
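For reference, this is roughly what FAQ markup looks like when built with the schema.org `FAQPage`, `Question` and `Answer` types. It’s a sketch under assumptions: the question and answer are sample text, the function name is hypothetical, and the markup should mirror FAQ content that is actually visible on the page.

```python
import json


def faq_jsonld(pairs: list[tuple[str, str]]) -> dict:
    """Build a schema.org FAQPage from (question, answer) pairs."""
    return {
        "@context": "https://schema.org",
        "@type": "FAQPage",
        "mainEntity": [
            {
                "@type": "Question",
                "name": question,
                "acceptedAnswer": {"@type": "Answer", "text": answer},
            }
            for question, answer in pairs
        ],
    }


faq = faq_jsonld([
    (
        "Does AI read my videos?",
        "No. Retrieval systems process text, so a video needs a transcript "
        "before its content can be extracted.",
    ),
])
print(json.dumps(faq, indent=2))
```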

If your content cites sources with the author’s name, year and link, AI treats you as a higher-tier resource. In Do you cite your sources? AI treats you as a higher-tier resource I show you how this mechanism changes your position in the hierarchy of sources.

JSON-LD is the format that removes all ambiguity for structured data: price, availability, rating, specs. Every piece of data labeled, typed and tied to a precise entity. The article Is your key information only in the text? With JSON-LD, AI reads it without errors shows you how to implement it.
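As an illustration of “labeled, typed and tied to a precise entity”, the sketch below builds a schema.org `Product` object with an `Offer` and an `AggregateRating`, then wraps it in the script tag that JSON-LD uses in a page. The product, its price and its rating are invented, and both helper names are hypothetical.

```python
import json


def product_jsonld(name: str, price: float, currency: str,
                   in_stock: bool, rating: float, review_count: int) -> dict:
    """Build a schema.org Product with price, availability and rating."""
    availability = "InStock" if in_stock else "OutOfStock"
    return {
        "@context": "https://schema.org",
        "@type": "Product",
        "name": name,
        "offers": {
            "@type": "Offer",
            "price": f"{price:.2f}",
            "priceCurrency": currency,
            "availability": f"https://schema.org/{availability}",
        },
        "aggregateRating": {
            "@type": "AggregateRating",
            "ratingValue": rating,
            "reviewCount": review_count,
        },
    }


def as_script_tag(data: dict) -> str:
    """Wrap the data in a JSON-LD script tag, escaping '</' so a value cannot close the tag early."""
    body = json.dumps(data, indent=2).replace("</", "<\\/")
    return '<script type="application/ld+json">\n' + body + "\n</script>"


# Invented product data, for illustration only.
product = product_jsonld("Example Widget", 19.9, "EUR", True, 4.6, 12)
print(as_script_tag(product))
```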

Downloadable content — PDFs with correct metadata, white papers, reports — gets indexed separately and carries a different weight in the corpus. I cover this in Does your best content only exist as web pages? As PDFs it becomes standalone assets.

And then there’s the distinction between evergreen content and time-sensitive content, which AI treats in completely different ways. If you only have one of the two types, you’re covering half the field. The deep-dive Do you only have evergreen guides? You’re losing the citations on industry news shows you how to balance the strategy.

Anything that isn’t text doesn’t exist for AI

Infographics, videos, podcasts, diagrams. These are often your best content: the pieces where you explain your work with a depth you’ve never reached in the site’s text. For those who watch or listen to them, they’re your strongest material.

For AI, they don’t exist. And this is a huge problem, because it means a substantial slice of the value you produce isn’t even considered when the AI engine has to build an answer in your industry.

It’s not a temporary limitation. It’s a direct consequence of how retrieval works: RAG systems extract text, not pixels, not audio. Every non-textual piece of content needs a translation to become visible. And that translation, done well, not only makes the content accessible to retrieval but creates new citable chunks that didn’t exist before.

I’ve written eight articles to help you cover every corner of multimodal content and turn it into text that AI can process.

Image alt text is the starting point. An alt like “sales chart” is like not having one. In Do your infographics have alt text like “sales chart”? They don’t exist for AI I show you how to write alt text that works as real content, not as an accessibility obligation.
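A first pass over your alt text can be automated with Python’s standard `html.parser` module. The sketch below flags images whose alt text is missing or shorter than eight words. That threshold is an arbitrary assumption, the class name is hypothetical, and the sales figures in the sample alt text are invented.

```python
from html.parser import HTMLParser


class AltTextAudit(HTMLParser):
    """Collect images whose alt text is missing or too short to carry information."""

    def __init__(self, min_words: int = 8):
        super().__init__()
        self.min_words = min_words
        self.weak: list[tuple[str, str]] = []  # (src, alt) pairs

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        attributes = dict(attrs)
        alt = (attributes.get("alt") or "").strip()
        if len(alt.split()) < self.min_words:
            self.weak.append((attributes.get("src") or "", alt))


audit = AltTextAudit()
audit.feed(
    '<img src="a.png" alt="sales chart">'
    '<img src="b.png" alt="Bar chart showing online sales rising from '
    '1.2M to 1.9M euros between 2022 and 2024">'
)
print(audit.weak)  # [('a.png', 'sales chart')]
```

Word count is only a proxy. A long alt text can still be empty of data, so the flagged list is a starting point for a manual review, not a verdict.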

Common mistake

Publishing videos, podcasts and infographics without a textual counterpart. Retrieval extracts text, not pixels or audio: without a transcript, alt text and parallel text, that material never makes it into AI answers.

Videos and podcasts are the most striking case: hours of excellent content, completely invisible. The transcript is the only solution, but you have to do it in a way that generates standalone, citable chunks.

Infographics need a parallel text that contains the same data in a form retrieval can read.

Captions under images are a micro-chunk with very clear boundaries. If they contain “Figure 1”, the system discards them. If they contain the key data in a complete sentence, they become citable material. The article “Figure 1” says nothing to anyone — write captions AI can cite shows you how to take advantage of them.

Diagrams — flowcharts, concept maps, org charts — are images. For retrieval, they don’t exist. In this guide of mine I explain how to translate them into structured text without losing the information.

Video chapters create footholds for retrieval. Without chapters, a 45-minute video is a monolithic block the system doesn’t know how to split.
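One way to picture what chapters do for a transcript: they turn one long block into titled sections. The sketch below assumes you have chapter start times and timestamped transcript segments, both in seconds and sorted by time. The data and the function name are made up for the example.

```python
from bisect import bisect_right


def split_transcript(chapters: list[tuple[int, str]],
                     segments: list[tuple[int, str]]) -> list[dict]:
    """Group transcript segments under the chapter whose start time precedes them.

    chapters: (start_seconds, title) pairs, sorted by start time.
    segments: (start_seconds, text) pairs.
    Segments that start before the first chapter are dropped.
    """
    starts = [start for start, _ in chapters]
    buckets: list[list[str]] = [[] for _ in chapters]
    for seg_start, text in segments:
        index = bisect_right(starts, seg_start) - 1
        if index >= 0:
            buckets[index].append(text)
    return [
        {"title": title, "start": start, "text": " ".join(bucket)}
        for (start, title), bucket in zip(chapters, buckets)
    ]


chapters = [(0, "Why structure matters"), (60, "Moving the answer to the top")]
segments = [(5, "Hello."), (30, "Today we talk about chunks."),
            (60, "First step."), (90, "Second step.")]
for section in split_transcript(chapters, segments):
    print(section["title"], "->", section["text"])
```

Each resulting section has a descriptive title and its own text, so it can be published under the video as a block that makes sense on its own.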

Embeddable content — widgets, calculators, tools that other sites embed — creates distributed mentions of your brand on third-party sources. Every embed is one more signal in the corpus AI consults.

And podcast show notes: if they’re three lines under the player, that page is an empty shell to retrieval. In Are your podcast show notes three lines? You’re handing a hub to the competition I explain how to turn them into pages rich with citable chunks.

The links between pages tell AI who the authority is

The final level concerns how your pages talk to each other. It’s not a technical detail relegated to the developer: the structure of internal links and context signals is the mechanism AI uses to determine which page is the most important on your site for a given topic.

You can have perfectly structured pages, with flawless answer patterns and citable formats. But if those pages are disconnected islands, the system evaluates them individually — and a single page almost always loses against a site that shows organized topical coverage.

I’ve seen sites with excellent content lose visibility in AI answers to competitors with shorter, less in-depth articles, but linked together in a coherent network. The system doesn’t only reward the quality of the single piece: it rewards the structure that demonstrates complete coverage of a topic. And that structure is built with the right links, in the right context, with the right words around them.

I’ve written eight deep-dives to help you understand how links and semantic context affect visibility in AI answers.

Common mistake

Leaving your best pages as islands with no inbound internal links. The system evaluates an isolated page on its own, and that page almost always loses against a site that shows linked topical coverage.

Every internal link is a vote of relevance. How many pages on your site point to your most important resource? And with what text in the link?
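The first question above can be answered with a few lines of code once you have a map of your internal links. The sketch below assumes that map is a dictionary from each page to the pages it links to. The paths are invented and the function name is hypothetical. Repeated links from the same source count once, and self-links are ignored.

```python
from collections import Counter


def inbound_link_counts(site_links: dict[str, list[str]]) -> Counter:
    """Count how many distinct pages link to each page."""
    counts = Counter({page: 0 for page in site_links})
    for source, targets in site_links.items():
        for target in set(targets):
            if target != source:
                counts[target] += 1
    return counts


# Invented site map: page -> pages it links to.
site = {
    "/pillar": ["/a", "/b"],
    "/a": ["/pillar"],
    "/b": ["/pillar", "/a"],
    "/orphan": [],
}
counts = inbound_link_counts(site)
orphans = [page for page, total in counts.items() if total == 0]
print(counts["/pillar"], orphans)  # 2 ['/orphan']
```

Pages that come back with a count of zero are the islands to fix first.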

Link text is information the crawler processes. “Click here” says nothing. Anchor text that describes the destination’s content says everything.
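Generic anchor text can be found with the standard `html.parser` module. In the sketch below, the list of generic phrases is an assumption for illustration (only “Click here” comes from the text above), the class name is hypothetical, and nested links are not handled.

```python
from html.parser import HTMLParser

# Hypothetical list of anchor texts that describe nothing about the destination.
GENERIC_ANCHORS = {"click here", "here", "read more", "find out more", "this link"}


class AnchorTextAudit(HTMLParser):
    """Collect (href, anchor text) pairs from an HTML fragment."""

    def __init__(self):
        super().__init__()
        self.links: list[tuple[str, str]] = []
        self._href = None
        self._text: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href") or ""
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            text = " ".join("".join(self._text).split())
            self.links.append((self._href, text))
            self._href = None

    def generic_links(self) -> list[tuple[str, str]]:
        """Return the links whose anchor text is in the generic list."""
        return [(href, text) for href, text in self.links
                if text.lower() in GENERIC_ANCHORS]


audit = AnchorTextAudit()
audit.feed('<p><a href="/toc">Click here</a> or read <a href="/headings">'
           'how to write headings that work as a semantic index</a>.</p>')
print(audit.generic_links())  # [('/toc', 'Click here')]
```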

Silo architecture — grouping content by topic with coherent internal links — turns your site from a disordered library into a vertical authority.

But inserting a link isn’t enough: the paragraph that contains it has to explain why the linked content is relevant. It’s the contextual bridge retrieval uses to understand the site’s logical structure.

The “related articles” section at the bottom of the page has enormous potential if it’s handled with editorial logic instead of a random algorithm.

The hub-and-spoke model — a pillar page that covers the general topic linked to specific deep-dives — is the structure AI reads as topical authority. In A pillar page on its own isn’t enough — you need a system AI reads as authority I explain how to implement it.

Duplicate content confuses the system: if the same page exists at two different URLs, the model doesn’t know which one to cite and your site loses perceived authority.
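Exact duplicates are straightforward to detect if you can export the text of each URL. The sketch below groups URLs whose text is identical after normalizing case and whitespace. It’s a simplification that won’t catch near-duplicates, and the URLs, the text and the function name are all invented.

```python
import hashlib
from collections import defaultdict


def find_duplicates(pages: dict[str, str]) -> list[list[str]]:
    """Group URLs whose text is identical after lowercasing and collapsing whitespace."""
    groups: dict[str, list[str]] = defaultdict(list)
    for url, text in pages.items():
        normalized = " ".join(text.lower().split())
        digest = hashlib.sha256(normalized.encode("utf-8")).hexdigest()
        groups[digest].append(url)
    return [sorted(urls) for urls in groups.values() if len(urls) > 1]


# Invented export: URL -> extracted page text.
pages = {
    "/guide": "Same text here.",
    "/guide?ref=nav": "same   text HERE.",
    "/other": "Different text.",
}
print(find_duplicates(pages))  # [['/guide', '/guide?ref=nav']]
```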

And finally, topical completeness. If you cover a topic halfway, the AI engine chooses whoever covers it in full.

Frequently asked questions

Do I have to rewrite all my content from scratch?
No. In most cases, your existing content already has the informational value it needs. What’s missing is the structure. Changes like moving the key answer into the first 150 tokens, adding a TL;DR, rewriting the headings and inserting a direct definition are surgical edits you can make to existing pages without rewriting them.

Do these rules apply to all AI engines or only to ChatGPT?
The structural principles you find in this guide work on all systems that use retrieval-augmented generation: ChatGPT, Perplexity, Gemini, Claude. Each engine has its own specifics in ranking and in how it presents answers, but the basic mechanics — cutting into chunks, evaluating relevance, extracting the best block — are the same. A well-structured page works on all of them.

How long does it take to see results?
It depends on how often the crawlers revisit your pages. Some structural changes can take effect within weeks, once the system reindexes the content. Others, like building a coherent internal link network, take more time because the signal accumulates as new content and new links are added.

Can I apply these principles to a brand-new blog too?
Absolutely. In fact, starting with the right structure is enormously more efficient than having to rewrite everything later. If you’re creating new content, design it with the inverted pyramid, descriptive headings, self-contained sections and appropriate answer patterns from the very first article. The competitive advantage is built from the first piece of content.

Does AI read my videos and my infographics?
No. Retrieval systems process text. Videos, images, audio and diagrams are invisible to the model unless they have a textual counterpart: transcripts, descriptive alt text, informative captions, parallel text. If your best content is in a non-textual format, it doesn’t exist for AI until you translate it.

Are schema markup and JSON-LD really useful for AI engines, or are they only for Google?
Generative AI engines don’t have a dedicated JSON-LD parser the way Google does. But schema markup offers an indirect advantage: it makes the page’s data unambiguous. When the crawler processes a page with structured data in JSON-LD, the risk of error during extraction drops drastically. It’s an advantage that works for both Google and AI engines.

What’s the change with the best effort-to-result ratio?
If I had to pick a single change, it would be adding a TL;DR paragraph of three or four sentences at the start of every important page. It’s the change that takes the least time, doesn’t touch the rest of the content, and creates a pre-packaged chunk with the highest probability of extraction. Right after that: rewriting headings to be descriptive and adding a direct definition in the first paragraphs of every section.

How do I check whether my content is already well structured?
You can do a first check yourself: open the page, look at the first 150 tokens and ask yourself whether they contain the answer to your customer’s question. Then look at the headings: do they say what the section is about, or are they generic? Is there a table of contents? Is there a TL;DR? Do the sections work on their own or do they depend on one another? These checks give you a first snapshot of the situation. For a complete picture of your visibility in AI answers you need professional tools and a systematic analysis of all your strategic pages, but this is a good starting point for figuring out where to act.

The next step

Everything you’ve read in this guide has a common thread: AI engines don’t reward whoever writes best in absolute terms. They reward whoever writes in the way the system can read, extract and cite. And that way is documented, testable, replicable. It’s not an opinion — it’s mechanics.

The starting point is to take one of your most important pages — the one that should answer when a customer asks a question in your industry — and check how many of these principles it respects. Not all of them. The first three are enough: answer in the first 150 tokens, descriptive headings, TL;DR at the top. If these are missing, you’ve found the first change to make.

Want to know how AI sees your brand right now?

Enter your company name and your industry — I personally run a check on ChatGPT, Perplexity and Gemini and send you the result by email. It’s free and it takes 24 hours.
