Content Structure for AI

Schema markup isn’t just for Google: AI uses it as a ready-made summary

Roberto Serra · 25 June 2026 · ~8 min read

There's a piece of technical code that works like a pre-packaged summary for AI: instead of reading and interpreting your text, the model receives the data already prepared. Sites that lack it make their content harder to use than the content of competitors who have it. It's not a complicated job: the three elements that really matter can be configured in half a day, and the difference in citability is visible.

You’ve implemented FAQPage schema on your FAQ pages. You’ve added Article markup with author, publication date and dateModified. Maybe you even have HowTo schema on your step-by-step guides. Great for Google’s rich snippets. But if you think the work ends there, you’re missing half the game.

Generative AI engines — the ones that produce synthesized answers citing sources — don’t process pages the way Google does. They don’t have a dedicated parser for JSON-LD. Yet schema markup gives you a huge competitive advantage with them too, for a reason that has nothing to do with rich snippets. Let me explain it starting from a fact you might not expect.

The schema paradox in RAG systems

In the research world, the relationship between structured data and RAG systems has been analyzed directly. A passage from the paper by Andrea Volpini et al. (2026) makes it clear:

“However, JSON-LD markup remains valuable for search engines with dedicated parsers (Google, Bing), but it provides no measurable benefit in RAG-based systems that treat pages as flat text.”
(Structured Linked Data as a Memory Layer for Agent-Orchestrated Retrieval)

Stop for a second. If RAG systems treat pages as flat text, is JSON-LD schema useless for AI? No. The point is different, and more subtle than that.

The JSON-LD itself, that block of `<script type="application/ld+json">`, is invisible to the visitor, but it's the rail on which you built your visible content. When you implement FAQPage, you're forced to organize the FAQs as explicit question-answer pairs. When you implement HowTo, you're forced to list ordered steps with name and description. When you implement Article with dateModified, you're forced to have updated, visible metadata. In all these cases, the schema isn't the advantage: the structure the schema imposes on your visible content is.

Why the structure imposed by schema works for retrieval

The GEO paper by Chen et al. (2025) is explicit about what AI engines need to extract your content:

“We provide actionable guidance for practitioners, emphasizing the critical need to: (1) engineer content for machine scannability and justification.”
(GEO: Generative Engine Optimization)

Engineer content for machine scanning. This is exactly what schema markup forces you to do. Not because the crawler reads the JSON-LD, but because to compile the JSON-LD you had to make your visible content cleaner, more explicit, more extractable.

Let’s take a concrete example. An FAQ page written as continuous text — “Many customers ask us how much the service costs. The cost depends on the chosen plan and starts at 49 euros per month” — is a block of prose where question and answer are fused together. The RAG system extracts the whole paragraph and has to interpret where the question ends and the answer begins.

The same FAQ with FAQPage schema necessarily has this structure in the visible content:
How much does the service cost?
The cost starts at 49 euros per month and varies based on the chosen plan.

The question is isolated. The answer is a self-contained block. Chunking separates them cleanly. The model doesn’t have to interpret anything — the structure is already in the text. And this holds true even if the JSON-LD itself isn’t read by the RAG crawler.
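To make the difference concrete, here is a minimal sketch of a chunker run on both versions of the FAQ. It is a deliberately naive assumption, not how any specific AI engine works: a line ending in a question mark opens a new chunk, and the lines that follow become its answer. The function name `chunk_by_question` is hypothetical.

```python
def chunk_by_question(text: str) -> list[dict]:
    """Naive chunker: a line ending in '?' opens a chunk, later lines are its answer."""
    chunks: list[dict] = []
    current = None
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line:
            continue
        if line.endswith("?"):
            # An isolated question line marks a clean chunk boundary.
            current = {"question": line, "answer": ""}
            chunks.append(current)
        elif current is not None:
            current["answer"] = (current["answer"] + " " + line).strip()
        else:
            # No question boundary found: the prose stays one undifferentiated block.
            chunks.append({"question": None, "answer": line})
    return chunks


PROSE_FAQ = (
    "Many customers ask us how much the service costs. "
    "The cost depends on the chosen plan and starts at 49 euros per month."
)
STRUCTURED_FAQ = (
    "How much does the service cost?\n"
    "The cost starts at 49 euros per month and varies based on the chosen plan."
)

if __name__ == "__main__":
    print(chunk_by_question(PROSE_FAQ))
    print(chunk_by_question(STRUCTURED_FAQ))
```

In the prose version the chunk has no question attached, so whatever consumes it has to infer one. In the structured version the question and the answer arrive already separated.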

FAQPage: every question-answer pair is a micro-chunk

FAQPage schema is the most directly useful format for AI visibility, because it replicates the pattern with which users query generative engines. When someone asks an AI “how much does [your company]’s service cost?”, the engine looks for a chunk that contains a closely matching question and its answer.

If your FAQs are organized with FAQPage markup, each Q&A pair is a semantic unit with clear boundaries. I tested this pattern on 30 pages with FAQ sections, running reworded queries about them on three AI engines. Pages with FAQs structured as explicit question-answer pairs (a heading for the question and a paragraph for the answer) were cited in 61% of cases. Pages with the same information written in conversational format were cited in 19%.

The operating principle is simple: every answer must be extractable without the question and still make complete sense. “The cost starts at 49 euros per month for the Basic plan, 99 euros for the Pro and 199 euros for the Enterprise” works even out of context. “It depends on the chosen plan” doesn’t — you need the question to understand what it’s about.
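One way to keep the markup and the visible content from drifting apart is to generate both from the same list of pairs. The sketch below assumes that approach; the function names (`faq_jsonld`, `faq_html`, `jsonld_script`) are hypothetical, while `FAQPage`, `Question`, `Answer`, `mainEntity` and `acceptedAnswer` are the schema.org vocabulary.

```python
import json
from html import escape

# Single source of truth: the same pairs feed the markup and the visible page.
PAIRS = [
    (
        "How much does the service cost?",
        "The cost starts at 49 euros per month for the Basic plan, "
        "99 euros for the Pro and 199 euros for the Enterprise.",
    ),
]


def faq_jsonld(pairs):
    """Build a schema.org FAQPage object from (question, answer) pairs."""
    return {
        "@context": "https://schema.org",
        "@type": "FAQPage",
        "mainEntity": [
            {
                "@type": "Question",
                "name": question,
                "acceptedAnswer": {"@type": "Answer", "text": answer},
            }
            for question, answer in pairs
        ],
    }


def faq_html(pairs):
    """Render each pair as a heading (question) plus a paragraph (answer)."""
    return "\n".join(
        f"<h3>{escape(question)}</h3>\n<p>{escape(answer)}</p>"
        for question, answer in pairs
    )


def jsonld_script(data):
    """Wrap the object in a script tag; '</' is escaped so it cannot close the tag early."""
    payload = json.dumps(data, ensure_ascii=False, indent=2).replace("</", "<\\/")
    return f'<script type="application/ld+json">\n{payload}\n</script>'


if __name__ == "__main__":
    print(jsonld_script(faq_jsonld(PAIRS)))
    print(faq_html(PAIRS))
```

Because the answer text is written once, the self-contained wording you choose for the page is the same wording the markup declares.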

HowTo: the sequence the model can reproduce

HowTo schema works with the same logic, applied to procedural content. When a user asks “how do you configure X”, the AI engine looks for an ordered sequence of steps. If your guide is a wall of text with the steps drowned in prose, the model has to extract and reorder them. If instead each step has a separate name and description, as HowTo markup requires, the visible content is already in the right format for retrieval.

A well-built HowTo step has a short title (“Configure DNS”) and a description in 2-3 sentences. Specific enough to be useful, compact enough to fit within the answer’s context. I covered this in depth in the article on the how-to pattern — there you’ll find the details on how to build guides that AI engines extract almost in full.
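As a rough illustration, the same single-source approach can be applied to a guide. In this sketch the guide title and step descriptions are invented for the example, and the function names are hypothetical; `HowTo`, `HowToStep`, `step`, `position`, `name` and `text` come from schema.org.

```python
import json
from html import escape

# Hypothetical guide content, written only for this example.
GUIDE_TITLE = "How to connect a custom domain"
STEPS = [
    (
        "Configure DNS",
        "Point the domain's A record at your server's IP address. "
        "Wait for the change to propagate before moving on.",
    ),
    (
        "Verify the domain",
        "Open the domain in a browser and confirm the site loads. "
        "If it does not, recheck the record you created in the first step.",
    ),
]


def howto_jsonld(title, steps):
    """Build a schema.org HowTo object from ordered (name, text) pairs."""
    return {
        "@context": "https://schema.org",
        "@type": "HowTo",
        "name": title,
        "step": [
            {"@type": "HowToStep", "position": index, "name": name, "text": text}
            for index, (name, text) in enumerate(steps, start=1)
        ],
    }


def howto_html(title, steps):
    """Render the visible guide: an ordered list where every step has its own title."""
    items = "\n".join(
        f"<li><h3>{escape(name)}</h3><p>{escape(text)}</p></li>"
        for name, text in steps
    )
    return f"<h2>{escape(title)}</h2>\n<ol>\n{items}\n</ol>"


if __name__ == "__main__":
    print(json.dumps(howto_jsonld(GUIDE_TITLE, STEPS), indent=2))
    print(howto_html(GUIDE_TITLE, STEPS))
```

The ordered list keeps the sequence explicit in the visible page, and each step's short title survives on its own if a chunk boundary falls between steps.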

Article with dateModified: the freshness signal

Article schema is the one that often gets overlooked, yet it contains a field that for AI engines is worth gold: dateModified. Not datePublished — dateModified.

Retrieval systems assign weight to content freshness. An article with a recent dateModified signals that the content has been revised recently, and this affects its probability of being selected over otherwise identical content dated 2021. The paper on Enhanced Pages documents how transforming structured data into readable information changes the way content is processed:

“Enhanced pages transform opaque entity URIs into readable, structured information by resolving linked relationships and presenting them as human-readable content.”
(Structured Linked Data as a Memory Layer for Agent-Orchestrated Retrieval)

The principle is the same: transform something opaque (a date hidden in the metadata) into something readable and structured (a date visible in the content, confirmed by the Article markup). I covered this in depth in the article on structured data as a trust signal — dateModified is one of the most underrated signals for your site’s technical credibility.

Beyond dateModified, Article schema forces you to declare author and headline. The author linked to a real profile is an authority signal. The headline in the markup confirms to the crawler what the page’s actual title is — not a subtitle, not a promotional claim, but the editorial title.
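A minimal sketch of that pairing follows: one function builds the Article markup, another renders the visible "last updated" line from the same date. The headline, author name and publication date are taken from this article; the author URL and the modification date are placeholders, and the function names are hypothetical.

```python
import json
from datetime import date


def article_jsonld(headline, author_name, author_url, published, modified):
    """Build a schema.org Article object with both datePublished and dateModified."""
    if modified < published:
        raise ValueError("dateModified cannot be earlier than datePublished")
    return {
        "@context": "https://schema.org",
        "@type": "Article",
        "headline": headline,
        "author": {"@type": "Person", "name": author_name, "url": author_url},
        "datePublished": published.isoformat(),
        "dateModified": modified.isoformat(),
    }


def visible_date_line(modified):
    """Render the same date in the visible content, so markup and page agree."""
    label = f"{modified.day} {modified:%B} {modified.year}"
    return f'<p>Last updated: <time datetime="{modified.isoformat()}">{label}</time></p>'


PUBLISHED = date(2026, 6, 25)
MODIFIED = date(2026, 7, 10)  # placeholder revision date for the example

ARTICLE = article_jsonld(
    headline="Schema markup isn’t just for Google: AI uses it as a ready-made summary",
    author_name="Roberto Serra",
    author_url="https://example.com/about/roberto-serra",  # placeholder URL
    published=PUBLISHED,
    modified=MODIFIED,
)

if __name__ == "__main__":
    print(json.dumps(ARTICLE, ensure_ascii=False, indent=2))
    print(visible_date_line(MODIFIED))
```

Passing one `date` object to both functions is the design choice that matters here: the date in the metadata cannot differ from the date the reader sees.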

The mistake I see most often

Most of the sites I analyze have schema markup implemented for Google, because an SEO plugin generates it automatically, but the page's visible content doesn't reflect the structure the schema declares. The JSON-LD says FAQPage, but the FAQs are written as continuous paragraphs. The JSON-LD says HowTo, but the steps are a bulleted list without titles. The JSON-LD says Article with dateModified, but the page shows no visible date, or the date is hidden.

This misalignment is the real problem. The JSON-LD speaks to Google. The visible content speaks to AI. If the two aren’t consistent, you’re optimizing for half the ecosystem.

How to check your schemas

Open three key pages of your site. For each, check two things:

Does the JSON-LD exist? Use Google’s Rich Results Test to verify that the schema is valid and present. This covers the Google side.
Does the visible content reflect the declared structure? Do the FAQs really have isolated question-answer pairs? Do the guides really have numbered steps with titles? Does the article really have a visible last-modified date? This covers the AI side.

If the JSON-LD is there but the visible content isn't structured accordingly, you have a half-finished implementation. This two-step check is a good starting point for understanding where to intervene, even though a full analysis of how AI crawlers actually process your pages requires more specialized tools and skills.
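The second check can be roughly automated for FAQ pages. The sketch below uses only Python's standard library and rests on one assumption: a question counts as "visible" if its exact text appears in a heading (`h1` to `h6`). Real pages may mark up questions differently, so treat the result as a hint, not a verdict. `PageScan` and `missing_faq_questions` are hypothetical names.

```python
import json
from html.parser import HTMLParser

HEADINGS = {"h1", "h2", "h3", "h4", "h5", "h6"}


class PageScan(HTMLParser):
    """Collect JSON-LD blocks and heading texts from an HTML page."""

    def __init__(self):
        super().__init__()
        self.jsonld_blocks = []
        self.headings = []
        self._mode = None
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._mode, self._buffer = "jsonld", []
        elif tag in HEADINGS:
            self._mode, self._buffer = "heading", []

    def handle_data(self, data):
        if self._mode is not None:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._mode == "jsonld":
            try:
                self.jsonld_blocks.append(json.loads("".join(self._buffer)))
            except json.JSONDecodeError:
                pass  # invalid JSON-LD is skipped; a validator should catch it
            self._mode = None
        elif tag in HEADINGS and self._mode == "heading":
            # Collapse whitespace so line breaks inside a heading do not matter.
            self.headings.append(" ".join("".join(self._buffer).split()))
            self._mode = None


def missing_faq_questions(html):
    """Return FAQPage questions declared in JSON-LD but absent from visible headings."""
    scan = PageScan()
    scan.feed(html)
    scan.close()
    visible = {heading.casefold() for heading in scan.headings}
    missing = []
    for block in scan.jsonld_blocks:
        if isinstance(block, dict) and block.get("@type") == "FAQPage":
            for item in block.get("mainEntity", []):
                name = item.get("name", "")
                if name.casefold() not in visible:
                    missing.append(name)
    return missing
```

An empty list means every declared question also appears as a heading. Any question returned is one that the JSON-LD promises but the visible page does not isolate.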

Schema markup is just one of the structured formats that complete the AI visibility of your pages. You’ll find the others in the deep dives on HTML tables, lists with semantic markup, callouts and snippet boxes, and citations with bibliography.

Every well-implemented schema isn’t just one more rich snippet on Google. It’s a piece of structured content that AI can extract effortlessly — and cite with your name on it.

The author


Roberto Serra

SEO consultant for over 15 years, founder of the Serra SEO Agency (RAANK). He helps multinationals and SMEs stay visible where search is moving: ChatGPT, Perplexity, Gemini and Google's AI Overviews.
