There's a piece of technical code that works like a pre-packaged summary of your page: instead of having to interpret loose prose, machines get the information already organized. Sites without it make their content harder to use than competitors' sites that have it. It isn't a complicated job: the three elements that really matter can be configured in half a day, and the difference in citability is visible.
You’ve implemented FAQPage schema on your FAQ pages. You’ve added Article markup with author, publication date and dateModified. Maybe you even have HowTo schema on your step-by-step guides. Great for Google’s rich snippets. But if you think the work ends there, you’re missing half the game.
Generative AI engines, the ones that produce synthesized answers citing sources, don’t process pages the way Google does. Most of them don’t use a dedicated JSON-LD parser. Yet schema markup gives you a huge competitive advantage with them too, for a reason that has nothing to do with rich snippets. Let me start with a fact you might not expect.
The schema paradox in RAG systems
In the research world, the relationship between structured data and RAG systems has been analyzed directly. A passage from the paper by Andrea Volpini et al. (2026) makes it clear:
“However, JSON-LD markup remains valuable for search engines with dedicated parsers (Google, Bing), but it provides no measurable benefit in RAG-based systems that treat pages as flat text.”
(Structured Linked Data as a Memory Layer for Agent-Orchestrated Retrieval)
Stop for a second. If RAG systems treat pages as flat text, does that make JSON-LD schema useless for AI? No. The point is different, and more subtle.
Pure JSON-LD schema, the `<script type="application/ld+json">` block in your page’s code, is invisible to the visitor, but it’s the rail along which you build your visible content. When you implement FAQPage, you’re forced to organize the FAQs as explicit question-answer pairs. When you implement HowTo, you’re forced to list ordered steps, each with a name and a description. When you implement Article with dateModified, you’re forced to keep visible, up-to-date metadata. In all these cases, the advantage isn’t the schema itself but the structure the schema imposes on your visible content.
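To make the paradox concrete, here is a minimal Python sketch of a "flat text" view of a page. It assumes a crawler that keeps only visible text and skips `<script>` and `<style>` contents; real RAG pipelines vary, and this is not how any specific engine works. The JSON-LD block never reaches the output, while the visible question and answer do.

```python
from html.parser import HTMLParser


class FlatTextExtractor(HTMLParser):
    """Keeps visible text only; <script> and <style> contents are skipped."""

    def __init__(self):
        super().__init__()
        self._hidden_depth = 0  # > 0 while inside <script> or <style>
        self.lines = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._hidden_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._hidden_depth:
            self._hidden_depth -= 1

    def handle_data(self, data):
        # Ignore hidden content and whitespace-only text between tags.
        if not self._hidden_depth and data.strip():
            self.lines.append(data.strip())


def flat_text(html: str) -> str:
    parser = FlatTextExtractor()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.lines)


PAGE = """<html><head>
<script type="application/ld+json">
{"@context": "https://schema.org", "@type": "FAQPage"}
</script>
</head><body>
<h2>How much does the service cost?</h2>
<p>The cost starts at 49 euros per month and varies based on the chosen plan.</p>
</body></html>"""

print(flat_text(PAGE))
# How much does the service cost?
# The cost starts at 49 euros per month and varies based on the chosen plan.
```

The markup disappears from this view, but the question-answer structure it pushed you to write is still there.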
Why the structure imposed by schema works for retrieval
The GEO paper by Chen et al. (2025) is explicit about what AI engines need to extract your content:
“We provide actionable guidance for practitioners, emphasizing the critical need to: (1) engineer content for machine scannability and justification.”
(GEO: Generative Engine Optimization)
Engineer content for machine scannability. This is exactly what schema markup forces you to do. Not because the crawler reads the JSON-LD, but because filling in the JSON-LD forces you to make your visible content cleaner, more explicit and more extractable.
Let’s take a concrete example. An FAQ page written as continuous text — “Many customers ask us how much the service costs. The cost depends on the chosen plan and starts at 49 euros per month” — is a block of prose where question and answer are fused together. The RAG system extracts the whole paragraph and has to interpret where the question ends and the answer begins.
The same FAQ, written to match a FAQPage schema, has this structure in the visible content:
How much does the service cost?
The cost starts at 49 euros per month and varies based on the chosen plan.
The question is isolated. The answer is a self-contained block. Chunking separates them cleanly. The model doesn’t have to interpret anything — the structure is already in the text. And this holds true even if the JSON-LD itself isn’t read by the RAG crawler.
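A rough way to see why isolated questions chunk well: the Python sketch below starts a new chunk at every line that ends with a question mark. This splitting rule is a hypothetical simplification, not the chunker any AI engine actually uses, and the second question in the examples ("Can I switch plans later?") is an invented placeholder.

```python
def chunk_by_question(lines: list[str]) -> list[str]:
    """Hypothetical splitter: a line ending in '?' opens a new chunk."""
    chunks: list[list[str]] = []
    for raw in lines:
        line = raw.strip()
        if not line:
            continue  # skip blank lines
        if line.endswith("?") or not chunks:
            chunks.append([line])  # a question (or the first line) starts a chunk
        else:
            chunks[-1].append(line)  # everything else joins the current chunk
    return [" ".join(chunk) for chunk in chunks]


# Conversational version: both topics sit in one paragraph.
PROSE = [
    "Many customers ask us how much the service costs. The cost depends on "
    "the chosen plan and starts at 49 euros per month. Others ask whether "
    "they can switch plans later, and the answer is yes."
]

# Structured version: each question is isolated on its own line.
STRUCTURED = [
    "How much does the service cost?",
    "The cost starts at 49 euros per month and varies based on the chosen plan.",
    "Can I switch plans later?",  # invented placeholder question
    "Yes, you can switch plans later.",  # invented placeholder answer
]

for chunk in chunk_by_question(PROSE):
    print("PROSE CHUNK:", chunk)
for chunk in chunk_by_question(STRUCTURED):
    print("Q&A CHUNK:", chunk)
```

The prose version comes back as a single chunk mixing both topics; the structured version yields one chunk per question, each opening with the question itself.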
FAQPage: every question-answer pair is a micro-chunk
FAQPage schema is the most directly useful format for AI visibility, because it mirrors the way users query generative engines. When someone asks an AI “how much does [your company]’s service cost?”, the engine looks for a chunk containing a closely matching question and its answer.
If your FAQs are organized to match FAQPage markup, each Q&A pair is a semantic unit with clear boundaries. I tested this pattern on 30 pages with FAQ sections, running reworded queries on three AI engines. Pages whose FAQs were structured as explicit question-answer pairs (a heading for the question, a paragraph for the answer) were cited in 61% of cases; pages with the same information written conversationally, in 19%.
The operating principle is simple: every answer must be extractable without the question and still make complete sense. “The cost starts at 49 euros per month for the Basic plan, 99 euros for the Pro and 199 euros for the Enterprise” works even out of context. “It depends on the chosen plan” doesn’t — you need the question to understand what it’s about.
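For reference, here is a minimal Python sketch that builds FAQPage JSON-LD (schema.org `FAQPage`, `Question`, `acceptedAnswer`, `Answer`) from question-answer pairs, plus a crude check for answers that lean on the question for their meaning. The opener list in `looks_self_contained` is a hypothetical heuristic for illustration and will miss many cases.

```python
import json

# Hypothetical openers that usually signal an answer depending on its question.
DEPENDENT_OPENERS = ("it ", "this ", "that ", "they ")


def looks_self_contained(answer: str) -> bool:
    """Crude heuristic: False if the answer opens with a context-dependent word."""
    return not answer.strip().lower().startswith(DEPENDENT_OPENERS)


def faq_page_jsonld(pairs: list[tuple[str, str]]) -> dict:
    """Build a schema.org FAQPage object from (question, answer) pairs."""
    return {
        "@context": "https://schema.org",
        "@type": "FAQPage",
        "mainEntity": [
            {
                "@type": "Question",
                "name": question,
                "acceptedAnswer": {"@type": "Answer", "text": answer},
            }
            for question, answer in pairs
        ],
    }


PAIRS = [
    (
        "How much does the service cost?",
        "The cost starts at 49 euros per month for the Basic plan, "
        "99 euros for the Pro and 199 euros for the Enterprise.",
    ),
]

for question, answer in PAIRS:
    if not looks_self_contained(answer):
        print("Rewrite needed:", question)

print('<script type="application/ld+json">')
print(json.dumps(faq_page_jsonld(PAIRS), indent=2, ensure_ascii=False))
print("</script>")
```

Generating the markup and the visible FAQ from the same list of pairs is one simple way to keep the two word-for-word consistent.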
HowTo: the sequence the model can reproduce
HowTo schema works with the same logic, applied to procedural content. When a user asks “how do you configure X”, the AI engine looks for an ordered sequence of steps. If your guide is a wall of text with the steps buried in prose, the model has to extract and reorder them. If instead each step has its own name and description, as HowTo markup requires, the visible content is already in an ideal format for retrieval.
A well-built HowTo step has a short title (“Configure DNS”) and a description in 2-3 sentences. Specific enough to be useful, compact enough to fit within the answer’s context. I covered this in depth in the article on the how-to pattern — there you’ll find the details on how to build guides that AI engines extract almost in full.
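The same idea for procedural content, as a minimal Python sketch: `howto_jsonld` emits schema.org `HowTo` markup with one `HowToStep` (name plus text) per step, and `sentence_count` is a naive, hypothetical check for the 2-3 sentence guideline. The guide title and every step other than "Configure DNS" are invented placeholders.

```python
import json
import re


def howto_jsonld(name: str, steps: list[tuple[str, str]]) -> dict:
    """Build a schema.org HowTo object with one HowToStep per (title, text) pair."""
    return {
        "@context": "https://schema.org",
        "@type": "HowTo",
        "name": name,
        "step": [
            {"@type": "HowToStep", "position": i, "name": title, "text": text}
            for i, (title, text) in enumerate(steps, start=1)
        ],
    }


def sentence_count(text: str) -> int:
    """Naive count: splits on ., ! and ? (abbreviations will confuse it)."""
    return len([s for s in re.split(r"[.!?]+", text) if s.strip()])


# Hypothetical guide; only "Configure DNS" comes from the article.
STEPS = [
    ("Add the domain",
     "Open the domain settings in your dashboard and enter the domain name. "
     "Save to generate the DNS records you need."),
    ("Configure DNS",
     "Log in to your DNS provider and create the records shown in the dashboard. "
     "Changes can take a few hours to propagate."),
    ("Verify the connection",
     "Return to the dashboard and run the verification. "
     "If it fails, wait for propagation and try again."),
]

for title, text in STEPS:
    n = sentence_count(text)
    if not 2 <= n <= 3:
        print(f"Step '{title}' has {n} sentences; aim for 2-3.")

print(json.dumps(howto_jsonld("Connect a custom domain", STEPS), indent=2))
```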
Article with dateModified: the freshness signal
Article schema is often overlooked, yet it contains a field that is worth its weight in gold for AI engines: dateModified. Not datePublished: dateModified.
Many retrieval systems give weight to content freshness. An updated dateModified signals that the content has been recently revised, and this affects its probability of being selected over identical content dated 2021. In its discussion of Enhanced Pages, the Volpini et al. paper documents how turning structured data into readable information changes the way content is processed:
“Enhanced pages transform opaque entity URIs into readable, structured information by resolving linked relationships and presenting them as human-readable content.”
(Structured Linked Data as a Memory Layer for Agent-Orchestrated Retrieval)
The principle is the same: transform something opaque (a date hidden in the metadata) into something readable and structured (a date visible in the content, confirmed by the Article markup). I covered this in depth in the article on structured data as a trust signal — dateModified is one of the most underrated signals for your site’s technical credibility.
Beyond dateModified, Article schema forces you to declare an author and a headline. An author linked to a real profile is an authority signal. The headline in the markup tells the crawler what the page’s actual title is: not a subtitle or a promotional claim, but the editorial title.
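A minimal Python sketch of Article markup that keeps the visible date and `dateModified` in sync by deriving both from one value. The headline, author name, profile URL and dates are hypothetical placeholders, and `visible_updated_line` is just one possible display format.

```python
import json
from datetime import date

MONTHS = ("January", "February", "March", "April", "May", "June", "July",
          "August", "September", "October", "November", "December")


def article_jsonld(headline: str, author_name: str, author_url: str,
                   published: date, modified: date) -> dict:
    """Build a schema.org Article object; dates are emitted in ISO 8601."""
    if modified < published:
        raise ValueError("dateModified cannot precede datePublished")
    return {
        "@context": "https://schema.org",
        "@type": "Article",
        "headline": headline,
        "author": {"@type": "Person", "name": author_name, "url": author_url},
        "datePublished": published.isoformat(),
        "dateModified": modified.isoformat(),
    }


def visible_updated_line(modified: date) -> str:
    """Human-readable date for the page body, derived from the same value."""
    return f"Last updated: {MONTHS[modified.month - 1]} {modified.day}, {modified.year}"


# Hypothetical values for illustration only.
PUBLISHED, MODIFIED = date(2024, 1, 10), date(2025, 3, 14)
markup = article_jsonld(
    "Schema markup for AI visibility",
    "Jane Doe",
    "https://example.com/authors/jane-doe",
    PUBLISHED,
    MODIFIED,
)
print(visible_updated_line(MODIFIED))  # Last updated: March 14, 2025
print(json.dumps(markup, indent=2))
```

Deriving both strings from a single date means a content update can't refresh one without the other.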
The mistake I see most often
Most of the sites I analyze have schema markup implemented for Google, because an SEO plugin generates it automatically, but the page’s visible content doesn’t reflect the structure the schema declares. The JSON-LD says FAQPage, but the FAQs are written as continuous paragraphs. The JSON-LD says HowTo, but the steps are a bulleted list without titles. The JSON-LD says Article with dateModified, but the page shows no date, or hides it.
This misalignment is the real problem. The JSON-LD speaks to Google. The visible content speaks to AI. If the two aren’t consistent, you’re optimizing for half the ecosystem.
How to check your schemas
Open three key pages of your site. For each, check two things:
- Does the JSON-LD exist? Use Google’s Rich Results Test to verify that the schema is valid and present. This covers the Google side.
- Does the visible content reflect the declared structure? Do the FAQs really have isolated question-answer pairs? Do the guides really have numbered steps with titles? Does the article really have a visible last-modified date? This covers the AI side.
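The second check can be partly automated. The Python sketch below, built only on the standard library’s `html.parser`, separates JSON-LD blocks from visible text and reports declared FAQ questions, HowTo step names or Article `dateModified` dates that don’t appear in the visible content. It is a hypothetical helper, not a validator: it assumes one top-level object per JSON-LD block (no `@graph`), uses plain substring matching, and recognizes dates only in ISO form or in the "March 14, 2025" form. It complements the Rich Results Test rather than replacing it.

```python
import json
from datetime import date
from html.parser import HTMLParser

MONTHS = ("January", "February", "March", "April", "May", "June", "July",
          "August", "September", "October", "November", "December")


class PageParser(HTMLParser):
    """Separates JSON-LD blocks from the page's visible text."""

    def __init__(self):
        super().__init__()
        self.mode = None  # "jsonld", "hidden" (other script/style) or None
        self.jsonld_blocks = []
        self.visible = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self.mode = "jsonld"
            self.jsonld_blocks.append("")
        elif tag in ("script", "style"):
            self.mode = "hidden"

    def handle_endtag(self, tag):
        if tag in ("script", "style"):
            self.mode = None

    def handle_data(self, data):
        if self.mode == "jsonld":
            self.jsonld_blocks[-1] += data
        elif self.mode is None and data.strip():
            self.visible.append(" ".join(data.split()))


def expected_on_page(block: dict) -> list[tuple[str, ...]]:
    """Strings the visible text should contain; each tuple lists accepted variants."""
    kind = block.get("@type")
    if kind == "FAQPage":
        return [(q["name"],) for q in block.get("mainEntity", [])]
    if kind == "HowTo":
        return [(s["name"],) for s in block.get("step", [])]
    if kind == "Article" and "dateModified" in block:
        d = date.fromisoformat(block["dateModified"][:10])
        return [(f"{MONTHS[d.month - 1]} {d.day}, {d.year}", d.isoformat())]
    return []


def check_page(html: str) -> dict:
    """Map each JSON-LD @type to the declared items missing from visible text."""
    parser = PageParser()
    parser.feed(html)
    parser.close()
    text = " ".join(parser.visible)
    report = {}
    for raw in parser.jsonld_blocks:
        block = json.loads(raw)
        report[str(block.get("@type", "?"))] = [
            variants[0] for variants in expected_on_page(block)
            if not any(v in text for v in variants)
        ]
    return report


SAMPLE_PAGE = """<html><head>
<script type="application/ld+json">
{"@context": "https://schema.org", "@type": "Article",
 "headline": "Pricing FAQ", "dateModified": "2025-03-14"}
</script>
<script type="application/ld+json">
{"@context": "https://schema.org", "@type": "FAQPage", "mainEntity": [
  {"@type": "Question", "name": "How much does the service cost?",
   "acceptedAnswer": {"@type": "Answer", "text": "From 49 euros per month."}},
  {"@type": "Question", "name": "Can I switch plans later?",
   "acceptedAnswer": {"@type": "Answer", "text": "Yes."}}]}
</script>
</head><body>
<p>Last updated: <time datetime="2025-03-14">March 14, 2025</time></p>
<h2>How much does the service cost?</h2>
<p>From 49 euros per month.</p>
</body></html>"""

print(check_page(SAMPLE_PAGE))
# {'Article': [], 'FAQPage': ['Can I switch plans later?']}
```

In this hypothetical page, the second FAQ question exists only in the markup: exactly the half-finished implementation described next.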
If the JSON-LD is there but the visible content isn’t structured to match, you have a half-finished implementation. This check is a good starting point for deciding where to intervene, although a full analysis of how AI crawlers actually process your pages requires more specialized tools and skills.
Schema markup is just one of the structured formats that contribute to your pages’ AI visibility. You’ll find the others in the deep dives on HTML tables, lists with semantic markup, callouts and snippet boxes, and citations with bibliography.
Every well-implemented schema isn’t just one more rich snippet on Google. It’s a piece of structured content that AI can extract effortlessly — and cite with your name on it.