If your site doesn't have a machine-readable digital ID card, AI has to guess who you are, what you do and where you operate — and when it gets it wrong, your potential customer receives incorrect information about you. Meanwhile, whoever made those same pieces of information immediately readable gets cited in your place. There are standard instructions — structured data — that tell AI exactly what you are, with no margin for error. You install them once and they work for you on every page of the site.
There’s a widespread belief among those who work with schema.org: you add the JSON-LD markup, Google shows the rich snippet, end of story. For years it worked that way. Structured data was a cosmetic bonus — the little star in the SERPs, the price in the result, the breadcrumb under the title.
But now the game is being played on a different field. The AI models that power the answers of ChatGPT, Perplexity and Gemini aren’t looking for rich snippets. They’re looking for verifiable, machine-readable information they can use to decide whether to trust a source. Here structured data comes into play in a way that, in my professional practice, I’ve seen very few people consider. It also comes with a paradox, which I’ll explain starting from the data.
The JSON-LD paradox: useful, but not the way you think
Let’s start with the data point that breaks the mental model of anyone who works with markup. A 2026 paper by Volpini et al. analyzed the real impact of structured data on RAG systems — those that search for information in real time before generating an answer:
“JSON-LD markup remains valuable for search engines with dedicated parsers (Google, Bing), but it provides no measurable benefit in RAG-based systems that treat pages as flat text.”
Stop for a second on this. JSON-LD, the format we use to implement schema.org, produces no measurable benefit in RAG systems that treat pages as flat text. And most RAG systems do exactly this: they take your page, convert it into text, and process it.
If you’re thinking “then structured data is useless for AI,” that conclusion is wrong. The same paper explains why. But first you need to understand what AI systems actually see when they visit your site.
What AI sees: text, not markup
The 2024 analysis by Gao et al., one of the most comprehensive references on RAG systems, clarifies a fundamental point about the data source:
“Unstructured Data, such as text, is the most widely used retrieval source.”
Unstructured text is the most widely used retrieval source. Not JSON-LD, not schema.org markup, not knowledge graphs. Text. Paragraphs, sentences, words in sequence.
This means that when a RAG system crawls your page to decide whether to use it as a source in an answer, in the vast majority of cases it’s reading the visible text. The JSON-LD block you inserted in the page’s head — the one with Organization, Person, your company data — is invisible to that system. It’s there, in the code, but it isn’t processed as a retrieval source.
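To make the point concrete, here is a minimal sketch of what "converting a page into text" can look like. It assumes a naive pipeline that keeps text nodes and drops script and style contents. Real RAG systems differ in their extraction step, and the class name, function name and sample page are hypothetical.

```python
from html.parser import HTMLParser


class VisibleTextExtractor(HTMLParser):
    """Collects text nodes and skips the contents of <script> and <style>."""

    SKIPPED = {"script", "style"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text that is outside script/style elements.
        if self._skip_depth == 0 and data.strip():
            self.chunks.append(data.strip())


def flatten(html: str) -> str:
    """Return the page as flat text, the way a simple extractor might."""
    parser = VisibleTextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


# Hypothetical page: the company data lives only in the JSON-LD block.
PAGE = """<html><head><title>About</title>
<script type="application/ld+json">
{"@context": "https://schema.org", "@type": "Organization",
 "name": "Company X", "foundingDate": "2010"}
</script></head>
<body><h1>About us</h1><p>We build software.</p></body></html>"""

flat_text = flatten(PAGE)
print(flat_text)
```

In this sketch the flattened text contains the heading and the paragraph, but neither "Company X" nor "2010", because both exist only inside the script element.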
Now the question becomes: if the RAG only reads the text, what is structured data for?
The turning point: materializing structured data in the text
Here comes the insight that changes the strategy. The same Volpini et al. paper tested a different approach: instead of leaving structured data buried in the JSON-LD, they materialized it into readable pages — they call them “enhanced entity pages.” Pages where the schema information (who you are, what you do, where you are, what relationships you have with other entities) is exposed as structured, readable text.
The result:
“Enhanced pages achieved +29.6% accuracy improvement for standard RAG.”
A 29.6% gain in accuracy. Not visibility, not ranking — accuracy. The RAG system, when it finds a page that presents structured data in readable format, produces more correct answers. And a more correct answer is a more likely answer — systems tend to prefer sources that reduce uncertainty.
The difference between hidden JSON-LD and the enhanced page is exactly this: in the first case the information exists but AI doesn’t see it; in the second case the information is in the text, in the flow the system processes, and it becomes part of the answer.
The dual strategy: parsers and RAG
This doesn’t mean you have to abandon JSON-LD. It means you have to work on two fronts at once.
The first front is traditional search engines. Google and Bing have dedicated parsers for JSON-LD. They read your Organization schema, your Person schema, your FAQ markup, and use them to feed the Knowledge Graph, show rich snippets, and validate your entity. For this channel, JSON-LD keeps working exactly as before, and if you already have a Knowledge Panel, it’s partly thanks to that.
The second front is generative AI. Here JSON-LD alone isn’t enough. You need pages that make explicit, in readable text, the information you normally hide in the markup. Who you are, what your company does, what your services are, who the key people are, what certifications you hold, where you operate.
In practice: if your Organization schema says "name": "Company X", "foundingDate": "2010", "areaServed": "Italy", this information must also exist as visible text on your About page. Not because Google needs it, since Google reads it from the JSON-LD, but because generative AI reads the text.
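As an illustration, the three fields from that example can be turned into a sentence for the About page. This is a minimal sketch: the function name and the sentence template are assumptions, and real About copy would be written by hand rather than generated.

```python
import json

# The Organization data from the example above, as it would appear in JSON-LD.
ORGANIZATION_JSON_LD = """
{
  "@context": "https://schema.org",
  "@type": "Organization",
  "name": "Company X",
  "foundingDate": "2010",
  "areaServed": "Italy"
}
"""


def organization_to_sentence(org: dict) -> str:
    """Build one readable sentence from name, foundingDate and areaServed."""
    sentence = org["name"]
    if "foundingDate" in org:
        sentence += f" was founded in {org['foundingDate']}"
        if "areaServed" in org:
            sentence += f" and serves customers in {org['areaServed']}"
    elif "areaServed" in org:
        sentence += f" serves customers in {org['areaServed']}"
    return sentence + "."


sentence = organization_to_sentence(json.loads(ORGANIZATION_JSON_LD))
print(sentence)
# Company X was founded in 2010 and serves customers in Italy.
```

The markup and the visible text then carry the same facts, one for the parsers and one for systems that read the page as text.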
What to implement: from markup to content
The strategy splits into two levels. The first is what you’re probably already doing (or should be): implementing schema.org in JSON-LD on the site’s key pages.
But the second level is the one that makes the difference for AI, and almost no one is doing it: creating pages that materialize that data into readable content.
Here’s what that means in practice. If you’ve implemented Organization schema with your company data, your About page must contain that same information in a discursive format — not a bare bulleted list, but text that a RAG system can extract and use as a source. If you have Person schema for your authors, every author must have a profile page that exposes skills, experience and affiliations in readable text. If you have FAQ schema, the questions and answers must be present on the page as visible content, not just in the markup.
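For the FAQ case, the same idea can be sketched as a small function that renders FAQPage markup as visible question-and-answer text. The mainEntity, name and acceptedAnswer properties are the standard schema.org ones. The sample questions and the function name are hypothetical.

```python
# Hypothetical FAQPage data, already parsed from the page's JSON-LD.
FAQ_JSON_LD = {
    "@context": "https://schema.org",
    "@type": "FAQPage",
    "mainEntity": [
        {
            "@type": "Question",
            "name": "Where does Company X operate?",
            "acceptedAnswer": {
                "@type": "Answer",
                "text": "Company X operates in Italy.",
            },
        },
        {
            "@type": "Question",
            "name": "When was Company X founded?",
            "acceptedAnswer": {
                "@type": "Answer",
                "text": "Company X was founded in 2010.",
            },
        },
    ],
}


def faq_to_visible_text(faq: dict) -> str:
    """Render each Question/Answer pair as plain text for the page body."""
    blocks = []
    for question in faq.get("mainEntity", []):
        answer = question.get("acceptedAnswer", {}).get("text", "")
        blocks.append(f"Q: {question['name']}\nA: {answer}")
    return "\n\n".join(blocks)


faq_text = faq_to_visible_text(FAQ_JSON_LD)
print(faq_text)
```

Generating the visible block and the markup from one shared source is one way to keep the two from drifting apart. Whether that fits depends on how the site is built.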
The principle is simple: everything you put in the JSON-LD must also exist in the text. JSON-LD speaks to Google’s and Bing’s parsers. The text speaks to generative AI. If you only have the first, you’re speaking to half the audience.
The most common mistake: markup without content
The mistake I see most often is exactly this: sites with a technically flawless schema.org, validated without errors, but with pages that don’t expose that information in readable text.
They have the complete Organization JSON-LD in the head, but the About page is three generic lines. They have Person schema for every author, but the author pages are stubs with a name and a photo. They have FAQ markup, but the FAQs exist only in the code, not on the page.
For Google, it works: the parser reads the JSON-LD directly and doesn’t depend on the visible text. For generative AI, it’s like having an ID card in a locked safe: it exists, but no one sees it.
A quick check you can do right now: take the three most important pages of your site. Look at the implemented JSON-LD. Then look at the page’s visible text. Does the information match? Does the text contain everything the markup declares? If there are gaps, those are the points where AI loses information about you. It’s a first step toward understanding where you stand — full implementation requires systematic work on page architecture and content structure.
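The check described above can be roughly automated. The sketch below assumes a simple rule: every string value declared in the JSON-LD should appear, case-insensitively, somewhere in the page's visible text. It is a crude substring match, so values such as URLs or differently formatted dates will show up as gaps even when that is harmless. All names and the sample page are hypothetical.

```python
import json
from html.parser import HTMLParser


class PageScanner(HTMLParser):
    """Separates JSON-LD script contents from the page's visible text."""

    def __init__(self):
        super().__init__()
        self._mode = "text"  # "text", "jsonld" or "skip"
        self._buffer = []
        self.jsonld_sources = []
        self.text_chunks = []

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            is_jsonld = dict(attrs).get("type") == "application/ld+json"
            self._mode = "jsonld" if is_jsonld else "skip"
            self._buffer = []
        elif tag == "style":
            self._mode = "skip"

    def handle_endtag(self, tag):
        if tag in ("script", "style"):
            if self._mode == "jsonld":
                self.jsonld_sources.append("".join(self._buffer))
            self._mode = "text"

    def handle_data(self, data):
        if self._mode == "jsonld":
            self._buffer.append(data)
        elif self._mode == "text":
            self.text_chunks.append(data)


def leaf_values(node, path=""):
    """Yield (path, value) for every string value, skipping @-keywords."""
    if isinstance(node, dict):
        for key, value in node.items():
            if key.startswith("@"):
                continue
            yield from leaf_values(value, f"{path}.{key}" if path else key)
    elif isinstance(node, list):
        for item in node:
            yield from leaf_values(item, path)
    elif isinstance(node, str):
        yield path, node


def find_gaps(html: str) -> list:
    """Return the JSON-LD values that do not appear in the visible text."""
    scanner = PageScanner()
    scanner.feed(html)
    scanner.close()
    visible = " ".join(" ".join(scanner.text_chunks).split()).lower()
    gaps = []
    for source in scanner.jsonld_sources:
        for path, value in leaf_values(json.loads(source)):
            if " ".join(value.split()).lower() not in visible:
                gaps.append((path, value))
    return gaps


# Hypothetical page: the founding year is declared in markup only.
PAGE = """<html><head>
<script type="application/ld+json">
{"@context": "https://schema.org", "@type": "Organization",
 "name": "Company X", "foundingDate": "2010", "areaServed": "Italy"}
</script></head>
<body><p>Company X builds software for clients in Italy.</p></body></html>"""

gaps = find_gaps(PAGE)
print(gaps)
# [('foundingDate', '2010')]
```

Each reported pair is a fact that the markup declares but the visible text never states, which is where to start rewriting.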
Why this is a competitive advantage
The good news is that almost no one is doing this dual implementation. Most sites that have schema.org stopped at the first level — the technical markup. Very few have understood that the markup must become content.
This connects directly to what I discussed in the articles on the weight of implicit mentions and on backlinks as a citation signal: AI visibility is built on multiple levels at once. Structured data is one of those levels — but only if you make it visible to all systems, not just Google.
Whoever implements the dual strategy now, with JSON-LD for the parsers and materialized content for AI, is building an advantage that consolidates over time. The more RAG systems become the dominant mode of search, the more that +29.6% accuracy improvement translates into concrete visibility. It’s mechanics, not prophecy: systems prefer sources that make information easy to extract.
Topical authority strengthens when your pages not only talk about your topic, but do so in a format AI can process without ambiguity. Structured data materialized in the text is exactly this: clear, verifiable information, ready to be extracted.