Authority and Credibility for AI

Structured data is your site’s ID card for AI

Roberto Serra · 25 June 2026 · ~8 min read

If your site doesn’t have a machine-readable digital ID card, AI has to guess who you are, what you do and where you operate. When it guesses wrong, your potential customer receives incorrect information about you. Meanwhile, a competitor who made the same information immediately readable gets cited in your place. There are standard instructions, called structured data, that tell AI exactly what you are. You install them once and they work for you on every page of the site.

There’s a widespread belief among those who work with schema.org: you add the JSON-LD markup, Google shows the rich snippet, end of story. For years it worked that way. Structured data was a cosmetic bonus — the little star in the SERPs, the price in the result, the breadcrumb under the title.

But the game is now played on a different field. The AI models that power the answers of ChatGPT, Perplexity and Gemini aren’t looking for rich snippets. They’re looking for verifiable, machine-readable information they can use to decide whether to trust a source. This is where structured data comes into play, in a way that I’ve seen very few people consider in my professional practice, and with a paradox that I’ll explain starting from the data.

The JSON-LD paradox: useful, but not the way you think

Let’s start with the data point that breaks the mental model of anyone who works with markup. A 2026 paper by Volpini et al. analyzed the real impact of structured data on RAG systems — those that search for information in real time before generating an answer:

“JSON-LD markup remains valuable for search engines with dedicated parsers (Google, Bing), but it provides no measurable benefit in RAG-based systems that treat pages as flat text.”

Volpini et al., 2026

Pause on this for a second. JSON-LD, the format we use to implement schema.org, produces no measurable benefit in RAG systems that treat pages as flat text. And most RAG systems do exactly this: they take your page, convert it into text, and process it.

If you’re thinking “then structured data is useless for AI,” that conclusion is wrong. The same paper explains why. But first you need to understand what AI systems actually see when they visit your site.

What AI sees: text, not markup

The 2024 analysis by Gao et al., one of the most comprehensive references on RAG systems, clarifies a fundamental point about the data source:

“Unstructured Data, such as text, is the most widely used retrieval source.”

Gao et al., 2024

Unstructured text is the most widely used retrieval source. Not JSON-LD, not schema.org markup, not knowledge graphs. Text. Paragraphs, sentences, words in sequence.

This means that when a RAG system crawls your page to decide whether to use it as a source in an answer, in the vast majority of cases it’s reading the visible text. The JSON-LD block you inserted in the page’s head — the one with Organization, Person, your company data — is invisible to that system. It’s there, in the code, but it isn’t processed as a retrieval source.
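
To make this concrete, here is a minimal sketch of what a flat-text pipeline might do with a page. It is an illustration, not how any specific RAG product works: the sample page, the `VisibleTextExtractor` class and the `page_to_flat_text` function are all hypothetical names, and real systems use their own extraction logic. The only point is that a typical HTML-to-text step discards the contents of `<script>` tags, JSON-LD included.

```python
from html.parser import HTMLParser


class VisibleTextExtractor(HTMLParser):
    """Collects text nodes, skipping anything inside <script> or <style>."""

    SKIPPED_TAGS = {"script", "style"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text that is not inside a skipped tag.
        if self._skip_depth == 0 and data.strip():
            self.chunks.append(data.strip())


def page_to_flat_text(html: str) -> str:
    """Return the text a naive HTML-to-text converter would keep."""
    parser = VisibleTextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


# Hypothetical page: the company facts live only in the JSON-LD block.
SAMPLE_PAGE = """
<html>
  <head>
    <title>About us</title>
    <script type="application/ld+json">
      {"@context": "https://schema.org", "@type": "Organization",
       "name": "Company X", "foundingDate": "2010", "areaServed": "Italy"}
    </script>
  </head>
  <body>
    <h1>About us</h1>
    <p>We help businesses grow.</p>
  </body>
</html>
"""

flat_text = page_to_flat_text(SAMPLE_PAGE)
print(flat_text)
```

In this sample, the company name, founding year and service area never reach the flat text, because they exist only inside the script block.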

Now the question becomes: if the RAG only reads the text, what is structured data for?

The turning point: materializing structured data in the text

Here comes the insight that changes the strategy. The same Volpini et al. paper tested a different approach: instead of leaving structured data buried in the JSON-LD, they materialized it into readable pages — they call them “enhanced entity pages.” Pages where the schema information (who you are, what you do, where you are, what relationships you have with other entities) is exposed as structured, readable text.

The result:

“Enhanced pages achieved +29.6% accuracy improvement for standard RAG.”

Volpini et al., 2026

A 29.6% gain in accuracy. Not visibility, not ranking: accuracy. When the RAG system finds a page that presents structured data in readable format, it produces more correct answers. And a source that yields more correct answers is more likely to be used, because systems tend to prefer sources that reduce uncertainty.

The difference between hidden JSON-LD and the enhanced page is exactly this: in the first case the information exists but AI doesn’t see it; in the second case the information is in the text, in the flow the system processes, and it becomes part of the answer.

The dual strategy: parsers and RAG

This doesn’t mean you have to abandon JSON-LD. It means you have to work on two fronts at once.

The first front is traditional search engines. Google and Bing have dedicated parsers for JSON-LD. They read your Organization schema, your Person schema and your FAQ markup, and use them to feed the Knowledge Graph, show rich snippets, and validate your entity. For this channel, JSON-LD keeps working exactly as before, and if you already have a Knowledge Panel, it’s partly thanks to that.

The second front is generative AI. Here JSON-LD alone isn’t enough. You need pages that make explicit, in readable text, the information you normally hide in the markup: who you are, what your company does, what your services are, who the key people are, what certifications you hold, where you operate.

In practice: if your Organization schema says "name": "Company X", "foundingDate": "2010", "areaServed": "Italy", this information must also exist as visible text on your About page. Not because Google needs it, since Google reads it from the JSON-LD, but because generative AI reads the text.
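
One way to keep the two in sync is to generate both from the same record. The sketch below is a hedged illustration under assumptions: the `ORGANIZATION` dictionary reuses the example values from the text, while the function names and the exact wording of the sentence are hypothetical and should be adapted to your own templates.

```python
import json
from html import escape

# Hypothetical single source of truth, using the example values above.
ORGANIZATION = {
    "@context": "https://schema.org",
    "@type": "Organization",
    "name": "Company X",
    "foundingDate": "2010",
    "areaServed": "Italy",
}


def render_json_ld(entity: dict) -> str:
    """Serialize the entity as a JSON-LD script tag for parsers."""
    payload = json.dumps(entity, ensure_ascii=False, indent=2)
    # Escape "</" so a value can never close the script tag early.
    payload = payload.replace("</", "<\\/")
    return f'<script type="application/ld+json">\n{payload}\n</script>'


def render_about_paragraph(entity: dict) -> str:
    """State the same facts as a visible sentence for text-based systems."""
    sentence = (
        f"{entity['name']} was founded in {entity['foundingDate']} "
        f"and serves customers in {entity['areaServed']}."
    )
    return f"<p>{escape(sentence)}</p>"


print(render_json_ld(ORGANIZATION))
print(render_about_paragraph(ORGANIZATION))
```

Because both outputs come from one dictionary, a change to the founding year or the service area updates the markup and the visible text together.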

What to implement: from markup to content

The strategy splits into two levels. The first is what you’re probably already doing (or should be): implementing schema.org in JSON-LD on the site’s key pages.

But the second level is the one that makes the difference for AI, and almost no one is doing it: creating pages that materialize that data into readable content.

Here’s what that means in practice. If you’ve implemented Organization schema with your company data, your About page must contain that same information as running prose: not a bare bulleted list, but text that a RAG system can extract and use as a source. If you have Person schema for your authors, every author must have a profile page that exposes skills, experience and affiliations in readable text. If you have FAQ schema, the questions and answers must be present on the page as visible content, not just in the markup.
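
The FAQ case can be sketched the same way. In this example the questions, the answers and the function names are hypothetical; `FAQPage`, `Question`, `acceptedAnswer` and `Answer` are the schema.org terms normally used for this markup. The idea is simply that the visible section and the markup are built from one list, so neither can contain something the other lacks.

```python
import json
from html import escape

# Hypothetical questions and answers, for illustration only.
FAQS = [
    ("Where does Company X operate?", "Company X operates throughout Italy."),
    ("When was Company X founded?", "Company X was founded in 2010."),
]


def faq_json_ld(faqs) -> dict:
    """Build FAQPage markup from (question, answer) pairs."""
    return {
        "@context": "https://schema.org",
        "@type": "FAQPage",
        "mainEntity": [
            {
                "@type": "Question",
                "name": question,
                "acceptedAnswer": {"@type": "Answer", "text": answer},
            }
            for question, answer in faqs
        ],
    }


def faq_visible_html(faqs) -> str:
    """Build the visible FAQ section from the same pairs."""
    items = [
        f"<h3>{escape(question)}</h3>\n<p>{escape(answer)}</p>"
        for question, answer in faqs
    ]
    return (
        "<section>\n<h2>Frequently asked questions</h2>\n"
        + "\n".join(items)
        + "\n</section>"
    )


print(json.dumps(faq_json_ld(FAQS), indent=2))
print(faq_visible_html(FAQS))
```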

The principle is simple: everything you put in the JSON-LD must also exist in the text. JSON-LD speaks to Google’s and Bing’s parsers. The text speaks to generative AI. If you only have the first, you’re speaking to half the audience.

The most common mistake: markup without content

The mistake I see most often is exactly this: sites with a technically flawless schema.org, validated without errors, but with pages that don’t expose that information in readable text.

They have the complete Organization JSON-LD in the head, but the About page is three generic lines. They have Person schema for every author, but the author pages are stubs with a name and a photo. They have FAQ markup, but the FAQs exist only in the code, not on the page.

For Google, it works. The parser reads the JSON-LD and doesn’t depend on the visible text. For generative AI, it’s like having an ID card in a locked safe: it exists, but no one sees it.

A quick check you can do right now: take the three most important pages of your site. Look at the implemented JSON-LD. Then look at the page’s visible text. Does the information match? Does the text contain everything the markup declares? If there are gaps, those are the points where AI loses information about you. It’s a first step toward understanding where you stand — full implementation requires systematic work on page architecture and content structure.
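
The check described above can be roughly automated. The following is a minimal sketch under stated assumptions: `PageScanner`, `declared_values` and `find_gaps` are hypothetical helpers, the sample page is invented, and the comparison is a crude case-insensitive substring match. It will flag values that the text expresses differently (for example a full ISO date in the markup and only a year in the prose), so treat its output as a list of things to review by hand.

```python
import json
from html.parser import HTMLParser


class PageScanner(HTMLParser):
    """Separates JSON-LD blocks from the text a reader would see."""

    def __init__(self):
        super().__init__()
        self._mode = "text"  # "text", "jsonld" or "skip"
        self._buffer = []
        self.json_ld_blocks = []
        self.text_chunks = []

    def handle_starttag(self, tag, attrs):
        if tag == "script":
            is_json_ld = dict(attrs).get("type") == "application/ld+json"
            self._mode = "jsonld" if is_json_ld else "skip"
            self._buffer = []
        elif tag == "style":
            self._mode = "skip"

    def handle_endtag(self, tag):
        if tag == "script" and self._mode == "jsonld":
            self.json_ld_blocks.append("".join(self._buffer))
        if tag in ("script", "style"):
            self._mode = "text"

    def handle_data(self, data):
        if self._mode == "jsonld":
            self._buffer.append(data)
        elif self._mode == "text" and data.strip():
            self.text_chunks.append(data.strip())


def declared_values(node, key=""):
    """Yield (property, value) for every string leaf, skipping @-keywords."""
    if isinstance(node, dict):
        for child_key, child in node.items():
            if not child_key.startswith("@"):
                yield from declared_values(child, child_key)
    elif isinstance(node, list):
        for item in node:
            yield from declared_values(item, key)
    elif isinstance(node, str):
        yield key, node


def find_gaps(html: str) -> list:
    """Return declared (property, value) pairs missing from the visible text."""
    scanner = PageScanner()
    scanner.feed(html)
    scanner.close()
    visible = " ".join(scanner.text_chunks).lower()
    gaps = []
    for block in scanner.json_ld_blocks:
        for prop, value in declared_values(json.loads(block)):
            if value.lower() not in visible:
                gaps.append((prop, value))
    return gaps


# Hypothetical page: the founding year is declared but never written out.
SAMPLE_PAGE = """
<html><head><title>About</title>
<script type="application/ld+json">
{"@context": "https://schema.org", "@type": "Organization",
 "name": "Company X", "foundingDate": "2010", "areaServed": "Italy"}
</script></head>
<body><h1>About Company X</h1>
<p>Company X helps businesses grow across Italy.</p></body></html>
"""

print(find_gaps(SAMPLE_PAGE))
```

On this sample the only pair reported is the founding date, which the markup declares and the visible text never mentions.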

Why this is a competitive advantage

The good news is that almost no one is doing this dual implementation. Most sites that have schema.org stopped at the first level — the technical markup. Very few have understood that the markup must become content.

This connects directly to what I discussed in the articles on the weight of implicit mentions and on backlinks as a citation signal: AI visibility is built on multiple levels at once. Structured data is one of those levels — but only if you make it visible to all systems, not just Google.

Whoever implements the dual strategy now (JSON-LD for the parsers, materialized content for AI) is building an advantage that consolidates over time. The more RAG systems become the dominant mode of search, the more that +29.6% accuracy gain translates into concrete visibility. It’s mechanics, not prophecy: systems prefer sources that make information easy to extract.

Topical authority strengthens when your pages not only talk about your topic, but do so in a format AI can process without ambiguity. Structured data materialized in the text is exactly this: clear, verifiable information, ready to be extracted.

The author

Roberto Serra at the Senate of the Republic, Palazzo Giustiniani, at the conference “The power of artificial intelligence”.

Roberto Serra

SEO consultant for over 15 years, founder of the Serra SEO Agency (RAANK). He helps multinationals and SMEs stay visible where search is moving: ChatGPT, Perplexity, Gemini and Google's AI Overviews.
