Content Structure for AI

Is your key information buried only in the text? With JSON-LD, AI reads it without errors

Roberto Serra · 25 June 2026 · ~8 min read

Are your prices, FAQs, and catalog buried in the page text? The AI has to extract them by reading, and it often gets them wrong or ignores them entirely — which means customers receive incorrect or incomplete information about you. There is a technical format that delivers that data to the AI ready to use, with no margin for error. Those who adopt it get cited accurately, gain credibility, and turn every AI answer into a reliable acquisition channel.

You have a product page with price, availability, average rating, technical specs. All scattered across paragraphs, tables, and sidebars. A human visitor finds them because they have eyes and context. An AI system that processes that page as flat text first has to figure out where the data is, then extract the value, then verify that it is up to date. Three steps, three chances for error.

JSON-LD eliminates those three steps. It is a block of code invisible to the visitor, perfectly readable by any parser, where every piece of data is labeled, typed, and associated with a precise entity. Price: 149 euros. Currency: EUR. Availability: InStock. Zero ambiguity, zero interpretation.

How JSON-LD works and why it differs from text

JSON-LD stands for JavaScript Object Notation for Linked Data. In practice, it is a `<script type="application/ld+json">` block inserted in the head or body of the page.

The difference from text is substantial. When you write “Our consulting service costs 1,500 euros per day,” the data is drowned in the syntax. A parser has to perform natural language analysis to extract the number and associate it with the service mentioned two lines above. With JSON-LD, that same information becomes:

```json
{
  "@type": "Service",
  "name": "AI strategic consulting",
  "offers": {
    "price": "1500",
    "priceCurrency": "EUR"
  }
}
```

The data is already in the format the system consumes. And this does not apply only to Google and Bing — it applies to any pipeline that can read structured data.
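
To make "already in the format the system consumes" concrete, here is a minimal sketch of how a parser might pull JSON-LD out of a page using only the Python standard library. The class name `JsonLdExtractor`, the helper `extract_jsonld`, and the sample HTML are my own illustrations, not how any specific search engine does it. The markup reuses the Service example above, with `@context` and an `Offer` type added as assumptions.

```python
import json
from html.parser import HTMLParser


class JsonLdExtractor(HTMLParser):
    """Collects the parsed content of every application/ld+json script block."""

    def __init__(self):
        super().__init__()
        self._in_jsonld = False
        self._buffer = []
        self.blocks = []

    def handle_starttag(self, tag, attrs):
        script_type = (dict(attrs).get("type") or "").lower()
        if tag == "script" and script_type == "application/ld+json":
            self._in_jsonld = True
            self._buffer = []

    def handle_data(self, data):
        if self._in_jsonld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self._in_jsonld = False
            try:
                self.blocks.append(json.loads("".join(self._buffer)))
            except json.JSONDecodeError:
                pass  # malformed block: skipped in this sketch


def extract_jsonld(html):
    """Return a list with one parsed object per JSON-LD block found in the HTML."""
    parser = JsonLdExtractor()
    parser.feed(html)
    parser.close()
    return parser.blocks


# Hypothetical page: the same fact appears as prose and as JSON-LD.
SAMPLE_HTML = """
<html><head>
<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "Service",
  "name": "AI strategic consulting",
  "offers": {"@type": "Offer", "price": "1500", "priceCurrency": "EUR"}
}
</script>
</head><body><p>Our consulting service costs 1,500 euros per day.</p></body></html>
"""

service = extract_jsonld(SAMPLE_HTML)[0]
print(service["offers"]["price"], service["offers"]["priceCurrency"])  # 1500 EUR
```

No language analysis is involved: the price is a dictionary lookup. Getting the same number out of the paragraph would require interpreting the sentence.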

The critical point: most RAG systems do not read JSON-LD

Here I have to be honest, because there is a lot of confusion on this topic. JSON-LD is extremely powerful for traditional search engines and for knowledge graphs. But most of the RAG systems that power generative AI engines do not process it.

The study by Volpini et al. (2026) on linked structured data as a memory layer for retrieval documents this explicitly:

“However, JSON-LD markup remains valuable for search engines with dedicated parsers (Google, Bing), but it provides no measurable benefit in RAG-based systems that treat pages as flat text.”
(Structured Linked Data as a Memory Layer for Agent-Orchestrated Retrieval)

Stop and consider this passage. It does not say JSON-LD is useless — it says it produces no measurable benefit in RAG systems that treat pages as flat text. And this is a fundamental distinction, because it tells you exactly where JSON-LD works and where it does not, and consequently how you should use it.

Traditional search engines such as Google and Bing have dedicated parsers that read the JSON-LD block, interpret it, and use it for rich snippets, the Knowledge Panel, and direct answers. When Google shows your product price directly in the search results, it is usually reading structured data such as your JSON-LD. And when AI engines draw on Google’s results to build their answers, which happens regularly, that structured data reaches them indirectly as well.

The dual strategy: JSON-LD plus structured text

From this evidence follows a precise operational approach. You do not have to choose between JSON-LD and well-structured text. You have to use both, each for what it does best.

JSON-LD for dedicated parsers. Google, Bing, and the systems that read structured data receive your information in a machine-parsable format, without ambiguity. This feeds knowledge graphs, rich snippets, and — indirectly — the AI answers that draw on these sources.

Structured text for RAG systems. The crawlers that process the page as flat text — and they are the majority of generative AI pipelines — need the same information to be present in the body of the page in a readable format. I covered this in depth in the article on HTML tables as a structured chunk: a table with clear headers and text values is the most effective format for making data extractable by retrieval.
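
One way to put the dual strategy into practice, sketched under my own assumptions, is to render both outputs from a single record, so the JSON-LD and the visible table cannot drift apart. The `PRODUCT` record, its field names, and the two helper functions are hypothetical; a real site would do this inside its CMS or template layer.

```python
import json
from html import escape

# Hypothetical product record; in practice it would come from your catalog or CMS.
PRODUCT = {
    "name": "Example product",
    "price": "149",
    "currency": "EUR",
    "availability": "InStock",
}


def render_jsonld(product):
    """Return a script block with Product markup, aimed at dedicated parsers."""
    data = {
        "@context": "https://schema.org",
        "@type": "Product",
        "name": product["name"],
        "offers": {
            "@type": "Offer",
            "price": product["price"],
            "priceCurrency": product["currency"],
            "availability": "https://schema.org/" + product["availability"],
        },
    }
    # Escape "<" so a value can never close the script element early.
    payload = json.dumps(data).replace("<", "\\u003c")
    return '<script type="application/ld+json">' + payload + "</script>"


def render_table(product):
    """Return a plain HTML table with the same facts, aimed at text-based retrieval."""
    rows = [
        ("Product", product["name"]),
        ("Price", product["price"] + " " + product["currency"]),
        ("Availability", product["availability"]),
    ]
    body = "".join(
        '<tr><th scope="row">' + escape(label) + "</th><td>" + escape(value) + "</td></tr>"
        for label, value in rows
    )
    return "<table>" + body + "</table>"


print(render_jsonld(PRODUCT))
print(render_table(PRODUCT))
```

Because both functions read the same record, updating the price in one place updates both channels.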

In the survey by Gao et al. (2024) on RAG systems, the distinction between data sources is clear:

“Structured data, such as knowledge graphs (KGs), which are typically verified and can provide more precise information.”
(Retrieval-Augmented Generation for Large Language Models: A Survey)

Structured data is typically verified and provides more precise information. JSON-LD feeds exactly those knowledge graphs. Every time you declare a Product, Service, or LocalBusiness entity with the correct fields, you are contributing to the structured knowledge base that AI engines draw on to give precise answers.

Which schemas to implement and with which fields

Not all JSON-LD schemas have the same impact. Three types cover the vast majority of use cases for a company that wants to be visible in AI answers.

Product. If you sell products, every product page should have a Product schema with: name, description, brand, offers (with price, priceCurrency, availability), aggregateRating (if you have reviews), image, sku. These are the fields Google reads for rich snippets and that feed answers to queries like “how much does X cost” or “is X available?”.

Service. If you offer services, the Service schema lets you declare: name, description, provider (your company), areaServed, serviceType. Add offers if you have defined pricing. This schema is less common than Product, which means less competition — and a higher chance that your data surfaces when someone asks “who offers service Y in area Z”.

LocalBusiness. For any business with a physical location or a defined service area: name, address, telephone, openingHours, geo (coordinates), sameAs (links to social profiles and directories). This schema complements your Google Business Profile and feeds local knowledge graphs, which makes it a goldmine for anyone who wants to appear in answers to geolocated queries.

For each of these schemas, there is one rule: fill in every field for which you have real data. A field filled with made-up data is worse than an empty field, because verification systems cross-reference information and penalize inconsistencies.
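
The "real data only" rule can be enforced in code rather than by discipline. The sketch below builds a Product object from a record and drops every field that has no value, instead of padding it with placeholders. The function names (`prune_empty`, `build_product`) and the record keys are hypothetical. The schema.org property names are the ones listed above.

```python
import json


def _is_empty(value):
    """True for None, empty strings and containers, and nodes left with only @-keys."""
    if value is None or value == "" or value == [] or value == {}:
        return True
    # A nested node that only carries "@type" holds no real data.
    return isinstance(value, dict) and all(key.startswith("@") for key in value)


def prune_empty(value):
    """Recursively remove empty fields from dicts and lists."""
    if isinstance(value, dict):
        cleaned = {key: prune_empty(item) for key, item in value.items()}
        return {key: item for key, item in cleaned.items() if not _is_empty(item)}
    if isinstance(value, list):
        cleaned = [prune_empty(item) for item in value]
        return [item for item in cleaned if not _is_empty(item)]
    return value


def build_product(record):
    """Map a catalog record (hypothetical keys) to a Product object without empty fields."""
    raw = {
        "@context": "https://schema.org",
        "@type": "Product",
        "name": record.get("name"),
        "description": record.get("description"),
        "sku": record.get("sku"),
        "image": record.get("image"),
        "brand": {"@type": "Brand", "name": record.get("brand")},
        "offers": {
            "@type": "Offer",
            "price": record.get("price"),
            "priceCurrency": record.get("currency"),
            "availability": record.get("availability"),
        },
        "aggregateRating": {
            "@type": "AggregateRating",
            "ratingValue": record.get("rating_value"),
            "reviewCount": record.get("review_count"),
        },
    }
    return prune_empty(raw)


# A product with no reviews and no brand yet: those nodes disappear entirely.
record = {"name": "Example product", "sku": "EX-001", "price": "149", "currency": "EUR"}
print(json.dumps(build_product(record), indent=2))
```

The design choice is that a missing rating produces no `aggregateRating` node at all, which is safer than emitting a node with invented or zero values.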

The mistake I see most often

The mistake is not “I don’t have JSON-LD.” The mistake is “I have JSON-LD but it is incomplete or disconnected from the page content.”

I have analyzed dozens of Italian company websites. The typical situation: an Organization schema in the site footer, identical on every page, with name and logo. That’s it. No Product schema on product pages. No Service schema on service pages. Sometimes a FAQ schema implemented by the SEO plugin but with generic questions that do not correspond to any real content on the page.

The second mistake is the mismatch between JSON-LD and text. If your JSON-LD declares a price of 149 euros but the visible price on the page is 159 euros (perhaps because you updated one and not the other), any system that cross-checks the two sees the inconsistency. The result: the data is treated as unreliable and is not used.

The same survey reiterates that unstructured text remains the most widely used retrieval source:

“Unstructured Data, such as text, is the most widely used retrieval source, which are mainly gathered from corpus.”
(Retrieval-Augmented Generation for Large Language Models: A Survey)

If the page text is the primary source for RAG systems, then JSON-LD alone is not enough. But if you add JSON-LD to structured text, you cover both channels: dedicated parsers and text-based retrieval. And whoever covers both channels has an advantage over whoever covers only one.

A starting check on your pages

Open a key page on your site, such as the one for your flagship product or main service. Go to Google’s Rich Results Test and paste the URL. If the tool detects no schema, the first step is to implement one. If it detects only a generic schema (just Organization), the next step is to add the specific schemas for the content type.

Then compare the data in the JSON-LD with what is visible on the page. Price, availability, product name, rating: they must match. A discrepancy is a problem you can fix in ten minutes and that could be blocking the citation of your data.
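
The comparison can be partly automated. What follows is a rough sketch, not a production validator: it looks for euro amounts in the visible text and checks whether the price declared in the JSON-LD is among them. The function names and the regular expression are my own. The number parsing is a heuristic that treats a separator followed by exactly three digits as a thousands separator, so it will misread some formats.

```python
import re
from decimal import Decimal, InvalidOperation

# Matches "€ 149", "149 €", "149 EUR", "1,500 euros" (heuristic, euro prices only).
PRICE_PATTERN = re.compile(
    r"€\s*(\d[\d.,]*)|(\d[\d.,]*)\s*(?:€|EUR|euros?)",
    re.IGNORECASE,
)


def parse_amount(raw):
    """Turn '1,500', '1.500,00' or '149.90' into a Decimal."""
    text = raw.strip().rstrip(".,")  # drop sentence punctuation caught by the regex
    # A separator followed by exactly three digits is assumed to group thousands.
    text = re.sub(r"[.,](?=\d{3}(?:\D|$))", "", text)
    return Decimal(text.replace(",", "."))


def visible_prices(page_text):
    """Return every euro amount found in the visible text."""
    prices = []
    for match in PRICE_PATTERN.finditer(page_text):
        raw = match.group(1) or match.group(2)
        try:
            prices.append(parse_amount(raw))
        except InvalidOperation:
            continue  # not a parsable number: ignored in this sketch
    return prices


def price_matches(jsonld, page_text):
    """True if the price declared in the JSON-LD also appears in the visible text."""
    declared = Decimal(str(jsonld["offers"]["price"]))
    return declared in visible_prices(page_text)


markup = {"@type": "Product", "offers": {"price": "149", "priceCurrency": "EUR"}}
page_text = "Example product. Price: 159 euros, shipping included."
print(price_matches(markup, page_text))  # False: the markup says 149, the page says 159
```

A `False` here is exactly the 149 versus 159 situation described earlier: one of the two values was updated and the other was not.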

It is a surface-level check, of course. To understand how AI crawlers are processing your pages and whether your structured data makes it all the way to the answers, you need deeper tools. But this check already tells you whether you are in the game or sitting in the stands.

If you want to dig into the other formats that make your content citable by AI pipelines, in the article on the FAQ, HowTo, and Article schemas I explain how these schemas communicate directly with AI engines. In the one on in-content citations and bibliography you will find the format that AI recognizes and reproduces almost verbatim. And if you are thinking about how to turn your best content into standalone assets, I wrote a deep dive on downloadable content as an authority signal.

Every piece of data that today is buried in the text without a machine-readable label is visibility you are handing over to whoever did add that label. JSON-LD plus structured text is the combination that covers all channels — today’s and tomorrow’s.