Authority and Credibility for AI

Wrong semantic HTML = AI doesn’t understand your content’s hierarchy

Roberto Serra · 25 June 2026 · ~7 min read

Your site looks great visually, but behind the scenes the code is a mess: headings used at random, tables built in a non-standard way, a structure with no logic. For a human visitor nothing changes, but for AI it's noise: it can't tell what matters from what doesn't, and in the end it may use another source. It's not a Google ranking problem but a comprehension problem: the AI loses key information about you because it can't read you precisely. The basic fixes often take less than a morning, and the cleaner structure is available to AI systems the next time they fetch the page.

Do you know the first test I run when I analyze a site that doesn’t show up in AI answers? I don’t look at the content. I don’t look at the backlinks. I open the source code and look at the HTML structure. And in most cases, I find the same problem: headings that jump from the main title straight to a third-level subheading, sections with no landmarks, content blocks floating in the markup without any hierarchical relationship between them.

The thing is, to a human reader the page looks perfect. The design is clean, the fonts are right, the text flows well. But AI doesn’t see the design. It sees the code. And if the code doesn’t have a coherent semantic structure, the content loses its hierarchy — and content without hierarchy is content that AI has a harder time processing, breaking into useful chunks, and returning as an answer.

The markup AI actually reads

When I talk about semantic markup I’m not talking about code aesthetics or best practices for fussy developers. I’m talking about the way a retrieval-augmented generation (RAG) system, the kind that powers the answers from ChatGPT, Perplexity and Gemini, interprets the structure of your page to decide what to extract.

RAG systems convert pages into text and then break them into chunks. But they don’t cut at random: they use the document’s structural signals to understand where one concept ends and another begins. Section headings are the strongest signals. A correct hierarchical heading structure — main title, subsections, sub-subsections — creates a map that the system uses to isolate self-contained blocks of information.
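To make the idea concrete, here is a minimal sketch of heading-based chunking. It is an illustration under stated assumptions, not the pipeline any specific AI product uses: the class `HeadingChunker`, the function `chunk_by_headings` and the sample page are all hypothetical, and real systems typically combine headings with size limits and other signals.

```python
from html.parser import HTMLParser

HEADING_TAGS = {"h1", "h2", "h3", "h4", "h5", "h6"}


class HeadingChunker(HTMLParser):
    """Hypothetical chunker: starts a new chunk at every heading element."""

    def __init__(self):
        super().__init__()
        self.chunks = []          # one dict per chunk: level, title, text
        self._in_heading = False  # True while inside an <h1>..<h6> element

    def handle_starttag(self, tag, attrs):
        if tag in HEADING_TAGS:
            self._in_heading = True
            self.chunks.append({"level": int(tag[1]), "title": "", "text": ""})

    def handle_endtag(self, tag):
        if tag in HEADING_TAGS:
            self._in_heading = False

    def handle_data(self, data):
        if not self.chunks:
            return  # text before the first heading is ignored in this sketch
        key = "title" if self._in_heading else "text"
        self.chunks[-1][key] += data


def chunk_by_headings(html):
    """Return the page as a list of chunks, one per heading."""
    parser = HeadingChunker()
    parser.feed(html)
    parser.close()
    # Collapse runs of whitespace so each chunk is a clean line of text.
    return [
        {
            "level": c["level"],
            "title": " ".join(c["title"].split()),
            "text": " ".join(c["text"].split()),
        }
        for c in parser.chunks
    ]


# Sample page, invented for the example.
page = """
<h1>Espresso guide</h1>
<p>Intro paragraph.</p>
<h2>Grind size</h2>
<p>Use a fine grind.</p>
<h2>Water temperature</h2>
<p>Aim for about 93 C.</p>
"""

for chunk in chunk_by_headings(page):
    print(chunk["level"], chunk["title"], "->", chunk["text"])
```

With clean headings, every chunk carries its own title and level. If the headings are missing or random, the same splitter has nothing reliable to cut on.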

Volpini et al. in 2026 precisely defined the advantage of pages with rich semantic structure:

“Enhanced pages transform opaque entity URIs into readable, structured information by resolving linked relationships and presenting them as human-readable content.”

Volpini et al., 2026

“Readable, structured information” — that’s the key. Pages that turn opaque information into readable, structured content are the ones AI systems can process with less ambiguity. And HTML semantic structure is the first layer of this transformation: without it, the content is flat, undifferentiated text, with no anchor points for retrieval.

Why JSON-LD isn’t enough

If you’ve read my article on structured data, you already know that JSON-LD has a paradox: it works for Google and Bing parsers, but it produces no measurable benefits in RAG systems. The same paper by Volpini et al. says so explicitly:

“JSON-LD markup remains valuable for search engines with dedicated parsers (Google, Bing), but it provides no measurable benefit in RAG-based systems that treat pages as flat text.”

Volpini et al., 2026

That’s why HTML semantic markup becomes essential. JSON-LD lives in a <script type="application/ld+json"> block, usually in the page head, and its contents are typically dropped when the page is converted to text, so it stays invisible to what the RAG system processes. Semantic markup, on the other hand, lives inside the text: it’s the headings that give hierarchy, and the <header>, <nav>, <main>, <footer> tags that define the logical boundaries of the content. When the system converts your page into flat text, these structural signals guide the segmentation.
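A small sketch shows why the JSON-LD disappears. It assumes the converter skips the contents of <script> and <style> elements, which is common for text extraction but not guaranteed for every pipeline. `TextFlattener`, `flatten` and the sample page are hypothetical names made up for the example.

```python
from html.parser import HTMLParser


class TextFlattener(HTMLParser):
    """Hypothetical converter: keeps visible text, skips <script> and <style>."""

    SKIPPED = {"script", "style"}

    def __init__(self):
        super().__init__()
        self.parts = []        # visible text fragments, in document order
        self._skip_depth = 0   # greater than 0 while inside a skipped element

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        # Keep only non-empty text that is outside script/style blocks.
        if not self._skip_depth and data.strip():
            self.parts.append(" ".join(data.split()))


def flatten(html):
    """Return the visible text of the page, one fragment per line."""
    parser = TextFlattener()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.parts)


# Sample page, invented for the example.
page = """
<html><head>
<script type="application/ld+json">{"@type": "Organization", "name": "Acme Srl"}</script>
</head><body>
<main><h1>Acme roasts coffee</h1><p>Founded in Turin.</p></main>
</body></html>
"""

print(flatten(page))
```

The organization data in the script block never reaches the flat text, while the heading and paragraph do. A converter that also tracks which tag each fragment came from can use the headings and landmarks as cut points.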

The difference between a page with semantic markup and one without is the difference between a book with a table of contents and chapters and a wall of text with no breaks. Both contain the same words. But one is navigable, the other isn’t.

The data point that changes the perspective

When Volpini et al. compared pages with rich semantic structure (the “enhanced pages”) against those with JSON-LD only, the result was clear-cut:

“Enhanced pages exposed 2.4x more discoverable links than JSON-LD pages (102.2 vs. 41.9).”

Volpini et al., 2026

2.4 times more discoverable links (102.2 / 41.9 ≈ 2.44). This doesn’t just mean “more links on the page”: it means the system manages to discover and follow 2.4 times more connections when the HTML structure is semantically rich. The relationships between entities, the links between concepts, and the cross-references become accessible because the structure makes them explicit.

In practical terms: if your page has correct hierarchical headings, landmarks that delimit the sections, and ARIA attributes where they’re needed to clarify the role of the components, the AI system can extract more useful information from the same amount of content. Not because the content is different, but because the structure makes it readable.

The mistakes I see most often

After analyzing hundreds of sites, the wrong patterns repeat themselves. The first is the heading that skips levels: from the main title you jump straight to a third-level heading because “visually the font was too big”. The problem is that the choice of heading shouldn’t depend on the design — that’s what CSS is for. The heading defines the logical hierarchy of the document, and if it skips a level, the system loses a step in the structure.

The second mistake is using <div> for everything. A site built entirely with generic divs is like a building with no signs: the rooms exist, but no one knows which is the entrance, which is the living room, which is the closet. Semantic tags — <header>, <nav>, <main>, <article>, <footer> — are those signs. They tell the system what each block contains before it even reads it.

The third is the most insidious: headings used for decorative purposes. Text gets marked up as a heading just because the CMS formats it a certain way, with no relationship to the content’s hierarchy. Every out-of-place heading is a false signal that confuses the segmentation.

And then there’s a fourth pattern that isn’t strictly a technical error, but causes the same damage: pages where the main content is drowned among sidebars, widgets, banners and repeated blocks. If the <main> tag contains more noise than signal, the chunk the system extracts will be diluted. The ratio between useful content and accessory markup matters — and a <main> landmark that wraps only the relevant content helps the system isolate what’s worth processing.
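One rough way to put a number on this is to measure how much of the page’s visible text sits inside <main>. The sketch below is only a proxy under my own assumptions: it counts non-whitespace characters, it cannot tell whether what is inside <main> is itself noise, and the names `MainShare` and `main_text_share` are hypothetical.

```python
from html.parser import HTMLParser


class MainShare(HTMLParser):
    """Hypothetical counter: visible characters inside <main> vs. the whole page."""

    def __init__(self):
        super().__init__()
        self.inside = 0        # non-whitespace characters inside <main>
        self.total = 0         # non-whitespace characters on the whole page
        self._main_depth = 0   # greater than 0 while inside <main>
        self._skip_depth = 0   # greater than 0 while inside <script>/<style>

    def handle_starttag(self, tag, attrs):
        if tag == "main":
            self._main_depth += 1
        elif tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag == "main" and self._main_depth:
            self._main_depth -= 1
        elif tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth:
            return  # script and style contents are not visible text
        count = len("".join(data.split()))
        self.total += count
        if self._main_depth:
            self.inside += count


def main_text_share(html):
    """Return the fraction of visible text that falls inside <main> (0.0 to 1.0)."""
    parser = MainShare()
    parser.feed(html)
    parser.close()
    return parser.inside / parser.total if parser.total else 0.0


# Sample page, invented for the example.
page = """
<nav>Home Blog Contact</nav>
<main><h1>Title</h1><p>Useful text here.</p></main>
<aside>Newsletter banner</aside>
"""

print(f"Share of text inside <main>: {main_text_share(page):.0%}")
```

A low share suggests that navigation, widgets and banners outweigh the content the landmark is supposed to isolate. What counts as "low" depends on the template, so treat the number as a prompt to look at the page, not as a threshold.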

What you can check right away

Open the source code of your main pages and check three things. First: is there a single main title per page? Second: do the section titles follow a hierarchical order with no jumps? Third: is the main content wrapped in a <main> tag or at least in a tag with an explicit role?

If even one of these checks fails, the AI system is working harder than necessary to understand the structure of your content. You can use the HeadingsMap browser extension to view the page’s heading tree in a second — it instantly shows you whether there are jumps or inconsistencies in the hierarchy.
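The three checks above can also be scripted. This is a minimal sketch rather than a complete audit: `StructureAudit` and `audit` are hypothetical names, the script only looks at the HTML it is given (not at content injected later by JavaScript), and it does not verify that the first heading on the page is the main title.

```python
from html.parser import HTMLParser


class StructureAudit(HTMLParser):
    """Hypothetical audit: records heading levels and the main landmark."""

    def __init__(self):
        super().__init__()
        self.heading_levels = []  # e.g. [1, 2, 2, 3] in document order
        self.has_main = False

    def handle_starttag(self, tag, attrs):
        # Matches h1..h6 only (so <hr> and <head> are not counted).
        if len(tag) == 2 and tag[0] == "h" and tag[1] in "123456":
            self.heading_levels.append(int(tag[1]))
        # Accept either the <main> element or an explicit role="main".
        if tag == "main" or dict(attrs).get("role") == "main":
            self.has_main = True


def audit(html):
    """Run the three checks and return the results as a dict."""
    parser = StructureAudit()
    parser.feed(html)
    parser.close()
    levels = parser.heading_levels
    # A skip is any heading more than one level deeper than the previous one.
    skips = [(prev, cur) for prev, cur in zip(levels, levels[1:]) if cur > prev + 1]
    return {
        "single_h1": levels.count(1) == 1,
        "no_skipped_levels": not skips,
        "has_main": parser.has_main,
        "skips": skips,
    }


# Sample page, invented for the example: one h1, a jump to h3, no <main>.
page = "<div><h1>Title</h1><h3>Jumped</h3><h2>Section</h2></div>"

print(audit(page))
```

On the sample page the first check passes and the other two fail, which is the typical profile of a template where headings were chosen by font size.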

This is a first step to spotting the surface-level problems. But semantic structure goes beyond headings: landmarks, ARIA attributes, template organization, and the ratio between content and accessory markup are interventions that require specific technical expertise and an overall vision of how the site communicates with AI crawlers.

The thread that holds it all together

Semantic markup isn’t an isolated aspect. It connects to everything I’ve talked about in the articles on crawlability — because if the crawler reaches your page but finds a flat structure, the content it extracts will be less usable. It connects to page experience because the technical quality signals add up. It connects to HTTPS because technical trust is a package, not a single factor.

And if you think structured data in JSON-LD solves the problem, I invite you to reread what I wrote about the dual strategy: JSON-LD speaks to the parsers, semantic markup speaks to the AI. You need both. But if you have to choose where to start, start from the HTML structure — because that’s what the RAG system reads first.

The next step is to understand how content freshness fits into all of this: a perfect semantic structure on outdated content is still a problem. But one by one, these optimizations build a technical profile that AI engines recognize as reliable — and from which they prefer to extract their answers.

The author

Roberto Serra

SEO consultant for over 15 years, founder of the Serra SEO Agency (RAANK). He helps multinationals and SMEs stay visible where search is moving: ChatGPT, Perplexity, Gemini and Google's AI Overviews.
