Content Structure for AI

AI can’t tell where your page sits without breadcrumbs

The AI doesn't know whether your page is a beginner's introduction or a deep dive for experts: without a technical signal indicating where each page sits in the site's structure, it has to guess from the content alone. A technical guide treated as generic content is less likely to be cited for the questions it actually answers, and the people looking for exactly what you offer have a harder time finding you. Adding that signal is quick, and it gives the AI what it needs to place your content and decide when to cite it.

There’s a question AI systems ask themselves every time they process one of your pages, one you’ve probably never asked yourself: is this page a general guide or a specific deep dive? Does it cover an entire industry or a single product? Does it sit at the top of the site’s hierarchy, or is it third-level content specialized on a precise niche?

Without breadcrumbs, the model has no way of knowing. It sees the content, it sees the title, but it doesn’t see the position. And position matters, because it determines the level of specificity the AI assigns to the page when it decides which result to return.

Position in the hierarchy isn’t a cosmetic detail

When I talk about breadcrumbs, I’m not talking about the navigation bar you see under the menu — that’s the visual representation. I’m talking about the structural signal that communicates to AI systems the logical path from the homepage to the current page: Home > Category > Subcategory > Specific page.

This path tells the model something fundamental: the thematic context of the page. A page on “trail running shoes for overpronators” placed under Home > Sport > Running > Trail > Overpronators carries a completely different specificity signal than the same page orphaned, with no hierarchical path at all. In the first case, the AI understands it’s dealing with highly specialized content. In the second, it has to guess.

In the research world, this principle is documented with precision. Volpini et al. in 2026 analyzed how structured pages communicate their content to AI systems:

“Enhanced pages transform opaque entity URIs into readable, structured information.”

Volpini et al., 2026

Turning opaque URIs into readable, structured information — that’s exactly what breadcrumbs do. They take a URL like /cat/sub/page-123, which to the AI is a meaningless string, and translate it into an explicit hierarchical path: Sport > Running > Trail > Overpronator shoes. The model no longer has to infer the position from the content — it reads it directly from the structure.

Why BreadcrumbList markup changes things

The breadcrumbs visible on the page are already a good start, but the real leap happens when you add BreadcrumbList schema markup. This markup translates the hierarchy into a format that AI engine parsers can process without ambiguity: every level has a name, a URL, and an explicit numeric position.
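As a rough illustration of what "a name, a URL, and an explicit numeric position" looks like in practice, here is a minimal Python sketch that builds a schema.org BreadcrumbList and wraps it in a JSON-LD script element. The helper names (build_breadcrumb_list, to_script_tag) and the example.com URLs are assumptions made for the example, not part of any library or of the site discussed here.

```python
import json


def build_breadcrumb_list(trail):
    """Build a schema.org BreadcrumbList dict from (name, url) pairs, root first."""
    return {
        "@context": "https://schema.org",
        "@type": "BreadcrumbList",
        "itemListElement": [
            {
                "@type": "ListItem",
                "position": position,  # 1-based depth in the path
                "name": name,
                "item": url,
            }
            for position, (name, url) in enumerate(trail, start=1)
        ],
    }


def to_script_tag(breadcrumb_list):
    """Serialize the dict as a JSON-LD script element for the page's HTML."""
    payload = json.dumps(breadcrumb_list, indent=2, ensure_ascii=False)
    return f'<script type="application/ld+json">\n{payload}\n</script>'


# Hypothetical trail for the running-shoe example; the URLs are placeholders.
trail = [
    ("Sport", "https://example.com/sport/"),
    ("Running", "https://example.com/sport/running/"),
    ("Trail", "https://example.com/sport/running/trail/"),
    ("Overpronators", "https://example.com/sport/running/trail/overpronators/"),
]
markup = build_breadcrumb_list(trail)
print(to_script_tag(markup))
```

The positions are generated from the order of the trail rather than typed by hand, which is one simple way to keep them consecutive.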

If you’ve read my article on heading hierarchy, you already know that AI systems use structural signals to understand how content is organized. Breadcrumbs complete that picture: headings tell the AI what the page is about, breadcrumbs tell it where that page sits relative to everything else on the site.

The literature confirms it. Gao et al. in 2024, in their survey on RAG systems, highlighted the role of structured data as a reference for retrieval quality:

“Structured data, such as knowledge graphs (KGs), serve as important references.”

Gao et al., 2024

Breadcrumbs with BreadcrumbList markup are, to all intents and purposes, a micro knowledge graph of your site. Every level is a node, every arrow is a hierarchical relationship: “this page belongs to this subcategory, which belongs to this category, which belongs to the site”. It’s a simple graph, but it’s a graph the AI can read and use as a reference for classifying the content.
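To make the "nodes and relationships" reading concrete, the sketch below turns a BreadcrumbList into explicit "belongs to" triples. It is only an illustration of the graph view described above: the function name hierarchy_edges and the sample data are hypothetical, and nothing here claims to mirror how any AI system stores the information internally.

```python
def hierarchy_edges(breadcrumb_list):
    """Return (child, "belongs to", parent) triples from a BreadcrumbList dict."""
    items = sorted(breadcrumb_list["itemListElement"], key=lambda e: e["position"])
    names = [e["name"] for e in items]
    # Each level belongs to the level immediately above it in the path.
    return [(child, "belongs to", parent) for parent, child in zip(names, names[1:])]


# Hypothetical markup for the running-shoe example.
example = {
    "@type": "BreadcrumbList",
    "itemListElement": [
        {"@type": "ListItem", "position": 1, "name": "Sport"},
        {"@type": "ListItem", "position": 2, "name": "Running"},
        {"@type": "ListItem", "position": 3, "name": "Trail"},
        {"@type": "ListItem", "position": 4, "name": "Overpronator shoes"},
    ],
}

edges = hierarchy_edges(example)
for edge in edges:
    print(edge)
```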

Common mistake

Showing breadcrumbs on the page without structured markup. The human reader sees the path, but the AI crawler sees only generic text: without the markup that qualifies it as a hierarchical signal, it doesn't interpret it as one.

The real problem: pages without hierarchical context

In practice, what I see on most of the sites I analyze is one of these three scenarios. The first: breadcrumbs entirely absent. No visible path, no markup. The page exists in isolation, like a loose sheet without a folder.

The second: breadcrumbs visible but without structured markup. The human reader sees the path, but the AI crawler sees only generic text on the page — it doesn’t interpret it as a hierarchical signal because it lacks the markup that qualifies it as such.

The third is the most insidious: breadcrumbs with markup, but a flat or inconsistent hierarchy. Home > Page. Two levels. Or breadcrumbs that show a path different from the site’s actual structure, because the CMS generates them automatically from categories assigned at random. In both cases, the hierarchical signal is present but says the wrong thing — and a wrong signal is worse than no signal.
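The three scenarios can be told apart, at least roughly, with a small script. The sketch below uses only Python's standard library and rests on several assumptions: it looks for JSON-LD only (not microdata), it treats any element whose class or aria-label mentions "breadcrumb" as a visible breadcrumb, and it calls a hierarchy flat when it has fewer than three levels. It cannot tell whether the path matches the site's real structure; that part still needs a human.

```python
import json
from html.parser import HTMLParser


class BreadcrumbScanner(HTMLParser):
    """Collect JSON-LD payloads and note elements that look like visible breadcrumbs."""

    def __init__(self):
        super().__init__()
        self._in_jsonld = False
        self._buffer = []
        self.jsonld_blocks = []
        self.visible_breadcrumb = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "script" and (attrs.get("type") or "").lower() == "application/ld+json":
            self._in_jsonld = True
            self._buffer = []
        # Heuristic (an assumption): class or aria-label mentioning "breadcrumb".
        hint = f'{attrs.get("class") or ""} {attrs.get("aria-label") or ""}'.lower()
        if "breadcrumb" in hint:
            self.visible_breadcrumb = True

    def handle_data(self, data):
        if self._in_jsonld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self._in_jsonld = False
            self.jsonld_blocks.append("".join(self._buffer))


def find_breadcrumb_list(jsonld_blocks):
    """Return the first BreadcrumbList node found in the JSON-LD payloads, or None."""
    for raw in jsonld_blocks:
        try:
            data = json.loads(raw)
        except json.JSONDecodeError:
            continue
        if isinstance(data, dict):
            candidates = data.get("@graph", [data])
        elif isinstance(data, list):
            candidates = data
        else:
            continue
        for node in candidates:
            if isinstance(node, dict) and node.get("@type") == "BreadcrumbList":
                return node
    return None


def classify(html, min_levels=3):
    """Map a page to: absent, visible_without_markup, flat_hierarchy, or ok."""
    scanner = BreadcrumbScanner()
    scanner.feed(html)
    breadcrumb = find_breadcrumb_list(scanner.jsonld_blocks)
    if breadcrumb is None:
        return "visible_without_markup" if scanner.visible_breadcrumb else "absent"
    elements = breadcrumb.get("itemListElement", [])
    if isinstance(elements, dict):  # a single item may appear without a list wrapper
        elements = [elements]
    return "flat_hierarchy" if len(elements) < min_levels else "ok"


# Hypothetical page fragments, one per scenario.
no_crumbs = "<html><body><h1>Trail shoes</h1></body></html>"
visible_only = '<nav aria-label="Breadcrumb"><a href="/">Home</a> &gt; Trail</nav>'
flat = (
    '<script type="application/ld+json">'
    '{"@type": "BreadcrumbList", "itemListElement": ['
    '{"@type": "ListItem", "position": 1, "name": "Home"}, '
    '{"@type": "ListItem", "position": 2, "name": "Page"}]}'
    "</script>"
)

for page in (no_crumbs, visible_only, flat):
    print(classify(page))
```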

The point is that the goal of AI indexing isn’t just to find your content, but to understand what to do with it. The same survey by Gao et al. says it clearly:

“The goal of optimizing indexing is to enhance the quality of the content being indexed.”

Gao et al., 2024

Improving the quality of indexed content doesn’t just mean writing better. It means giving the system all the metadata it needs so that content gets classified, contextualized, and returned the right way. Breadcrumbs are one of those pieces of metadata — perhaps the most overlooked.

Pro tip

Implement BreadcrumbList markup on every page of your site, with a hierarchy that reflects your main themes and the levels of depth.

How to build breadcrumbs that work for the AI

The structure you need to implement is this: every page on the site must have BreadcrumbList markup in JSON-LD format (or microdata, but JSON-LD is cleaner) that reflects the real hierarchy of your content. Not the menu hierarchy. Not the URL hierarchy. The thematic hierarchy.

The difference matters. If your site sells shoes, the thematic hierarchy might be: Shoes > Running > Trail > Overpronators. That’s the content taxonomy. The URL might be /products/shoe-xyz — flat, with no hierarchical information. Breadcrumbs fill that gap.
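One way to keep the thematic hierarchy independent of the URL is to store it as its own taxonomy and derive the breadcrumb path from that. The sketch below assumes a simple parent lookup; the TAXONOMY and PAGE_TOPIC tables and the thematic_trail helper are hypothetical names invented for this example.

```python
# Hypothetical content taxonomy: each topic maps to its parent (None marks the root).
TAXONOMY = {
    "Shoes": None,
    "Running": "Shoes",
    "Trail": "Running",
    "Overpronators": "Trail",
}

# Hypothetical mapping from flat product URLs to their most specific topic.
PAGE_TOPIC = {"/products/shoe-xyz": "Overpronators"}


def thematic_trail(url_path):
    """Walk up the taxonomy from the page's topic to the root, returned root first."""
    topic = PAGE_TOPIC[url_path]
    trail = []
    seen = set()
    while topic is not None:
        if topic in seen:  # guard against a taxonomy that loops back on itself
            raise ValueError(f"cycle in taxonomy at {topic!r}")
        seen.add(topic)
        trail.append(topic)
        topic = TAXONOMY[topic]
    return list(reversed(trail))


print(" > ".join(thematic_trail("/products/shoe-xyz")))
# prints: Shoes > Running > Trail > Overpronators
```

The URL stays flat; the path comes entirely from the taxonomy, so renaming or moving a URL does not change the breadcrumb.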

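Every level of the breadcrumb needs three properties: the readable name (e.g. “Trail Running”), the URL of the corresponding category page, and the numeric position in the path. In schema.org terms these are the name, item, and position properties of a ListItem. Google's documentation allows item to be omitted on the last level, since that level is the current page. The position tells the system how deep the page sits: level 1 is the root, level 4 is highly specific content.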
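A small validator can catch levels that are missing one of these properties. This is a sketch under stated assumptions: the function names are hypothetical, the input is an already parsed BreadcrumbList dict, and the allow_missing_last_url flag reflects the common practice of leaving the URL off the final level (the current page); set it to False for a stricter check.

```python
def validate_items(breadcrumb_list, allow_missing_last_url=True):
    """Return a list of human-readable problems found in a BreadcrumbList dict."""
    items = breadcrumb_list.get("itemListElement", [])
    problems = []
    for index, element in enumerate(items, start=1):
        is_last = index == len(items)
        if not element.get("name"):
            problems.append(f"level {index}: missing name")
        if not element.get("item") and not (is_last and allow_missing_last_url):
            problems.append(f"level {index}: missing item URL")
        # Positions are expected to run 1, 2, 3, ... in document order.
        if element.get("position") != index:
            problems.append(
                f"level {index}: position is {element.get('position')!r}, expected {index}"
            )
    return problems


def depth(breadcrumb_list):
    """Depth of the current page: the highest integer position (0 if none)."""
    positions = [
        e["position"]
        for e in breadcrumb_list.get("itemListElement", [])
        if isinstance(e.get("position"), int)
    ]
    return max(positions, default=0)


# Hypothetical markup with two mistakes on the second level.
broken = {
    "@type": "BreadcrumbList",
    "itemListElement": [
        {"@type": "ListItem", "position": 1, "name": "Shoes", "item": "https://example.com/shoes/"},
        {"@type": "ListItem", "position": 3, "name": "Running"},
        {"@type": "ListItem", "position": 3, "name": "Trail"},
    ],
}

problems = validate_items(broken)
for problem in problems:
    print(problem)
print("depth:", depth(broken))
```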

I also covered this in the article on chunk-friendly structure: every structural signal that helps the system segment and classify the content improves the chance that the extracted chunk is the right one. Breadcrumbs don’t change the content of the chunk — they change the context in which that chunk is interpreted.

What you can check right now

Open one of your main pages and check three things. First: is there BreadcrumbList markup in the source code? You can verify it with Google’s Rich Results Test — enter the URL and look for “Breadcrumb” among the results. Second: does the hierarchy reflect the real taxonomy of your content, with at least three levels of depth? Third: does every level of the breadcrumb point to a category page that exists and contains relevant content?
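The three checks can also be scripted once the markup has been parsed. The sketch below is one possible shape, with assumptions spelled out: run_checks is a hypothetical helper, each item URL is assumed to be a plain string, and page_exists is a function you supply (for example an HTTP request in a real audit; here a lookup in a set of known pages). Whether a category page contains relevant content is not something this code can judge.

```python
def run_checks(breadcrumb_list, page_exists, min_levels=3):
    """Run the three checks on a parsed BreadcrumbList (or None if the page has none)."""
    items = (breadcrumb_list or {}).get("itemListElement", [])
    urls = [e["item"] for e in items if e.get("item")]
    return {
        # Check 1: is BreadcrumbList markup there at all?
        "markup_present": breadcrumb_list is not None
        and breadcrumb_list.get("@type") == "BreadcrumbList",
        # Check 2: does the path have at least min_levels levels?
        "deep_enough": len(items) >= min_levels,
        # Check 3: does every linked level point to a page that exists?
        "targets_exist": bool(urls) and all(page_exists(url) for url in urls),
    }


# Hypothetical data: the Trail category page is linked but does not exist.
KNOWN_PAGES = {
    "https://example.com/shoes/",
    "https://example.com/shoes/running/",
}
breadcrumb = {
    "@type": "BreadcrumbList",
    "itemListElement": [
        {"@type": "ListItem", "position": 1, "name": "Shoes", "item": "https://example.com/shoes/"},
        {"@type": "ListItem", "position": 2, "name": "Running", "item": "https://example.com/shoes/running/"},
        {"@type": "ListItem", "position": 3, "name": "Trail", "item": "https://example.com/shoes/running/trail/"},
    ],
}

report = run_checks(breadcrumb, page_exists=KNOWN_PAGES.__contains__)
print(report)
```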

If even one of these checks fails, you’re losing a context signal the AI could use to classify your pages better. The checks are only a first step toward understanding where you stand: building a coherent taxonomy that’s reflected in breadcrumbs, URLs, headings, and content requires an information architecture effort that goes beyond a single tag.

The chain of structural signals

Breadcrumbs don’t work alone. They’re one piece of a chain of structural signals that, together, tell the AI exactly what your site contains and how it’s organized. The inverted pyramid tells the model what’s most important on the page. The heading hierarchy tells it how the content is organized within. The table of contents offers a navigable map. Breadcrumbs add the final piece: where that page sits in the site’s overall architecture.

When all these signals are consistent, the AI system has a complete picture. It doesn’t have to guess anything: it knows what the page contains, how it’s organized, and where it sits in the hierarchy. And content the system understands thoroughly is content far more likely to be extracted and cited in answers.

Implement BreadcrumbList markup on every page of your site, with a hierarchy that reflects your main themes and the levels of depth. It’s the simplest signal to add — and one of the most effective for telling the AI exactly where your content sits.

Chapter 3 · Content Structure for AI

The author
Roberto Serra (pictured at the Senate of the Republic, Palazzo Giustiniani, at the conference “The power of artificial intelligence”)

SEO consultant for over 15 years, founder of the Serra SEO Agency (RAANK). He helps multinationals and SMEs stay visible where search is moving: ChatGPT, Perplexity, Gemini and Google's AI Overviews.
