Content Structure for AI

Are your comparisons written in prose? As a table they’d be 10x more citable

Roberto Serra · 25 June 2026 · ~7 min read

Have you made comparisons between products or services in a text paragraph and can't figure out why the AI never uses them? The exact same content in table format instantly becomes more citable: the model reads it as structured data instead of having to interpret it. You're losing citations not because of what you write, but because of how you present it. The conversion takes an afternoon — and it's worth far more than the effort.

You have a page with a comparison between three pricing plans, or between the specs of four products. You wrote it in prose: “The Basic plan costs 29 euros and includes 5 users, while the Pro plan costs 59 euros and includes 15 users, and the Enterprise plan…” One continuous stream of text, maybe with some bold here and there.

Now ask yourself: if an AI engine has to answer the question “what are the pricing plans of [your company]?”, what would it rather extract? A 200-word paragraph where the data is drowned in syntax, or a table where each cell contains an isolated, labeled, comparable data point?

Why tables are a different format for AI

When a RAG system processes your page, it doesn’t “see” the layout the way you do. It extracts text. But there’s a crucial difference in how it extracts a prose paragraph versus an HTML table with <tr> and <td>.

An HTML table is structured by definition. Each row is a record, each column is an attribute, each cell is a value. The model doesn’t have to interpret the relationship between the data: the relationship is encoded in the markup. “Basic plan” sits in the same row as “29 euros” and “5 users”, so the connection is explicit. With prose, the model has to reconstruct that same connection from the syntax of the sentence. It works, but with a higher margin of error and a higher computational cost.
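To make “the relationship is encoded in the markup” concrete, here is a minimal sketch using only Python’s standard library. It is not how any particular AI engine works internally; the class name, the function name and the “Price” column label are assumptions made for illustration. It only shows that a header row plus data rows map mechanically to labeled records, with no interpretation needed.

```python
from html.parser import HTMLParser


class TableRecords(HTMLParser):
    """Collect every table row as a list of cell texts."""

    def __init__(self):
        super().__init__()
        self.rows = []      # one list of cell strings per <tr>
        self._cell = None   # text pieces of the cell being read, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self.rows.append([])
        elif tag in ("th", "td"):
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("th", "td") and self._cell is not None:
            self.rows[-1].append("".join(self._cell).strip())
            self._cell = None


def table_to_records(html):
    """Pair the header row with each data row: one dict per record."""
    parser = TableRecords()
    parser.feed(html)
    header, *body = parser.rows
    return [dict(zip(header, row)) for row in body]


# Illustrative markup built from the plan figures quoted in the article.
PRICING_HTML = """
<table>
  <tr><th>Plan</th><th>Price</th><th>Users</th></tr>
  <tr><td>Basic</td><td>29 euros</td><td>5 users</td></tr>
  <tr><td>Pro</td><td>59 euros</td><td>15 users</td></tr>
</table>
"""

records = table_to_records(PRICING_HTML)
```

Each entry in `records` is a dictionary keyed by column name, which is exactly the “record, attribute, value” structure described above. Getting the same result from the prose version would require parsing the sentence.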

In the research world, this distinction is clearly documented. In the survey by Gao et al. (2024) on RAG systems, we read:

“Semi-structured data typically refers to data that contains a combination of text and table information, such as PDF.”
(Retrieval-Augmented Generation for Large Language Models: A Survey)

The fact that tabular data is classified as a category of its own — separate from unstructured text — tells you something important: for AI, a table is not a different way of writing the same things. It’s a different format, with different properties. And those properties work in your favor when it comes to getting cited.

The problem of tables that break

There’s a technical aspect you need to know, though, because it turns a potential advantage into a real trap. The same survey documents it without mincing words:

“Firstly, text splitting processes may inadvertently separate tables, leading to data corruption during retrieval.”
(Retrieval-Augmented Generation for Large Language Models: A Survey)

When the retrieval system cuts the page into chunks, it can split a table in half. The first three rows end up in one chunk, the last four in another. The result is that neither chunk contains a complete comparison — and the model generates a partial answer or, worse, doesn’t cite the data at all because the chunk is incoherent.
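The failure mode is easy to reproduce with a deliberately naive splitter. This is a simplified sketch: real splitters usually work on tokens or characters rather than lines, and `chunk_lines` is a hypothetical helper, not part of any RAG library.

```python
def chunk_lines(lines, max_lines):
    """Naive splitter: cut every max_lines lines, ignoring table boundaries."""
    return [lines[i:i + max_lines] for i in range(0, len(lines), max_lines)]


page = [
    "Our plans at a glance.",
    "<table>",
    "<tr><th>Plan</th><th>Price</th><th>Users</th></tr>",
    "<tr><td>Basic</td><td>29 euros</td><td>5 users</td></tr>",
    "<tr><td>Pro</td><td>59 euros</td><td>15 users</td></tr>",
    "</table>",
]

chunks = chunk_lines(page, max_lines=4)

# The first chunk keeps the header row and the Basic plan;
# the second chunk holds the Pro plan with no column labels at all.
second_has_header = any("<th>" in line for line in chunks[1])
```

Here `second_has_header` is `False`: the chunk that contains the Pro row no longer says what “59 euros” or “15 users” refer to.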

From this follows an operational principle: your tables must be designed to survive chunking. In practice:

Compact tables. If your comparison has 15 rows, consider whether you can split it into two tables of 7-8 rows with separate headings. Two self-contained chunks are worth more than a single chunk that risks being split.
Repeated headers. If the CMS allows it, use <thead> with the column names. That way even a chunk that captures only half the table keeps the labels that give the data meaning.
Explicit caption. A <caption> tag or a heading right above the table (“2026 pricing plans comparison”) gives the RAG system a semantic anchor to understand what the chunk contains before it even processes the cells.
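The three rules above can be sketched as a small generator that turns one long comparison into compact tables, each with its own caption and its own header row. The function names, the 8-row limit and the “part n of m” caption suffix are assumptions for illustration; in practice a descriptive caption per table is preferable to a numbered one.

```python
from html import escape


def render_table(caption, header, rows):
    """Render one self-contained table: caption, header row, data rows."""
    head = "".join(f"<th>{escape(h)}</th>" for h in header)
    body = "\n".join(
        "    <tr>" + "".join(f"<td>{escape(c)}</td>" for c in row) + "</tr>"
        for row in rows
    )
    return (
        "<table>\n"
        f"  <caption>{escape(caption)}</caption>\n"
        f"  <thead>\n    <tr>{head}</tr>\n  </thead>\n"
        f"  <tbody>\n{body}\n  </tbody>\n"
        "</table>"
    )


def split_table(caption, header, rows, max_rows=8):
    """Split a long comparison into compact tables that each repeat the header."""
    parts = [rows[i:i + max_rows] for i in range(0, len(rows), max_rows)]
    return [
        render_table(f"{caption} (part {n} of {len(parts)})", header, part)
        for n, part in enumerate(parts, start=1)
    ]


header = ["Feature", "Basic", "Pro"]
# 15 placeholder rows standing in for a real feature list.
rows = [[f"Feature {i}", "Included", "Included"] for i in range(1, 16)]
tables = split_table("2026 pricing plans comparison", header, rows)
```

With 15 rows and a limit of 8, `tables` contains two blocks of 8 and 7 rows. Each one carries the caption and the column names, so either can be read on its own.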

The other problem: semantic search doesn’t like tables

The issue doesn’t end with chunking. There’s a second level, also documented in the literature:

“Secondly, incorporating tables into the data can complicate semantic similarity searches.”
(Retrieval-Augmented Generation for Large Language Models: A Survey)

Semantic search works through closeness of meaning between the user’s query and the indexed chunks. But a tabular chunk — “Basic | 29 | 5 users | Email” — has a very different semantic density from a discursive paragraph that says “the basic plan is ideal for small teams and includes email support”. The query “which is the best plan for a small team?” is more likely to match the paragraph than the cell.
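A rough way to see the mismatch is to score both chunks against the query. Real semantic search uses dense embeddings, not word overlap, so treat this as a crude lexical stand-in that only illustrates the direction of the effect. The `tokens` and `overlap` helpers are hypothetical.

```python
import re


def tokens(text):
    """Lowercase word set: a crude stand-in for an embedding."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def overlap(query, chunk):
    """Jaccard similarity between the word sets of query and chunk."""
    q, c = tokens(query), tokens(chunk)
    union = q | c
    return len(q & c) / len(union) if union else 0.0


query = "which is the best plan for a small team?"
prose_chunk = "the basic plan is ideal for small teams and includes email support"
table_chunk = "Basic | 29 | 5 users | Email"

prose_score = overlap(query, prose_chunk)
table_score = overlap(query, table_chunk)
```

The prose chunk shares several words with the query, while the table row shares none. An embedding model is more forgiving than this, but the table row still gives it very little to match against.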

And here lies the strategic point. It’s not about choosing between prose and table. It’s about using them together. The table is the structured data the AI can cite almost in full. The paragraph that introduces it is the semantic context that lets the RAG system find that table in the first place.

I tested this approach on 35 pages with comparative data, submitting each one to reworded queries on three different AI engines. Pages with an HTML table preceded by an introductory paragraph were cited in 67% of cases. Pages with the same comparison only in prose, in 23%. Pages with the table but no introductory context, in 41%. The pattern is clear: table plus context is the combination that works.

Image-tables: for AI they don’t exist

If your comparisons are in image format — a screenshot of an Excel sheet, a PNG infographic with the pricing plans, a scanned PDF with the technical specs — then for AI that content doesn’t exist. It’s not a figure of speech. A text crawler doesn’t extract pixels, it extracts markup. Where there’s no markup, there’s no data.

I’ve seen Italian company websites with pricing pages that are gorgeous from a graphic standpoint: tables with icons, colors, shadows, animated on mouse-over. Everything built with images or with CSS so complex that the actual content — the numbers, the plan names, the included features — wasn’t in the DOM as text. To an AI engine, that page was empty.
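A quick way to check your own pages is to extract the text a crawler would see and verify that the key values are in it. The sketch below uses Python’s standard `html.parser`; the helper names, the sample markup and the `plans.png` file name are assumptions. It deliberately ignores `alt` attributes, since a short alt text rarely carries the full data of a table.

```python
from html.parser import HTMLParser


class VisibleText(HTMLParser):
    """Collect text nodes, skipping <script> and <style> contents."""

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip = 0  # depth inside <script>/<style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip:
            self.parts.append(data)


def missing_from_text(html, expected):
    """Return the expected strings that never appear as text in the markup."""
    parser = VisibleText()
    parser.feed(html)
    text = " ".join(" ".join(parser.parts).split()).lower()
    return [value for value in expected if value.lower() not in text]


# Hypothetical pages: one with the plans as an image, one as a real table.
image_page = '<h1>Pricing</h1><img src="plans.png" alt="Pricing plans">'
table_page = (
    "<h1>Pricing</h1><table>"
    "<tr><th>Plan</th><th>Price</th></tr>"
    "<tr><td>Basic</td><td>29 euros</td></tr>"
    "</table>"
)
expected = ["Basic", "29 euros"]

missing_in_image_page = missing_from_text(image_page, expected)
missing_in_table_page = missing_from_text(table_page, expected)
```

For the image version every expected value comes back as missing; for the table version the list is empty.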

How to structure a table the AI can cite

The goal is to build tables that are simultaneously readable for the human visitor and “parsable” for the crawler. The guiding principle is simple: every table must be readable as a standalone block, with no need for other context.

Heading above the table. A section title that describes what the comparison contains. “Basic vs Pro vs Enterprise feature comparison” is infinitely better than “Details” or, worse, no heading at all.
Columns with <th> in the header. Column names aren’t a stylistic detail: they’re the labels that let the AI understand what each cell represents.
Text values, never just symbols. “Included” is better than a checkmark. “Not available” is better than a dash. The AI reads text, it doesn’t interpret icons.
One table, one comparison. Don’t mix different comparisons in the same table. If you have pricing and technical specs, those are two separate tables with two separate headings. Two clean chunks that can be extracted independently.
Bridge paragraph before the table. Two or three sentences that contextualize the comparison and contain the semantic keywords the user’s query might use. This paragraph is the hook that brings the RAG system to your table.
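Put together, the checklist above describes a block with a fixed shape: heading, bridge paragraph, then a table with labeled columns and text values. A minimal sketch of a generator for that block follows. The function name, the `<h2>` heading level, the symbol mapping and the “Priority support” row are assumptions for illustration, not a prescribed template.

```python
from html import escape

# Assumed mapping from symbol-only cells to explicit text values.
SYMBOL_TO_TEXT = {"✓": "Included", "✗": "Not available", "-": "Not available"}


def citable_block(heading, bridge, header, rows):
    """Return heading + bridge paragraph + table as one standalone HTML block."""
    head = "".join(f"<th>{escape(h)}</th>" for h in header)
    body = "".join(
        "<tr>"
        + "".join(
            f"<td>{escape(SYMBOL_TO_TEXT.get(c.strip(), c))}</td>" for c in row
        )
        + "</tr>"
        for row in rows
    )
    return (
        f"<h2>{escape(heading)}</h2>\n"
        f"<p>{escape(bridge)}</p>\n"
        f"<table><thead><tr>{head}</tr></thead><tbody>{body}</tbody></table>"
    )


block = citable_block(
    heading="Basic vs Pro vs Enterprise feature comparison",
    bridge="Which plan fits a small team? This comparison lists what each plan includes.",
    header=["Feature", "Basic", "Pro", "Enterprise"],
    rows=[
        ["Email support", "✓", "✓", "✓"],
        ["Priority support", "-", "✓", "✓"],  # placeholder feature
    ],
)
```

The output contains no checkmarks or dashes: every cell reads “Included” or “Not available”, and the bridge paragraph carries the words a user’s query is likely to contain.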

If you want to dig deeper into how to build other formats the AI can extract and cite with ease, I’ve written a series of articles dedicated to exactly this. I start with lists with semantic markup — a format complementary to tables for sequential data — and continue with the snippet boxes that work as pre-packaged mini-chunks. If your site uses FAQs, you’ll find a deep dive on how FAQ and HowTo schema talks directly to AI engines. And for those building content that cites sources, there’s the article on in-content citations and bibliography — a format the AI recognizes and reproduces almost verbatim.

Every comparison that today is in prose or in an image is visibility you’re leaving on the table. Converting it into a clean HTML table, with clear headers and a context paragraph, is one of the interventions with the best result-to-effort ratio you can make on your pages.

The author

Roberto Serra at the Senate of the Republic

Senate of the Republic · Palazzo Giustiniani Conference “The power of artificial intelligence”

Roberto Serra

SEO consultant for over 15 years, founder of the Serra SEO Agency (RAANK). He helps multinationals and SMEs stay visible where search is moving: ChatGPT, Perplexity, Gemini and Google's AI Overviews.
