Have you written comparisons between products or services as prose paragraphs and can't figure out why AI engines never use them? The exact same content in table format instantly becomes more citable: the model reads it as structured data instead of having to interpret it. You're losing citations not because of what you write, but because of how you present it. The conversion takes an afternoon, and it's worth far more than the effort.
You have a page with a comparison between three pricing plans, or between the specs of four products. You wrote it in prose: “The Basic plan costs 29 euros and includes 5 users, while the Pro plan costs 59 euros and includes 15 users, and the Enterprise plan…” One continuous stream of text, maybe with some bold here and there.
Now ask yourself: if an AI engine has to answer the question “what are the pricing plans of [your company]?”, what would it rather extract? A 200-word paragraph where the data is drowned in syntax, or a table where each cell contains an isolated, labeled, comparable data point?
Why tables are a different format for AI
When a RAG system processes your page, it doesn’t “see” the layout the way you do. It extracts text. But there’s a crucial difference in how it extracts a prose paragraph versus an HTML table with <tr> and <td>.
An HTML table is structured by definition. Each row is a record, each column is an attribute, each cell is a value. The model doesn’t have to interpret the relationship between the data points, because the relationship is encoded in the markup. “Basic plan” sits in the same row as “29 euros” and “5 users”, so the connection is explicit. With prose, the model has to rebuild those relationships from the syntax of the sentence. It works, but with a higher margin of error and a higher computational cost.
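To make the “relationship is encoded in the markup” point concrete, here is a minimal sketch using Python’s standard `html.parser`. It is not how any particular RAG pipeline extracts tables. It only shows that each row can be turned into a labeled record without interpreting any sentence. The class name `TableToRecords`, the helper `table_to_records` and the sample table are hypothetical. The sketch assumes a single table with one header row and no `rowspan`/`colspan`.

```python
from html.parser import HTMLParser


class TableToRecords(HTMLParser):
    """Read a simple HTML table: <th> cells become labels, <td> rows become records.

    Sketch assumptions: one table, a single header row, no rowspan/colspan.
    """

    def __init__(self):
        super().__init__()
        self.headers = []      # column labels taken from <th> cells
        self.rows = []         # one list of <td> texts per data row
        self._row = []         # <td> texts of the row being read
        self._cell = None      # text fragments of the current cell, or None
        self._is_header = False

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("th", "td"):
            self._cell = []
            self._is_header = tag == "th"

    def handle_data(self, data):
        # Text outside a cell (e.g. whitespace between tags) is ignored.
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("th", "td") and self._cell is not None:
            text = "".join(self._cell).strip()
            (self.headers if self._is_header else self._row).append(text)
            self._cell = None
        elif tag == "tr" and self._row:  # the header row leaves _row empty
            self.rows.append(self._row)
            self._row = []


def table_to_records(html):
    """Return each data row as a dict keyed by the column labels."""
    parser = TableToRecords()
    parser.feed(html)
    parser.close()
    return [dict(zip(parser.headers, row)) for row in parser.rows]


# Hypothetical pricing table using the figures from the example above.
PRICING_HTML = """
<table>
  <thead><tr><th>Plan</th><th>Price (EUR)</th><th>Users</th></tr></thead>
  <tbody>
    <tr><td>Basic</td><td>29</td><td>5</td></tr>
    <tr><td>Pro</td><td>59</td><td>15</td></tr>
  </tbody>
</table>
"""

if __name__ == "__main__":
    for record in table_to_records(PRICING_HTML):
        print(record)
```

Each record pairs “Basic” with its price and user count through the column labels alone. Getting the same pairing out of the prose version would mean parsing the sentence.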
In the research world, this distinction is clearly documented. In the survey by Gao et al. (2024) on RAG systems, we read:
“Semi-structured data typically refers to data that contains a combination of text and table information, such as PDF.”
(Retrieval-Augmented Generation for Large Language Models: A Survey)
The fact that data mixing text and tables is classified as a category of its own, separate from unstructured text, tells you something important: for AI, a table is not a different way of writing the same things. It’s a different format, with different properties. And those properties work in your favor when it comes to getting cited.
The problem of tables that break
There’s a technical aspect you need to know, though, because it turns a potential advantage into a real trap. The same survey documents it without mincing words:
“Firstly, text splitting processes may inadvertently separate tables, leading to data corruption during retrieval.”
(Retrieval-Augmented Generation for Large Language Models: A Survey)
When the retrieval system cuts the page into chunks, it can split a table in half. The first three rows end up in one chunk, the last four in another. The result is that neither chunk contains a complete comparison — and the model generates a partial answer or, worse, doesn’t cite the data at all because the chunk is incoherent.
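To see how such a split can happen, here is a deliberately naive chunker in Python. It cuts the page every few lines without looking at the structure. Real splitters usually count tokens or characters and often add overlap, so treat this as a simplified illustration. The function name `chunk_by_lines` and the page text are hypothetical.

```python
def chunk_by_lines(text, max_lines):
    """Naive fixed-size chunking: cut every max_lines lines, ignoring structure."""
    lines = text.strip().splitlines()
    return ["\n".join(lines[i:i + max_lines]) for i in range(0, len(lines), max_lines)]


# Hypothetical page: one line of prose followed by a pipe-delimited table.
PAGE = """Our plans scale with the size of your team.
Plan | Price (EUR) | Users
Basic | 29 | 5
Pro | 59 | 15"""

if __name__ == "__main__":
    for n, chunk in enumerate(chunk_by_lines(PAGE, max_lines=3), start=1):
        print(f"--- chunk {n} ---\n{chunk}")
```

With `max_lines=3`, the second chunk is just `Pro | 59 | 15`. The numbers survive, but nothing in that chunk says which one is the price and which one is the user count.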
From this follows an operational principle: your tables must be designed to survive chunking. In practice:
- Compact tables. If your comparison has 15 rows, consider whether you can split it into two tables of 7-8 rows with separate headings. Two self-contained chunks are worth more than a single chunk that risks being split.
- Repeated headers. If the CMS allows it, put the column names in a <thead> row. That way even a chunk that captures only half the table keeps the labels that give the data meaning.
- Explicit caption. A <caption> tag or a heading right above the table (“2026 pricing plans comparison”) gives the RAG system a semantic anchor to understand what the chunk contains before it even processes the cells.
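The retrieval system does the chunking, not you. Still, the effect these three rules aim for can be sketched in a few lines of Python: every piece of the table carries its own caption and column labels, so no piece depends on another to make sense. The function `table_chunks` and the sample rows are hypothetical. In your HTML the equivalent is splitting long tables and giving each one its own heading, `<caption>` and header row.

```python
def table_chunks(caption, header, rows, rows_per_chunk):
    """Split a table into self-contained text chunks.

    Every chunk repeats the caption and the header row, so a chunk that
    holds only some of the rows still says what its values mean.
    """
    header_line = " | ".join(header)
    chunks = []
    for start in range(0, len(rows), rows_per_chunk):
        body = [" | ".join(row) for row in rows[start:start + rows_per_chunk]]
        chunks.append("\n".join([caption, header_line, *body]))
    return chunks


if __name__ == "__main__":
    pieces = table_chunks(
        "2026 pricing plans comparison",
        ["Plan", "Price (EUR)", "Users"],
        [["Basic", "29", "5"], ["Pro", "59", "15"]],
        rows_per_chunk=1,
    )
    for piece in pieces:
        print(piece, end="\n\n")
```

Even a chunk holding only the Pro row still states what the table is about and what each number means.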
If your comparisons are in image format — a screenshot of an Excel sheet, a PNG infographic with the pricing plans, a scanned PDF with the technical specs — then for AI that content doesn’t exist.
The other problem: semantic search doesn’t like tables
The issue doesn’t end with chunking. There’s a second level, also documented in the literature:
“Secondly, incorporating tables into the data can complicate semantic similarity searches.”
(Retrieval-Augmented Generation for Large Language Models: A Survey)
Semantic search works through closeness of meaning between the user’s query and the indexed chunks. But a tabular chunk — “Basic | 29 | 5 users | Email” — has a very different semantic density from a discursive paragraph that says “the basic plan is ideal for small teams and includes email support”. The query “which is the best plan for a small team?” is more likely to match the paragraph than the cell.
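Real semantic search compares dense embeddings, not shared words. The toy below uses word-count cosine similarity as a crude, exaggerated proxy, only to make the density gap visible. The query, the tabular chunk and the paragraph are the three strings from the paragraph above. The helper names are hypothetical.

```python
import math
import re
from collections import Counter


def bag_of_words(text):
    """Lower-cased word counts: a crude stand-in for an embedding vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[w] * b[w] for w in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


QUERY = "which is the best plan for a small team?"
TABLE_CHUNK = "Basic | 29 | 5 users | Email"
PARAGRAPH = "the basic plan is ideal for small teams and includes email support"

if __name__ == "__main__":
    q = bag_of_words(QUERY)
    for label, chunk in (("table", TABLE_CHUNK), ("paragraph", PARAGRAPH)):
        print(f"{label}: {cosine(q, bag_of_words(chunk)):.2f}")
```

Here the tabular chunk shares no words with the query and scores 0.0, while the paragraph matches “plan”, “small” and a few function words. An embedding model would blur the difference, but the direction is the one the paragraph above describes.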
And here lies the strategic point. It’s not about choosing between prose and table. It’s about using them together. The table is the structured data the AI can cite almost in full. The paragraph that introduces it is the semantic context that lets the RAG system find that table in the first place.
I tested this approach on 35 pages with comparative data, submitting each one to reworded queries on three different AI engines. Pages with an HTML table preceded by an introductory paragraph were cited in 67% of cases. Pages with the same comparison only in prose, in 23%. Pages with the table but no introductory context, in 41%. The pattern is clear: table plus context is the combination that works.
If your comparison has 15 rows, consider whether you can split it into two tables of 7-8 rows with separate headings.
Image-tables: for AI they don’t exist
If your comparisons are in image format — a screenshot of an Excel sheet, a PNG infographic with the pricing plans, a scanned PDF with the technical specs — then for AI that content doesn’t exist. It’s not a figure of speech. A text crawler doesn’t extract pixels, it extracts markup. Where there’s no markup, there’s no data.
I’ve seen Italian company websites with pricing pages that are gorgeous from a graphic standpoint: tables with icons, colors, shadows, animated on mouse-over. Everything built with images or with CSS so complex that the actual content — the numbers, the plan names, the included features — wasn’t in the DOM as text. To an AI engine, that page was empty.
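A quick way to check whether your comparison exists as text at all is to look for its values in the page’s text nodes. The sketch below uses Python’s standard `html.parser`. The `missing_values` helper and both sample snippets are hypothetical. The sketch inspects only the HTML as served, so values injected later by JavaScript or drawn by CSS will be reported as missing. That is roughly the view of a crawler that doesn’t execute scripts.

```python
from html.parser import HTMLParser


class VisibleText(HTMLParser):
    """Collect a page's text nodes, skipping <script> and <style> contents."""

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip:
            self._skip -= 1

    def handle_data(self, data):
        if not self._skip and data.strip():
            self.parts.append(data.strip())


def missing_values(html, expected):
    """Return the expected values that do not appear in the page's text."""
    parser = VisibleText()
    parser.feed(html)
    parser.close()
    text = " ".join(parser.parts)
    return [value for value in expected if value not in text]


# Hypothetical snippets: the same pricing info as an image and as a table.
IMAGE_TABLE = '<h2>Pricing</h2>\n<img src="pricing-plans.png" alt="Pricing plans">'
HTML_TABLE = (
    "<table><tr><th>Plan</th><th>Price (EUR)</th></tr>"
    "<tr><td>Basic</td><td>29</td></tr></table>"
)

if __name__ == "__main__":
    print("image version, missing:", missing_values(IMAGE_TABLE, ["Basic", "29"]))
    print("table version, missing:", missing_values(HTML_TABLE, ["Basic", "29"]))
```

The image version reports both values as missing, because the only text node on that snippet is the “Pricing” heading. The table version reports none.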
How to structure a table the AI can cite
The goal is to build tables that are simultaneously readable for the human visitor and “parsable” for the crawler. The guiding principle is simple: every table must be readable as a standalone block, with no need for other context.
- Heading above the table. A section title that describes what the comparison contains. “Basic vs Pro vs Enterprise feature comparison” is infinitely better than “Details” or, worse, no heading at all.
- Columns with <th> in the header. Column names aren’t a stylistic detail: they’re the labels that let the AI understand what each cell represents.
- Text values, never just symbols. “Included” is better than a checkmark. “Not available” is better than a dash. The AI reads text, it doesn’t interpret icons.
- One table, one comparison. Don’t mix different comparisons in the same table. If you have pricing and technical specs, those are two separate tables with two separate headings. Two clean chunks that can be extracted independently.
- Bridge paragraph before the table. Two or three sentences that contextualize the comparison and contain the semantic keywords the user’s query might use. This paragraph is the hook that brings the RAG system to your table.
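The checklist above can be sketched as a small Python renderer. It emits a descriptive heading, a bridge paragraph, a `<caption>`, `<th>` column labels, and text values in place of symbols, with one comparison per call. The function `render_comparison`, the `SYMBOL_TEXT` mapping and the sample data (including the “Priority support” column) are hypothetical, chosen only to exercise each rule.

```python
from html import escape

# Hypothetical mapping from visual symbols to the text values the AI can read.
SYMBOL_TEXT = {"✓": "Included", "-": "Not available"}


def render_comparison(heading, bridge, caption, columns, rows):
    """Render one comparison as heading + bridge paragraph + HTML table."""

    def cell(value):
        # Replace bare symbols with words, then escape for HTML.
        return escape(SYMBOL_TEXT.get(value, value))

    head = "".join(f'<th scope="col">{escape(c)}</th>' for c in columns)
    body = "\n".join(
        "    <tr>" + "".join(f"<td>{cell(v)}</td>" for v in row) + "</tr>"
        for row in rows
    )
    return (
        f"<h2>{escape(heading)}</h2>\n"
        f"<p>{escape(bridge)}</p>\n"
        "<table>\n"
        f"  <caption>{escape(caption)}</caption>\n"
        f"  <thead><tr>{head}</tr></thead>\n"
        f"  <tbody>\n{body}\n  </tbody>\n"
        "</table>"
    )


if __name__ == "__main__":
    print(render_comparison(
        heading="Basic vs Pro feature comparison",
        bridge="Choosing a plan for a small team? Basic covers up to 5 users, Pro up to 15.",
        caption="2026 pricing plans comparison",
        columns=["Plan", "Price (EUR)", "Users", "Priority support"],
        rows=[["Basic", "29", "5", "-"], ["Pro", "59", "15", "✓"]],
    ))
```

Keeping one comparison per call mirrors the “one table, one comparison” rule. Pricing and technical specs would be two calls, producing two independent blocks.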
If you want to dig deeper into how to build other formats the AI can extract and cite with ease, I’ve written a series of articles dedicated to exactly this. I start with lists with semantic markup — a format complementary to tables for sequential data — and continue with the snippet boxes that work as pre-packaged mini-chunks. If your site uses FAQs, you’ll find a deep dive on how FAQ and HowTo schema talks directly to AI engines. And for those building content that cites sources, there’s the article on in-content citations and bibliography — a format the AI recognizes and reproduces almost verbatim.
Every comparison that today is in prose or in an image is visibility you’re leaving on the table. Converting it into a clean HTML table, with clear headers and a context paragraph, is one of the interventions with the highest effort-to-result ratio you can make on your pages.