Your content has no numbers? AI considers it less trustworthy

Roberto Serra · 25 June 2026 · ~8 min read

Your articles are well written, clear, useful — but they don't have a single number inside. For an AI, a text without figures, percentages or dates is less trustworthy than one with even just three contextualized data points: models use numbers as credibility anchors, and they prefer to cite those who provide them. You're losing positions to competitors who write worse but with more data. You don't need commissioned research: the numbers you already have in your company, presented the right way, are enough to change how your pages are weighted in the eyes of the AI.

Try asking an AI engine something about your industry. A concrete question, like “how much does X cost on average” or “by what percentage does Y affect Z”. Look at the answer. You’ll notice one thing: the model almost always cites pages that contain a specific numerical data point. A percentage, a figure, a precise year. Pages that stay generic — “many companies”, “a significant number”, “an important percentage” — almost never appear.

It’s no coincidence. It’s how the system selects sources when it has to build an answer that sounds credible.

In this deep dive I’ll explain why verifiable numerical data is among the strongest signals you can give AI engines — and how to insert it into your pages so it does its job.

Why AI prefers numbers to prose

The starting point is understanding what the model looks for when it has to answer a question. It doesn’t look for the longest page or the one with the catchiest title. It looks for content that helps it generate an answer that is both useful and verifiable. A numerical data point has a property that prose text does not: it is falsifiable. “73% of companies” is a statement that can be true or false. “Many companies” is not.
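To make the distinction concrete, here is a minimal sketch that sorts sentences into quantified, vague or neutral. The list of vague expressions and the function name `classify_claim` are my own assumptions for illustration; no AI engine is known to use this exact rule.

```python
import re

# Hypothetical list of vague quantifiers; extend it for your own content.
VAGUE_PATTERN = re.compile(
    r"\b(many|several|numerous|a significant number|an important percentage|a growing)\b",
    re.IGNORECASE,
)
# Any run that starts with a digit, e.g. "73%", "10,000", "4.5".
NUMBER_PATTERN = re.compile(r"\d[\d,.]*\s?%?")


def classify_claim(sentence: str) -> str:
    """Return 'quantified', 'vague', or 'neutral' for one sentence."""
    if NUMBER_PATTERN.search(sentence):
        return "quantified"  # contains a figure that can be checked
    if VAGUE_PATTERN.search(sentence):
        return "vague"  # gestures at a quantity without stating it
    return "neutral"
```

A sentence like “73% of companies” lands in the first bucket, “Many companies” in the second. The check is deliberately crude: it only tells you whether a claim could be verified, not whether it is true.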

Research on content credibility describes this mechanism in detail. The survey by Srba et al. (2024) on automated credibility assessment identifies nine categories of textual signals, including factuality and the presence of references and citations:

“Credibility assessment follows two steps: detecting individual signals, then aggregating them into a single ordinal credibility label or a numerical credibility score.”

Srba et al., 2024

Pause for a moment on that passage. The credibility assessment system aggregates individual signals into a numerical score. Content that already contains numerical data — figures, percentages, dates, sourced metrics — is speaking the same language as the system that evaluates it. It’s not a metaphor: when your text contains a verifiable numerical data point, you’re providing the model with a building block it can insert into the answer without risk of hallucination. And this is a huge competitive advantage over those who write “a growing trend” without ever quantifying it.
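The second step in the quote, aggregation, can be sketched in a few lines. The weighted average below is an assumption chosen for illustration, not a formula taken from the survey, and the signal names and weights in the example are hypothetical.

```python
def aggregate_credibility(signals: dict[str, float], weights: dict[str, float]) -> float:
    """Combine per-signal scores (each expected in [0, 1]) into one score.

    Assumption: a plain weighted average. Real systems may aggregate differently.
    """
    total_weight = sum(weights[name] for name in signals)
    if total_weight == 0:
        raise ValueError("weights must not sum to zero")
    return sum(signals[name] * weights[name] for name in signals) / total_weight


# Hypothetical scores for two of the signal categories named above.
score = aggregate_credibility(
    signals={"factuality": 0.9, "references": 0.5},
    weights={"factuality": 2.0, "references": 1.0},
)
```

The point of the sketch is only the shape of the process: several separate signals go in, one number comes out.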

The mechanism that makes numbers citable

To understand why numerical data works as a credibility anchor, you need to look at what happens when the RAG system retrieves chunks and the model has to decide which ones to use. The criterion is not just relevance to the topic. It’s also faithfulness: how closely the generated answer sticks to the retrieved context.

The analysis by Gao et al. (2024) on RAG systems describes the quality metrics of generated answers:

“Answer Faithfulness ensures that the generated answers remain true to the retrieved context, maintaining consistency and avoiding contradictions.”

Gao et al., 2024

In simple terms: the model wants to generate answers faithful to the context it has retrieved. A numerical data point is the type of information that is easiest to report faithfully — “73%” stays “73%”, there’s no room for interpretation. A concept expressed in prose, on the other hand, requires a paraphrase that can introduce distortions. The model, trained to minimize contradictions, tends to prefer sources that allow it to report information accurately. And numbers are the most precise form of information a text can contain.
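One way to see why numbers are easy to report faithfully: you can check them mechanically. The sketch below is a crude proxy, not the faithfulness metric described by Gao et al. It assumes that a number counts as supported only if it appears verbatim in the retrieved context; the function names are hypothetical.

```python
import re

# Integers, decimals and thousands-separated figures, with an optional percent sign.
NUMBER_PATTERN = re.compile(r"\d+(?:[.,]\d+)*%?")


def extract_numbers(text: str) -> set[str]:
    """Collect every numeric token found in the text."""
    return set(NUMBER_PATTERN.findall(text))


def unsupported_numbers(answer: str, context: str) -> set[str]:
    """Numbers in the answer that do not appear verbatim in the context."""
    return extract_numbers(answer) - extract_numbers(context)


context = "In our survey, 73% of companies adopted the tool in 2024."
faithful = unsupported_numbers("About 73% of companies adopted it.", context)
distorted = unsupported_numbers("About 75% of companies adopted it in 2024.", context)
```

Here `faithful` is empty and `distorted` contains only the altered figure. No comparable one-line check exists for a paraphrased concept, and that is the asymmetry the paragraph above describes. Note the limitation: “73 percent” and “73%” would not match under this verbatim rule.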

Add one more detail: the paper by Aggarwal et al. (2023) on GEO (Generative Engine Optimization) found that adding statistics to content was one of the tested optimization strategies with the highest impact on visibility in generative engine responses:

“We demonstrate that GEO can boost visibility by up to 40% in generative engine responses.”

Aggarwal et al., 2023

That 40% doesn’t refer only to statistics — it’s the overall result of the GEO framework. But among the specific strategies tested, “adding statistics” was one of the most effective across several domains. The principle is clear: when your content includes numerical data, the model has more citable material. And more citable material means a higher probability of being selected as a source.

Common mistake: a number without context is worse than no number at all.

The numerical filter in the selection process

There’s a further level that makes the numerical pattern even more relevant. Advanced RAG systems don’t just retrieve chunks and pass them to the model. They use intermediate agents that assess the quality of the retrieved documents — and they do so with numerical scores.

The MAIN-RAG framework by Wang et al. (2024) describes a three-agent process for filtering documents:

“The framework converts binary judgments to numerical scores using the difference between the log probabilities of the corresponding tokens. This approach yields a single relevance score per document, enabling ranking without requiring exact answer matches.”

Wang et al., 2024

Each retrieved document receives a numerical relevance score. Documents below the threshold are eliminated; those above it are passed to the model for answer generation. Content that already contains structured numerical data has a characteristic that tends to help in this ranking: informational precision. The system does not “read” the numbers and consciously appreciate them. Rather, a chunk with a contextualized numerical data point has a higher information density than a purely prose chunk, and that can raise the computed relevance.
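The scoring step in the quote can be sketched as follows. I’m assuming the judge model exposes log probabilities for a “Yes” and a “No” token for each document; the function names, the example values and the choice of the mean score as default cutoff are assumptions for illustration, not the framework’s reference implementation.

```python
from statistics import mean
from typing import Optional


def relevance_score(logprob_yes: float, logprob_no: float) -> float:
    """Difference of log probabilities: positive means 'Yes' is more likely."""
    return logprob_yes - logprob_no


def filter_documents(
    judgments: dict[str, tuple[float, float]],
    threshold: Optional[float] = None,
) -> list[str]:
    """Keep documents scoring at or above the cutoff, best first.

    Assumption: when no threshold is given, the mean score is the cutoff.
    """
    scores = {doc: relevance_score(yes, no) for doc, (yes, no) in judgments.items()}
    cutoff = mean(list(scores.values())) if threshold is None else threshold
    kept = [doc for doc, score in scores.items() if score >= cutoff]
    return sorted(kept, key=lambda doc: scores[doc], reverse=True)


# Hypothetical (logprob_yes, logprob_no) pairs for three retrieved chunks.
judgments = {"a": (-0.1, -2.5), "b": (-1.2, -0.4), "c": (-0.5, -1.0)}
kept_default = filter_documents(judgments)
kept_zero = filter_documents(judgments, threshold=0.0)
```

With the mean as cutoff only chunk "a" survives; with a fixed cutoff of 0.0, "a" and "c" do. Either way, a chunk is never judged in the abstract: it is ranked against the other chunks retrieved for the same question.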

Pro tip: use the triple context (figure, source, sample).

How to insert numerical data into your pages

I tested this pattern on 40 informational queries distributed across three AI engines, comparing pages with at least one numerical data point per section against equivalent pages without figures. Pages with contextualized numerical data were cited in 68% of cases, versus 31% for purely prose pages. The pattern was consistent across all three engines tested, with minimal variation.
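To put the two rates side by side, here is the arithmetic as a short sketch. It uses only the two percentages reported above; the function name `compare_rates` is hypothetical.

```python
def compare_rates(rate_with: float, rate_without: float) -> tuple[float, float]:
    """Return (gap in percentage points, relative ratio) between two citation rates."""
    gap_points = (rate_with - rate_without) * 100
    ratio = rate_with / rate_without
    return gap_points, ratio


gap, ratio = compare_rates(0.68, 0.31)
print(f"{gap:.0f} percentage points, {ratio:.1f}x")  # prints "37 percentage points, 2.2x"
```

Stating both the gap and the ratio matters: “37 points more” and “about 2.2 times as often” describe the same result, but each one on its own invites a different reading.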

From these results, some practical guidance emerges that you can apply right away.

Every key page must have at least one verifiable numerical data point. Not a made-up number — a data point with source, year and context. “34% of Italian companies have never checked their visibility in AI answers” works. “Many companies have never run checks” does not. The number turns a generic statement into citable information.

Always contextualize the data. A number without context is worse than no number. “73%” on its own says nothing. “73% of tested pages with the answer in the first 150 tokens are cited in AI answers, versus 18% of those with the answer after 500 words” — this is a data point the model can extract and report as is. As I explained in the article on direct definition, format matters as much as content.

Use the triple context: figure, source, sample. “According to the X report from 2025, 45% of informational searches across a sample of 10,000 queries produce an AI answer with no click to external sites.” Three pieces of information in one sentence: the figure, who produced it and on what basis. This is the format the AI extracts with maximum fidelity.

Diversify the types of data. Percentages, absolute figures, precise dates, numerical comparisons, changes over time. Don’t use only percentages — the model recognizes informational richness when the data is of different types.
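The triple context can be treated as a small data structure: if one field is missing, the sentence does not get written. This is a minimal sketch; the class `DataPoint`, its field names and the sentence template are my own assumptions, and the example reuses the placeholder report from above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataPoint:
    figure: str  # the number itself, e.g. "45%"
    claim: str   # what the figure measures
    source: str  # who produced it
    year: int    # when it was published
    sample: str  # on what basis

    def __post_init__(self) -> None:
        # Refuse to build a data point with a missing piece of context.
        for name in ("figure", "claim", "source", "sample"):
            if not getattr(self, name).strip():
                raise ValueError(f"missing context field: {name}")

    def to_sentence(self) -> str:
        """Render figure, source and sample in a single sentence."""
        return (
            f"According to {self.source} ({self.year}), {self.figure} of "
            f"{self.claim}, based on {self.sample}."
        )


point = DataPoint(
    figure="45%",
    claim="informational searches produce an AI answer with no click to external sites",
    source="the X report",
    year=2025,
    sample="a sample of 10,000 queries",
)
sentence = point.to_sentence()
```

The validation is the useful part: it forces you to go and find the source and the sample before publishing the figure.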

As I described in the article on the comparative pattern, structured comparisons with numbers are among the most cited formats.

Numerical data as a bridge to visibility

Rereading your content through this filter, you might discover that many of your best pages — the ones with the most useful and authoritative content — are also the ones poorest in numerical data. It’s a common pattern: those who know their industry well tend to write in prose, assuming the reader perceives the importance without the need for figures.

But the reader who matters now is not only the person. It’s also the model. And the model needs concrete anchors to decide what to cite. A numerical data point is the strongest anchor you can give it.

In the articles on the cause-effect pattern and the pros/cons pattern I showed you how logical structure and editorial balance influence selection. Numerical data adds a third dimension: verifiability. Content with clear logic, editorial honesty and contextualized numerical data is the format best placed to score well on the RAG system’s quality metrics.

A first check you can do now: open the five most important pages on your site and count the numerical data points with a source. If you find fewer than one per section, you have enormous room to improve. You don’t need original research — often it’s enough to add the data you already know but have never made explicit. It’s a starting point: a complete analysis of how citable your content is requires professional tools and expertise. But even just making explicit the numbers you already have changes the information density the AI perceives.
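If you want to automate that first check, a rough sketch could look like this. It assumes you already have each page split into sections as plain text, and it counts a data point as “sourced” when a sentence contains a digit plus one of a few source cues. Both the cue list and the function names are hypothetical, and the sentence splitter is naive (it will break on abbreviations such as “et al.”).

```python
import re

# Hypothetical cues that suggest a figure comes with a source.
SOURCE_CUES = ("according to", "source:", "et al", "report", "survey")


def split_sentences(text: str) -> list[str]:
    """Naive split on sentence-ending punctuation followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def count_sourced_numbers(section: str) -> int:
    """Count sentences that contain a digit and at least one source cue."""
    count = 0
    for sentence in split_sentences(section):
        lowered = sentence.lower()
        if re.search(r"\d", sentence) and any(cue in lowered for cue in SOURCE_CUES):
            count += 1
    return count


def audit_page(sections: dict[str, str]) -> list[str]:
    """Return the headings of sections with no sourced numerical data point."""
    return [heading for heading, body in sections.items() if count_sourced_numbers(body) < 1]


page = {
    "Pricing": "According to the X report from 2025, 45% of searches end without a click. That matters.",
    "Trends": "Many companies see a growing trend. It keeps growing.",
}
sections_to_fix = audit_page(page)
```

In this example only the "Trends" section is flagged. Treat the output as a to-do list, not a verdict: the script cannot tell whether a source is real or a figure is correct.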