Authority and Credibility for AI

Your content’s update date is a signal the AI reads

Are you signaling to the AI that your content is up to date, without having changed anything real? The most advanced AI systems detect this inconsistency and interpret it as a sign of low reliability — exactly the opposite of what you were trying to achieve. The result is that those who update less often but with genuine changes outrank you, because the AI trusts them more. There are precise ways to signal freshness credibly, and once they're set up correctly they work on their own.

You’ve implemented HTTPS, your site loads in under two seconds, AI bots can “crawl” it (scan it, read it and record it in their database) without obstacles, and the HTML is semantically correct. Yet your content still doesn’t show up in answers. The missing piece might not be in the content itself, but in a piece of metadata you’ve never touched: the date.

I’m not talking about the publication date the reader sees at the bottom of the article. I’m talking about three specific technical signals that RAG (retrieval-augmented generation) systems read before they evaluate a single word of what you’ve written: the datePublished and dateModified fields in the schema markup, the lastmod value in the XML sitemap, and the HTTP Last-Modified header. This metadata is invisible to the human reader but perfectly visible to the machine. And the machine uses it to decide whether it’s worth passing your content to the model, or whether it’s better to take the competitor’s, the one with a more recent date.

Multiple versions, the same content: the problem the AI has to solve

To understand why this metadata matters so much, you have to start from a real problem that language models face every day. Cheng et al. in 2024 describe it precisely:

“Even more difficult are cases where there exist multiple versions of a resource, and the model may have been exposed to different versions at different times.”

Cheng et al., 2024

Pause for a second on this point. The model has seen different versions of the same resource at different moments. Your “services” page from three years ago and the version you updated six months ago coexist in the system’s information space — and the system has to decide which one is correct. How does it do it? It uses the temporal signals. And if those signals are missing or inconsistent, the system has no tools to distinguish the current version from the obsolete one.

I’ve already explored the concept of recency as a retrieval factor in the article on content recency. Here the topic is different: we’re not talking about the principle, but about the specific technical signals you need to implement so the system can read your content’s freshness.

How the RAG system weighs timestamps

RAG systems don’t retrieve content blindly. Every chunk that enters the retrieval pipeline carries metadata with it — and the timestamp is one of the most influential. Gao et al. in 2024 formalize this mechanism:

“Assigning different weights to document timestamps during retrieval can achieve a balance between information relevance and timeliness.”

Gao et al., 2024

A balance between relevance and timeliness. It’s not that the system automatically discards everything that isn’t from yesterday. But when two pieces of content compete for the same slot in the answer — same relevance, same perceived quality — the more recent timestamp wins. And that timestamp isn’t extracted from the page’s visible text. It’s read from the structured metadata.

This means something very concrete: if your content doesn’t declare when it was last updated in a format the machine knows how to interpret, you’re giving up a competitive advantage you could have for free.
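
To make the balance between relevance and timeliness concrete, here is a minimal Python sketch of one way a retriever could blend the two. It is an illustration, not the formula from Gao et al. or from any specific product: the `Chunk` type, the exponential decay, the 180-day half-life and the 0.2 recency weight are all assumptions chosen for the example.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone


@dataclass
class Chunk:
    url: str
    relevance: float         # similarity score in [0, 1] from the retriever (assumed)
    date_modified: datetime  # parsed from the structured metadata, not the visible text


def recency_score(modified: datetime, now: datetime, half_life_days: float = 180.0) -> float:
    """Exponential decay: 1.0 for content modified right now, 0.5 after one half-life."""
    age_days = max((now - modified).total_seconds() / 86400.0, 0.0)
    return 0.5 ** (age_days / half_life_days)


def combined_score(chunk: Chunk, now: datetime, recency_weight: float = 0.2) -> float:
    """Weighted blend of relevance and timeliness; recency_weight is a hypothetical knob."""
    freshness = recency_score(chunk.date_modified, now)
    return (1.0 - recency_weight) * chunk.relevance + recency_weight * freshness


def rank(chunks: list[Chunk], now: datetime) -> list[Chunk]:
    """Order chunks from highest to lowest combined score."""
    return sorted(chunks, key=lambda c: combined_score(c, now), reverse=True)


if __name__ == "__main__":
    now = datetime(2026, 3, 15, tzinfo=timezone.utc)
    candidates = [
        Chunk("https://example.com/old", 0.8, now - timedelta(days=3 * 365)),
        Chunk("https://example.com/fresh", 0.8, now - timedelta(days=60)),
    ]
    for chunk in rank(candidates, now):
        print(chunk.url, round(combined_score(chunk, now), 3))
```

With equal relevance, the chunk with the more recent `date_modified` ranks first. A clearly more relevant but older chunk can still win, which matches the idea that recency acts as a tiebreaker rather than a filter.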

The three signals you need to check

Not all temporal signals carry the same weight, and not all of them are read at the same point in the process.

  • datePublished and dateModified in the Article schema. When you implement schema.org markup on your page (and you should do it on every strategic piece of content), the datePublished and dateModified fields communicate two distinct pieces of information to the system: when the content was born and when it was last touched. dateModified is the field that weighs most for freshness, because it tells the RAG system that someone revised that content after the initial publication. An article published in 2022 with dateModified set to 2026 communicates: this content is still actively maintained.
  • lastmod in the XML sitemap. The sitemap is often the first point of contact between the crawler and your site. The lastmod field on each URL tells the bot: “this page changed on this date”. AI crawlers such as GPTBot, PerplexityBot and ClaudeBot can use this data to decide crawling priority. If your lastmod has never been updated since the first publication, the bot may decide it isn’t worth coming back to that page. And if it doesn’t come back, it won’t pick up the updated version.
  • The HTTP Last-Modified header. This is the most technical of the three signals, handled at the server level. When a crawler requests a page, the server can respond with a Last-Modified header indicating the file’s last change. It’s a low-level but important signal, because it acts even before the content is parsed.
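
A sketch of how the three signals look when they are derived from one timestamp, so they cannot drift apart. The helper names and the example URL are hypothetical. The formats themselves are standard: ISO 8601 for the schema.org fields and for `lastmod`, and the HTTP date format for `Last-Modified`.

```python
import json
from datetime import datetime, timezone
from email.utils import format_datetime
from xml.sax.saxutils import escape


def article_jsonld(headline: str, published: datetime, modified: datetime) -> str:
    """JSON-LD for a schema.org Article carrying both date fields."""
    data = {
        "@context": "https://schema.org",
        "@type": "Article",
        "headline": headline,
        "datePublished": published.isoformat(),
        "dateModified": modified.isoformat(),
    }
    return json.dumps(data, indent=2)


def sitemap_url_entry(loc: str, modified: datetime) -> str:
    """One <url> element for an XML sitemap, with <lastmod> in ISO 8601."""
    return f"<url><loc>{escape(loc)}</loc><lastmod>{modified.isoformat()}</lastmod></url>"


def last_modified_header(modified: datetime) -> str:
    """Value for the HTTP Last-Modified header, always expressed in GMT."""
    return format_datetime(modified.astimezone(timezone.utc), usegmt=True)


if __name__ == "__main__":
    # Hypothetical dates and URL, used only to show the three outputs side by side.
    published = datetime(2022, 5, 1, 8, 0, tzinfo=timezone.utc)
    modified = datetime(2026, 3, 15, 9, 30, tzinfo=timezone.utc)
    print(article_jsonld("Example article", published, modified))
    print(sitemap_url_entry("https://example.com/example-article", modified))
    print("Last-Modified:", last_modified_header(modified))
```

Feeding all three helpers from the same `modified` value is the design choice that matters here: the date is recorded once, at the moment the content actually changes, and every layer repeats it.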

These three signals work together. If they’re aligned — the sitemap says the page was updated on March 15, the schema confirms a dateModified of March 15, and the HTTP header agrees — the system has a strong, consistent freshness signal. If they contradict each other, the signal weakens. I’ve talked about the importance of aligning technical signals in the articles on HTTPS and semantic markup: consistency across the technical layers is a pattern that keeps coming back.

The mistake that ruins everything: updating the date without updating the content

At this point, the temptation is obvious. I’ll update the dateModified and the lastmod every week without touching the text, and the system always sees me as fresh. No.

Advanced retrieval systems don’t have to stop at the timestamp. They can compare versions. If the delta between the previous version and the “updated” one is zero (same words, same structure, same data), the freshness signal loses credibility. This isn’t documented in a single paper as a codified rule, but it’s a logical deduction from the way versioning works in RAG systems: if the system keeps track of versions, an identical version with a different date isn’t an update. It’s noise.

The principle is simple: update the content first, then update the metadata. New data, more recent sources, expanded or rewritten sections, obsolete information removed — that’s an update. Just changing the date is the digital equivalent of rolling back the odometer.
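
On the publishing side, one way to enforce this rule is to let the date follow the content automatically. The sketch below is a hypothetical guard for a CMS or build script, not a description of how any retrieval system compares versions: it fingerprints the normalized text and only moves `dateModified` when the fingerprint changes. The function names and the normalization rules (collapsed whitespace, case folding) are assumptions.

```python
import hashlib
import re
from datetime import datetime


def content_fingerprint(text: str) -> str:
    """SHA-256 of the text with whitespace collapsed and case folded,
    so purely cosmetic reflows do not count as a change."""
    normalized = re.sub(r"\s+", " ", text).strip().casefold()
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def next_date_modified(
    old_text: str,
    new_text: str,
    previous_modified: datetime,
    now: datetime,
) -> datetime:
    """Return `now` only when the text really changed; otherwise keep the old date."""
    if content_fingerprint(old_text) == content_fingerprint(new_text):
        return previous_modified
    return now
```

A fingerprint only proves that something changed, not that the change is meaningful. Deciding whether new data or rewritten sections justify a new date remains an editorial call.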

The connection with the knowledge cutoff

There’s one more layer that concerns training, not just retrieval. I talked about it in the article on the knowledge cutoff: every model has a date beyond which it knows nothing directly. But even within that time window, the model has absorbed different versions of the same pages at different points in training.

From this it follows, and I want to flag it as a deduction, that freshness signals act on two fronts. In real-time RAG retrieval, they influence the ranking of the retrieved content. In training, they help determine which version of your content the model has internalized. If you haven’t updated in years, the version in the training data is the old one. And every new training cycle that includes fresher content from competitors on the same topic pushes your version further into the background of what the model has learned.

Where to start

This is the operational part. You don’t need complex tools to do a first check:

  • Check the schema markup of your key pages. Open Google’s Rich Results Test, paste the URL, and check whether `datePublished` and `dateModified` are present in the structured data. If they’re missing, they need to be added.
  • Check the XML sitemap. Open `yoursite.com/sitemap.xml` and verify that every strategic URL has a `lastmod` field with a date consistent with the actual last update. If all the pages have the same date or no date, the signal is absent.
  • Align the three layers. Schema markup, sitemap, and HTTP header must tell the same story. A `dateModified` set to 2026 with a `lastmod` stuck at 2023 is a contradiction that weakens both signals.
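
The three checks above can also be scripted. The following sketch assumes you have already downloaded the page HTML, the sitemap XML and the `Last-Modified` header value as strings (fetching is left out), and that comparing at calendar-day granularity in UTC is precise enough. The function names are hypothetical, and only a top-level JSON-LD object is handled, not an `@graph` or a list.

```python
import json
import xml.etree.ElementTree as ET
from datetime import date, datetime, timezone
from email.utils import parsedate_to_datetime
from html.parser import HTMLParser
from typing import Optional

SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


class _JsonLdCollector(HTMLParser):
    """Collects the raw text of every <script type="application/ld+json"> block."""

    def __init__(self) -> None:
        super().__init__()
        self._in_jsonld = False
        self.blocks: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_jsonld = True
            self.blocks.append("")

    def handle_endtag(self, tag):
        if tag == "script":
            self._in_jsonld = False

    def handle_data(self, data):
        if self._in_jsonld:
            self.blocks[-1] += data


def _utc_day(value: str) -> date:
    """Parse an ISO 8601 date or datetime and return its calendar day in UTC."""
    # A trailing "Z" is rewritten because older Python versions do not parse it.
    parsed = datetime.fromisoformat(value.strip().replace("Z", "+00:00"))
    if parsed.tzinfo is None:
        return parsed.date()
    return parsed.astimezone(timezone.utc).date()


def schema_date_modified(html: str) -> Optional[date]:
    """dateModified from the first top-level JSON-LD object that declares it."""
    collector = _JsonLdCollector()
    collector.feed(html)
    for block in collector.blocks:
        data = json.loads(block)
        if isinstance(data, dict) and "dateModified" in data:
            return _utc_day(data["dateModified"])
    return None


def sitemap_lastmod(sitemap_xml: str, loc: str) -> Optional[date]:
    """lastmod of the <url> entry whose <loc> equals `loc`, if present."""
    root = ET.fromstring(sitemap_xml)
    for url in root.findall("sm:url", SITEMAP_NS):
        if url.findtext("sm:loc", default="", namespaces=SITEMAP_NS).strip() == loc:
            lastmod = url.findtext("sm:lastmod", namespaces=SITEMAP_NS)
            return _utc_day(lastmod) if lastmod else None
    return None


def header_last_modified(value: str) -> date:
    """Calendar day (UTC) of an HTTP Last-Modified header value."""
    return parsedate_to_datetime(value).astimezone(timezone.utc).date()


def signals_aligned(html: str, sitemap_xml: str, loc: str, last_modified: str) -> bool:
    """True only when all three signals exist and fall on the same day."""
    days = {
        schema_date_modified(html),
        sitemap_lastmod(sitemap_xml, loc),
        header_last_modified(last_modified),
    }
    return None not in days and len(days) == 1
```

A `False` result only tells you that a signal is missing or that the layers disagree. It does not tell you which date is the true one; that still has to come from when the content actually changed.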

This gives you an initial snapshot. For a complete analysis you need to cross-reference the freshness signals with the site’s actual crawlability, the page experience, and the quality of the underlying content — because a recent date on mediocre content doesn’t work miracles.

The author
Roberto Serra at the Senate of the Republic · Palazzo Giustiniani · Conference “The power of artificial intelligence”
Roberto Serra

SEO consultant for over 15 years, founder of the Serra SEO Agency (RAANK). He helps multinationals and SMEs stay visible where search is moving: ChatGPT, Perplexity, Gemini and Google's AI Overviews.
