Authority and Credibility for AI

You’ve published on your topic for 10 years? The AI knows it and rewards you

Roberto Serra · 25 June 2026 · ~8 min read

Did you delete your old articles to tidy up the site? You may have eliminated the competitive advantage that's hardest to rebuild. AI systems tend to favor sources that have been publishing on a topic for a long time: a long, consistent track record on a subject gives you an authority that a new site can't buy at any price. Deleting the archive means giving away years of work. Recovering and updating what you already have takes less effort than rewriting from scratch, and the results last.

There’s one thing you can’t buy, can’t simulate, and can’t speed up: time. And in the way AI models decide whom to trust, time matters more than you imagine.

If you’ve been publishing content about your industry for years, you’ve been accumulating an invisible advantage. If instead you arrived yesterday on a topic with a freshly opened blog and ten hastily written articles, the AI senses it. Not because it reads the dates of your posts: the mechanism is more subtle than that.

The historical archive as a strategic asset

Before I explain the mechanism, I want you to understand what’s at stake. When a potential client asks Perplexity “who are the best consultants for X in my industry,” the system retrieves sources, evaluates them, and builds the answer. In that moment, a source that has published relevant content on that topic for five consecutive years sends a different signal than one that appeared six months ago.

This isn’t a subjective judgment by the AI. It’s a direct consequence of how models are trained and how RAG systems select sources.

How temporal authority emerges: the training data

To understand why time matters, you have to start with how models learn. Every LLM is trained on an enormous corpus of texts, collected at different moments. This corpus isn’t a single snapshot: it’s the result of multiple successive crawls, months or years apart.

A 2024 paper by Cheng et al. investigated what actually happens inside models with respect to the resources they were trained on:

“We seek to probe LLMs to determine their resource-level effective cutoffs, defining the effective cutoff date of a model with respect to a resource as the date of the version of the resource that is most closely aligned with the model.”

Cheng et al., 2024

The key concept is “effective cutoff date with respect to a resource” — every resource has its own effective cutoff date. There’s no single cutoff for the entire model. The model might have information updated to 2024 on a popular subject and frozen at 2021 on another.

This means the model doesn’t treat all sources as equal. A resource that was included in the corpus across multiple successive versions — because it already existed in the 2020, 2022, and 2024 crawls — has a layered presence in the model’s memory. The model has “seen” it multiple times, in different contexts, with progressive updates.

From this follows an important deduction: sources that have been publishing on the same topic for years, updating regularly, are more likely to have been included in successive versions of the training corpus. And every inclusion reinforces the weight of that source within the model’s internal knowledge.

The link with the knowledge cutoff

If you’ve read my article on the knowledge cutoff, you already know that the cutoff isn’t a clean date but a gray zone. The effective cutoff varies from resource to resource.

Now add another piece: if the effective cutoff depends on how many versions of a resource ended up in the corpus, then a source with ten years of publications on the same topic has had ten years of opportunities to be included. A source born last year has had just one, at best.

This isn’t an opinion — it’s the logical consequence of how training data is built. Crawlers collect the web periodically. Those who were there earlier were collected more times. Those collected more times carry more weight in the model. Those that carry more weight in the model get retrieved more easily when the model has to build an answer.

What happens in RAG systems

Temporal authority doesn’t act only on the training data. It also acts at the moment when RAG systems, such as those powering Perplexity, Bing Chat (now Microsoft Copilot), and Google’s AI Overviews, decide which sources to retrieve and with what priority.

When a RAG system searches for sources to answer a query, it evaluates several quality signals. Among these, a domain’s thematic depth and consistency count. A site with 50 articles on the same topic, published over the span of 5 years, with visible publication and update dates, sends a signal of consolidated expertise. A site with 10 articles all dated the same month sends a different signal.
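To make that contrast concrete for your own archive, here is a minimal Python sketch that summarizes how a list of publication dates is spread over time. It is not how any RAG system scores sources (those signals aren't public). The function name archive_profile and both sample archives are made up for illustration.

```python
from datetime import date


def archive_profile(pub_dates):
    """Summarize how a list of publication dates is spread over time.

    Illustrative only: this is a way to look at your own archive,
    not a reproduction of any search or RAG ranking signal.
    """
    if not pub_dates:
        return {"articles": 0, "span_years": 0.0, "active_months": 0}
    first, last = min(pub_dates), max(pub_dates)
    # Distance between the oldest and the newest article, in years.
    span_years = round((last - first).days / 365.25, 1)
    # Number of distinct calendar months with at least one article.
    active_months = len({(d.year, d.month) for d in pub_dates})
    return {
        "articles": len(pub_dates),
        "span_years": span_years,
        "active_months": active_months,
    }


# Hypothetical archive: 50 articles, ten per year from 2021 to 2025.
established = [date(2021 + i // 10, 1 + i % 10, 15) for i in range(50)]
# Hypothetical newcomer: 10 articles, all published in the same month.
newcomer = [date(2026, 5, day) for day in range(1, 11)]

print(archive_profile(established))
print(archive_profile(newcomer))
```

The first profile shows 50 active months spread over almost five years; the second shows a single active month. That gap is the difference in signal described above.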

The E-E-A-T signals — Experience, Expertise, Authoritativeness, Trustworthiness — which the AI inherits as proxies of trust, have an intrinsic temporal component. Experience, by definition, requires time. You can’t demonstrate ten years of experience with a site born three months ago, however impeccable the content may be.

Training data bias reinforces the pattern

There’s another element that amplifies this mechanism. I discussed it in the article on training data bias: training data isn’t a neutral sample of the web. Some sources are over-represented, others under-represented.

Sources with a long editorial history tend to have more indexed pages, more backlinks, more mentions on third-party platforms. All factors that increase the probability of being included in training datasets. This creates a virtuous circle: those who’ve published longer have more pages, more pages mean more inclusions in training, more inclusions mean more weight in the model.

And when the model has to decide between two sources saying similar things, the consensus signal works in favor of those already established. The historical source is more likely to be aligned with the industry consensus, simply because it helped form it.

What it means for those starting from scratch

If you’re reading this and thinking “so it’s too late for me,” stop. That’s not how it works.

Temporal authority is a cumulative advantage, not an absolute barrier to entry. Those starting today can build it, but they need to know the result won’t arrive in three months. The good news is that almost no one in your industry is thinking about this strategically. Your competitors probably don’t know that their archive of 2018 blog posts is an asset for AI visibility, and many of them are deleting it to “clean up the site.”

Here’s the first practical rule: never delete old content. If you have dated articles, update them. Add the original publication date and the last update date. An article published in 2019 and updated in 2026 tells the crawler: “this source has seven years of history on this topic and keeps it up to date.”

How to build your temporal authority

The strategy changes depending on where you are.

If you have a historical archive, the work is about adding value. Revisit your old topical articles, update them with current data, keep the original publication date, and add dateModified in the schema markup. Don’t rewrite them from scratch: progressive updating is more effective than replacement, because it preserves the continuity that the crawler recognizes.
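As a rough illustration of the markup step, the Python sketch below builds a schema.org Article object that carries both datePublished and dateModified and wraps it in a JSON-LD script tag. The headline, author name, and dates are placeholders, and the helper article_jsonld is hypothetical. Adapt the output to however your CMS emits structured data.

```python
import json
from datetime import date


def article_jsonld(headline, author, published, modified):
    """Build a schema.org Article dict that keeps the original
    publication date and records the latest update separately."""
    if modified < published:
        raise ValueError("dateModified cannot precede datePublished")
    return {
        "@context": "https://schema.org",
        "@type": "Article",
        "headline": headline,
        "author": {"@type": "Person", "name": author},
        "datePublished": published.isoformat(),  # original date, left untouched
        "dateModified": modified.isoformat(),    # date of the latest revision
    }


# Placeholder values: an article first published in 2019 and revised in 2026.
markup = article_jsonld(
    "Placeholder headline",
    "Placeholder Author",
    published=date(2019, 3, 4),
    modified=date(2026, 6, 25),
)

# The tag to place in the page's HTML.
snippet = (
    '<script type="application/ld+json">\n'
    + json.dumps(markup, indent=2)
    + "\n</script>"
)
print(snippet)
```

Keeping the two dates as separate fields is the point: the update is recorded without overwriting the history.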

If you’re starting from scratch, the work is about methodical construction. Choose one specific topic — not five, just one to start — and publish consistently. You don’t need an article a day. You need an article a month, but for three years. Consistency over time is more powerful than volume in the short term.

In both cases, make sure your topical presence isn’t only on your own site. Cross-platform reputation amplifies the temporal signal: if your name has appeared for years on industry directories, professional profiles, and vertical media as well, the authority signal multiplies.

The checks to do right now

You can start to get a sense of your situation with a few basic checks.

Open your site and check the date of the oldest article on your main topic. Then go to the Wayback Machine (web.archive.org) and verify how long your domain has been publishing content on that subject. If you find a history of 5+ years with consistent topical content, you have an advantage you may not have known you had.
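If you'd rather script the Wayback Machine check than click through the calendar, the sketch below asks the Internet Archive's CDX endpoint for the earliest successful capture of one exact URL, for example your oldest article on the topic. The parameter names follow the public CDX server documentation as I understand it, so verify them against the current docs. The example.com address is a placeholder and the helper names are my own.

```python
import json
from datetime import datetime
from urllib.parse import urlencode
from urllib.request import urlopen

CDX_ENDPOINT = "https://web.archive.org/cdx/search/cdx"


def build_cdx_query(page_url):
    """Build a CDX query for the first HTTP 200 capture of one exact URL."""
    params = {
        "url": page_url,
        "output": "json",
        "fl": "timestamp,original",   # only return the fields we need
        "filter": "statuscode:200",   # skip redirects and error captures
        "limit": "1",                 # captures of a single URL come back oldest first
    }
    return CDX_ENDPOINT + "?" + urlencode(params)


def parse_earliest_capture(rows):
    """Turn the JSON rows into (capture datetime, original URL), or None.

    With output=json the first row holds the field names.
    """
    if len(rows) < 2:
        return None
    record = dict(zip(rows[0], rows[1]))
    captured = datetime.strptime(record["timestamp"], "%Y%m%d%H%M%S")
    return captured, record["original"]


def earliest_capture(page_url, timeout=30):
    """Network call: fetch and parse the earliest capture for page_url."""
    with urlopen(build_cdx_query(page_url), timeout=timeout) as resp:
        body = resp.read().decode("utf-8")
    return parse_earliest_capture(json.loads(body) if body.strip() else [])


# Usage (requires network access; the URL is a placeholder):
# print(earliest_capture("example.com/blog/first-post"))
```

A capture date only proves the page existed by then, not when it was first published, so treat the result as a lower bound on your history.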

Then run the same check on your direct competitors. Who has a longer history on the topic? Who has the deepest archive? This tells you whom you’re competing against for temporal authority in your industry.

These are surface-level checks; for a precise picture of how your brand is represented in the training data you need specific tools and expertise. But they give you a direction.

Time is the only asset that can’t be replicated

Today the AI increasingly decides which brands to mention in its answers, and editorial history is a structural differentiator. It can’t be bought, can’t be manufactured with a month of intensive content, can’t be simulated.

Those who have published on their topic for years are sitting on an advantage that grows with every crawl, with every new version of the training datasets, with every model update. Those starting today can build it, but they need to know that time is part of the formula.