Authority and Credibility for AI

Backlinks aren’t just for Google: AI uses them in training to weight sources

You're collecting links from dozens of small sites and generic blogs, convinced that the more the better — but AI works differently. A thousand citations from unknown sites are worth less than ten mentions in a national newspaper, a university, or a recognized professional directory. If your strategy is quantity rather than quality of sources, you're wasting energy while the competitor who earned three mentions in the right places beats you without even trying. Redirecting your efforts toward the domains that matter doesn't require more budget — it just requires knowing where to aim.

If you work in online visibility, the word “backlink” makes you think of one thing only: Google. PageRank, link juice, optimized anchor text. And for years that’s how it was — the link was the vote one site gave to another, and Google counted the votes.

But there’s a point that almost every professional I talk to overlooks. AI models (the ones powering the answers of ChatGPT, Perplexity, Gemini) don’t use PageRank. They don’t have an algorithm that counts links in real time. And yet backlinks matter to them too, in a way that completely changes the perspective on where to invest your time.

The link graph as a network of citations in the corpus

Let’s start from a fact: models are trained on enormous amounts of text collected from the web. And that text isn’t a set of isolated pages. Every page contains links to other pages. Those links are part of the text the model processes during training.

From this follows an important deduction, and I want to make it explicit because it isn’t a fact proven by a single experiment but a line of reasoning based on the mechanics of training. When an authoritative page links to another page in the context of a specific topic, the model processes that association. Not as a “vote” in an algorithmic sense, but as contextual co-occurrence: source A mentions and links source B while discussing topic X. If this pattern repeats, with several authoritative sources linking to the same domain while discussing the same topic, the model develops a statistical association between that domain and that topic.

In practice, backlinks work in the training data the way academic citations work in research: they aren’t a generic vote of quality, but a contextual signal that says “on this topic, this source is relevant.”
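
The co-occurrence reasoning above can be sketched as a toy count. This is only an illustration of the logic, not a description of how any model is actually trained; every page, domain, and topic below is invented for the example.

```python
from collections import Counter

# Hypothetical mini-corpus: (linking page, topic discussed, domain linked).
# All names are made up for illustration.
corpus_links = [
    ("national-paper.example/article-1", "tax law", "studio-rossi.example"),
    ("university.example/research", "tax law", "studio-rossi.example"),
    ("trade-directory.example/listing", "tax law", "studio-rossi.example"),
    ("small-blog.example/post", "gardening", "studio-rossi.example"),
    ("national-paper.example/article-2", "tax law", "other-firm.example"),
]


def topic_associations(links):
    """Count how often each (domain, topic) pair co-occurs across linking pages."""
    counts = Counter()
    for _source, topic, target in links:
        counts[(target, topic)] += 1
    return counts


associations = topic_associations(corpus_links)
for (domain, topic), n in associations.most_common():
    print(f"{domain} <-> {topic}: {n}")
```

In this toy corpus the pairing of studio-rossi.example with "tax law" repeats across three different pages, while its pairing with "gardening" appears once. That repetition within one topic is the pattern the section describes.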

Why links from authoritative sources carry more weight

Not all backlinks have the same impact in training. And here comes a principle that has been established in the academic world for decades: the weight of a citation depends on who cites it.

A 2023 paper by Aggarwal et al. measured something very concrete about the effect of citations on AI visibility:

“Including citations, quotations from relevant sources, and statistics can significantly boost source visibility in generative engine responses, with visibility improvements exceeding 40 percent.”

Aggarwal et al., 2023

That 40% visibility improvement isn’t a theoretical number: it’s an experimental measure. Strictly speaking, the experiment measured pages that include citations, quotations, and statistics in their own content, so what follows is an inference rather than a direct result: sources embedded in a network of citations among relevant sources gain more visibility in AI responses. Not because the AI “follows the link” the way a crawler does, but because in the training corpus that network of citations builds a statistical weight that the model incorporates.

Translated to your case: a backlink from Il Sole 24 Ore that mentions you as an expert in your field isn’t valuable only for referral traffic or traditional SEO. It’s valuable because that text — with your name, your domain, the topical context — ends up in the training corpus. And the model processes it.

The structural bias toward third-party sources

There’s a second element that makes this mechanism even more relevant. A 2025 analysis by Nick Koudas et al. documented a systematic pattern:

“AI Search exhibit a systematic and overwhelming bias towards Earned media — third-party, authoritative sources — over Brand-owned and Social content.”

Koudas et al., 2025

This shifts the priorities sharply. The content you publish on your own site is necessary, but on its own it isn’t enough to build the weight you need. AI structurally favors mentions that come from authoritative third-party sources — exactly the kind of sources that generate quality backlinks.

If you think about it, there’s a precise logic to it. During training, the model processed billions of pages. Brand-owned pages (your site, your social profiles) all say the same thing: “we’re good.” The pages of third-party sources that mention you say something different: “this source is relevant to this topic.” For the model, the second signal is more informative than the first. And the statistical weight reflects this asymmetry.

How backlinks feed the AI’s knowledge graph

The mechanism doesn’t stop at static training. In RAG systems — the ones that search for information in real time before answering — the links between sources play a role in how the system builds its representation of knowledge.

Richard Sinnott et al. in 2026 describe a technical step worth reading:

“The retrieved web evidence is then aligned with the KG schema and merged with the KG subgraph to construct an augmented, multi-source knowledge representation.”

Sinnott et al., 2026

The key point is “multi-source knowledge representation.” The system doesn’t take a single source and present it as the answer. It retrieves multiple sources, aligns them with the existing knowledge graph, and builds an integrated representation. In this process, sources that cite and link to one another form a denser and more coherent network. And a denser network conveys more confidence to the system.

This connects directly to what I’ll explore in the article on the Knowledge Panel: being a recognized entity in the knowledge graph means existing as a node in this network. Backlinks from authoritative sources are one of the ways that node grows stronger.

Where to invest: the logic of the training data

In light of all this, the practical question becomes: which backlinks really matter for AI visibility?

The answer differs from that of traditional SEO. Volume doesn’t matter — two things do: the probability that the source is in the training data, and the topical context of the link.

A link from a .edu domain, from a national media outlet, from a professional industry directory, or from a technical publication carries disproportionate weight. Not because the .edu domain has an algorithmic “bonus,” but because these domains have a very high probability of being included in the training datasets. A link from an unknown blog with 50 visits a month, even if topically relevant, has a much lower probability of ending up in the corpus.

And context matters. A link on the “partner” page of an authoritative site, with no text around it, conveys less signal than a link inside an in-depth article that discusses your topic and mentions you as a source. The model processes the text around the link — the semantic context is part of the signal.
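
To see the difference between a bare link and a link inside an article, here is a minimal sketch that extracts the text surrounding each link on a page. It uses Python’s standard `html.parser`; the class name, the `window` parameter, and the two sample pages are assumptions made for this example, and how a real training pipeline treats link context is not something this sketch claims to reproduce.

```python
from html.parser import HTMLParser


class LinkContextParser(HTMLParser):
    """Collect the page text plus each link's href, anchor text, and position."""

    def __init__(self):
        super().__init__()
        self.text_parts = []
        self.links = []
        self._current = None

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                # Offset of the anchor text within the page's plain text.
                start = sum(len(part) for part in self.text_parts)
                self._current = {"href": href, "anchor": "", "start": start}

    def handle_data(self, data):
        self.text_parts.append(data)
        if self._current is not None:
            self._current["anchor"] += data

    def handle_endtag(self, tag):
        if tag == "a" and self._current is not None:
            self.links.append(self._current)
            self._current = None


def link_contexts(html, window=60):
    """Return (href, anchor, surrounding text) for every link in the HTML."""
    parser = LinkContextParser()
    parser.feed(html)
    parser.close()
    text = "".join(parser.text_parts)
    results = []
    for link in parser.links:
        begin = max(0, link["start"] - window)
        end = link["start"] + len(link["anchor"]) + window
        results.append((link["href"], link["anchor"], text[begin:end].strip()))
    return results


# Hypothetical pages: a bare "partner" listing and an in-depth article.
partner_page = '<ul><li><a href="https://example.com">Example</a></li></ul>'
article_page = (
    "<p>For cross-border VAT rules, the analysis by "
    '<a href="https://example.com/vat">Studio Example</a> '
    "remains the clearest reference.</p>"
)

for page in (partner_page, article_page):
    for href, anchor, context in link_contexts(page):
        print(f"{href} | anchor: {anchor!r} | context: {context!r}")
```

For the partner page, the context is nothing more than the anchor text itself. For the article, the context carries the topic (VAT rules) and the framing of the linked site as a reference.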

An exercise you can do right now: take your 10 most authoritative backlinks and ask yourself, for each one: is this domain likely to be in an LLM’s training data? If the answer is yes for at least half of them, you’re building in the right direction. If the answer is yes for only one or two, the bulk of your link profile is working only for Google, not for AI. It’s a surface-level check, but it gives you an immediate snapshot of where you stand.
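
If you prefer to keep the exercise in a file, it can be sketched as a small tally. The yes/no judgments are yours to fill in by hand; the domains below are placeholders, and the "in between" verdict for cases the rule of thumb doesn’t cover is an assumption added for this example.

```python
def audit_backlinks(judgments):
    """Summarize hand-made yes/no judgments about training-data likelihood.

    judgments maps a linking domain to True if you judge it likely to be
    in an LLM's training data, False otherwise.
    """
    if not judgments:
        raise ValueError("provide at least one backlink judgment")
    likely = sum(1 for in_corpus in judgments.values() if in_corpus)
    total = len(judgments)
    if likely * 2 >= total:
        verdict = "right direction"
    elif likely <= 2:
        verdict = "mostly Google-only"
    else:
        # Not covered by the rule of thumb in the text; labeled as an assumption.
        verdict = "in between"
    return likely, total, verdict


# Hypothetical top-10 list with placeholder domains.
top_backlinks = {
    "national-newspaper.example": True,
    "university.example": True,
    "industry-directory.example": True,
    "technical-journal.example": True,
    "regional-paper.example": True,
    "trade-association.example": True,
    "generic-directory-1.example": False,
    "generic-directory-2.example": False,
    "small-blog-1.example": False,
    "small-blog-2.example": False,
}

likely, total, verdict = audit_backlinks(top_backlinks)
print(f"{likely} of {total} likely in training data: {verdict}")
```

The tally adds nothing the manual exercise doesn’t already give you; its only advantage is that you can rerun it as your link profile changes.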

It’s no longer just link building: it’s citation building

The turning point is this: for AI, a backlink isn’t a mechanical vote. It’s a contextual citation. And like any citation, its value depends on who cites it, in what context, and how much that pattern repeats across the corpus.

This means the strategy changes. You don’t need to accumulate hundreds of links from generic directories. You need to build a profile of citations — few, targeted, from sources with a high probability of being in the training data, in topically relevant contexts.

Your domain’s topical authority grows stronger when external sources confirm your expertise on a specific topic. Content recency comes into play because RAG systems retrieve up-to-date sources: if the mentions of you are recent, the signal is stronger. And implicit references (textual mentions without a link) complete the picture, because the model processes text, not just hypertext.

It’s mechanics, not opinion. Anyone who builds a network of authoritative, contextual citations is building an asset that works on two levels: traditional SEO and AI visibility. Anyone who keeps buying link packages is investing in only one level — and the one that matters least for the future.

The author
Roberto Serra

SEO consultant for over 15 years, founder of the Serra SEO Agency (RAANK). He helps multinationals and SMEs stay visible where search is moving: ChatGPT, Perplexity, Gemini and Google's AI Overviews.