If your articles repeat the same information found on dozens of other sites, AI has no reason to choose you as a source: it already has that information from more established sources. Language models can measure how novel a piece of content is, and content that adds nothing original risks being systematically ignored. You're producing content, spending time and money, and handing visibility to whoever wrote before you. There is a precise way to find the unique angle AI can't ignore, and it often doesn't require information you don't already have.
Your article on “SEO for ecommerce” is well written, well structured, with the keywords in the right places.
But something is off…
It says exactly the same things as the other fifty articles on the same topic. For Google you might still rank with a good backlink profile. For AI your content has zero information gain — it adds nothing to what the model already knows.
And content that adds nothing has no reason to be cited.
This is the paradigm shift most SEO professionals haven’t absorbed yet: AI doesn’t select sources by popularity, it selects them by informational novelty. The mechanism has a precise name, scientific literature behind it, and immediate operational implications for how you produce content.
The signal researchers have already measured
In 2023, Sungik Choi et al. published a systematic review of language models as evaluation tools. One of the most direct conclusions is this: “Hence, they have also gained much attention as an attractive tool for novelty detection” — language models have established themselves precisely as tools for detecting how new a piece of content is compared to what already exists.
This isn’t speculation about the future. It’s a documented fact about the nature of LLM-based evaluation systems: novelty detection is already part of these models’ technical repertoire.
An important deduction follows from this principle: if the model has the technical ability to measure the novelty of a piece of content, then source selection in a system like Perplexity, or an AI engine in general, is not indifferent to the novelty of what it finds. Content that replicates already-seen information has fewer reasons to be selected than content that introduces something the model doesn’t have elsewhere.
This is a type-B claim: I don’t have a source that explicitly says “AI cites content with high information gain”. I have a source that documents the technical capability, and from there I build the operational deduction. The distinction matters — and I’ll explain why later.
How novelty is calculated
To understand why AI is structurally oriented toward novelty, it helps to understand how language models handle information at a probabilistic level.
When a model processes text, it assigns each token a probability of occurrence based on context. Common content — what the model has read many times during training — produces high log-probability sequences: the model “expects” those words, in that order, because it has seen them a thousand times. Content that says “SEO requires quality content, backlinks and technical optimization” is almost literally predictable for an LLM.
High-information-gain content, by contrast, introduces tokens or combinations of tokens the model hadn’t predicted. Not because they’re random or meaningless — quite the opposite. Because they contain an original data point, an unconventional perspective, or a connection between concepts that hadn’t already been established in the training corpus.
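To make "expected" versus "unexpected" text tangible, here is a minimal sketch. It is not how a production LLM works: it swaps the neural model for a toy bigram counter with add-one smoothing, and the function names (`train_bigram`, `mean_surprisal`) and the sample sentences are illustrative assumptions. It only shows the principle that a sentence the model has seen many times costs fewer bits per token than one it has never seen.

```python
import math
from collections import Counter


def train_bigram(corpus_sentences):
    """Count unigrams and bigrams over whitespace-tokenized sentences."""
    unigrams, bigrams = Counter(), Counter()
    for sentence in corpus_sentences:
        tokens = ["<s>"] + sentence.lower().split()
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))
    return unigrams, bigrams


def mean_surprisal(sentence, unigrams, bigrams):
    """Average surprisal in bits per token, using add-one smoothing."""
    vocab_size = len(unigrams) + 1  # +1 reserves probability mass for unseen tokens
    tokens = ["<s>"] + sentence.lower().split()
    if len(tokens) < 2:  # empty input: nothing to score
        return 0.0
    total = 0.0
    for prev, cur in zip(tokens, tokens[1:]):
        prob = (bigrams[(prev, cur)] + 1) / (unigrams[prev] + vocab_size)
        total += -math.log2(prob)
    return total / (len(tokens) - 1)


# Hypothetical "training corpus": the same generic claim, repeated.
generic = "seo requires quality content backlinks and technical optimization"
unigrams, bigrams = train_bigram([generic] * 3)

print("generic :", mean_surprisal(generic, unigrams, bigrams))
print("original:", mean_surprisal(
    "our survey of 40 shops found checkout speed beat backlinks",
    unigrams, bigrams))
```

Higher surprisal only means "less predictable", not "more valuable": a string of random words would score high too, so treat the number as a signal to investigate rather than a quality score.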
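Metrics like BLEU and ROUGE measure the n-gram overlap between texts. If your content scores low against what already exists, your wording is far from the existing material. That is a rough proxy for informational distance, not proof of it: a clever paraphrase can score low while saying nothing new, so a low score suggests you’re contributing something new only when it comes from new facts rather than new phrasing.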
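A rough way to see what overlap metrics capture is the sketch below. It is a simplification, not the full BLEU or ROUGE definition: it computes only a clipped n-gram precision against a single reference (the building block BLEU combines across several n-gram sizes), and the names `ngrams` and `overlap_precision` are assumptions made for this example.

```python
from collections import Counter


def ngrams(text, n):
    """Multiset of word n-grams from a lowercased, whitespace-split text."""
    tokens = text.lower().split()
    return Counter(zip(*(tokens[i:] for i in range(n))))


def overlap_precision(candidate, reference, n=2):
    """Share of the candidate's n-grams that also occur in the reference.

    Counts are clipped to the reference count, in the spirit of BLEU's
    modified n-gram precision. Returns 0.0 if the candidate has no n-grams.
    """
    cand, ref = ngrams(candidate, n), ngrams(reference, n)
    total = sum(cand.values())
    if total == 0:
        return 0.0
    matched = sum(min(count, ref[gram]) for gram, count in cand.items())
    return matched / total


existing = "seo requires quality content and backlinks"
print(overlap_precision("seo requires quality content", existing))
print(overlap_precision("our survey found slow checkout", existing))
```

A score near 1 means the draft mostly reuses phrasing that already exists; a score near 0 means the wording is different, which still has to be checked by hand for whether the substance is different as well.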
The problem with content that copies itself
There’s a specific phenomenon that makes the problem worse than it first appears. Most online content isn’t independent: it’s built by citing, rephrasing or “improving” the content that came before it on the same topic. The result is an information ecosystem where dozens of articles say essentially the same thing in different words.
For a traditional search engine this was acceptable, because ranking answered the question “which is the most authoritative version of this information?” For an AI system that has to assess the informational fidelity of its sources and their ability to add real knowledge, the phenomenon is a structural problem.
Diversity and novelty aren’t optional — they’re system requirements. A model that always surfaced the same sources for the same topic wouldn’t be useful, regardless of the quality of those sources.
From this it follows that a piece of content’s ability to stand out informationally — even on an already-covered topic — is a variable AI systems have a technical reason to value.
The grey area: novelty doesn’t mean correctness
It’s worth being explicit about a point the literature doesn’t ignore. Sungik Choi et al. also flag a risk.
The novelty-detection mechanism, in different contexts, can be distorted. Being “new” isn’t enough — novelty has to be informative, verifiable, contextualized. Content that introduces false data or misleading perspectives is new in the technical sense of the term, but it isn’t what a well-calibrated system should reward.
For you, operationally, this means the strategy isn’t “surprise the model with something weird”. It’s “bring real data the model hasn’t already seen”. The difference between the two approaches is the difference between content that gets cited once and then ignored, and content that becomes a stable source because it’s also verifiable.
What makes content informationally new
Concretely, there are categories of content that produce high information gain systematically. Not because they’re formulas — but because by nature they introduce something generic content can’t replicate.
Original data — a survey run with your own clients, a benchmark on a sample you analyzed yourself, a measurement no one else has published. It doesn’t have to be academic research: a data point gathered from your professional practice is still a data point that exists nowhere else.
Documented observations — have you noticed that AI answers on a given topic always cite the same three sources? Have you seen a pattern in how Perplexity treats local queries versus national ones? Those observations, when documented, are pure information gain.
Unestablished connections — if nobody has written about how tokenization affects the visibility of brands whose names are made of words that are rare in Italian, and you do it with supporting data, you have very high information gain on that specific intersection.
Perspectives that contradict the consensus with evidence — not the professional contrarian, but someone who brings data against an established narrative. The model has already read the mainstream version of the topic. The version that challenges it with concrete proof is the one that adds something.
The test you can run before publishing
Before publishing any key piece of content, do this: search the topic on Google and read the first ten results. Then ask yourself a single question: does my content introduce at least one element — a data point, an observation, a connection — that none of these ten articles has?
If the answer is no, don’t publish until you’ve added that element. Not because the content is “bad” — it can be very well written. But because, for an AI system that measures informational novelty, it has exactly the same value as the other ten.
For already-published content, the same review applies: identify the pieces that say what everyone says, and add your data. Even a single original data point turns content at high risk of redundancy into content with a reason to be cited.
The operational framework is this:
- Pick the five most important pieces of content on your site for AI visibility
- For each one, run the ten-results test
- Identify what you can add: a data point from your own practice, a documented observation, a comparison nobody has made
- Add that element before any other technical optimization
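The ten-results test can be partly mechanized as a first pass. The sketch below assumes you have already pasted the text of the competing articles into a list yourself. It flags the sentences in your draft that have no close lexical match in any of them. The function names, the Jaccard word-set comparison, and the 0.3 threshold are illustrative assumptions, not a measure any AI system is known to use, and the sample figures in the draft are invented for the demo.

```python
import re


def sentences(text):
    """Naive sentence splitter: break on whitespace after ., ! or ?."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def word_set(sentence):
    """Lowercased set of alphanumeric words in a sentence."""
    return set(re.findall(r"[a-z0-9]+", sentence.lower()))


def jaccard(a, b):
    """Jaccard similarity of two sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def novel_sentences(draft, competitor_texts, threshold=0.3):
    """Draft sentences whose best match among competitor sentences is below threshold."""
    competitor_sets = [
        word_set(s) for text in competitor_texts for s in sentences(text)
    ]
    flagged = []
    for sentence in sentences(draft):
        words = word_set(sentence)
        best = max((jaccard(words, other) for other in competitor_sets), default=0.0)
        if best < threshold:
            flagged.append(sentence)
    return flagged


# Made-up example: one generic sentence, one with a (fictional) original data point.
draft = ("SEO requires quality content and backlinks. "
         "In our audit of 40 shops, slow checkout pages cut citations by half.")
competitors = ["SEO requires quality content and backlinks. Technical optimization matters too."]
print(novel_sentences(draft, competitors))
```

If the function returns an empty list, the draft is probably restating the top ten results. If it returns sentences, read them: only you can judge whether they carry a real data point or just unusual wording.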
AI doesn’t cite those who repeat. It cites those who add something it can’t find elsewhere. And the good news is that to do it you don’t need to become an academic researcher — you just need to stop writing what everyone else has already written.