How AI engines think

Are you rewriting what everyone else has written? AI wants novelty

Roberto Serra · 24 June 2026 · ~7 min read

If your articles repeat the same information found on dozens of other sites, AI has no reason to choose you as a source: it already has that information from more established sources. Language models are able to measure how new a piece of content is, and content that adds nothing original has little reason to be cited. You're producing content, spending time and money, and handing visibility to whoever wrote before you. There is a precise way to find the unique angle AI can't ignore, and it often doesn't require information you don't already have.

Your article on “SEO for ecommerce” is well written, well structured, with the keywords in the right places.

But something is off…

It says exactly the same things as the other fifty articles on the same topic. For Google you might still rank with a good backlink profile. For AI your content has zero information gain — it adds nothing to what the model already knows.

And content that adds nothing has no reason to be cited.

This is the paradigm shift most SEO professionals haven’t absorbed yet: AI doesn’t select sources by popularity, it selects them by informational novelty. The mechanism has a precise name, scientific literature behind it, and immediate operational implications for how you produce content.

The signal researchers have already measured

In 2023, Sungik Choi et al. published a systematic review of language models as evaluation tools. One of the most direct conclusions is this: “Hence, they have also gained much attention as an attractive tool for novelty detection” — language models have established themselves precisely as tools for detecting how new a piece of content is compared to what already exists.

This isn’t speculation about the future. It’s a documented fact about the nature of LLM-based evaluation systems: novelty detection is already part of these models’ technical repertoire.

An important deduction follows from this principle: if the model has the technical ability to measure the novelty of a piece of content, then source selection in a system like Perplexity, or an AI engine in general, is not indifferent to the novelty of what it finds. Content that replicates already-seen information has fewer reasons to be selected than content that introduces something the model doesn’t have elsewhere.

This is what I call a type-B claim, meaning a deduction rather than a documented finding: I don’t have a source that explicitly says “AI cites content with high information gain”. I have a source that documents the technical capability, and from there I build the operational deduction. The distinction matters, and I’ll explain why later.

How novelty is calculated

To understand why AI is structurally oriented toward novelty, it helps to understand how language models handle information at a probabilistic level.

When a model processes text, it assigns each token a probability of occurrence based on context. Common content — what the model has read many times during training — produces high log-probability sequences: the model “expects” those words, in that order, because it has seen them a thousand times. Content that says “SEO requires quality content, backlinks and technical optimization” is almost literally predictable for an LLM.

High-information-gain content, by contrast, introduces tokens or combinations of tokens the model hadn’t predicted. Not because they’re random or meaningless — quite the opposite. Because they contain an original data point, an unconventional perspective, or a connection between concepts that hadn’t already been established in the training corpus.
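To make the idea of "expected" versus "unexpected" text concrete, here is a minimal sketch in Python. It assumes a toy bigram model with add-one smoothing as a stand-in for a real LLM, trained on three made-up sentences. The class name `BigramModel`, the corpus and the two test sentences are all hypothetical, and a production model works on far richer context. The only point is that the average log-probability per token drops when the text contains word sequences the model has not seen.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase whitespace tokenizer; real models use subword tokens."""
    return text.lower().split()


class BigramModel:
    """Toy bigram language model with add-one smoothing (illustrative only)."""

    def __init__(self, corpus):
        self.context_counts = Counter()  # how often a token appears as context
        self.bigram_counts = Counter()   # how often (previous, current) appears
        self.vocab = set()
        for sentence in corpus:
            tokens = ["<s>"] + tokenize(sentence)
            self.vocab.update(tokens)
            for prev, cur in zip(tokens, tokens[1:]):
                self.context_counts[prev] += 1
                self.bigram_counts[(prev, cur)] += 1

    def avg_log_prob(self, text):
        """Average natural-log probability per token of `text`."""
        tokens = ["<s>"] + tokenize(text)
        if len(tokens) < 2:
            raise ValueError("text must contain at least one token")
        vocab_size = len(self.vocab) + 1  # one extra slot for unseen tokens
        total = 0.0
        for prev, cur in zip(tokens, tokens[1:]):
            numerator = self.bigram_counts[(prev, cur)] + 1
            denominator = self.context_counts[prev] + vocab_size
            total += math.log(numerator / denominator)
        return total / (len(tokens) - 1)


# Hypothetical "training corpus": the same advice repeated in slightly different words.
corpus = [
    "seo requires quality content backlinks and technical optimization",
    "good seo requires quality content and backlinks",
    "seo requires technical optimization and quality content",
]
model = BigramModel(corpus)

# Both sentences are made up for illustration.
familiar = "seo requires quality content backlinks and technical optimization"
novel = "our audit of 120 shops found checkout speed predicted ai citations"

familiar_score = model.avg_log_prob(familiar)
novel_score = model.avg_log_prob(novel)

print(f"familiar: {familiar_score:.2f}")
print(f"novel:    {novel_score:.2f}")
```

The familiar sentence scores closer to zero, which means it is more predictable, while the sentence carrying an unseen data point scores lower. A low score alone does not make text valuable: random words would score low too, which is why the novelty also has to be meaningful.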

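Metrics like BLEU and ROUGE measure the n-gram overlap between texts. If your content scores low against what already exists, its wording is far from the existing material, which is a rough first signal that you may be contributing something new. Low overlap alone doesn’t prove it: a paraphrase of known facts also scores low, so the metric tells you how different the words are, not how new the information is.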
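As a rough illustration of overlap scoring, the sketch below computes clipped n-gram precision, which is the building block of BLEU. It is not the full metric: there is no brevity penalty and no averaging across n-gram sizes. The function names and the three example sentences are assumptions made for this example, and the figures in the "original" sentence are invented.

```python
from collections import Counter


def ngrams(text, n):
    """Count the n-grams of a lowercased, whitespace-tokenized text."""
    tokens = text.lower().split()
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def clipped_precision(draft, reference, n=2):
    """Share of the draft's n-grams that also occur in the reference.

    Counts are clipped, so an n-gram repeated in the draft is only
    credited as many times as it appears in the reference.
    """
    draft_counts = ngrams(draft, n)
    ref_counts = ngrams(reference, n)
    total = sum(draft_counts.values())
    if total == 0:
        return 0.0
    matched = sum(min(count, ref_counts[gram]) for gram, count in draft_counts.items())
    return matched / total


# Hypothetical texts: one existing article, one rehash, one draft with its own data.
existing = "seo requires quality content backlinks and technical optimization"
rehash = "seo requires quality content and technical optimization"
original = "in our sample of 40 stores faster checkout pages were cited more often"

rehash_overlap = clipped_precision(rehash, existing)
original_overlap = clipped_precision(original, existing)

print(f"rehash overlap:   {rehash_overlap:.2f}")
print(f"original overlap: {original_overlap:.2f}")
```

The rehash shares five of its six bigrams with the existing article, while the draft built on its own data shares none. Treat the number as a screening signal only: a thorough paraphrase would also score low without adding any information.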

The problem with content that copies itself

There’s a specific phenomenon that makes the problem worse than it first appears. Most online content isn’t independent: it’s built by citing, rephrasing or “improving” the content that came before it on the same topic. The result is an information ecosystem where dozens of articles say essentially the same thing in different words.

For a traditional search engine this was acceptable, because ranking answered the question “which is the most authoritative version of this information?” For an AI system that has to assess the informational fidelity of its sources and their ability to add real knowledge, the phenomenon is a structural problem.

Diversity and novelty aren’t optional — they’re system requirements. A model that always surfaced the same sources for the same topic wouldn’t be useful, regardless of the quality of those sources.

From this it follows that a piece of content’s ability to stand out informationally — even on an already-covered topic — is a variable AI systems have a technical reason to value.

The grey area: novelty doesn’t mean correctness

It’s worth being explicit about a point the literature doesn’t ignore. Sungik Choi et al. also flag a risk.

The novelty-detection mechanism, in different contexts, can be distorted. Being “new” isn’t enough — novelty has to be informative, verifiable, contextualized. Content that introduces false data or misleading perspectives is new in the technical sense of the term, but it isn’t what a well-calibrated system should reward.

For you, operationally, this means the strategy isn’t “surprise the model with something weird”. It’s “bring real data the model hasn’t already seen”. The difference between the two approaches is the difference between content that gets cited once and then ignored, and content that becomes a stable source because it’s also verifiable.

What makes content informationally new

Concretely, there are categories of content that produce high information gain systematically. Not because they’re formulas — but because by nature they introduce something generic content can’t replicate.

Original data — a survey run with your own clients, a benchmark on a sample you analyzed yourself, a measurement no one else has published. It doesn’t have to be academic research: a data point gathered from your professional practice is still a data point that exists nowhere else.

Documented observations — have you noticed that AI answers on a given topic always cite the same three sources? Have you seen a pattern in how Perplexity treats local queries versus national ones? Those observations, when documented, are pure information gain.

Unestablished connections — if nobody has written about how tokenization affects the visibility of brands whose names are made of words that are rare in Italian, and you do it with supporting data, you have very high information gain on that specific intersection.

Perspectives that contradict the consensus with evidence — not the professional contrarian, but someone who brings data against an established narrative. The model has already read the mainstream version of the topic. The version that challenges it with concrete proof is the one that adds something.

The test you can run before publishing

Before publishing any key piece of content, do this: search the topic on Google and read the first ten results. Then ask yourself a single question: does my content introduce at least one element — a data point, an observation, a connection — that none of these ten articles has?

If the answer is no, don’t publish until you’ve added that element. Not because the content is “bad” — it can be very well written. But because, for an AI system that measures informational novelty, it has exactly the same value as the other ten.
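If you want a first pass before reading the ten results by hand, the test can be approximated with a simple lexical check. The sketch below flags the sentences in a draft that have no close match in any competitor text, using Jaccard similarity over word sets. The function names, the 0.5 threshold and the sample texts are all assumptions for illustration. It only compares words, so it can't tell whether a flagged sentence is true or useful, and it doesn't replace actually reading the competing articles.

```python
import re


def sentences(text):
    """Split text into sentences on ., ! or ? followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def words(sentence):
    """Set of lowercase alphanumeric words in a sentence."""
    return set(re.findall(r"[a-z0-9]+", sentence.lower()))


def jaccard(a, b):
    """Jaccard similarity between two word sets (0.0 when either is empty)."""
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


def novel_sentences(draft, competitors, threshold=0.5):
    """Return draft sentences whose best match among competitors is below threshold."""
    seen = [words(s) for article in competitors for s in sentences(article)]
    result = []
    for sentence in sentences(draft):
        candidate = words(sentence)
        best = max((jaccard(candidate, other) for other in seen), default=0.0)
        if best < threshold:
            result.append(sentence)
    return result


# Hypothetical stand-ins for the top results and for your own draft.
competitors = [
    "SEO for ecommerce requires quality content. Backlinks still matter.",
    "Quality content and backlinks drive ecommerce SEO.",
]
draft = (
    "SEO for ecommerce requires quality content. "
    "In our hypothetical sample, category pages with size charts were cited twice as often."
)

unique = novel_sentences(draft, competitors)
for sentence in unique:
    print("Candidate new element:", sentence)
```

In this example the first sentence of the draft is discarded because a competitor says exactly the same thing, and only the sentence with the original observation survives. If the list comes back empty, that is a sign the draft has not yet passed the ten-results test.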

For already-published content, the same review applies: identify the pieces that say what everyone says, and add your data. Even a single original data point turns content at high risk of redundancy into content with a reason to be cited.

The operational framework is this:

1. Pick the five most important pieces of content on your site for AI visibility.
2. For each one, run the ten-results test.
3. Identify what you can add: a data point from your own practice, a documented observation, a comparison nobody has made.
4. Add that element before any other technical optimization.

AI doesn’t cite those who repeat. It cites those who add something it can’t find elsewhere. And the good news is that to do it you don’t need to become an academic researcher — you just need to stop writing what everyone else has already written.


The author

Senate of the Republic · Palazzo Giustiniani Conference “The power of artificial intelligence”

Roberto Serra

SEO consultant for over 15 years, founder of the Serra SEO Agency (RAANK). He helps multinationals and SMEs stay visible where search is moving: ChatGPT, Perplexity, Gemini and Google's AI Overviews.
