How AI engines think

Your title says one thing, your content another? AI notices and penalizes you

Roberto Serra 25 June 2026·~8 min read

Does your page title promise one thing while the content delivers something different or incomplete? On Google you could get away with it, but AI systems compare what you declare in the title with what you actually deliver, and when the two don’t match, your page becomes far less likely to be used as a source. The penalty isn’t visible: you simply never get cited, and you never find out why. Aligning titles and content takes surgical work on the pages you already have, without rewriting anything from scratch.

Your title tag reads “How to choose a CRM for SMBs.” The content is a generic article about management software with half a paragraph devoted to CRMs. A human user clicks and is disappointed. An AI system does something different: it compares what the title promises with what the text delivers — and if they don’t match, it lowers the weight of your source in the answer.

This mechanism is called citation accuracy: an AI system’s ability to verify that a cited piece of content actually supports the claim built on top of it. It isn’t a theoretical hypothesis — it’s a documented aspect of quality in retrieval-augmented generation.

The documented mechanism: what the research says

The starting point is the structure of RAG (Retrieval-Augmented Generation) systems, the technology that powers Perplexity, Bing Copilot and the “grounded” versions of ChatGPT and Gemini. In these systems, the model doesn’t answer from memory alone — it retrieves chunks of text from external sources and uses them to build the answer.

Gao et al. (2024), in the paper Retrieval-Augmented Generation for Large Language Models: A Survey, document that RAG systems are evaluated along precise quality dimensions, including the fidelity between the retrieved content and the generated answer. Two of these dimensions are directly relevant to citation accuracy: context relevance and noise robustness.

As the authors write: “Context relevance and noise robustness are important for evaluating the quality of the retrieval.”

Context relevance measures whether the retrieved chunk is actually pertinent to the query. Noise robustness measures the system’s ability to ignore chunks that contain information irrelevant to, or in contradiction with, the answer to be generated. A title that promises A paired with content that covers B produces exactly this: noise. A well-built RAG pipeline is designed to detect that noise and down-rank it during chunk selection.
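To make context relevance concrete, here is a minimal sketch that scores two chunks against the same query. It assumes a bag-of-words cosine similarity as a stand-in for the learned embeddings and rerankers that production RAG systems typically use. The function names and sample texts are illustrative and do not come from Gao et al.

```python
import math
import re
from collections import Counter


def term_counts(text: str) -> Counter:
    """Lowercase the text and count its alphabetic word tokens."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine similarity between two term-count vectors (0.0 if either is empty)."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


query = "how to choose a crm for smbs"
on_topic_chunk = (
    "How to choose a CRM for SMBs: compare CRM pricing, "
    "integrations and support before you choose."
)
off_topic_chunk = (
    "Management software helps companies organize invoices, "
    "warehouses and payroll in one place."
)

on_topic_score = cosine_similarity(term_counts(query), term_counts(on_topic_chunk))
off_topic_score = cosine_similarity(term_counts(query), term_counts(off_topic_chunk))
print(f"on-topic: {on_topic_score:.2f}  off-topic: {off_topic_score:.2f}")
```

The off-topic chunk shares no terms with the query, so it scores zero. A real retriever works on meaning rather than exact words, but the ranking logic is the same: the chunk that matches the query less is the one left out.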

A second layer of verification concerns the fidelity of the content to its own source. Minaee et al. (2025) introduce the concept of Faithfulness Classification Metrics: “Faithfulness Classification Metrics offer a refined assessment by creating task-specific datasets for evaluation of whether content is faithful to the source.”

By extension, the same test can be turned on the page itself: is the content “faithful” to what it claims to be, including the promise implicit in its title?

The third element is the very definition of hallucination in the vocabulary of language-model research. Again, in the study by Minaee et al. (2025) we read: “hallucination in an LLM is characterized as the generation of content that is nonsensical or unfaithful to the provided source.”

Read through the researchers’ terminology, a systematic mismatch between title and content is a pattern of unfaithfulness to the declared source.

From research to penalty: the applied deduction

The three documented mechanisms — context relevance, faithfulness classification, hallucination detection — operate on the retrieved text. From this follows a deduction that isn’t documented as such in the literature, but that emerges directly from the architecture of the systems:

If a RAG system is built to assess context relevance and filter out noise, and a retrieved chunk carries a title that doesn’t match its content, that chunk sends a low-relevance signal, regardless of the quality of the text inside it. The title usually travels with the chunk, and it is among the first signals available for judging whether the content is relevant to the query.

From this it follows that a systematic mismatch between title and content lowers the probability that your chunk gets selected — and consequently that your site gets cited in the answer.

This isn’t an explicit penalty in the algorithmic sense (like a Google update that demotes a site). It’s a structural effect: your pages get retrieved but then discarded during scoring, because the relevance signal is weak. The accumulation of this pattern across several pages of your domain progressively reduces the probability that your content gets used as a source.

This mechanism should be kept distinct from truthfulness scores, which measure the factual accuracy of claims, and from BLEU and ROUGE scores, which measure n-gram overlap with a reference text. Citation accuracy, as used here, operates at a different level: it checks whether the components of the content (title, meta description, H1, body text) are consistent with one another.

What actually gets penalized

The problematic patterns aren’t limited to the title tag. Relevance assessment in a RAG system works on everything retrieved in the chunk, which typically includes the title, meta description and the first paragraphs of text.

The mismatches that produce the strongest noise signal:

Title-content mismatch: the title promises a specific topic, the content covers a different or more generic one. “Guide to electronic invoicing for freelancers” on a page that talks about management software in general.
Misleading meta description: the meta description previews content that doesn’t exist on the page. AI retrieves this text too as part of the context.
H1 inconsistent with the sections: the H1 declares one topic, the H2s cover another. The model interprets this inconsistency as a signal of low structural reliability.
Lead paragraph that doesn’t keep the title’s promise: if the first paragraph isn’t consistent with the title, the chunk fails the context relevance test.

The problem with clickbait — sensationalist titles that promise more than the content can deliver — is that it produces the most visible mismatch. But the mismatch can also be unintentional: the result of pages that were optimized for a keyword without verifying that the actual content matched that keyword.

Pro tip

Align the title, meta description, H1 and content of your priority pages: they must promise and deliver the same thing.

How to act

The intervention isn’t technical — it’s editorial. It requires verifying that every page keeps the promise it makes.

Alignment audit for priority pages. Take the 10 pages you want AI to cite and, for each one, read in sequence: title tag, meta description, H1, first paragraph, main H2s. Do they all say the same thing? Do they promise the same topic? If the title says “guide,” is the content a guide with sequential steps or is it an informational article? If the title says “for ecommerce,” do all the sections talk about ecommerce?

Every mismatch you find is a weak point in the context relevance assessment.
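To read those elements side by side, a small script can pull them out of a page. What follows is a minimal sketch using only Python’s standard-library html.parser; the PageOutline class and the sample HTML are hypothetical, and it assumes reasonably well-formed markup with the title, H1, H2s and first paragraph in plain tags.

```python
from html.parser import HTMLParser


class PageOutline(HTMLParser):
    """Collects the title tag, meta description, first H1, all H2s and first paragraph."""

    def __init__(self) -> None:
        super().__init__()
        self.title = ""
        self.meta_description = ""
        self.h1 = ""
        self.h2s = []
        self.first_paragraph = ""
        self._current = None  # tag whose text is currently being collected
        self._buffer = []
        self._seen_paragraph = False

    def handle_starttag(self, tag, attrs):
        attr = dict(attrs)
        if tag == "meta" and (attr.get("name") or "").lower() == "description":
            self.meta_description = (attr.get("content") or "").strip()
        elif tag in ("title", "h1", "h2") or (tag == "p" and not self._seen_paragraph):
            self._current = tag
            self._buffer = []

    def handle_data(self, data):
        if self._current:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag != self._current:
            return  # closing tag of a nested element such as <a>
        text = " ".join("".join(self._buffer).split())
        if tag == "title":
            self.title = text
        elif tag == "h1" and not self.h1:
            self.h1 = text
        elif tag == "h2":
            self.h2s.append(text)
        elif tag == "p":
            self.first_paragraph = text
            self._seen_paragraph = True
        self._current = None


sample_html = """
<html><head>
<title>How to choose a CRM for SMBs</title>
<meta name="description" content="A step-by-step guide to choosing a CRM for small businesses.">
</head><body>
<h1>Management software: an overview</h1>
<p>Management software covers <a href="#">many</a> business needs.</p>
<h2>Accounting tools</h2>
<h2>Warehouse tools</h2>
</body></html>
"""

outline = PageOutline()
outline.feed(sample_html)
print("title:           ", outline.title)
print("meta description:", outline.meta_description)
print("h1:              ", outline.h1)
print("first paragraph: ", outline.first_paragraph)
print("h2s:             ", outline.h2s)
```

Printed in sequence, the sample page shows its problem at a glance: the title and meta description promise a CRM guide, while the H1, first paragraph and H2s talk about management software in general.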

The chunk test. Imagine your title and your first 300 tokens get extracted as an independent chunk — without the rest of the page. Does that chunk on its own answer the query you optimized the title for? If the answer is no, the chunk fails the relevance test.
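The chunk itself is easy to assemble so you can read it in isolation. This is a rough sketch that assumes whitespace-separated words as a proxy for tokens; real tokenizers split text differently, so 300 words will not be exactly 300 model tokens. The function names and placeholder copy are illustrative.

```python
def first_tokens(text: str, max_tokens: int = 300) -> str:
    """Keep the first max_tokens whitespace-separated words (a rough proxy for tokens)."""
    return " ".join(text.split()[:max_tokens])


def build_chunk(title: str, body: str, max_tokens: int = 300) -> str:
    """Put the title on the first line, followed by the truncated opening of the body."""
    return f"{title.strip()}\n{first_tokens(body, max_tokens)}"


title = "How to choose a CRM for SMBs"
# Placeholder copy: a 6-word sentence repeated 100 times, 600 words in total.
body = "Management software covers many business needs. " * 100

chunk = build_chunk(title, body)
print(chunk[:120])
```

The script only isolates the text. Judging whether that chunk answers the query is still a reading exercise.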

Rewrite clickbait titles with specificity. “The secret to boosting your online sales” says nothing — and doesn’t correspond to any specific content. “How to reduce cart abandonment with 4 changes to your checkout page” is specific, keeps the promise, and the content can actually deliver on it. The specificity of the title is the constraint that forces you to write relevant content.

H1-H2 consistency. Every H2 section must keep the promise of its heading. If an H2 says “How to optimize the product page for AI search,” the section must explain exactly how to do it, not talk about optimization in general. The same logic applies to faithfulness assessment: a section that doesn’t deliver what its heading announces reads as an unreliability signal.

Eliminate teaser meta descriptions. The meta description isn’t a creative space to tease curiosity — it’s a promise about the content. “Discover the secrets your competitors don’t want you to know” doesn’t correspond to any verifiable content. “How to analyze the AI visibility gap against your competitors: a step-by-step method with free tools” is a meta description that a RAG system can use to assess the chunk’s relevance.

How to check your current situation

For each priority page, run this check in less than 5 minutes:

Read only the title tag and write in one line what the page is about according to the title
Read only the first 300 tokens of the content and write in one line what it’s actually about
Compare the two lines: do they match?

If they don’t match, you have a mismatch. Then run the same check for the meta description and H1.
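If you want a first pass before reading each page yourself, the comparison can be roughed out in a few lines. The sketch below assumes a crude lexical measure (the share of a component’s content words that reappear in the lead), a hand-picked stopword list and an arbitrary 0.5 threshold. All names and values are hypothetical, and word overlap misses synonyms, so treat a flag as a prompt to reread the page rather than a verdict.

```python
import re

# Small illustrative stopword list; extend it for real use.
STOPWORDS = {"a", "an", "and", "for", "how", "in", "of", "on", "the", "to", "with", "your"}


def content_terms(text: str) -> set:
    """Lowercased word tokens with common stopwords removed."""
    return {w for w in re.findall(r"[a-z0-9]+", text.lower()) if w not in STOPWORDS}


def overlap(component: str, lead: str) -> float:
    """Share of the component's content terms that also appear in the lead."""
    terms = content_terms(component)
    if not terms:
        return 0.0
    return len(terms & content_terms(lead)) / len(terms)


def find_mismatches(components: dict, lead: str, threshold: float = 0.5) -> list:
    """Names of the components whose overlap with the lead falls below the threshold."""
    return [name for name, text in components.items() if overlap(text, lead) < threshold]


page = {
    "title": "Guide to electronic invoicing for freelancers",
    "meta description": "Electronic invoicing for freelancers, explained step by step.",
    "h1": "Management software for your business",
}
lead = (
    "Management software helps a business organize accounting, "
    "warehouse and payroll in one place."
)

print(find_mismatches(page, lead))  # ['title', 'meta description']
```

Here the H1 agrees with the lead, while the title and meta description promise a topic the opening never touches, which is the pattern the manual check is meant to catch.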

The goal is zero mismatches on the pages you want AI to use as a source. It isn’t an unreachable goal — it’s an editorial problem, not a technical one. It requires attention, not tools.

Align the title, meta description, H1 and content of your priority pages: they must promise and deliver the same thing. RAG systems assess consistency as a reliability signal — and clickbait, even unintentional, lowers the probability of being cited.