How AI engines think

Useful, accurate and safe: the 3 criteria AI uses to judge your content

AI doesn't evaluate your content in a generic way: it applies three precise filters in sequence. Is the content useful? Is it accurate? Is it safe? If your text fails even one of these three tests, it gets systematically discarded in favor of texts that pass them all. You could have the most in-depth article in your industry, but if it lacks a specific element that AI considers necessary, it never gets cited. Running a quick test on your most important pages can reveal problems you never would have suspected, and that are easy to fix.

AI doesn’t choose its sources at random. Even before your authority, your number of backlinks or the technical structure of your site come into play, an assistant-style language model carries with it a system of preferences built during training. A central technique used to build that system is RLHF (Reinforcement Learning from Human Feedback), and it has taught the model to recognize, almost instinctively, whether a piece of content is useful, accurate and safe.

This isn’t editorial advice. It’s the mechanism the model was built with.

If you want to understand why some brands consistently appear in AI answers and others don’t, you have to start here: with the architecture of training. Everything else — citations, structure, authority of sources — comes afterward.

How AI develops its preference for a piece of content

When a model like GPT-4, Claude or Gemini is developed, it goes through several phases. The first is pre-training on large amounts of text. The second, the one that determines the final behavior, is alignment — the process by which the model learns to respond in a way that is useful to people.

RLHF is the central mechanism of this alignment. As Ji et al. document in a 2025 systematic survey on the alignment of language models: “RLHF enables the incorporation of human preferences into model training by using a reward model to guide reinforcement learning optimization.” In practice, the model is not trained merely to predict the most probable text, but to generate answers that a human being would judge preferable.

The process works in three steps: the model generates multiple answers to the same question, human evaluators rank them from best to worst, and those rankings are used to train a reward model against which the language model is then optimized, so that it produces answers similar to the preferred ones. This cycle is repeated over many thousands of comparisons. The result is a model that has internalized human preferences to the point of applying them automatically to every answer it generates, including the one in which it decides whether or not to cite you.

As Ji et al. (2025) summarize: “RLHF uses a reward model to learn alignment from human feedback.” The reward model is the computational translation of those human preferences. It is the invisible judge that operates every time the model produces output.
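
The ranking step and the reward model described above can be illustrated with a minimal Python sketch. It assumes a pairwise (Bradley-Terry) preference loss, a formulation commonly used to train reward models but not one the sources quoted here spell out. The function names, answer labels and scores are hypothetical.

```python
import math
from itertools import combinations


def ranking_to_pairs(ranked_answers):
    """Turn a best-to-worst ranking into (preferred, rejected) pairs."""
    # combinations() keeps input order, so the first item of each pair
    # always ranks above the second.
    return list(combinations(ranked_answers, 2))


def preference_probability(reward_preferred, reward_rejected):
    """Bradley-Terry probability that the preferred answer wins."""
    return 1.0 / (1.0 + math.exp(-(reward_preferred - reward_rejected)))


def pairwise_loss(reward_preferred, reward_rejected):
    """Negative log-likelihood of the human choice; lower is better."""
    return -math.log(preference_probability(reward_preferred, reward_rejected))


# Hypothetical ranking from one evaluator, best answer first.
pairs = ranking_to_pairs(["answer_a", "answer_b", "answer_c"])

# Hypothetical reward-model scores for the same three answers.
scores = {"answer_a": 2.0, "answer_b": 0.5, "answer_c": -1.0}
total_loss = sum(pairwise_loss(scores[won], scores[lost]) for won, lost in pairs)
```

Under this assumption, training adjusts the reward model so that the loss falls, which means answers people preferred end up with higher scores than the ones they rejected.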

The three criteria the model has learned to recognize

The human evaluators who guide RLHF are instructed according to precise guidelines. Those guidelines condense into three dimensions that the model has learned to recognize in every piece of content it processes.

Usefulness is the first. A piece of content is useful when it solves the user’s real problem, answers the question they asked, and provides something actionable. Being informative isn’t enough: the model has been trained to distinguish between text that “explains” and text that “enables”. If the reader finishes reading with a generic understanding but without knowing what to do, the content is classified as not very useful.

Accuracy is the second. Is the data verifiable? Do the claims have sources? Is there anything that could be false, exaggerated or undocumented? The model has been trained to recognize the signals of inaccuracy — statistics without a source, unsupported absolute claims, generalizations presented as facts. Content with these signals is systematically downgraded in the model’s internal ranking.

Safety is the third. This criterion goes beyond the obvious (harmful, violent, illegal content). The model is sensitive to anything that could be perceived as manipulative, deceptive or potentially harmful to the user. Aggressive sales techniques disguised as advice, excessive promises, content designed to create artificial anxiety: these signals trigger the model’s safety filters before the answer ever reaches the user.

The winning combination is practical + verifiable + honest. It’s not a matter of style. It’s the structure of training.

Why I start here in the series on training

With this article I’m opening a series of deep dives I’ve written to help you understand how AI models are built — and why this directly affects the chances of your brand being selected in answers. RLHF is the first piece because it’s the mechanism that translates human values into computational preferences.

In the next articles I’ll explain how Anthropic’s Constitutional AI takes this process to the next level (preferences derived not only from humans but from explicit principles), how pre-training data determines the domain in which the model is competent, how fine-tuning modifies the model’s behavior on specific tasks, and how deduplication affects which content actually gets “learned” during training.

But all these mechanisms operate on top of a foundation: the model has been trained to have preferences. And those preferences are called usefulness, accuracy and safety.

The limit of classic RLHF and what it means for you

It’s also worth understanding the limits of the mechanism, because they affect the way the model behaves with complex content.

As Xu et al. observe in 2026 research, “traditional RLHF based on single-turn dialogues struggles to cover the complexity of real-world interactions.” The model has been trained mainly on single exchanges: one question, one answer. This works well for simple queries, but in complex or multi-step interactions, the preferences learned through RLHF can be less reliable.

For you, this has a concrete implication: the content that gets selected in AI answers is often the content that answers a single question in a clear and complete way. Content that presupposes prior context, that can only be understood by reading other articles in sequence, or that requires multi-step processing on the reader’s part is structurally disadvantaged by the way the model has been trained to evaluate answers.

It doesn’t mean you have to simplify at all costs. It means every page must be self-contained: it must answer a specific question completely, even for someone who hasn’t read the rest of your site.

How to translate the three criteria into concrete actions

Taking these principles and applying them to your content requires a shift in perspective. Stop asking yourself “is this well written?” and start asking “does this pass the AI’s triple test?”

On the usefulness front: for each key page of your site, identify the specific action the reader should be able to take after reading. If the action is vague (“optimize the site”, “improve communication”), rewrite it in concrete, measurable terms. The model has been trained to distinguish generic advice from actionable advice, and it rewards the latter.

On the accuracy front: every piece of data, percentage or factual statement in your content must have a source. You don’t need an academic citation — “according to [source], [data]” is enough. If you don’t have a source for a piece of data, you have two options: find one or remove the data. There is no third path compatible with how the model evaluates reliability.
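
As a rough aid for this review, the sketch below flags sentences that contain a percentage but no visible attribution. It is only a heuristic under stated assumptions: the marker phrases in `SOURCE_MARKERS`, the function name and the sample text are invented for illustration, and a flagged sentence still needs a human check.

```python
import re

# Assumed attribution phrases; extend the tuple to match your own style.
SOURCE_MARKERS = ("according to", "source:")


def unsourced_statistics(text):
    """Return sentences that contain a percentage but no attribution phrase."""
    # Naive sentence split: break after ., ! or ? followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    flagged = []
    for sentence in sentences:
        has_percentage = re.search(r"\d+(?:[.,]\d+)?\s?%", sentence) is not None
        has_source = any(marker in sentence.lower() for marker in SOURCE_MARKERS)
        if has_percentage and not has_source:
            flagged.append(sentence)
    return flagged


# Invented sample text, used only to exercise the function.
sample = (
    "According to Example Survey, 40% of readers skim. "
    "Conversion rises by 25%. "
    "No numbers here."
)
flagged = unsourced_statistics(sample)
```

Here `flagged` contains only the second sentence, the one that states a percentage without naming where it comes from.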

On the safety front: reread your content while actively looking for the signals that the model’s safety filters recognize. Excessive promises (“guaranteed”, “always”, “100% safe”), artificial urgency, unsupported claims presented as certainties, language designed to create pressure — these elements don’t just lower your reputation with human readers. They lower your score in the model’s preference system.
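
The reread can be supported by a simple keyword scan such as the Python sketch below. It only searches for the three example phrases quoted above. It does not reproduce how any model evaluates text, and the function name and term list are hypothetical starting points.

```python
import re

# The three example phrases quoted in the paragraph above.
PRESSURE_TERMS = ["guaranteed", "always", "100% safe"]


def find_pressure_terms(text, terms=PRESSURE_TERMS):
    """Return (term, position) pairs for each flagged phrase, in reading order."""
    lowered = text.lower()
    hits = []
    for term in terms:
        # Lookarounds act as word boundaries that also work for "100% safe".
        pattern = r"(?<!\w)" + re.escape(term) + r"(?!\w)"
        for match in re.finditer(pattern, lowered):
            hits.append((term, match.start()))
    return sorted(hits, key=lambda hit: hit[1])


# Invented sample sentence, used only to exercise the function.
hits = find_pressure_terms("Results guaranteed. Always 100% safe!")
found_terms = [term for term, _ in hits]
```

A hit is a prompt to reread the sentence, not a verdict: "always" is often harmless, so each match still needs a judgment call.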

The triple test: an operational tool

Take the five most important pages of your site — the ones that drive the highest volume of traffic or leads — and put them through this test.

For each one, assign a score from 1 to 5 across three dimensions. Usefulness: after reading, does the reader know what to do? Are the actions specific and measurable? Accuracy: does every piece of data have a source? Is every claim verifiable? Is there anything potentially undocumented? Safety: is there anything that could be perceived as manipulative, exaggerated or deceptive?

By multiplying the three scores you get a value between 1 and 125. If the result is below 60, the model has a good chance of preferring your competitors’ sources when answering questions in your industry. Not because they’re better — but because their content aligns better with the preferences built during training.
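
The arithmetic of the test can be written as a short Python sketch. The class and method names are hypothetical; the 1 to 5 range and the threshold of 60 come from the text above.

```python
from dataclasses import dataclass

THRESHOLD = 60  # cut-off stated in the article


@dataclass
class TripleTest:
    usefulness: int
    accuracy: int
    safety: int

    def score(self) -> int:
        """Multiply the three ratings after checking each is in 1..5."""
        for name in ("usefulness", "accuracy", "safety"):
            value = getattr(self, name)
            if not 1 <= value <= 5:
                raise ValueError(f"{name} must be between 1 and 5, got {value}")
        return self.usefulness * self.accuracy * self.safety

    def passes(self) -> bool:
        """A result below the threshold counts as a fail."""
        return self.score() >= THRESHOLD


balanced_page = TripleTest(usefulness=4, accuracy=4, safety=4)
lopsided_page = TripleTest(usefulness=5, accuracy=5, safety=2)
```

Because the scores are multiplied rather than added, one weak dimension drags the whole result down: the balanced page reaches 64, while the page with two perfect ratings and a safety score of 2 stops at 50, below the threshold.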

From here on

RLHF is the base mechanism, but model training doesn’t stop here. In the next articles of this cluster you’ll explore how Anthropic extended the concept with Constitutional AI, how pre-training data determines the model’s domain of competence, how fine-tuning modifies behavior on specific tasks, and why deduplication affects which content actually gets learned.

But every time you ask yourself “why doesn’t AI cite me?”, the first answer to look for is here: does your content pass the triple test? Is it useful, accurate and safe according to the criteria the model was built with?

If the answer is no, no amount of backlinks or technical optimization will compensate. If the answer is yes, you have the foundation on which to build everything else.

The concrete action you can take today: open the most important page of your site and apply the triple test. Three honest questions and a score from 1 to 5 for each are all it takes. What you find is your real starting point.

The author
Roberto Serra at the Senate of the Republic, Palazzo Giustiniani, at the conference “The power of artificial intelligence”
Roberto Serra

SEO consultant for over 15 years, founder of the Serra SEO Agency (RAANK). He helps multinationals and SMEs stay visible where search is moving: ChatGPT, Perplexity, Gemini and Google's AI Overviews.
