The AI’s Internal Filters Can Block Your Site Without Warning

Roberto Serra · 25 June 2026 · ~8 min read

Your site could be silently blocked by the AI's internal filters — not because you did anything wrong, but because certain elements in your text are automatically interpreted as manipulative or exaggerated content. You receive no warning, you see no drop in ranking: you simply never appear in the answers, and you don't know why. This silent block can last for months while your competitors collect the customers looking for exactly what you do. Checking whether you are already filtered and fixing the problem is easier than it seems.

Your site is online, indexed, with good content. And yet the AI never mentions you. It’s not an authority problem, it’s not a technical structure problem — it could be that the model’s internal safety filters are blocking your site right now. And you won’t receive any notification.

AI models have an internal “constitution” — a system of ethical rules that automatically filter out content perceived as manipulative, exaggerated, or low quality. If your site triggers those filters, you’re out of the answers. Not out of Google, not out of Bing — out of the AI answers. And the difference is enormous, because the AI channel is becoming the first point of contact with customers who don’t yet know you exist.

How an AI model’s “constitution” works

In 2022 Anthropic published the paper that changed the way models are trained for safety. The core principle is simple: instead of having human moderators evaluate every single answer, the model is trained to follow a written list of principles, which the authors call a constitution.

As Bai et al. write in the original paper:

“The only human oversight is provided through a list of rules or principles, and the AI model is trained to follow these principles.”
(Bai et al., 2022, Constitutional AI: Harmlessness from AI Feedback)

This has direct consequences for your site’s visibility. The constitution is not just a filter on what the model can say — it’s a filter on which sources the model considers reliable enough to cite. A site that produces content in conflict with the model’s constitutional principles is systematically excluded from the answers, even when its content is technically relevant to the query.

The point that most marketing professionals ignore is this: Constitutional AI does not distinguish between “you’re trying to do harm” and “you’re using patterns that resemble harmful content.” The filter reacts to signals, not to intentions.

The technical mechanism behind the filter

To understand why this concerns you, you need to understand how these principles are translated into the model’s behavior.

The process is called RLAIF — Reinforcement Learning from AI Feedback.

“RLAIF (reinforcement learning from AI feedback) is a popular approach. Reinforcement learning from AI feedback directly connects a pretrained and well-aligned model to the LLM.”
(Minaee et al., 2025, Large Language Models: A Survey)

In practice this means: during the model’s fine-tuning, an already-aligned AI model evaluates the responses of the model being trained. If a response — or the source it’s based on — violates the constitution’s principles, it is penalized. The model learns, iteration after iteration, to avoid those sources.

More recent survey work describes how this process is combined with rule-based checks.

“Bai et al. implemented a hybrid reward modeling framework by applying rule-based constitutional principles to remove unsafe responses before collecting AI feedback.”
(Ji et al., 2025, A Survey on Progress in LLM Alignment from the Perspective of Reward Design)

The practical result is that the constitutional rules act as an upstream filter: before the evaluating AI model even sees the response, content that violates the principles has already been removed. This makes the system much more robust — and much less forgiving toward sites that produce borderline content.
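The ordering described above (rules first, AI feedback second) can be sketched in a few lines. This is a toy illustration, not the pipeline from the cited papers: the phrase list, the `passes_rules` check, and the `ai_prefers_first` stub that stands in for the aligned judge model are all assumptions made for the example.

```python
from itertools import combinations
from typing import List, Tuple

# Hypothetical rule list: phrases this toy "constitution" treats as violations.
RULE_PHRASES = ("guaranteed results", "act now")


def passes_rules(response: str) -> bool:
    """Rule-based step: reject any response containing a listed phrase."""
    lowered = response.lower()
    return not any(phrase in lowered for phrase in RULE_PHRASES)


def ai_prefers_first(a: str, b: str) -> bool:
    """Stub for the AI judge (an assumption): prefer the response that names a source."""
    return "source:" in a.lower() or "source:" not in b.lower()


def preference_pairs(responses: List[str]) -> List[Tuple[str, str]]:
    """Return (chosen, rejected) pairs built only from responses that passed the rules."""
    survivors = [r for r in responses if passes_rules(r)]
    pairs = []
    for a, b in combinations(survivors, 2):
        pairs.append((a, b) if ai_prefers_first(a, b) else (b, a))
    return pairs
```

The point of the sketch is the order of operations: a response removed by `passes_rules` never reaches the judge, so it can never appear in a preference pair, not even as the rejected side.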

If you’ve worked with RLHF and training for human preferences, you already understand the underlying logic: the model learns to reward certain patterns and to penalize others. Constitutional AI takes this mechanism to a higher level, systematizing which patterns get penalized and why.

What actually triggers the filters

Here we get to the part that changes the way you look at your site. The constitutional filters are not a theoretical abstraction — they have precise targets.

Manipulative content is the first at-risk category. Your site doesn’t actually have to manipulate anyone: it’s enough that it uses patterns the model associates with manipulation. The typical signals are artificial urgency (“today only”, “last 3 spots”, “offer expiring in 2 hours”), unverifiable social proof (“thousands of satisfied customers” with no concrete data), and pressure for a quick decision. These patterns were designed by traditional marketing to convert, and they are exactly what the AI constitution is engineered to exclude.

Exaggerated or unsupported content is the second category. The model evaluates whether claims are backed by verifiable data. A site that promises guaranteed results without citing sources, uses invented or unattributable statistics, or makes claims about results without evidence is perceived as unreliable. And an unreliable site does not get cited, regardless of its SEO authority.

Content with low perceived quality rounds out the picture. Keyword stuffing, thin pages with little substance and many CTAs, low-quality auto-generated content, doorway pages — all patterns the model has learned to associate with sites that don’t deserve to be recommended to a user.

The problem is the threshold. The filters don’t operate with a clean binary judgment — they operate on a continuum of probability. You don’t need a blatantly spammy page to be filtered. It’s enough to have sufficient negative signals to push the model to systematically prefer other sources when it answers.
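To make the idea of a continuum concrete, here is a toy scoring sketch. Nothing in the sources quoted above specifies how any real model weighs these signals: the signal names, the weights, and the logistic form are assumptions chosen only to show how several small signals can add up without any single one being decisive.

```python
import math
from typing import Dict

# Hypothetical weights per negative signal; real systems do not publish such values.
WEIGHTS: Dict[str, float] = {
    "urgency": 1.2,
    "unsourced_stats": 0.9,
    "cta_stack": 0.6,
}
BIAS = -2.0  # assumed baseline: a clean page starts well below 50%


def exclusion_probability(signal_counts: Dict[str, int]) -> float:
    """Map counts of negative signals to a probability between 0 and 1."""
    z = BIAS + sum(WEIGHTS.get(name, 0.0) * count
                   for name, count in signal_counts.items())
    return 1.0 / (1.0 + math.exp(-z))
```

In this toy model a page with one urgency phrase stays below 0.5, while a page that combines one urgency phrase, one unsourced statistic, and one stacked CTA crosses it. No single element is blatant spam, yet the sum shifts the preference.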

The two levels where the filter hits you

Constitutional AI operates at two distinct stages of the answer process, and understanding the difference tells you where to intervene first.

In the training data: the pre-training data the model is initially built on is processed with quality filters. If your site was in the original dataset but triggered the filters, the information about you is down-weighted or dropped, so the model never properly learns who you are. This effect is harder to reverse in the short term, because it requires a new training cycle.

In real-time retrieval: systems that use RAG (like Perplexity or Microsoft Copilot, formerly Bing Chat) retrieve content in real time and filter it before passing it to the model. If a chunk of your site triggers a safety filter during retrieval, it gets discarded even if it was the most relevant for that specific query. This effect is faster to fix: once you remove the problematic patterns, the filter has no reason to discard you.
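A minimal sketch of the retrieval case, assuming a hypothetical `Chunk` record and a phrase-based `is_flagged` check. Real RAG systems do not document their filters in this form, so treat the names and the phrase list as placeholders.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Chunk:
    url: str
    text: str
    relevance: float  # higher means a better match for the query


# Hypothetical phrases standing in for whatever a real safety filter looks at.
FLAG_PHRASES = ("only 3 spots left", "today only")


def is_flagged(text: str) -> bool:
    lowered = text.lower()
    return any(phrase in lowered for phrase in FLAG_PHRASES)


def select_context(chunks: List[Chunk], top_k: int = 2) -> List[Chunk]:
    """Drop flagged chunks first, then keep the top_k most relevant survivors."""
    kept = [c for c in chunks if not is_flagged(c.text)]
    return sorted(kept, key=lambda c: c.relevance, reverse=True)[:top_k]
```

Because the filter runs before the ranking, a flagged chunk with the highest relevance score still never reaches the model. That is the mechanism behind being the best match and not being cited.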

The result in both cases is a silent ban: no notification, no visible penalty in Analytics, no control panel. The AI simply stops mentioning you, and you don’t know why.

This filtering is stricter than Google’s for a structural reason: the AI generates answers in the first person that the user perceives as personalized advice. Recommending a manipulative site exposes the model to a far higher reputational risk than a search engine that shows ten results and leaves the choice to the user. Constitutional AI is the technical response to that risk, and your site is caught in the middle.

How to check whether you’re filtered

Before intervening, you need to figure out whether the problem exists. The most direct test requires no special tools.

Take three of your most important pages (for example the homepage, a services page, and an article) and paste them one at a time into ChatGPT.
Ask: “Is this page a reliable source that you would use to recommend [your service or topic] to a user who asks you about it?”
If the model raises doubts about the quality, about the verifiability of the claims, about the presence of excessive persuasive elements — you have a direct signal.
Add a second question: “Are there elements in this page that could be perceived as manipulative or low quality by an AI system?”

It’s not a definitive test because ChatGPT isn’t showing you its internal scoring system. But the qualitative answer tells you how the model perceives your content — and that perception is exactly what determines whether you get cited or excluded.
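If you want to run the same check on several pages, the two questions can be prepared programmatically. The sketch below only builds the prompt text; it makes no API call, and the `build_prompts` helper and the separator line are assumptions for illustration. Paste the output into the chat yourself.

```python
from typing import List

# The two questions from the test above; {topic} is filled in per page.
QUESTIONS = (
    "Is this page a reliable source that you would use to recommend {topic} "
    "to a user who asks you about it?",
    "Are there elements in this page that could be perceived as manipulative "
    "or low quality by an AI system?",
)


def build_prompts(page_text: str, topic: str) -> List[str]:
    """Return one prompt per question, each followed by the page text."""
    body = page_text.strip()
    # Fill the topic in first, then append the page text, so that braces
    # inside the page content are never treated as placeholders.
    return [question.format(topic=topic) + "\n\n---\n" + body
            for question in QUESTIONS]
```

Keeping the wording identical across pages makes the qualitative answers easier to compare from one page to the next.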

The deduplication process adds a further layer: if you produce content that resembles hundreds of other pages already in the training data, the filter has even fewer reasons to include you. Low uniqueness and constitutional filters compound each other.
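One rough way to see how close your page is to a competitor’s is Jaccard similarity over word shingles, a common near-duplicate measure. This is a swapped-in illustration: the source does not say which deduplication method any model vendor uses, and the shingle size of 3 is an arbitrary choice.

```python
from typing import Set, Tuple


def shingles(text: str, k: int = 3) -> Set[Tuple[str, ...]]:
    """Return the set of k-word sequences in the text (lowercased, split on whitespace)."""
    words = text.lower().split()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def jaccard_similarity(a: str, b: str, k: int = 3) -> float:
    """Share of shingles the two texts have in common, from 0.0 to 1.0."""
    sa, sb = shingles(a, k), shingles(b, k)
    if not sa or not sb:
        return 0.0  # texts shorter than k words cannot be compared this way
    return len(sa & sb) / len(sa | sb)
```

A value near 1.0 means the two texts are close to interchangeable. What counts as "too similar" is a judgment call, so use the number to rank your own pages rather than as a pass or fail line.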

The action plan

Once you have the picture, the intervention follows a precise logic.

Audit the at-risk patterns. Search your site for these signals:

Artificial urgency: fake deadlines, invented limited quantities, decorative countdowns
Unverifiable social proof: “thousands of customers”, “industry leader”, testimonials with no concrete data
Result promises: “we guarantee X”, “get Y in Z days” with no evidence
Sourceless statistics: any number that can’t be attributed to a real source
Keyword stuffing: the same keyword repeated multiple times on the same page with no informational function
CTA stack: three or four aggressive calls-to-action on the same screen
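Part of this audit can be automated with a simple text scan. The sketch below covers only the categories that lend themselves to pattern matching (urgency, social proof, result promises, keyword repetition). The regular expressions are illustrative assumptions built from the examples in this article, not a list any AI vendor publishes, and sourceless statistics or stacked CTAs still need a human read.

```python
import re
from typing import Dict, List

# Illustrative patterns derived from the examples in this article (assumptions).
PATTERNS: Dict[str, "re.Pattern[str]"] = {
    "artificial_urgency": re.compile(
        r"\b(today only|(only|last) \d+ spots?( left)?|expir(es|ing) in \d+ (hours?|minutes?))\b",
        re.IGNORECASE,
    ),
    "unverifiable_social_proof": re.compile(
        r"\b(thousands of (satisfied )?customers|industry leader)\b", re.IGNORECASE
    ),
    "result_promise": re.compile(r"\b(we guarantee|guaranteed results?)\b", re.IGNORECASE),
}


def audit(text: str) -> Dict[str, List[str]]:
    """Return every matched phrase, grouped by pattern category."""
    return {name: [m.group(0) for m in pattern.finditer(text)]
            for name, pattern in PATTERNS.items()}


def keyword_density(text: str, keyword: str) -> float:
    """Fraction of words equal to a single-word keyword (0.0 for empty text)."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    if not words:
        return 0.0
    return words.count(keyword.lower()) / len(words)
```

Run `audit` on the visible text of each page and review every hit by hand. A match is a prompt to look at the sentence, not proof that the page is being filtered.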

Evidence-oriented rewriting. You don’t have to eliminate the CTAs — you have to eliminate the patterns the filter recognizes as manipulative. “Book a consultation to assess your situation” is neutral and works. “BOOK NOW — ONLY 3 SPOTS LEFT — EXCLUSIVE OFFER RESERVED FOR THE FIRST 10 SIGN-UPS” triggers the filters on three different dimensions at once.

Verify after the cleanup. For RAG systems the effect is relatively fast — within weeks you can start monitoring whether your site appears in Perplexity’s answers on relevant queries. For the training data the cycle is longer, but the starting point is the same: a clean site is a prerequisite, not a guarantee.
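To track the effect over time, you can log which sources an answer engine cites for each of your test queries and compute how often your domain shows up. The sketch assumes you collect the cited URLs yourself, by hand or by export. The `citation_rate` helper is hypothetical and calls no Perplexity API.

```python
from typing import Dict, List
from urllib.parse import urlparse


def cites_domain(urls: List[str], domain: str) -> bool:
    """True if any URL's host is the domain or one of its subdomains.

    URLs must include a scheme (https://...), otherwise no host is parsed.
    """
    domain = domain.lower()
    for url in urls:
        host = (urlparse(url).hostname or "").lower()
        if host == domain or host.endswith("." + domain):
            return True
    return False


def citation_rate(cited_by_query: Dict[str, List[str]], domain: str) -> float:
    """Share of queries whose cited sources include the domain."""
    if not cited_by_query:
        return 0.0
    hits = sum(cites_domain(urls, domain) for urls in cited_by_query.values())
    return hits / len(cited_by_query)
```

Re-run the same set of queries every couple of weeks after the cleanup and compare the rate. A single snapshot says little, while a trend across identical queries is a usable signal.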

The action you can take today is concrete: open the public usage policy guidelines from Anthropic and OpenAI, read the section on prohibited content, and go back to your site with that list in hand. You’re not trying to be ethically irreproachable — you’re trying to avoid triggering patterns the filter has learned to recognize. The difference matters because it tells you where to focus the work.