Some of the techniques you use to rank on Google (repeating the same keywords in a forced way, building pages made only for search engines, using artificial urgency phrases in your calls to action) trigger the AI's safety filters and get you completely excluded from its answers. It's not a drop in ranking: it's a silent block, and you receive no warning. You could have been excluded from AI answers for months without knowing it, while your competitors pick up the customers those answers send. Running a site check to spot these problems takes less than an hour.
The SEO techniques that still hold up on Google — moderate keyword stuffing, geographic doorway pages, mild link schemes, artificial urgency in CTAs — trigger the AI safety filters. They don’t produce a ranking drop. They produce a silent exclusion: the model stops mentioning you and you receive no notification.
This is the last of my articles dedicated to the training of AI models. From RLHF to Constitutional AI, from pre-training data to fine-tuning: Safety Filtering is the operational layer where this entire architecture turns into a binary decision. Your content passes or it gets discarded.
Why AI safety filters are different from Google’s filters
Google penalizes. AI safety filters exclude. The difference is structural, and understanding it changes the way you look at your site.
When Google penalizes a page for keyword stuffing, it demotes it in the SERP. There are still ten results on the page. The user can scroll, compare, choose. The search engine shows — it does not recommend. The AI recommends. It generates a first-person answer that the user reads as personalized advice. Citing a spammy or manipulative site exposes the model to a reputational risk incomparably higher than including it in a list of ten results. This asymmetry of risk is the technical origin of safety filters that are far stricter than any Google algorithm.
The second element is the activation threshold. The outcome is binary, but the judgment behind it is not: safety filters score content on a continuum of probability. You don’t need a blatantly spammy page. It’s enough to send signals that consistently make the model prefer other sources over yours. And those signals include many of the practices that traditional SEO considers “acceptable”.
The mechanism: where safety filters operate
AI safety filters are not a single layer. They operate at least at two distinct moments in the process, with very different effects and recovery times.
In the training data: the data the model is pre-trained on is filtered before training begins.
“They may also produce toxic, offensive, or harmful content due to biases present in the training data.”
(Ji et al., 2025, A Survey on Progress in LLM Alignment from the Perspective of Reward Design)
The upstream filters exist precisely to block those patterns before the model memorizes them. If your site was in the candidate dataset but triggered the filters, the information about you is dropped or down-weighted before training. In practice the model never learned who you are, and that is hard to reverse because it requires a new training cycle.
In real-time retrieval: RAG systems such as Perplexity, Bing Chat and ChatGPT with web search retrieve content and filter it before passing it to the model. If a chunk of your site triggers a safety filter during retrieval, it gets discarded even if it was the most relevant for that query. This effect is faster to correct: remove the problematic patterns and the filter stops excluding you in subsequent indexing cycles.
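To make the retrieval-time effect concrete, here is a toy sketch of "filter first, rank second". Everything in it is an assumption made for illustration: the `Chunk` type, the `SPAM_MARKERS` list, the relevance scores and the example domains are hypothetical, and a real system would rely on a trained classifier rather than substring matching. It only illustrates the claim above that the most relevant chunk can be dropped before the model ever sees it.

```python
from dataclasses import dataclass


@dataclass
class Chunk:
    source: str
    text: str
    relevance: float  # hypothetical retrieval score, higher means more relevant


# Hypothetical markers standing in for a real moderation classifier.
SPAM_MARKERS = ("act now", "last chance", "only 3 left")


def passes_filter(chunk: Chunk) -> bool:
    """Return False if the chunk contains any of the assumed spam markers."""
    lowered = chunk.text.lower()
    return not any(marker in lowered for marker in SPAM_MARKERS)


def select_context(chunks: list[Chunk], k: int = 2) -> list[Chunk]:
    """Discard flagged chunks first, then keep the k most relevant survivors."""
    survivors = [c for c in chunks if passes_filter(c)]
    return sorted(survivors, key=lambda c: c.relevance, reverse=True)[:k]


chunks = [
    Chunk("your-site.example", "Act now! Last chance to book an SEO audit.", 0.95),
    Chunk("competitor-a.example", "An SEO audit reviews crawlability, content and links.", 0.80),
    Chunk("competitor-b.example", "Audits usually start with a crawl of the site.", 0.70),
]

selected = select_context(chunks)
print([c.source for c in selected])
# ['competitor-a.example', 'competitor-b.example']
```

The chunk with the highest relevance score never reaches the ranking step, which is why the exclusion is invisible from the outside.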
There is a third vector, documented by recent research. Zhuang et al. (2025) analyze the vulnerabilities in content moderation systems and describe a relevant mechanism:
“Since both the inquiry and the response are executed within a sandbox, they bypass the content moderation system.”
(Zhuang et al., 2025, Exploring the Vulnerability of the Content Moderation Guardrail in Large Language Models via Intent Manipulation)
The architectural implication is direct: content moderation is not a single control point — it is distributed across multiple layers. A site that passes one filter can be blocked at another. Designing for safety filters means eliminating problematic patterns at every level, not optimizing just one.
The patterns that trigger the filter
From this mechanism follows a direct operational deduction: if safety filters are trained to recognize patterns associated with manipulative, spammy or low-quality content, any SEO technique that produces those patterns is a risk — regardless of the intention with which it is used.
Zhuang et al. (2025) document how text transformations are analyzed by the moderation system: “Given two types of text transformations, imperative transformation…” — the research shows that the system does not only analyze semantic meaning, but the formal structures of the text. Repetitive patterns, aggressive imperative constructions, the linguistic sequences characteristic of spam are recognized at a formal level, before the content is even evaluated on its merits.
This exposes practices that traditional SEO considers low-risk:
Moderate keyword stuffing. If your main keyword appears 12-15 times in 1,000 words, Google may tolerate it. AI safety filters recognize the repetition as a signal of content built to manipulate ranking, not to answer a real question. The threshold is far lower than Google’s.
Geographic doorway pages. Twenty identical pages with only the city changed — “SEO consultant Rome”, “SEO consultant Milan”, “SEO consultant Naples” — are a recognizable spam pattern. The model identifies them as content generated to rank, not to inform.
Link schemes in the footer and widgets. Footers with dozens of links, reciprocal blogrolls, widgets with commercial links. If the pattern is recognizable as an artificial scheme, the safety filter activates regardless of the quality of the content on the main pages.
Low-quality auto-generated content. Pages generated from templates with substituted variables. Safety filters are trained to recognize automatic generation patterns — not because AI content is intrinsically problematic, but because low-quality patterns generated at volume are one of the strongest signals of spam.
Aggressive CTAs. Artificial urgency, false scarcity, decorative countdowns, three or four calls-to-action on the same screen. The model is trained — through RLHF and Constitutional AI — to consider these patterns manipulative. Not because they necessarily are: because they resemble the patterns that are.
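The doorway-page pattern is the easiest of these to check on your own site. The sketch below masks city names and compares the remaining text of each pair of pages using Python's standard `difflib`. The `CITIES` list, the 0.9 threshold and the sample pages are assumptions chosen for illustration, not values any AI system is known to use.

```python
import re
from difflib import SequenceMatcher
from itertools import combinations

CITIES = ["Rome", "Milan", "Naples"]  # assumed list of target cities


def mask_cities(text: str) -> str:
    """Replace every known city name with a placeholder."""
    pattern = r"\b(" + "|".join(map(re.escape, CITIES)) + r")\b"
    return re.sub(pattern, "{CITY}", text)


def doorway_pairs(pages: dict[str, str], threshold: float = 0.9):
    """Return page pairs whose text is near-identical once cities are masked."""
    masked = {url: mask_cities(body) for url, body in pages.items()}
    pairs = []
    for (url_a, a), (url_b, b) in combinations(masked.items(), 2):
        ratio = SequenceMatcher(None, a, b).ratio()
        if ratio >= threshold:
            pairs.append((url_a, url_b, round(ratio, 2)))
    return pairs


pages = {
    "/seo-consultant-rome": "SEO consultant Rome. We help businesses in Rome grow online.",
    "/seo-consultant-milan": "SEO consultant Milan. We help businesses in Milan grow online.",
    "/about": "This page describes who we are and how the consultancy works.",
}

print(doorway_pairs(pages))
# [('/seo-consultant-rome', '/seo-consultant-milan', 1.0)]
```

A pair that scores close to 1.0 after masking is a page where only the city changed, which is exactly the pattern described above.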
The result: a silent ban
The safety filter doesn’t notify you. You don’t get a visible penalty in Analytics, there’s no control panel that flags the problem. The AI simply stops mentioning you — in ChatGPT’s answers, in Perplexity’s citations, in Gemini’s suggestions. And the AI channel is becoming the first point of contact with the customers who don’t yet know you exist.
The asymmetry compared to Google is crucial. When Google penalizes you, the drop is measurable: you lose positions, traffic falls, you see it in Search Console. When a safety filter excludes you, you may not notice for months — you are measuring the traffic you have, not the traffic you never receive from AI answers.
The good news: if you do clean SEO — original content with claims supported by sources, a clear semantic structure, honest CTAs — the safety filters don’t activate. The filter penalizes shortcuts, not quality work.
How to check whether you’re at risk
The most direct test requires no special tools. Take the three most important pages of your site and paste them one at a time into ChatGPT or Claude. Ask: “Is this page a reliable source you would use to answer a question about [your topic]?” and “Are there any elements that could be perceived as manipulative by an AI moderation system?”
It’s not a definitive audit — you’re not seeing the internal scoring system. But the model’s qualitative perception is exactly what determines whether you get cited or excluded.
For a more structured check, use Screaming Frog or an equivalent tool and look for:
- Pages with keyword density above 3%
- URLs that follow identical patterns (e.g. `/service-city-1`, `/service-city-2`)
- Pages under 300 words with more CTAs than content
- Footers with more than 20-30 links
- Pages with hidden text or an unbalanced text-to-advertising ratio
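If you export page text and URLs from your crawler, a few of these checks can be scripted. The following is a minimal sketch under stated assumptions: keyword density is taken as phrase occurrences divided by total words (definitions vary between tools), the footer limit uses the upper bound of 30, and URL grouping only collapses numeric suffixes. The function names and sample data are hypothetical, and the CTA and hidden-text checks are not covered.

```python
import re
from collections import defaultdict


def keyword_density(text: str, keyword: str) -> float:
    """Occurrences of the keyword phrase divided by the total word count."""
    words = re.findall(r"\w+", text.lower())
    kw = keyword.lower().split()
    if not words or not kw:
        return 0.0
    n = len(kw)
    hits = sum(1 for i in range(len(words) - n + 1) if words[i:i + n] == kw)
    return hits / len(words)


def audit_page(text: str, keyword: str, footer_links: int) -> list[str]:
    """Return the checklist items this page trips."""
    flags = []
    if keyword_density(text, keyword) > 0.03:
        flags.append("keyword density above 3%")
    if len(re.findall(r"\w+", text)) < 300:
        flags.append("under 300 words")
    if footer_links > 30:
        flags.append("footer with more than 30 links")
    return flags


def repeated_url_patterns(urls: list[str], min_count: int = 3) -> dict[str, list[str]]:
    """Group URLs that differ only by a number, e.g. /service-city-1, -2, -3."""
    groups = defaultdict(list)
    for url in urls:
        groups[re.sub(r"\d+", "{n}", url)].append(url)
    return {tpl: members for tpl, members in groups.items() if len(members) >= min_count}


# 100 words in total, 5 of which start the phrase "seo consultant": density 5%.
sample_text = "seo consultant " * 5 + "helps you " * 45
print(audit_page(sample_text, "seo consultant", footer_links=12))
# ['keyword density above 3%', 'under 300 words']

urls = ["/service-city-1", "/service-city-2", "/service-city-3", "/about"]
print(repeated_url_patterns(urls))
# {'/service-city-{n}': ['/service-city-1', '/service-city-2', '/service-city-3']}
```

Treat the output as a shortlist for manual review, not as a verdict: the thresholds come from the checklist above, not from any published filter specification.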
For each “flagged” page, ask yourself: if I were a moderation system trained to protect users from manipulative content, would I classify this page as quality information or as a scheme to manipulate ranking?
If you have doubts, the safety filter won’t.
The circle closes here
From the RLHF that builds the model’s preferences, to the Constitutional AI that systematizes them into principles, from the pre-training data to fine-tuning: every phase of training contributes to the filters that operate when a source is evaluated. Safety Filtering is the convergence point of the entire pipeline.
The next articles shift perspective — I’m dedicating them to AI metrics. Instead of looking at how the model is built, you’ll look at how it measures the quality of the text. The first article is about the Perplexity Score: the metric the model uses to evaluate how “predictable” a text is relative to its training, and how this influences which content is deemed credible.
The concrete action you can take today: open the public policies of Anthropic and OpenAI, read the section on disallowed content, and go back to your site with that list in hand. You’re not trying to be ethically impeccable — you’re trying not to trigger patterns the filter has learned to recognize. The difference matters because it tells you exactly where to focus the work.