Entities and Knowledge Graph

Knowledge Graph Poisoning Prevention: how to protect your entity from false information that AI takes as true

Roberto Serra · 25 June 2026 · ~8 min read

A few well-placed sentences on a handful of online sources are enough to make AI say false things about you, and you may not notice until a client stumbles upon them. It doesn’t take a defamation campaign: research shows that modifying a tiny fraction of a text corpus can predictably shift a model’s answers about a specific entity, such as a company. If you work in a sector where trust is everything (professionals, law firms, certified manufacturers), the reputational risk is concrete and silent. Monitoring and protecting your identity before the damage enters the training cycle is possible, and getting started requires no technical skills.

The question is not whether someone attacks you on the web. It’s whether AI takes false information published about you as true. Why it happens, and how to protect yourself.

Let me explain the difference. A classic reputational attack is a poisonous post on a forum, a negative review, a critical article. You see it, you handle it, you respond publicly. The people who read it know there is context behind it: a conflict, some hearsay, another side to the story.

An AI model does not. When ChatGPT, Perplexity or Gemini build the representation of your company, they don’t read the web like a human. They ingest pages, extract entities and relationships, and reduce them to a graph of facts. If that graph ends up containing “Studio Rossi lost a malpractice lawsuit in 2023” — and it’s false — the sentence comes out in AI answers as a fact. Without quotation marks.

And this is a particularly delicate topic for the clients I work with: law firms and professionals with public exposure.

What really happens in your entity’s graph

In the previous articles of this series I explained how Google’s Knowledge Graph and the internal graph of AI engines work on the same principles: they take pages, extract entities, disambiguate them and connect them to each other in a network of typed relationships (“Firm X” — is based in — “Trento”; “Firm X” — specialized in — “intellectual property”).

The problem is that every piece of this network is built from public text. If that text contains false statements about your entity, whether inserted by disgruntled former employees or unfair competitors, produced by automatically generated content, or simply left behind as uncorrected errors, those statements risk entering the graph. And from there, AI answers.
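A toy illustration of the problem, under heavily simplified assumptions (real engines are far more complex and their internals are not public): once a sentence is reduced to a subject-relation-object triple, nothing in the graph records whether it came from your official site or from an anonymous forum post. The firm and its facts are hypothetical.

```python
from collections import defaultdict

Triple = tuple[str, str, str]
Graph = dict[tuple[str, str], set[str]]

def build_graph(triples: list[Triple]) -> Graph:
    """Index (subject, relation) -> objects. Source and original wording are discarded."""
    graph: Graph = defaultdict(set)
    for subject, relation, obj in triples:
        graph[(subject, relation)].add(obj)
    return graph

def answer(graph: Graph, subject: str, relation: str) -> str:
    objects = sorted(graph.get((subject, relation), set()))
    if not objects:
        return f"No known facts about {subject} ({relation})."
    # Stated as a plain fact: no quotation marks, no "according to".
    return f"{subject} {relation} {', '.join(objects)}."

official_site = [
    ("Firm X", "is based in", "Trento"),
    ("Firm X", "specializes in", "intellectual property"),
]
# One false sentence from a low-quality page, reduced to a triple like any other.
forum_post = [("Firm X", "lost a malpractice lawsuit in", "2023")]

graph = build_graph(official_site + forum_post)
print(answer(graph, "Firm X", "lost a malpractice lawsuit in"))
# prints: Firm X lost a malpractice lawsuit in 2023.
```

The design choice is deliberate: a graph that stores only facts, not their provenance, has no way to answer with anything other than certainty.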

In the research world this phenomenon has a precise name: Knowledge Graph Poisoning. Two papers published in 2025 document it rigorously.

The first is by Wen et al. (2025), from Fudan University in Shanghai, titled “A Few Words Can Distort Graphs: Knowledge Poisoning Attacks on Graph-based Retrieval-Augmented Generation of Large Language Models”. The title says it all: a few words are enough to distort the graph.

The mechanism Wen and colleagues document is this: by modifying less than 0.06% of a text corpus (a few dozen words out of tens of thousands), you can predictably change the way the model answers specific questions about the target entity. And the text stays fluent, so the manipulation leaves no obvious traces.
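The arithmetic behind “a few dozen words out of tens of thousands” is simple. A minimal sketch: the corpus sizes are hypothetical examples, not figures from the paper, and the 0.06% share is treated as an upper bound.

```python
# 0.06% expressed as an integer ratio (6 per 10,000) to avoid floating-point rounding.
SHARE_PER_10K = 6

def max_modified_words(corpus_words: int) -> int:
    """Approximate upper bound on modified words at a 0.06% budget (rounded down)."""
    return corpus_words * SHARE_PER_10K // 10_000

for size in (20_000, 50_000, 100_000):
    print(f"{size:,} words -> about {max_modified_words(size)} modified words")
```

For a 50,000-word corpus that is about 30 words: roughly one paragraph, spread across a few pages.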

From this follows a very concrete operational consequence for your business: you don’t need a massive defamation campaign to pollute your entity in AI answers. A few well-placed mentions, on sources the crawler reads, are enough to shift the model’s perception.

The second paper, Zhao et al. (2025), “RAG Safety: Knowledge Poisoning Attacks to RAG”, reaches compatible conclusions by studying RAG (Retrieval-Augmented Generation) systems, the kind that Perplexity and Google’s AI Overviews use to pull fresh content from the web and build the answer on the fly. The principle is the same: the retrieved content is fed to the model as if it were true, and if it contains false statements about your entity, the model repeats them.
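A minimal sketch of the retrieve-then-generate step, assuming a naive keyword-overlap retriever (a hypothetical stand-in: the retrieval pipelines of Perplexity or AI Overviews are not public). What it shows: the prompt is assembled from whatever ranks highest, and a poisoned snippet enters the context on the same footing as your official page.

```python
import re

def tokenize(text: str) -> set[str]:
    """Lowercased word set; a crude stand-in for real text processing."""
    return set(re.findall(r"\w+", text.lower()))

def retrieve(query: str, pages: dict[str, str], k: int = 2) -> list[tuple[str, str]]:
    """Return the k pages sharing the most words with the query."""
    q = tokenize(query)
    ranked = sorted(pages.items(), key=lambda item: len(q & tokenize(item[1])), reverse=True)
    return ranked[:k]

def build_prompt(query: str, pages: dict[str, str], k: int = 2) -> str:
    # Snippets are pasted in as-is: nothing says who wrote them or how reliable they are.
    context = "\n".join(f"- {text}" for _, text in retrieve(query, pages, k))
    return f"Answer using these sources:\n{context}\nQuestion: {query}"

pages = {
    "official-site": "Firm X is an intellectual property firm based in Trento.",
    "review-aggregator": "Firm X lost a malpractice lawsuit in 2023, according to clients.",
    "weather-blog": "Sunny weekend expected across the Alps.",
}
print(build_prompt("Did Firm X lose a lawsuit?", pages))
```

Here the aggregator snippet ranks first simply because it shares more words with the question; relevance, not truth, decides what the model reads.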

Why this is a problem specific to your sector

A law firm in Trento specialized in intellectual property and corporate law is more exposed than average for three structural reasons.

First: your work revolves around trust. When an entrepreneur asks ChatGPT for the “best IP firm in Trento”, even a single wrong mention (a lawsuit you never lost, a client you never lost, a specialization you don’t have) can shift the choice.

Second: the legal sector produces little official, author-signed content. Very few firms have an active blog, a press room or indexed publications. This means AI reconstructs your entity from directories, chamber of commerce records, a few local newspaper articles, maybe LinkedIn. With so few primary sources, any alternative content, true or false, carries more weight.

Third: automated content about professionals is proliferating. “Lawyer review” sites, semi-automatic aggregators, LLM-generated texts that fill low-quality subdomains. No one reads them. But AI crawlers do.
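To make the second point concrete, here is a deliberately crude toy model, my simplification rather than how any AI engine actually weighs sources: assume every page about your entity counts equally, and look at what share of them carries a false claim. The page counts are hypothetical.

```python
def false_share(official_pages: int, false_pages: int) -> float:
    """Share of pages about an entity that carry a false claim, if all pages weigh the same."""
    total = official_pages + false_pages
    return false_pages / total if total else 0.0

# A firm with few official sources vs. one with many, each facing one false page.
for official in (4, 49):
    print(f"{official} official pages + 1 false page -> {false_share(official, 1):.0%} false")
```

Real engines weigh sources by authority and other signals, so the numbers are only illustrative; the direction of the effect is the point.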

The thread I want you to hold from beginning to end: your visibility in AI answers depends on which facts the model considers true about you. If that set of facts is clean, you appear with the right information. If it’s polluted, you appear anyway, but with the wrong information.

The operational monitoring you can set up

I’m not selling you a magic solution here. I’m telling you what you can do with free tools, knowing that serious analysis requires professional tools.

First level: alerts on the entity name. Set up Google Alerts on the firm’s name, the partners’ names and common spelling variants. New mentions that Google picks up arrive by email.
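A small helper for the first level, assuming you list the spellings yourself: it turns known variants of the firm’s name and the partners’ names into quoted queries, one per alert, and drops case-only duplicates. To my knowledge Google Alerts has no official API, so the output is meant to be pasted into the Alerts page by hand; all names are hypothetical.

```python
def alert_queries(firm_variants: list[str], partners: list[str], city: str) -> list[str]:
    """Build one quoted Google Alerts query per distinct name variant."""
    queries: list[str] = []
    seen: set[str] = set()
    for name in firm_variants:
        key = name.strip().lower()
        if key and key not in seen:  # skip blanks and case-only duplicates
            seen.add(key)
            queries.append(f'"{name.strip()}"')
    for person in partners:
        # Adding the city narrows alerts for common personal names.
        queries.append(f'"{person.strip()}" {city}')
    return queries

for q in alert_queries(["Studio Rossi", "studio rossi", "Studio Legale Rossi"], ["Mario Rossi"], "Trento"):
    print(q)
```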

Second level: structured sources. Open Wikidata and search for your firm. If an entry already exists, check every statement (each one pairs a property, identified by a P-number such as P159 “headquarters location”, with a value). If there are errors, correct them directly, citing a source. If no entry exists, consider creating one backed by verifiable primary sources.
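The same check can be scripted against the public Wikidata API. A minimal sketch: `wbsearchentities` finds candidate items by name and `wbgetentities` returns an item’s claims; `statement_values` is a hypothetical helper written for this example that extracts the values of one property, such as P159 (headquarters location) or P856 (official website). The User-Agent string is a placeholder to adapt.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

API = "https://www.wikidata.org/w/api.php"

def search_url(name: str, language: str = "en") -> str:
    """URL that searches Wikidata items by label."""
    return API + "?" + urlencode({"action": "wbsearchentities", "search": name,
                                  "language": language, "format": "json"})

def claims_url(qid: str) -> str:
    """URL that returns all claims (statements) of one item, e.g. 'Q42'."""
    return API + "?" + urlencode({"action": "wbgetentities", "ids": qid,
                                  "props": "claims", "format": "json"})

def fetch_json(url: str) -> dict:
    # Wikimedia asks clients to send a descriptive User-Agent; this one is a placeholder.
    req = Request(url, headers={"User-Agent": "entity-audit-sketch/0.1 (you@example.com)"})
    with urlopen(req, timeout=10) as resp:
        return json.load(resp)

def statement_values(entity: dict, prop: str) -> list:
    """Main values of one property (e.g. 'P159') in a wbgetentities entity."""
    values = []
    for claim in entity.get("claims", {}).get(prop, []):
        snak = claim.get("mainsnak", {})
        if snak.get("snaktype") != "value":
            continue  # 'somevalue' / 'novalue' statements carry no datavalue
        value = snak["datavalue"]["value"]
        # Item values are dicts with an "id" (a Q-number); URLs and strings are plain str.
        values.append(value["id"] if isinstance(value, dict) and "id" in value else value)
    return values

# Usage (needs network access):
# hits = fetch_json(search_url("Studio Rossi"))["search"]
# qid = hits[0]["id"]
# entity = fetch_json(claims_url(qid))["entities"][qid]
# print(statement_values(entity, "P159"), statement_values(entity, "P856"))
```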

Third level: periodic self-audit of AI answers. Once a quarter, open ChatGPT, Perplexity, Gemini and Claude and ask the 5-6 questions a potential client would ask (“who is firm X in Trento”, “which cases has it handled”, “areas of specialization”, “founding partners”). Note down the answers and the cited sources. This is an entry-level check: it doesn’t replace continuous monitoring, but it gives you an honest snapshot of the current graph.
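One way to keep the quarterly self-audit comparable over time is a plain CSV log, one row per engine and question. A minimal sketch under stated assumptions: answers are pasted in by hand, and `flag_issues` (a hypothetical helper) uses crude case-insensitive substring matching against known false claims and facts you expect to see, so it is a reminder to read closely, not a verdict.

```python
import csv
import io
from dataclasses import asdict, dataclass, fields

@dataclass
class AuditRow:
    audit_date: str     # e.g. "2026-09-30"
    engine: str         # ChatGPT, Perplexity, Gemini, Claude
    question: str
    answer: str         # pasted by hand from the engine
    cited_sources: str  # URLs the engine showed, separated by ";"
    issues: str

def flag_issues(answer: str, false_claims: list[str], expected_facts: list[str]) -> str:
    """Crude substring check: known false claims present, expected facts absent."""
    text = answer.lower()
    found = [f"FALSE: {c}" for c in false_claims if c.lower() in text]
    missing = [f"MISSING: {fact}" for fact in expected_facts if fact.lower() not in text]
    return "; ".join(found + missing) or "ok"

def to_csv(rows: list[AuditRow]) -> str:
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=[f.name for f in fields(AuditRow)])
    writer.writeheader()
    for row in rows:
        writer.writerow(asdict(row))
    return buf.getvalue()

answer_text = "Firm X is an IP firm in Trento that lost a malpractice lawsuit."
row = AuditRow(
    audit_date="2026-09-30",
    engine="Perplexity",
    question="who is firm X in Trento",
    answer=answer_text,
    cited_sources="https://forum.example.org/t/123",
    issues=flag_issues(answer_text, ["malpractice lawsuit"], ["Trento", "intellectual property"]),
)
print(row.issues)  # FALSE: malpractice lawsuit; MISSING: intellectual property
print(to_csv([row]))
```

Note that the substring check misses paraphrases (“IP” versus “intellectual property” is flagged as missing here), which is exactly why a human still reads every answer.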

Fourth level: the structured signals on your site. Open Google’s Rich Results Test (or the Schema Markup Validator, which reports every schema.org type rather than only those eligible for rich results), paste the firm’s URL and verify that the `LegalService` or `Attorney` schema is present (schema.org itself marks `Attorney` as deprecated in favour of `LegalService`) and that the fields on specializations, offices and professionals are all explicitly declared. Every missing field is a gap that the AI model fills with whatever it finds elsewhere, and “elsewhere” is exactly where poisoning can settle in.
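A sketch of what “explicitly declared” can look like: a JSON-LD `LegalService` object built in Python, plus a check for missing or empty fields. The list of expected properties is my assumption of what a law firm should fill in (all are valid schema.org properties for `LegalService`, not a Google requirement); the firm, people and URLs are hypothetical.

```python
import json

# Properties this sketch expects a law firm to fill in: an assumption, not a Google requirement.
EXPECTED = ["name", "url", "address", "telephone", "knowsAbout", "areaServed", "employee", "sameAs"]

def missing_fields(jsonld: dict, expected: list[str] = EXPECTED) -> list[str]:
    """Expected properties that are absent or empty."""
    return [prop for prop in expected if not jsonld.get(prop)]

firm = {
    "@context": "https://schema.org",
    "@type": "LegalService",
    "name": "Firm X",
    "url": "https://www.example.com",
    "address": {"@type": "PostalAddress", "addressLocality": "Trento", "addressCountry": "IT"},
    "knowsAbout": ["Intellectual property law", "Corporate law"],
    "employee": [{"@type": "Person", "name": "Jane Doe", "jobTitle": "Partner"}],
}

print(missing_fields(firm))  # ['telephone', 'areaServed', 'sameAs']
# Paste the output inside <script type="application/ld+json"> ... </script> on the page.
print(json.dumps(firm, indent=2, ensure_ascii=False))
```

Each name the check prints is one of the gaps described above: a field the model will otherwise fill from third-party pages.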

I’ll warn you right away: these four levels are a first step. Real analysis, on a firm with several professionals and several areas of specialization, requires professional brand-monitoring and competitor-comparison tools and, above all, a cadence you can sustain over time. There is no magic factor: it’s constant operational oversight.

The mistakes I see most often

In recent months, looking at professional firms and SMEs approaching the topic, the recurring patterns are always the same.

No control over the “about us” page. The page most read by AI crawlers, left with old bios, outdated roles, specializations not explicitly declared. The model fills the gaps with whatever it finds elsewhere.
Legal directories never updated. Profiles on portals like “lawyer-directory” or “lawyer-italia”, filled in once and never touched again. If there are errors there, AI picks them up.
Untracked mentions on third-party blogs. A former collaborator writes a post on a sector forum citing you incorrectly. Three years later that sentence is one of the few long texts about your name. It enters the graph.
No control over AI-generated content. Sites with thousands of automatic profiles on “Italian lawyers”. You don’t know it, but your name is inside a profile built by a model that made up half the data.

What to do concretely this week

Draw up a list of 10-15 specific queries about your sector and your entity, and test them on the four main AI engines. Save the answers and sources.
Compare yourself with the 3-5 firms that AI cites as “the reference in Trento for IP and corporate law”. Look at what they have that you don’t: official pages, publications, conference talks, a Wikidata entry.
For every false statement you find: correct it at the source if it’s on one of your channels; ask the site manager for removal if it’s third-party; and counter it with official content of your own that clearly states the facts, indexed and linked.
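The three options in the last point can be written down as a simple triage rule, in my reading of it: the channel decides the first action, and publishing official content that states the correct facts applies in both cases. A minimal sketch; `Finding`, `Channel` and `actions_for` are hypothetical names, and directory profiles you can edit yourself count as your own channel.

```python
from dataclasses import dataclass
from enum import Enum

class Channel(Enum):
    OWN = "own"                  # your site, directory profiles you can edit
    THIRD_PARTY = "third_party"  # forums, blogs, aggregators you don't control

@dataclass
class Finding:
    claim: str
    url: str
    channel: Channel

def actions_for(finding: Finding) -> list[str]:
    """First action depends on the channel; counter-content applies in both cases."""
    if finding.channel is Channel.OWN:
        first = f"Correct at source: {finding.url}"
    else:
        first = f"Ask the site manager to remove or correct: {finding.url}"
    return [first, f"Publish and link official content with the correct facts on: {finding.claim!r}"]

finding = Finding("lost a malpractice lawsuit in 2023", "https://forum.example.org/t/123", Channel.THIRD_PARTY)
for step in actions_for(finding):
    print(step)
```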

The point of all this is not to become paranoid. It’s to realize that the AI model builds your reputation on data that lies outside your direct control, and that a minimum of oversight turns the situation around. In the previous articles I covered E-E-A-T for AI and Author Entity Recognition: they are the constructive layer of your visibility. Monitoring against poisoning is the defensive layer. You need both.

In the next articles of this series I go into the details of the periodic entity audit, entity compliance in healthcare, legal and finance, and the competitor entity graph: all pieces that, put together, make you appear in AI answers with the right information next to your name.

Chapter 4 · Entities and Knowledge Graph

Continue with the deep dives

40 deep dives across the 5 sections of the chapter.

4.1 Entity Monitoring & Maintenance · 8 deep dives

Entity Confidence Testing: reading the AI’s language to understand how much it trusts your brand
Your brand exists in four parallel versions (and you don’t know it)
Entity Decay: Why AI Stops Citing You (and How to Get Back Into Answers)
Entity recovery after a reputation crisis: how to clean up your entity in the Knowledge Graph
Periodic Entity Audit: Why Your Data Ages Even When You Don’t Notice
AI Response Monitoring for Entities: how to find out if AI tells your brand’s story with the right data
Knowledge Graph Poisoning Prevention: how to protect your entity from false information that AI takes as true
Training Data Lifecycle: why corrections to your site don’t reach the AI right away

4.2 Entity Recognition · 8 deep dives

Entity disambiguation: when AI confuses your brand with a namesake
Entity salience: why being named once is like never being named at all
Your brand shows up in AI answers, but classified as what?
Entity Linking: why 50 mentions of your brand are worth zero if the AI doesn’t connect them
When the AI stops understanding that “we” means you: the coreference problem
Are your brand in Italian and in English the same entity to AI? Probably not
New brand invisible in AI answers: how to speed up recognition
Named Entity Recognition: how AI decides whether your brand is “someone” or just text

4.3 Entity Relationships · 8 deep dives

AI doesn’t cite you in isolation: it cites you within a network of relationships
Competitor Entity Graph: why AI always cites the same 4-5 brands in your industry
Entity-place association: why Perplexity knows who to recommend in Sardinia (and you maybe don’t)
Industry Vertical Classification: the category that makes you visible (or invisible) in AI answers
Supply Chain Entity Mapping: how partners tell AI who you are
Speaking at events: why every conference is an authority node for AI
Alumni & Affiliation Network: your institutional connections in the AI graph
Client Portfolio as Entity Network: why your clients are nodes that strengthen you

4.4 Knowledge Graph Optimization · 8 deep dives

Entering Google’s Knowledge Graph: why without it you’re just text to Gemini
Wikidata as semantic backbone: the entry that makes your brand exist for AI
Complete Organization schema: the machine-readable ID card of your brand
sameAs: the glue that holds your identity together for AI
Your brand’s tax code in the AI ecosystem
When AI finds three different versions of your company, it stops recommending you
Rich Entity Attributes: why AI cites only “fat” entities in detail
Product Entity vs Brand Entity: why AI can know your name without knowing what you sell

4.5 Vertical & Local Entities · 8 deep dives

Google Business Profile as the primary entity: why AI looks there before your website
NAP Consistency: why AI sends clients to the wrong number
Who is the Cagliari dentist according to ChatGPT? The answer depends on your city’s Knowledge Graph
Why AI Recommends Generalist Blogs Instead of Your Medical Practice (and How to Turn It Around)
Franchises and multi-location: why AI doesn’t add up the authority of your locations
Professional Registry Entity: why the professional register is your proof of existence for AI
Vertical industry directories: why AI pulls its recommendations from there
Product/Service Schema for Transactional Queries

The author

Senate of the Republic · Palazzo Giustiniani Conference “The power of artificial intelligence”

Roberto Serra

SEO consultant for over 15 years, founder of the Serra SEO Agency (RAANK). He helps multinationals and SMEs stay visible where search is moving: ChatGPT, Perplexity, Gemini and Google's AI Overviews.