Entities and Knowledge Graph

Entity Linking: why 50 mentions of your brand are worth zero if the AI doesn’t connect them

Roberto Serra · 25 June 2026 · ~9 min read

Your brand is cited in fifty places online — but with slightly different names: with the Ltd., without, with the city, as an acronym. To the AI these are fifty citations of fifty different entities, none with enough weight to be recommended: all that visibility cancels out instead of adding up. It's not a question of quantity, and there is a precise way to attach all those citations to the same profile, so that every mention pulls in the same direction.

Say you are TecnoImpianti Soluzioni Industriali and you build electrical control panels in Brescia. Over the last two years you’ve been cited in 47 places: an interview in an industry magazine, three client case studies, a dozen mentions in B2B directories, a few reviews, a couple of transcribed podcasts.

When a business owner asks ChatGPT “who builds custom electrical control panels in Brescia?” you don’t show up. The problem isn’t quantity. It’s that, to the AI, those 47 mentions don’t belong to the same subject: some say “TecnoImpianti”, others “Tecno Impianti Srl”, others “TecnoImpianti Brescia”. Each variant lives as a separate entity. None accumulates authority.

This is the problem of entity linking. If you don’t solve it, your visibility in AI answers stays stuck, no matter how many citations you gather.
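
To see how fragmented your own name is, you can group the spellings you find by a crude comparison key. What follows is a minimal sketch, not how any AI engine works internally: the `LEGAL_SUFFIXES` list and the `name_key` and `group_variants` helpers are my own assumptions for illustration.

```python
import re
from collections import defaultdict

# Hypothetical list of legal-form suffixes to ignore; extend it for your market.
LEGAL_SUFFIXES = {"srl", "spa", "ltd", "inc", "gmbh"}


def name_key(name: str) -> str:
    """Reduce a brand spelling to a crude key: lowercase, no dots,
    no legal suffix, no spaces."""
    tokens = re.findall(r"[a-z0-9]+", name.replace(".", "").lower())
    return "".join(t for t in tokens if t not in LEGAL_SUFFIXES)


def group_variants(mentions: list[str]) -> dict[str, set[str]]:
    """Group the spellings that share the same key."""
    groups: dict[str, set[str]] = defaultdict(set)
    for mention in mentions:
        groups[name_key(mention)].add(mention)
    return dict(groups)


if __name__ == "__main__":
    found = ["TecnoImpianti", "Tecno Impianti Srl", "TecnoImpianti Brescia"]
    for key, spellings in group_variants(found).items():
        print(key, sorted(spellings))
```

Three different strings collapse into two groups here: the legal suffix and the spacing are easy to neutralize, while the variant with the city stays separate. That leftover group is exactly the kind of case an engine has to resolve on its own.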

What entity linking really does when a model reads a page

In AI search research the term has a narrow, technical meaning. Yifan Ding et al. (2024), in their work on EntGPT, make its role clear when introducing their error analysis:

To better understand the results entity linking provides to LLMs, we also conducted a brief error analysis on two of the QA datasets: ARC-C and OBQA.

Yifan Ding et al., 2024

Translated into practice: the authors look at what happens to an AI model’s answers when entity linking works versus when it fails, using exam questions as a testbed. It’s no accident that it’s studied together with question answering: it’s the mechanism that decides whether the model retrieves the right thing to answer.

The consequence for your business is direct. When ChatGPT or Perplexity compose an answer about your field, they first identify the subjects named in the question, then link them to known entities in their knowledge graph, then pull out what they know about each one. If your brand isn’t linked to a recognized entity, everything written about you on the web becomes noise: present on the page, absent from the answer.

Why this piece sits upstream of everything else

In earlier articles in this series I explained how an AI engine turns words into vectors and how it recognizes your brand name in a sentence. Entity linking is the very next step: after recognizing that “Serra Agency” is a company name, the system has to decide which company it is among the hundreds that could be called that.

It’s the bridge between the text you write and Google’s Knowledge Graph or the proprietary graphs that OpenAI and Anthropic build internally. Without this bridge, all the work on E-E-A-T for AI doesn’t aggregate onto a single subject.

Which “anchor” do engines use most often? Cedric Möller et al. (2021), in a survey of English entity linking on Wikidata, put it plainly:

Hence, Wikidata is an attractive basis for Entity Linking, which is evident by the recent increase in published papers.

Cedric Möller et al., 2021

In plain terms: Wikidata is the preferred basis for entity linking, and you can see it from the growing number of papers that use it. The reason is practical: Wikidata is community-updated, multilingual, and every entry has a stable identifier made of the letter Q followed by a number (like Q42). For you, the business owner, this means something concrete: if a Wikidata entry for your brand exists and your site declares it explicitly, you have a strong anchor that AI engines can start from. If it doesn’t exist, you’re asking the model to guess.
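
If you handle these identifiers in a spreadsheet or a script, it helps to validate them before pasting them anywhere. A minimal sketch, assuming you only deal with item identifiers (the Q-prefixed ones); the helper names are my own.

```python
import re

# Wikidata item identifiers: the letter Q followed by a positive integer
# without leading zeros (for example Q42).
QID_PATTERN = re.compile(r"Q[1-9][0-9]*")


def is_qid(value: str) -> bool:
    """True if the string looks like a Wikidata item identifier."""
    return QID_PATTERN.fullmatch(value) is not None


def wikidata_page_url(qid: str) -> str:
    """Build the public page URL for an item identifier."""
    if not is_qid(qid):
        raise ValueError(f"not a Wikidata item identifier: {qid!r}")
    return f"https://www.wikidata.org/wiki/{qid}"


if __name__ == "__main__":
    print(wikidata_page_url("Q42"))
```

The check only confirms the format. Whether the entry exists and actually describes your company is something you still have to verify on Wikidata itself.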

The test you can run in 15 minutes

This is an entry-level test. It gives you a clear signal, but the real analysis requires professional tools and someone who knows how to read them. With that caveat in place, here are the three steps.

First step, verify that an entity exists for your brand. Go to Wikidata and search for the exact name of your company. Three possible outcomes:

an entry with a Q identifier exists (e.g. Q1234567) → you have an anchor ready to go
no entry exists → you’ll have to create it or work on other entity signals (the site as anchor, Google Business Profile, Organization schema)
several ambiguous entries with similar names exist → disambiguation is your top-priority problem
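
The same first step can be scripted against Wikidata's public search endpoint (`wbsearchentities`). Treat this as a rough sketch under a few assumptions: the User-Agent string and contact address are placeholders, the three labels returned by `classify` are my own wording, and the search matches labels and aliases loosely, so a single hit still needs a human to confirm it is really you.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

API = "https://www.wikidata.org/w/api.php"


def search_url(name: str, language: str = "en", limit: int = 10) -> str:
    """Build the wbsearchentities URL for a brand name."""
    params = {
        "action": "wbsearchentities",
        "search": name,
        "language": language,
        "type": "item",
        "limit": limit,
        "format": "json",
    }
    return f"{API}?{urlencode(params)}"


def fetch_candidates(name: str) -> list[dict]:
    """Call the API and return the list of candidate items (needs network)."""
    request = Request(
        search_url(name),
        # Placeholder User-Agent: replace with your own tool name and contact.
        headers={"User-Agent": "entity-check-sketch/0.1 (you@example.com)"},
    )
    with urlopen(request, timeout=10) as response:
        return json.load(response).get("search", [])


def classify(candidates: list[dict]) -> str:
    """Map the number of candidates to the three outcomes of the first step."""
    if not candidates:
        return "no entry"
    if len(candidates) == 1:
        return f"anchor: {candidates[0]['id']}"
    return "ambiguous"


if __name__ == "__main__":
    print(classify(fetch_candidates("TecnoImpianti Soluzioni Industriali")))
```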

Second step, check what your site says about itself. Open Google’s Rich Results Test, paste your homepage URL, and look in the response for the “Organization” block. If it’s there, verify that it contains the `sameAs` field with the link to your Wikidata entry and, where applicable, to your Google Business Profile. If `sameAs` is missing, your site is declaring who you are without giving the engine the thread to connect you to the global entity.

Third step, look at how others write your name. Run a Google search for your brand in quotes and open ten results other than your own. Count how many spell the name identically. If fewer than 7 out of 10 use the same form, you’re handing the AI engine several versions of yourself and leaving it to work out on its own that they are the same company.
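
The count in the third step is simple arithmetic, but writing it down avoids arguing about it later. A minimal sketch: the `consistency` helper is my own and it compares spellings exactly as written, apart from surrounding whitespace.

```python
from collections import Counter


def consistency(mentions: list[str], min_matching: int = 7, out_of: int = 10):
    """Return (most common spelling, its count, passes the 7-out-of-10 bar)."""
    if not mentions:
        raise ValueError("need at least one mention")
    counts = Counter(mention.strip() for mention in mentions)
    form, count = counts.most_common(1)[0]
    # Integer comparison keeps the threshold exact for any sample size.
    passes = count * out_of >= min_matching * len(mentions)
    return form, count, passes


if __name__ == "__main__":
    # Hypothetical spellings copied from ten external results.
    sample = (
        ["Automeccanica Brescia Srl"] * 6
        + ["Automeccanica Brescia"] * 2
        + ["Auto Meccanica Brescia", "AMB"]
    )
    print(consistency(sample))
```

With six identical spellings out of ten the sample falls below the bar, so the name is the first thing to fix.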

The case I took apart piece by piece

I asked Perplexity a vertical, non-trivial query: “best industrial valve manufacturers in Lombardy”. Seven sources came back cited in the answer panel — a mix of company sites, an industry directory, a B2B magazine. At that point I stopped and treated each of the seven cited companies as a clinical case, checking three things one by one: whether a Wikidata entry exists, the `sameAs` field in the homepage’s Organization schema, and the consistency of the name across the first ten external mentions.

The pattern emerged immediately. Six companies out of seven had a Wikidata entry and declared `sameAs` toward their official site, LinkedIn Company and — in four cases — toward their Wikipedia page. The seventh cited company, probably included because it appeared in an authoritative directory, had Organization schema without `sameAs` and no Wikidata entry.

Then I ran the cross-check: I took two manufacturers with revenue and years in business comparable to the seven cited ones, but absent from the answer. I knew them because they’re direct or indirect clients of colleagues in the field. Same product category, same territorial positioning, comparable production quality. Both had no Wikidata entry, no `sameAs`, and the name written in at least three different variants across the first ten mentions. They never appeared in the Perplexity answers, nor in similar queries reformulated on ChatGPT.

Read the test for what it is: a controlled observation on a single query and nine companies in total, not a study. But the mechanism that emerges is consistent with the one described in the papers: AI engines cite first and foremost the entities they can disambiguate with certainty, and discard those that live as fragmented identities. Anyone who does even the bare minimum — clean Organization schema, consistent name, declared `sameAs` — enters a pool where the competition is far lower than it seems.

The mistakes I see most often

A floating name in your own materials. The company is called “Automeccanica Brescia Srl” in the site footer, “Automeccanica Brescia” on the About Us page, “Auto Meccanica Brescia” on the Google Business listing and “AMB” in client case studies. Each variant, to an AI engine, is a different entity candidate. The citation work fragments across four subjects instead of adding up on one.

Absence of disambiguating context in the text. Writing “contact us, we’re at your disposal” doesn’t help the AI understand who you are. Writing “Studio Associato Rossi, accountants in Milan since 1987” gives the engine four traits (name, category, location, year) that let it distinguish you from the twenty Studio Rossi scattered across Italy.

Organization schema without `sameAs`. Many SMEs have an Organization JSON-LD on the homepage, but the `sameAs` field is empty or lists only social profiles. The `sameAs` field exists to declare “I’m the same entity that on Wikidata is called Q…, that’s found here on LinkedIn Company, and here on Crunchbase”. Without those links, the engine reads an isolated identity.

Directories and mentions never reclaimed. If a client publishes a case study that cites you, and you don’t link to it from your site (and the client doesn’t link back to you with the correct name), that mention lives orphaned. It accumulates authority for the client, not for you.

What to do concretely, in order of priority

Fix the canonical form of your brand in a single way and use it everywhere you control: website, email, signatures, contracts, invoices, social media. Tolerate variants only where you can’t intervene.
On every company introduction page, write at least one sentence that includes name, product category, city. Not for SEO, but for disambiguation.
Verify (or add) the Organization schema on the homepage, with `sameAs` pointing to Wikidata if it exists, to Google Business Profile, to LinkedIn Company, to Crunchbase. You can check it with Google’s Rich Results Test.
If Wikidata has no entry for your brand and you have at least 2-3 independent secondary sources citing you (industry press, interviews, authoritative directories), consider, together with someone who knows Wikidata, creating the entry. It’s not trivial and the guidelines are strict, but it’s the strongest entity signal you can give.
Compare yourself with the 3-5 competitors that ChatGPT and Perplexity cite when you run a generic query about your field. Check how many have a Wikidata entry, how many have `sameAs`, how many use the name consistently. If the majority do and you don’t, you know where to start.
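
For the Organization schema in the list above, this is roughly what the markup can look like when generated from a script. It is a sketch under assumptions: every value passed in the example (name, URLs, the `Q1234567` identifier) is a placeholder to replace with your own, and the `build_organization_jsonld` helper is hypothetical. The property names (`name`, `url`, `description`, `address`, `sameAs`) are standard schema.org vocabulary.

```python
import json


def build_organization_jsonld(
    name: str, url: str, city: str, description: str, same_as: list[str]
) -> str:
    """Return an Organization JSON-LD document as a formatted string."""
    data = {
        "@context": "https://schema.org",
        "@type": "Organization",
        "name": name,  # the canonical form, written one way everywhere
        "url": url,
        "description": description,  # name + category + city in one sentence
        "address": {"@type": "PostalAddress", "addressLocality": city},
        "sameAs": same_as,  # links that tie the site to the same entity elsewhere
    }
    return json.dumps(data, indent=2, ensure_ascii=False)


if __name__ == "__main__":
    # All values below are placeholders, not real profiles.
    print(
        build_organization_jsonld(
            name="TecnoImpianti Soluzioni Industriali",
            url="https://www.example.com/",
            city="Brescia",
            description=(
                "TecnoImpianti Soluzioni Industriali builds custom "
                "electrical control panels in Brescia."
            ),
            same_as=[
                "https://www.wikidata.org/wiki/Q1234567",
                "https://www.linkedin.com/company/example",
            ],
        )
    )
```

Paste the output inside a `<script type="application/ld+json">` tag on the homepage, then run the page through the Rich Results Test to confirm it is read correctly.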

The thread that holds it all together

Visibility in AI answers isn’t won on the individual page, it’s won on the entity. Entity linking is the mechanism that turns a set of scattered mentions into a single subject to which engines can attribute authority. Without this step, every piece of E-E-A-T you build stays suspended in mid-air.

In the following articles in this series I’ll dig into other levers in the same vein: how to enter Google’s Knowledge Graph step by step, how to handle reconciliation when you appear in different graphs with inconsistent attributes, and how to map the relationships between your entity and the others in your field to give the engine the context that sets you apart.

In the literature, the mechanism is still a work in progress: it’s not a magic factor and it isn’t enough on its own. But for a business that’s invisible as an entity today, the fundamentals (consistent name, disambiguating context, declared `sameAs`) are enough to rise out of the noise and start accumulating authority in a single place. Which is, in the end, the condition for showing up in AI answers when someone searches for what you do.