Chapter 4 of 7

Entities and Knowledge Graph

For LLMs, entities exist, not pages. Brands, products and people live — or don't — inside a knowledge graph.

A guide by Roberto Serra


The AI doesn't browse your site the way a customer would: it builds an idea of who you are from what it finds on Wikipedia, in structured profiles, in external sources that are consistent with each other. If those traces aren't there or contradict one another, the model ignores you or confuses you with someone else — even if you have a perfect website. Fixing this presence is less complicated than it seems, and once it's done it becomes the foundation on which everything else works.

There’s a scenario I see repeated every month across a dozen different consulting sessions, always identical. An entrepreneur shows me his phone, opens ChatGPT and types “best [service] in [city]”. Then he looks at the screen with an expression halfway between resignation and disbelief, and says the same thing every time: “See? It cites three competitors. One of them opened six months ago. I’ve been in this market for twelve years. Why them and not me?”

The answer isn’t the one everyone expects. It’s not that your content is worse. It’s not that your site is slower. It’s not even a matter of backlinks or domain authority. It’s something more structural, more silent and, once you understand it, also more manageable: for AI engines you, as a distinct entity, don’t exist. Or you exist halfway, with confused information, without a clear position in the knowledge graph that the AI consults when it has to cite someone.

This is the thread that holds together everything you’ll find in this guide. Visibility in AI answers doesn’t come from web pages. It comes from entities. And a brand that doesn’t exist as a distinct, recognizable entity, connected to a precise semantic territory, simply doesn’t get cited. Or it gets cited badly, with the wrong name, the wrong line of business, confused with a namesake 400 kilometers away.

In my articles on how AI engines think, on authority and credibility, and on content structure, I dismantled the three previous levels of AI visibility: how the engine reasons, how you build trust, how you format content. Now the question shifts again: when the AI decides who to cite in your field, who is it really thinking about? A name? A domain? A concept?

Think of a node. Connected to other nodes. With precise attributes, registered synonyms, a clear category, documented relationships. That node is your entity. And visibility in AI answers runs almost entirely through it.

I’ve written 40 deep-dives to map every piece of this work. On this page I give you the complete map, organized into five sequential blocks that you can follow as a path or consult individually whenever you need.

What an entity is for the AI: the great paradigm shift

For years SEO taught you to think in terms of pages and keywords. You optimized title tags, you worked on backlinks, you built content clusters. That logic worked because Google, up to a certain point, rewarded pages. Then something changed, silently, inside search engines and inside generative models. The protagonist was no longer the page: it was the entity.

An entity, for an AI system, is a real-world object — a person, a company, a product, a place, an event — represented as a node in a graph. That node has a unique identifier, a category, a list of attributes and, above all, a network of relationships with other nodes. Your brand, for an AI engine, isn’t a website. It is (or should be) a node with a name, an industry, a location, products, customers, events it takes part in, and people who represent it.

A small example to pin down the shift. In 2010, if ChatGPT had existed, it would have learned mainly from texts: billions of pages, sentences, paragraphs. Today it still learns from texts, but on top of those texts it builds a structured representation made of nodes and relationships. When someone asks “who is the best tour operator specialized in Japan in Italy”, the model doesn’t scroll through a list of pages: in effect it queries an internal graph, looking for nodes with the category “tour operator”, with the attribute “Japan specialization”, with the relationship “operates in Italy”, and pulls out the names.
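To make the node-and-relationship idea concrete, here is a minimal sketch in Python. It is only an illustration of the data structure described above, not a description of how any real model stores knowledge: the `Entity` class, the `find` helper and all three tour operators are invented for the example.

```python
from dataclasses import dataclass, field

@dataclass
class Entity:
    """One node: identifier, category, attributes, and outgoing relationships."""
    id: str
    name: str
    category: str
    attributes: dict = field(default_factory=dict)
    relations: set = field(default_factory=set)  # (predicate, target) pairs

def find(entities, category, attributes=None, relations=()):
    """Return names of nodes matching the category, every attribute, and every relation."""
    attributes = attributes or {}
    return [
        e.name for e in entities
        if e.category == category
        and all(e.attributes.get(k) == v for k, v in attributes.items())
        and all(r in e.relations for r in relations)
    ]

# Hypothetical nodes, invented for illustration.
graph = [
    Entity("n1", "Sakura Viaggi", "tour operator",
           {"specialization": "Japan"}, {("operates_in", "Italy")}),
    Entity("n2", "Alpi Tours", "tour operator",
           {"specialization": "Alps"}, {("operates_in", "Italy")}),
    Entity("n3", "Tokyo Trips", "tour operator",
           {"specialization": "Japan"}, {("operates_in", "France")}),
]

matches = find(graph, "tour operator",
               {"specialization": "Japan"}, [("operates_in", "Italy")])
print(matches)  # ['Sakura Viaggi']
```

The point of the sketch is the failure mode: a brand with the right category but a missing attribute or a missing relationship simply never appears in `matches`, however good its pages are.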

Pro tip

Don’t work on all four graphs at once. Start with Wikidata: it’s read by almost every model and also serves as the basis for the others. Once the Wikidata entry is solid, the Google Knowledge Graph and the vertical graphs become easier to fix.

How ChatGPT sees you today

The models’ internal graphs form during training from co-occurrence patterns in texts: this means that what you publish today weighs on how you’ll be represented in future versions. It pays to take care early of consistency in name, category and relationships, because those signals settle in and become hard to correct after the fact.

The signals you publish today influence how you'll be cited tomorrow

The knowledge graphs that matter for your visibility are mainly four. The Google Knowledge Graph, fed by the entities recognized by Google and often shown in the panel on the right of search results. Wikidata, the structured encyclopedia connected to Wikipedia, used as a backbone by practically all language models. The internal semantic graphs of AI models, built during training from co-occurrence patterns in texts. And finally the vertical industry graphs — Crunchbase for startups, IMDb for cinema, PubMed for medicine — which feed specific answers in their domains.

The operational question isn’t “how do I optimize my page for ChatGPT”. It’s: does my brand exist in these graphs? With which attributes? Connected to what? And if it doesn’t exist, how do I get in?

This is where the five blocks of work begin.

The complete map: the five blocks of work

1. Getting recognized as an entity

Before thinking about how to appear in public graphs, you need to take a step back. There’s a more basic question: when the model reads a text that talks about you, does it understand that it’s you? Does it understand that your name is an entity and not a generic word? Does it understand which category to put you in? And if a namesake exists, can it tell you apart?

This is the level of recognition. It’s the most invisible level, because it happens inside the models when they process text, but it’s the first link in the whole chain. If it breaks here, everything else is useless.

The technical mechanism is called Named Entity Recognition. When the AI reads “Mondadori has published”, it recognizes “Mondadori” as an entity of type “Organization”. When it reads “the Mondadori of Italian cooking”, probably not, because the context is different and the name isn’t in canonical position. If in the texts that talk about you your name appears ambiguously, written in twenty different ways, without clear context signals, the engine doesn’t recognize you as an entity. I wrote about this in detail in the article on how Named Entity Recognition works and why, without NER, your brand is generic text.

Then there’s the problem of overlapping names. If your name is “Orsini” and in your field there’s already a law firm “Orsini” in Rome with 80 years of history, the AI has a problem: which of the two does the sentence it’s processing refer to? Without disambiguation signals (city, VAT number, website, specialization) your name becomes noise. I dedicated a deep-dive to this mechanism: how without disambiguation your name becomes noise and the AI always cites the oldest one.
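A toy sketch of why context signals matter for namesakes. It assumes a deliberately naive rule (count how many known signals of each candidate appear in the sentence) and two invented candidates that share the name “Orsini”. Real systems use far richer models, so read it as an illustration of the principle only.

```python
def disambiguate(context, candidates):
    """Return the id of the candidate best supported by the context, or None."""
    if not candidates:
        return None
    text = context.lower()
    scored = sorted(
        ((sum(s.lower() in text for s in c["signals"]), c["id"]) for c in candidates),
        reverse=True,
    )
    top_hits = scored[0][0]
    tied = len(scored) > 1 and scored[1][0] == top_hits
    # No matching signal, or two candidates equally supported: still ambiguous.
    if top_hits == 0 or tied:
        return None
    return scored[0][1]

# Hypothetical candidates sharing the surface name "Orsini".
candidates = [
    {"id": "orsini-law-rome", "signals": ["Rome", "law firm"]},
    {"id": "orsini-accounting-bologna", "signals": ["Bologna", "accounting", "startups"]},
]

bare = disambiguate("Orsini advised the company on the deal.", candidates)
rich = disambiguate("Orsini, the Bologna accounting firm, advised the company.", candidates)
print(bare)  # None
print(rich)  # orsini-accounting-bologna
```

The bare mention resolves to nothing. The same mention with a city and a specialization next to it resolves cleanly, which is why those signals belong in every text that talks about you.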

Then there’s the weight your entity has within a single text. Not all names cited in a page are equally important to the model. Some are the main subject, others are background. The AI measures this difference and calls it salience. If your brand appears in an article but in the role of an extra — one name among thirty — it isn’t memorized as relevant to that topic. In the article on how to increase entity salience in the texts that talk about you I explain how to measure and improve this weight.
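As a rough intuition for salience, the sketch below scores each name by how often and how early it appears in a text. The weighting is an assumption made up for this example and is not the formula of any real salience model. The sample text and brand names are invented too.

```python
import re

def toy_salience(text, names):
    """Rough share of mentions per name, with extra weight for early mentions."""
    lowered = text.lower()
    scores = {}
    for name in names:
        positions = [m.start() for m in re.finditer(re.escape(name.lower()), lowered)]
        # A mention at the very start counts double; the weight falls toward 1 at the end.
        scores[name] = sum(2 - pos / max(len(lowered), 1) for pos in positions)
    total = sum(scores.values()) or 1
    return {name: round(score / total, 2) for name, score in scores.items()}

# Invented article: one main subject, two names mentioned in passing.
article = (
    "Caffè Aurora opens a second roastery in Trieste. "
    "Caffè Aurora will roast on site. "
    "Guests at the opening included Bar Uno and Bar Due."
)
scores = toy_salience(article, ["Caffè Aurora", "Bar Uno", "Bar Due"])
print(scores)
```

Even with a heuristic this crude, the brand that opens the article and is named twice takes most of the weight, while the names listed once at the end share what is left.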

Being recognized isn’t enough: you need to be recognized in the right category. A specialty coffee e-commerce store in Trieste doesn’t want to be classified generically as a “shop”: it wants to be in the category “specialized retailer of quality coffee”. If the AI puts you in the wrong category, you drop out of every query that targets your actual category. I talk about it in how Entity Type Classification decides in which industry the AI makes you appear.

And finally the hook to the graph. Recognized, disambiguated, classified: the final step is connecting the node “your brand” to a node that already exists in the public knowledge graphs. This process is called Entity Linking and turns a textual mention into a canonical node. Without this hook, for the AI there are many small “your brands” scattered across texts, but no unified node to connect everything to. The deep-dive on how Entity Linking unifies all your mentions into a single node completes the picture.
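The linking step can be pictured as an alias table: many spellings, one canonical node. The sketch below assumes a hand-maintained table and a placeholder identifier (`Q-PLACEHOLDER` stands in for a real Wikidata QID). The brand names are invented.

```python
def normalize(mention):
    """Lowercase, drop dots, and collapse whitespace so spelling variants compare equal."""
    return " ".join(mention.lower().replace(".", "").split())

# Hypothetical alias table: every known spelling points at one canonical node id.
CANONICAL_ID = "Q-PLACEHOLDER"  # stand-in for a real Wikidata QID
ALIASES = {
    normalize(alias): CANONICAL_ID
    for alias in ["Example Coffee", "Example Coffee S.r.l.", "Example Caffè"]
}

def link_mentions(mentions):
    """Split mentions into those that resolve to the canonical node and those that don't."""
    linked, unlinked = {}, []
    for mention in mentions:
        node = ALIASES.get(normalize(mention))
        if node is None:
            unlinked.append(mention)
        else:
            linked[mention] = node
    return linked, unlinked

linked, unlinked = link_mentions(["EXAMPLE COFFEE", "Example  Coffee srl", "Example Roasters"])
print(linked)
print(unlinked)  # ['Example Roasters']
```

Every spelling left in `unlinked` is a mention that contributes nothing to the unified node, which is the practical reason to keep the name identical across the sources you control.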

At this point, the question to ask yourself is simple: if an AI engine tries to recognize my brand in an article that talks about me, does it have enough signals to understand that I’m an entity, of which category, and which node of the graph to hook me to? If the answer is no, this is where you need to start.

2. Building your voice in the knowledge graph

Once the recognition level is passed, the work becomes much more concrete. Your brand has to become a real node, present, visible inside the knowledge graphs that matter. Here you move from the linguistic to the structured, from the theory of recognition to the practice of insertion.

The most visible knowledge graph in the Western world is the Google Knowledge Graph. It’s the one that feeds the information panel that appears on the right when you search for a brand on Google: name, logo, description, location, links to social profiles. If that panel doesn’t exist for you, or exists with wrong information, any model that builds an answer using Google as a source — and there are many — suffers from it. I wrote in detail about how to get a Google Knowledge Graph entry and what to do when the one that’s there is full of errors.

But if the Google KG is the storefront, the real backbone is Wikidata. It’s an open, structured database, connected to Wikipedia, that practically all the main language models read during training and then use as a reference. Having a well-made Wikidata entry (with QID, typed properties, sitelinks) is the strongest signal you can give to be treated as a first-class entity. In the article on Wikidata as a semantic backbone for your AI visibility I explain the steps to get in and what to fill in before everything else.

On your site, the main tool is the Schema markup of type Organization. It’s the way you declare to the engine, in a language it understands without ambiguity, who you are, what you do, where you are, how you’re connected to the rest of the world. Many sites have a minimal version of it, often generated automatically by SEO plugins, which covers less than ten percent of the useful fields. In how to build a complete Schema Organization that the AI reads as a real entity I show you which properties are really weighed and which are pure noise.

Inside the Schema Organization there’s one property that’s worth, on its own, half the work: `sameAs`. It’s the list of links to your external profiles — LinkedIn, Crunchbase, Wikidata, Wikipedia, industry profiles — declared as “this one here, and that one there, and that other one, are all the same entity”. It’s the glue that holds together your distributed identity. I talk about it in sameAs: the glue that tells the AI it’s always you.
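For reference, this is roughly what an Organization markup with `sameAs` looks like when generated from Python. The property names (`legalName`, `vatID`, `foundingDate`, `address`, `sameAs`) are standard schema.org vocabulary. Every value (company name, URLs, VAT number, phone, the `Q00000000` identifier) is a placeholder to replace with your own data.

```python
import json

# All values are placeholders; only the property names come from schema.org.
organization = {
    "@context": "https://schema.org",
    "@type": "Organization",
    "@id": "https://www.example.com/#organization",
    "name": "Example Coffee",
    "legalName": "Example Coffee S.r.l.",
    "url": "https://www.example.com/",
    "logo": "https://www.example.com/logo.png",
    "description": "Specialized retailer of quality coffee based in Trieste.",
    "foundingDate": "2012",
    "vatID": "IT00000000000",
    "telephone": "+39 040 0000000",
    "address": {
        "@type": "PostalAddress",
        "streetAddress": "Via Mazzini 12",
        "addressLocality": "Trieste",
        "addressCountry": "IT",
    },
    # The same entity, declared on other platforms.
    "sameAs": [
        "https://www.linkedin.com/company/example-coffee",
        "https://www.crunchbase.com/organization/example-coffee",
        "https://www.wikidata.org/wiki/Q00000000",
    ],
}

# Wrap the JSON-LD in the script tag that goes into the page <head>.
snippet = (
    '<script type="application/ld+json">\n'
    + json.dumps(organization, indent=2, ensure_ascii=False)
    + "\n</script>"
)
print(snippet)
```

Generating the block from one dictionary, rather than editing it by hand in several templates, makes it harder for the name, address and profile links to drift apart over time.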

And then there’s the matter of richness. Two brands can both have a Wikidata entry and a Schema Organization on their site, but one has five attributes filled in and the other forty. The second gets extracted ten times more, simply because it has more handholds. The AI rewards informational density. The deep-dive on how Rich Entity Attributes multiply citations shows you where the fast lanes are to add attributes without redoing everything.

At this point, the question to ask yourself is: if an AI engine wanted to describe my brand in thirty seconds, would it have structured data to start from, or would it have to reconstruct everything from the flowing text of my about-us pages? The difference, in terms of citations, is enormous.

3. Your relationships make your authority

An isolated node, however well filled in, doesn’t build authority. The value of an entity inside a knowledge graph depends on the connections it has with other entities. Who you know, who knows you, which events you appeared at, which clients you worked with, which alumni network you belong to: every relationship is an edge in the graph and every edge is a signal the model reads to establish your place in the industry.

It’s the part that, for those coming from classic SEO, seems strangest. There you worked on links between sites. Here you work on relationships between entities. They’re different things. A link from the Sole 24 Ore website to yours is useful for many reasons; but a structured relationship between the entity “[your company name]” and the entity “Sole 24 Ore” — declared and repeated across different sources, in a form the model can parse — is yet another thing, and it weighs inside the graph in a specific way.

The first job is mapping the relationships you have and representing them so that the engine sees them. Partners, main clients, suppliers, certifications, trade associations: each one is a potential edge. The article on Entity-to-Entity Relationship Mapping: how to draw the network the AI will use to cite you guides you step by step in building this map.

Then there’s the competitive cluster. In your field, the AI has already mentally drawn a group of brands it considers peers. Maybe three tour operators specialized in Japan, or five charming boutique hotels on Lake Como, or four accounting firms specialized in startups in Milan. That group exists in the graph — even if no one drew it explicitly — and the work is to figure out whether you’re inside or outside, and if outside, how to get in. In the deep-dive on Competitor Entity Graph: how to discover which cluster the AI has put you in I show how to map your proximity to competitors and use it as leverage.

Events are one of the most underrated sources of relational authority. Taking part as a speaker at an industry event, if documented the right way, creates an edge in the graph between your “person” entity and the “event” entity. Repeated across three, four events, it becomes a signal of competence that the model reads clearly. I talk about it in Event Entity: how speaking engagements become proof of authority for the AI.

There’s a network almost no one leverages: that of alumni. If you studied at a well-known university, or if you worked at a company recognizable in the industry, that connection is an edge the model already knows and that positions you in the graph even before talking about your current project. The article on Alumni Affiliation Network: how your academic and professional network increases AI authority shows you how to declare these relationships in a structured way.
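Event and alumni relationships can be declared in the same structured way. The sketch below builds a Person markup using the schema.org properties `worksFor`, `alumniOf` and `performerIn`. The person, university, company and event are all invented, and whether a given engine weighs these properties is an assumption rather than something this snippet can prove.

```python
import json

# Invented person and relationships; property names are schema.org vocabulary.
person = {
    "@context": "https://schema.org",
    "@type": "Person",
    "name": "Jane Doe",
    "jobTitle": "Founder",
    "worksFor": {
        "@type": "Organization",
        "name": "Example Coffee",
        "url": "https://www.example.com/",
    },
    # Academic and professional background as explicit edges.
    "alumniOf": [
        {"@type": "CollegeOrUniversity", "name": "Example University"},
        {"@type": "Organization", "name": "Example Former Employer"},
    ],
    # Speaking engagements as edges between the person and each event.
    "performerIn": [
        {"@type": "Event", "name": "Example Industry Summit", "startDate": "2024-05-10"},
    ],
}

print(json.dumps(person, indent=2, ensure_ascii=False))
```

Each entry in `alumniOf` and `performerIn` is one edge. Adding a new talk means appending one more Event object instead of rewriting the bio paragraph.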

And then there’s the client portfolio. If your clients are entities recognizable in their fields, making them visible as edges of your graph — with public case studies, structured references, consistent communications — transfers part of their authority onto yours. In the deep-dive on Client Portfolio as an Entity Network I explain how to build this transfer without violating NDAs.

At this point, the question to ask yourself is: what are the five strongest edges of my relationship network, and are they visible as structured data or do they only exist as text scattered around my site? If the answer is “only text”, you’ve got your work cut out for you.

4. Existing in the right territory

The entity and relational dimensions live in the abstract. But most of the brands I have in consulting live in the concrete: they have a location, a neighborhood, local customers, industry regulations. When the AI builds an answer like “best [service] near me” or “accountant specialized in pharmaceutical companies in Bologna”, it’s querying entities that have a precise territorial or vertical dimension.

Here the work changes slightly: it’s not just building the entity, it’s anchoring it to the right territory and vertical.

The first asset is the Google Business Profile. For AI engines that use Google as a source — and that’s almost all of them, in one way or another — the GBP is your brand’s primary local entity. It’s not a “secondary profile” to fill in if there’s time left over: it’s the node many models query before others to build local answers. I dedicated a deep-dive to Google Business Profile as a primary entity for local AI visibility, with the list of fields that weigh the most.

Common mistake

Writing the address in different forms on your site, Google Maps and social media. Variants like Via Mazzini 12, Via G. Mazzini 12 and V.le Mazzini 12A are read as three separate entities and split up your local presence. Align name, address and phone character by character across all platforms.

Alongside the GBP there’s a topic as old as it is still decisive: NAP consistency. Name, address, phone identical everywhere. If on your site you write “Via Mazzini 12”, on Google Maps “Via G. Mazzini 12” and on Facebook “V.le Mazzini 12A”, for the AI you’re talking about three different entities. In the article on NAP Consistency: why small variations in your address split you up as an entity I explain how to do a quick audit and where consistency almost always breaks.
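A quick NAP audit can be as simple as comparing the exact strings. The sketch below uses the three address variants from the example above. The business name, phone number and platform keys are placeholders, and the strict character-by-character comparison is a deliberate choice for the audit, not a claim about how any engine matches addresses.

```python
from collections import defaultdict

FIELDS = ("name", "address", "phone")

def nap_groups(listings):
    """Group platforms by their exact (name, address, phone) triple."""
    groups = defaultdict(list)
    for platform, nap in listings.items():
        groups[tuple(nap)].append(platform)
    return dict(groups)

def nap_diffs(listings, reference="website"):
    """For each platform that differs from the reference, list the fields that differ."""
    ref = listings[reference]
    return {
        platform: [f for f, a, b in zip(FIELDS, ref, nap) if a != b]
        for platform, nap in listings.items()
        if platform != reference and tuple(nap) != tuple(ref)
    }

# Placeholder listings, copied by hand from each platform.
listings = {
    "website": ("Example Coffee", "Via Mazzini 12", "+39 040 0000000"),
    "google_maps": ("Example Coffee", "Via G. Mazzini 12", "+39 040 0000000"),
    "facebook": ("Example Coffee", "V.le Mazzini 12A", "+39 040 0000000"),
}

groups = nap_groups(listings)
diffs = nap_diffs(listings)
print(len(groups))  # 3
print(diffs)        # {'google_maps': ['address'], 'facebook': ['address']}
```

One group means the listings are aligned. Anything more than one is a split to fix, and `diffs` tells you which field to correct on which platform.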

Some industries have extra rules. If you work in healthcare, legal or finance, AI engines apply stricter trust filters, comparable to what Google calls YMYL (“Your Money or Your Life”), before citing you in an answer. Your entity needs to declare certifications, registrations with professional bodies, authorizations, scientific directors. I talk about it in Healthcare, Legal and Finance: the entity compliance the AI demands in order to cite you.

And if you have multiple locations — a franchise, a chain of professional firms, a network of boutique hotels — management gets more complicated. Each location is an entity in its own right, connected to the head office as a parent node. Managing this structure badly means disappearing in half the cities where you’re present. The deep-dive on Franchise and Multi-Location Entity: how to structure the parent entity and the child entities guides you through the correct topology.
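One possible way to express the parent and child topology is a single JSON-LD graph in which every location points back at the head office through the schema.org property `parentOrganization`. The sketch assumes a hypothetical hotel group. The names, URLs, addresses, phone numbers and the `location_node` helper are all invented for illustration.

```python
import json

PARENT_ID = "https://www.example.com/#organization"  # placeholder identifier

def location_node(city, street, phone):
    """Build one child LocalBusiness node that points back at the parent organization."""
    slug = city.lower().replace(" ", "-")
    return {
        "@type": "LocalBusiness",
        "@id": f"https://www.example.com/locations/{slug}/#location",
        "name": f"Example Hotels {city}",
        "parentOrganization": {"@id": PARENT_ID},
        "address": {
            "@type": "PostalAddress",
            "streetAddress": street,
            "addressLocality": city,
            "addressCountry": "IT",
        },
        "telephone": phone,
    }

graph = {
    "@context": "https://schema.org",
    "@graph": [
        {"@type": "Organization", "@id": PARENT_ID, "name": "Example Hotels",
         "url": "https://www.example.com/"},
        location_node("Como", "Via Example 1", "+39 031 0000000"),
        location_node("Bellagio", "Via Example 2", "+39 031 0000001"),
    ],
}

print(json.dumps(graph, indent=2))
```

Each location keeps its own identifier, address and phone, so it can be matched as a local entity, while the shared `parentOrganization` reference ties it to the brand.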

At this point, the question to ask yourself is: if a customer in my field asks the AI for the best supplier in my city, is my brand recognized as a local entity with the right credentials and without inconsistencies across platforms? If you have doubts even just about NAP, that’s the first check to do tomorrow morning.

5. Maintaining your entity over time

The last part of the work is the one no one thinks they have to do, and it’s the reason why so many brands lose visibility after doing everything right for months. The entity isn’t a monument you build and leave there. It’s a living thing that ages, that can be attacked, that changes when you change, that requires periodic maintenance.

Treating it as a one-off project is the surest way to find yourself, a year from now, with outdated information, broken links, stale Wikidata entries and a Google Business Profile with opening hours from three years ago.

Pro tip

Schedule the audit in your calendar on a six-month cadence, not by gut feeling. Check NAP consistency, presence in public graphs, sameAs and Schema Organization: these are the points that degrade first when no one is watching them.

The first tool is the periodic audit. Every six months, a checklist of thirty points — from NAP consistency to presence in public graphs, from sameAs to Schema Organization — to see what has degraded and what needs reinforcing. The article on Periodic Entity Audit: the six-month checklist that keeps your AI presence alive gives you a replicable structure you can apply on your own or with a consultant.

Then there’s the monitoring of AI answers. How are ChatGPT, Perplexity, Gemini and the others actually citing you? With which attributes? With which errors? If you don’t measure, you don’t know whether the work is paying off. In the deep-dive on AI Response Monitoring for Entity: how to verify whether AI engines cite you well I show a replicable testing method with sample queries on the client’s industry.
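A minimal way to start measuring is to save the answers by hand and count who gets named. The sketch below assumes you paste each answer into a list of records. The engine labels, queries, answer texts and brand names are invented, and nothing here calls any AI service.

```python
import re

def citation_report(records, brands):
    """Map each brand to the (engine, query) pairs whose saved answer names it."""
    report = {brand: [] for brand in brands}
    for record in records:
        for brand in brands:
            # Whole-word, case-insensitive match on the brand name.
            if re.search(rf"\b{re.escape(brand)}\b", record["answer"], re.IGNORECASE):
                report[brand].append((record["engine"], record["query"]))
    return report

# Invented records: in practice, paste the real answers you collected.
records = [
    {"engine": "engine_a", "query": "best specialty coffee in Trieste",
     "answer": "Popular options include Rival Roasters and Example Coffee."},
    {"engine": "engine_b", "query": "best specialty coffee in Trieste",
     "answer": "Many people recommend Rival Roasters."},
]

report = citation_report(records, ["Example Coffee", "Rival Roasters"])
for brand, hits in report.items():
    print(brand, len(hits))
```

Running the same queries at regular intervals and keeping the records gives you a baseline, so a change in how often you are cited becomes visible instead of anecdotal.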

There’s a more unpleasant problem that’s rarely discussed: knowledge graph poisoning. Malicious competitors, coordinated fake reviews, hostile edits to Wikipedia or Wikidata pages that concern you. The goal is to contaminate your graph with false information that the models then absorb. In the article on Knowledge Graph Poisoning Prevention: how to defend your entity from contamination I show the warning signs and the official reporting channels.

Even without attacks, there’s a physiological phenomenon: entity decay. If a brand stops feeding its own graph — no new content, no new mentions, no updates — the AI perceives it as “dormant” and tends to cite it less, replacing it with more active competitors. I talk about it in Entity Decay and Refresh Strategy: why your entity loses weight if you stop feeding it.

And finally, the extreme case: the reputational crisis. A brand that has suffered a public crisis — a news event, a class action, a founder involved in something unpleasant — sees its entity shift in the graph in undesirable directions. Bringing it back on track requires methodical reconstruction work. The deep-dive on Entity Recovery after a Reputational Crisis explains the phases of this recovery, with realistic timelines.

At this point, the question to ask yourself is: when was the last time someone really looked at how the AI is describing you? If the answer is “never” or “when we built the site”, you’re already past the decay threshold.

Operational audit: 10 steps to start today

The temptation, faced with such broad work, is to wait until you have everything clear before moving. Wrong. Work on entities is done in short, verifiable, progressive steps. Here are the ten steps to start with this afternoon, all achievable without a consulting budget and with public, free tools alone.

Common mistake

Taking the information panel for granted without verifying it. Many brands haven’t checked it in years and only find out afterward that it shows a closed location or a wrong description. Do the search in incognito mode and read every field of the panel, because it’s one of the sources AI engines use to describe you.

1. Search for your brand on Google from desktop, in incognito mode. Does an information panel appear on the right? If so, is it correct in every detail? If not, you already have your first priority intervention.
2. Open Wikidata.org and search for your brand in the search bar. Does an entry exist? If so, check QID, properties and sitelinks. If not, that’s the second big open construction site.
3. Open Google Search Console and go to the performance report: the queries you appear on today tell you which thematic cluster Google sees you in. If you don’t recognize your positioning, the AI is probably in worse shape.
4. Run your home page through Google’s Rich Results Test. The tool shows you whether you have a valid Schema Organization, how many properties you’ve filled in and which errors are present.
5. Check the sameAs in your Schema Organization. If they’re missing or fewer than five (LinkedIn, Crunchbase, Wikidata, Wikipedia, vertical industry profiles), it’s one of the first things to fix.
6. Open your Google Business Profile and compare each field with the one on your site. Name, address, phone, hours, description, categories: even a comma out of place counts.
7. Try displaCy ENT, paste a public article that talks about your brand and see whether your name is recognized as an entity and in which category. If it’s labeled as “PERSON” when you’re a company, or ignored entirely, you have a recognition problem upstream.
8. Run three queries on the main AI engines in your typical client’s industry. These are not queries about your brand, but queries the way a prospect would make them: “best [service] in [city]”, “who is the top expert in [sub-topic] in Italy”, “who to ask about [specific problem]”. Note the names cited. Are you in? Are you out?
9. Map the five competitors the AI cites in your place. Open them one by one, look at the Google panel, search for the Wikidata entry, analyze their Schema. Are they doing something you’re not? Almost always yes, and almost always it shows.
10. Write a three-month roadmap with at most five interventions in order of priority. The most common trap in this work is wanting to do everything at once. Five things, done well, in the right three months, move more than twenty things started and left half-finished.
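The Wikidata check in the steps above can also be done through Wikidata's public search API, using the `wbsearchentities` action. The sketch only builds the request URL, so it runs without network access. The brand name is a placeholder and `wikidata_search_url` is a helper name invented here.

```python
from urllib.parse import urlencode

WIKIDATA_API = "https://www.wikidata.org/w/api.php"

def wikidata_search_url(brand, language="en", limit=5):
    """Build a wbsearchentities request URL for the given brand name."""
    params = {
        "action": "wbsearchentities",
        "search": brand,
        "language": language,
        "limit": limit,
        "format": "json",
    }
    return f"{WIKIDATA_API}?{urlencode(params)}"

url = wikidata_search_url("Example Coffee")
print(url)
```

Opening the printed URL in a browser returns a JSON document whose `search` list contains one item per candidate, each with an `id` (the QID), a `label` and usually a `description`. An empty list means no entry was found under that name. If you fetch the URL from a script instead, it is good practice to send a descriptive User-Agent header.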

These ten steps are a serious start, but only a start. Systematic work on the entity, especially in the relationships and maintenance sections, requires professional tools, a measurement methodology across multiple AI engines and editorial oversight that goes beyond the single brand. What you find in this guide is the map. The path, if you really want to take it, is built patiently.

The thread that holds it all together

Visibility in AI answers is a four-story building. On the ground floor are the engines — how they think, how they reason, how they retrieve content. On the first floor is trust — how the models decide who to trust and which signals they rely on. On the second floor is content structure — how to format pages so the AI knows how to read and extract them. And on the third floor is the work on the entity, which is where you are now.

None of the previous floors work well without this one. You can have the clearest content in the world, the most solid reputation, the finest understanding of how engines work: if as an entity you don’t exist, or you exist badly, your citations stay below potential. You can build an impeccable entity, but if the content isn’t extractable, the engines have no material to cite in the paragraphs of their answers. It’s a system, and like all systems it demands consistency between the levels.

The thread is always the same: visibility in AI answers doesn’t come by chance and can’t be bought. It’s built by feeding the right signals, in the right places, with the consistency the engine knows how to recognize. The entity is the level where those signals take a precise form, become queryable, become citable.

If today you hear your name pronounced by an AI engine less than you deserve, it’s not bad luck. It’s a piece of the system that isn’t running. Find it, fix it, move on to the next one. One thing at a time.


Continue with the deep dives

40 deep dives across the 5 sections of the chapter.

4.1 Entity Monitoring & Maintenance (8 deep dives)

Entity Confidence Testing: reading the AI’s language to understand how much it trusts your brand
Your brand exists in four parallel versions (and you don’t know it)
Entity Decay: Why AI Stops Citing You (and How to Get Back Into Answers)
Entity recovery after a reputation crisis: how to clean up your entity in the Knowledge Graph
Periodic Entity Audit: Why Your Data Ages Even When You Don’t Notice
AI Response Monitoring for Entities: how to find out if AI tells your brand’s story with the right data
Knowledge Graph Poisoning Prevention: how to protect your entity from false information that AI takes as true
Training Data Lifecycle: why corrections to your site don’t reach the AI right away

4.2 Entity Recognition (8 deep dives)

Entity disambiguation: when AI confuses your brand with a namesake
Entity salience: why being named once is like never being named at all
Your brand shows up in AI answers, but classified as what?
Entity Linking: why 50 mentions of your brand are worth zero if the AI doesn’t connect them
When the AI stops understanding that “we” means you: the coreference problem
Are your brand in Italian and in English the same entity to AI? Probably not
New brand invisible in AI answers: how to speed up recognition
Named Entity Recognition: how AI decides whether your brand is “someone” or just text

4.3 Entity Relationships (8 deep dives)

AI doesn’t cite you in isolation: it cites you within a network of relationships
Competitor Entity Graph: why AI always cites the same 4-5 brands in your industry
Entity-place association: why Perplexity knows who to recommend in Sardinia (and you maybe don’t)
Industry Vertical Classification: the category that makes you visible (or invisible) in AI answers
Supply Chain Entity Mapping: how partners tell AI who you are
Speaking at events: why every conference is an authority node for AI
Alumni & Affiliation Network: your institutional connections in the AI graph
Client Portfolio as Entity Network: why your clients are nodes that strengthen you

4.4 Knowledge Graph Optimization (8 deep dives)

Entering Google’s Knowledge Graph: why without it you’re just text to Gemini
Wikidata as semantic backbone: the entry that makes your brand exist for AI
Complete Organization schema: the machine-readable ID card of your brand
sameAs: the glue that holds your identity together for AI
Your brand’s tax code in the AI ecosystem
When AI finds three different versions of your company, it stops recommending you
Rich Entity Attributes: why AI cites only “fat” entities in detail
Product Entity vs Brand Entity: why AI can know your name without knowing what you sell

4.5 Vertical & Local Entities (8 deep dives)

Google Business Profile as the primary entity: why AI looks there before your website
NAP Consistency: why AI sends clients to the wrong number
Who is the Cagliari dentist according to ChatGPT? The answer depends on your city’s Knowledge Graph
Why AI Recommends Generalist Blogs Instead of Your Medical Practice (and How to Turn It Around)
Franchises and multi-location: why AI doesn’t add up the authority of your locations
Professional Registry Entity: why the professional register is your proof of existence for AI
Vertical industry directories: why AI pulls its recommendations from there
Product/Service Schema for Transactional Queries

The author

Roberto Serra at the Senate of the Republic, Palazzo Giustiniani, at the conference “The power of artificial intelligence”

Roberto Serra

SEO consultant for over 15 years, founder of the Serra SEO Agency (RAANK). He helps multinationals and SMEs stay visible where search is moving: ChatGPT, Perplexity, Gemini and Google's AI Overviews.
