Entities and Knowledge Graph

Wikidata as semantic backbone: the entry that makes your brand exist for AI

The brands that ChatGPT cites with confidence almost all share something: a completed entry on Wikidata, one of the structured knowledge sources that feed Google, Bing and many AI systems. Brands that don’t have one, or have an empty one, are noise the model prefers to ignore rather than risk a mistake. It isn’t a complex technical matter: six properties filled in the right way are enough to create an entry that models treat as a reliable source. It can be done in less than an hour.

The brands that ChatGPT cites with high confidence when answering a vertical question almost all have one thing in common: a Wikidata entry rich in attributes, with the key properties completed: website, headquarters, founding year, industry, people involved. The brands the AI never cites, on the other hand, often don’t have an entry at all. Or they have one, but it’s empty: three lines and no linked properties.

This is no coincidence. Wikidata is one of the structured sources that feed the Knowledge Graphs of Google, Bing and several modern AI systems. If your entry isn’t there, or is skeletal, you’re playing the AI visibility game without having filled in the registry form.

Let me explain what Wikidata really is for an AI model, why it sits upstream of almost everything I’ve described in this series, and how to create an entry that works.

What Wikidata is for an AI model

In research on how language models and knowledge graphs interact, Wikidata holds a specific position. The survey by Cedric Möller et al. (2021) on entity linking over Wikidata describes it as a continuously updated, community-maintained, multilingual knowledge graph. The work by Wu et al. (2023) on the integration between LLMs and knowledge graphs places it among the encyclopedic graphs most used as a source of external knowledge for AI systems.

Be careful, though: Wikidata is not Wikipedia. It isn’t a narrative encyclopedia, it’s a structured database. Every entity (you, your brand, a product, a person) has an identifier code (the Q-number) and a series of properties linked to verifiable values: type of activity, website, headquarters, founder, founding year, industry, awards received, publications.
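
To make the difference from a narrative page concrete, here is a minimal sketch of that structure in Python. It is a simplified picture, not Wikidata’s actual data model: the `Entity` class, the brand name and the `Q_EXAMPLE` identifier are all hypothetical, and real statements also carry qualifiers, ranks and references.

```python
from dataclasses import dataclass, field


@dataclass
class Entity:
    """Simplified view of a Wikidata item: one Q-number, many property -> values."""
    qid: str
    label: str
    claims: dict[str, list[str]] = field(default_factory=dict)

    def add(self, prop: str, value: str) -> None:
        # A property can hold more than one value, so values are kept in a list.
        self.claims.setdefault(prop, []).append(value)


# Hypothetical brand; "Q_EXAMPLE" stands in for a real Q-number.
roaster = Entity(qid="Q_EXAMPLE", label="Example Coffee Roasters")
roaster.add("P856", "https://example.com")  # official website
roaster.add("P571", "2010")                 # founding date

print(roaster.qid, sorted(roaster.claims))
```

The point of the sketch is only this: an entity is an identifier plus a set of typed, checkable values, not a block of prose.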

In practice, when an AI system has to answer a question about a brand, a company or a professional, one of the routes it can take is linking the name to a known entity in a graph like Wikidata. If the entity isn’t there, the model relies only on what it read, scattered across its training text. Less reliable, less citable, less likely to appear in answers.

Why Wikidata sits upstream of everything else

In the previous articles in this series I explained how AI represents concepts in the vector space of embeddings and how it recognizes an author as an entity through author entity recognition. Wikidata is the layer beneath.

It’s the register where your brand stops being “a string of text that shows up here and there” and becomes an entity with a stable identity. With a code, with verifiable properties, with external links to authoritative sites, with multilingual translations.

The reason why all the work you do downstream (schema markup, E-E-A-T, well-structured content) pays off more when you have a completed Wikidata entry is simple: you give the AI system a disambiguated anchor point. “This brand here” instead of “maybe this brand, maybe another one with a similar name”.
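
One common way to connect the two layers is to point the site’s schema markup at the Wikidata item through the schema.org `sameAs` property. The sketch below builds such a JSON-LD block in Python. The brand name, the URL and the `Q_EXAMPLE` identifier are placeholders, and the choice of `Organization` as the type is an assumption: a more specific schema.org type may suit your business better.

```python
import json


def organization_jsonld(name: str, url: str, wikidata_qid: str) -> dict:
    """Build a schema.org Organization block that points at a Wikidata item."""
    return {
        "@context": "https://schema.org",
        "@type": "Organization",
        "name": name,
        "url": url,
        # sameAs tells a parser that this page and the Wikidata item
        # describe the same entity.
        "sameAs": [f"https://www.wikidata.org/wiki/{wikidata_qid}"],
    }


# Placeholder values: replace them with your real name, site and Q-number.
markup = organization_jsonld(
    "Example Coffee Roasters", "https://example.com", "Q_EXAMPLE"
)
print(json.dumps(markup, indent=2))
```

The printed JSON would go inside a `<script type="application/ld+json">` tag on the site.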

The conclusion for you is straightforward: if your entity is well represented in that kind of graph, you’re usable material for the answer. If it isn’t, you’re noise the model prefers not to cite in order not to get it wrong.

The test you can run in ten minutes

Go to wikidata.org and search for the exact name of your brand. Three possible outcomes:

  • No result: you don’t exist in the graph. Zero semantic anchoring for the AI.
  • Result with a bare entry: you’re there, but you only have a name and one or two properties. Little to cite.
  • Result with a rich entry: you have P31 (instance of), P856 (website), P159 (headquarters), P112 (founder), P571 (founding date), P452 (industry) filled in. You’re citable.

The binary threshold to start from is simple: if you search for your brand and find nothing, or find an entry with fewer than 5 properties filled in, you’re below the minimum threshold. It isn’t the real analysis, which requires professional tools and control of linguistic aliases, but it’s an honest first step.
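
If you prefer to script this first check, the sketch below uses two actions of the public Wikidata API, `wbsearchentities` and `wbgetentities`, through Python’s standard library. The function names, the User-Agent string and the `classify` rules are my own simplification for illustration: anything that isn’t missing or complete on all six properties is labeled “bare”, and the threshold counts distinct properties on the item. The example at the bottom runs offline on made-up data.

```python
import json
import urllib.parse
import urllib.request

API_URL = "https://www.wikidata.org/w/api.php"

# The six properties named in this article.
CORE_PROPERTIES = {
    "P31": "instance of", "P856": "website", "P159": "headquarters",
    "P112": "founder", "P571": "founding date", "P452": "industry",
}


def api_get(params: dict) -> dict:
    """Call the Wikidata API and return the parsed JSON response."""
    query = urllib.parse.urlencode({**params, "format": "json"})
    request = urllib.request.Request(
        f"{API_URL}?{query}",
        headers={"User-Agent": "entity-check-sketch/0.1 (hypothetical)"},
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.load(response)


def search_ids(brand_name: str, language: str = "en") -> list:
    """Return the Q-numbers of items whose label or alias matches the name."""
    data = api_get({"action": "wbsearchentities", "search": brand_name,
                    "language": language, "type": "item"})
    return [hit["id"] for hit in data.get("search", [])]


def fetch_claims(qid: str) -> dict:
    """Return the property -> statements mapping of one item."""
    data = api_get({"action": "wbgetentities", "ids": qid, "props": "claims"})
    return data["entities"][qid].get("claims", {})


def classify(claims) -> dict:
    """Map an item's claims (None when no item was found) to the three outcomes."""
    if claims is None:
        return {"outcome": "no result", "missing": sorted(CORE_PROPERTIES),
                "below_threshold": True}
    missing = sorted(p for p in CORE_PROPERTIES if p not in claims)
    return {
        "outcome": "rich" if not missing else "bare",
        "missing": missing,
        "below_threshold": len(claims) < 5,  # fewer than 5 properties filled in
    }


# Offline example with made-up data: an item that only has three properties.
sample_claims = {"P31": [{}], "P856": [{}], "P571": [{}]}
report = classify(sample_claims)
print(report)
```

With network access, the live version would be `ids = search_ids("your brand")` followed by `classify(fetch_claims(ids[0]) if ids else None)`. Check by hand that the first hit really is your brand and not a homonym.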

Then run the counter-check: open ChatGPT or Perplexity and ask “what do you know about [your brand name]?”. If the answer is generic, confused or makes things up, that’s consistent with a missing or skeletal entry. If it accurately names founding year, headquarters and industry, there’s a good anchor upstream.

Pro tip

Fill in at least these six properties: P31 (specific type, not generic), P856 (official website), P159 (headquarters), P112 (founder), P571 (founding date), P452 (vertical industry).

A case I followed: a coffee roaster in Parma

An artisan coffee roaster in Parma who works with me was in a situation typical of the specialty food scene in Emilia: a well-made site, recognized product quality, presence in coffee industry magazines, but zero visibility in AI answers to queries like “artisan coffee roasters Emilia-Romagna” or “specialty coffee micro-roasters Italy”. Perplexity always cited the same three or four names, and he never showed up.

Check on Wikidata: entry missing. No Q-number. On Google’s Knowledge Graph, same thing, no panel.

The intervention was precise: creating the Wikidata entry with P31 (enterprise), P856 (website), P159 (headquarters in Parma), P112 (founder), P571 (founding year), P452 (industry: coffee roasting), plus external links to reviews from the specialized press and to articles in food magazines. No magic, just careful compilation and verifiable external references.

After about five months, on a sample of 15 queries we test monthly on ChatGPT and Perplexity, the brand went from zero citations to appearing in 6 answers out of 15. An indicative test, not a study: small sample, no control group, and in the meantime we also worked on other fronts (content structure, schema on the site). But the pattern is consistent with what I see with other food specialty clients: the Wikidata entry isn’t enough on its own, it’s not a magic switch, but it moves the needle noticeably when downstream you have a well-built site.
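
The arithmetic behind that kind of monthly test is simple enough to keep in a small script. This is a minimal sketch, not the tooling we actually use: it assumes you have pasted the AI answers into a dictionary by hand, and it counts a citation whenever the brand name appears in the answer text, which is a crude match that ignores aliases and misspellings. The brand name and queries are invented.

```python
def is_cited(brand: str, answer: str) -> bool:
    """Crude check: does the brand name appear anywhere in the answer text?"""
    return brand.casefold() in answer.casefold()


def citation_rate(brand: str, answers: dict) -> float:
    """Share of queries whose answer mentions the brand (0.0 to 1.0)."""
    if not answers:
        return 0.0
    cited = sum(is_cited(brand, text) for text in answers.values())
    return cited / len(answers)


# Invented sample: 15 queries, 6 of which mention the hypothetical brand.
answers = {
    f"query {i}": (
        "Options include Example Roasters and others."
        if i < 6
        else "Options include other names."
    )
    for i in range(15)
}

rate = citation_rate("Example Roasters", answers)
print(f"{rate:.0%} of queries cite the brand")
```

Run it on the same query list each month and compare the rate with the baseline. With a sample this small, treat the number as an indication, not a measurement.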

The mistakes I see most often

Among small food producers in gastronomic Emilia, and not only there, these are the recurring patterns:

  • Entry created and abandoned. An entry with three properties and then no further updates. Wikidata is a living graph: if it doesn’t grow, it ages badly.
  • Wrong or generic P31. Putting “business” instead of a specific type such as “coffee roastery”, “winery” or “artisan pasta factory”. You lose the vertical classification, which is exactly what triggers the citation on industry queries.
  • No authoritative external link. An entry without references to third-party sources (press, trade associations, industry databases) is fragile. It can even be flagged for deletion by the community.
  • Self-referential description. Phrases like “the best artisan coffee roaster in northern Italy” get removed. Wikidata wants neutral facts: “specialty coffee roaster founded in [year] in [city]”.

Compare your entry (or its absence) with the 3-5 competitors the AI cites most often in your industry queries: often the difference is exactly here.
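
That comparison can be sketched as a set difference: which properties do the cited competitors have that your entry lacks, and how many of them have each one? The function name and the sample data below are hypothetical. In practice the property sets would come from each competitor’s Wikidata page.

```python
def property_gaps(own: set, competitors: dict) -> list:
    """List properties competitors have and you lack, most common first."""
    counts = {}
    for props in competitors.values():
        for prop in set(props) - set(own):
            counts[prop] = counts.get(prop, 0) + 1
    # Sort by how many competitors have the property, then by property ID.
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


# Invented data: your entry versus three competitors.
own_properties = {"P31", "P856"}
competitor_properties = {
    "Competitor A": {"P31", "P856", "P159", "P571"},
    "Competitor B": {"P31", "P159", "P452"},
    "Competitor C": {"P856", "P159", "P571"},
}

gaps = property_gaps(own_properties, competitor_properties)
for prop, count in gaps:
    print(f"{prop}: present on {count} of {len(competitor_properties)} competitors")
```

The properties at the top of the list are the first ones to fill in.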

What to do concretely

  1. Search for your brand on wikidata.org. If it doesn’t exist, create the entry.
  2. Fill in at least these six properties: P31 (specific type, not generic), P856 (official website), P159 (headquarters), P112 (founder), P571 (founding date), P452 (vertical industry).
  3. Add external identifiers where you have them, such as trade registry entries or authoritative profiles.
  4. Insert at least 2-3 references to third-party sources (industry press, associations, recognized catalogs).
  5. After 2-3 months, retest the industry queries on ChatGPT and Perplexity and compare with the baseline.
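
Steps 1 to 4 can be done by hand in the Wikidata interface. As an alternative, the community tool QuickStatements accepts a tab-separated batch. The sketch below prepares such a batch under the assumption of its V1 text syntax (`CREATE`, then `LAST` lines, with `S854` attaching a reference URL). Check the tool’s own help page before submitting anything. Every value here is a placeholder: the `Q_...` identifiers are not real items, and the names and URLs are invented.

```python
def as_string(value: str) -> str:
    """Strings and URLs are wrapped in double quotes."""
    return f'"{value}"'


def as_year(year: int) -> str:
    """A date with year precision (the trailing /9)."""
    return f"+{year:04d}-00-00T00:00:00Z/9"


def quickstatements_batch(label, description, statements, reference_url,
                          language="en"):
    """Build a batch that creates one item and adds referenced statements."""
    lines = [
        "CREATE",
        f"LAST\tL{language}\t{as_string(label)}",
        f"LAST\tD{language}\t{as_string(description)}",
    ]
    for prop, value in statements:
        # S854 attaches a reference URL to the statement on the same line.
        lines.append(f"LAST\t{prop}\t{value}\tS854\t{as_string(reference_url)}")
    return "\n".join(lines)


# Placeholder values only: look up the real Q-numbers for type, city,
# founder and industry before using anything like this.
batch = quickstatements_batch(
    label="Example Coffee Roasters",
    description="specialty coffee roaster founded in 2010 in Parma",
    statements=[
        ("P31", "Q_TYPE"),
        ("P856", as_string("https://example.com")),
        ("P159", "Q_CITY"),
        ("P112", "Q_FOUNDER"),
        ("P571", as_year(2010)),
        ("P452", "Q_INDUSTRY"),
    ],
    reference_url="https://example.org/press-article",
)
print(batch)
```

Note the neutral description: a fact with year and city, no superlatives. For a single entry, the manual route is usually just as fast and easier to review.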

The real analysis, with complete alias mapping, monitoring of missing properties compared to competitors and integration with schema markup on the site, requires professional tools and a more expert hand. This is the first step to start from.

The author
Roberto Serra

SEO consultant for over 15 years, founder of the Serra SEO Agency (RAANK). He helps multinationals and SMEs stay visible where search is moving: ChatGPT, Perplexity, Gemini and Google's AI Overviews.