The brands that ChatGPT cites with confidence almost all share something: a complete entry on Wikidata, the large structured knowledge base that feeds Google, Bing and many AI systems. Brands that don't have one, or have an empty one, are noise the model prefers to ignore rather than risk a mistake. It isn't a complex technical matter: six properties filled in properly are enough for a solid starting entry, and it can be done in less than an hour.
The brands that ChatGPT cites with high confidence when answering a vertical question almost all have one thing in common: a Wikidata entry rich in attributes, with properties such as website, headquarters, founding year, industry and people involved filled in. The brands the AI never cites, on the other hand, often don’t have an entry at all, or they have one that is nearly empty: three lines and no linked properties.
This is no coincidence. Wikidata is one of the structured sources that feed the Knowledge Graphs of Google, Bing and several modern AI systems. If your entry isn’t there, or is skeletal, you’re playing the AI visibility game without having filled in the registry form.
Let me explain what Wikidata really is for an AI model, why it sits upstream of almost everything I’ve described in this series, and how to create an entry that works.
What Wikidata is for an AI model
In research on language models and knowledge graphs, Wikidata holds a specific position. The survey by Cedric Möller et al. (2021) on entity linking over Wikidata describes it as a continuously updated, community-maintained, multilingual knowledge graph. The work by Wu et al. (2023) on integrating LLMs with knowledge graphs places it among the encyclopedic graphs most often used as a source of external knowledge for AI systems.
Be careful, though: Wikidata is not Wikipedia. It isn’t a narrative encyclopedia, it’s a structured database. Every entity (you, your brand, a product, a person) has an identifier (the Q-number) and a series of properties linked to verifiable values: type of activity, website, headquarters, founder, founding year, industry, awards received, publications.
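As a rough mental model (not Wikidata's actual storage format), you can picture an entity as an identifier plus a list of property-value statements, each optionally backed by references. The class names, the QID and the values below are placeholders invented for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Statement:
    """One claim about an entity: a property ID, a value and optional sources."""
    property_id: str  # e.g. "P856" (official website)
    value: str        # a literal, or the QID of another entity
    references: list[str] = field(default_factory=list)  # URLs backing the claim


@dataclass
class Entity:
    """Simplified picture of a Wikidata item."""
    qid: str    # the Q-number
    label: str
    statements: list[Statement] = field(default_factory=list)

    def properties(self) -> set[str]:
        """Return the property IDs that have at least one statement."""
        return {s.property_id for s in self.statements}


# Placeholder item: the QID, label and values are invented, not real data.
brand = Entity(
    qid="Q000000",
    label="Example Roastery",
    statements=[
        Statement("P856", "https://example.com", ["https://example.org/press-article"]),
        Statement("P571", "2010"),
    ],
)
print(sorted(brand.properties()))  # prints ['P571', 'P856']
```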
In practice, when an AI system has to answer a question about a brand, a company or a professional, one of the routes it can take is linking the name to a known entity in a graph like Wikidata. If the entity isn’t there, the model relies only on the scattered mentions it read during training: less reliable, less citable, less likely to appear in answers.
Why Wikidata sits upstream of everything else
In the previous articles in this series I explained how AI represents concepts in the vector space of embeddings and how it recognizes an author as an entity through author entity recognition. Wikidata is the layer beneath.
It’s the register where your brand stops being “a string of text that shows up here and there” and becomes an entity with a stable identity. With a code, with verifiable properties, with external links to authoritative sites, with multilingual translations.
The reason why all the work you do downstream (schema markup, E-E-A-T, well-structured content) pays off more when you have a completed Wikidata entry is simple: you give the AI system a disambiguated anchor point. “This brand here” instead of “maybe this brand, maybe another one with a similar name”.
The conclusion for you is straightforward: if your entity is well represented in that kind of graph, you’re usable material for the answer. If it isn’t, you’re noise the model prefers not to cite in order not to get it wrong.
The test you can run in ten minutes
Go to wikidata.org and search the exact name of your brand. Three possible outcomes:
- No result: you don’t exist in the graph. Zero semantic anchoring for the AI.
- Result with a bare entry: you’re there, but you only have a name and one or two properties. Little to cite.
- Result with a rich entry: you have P31 (instance of), P856 (website), P159 (headquarters), P112 (founder), P571 (founding date), P452 (industry) filled in. You’re citable.
The rough threshold to start from is simple: if you search for your brand and find nothing, or find an entry with fewer than 5 properties filled in, you’re below the minimum. This isn’t the real analysis, which requires professional tools and a check of aliases across languages, but it’s an honest first step.
Then run the counter-check: open ChatGPT or Perplexity and ask “what do you know about [your brand name]?”. If the answer is generic, confused or invented, that’s consistent with a missing or weak entry. If it accurately names founding year, headquarters and industry, there’s probably a good anchor upstream.
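If you already know the Q-number of an entry, the same check can be scripted. What follows is a minimal sketch, assuming the public `wbgetentities` action of the Wikidata API and its usual JSON response shape. The function names, the `MIN_PROPERTIES` constant and the User-Agent string are my own placeholders.

```python
import json
import urllib.parse
import urllib.request

# The six properties discussed in this article.
CORE_PROPERTIES = {
    "P31": "instance of",
    "P856": "official website",
    "P159": "headquarters location",
    "P112": "founded by",
    "P571": "inception",
    "P452": "industry",
}
MIN_PROPERTIES = 5  # the rough threshold used above, not an official rule


def fetch_claims(qid: str) -> dict:
    """Download the claims of one item through the wbgetentities API action."""
    query = urllib.parse.urlencode(
        {"action": "wbgetentities", "ids": qid, "props": "claims", "format": "json"}
    )
    request = urllib.request.Request(
        f"https://www.wikidata.org/w/api.php?{query}",
        # Placeholder User-Agent: replace it with your own contact details.
        headers={"User-Agent": "wikidata-entry-check/0.1 (you@example.com)"},
    )
    with urllib.request.urlopen(request, timeout=30) as response:
        data = json.load(response)
    # Take the first returned entity (the key can differ from qid for redirects).
    entity = next(iter(data["entities"].values()))
    return entity.get("claims", {})


def audit_claims(claims: dict) -> dict:
    """Compare an item's claims with the core list and the minimum threshold."""
    missing = [pid for pid in CORE_PROPERTIES if not claims.get(pid)]
    return {
        "property_count": len(claims),
        "missing_core": missing,
        "below_threshold": len(claims) < MIN_PROPERTIES,
    }


# Offline example with a hand-written claims dict (statement bodies elided).
sample = {"P31": [{}], "P856": [{}]}
print(audit_claims(sample))
# Live check, requires network access: audit_claims(fetch_claims("Q42"))
```

The audit is kept separate from the download so that you can run it on saved JSON as well. It counts properties only: it says nothing about whether the values are specific or well referenced.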
A case I followed: a coffee roaster in Parma
An artisan coffee roaster in Parma who works with me was in a situation typical of Emilia’s specialty food scene: a well-made site, recognized product quality, coverage in coffee industry magazines, but zero visibility in AI answers to queries like “artisan coffee roasters Emilia-Romagna” or “specialty coffee micro-roasters Italy”. Perplexity always cited the same three or four names; he never showed up.
Check on Wikidata: entry missing. No Q-number. On Google’s Knowledge Graph, same thing, no panel.
The intervention was precise: creating the Wikidata entry with P31 (enterprise), P856 (website), P159 (headquarters in Parma), P112 (founder), P571 (founding year), P452 (industry: coffee roasting), plus external links to reviews from the specialized press and to articles in food magazines. No magic, just careful compilation and verifiable external references.
After about five months, on a sample of 15 queries we test monthly on ChatGPT and Perplexity, the brand went from zero citations to appearing in 6 answers out of 15. An indicative test, not a study: small sample, no control group, and in the meantime we also worked on other fronts (content structure, schema on the site). But the pattern is consistent with what I see with other food specialty clients: the Wikidata entry isn’t enough on its own, it’s not a magic switch, but it moves the needle noticeably when downstream you have a well-built site.
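The arithmetic behind a test like this is trivial, but keeping it in a small script makes the monthly comparison repeatable. A minimal sketch, assuming you record by hand whether each answer cited the brand. The query names are placeholders, and which six queries produced a citation is invented here purely to reproduce the 6 out of 15 figure.

```python
def citation_rate(results: dict[str, bool]) -> float:
    """Share of test queries whose answer cited the brand."""
    if not results:
        raise ValueError("no queries recorded")
    return sum(results.values()) / len(results)


# Placeholders for the 15 queries tested each month.
queries = [f"query {i}" for i in range(1, 16)]

# Baseline: no citations. Follow-up: 6 of 15 (which six is arbitrary here).
baseline = {q: False for q in queries}
followup = {q: i < 6 for i, q in enumerate(queries)}

print(f"{citation_rate(baseline):.0%} -> {citation_rate(followup):.0%}")
# prints: 0% -> 40%
```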
The mistakes I see most often
Among small food producers in Emilia, and not only there, these are the recurring patterns:
- Entry created and abandoned. An entry with three properties and then no further updates. Wikidata is a living graph: if it doesn’t grow, it ages badly.
- Wrong or generic P31. Putting “business” instead of “coffee roaster”, “winery” or “artisan pasta factory”. You lose the vertical classification, which is exactly what triggers the citation on industry queries.
- No authoritative external link. An entry without references to third-party sources (press, trade associations, industry databases) is fragile. It can even be flagged for deletion by the community.
- Self-referential description. Phrases like “the best artisan coffee roaster in northern Italy” get removed. Wikidata wants neutral facts: “specialty coffee roaster founded in [year] in [city]”.
Compare your entry (or its absence) with the 3-5 competitors the AI cites most often in your industry queries: often the difference is exactly here.
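One way to make this comparison systematic is to list the properties that competitors have and you lack. This is a minimal sketch: the function name, the competitor names and the property sets are invented, and in practice you would read each set from the corresponding Wikidata entry.

```python
def property_gaps(own: set[str], competitors: dict[str, set[str]]) -> dict[str, list[str]]:
    """For each property you lack, list the competitors that have it."""
    gaps: dict[str, list[str]] = {}
    for name, props in competitors.items():
        for pid in sorted(props - own):
            gaps.setdefault(pid, []).append(name)
    # Properties held by the most competitors come first, ties broken by ID.
    return dict(sorted(gaps.items(), key=lambda item: (-len(item[1]), item[0])))


# Invented example data, for illustration only.
own_properties = {"P31", "P856"}
competitor_properties = {
    "Competitor A": {"P31", "P856", "P159", "P571"},
    "Competitor B": {"P31", "P159", "P452"},
}

for pid, names in property_gaps(own_properties, competitor_properties).items():
    print(pid, "missing, present for:", ", ".join(names))
```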
What to do concretely
- Search for your brand on wikidata.org. If it doesn’t exist, create the entry.
- Fill in at least these six properties: P31 (specific type, not generic), P856 (official website), P159 (headquarters), P112 (founder), P571 (founding date), P452 (vertical industry).
- Add external identifiers where you have them, such as trade registry entries and authoritative profiles.
- Insert at least 2-3 references to third-party sources (industry press, associations, recognized catalogs).
- After 2-3 months, retest the industry queries on ChatGPT and Perplexity and compare with the baseline.
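If you prefer to prepare the statements in one batch instead of clicking through the web form, the six properties can be drafted as QuickStatements commands. This is a sketch, assuming the tab-separated V1 command syntax as I understand it (CREATE, LAST, S854 for a reference URL, /9 for year precision): check it against the QuickStatements help page before running anything. The function name and every QID and URL in the example call are placeholders to replace with real, verified values.

```python
def quickstatements_for_new_item(
    label: str,
    description: str,
    instance_of_qid: str,
    website: str,
    headquarters_qid: str,
    founder_qid: str,
    founding_year: int,
    industry_qid: str,
    reference_url: str,
) -> str:
    """Build tab-separated QuickStatements V1 commands that create one item."""
    source = ["S854", f'"{reference_url}"']  # S854 = reference URL for a statement
    rows = [
        ["CREATE"],
        ["LAST", "Len", f'"{label}"'],        # English label
        ["LAST", "Den", f'"{description}"'],  # English description, neutral wording
        ["LAST", "P31", instance_of_qid, *source],
        ["LAST", "P856", f'"{website}"'],
        ["LAST", "P159", headquarters_qid, *source],
        ["LAST", "P112", founder_qid, *source],
        ["LAST", "P571", f"+{founding_year:04d}-01-01T00:00:00Z/9", *source],
        ["LAST", "P452", industry_qid, *source],
    ]
    return "\n".join("\t".join(row) for row in rows)


# Placeholder values only: none of these QIDs or URLs refer to real items.
batch = quickstatements_for_new_item(
    label="Example Roastery",
    description="specialty coffee roaster founded in 2010 in Parma",
    instance_of_qid="Q0000001",
    website="https://example.com",
    headquarters_qid="Q0000002",
    founder_qid="Q0000003",
    founding_year=2010,
    industry_qid="Q0000004",
    reference_url="https://example.org/press-article",
)
print(batch)
```

Note that the founder and the headquarters city must themselves exist as items, so look up their Q-numbers first. The generated text is only a draft to review by hand before it goes anywhere near Wikidata.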
The real analysis, with complete alias mapping, monitoring of missing properties compared to competitors and integration with schema markup on the site, requires professional tools and a more expert hand. This is the first step to start from.