Gemini doesn't choose who to cite by reading websites at the moment of the question: it draws on a map of brands and companies that Google has already classified and recognized. In my test of 47 queries, the brands it cited were, more than 80% of the time, entities Google already holds in its Knowledge Graph. If you're not on that map, you're not considered, period. Becoming an entity recognized by Google is precise work that pays off across every AI channel simultaneously.
Gemini answers with high confidence when the brand entity is in Google’s Knowledge Graph. Across 47 tested queries, the correlation is above 80%.
The point of this article, up front: when Gemini or Google’s AI Overviews have to extract a snippet to answer a user, they don’t pick at random from the web. They start from what Google already knows as a structured entity (a brand, a place, a product already recognized in the Knowledge Graph) and from there they select the pieces of text to show in the answer.
If your brand isn’t inside that graph, the AI doesn’t ignore you out of spite. It simply doesn’t see you as an entity on which to build an answer. And this explains why two producers with identical products, in the same district, receive completely different visibility in AI answers.
The mechanism: Google selects chunks starting from the graph
In the world of research on LLM grounding, the Knowledge Graph is described as the infrastructure that holds verified facts together and makes them queryable by a language model.
Knowledge Graphs (KGs): “A KG is a heterogeneous directed graph that contains factual knowledge to model structured information.”
Translated: a knowledge graph is a network of nodes (entities) connected by typed relationships, used to store facts in structured form. It’s not a database of texts, it’s a database of things that exist and of how they relate to one another.
The operational consequence is direct. When a user asks Gemini “best producers of bergamot liqueur from Reggio Calabria”, the model doesn’t just run a text search. First it queries the graph to figure out which “bergamot producer” entities exist as nodes, then it pulls text snippets from the sites associated with those nodes. If your site says wonderful things but isn’t anchored to any entity in the graph, you’re off the shortlist before snippet selection even starts.
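To make the two stages concrete, here is a toy sketch in Python. It is not Google's pipeline, which isn't public: the triples, the `Page` class, and every producer name are invented for illustration. It only shows the order of operations described above: shortlist entities from the graph first, then keep only the text anchored to those entities.

```python
from dataclasses import dataclass
from typing import Optional

# Toy knowledge graph as (subject, relation, object) triples.
# Every entity and relation name here is hypothetical.
TRIPLES = [
    ("Producer A", "instance_of", "bergamot producer"),
    ("Producer A", "located_in", "Reggio Calabria"),
    ("Producer B", "instance_of", "bergamot producer"),
    ("Producer B", "located_in", "Reggio Calabria"),
    ("Producer C", "instance_of", "olive oil producer"),
    ("Producer C", "located_in", "Reggio Calabria"),
]


@dataclass
class Page:
    text: str
    entity: Optional[str] = None  # graph node the page is anchored to, if any


PAGES = [
    Page("Producer A makes bergamot liqueur.", "Producer A"),
    Page("Producer B sells bergamot essential oil.", "Producer B"),
    Page("Producer C presses olive oil.", "Producer C"),
    Page("A beautiful bergamot site anchored to no entity."),
]


def entities_matching(triples, **constraints):
    """Stage 1: return the entities that satisfy every relation=value pair."""
    facts = {}
    for subject, relation, obj in triples:
        facts.setdefault(subject, set()).add((relation, obj))
    wanted = set(constraints.items())
    return sorted(e for e, known in facts.items() if wanted <= known)


def shortlist_snippets(triples, pages, **constraints):
    """Stage 2: keep only text from pages anchored to shortlisted entities."""
    shortlist = set(entities_matching(triples, **constraints))
    return [p.text for p in pages if p.entity in shortlist]
```

Calling `shortlist_snippets(TRIPLES, PAGES, instance_of="bergamot producer")` returns only the texts of Producer A and Producer B. The unanchored page never reaches the snippet stage, however good its copy is.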
Why the graph comes before the snippet
In this series I’ve already explained how an entry in the Google Knowledge Graph works and how entity recognition happens. Here I add one piece: when the AI builds an answer, it reasons step by step over the graph rather than simply selecting paragraphs.
Stepwise reasoning over KGs offers a natural mechanism to track, guide, and interpret the reasoning process.
In short: because the reasoning proceeds one step at a time, each step can be tracked, guided, and interpreted. The AI doesn’t improvise: it walks the graph from node to node, and at each step it decides which entity to carry along to build the answer.
For your business this means one simple thing: if your brand is a node in the graph, you get “carried along” at every step the AI takes through, say, the bergamot district. If you’re not, at best you pick up the crumbs when the AI goes hunting for text sources to back up its answer, but you’re never at the core of that answer.
The test you can run on your brand in 15 minutes
Before I bring you the test I ran on bergamot producers, here’s how to replicate it on your brand — whatever your sector — in 15 minutes.
Step 1: check whether you’re an entity in the graph. Open Wikidata and search for the exact name of your brand. If you show up with a Q-number page (e.g. Q123456) and you have at least 4-5 statements (location, sector, founder, website), you’re in. If you don’t show up or you only have the bare name, you’re out.
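If you'd rather script Step 1 than click through the site, a minimal sketch follows. It uses the public Wikidata API actions `wbsearchentities` and `wbgetentities`. The helper names, the `User-Agent` string, and the threshold of 4 statements (the low end of the "4-5" above) are my assumptions, not an official check.

```python
import json
import urllib.parse
import urllib.request

API = "https://www.wikidata.org/w/api.php"


def search_url(name, language="en"):
    """URL that searches Wikidata entities by label (wbsearchentities)."""
    params = {"action": "wbsearchentities", "search": name,
              "language": language, "format": "json"}
    return API + "?" + urllib.parse.urlencode(params)


def claims_url(qid):
    """URL that fetches the statements of one entity (wbgetentities)."""
    params = {"action": "wbgetentities", "ids": qid,
              "props": "claims", "format": "json"}
    return API + "?" + urllib.parse.urlencode(params)


def fetch_json(url):
    """Download and parse a JSON response. Needs network access."""
    # The User-Agent value is a hypothetical placeholder.
    request = urllib.request.Request(
        url, headers={"User-Agent": "entity-check-sketch/0.1"})
    with urllib.request.urlopen(request, timeout=10) as response:
        return json.load(response)


def count_statements(payload, qid):
    """Count all statements on the entity in a wbgetentities payload."""
    claims = payload.get("entities", {}).get(qid, {}).get("claims", {})
    return sum(len(statements) for statements in claims.values())


def looks_like_an_entity(payload, qid, minimum=4):
    """Apply the article's rule of thumb: at least `minimum` statements."""
    return count_statements(payload, qid) >= minimum
```

In practice you would call `fetch_json(search_url("Your Brand"))`, take the `id` of the matching result, then pass `fetch_json(claims_url(qid))` to `looks_like_an_entity`. A stub entry with only a label has no statements and fails the check.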
Step 2: check the schema markup on the homepage. Go to Google’s Rich Results Test, paste your homepage URL, click “Test URL”. Look in the response for the presence of “Organization” or “LocalBusiness”. If even that isn’t there, the graph has no structured handholds to connect you.
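The same check can be approximated offline on the homepage HTML. This is a rough sketch using only Python's standard library, not a substitute for the Rich Results Test: it extracts the JSON-LD blocks and looks for an exact `@type` of `Organization` or `LocalBusiness`. More specific schema.org subtypes are not matched, and the class and function names are my own.

```python
import json
from html.parser import HTMLParser


class JsonLdExtractor(HTMLParser):
    """Collect the parsed contents of <script type="application/ld+json">."""

    def __init__(self):
        super().__init__()
        self._inside = False
        self._buffer = []
        self.blocks = []

    def handle_starttag(self, tag, attrs):
        script_type = (dict(attrs).get("type") or "").lower()
        if tag == "script" and script_type == "application/ld+json":
            self._inside = True
            self._buffer = []

    def handle_data(self, data):
        if self._inside:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._inside:
            self._inside = False
            try:
                self.blocks.append(json.loads("".join(self._buffer)))
            except json.JSONDecodeError:
                pass  # malformed JSON-LD is simply skipped


def schema_types(html):
    """Return every @type found in the page's JSON-LD, nested ones included."""
    parser = JsonLdExtractor()
    parser.feed(html)
    found, stack = set(), list(parser.blocks)
    while stack:
        node = stack.pop()
        if isinstance(node, list):
            stack.extend(node)
        elif isinstance(node, dict):
            declared = node.get("@type")
            if isinstance(declared, str):
                found.add(declared)
            elif isinstance(declared, list):
                found.update(t for t in declared if isinstance(t, str))
            stack.extend(node.values())
    return found


def has_entity_markup(html):
    """True if the page declares Organization or LocalBusiness."""
    return bool(schema_types(html) & {"Organization", "LocalBusiness"})
```

Feed it the saved source of your homepage. If `has_entity_markup` returns `False`, you are in the "beautiful website, zero schema" situation described later in this article.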
Step 3: the direct test on Gemini. Open Gemini, or use Google Search where AI Overviews appear, and run 5 queries about your sector written the way a customer would write them (“best X producers in area Y”, “who makes artisanal Z”, “differences between producer A and B”). Count how many answers cite you by name. Rough thresholds: 0 citations out of 5 queries = you’re outside the operational graph. 1-2 = you’re getting in. 3+ = you’re in.
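If you paste the five answers into a script, the counting and the thresholds reduce to a few lines. This is a minimal sketch: the function names are mine, and matching by case-insensitive substring is a simplifying assumption that will miss misspellings and abbreviations of the brand.

```python
def count_citations(answers, brand_names):
    """Count the answers that mention at least one of the brand's names."""
    names = [name.casefold() for name in brand_names]
    return sum(any(name in answer.casefold() for name in names)
               for answer in answers)


def classify(citations):
    """Map citations out of 5 queries to the three bands used in Step 3."""
    if citations == 0:
        return "outside"
    if citations <= 2:
        return "getting in"
    return "in"
```

Pass every variant of your name in `brand_names`. If you need more than one variant to get a match, that is already a signal that your naming isn't consistent.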
This check is entry level — the real analysis requires professional tools and continuous monitoring — but in 15 minutes it tells you which side of the wall you’re on.
The test I ran: 15 bergamot producers in Reggio Calabria
Bergamot from Reggio Calabria is an interesting case because the district is small (essentially the Ionian strip from Villa San Giovanni to Gioiosa Jonica), the producers are identifiable, and AI demand is fairly mature: Italian and foreign customers ask Gemini “where to buy essential bergamot”, “artisanal bergamot liqueur Calabria”, “bergamot producers in Reggio”.
I took 15 brands of bergamot producers and derived liqueurs — some with forty years of history, some new, some consortia, some wineries/distilleries. For each I verified two things: presence as an entity in the Knowledge Graph (a Wikidata page with at least 4 statements + presence in Google Search’s knowledge panel when searching the brand name) and Organization/LocalBusiness schema on the homepage. Then I ran 47 queries on Gemini — variants of “who makes bergamot in Reggio”, “artisanal bergamot liqueur”, “bergamot essential oil Calabria best producer” — and for each query I counted which of the 15 brands were cited by name.
The result: of the 8 producers present in the Knowledge Graph, 7 (87.5%) appeared in at least 40% of the queries. Of the 7 producers outside the graph, only 1 appeared at all, and only sporadically (in 3 queries out of 47). The correlation between “KG presence” and “Gemini citation” exceeds 80%.
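For anyone who wants to check the arithmetic, here is one way to read those counts as a 2x2 table. Two assumptions on my side: a producer counts as "cited" only when it reaches the 40% bar, so the sporadic outsider (3 of 47 queries, about 6%) counts as not cited, and the phi coefficient is only one possible way to put a number on the association.

```python
from math import sqrt

# 2x2 table built from the counts above (see the assumptions in the text).
IN_KG_CITED, IN_KG_NOT_CITED = 7, 1
OUT_CITED, OUT_NOT_CITED = 0, 7


def agreement(a, b, c, d):
    """Share of producers where KG presence and citation point the same way."""
    return (a + d) / (a + b + c + d)


def phi(a, b, c, d):
    """Phi coefficient: the Pearson correlation of two binary variables."""
    denominator = sqrt((a + b) * (c + d) * (a + c) * (b + d))
    return (a * d - b * c) / denominator if denominator else 0.0


table = (IN_KG_CITED, IN_KG_NOT_CITED, OUT_CITED, OUT_NOT_CITED)
print(f"agreement: {agreement(*table):.3f}")  # 14 of 15 producers
print(f"phi: {phi(*table):.3f}")              # 49 / 56
```

Under this reading both figures sit above 80%: 14 of 15 producers line up with their graph status, and phi comes out at 0.875. With 15 producers the figure remains indicative, not a statistical result.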
Stated limits: indicative test, not a study. The sample of 15 is small, the queries are mine (not a statistical panel), and Gemini changes its answer from one session to the next. But the pattern is so clear that it explains well why two producers with similar products receive opposite AI visibility. Those who have Wikidata + schema + journalistic mentions are an entity. Those who only have a pretty website are loose text.
The mistakes I see most often
When I work with artisanal food companies, I see four patterns recurring that keep even brands with 30 years of history out of the graph.
Beautiful website, zero schema. The homepage has photos of bergamot groves at sunset, the grandfather’s story, the values manifesto. In the code there isn’t a single line of JSON-LD telling Google “I’m a farm, I’m based here, I produce this”. The Rich Results Test returns an error or an empty page.
Nonexistent or stub Wikidata. Some producers have a Wikipedia page, but no one has ever created or completed the Wikidata entry. Result: the brand is in Wikipedia’s text, but it’s not there as a graph node with structured properties.
Inconsistent brand name. The company is called “Antica Distilleria Rossi” on the website, “Rossi 1952” on Instagram, “Distilleria F.lli Rossi srl” on the business registry filing and “Rossi Bergamotto” on Google Maps. To the AI these are four different entities, all weak.
Mentions only on its own website. The brand is cited only on pages it controls (website, social, e-shop). Zero implicit mentions in Gambero Rosso, Slow Food, regional guides, or local newspapers. Without those mentions there is no outside corroboration, and the graph has no reason to give the brand any weight.
What to do concretely
If you find yourself in at least two of the four mistakes above, the work to do follows a precise order. Don’t start from content — start from the anchoring.
- Reconcile the brand name. A single official name everywhere (website, social, Google Business Profile, Wikidata, business registry filing if possible). The AI needs a single canonical string.
- Organization schema on the homepage + LocalBusiness on the contact page. With sameAs pointing to Wikidata, Google Business Profile, LinkedIn, official Instagram.
- Create or complete the Wikidata entry. Minimum: instance of (e.g. farm), location, founding year, main products, official website, references to 2-3 journalistic sources.
- Work on implicit mentions. Invite sector journalists (food, food-and-wine) to try the product. A mention on an authoritative outlet is worth more than 10 boosted posts.
- Compare with the 3-5 competitors Gemini cites when you search your sector. Look at what they have that you don’t: Wikidata? Schema? How many outlets name them? That’s the gap you can close.
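For the schema step in the list above, this is roughly what the homepage markup could look like, generated here with Python so the output is always valid JSON. It is a minimal sketch under stated assumptions: the brand name is the example used earlier in this article, every URL and the Wikidata ID are placeholders, and a real implementation would add address, logo, and contact details.

```python
import json


def organization_jsonld(name, url, same_as):
    """Build a minimal schema.org Organization block as a JSON-LD string."""
    data = {
        "@context": "https://schema.org",
        "@type": "Organization",
        "name": name,             # the single canonical brand name
        "url": url,
        "sameAs": list(same_as),  # profiles that identify the same entity
    }
    return json.dumps(data, indent=2, ensure_ascii=False)


# All values below are placeholders, not real profiles.
markup = organization_jsonld(
    name="Antica Distilleria Rossi",
    url="https://www.example.com/",
    same_as=[
        "https://www.wikidata.org/wiki/Q000000",
        "https://www.linkedin.com/company/example",
        "https://www.instagram.com/example/",
    ],
)
print(markup)
```

The printed JSON goes inside a `<script type="application/ld+json">` tag on the homepage. The `name` value should be the same canonical string you settled on in the first step of the list.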
The snippet that Gemini shows in the answer is the last link in a chain. If you want to be chosen, you have to exist in the first links: entity in the graph, schema on the site, credible mentions outside your own perimeter. Otherwise you’re asking a model to cite you without having given it a structured reason to do so.
Where does all this lead?
Visibility in AI answers, I’ll say it again, doesn’t come from the perfect paragraph — it comes from being a recognizable node in the graph on which the AI builds the answer. Snippet selection comes after. If you skip the first step, the second doesn’t even kick in.
In the next articles of this series we’ll see how source selection works in Perplexity (a model very different from Gemini, with enormous weight on fresh citations), the behavior of ChatGPT with integrated web search, and the specific strategies for entering the graph when you start from scratch. In the meantime, the point to start from is one: understanding whether today you’re an entity or you’re loose text.