There is a free Google tool that generates a detailed report on who gets cited in your industry and from which sources, and it's the same engine that large companies use internally for strategic research. Run it on your market and you see in plain sight why your competitors show up in AI answers and you don't. It's not a theoretical analysis: it's the exact map of where to move your resources to stop being invisible.
Run Gemini Deep Research on your niche industry. The report it generates is a gift: it tells you exactly who gets cited and from where. Read it to understand where you are and where you aren’t.
I’m telling you this right up front because it’s the most underrated part of this whole job. Most business owners try ChatGPT, see that their company doesn’t get cited, and get discouraged. Wrong. ChatGPT is a black box. Gemini Deep Research, on the other hand, hands you a 15-page PDF with the exact list of the sources it consulted, ordered, with URLs. It’s reverse engineering served on a plate.
In this series I’m walking you through how the different AI platforms work. Today I’ll explain the Google case: what changes when the engine that’s supposed to cite you isn’t generic ChatGPT but an infrastructure like Vertex AI or Enterprise Search, and why Deep Research is the most honest tool you have for understanding your starting position.
What Vertex AI Is When a Company Uses It to Search
Vertex AI is the platform that Google sells to companies for building AI applications. A bank, a retail chain, a large law firm can configure it in two ways: pull only from their internal documents, or also pull from the public web. When they enable the second mode, the system does what Gemini does when it answers you: it searches, retrieves, synthesizes, cites.
Enterprise Search (the “internal search for employees” version) follows the same logic. Important point: the retrieval model that decides which pages get into the context is the same stack that powers Google’s generative answers in consumer products. Same chunking criteria, same relevance parameters, same source-selection logic.
Translation: if your page gets pulled well by Gemini when a user runs a web query, it also gets pulled well by the RAG system that an enterprise company has configured on top of Vertex AI. The distribution channel changes, the selection mechanism doesn’t.
Why Deep Research Is the Only Honest Way to Reverse Engineer
In previous articles I explained how tokenization works and how AI evaluates E-E-A-T to establish who is authoritative. All nice in theory. In practice, when you need to understand where you actually rank in an industry, you need a concrete method.
Deep Research is that method. When you run it on a vertical query, the system performs 30-40 secondary searches, consults dozens of sources, and produces a report in which every statement is linked to the original source. You don’t have to guess what it cites: you’ve got it written down.
It follows that you have two pieces of information in hand: who Google cites on that topic, and what editorial characteristics the chosen sources have. It’s a map.
The Test I Ran on Three Italian Niches
I chose three verticals small and specific enough to be interesting: Italian haute horlogerie watchmaking (craftsmen who make fine watches by hand, not multi-brand boutiques), certified organic farm stays in the Langhe, mural fresco restoration studios.
For each one I ran Gemini Deep Research with similar prompts: “make a report on the main independent Italian watchmaking craftsmen who produce haute horlogerie”, and so on for the other two. I downloaded the reports, opened the sources section, and counted.
Results on this sample (indicative test, not a study): across the three niches the report consulted an average of 32 distinct sources. The sources cited most often (3+ references in the report) totaled 11 across the three verticals. Of these 11, 9 had some characteristics in common that I’ll list for you shortly. The other two were Wikipedia and a trade magazine with a very old domain.
Honest limit: three niches don’t make a statistic. But the pattern is clear enough to be usable as an operational indication.
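If you want to repeat the count without doing it by hand, here is a minimal sketch. It assumes you have copied the cited URLs out of the report's sources section into a list, one entry per reference. The function name `core_sources`, the example URLs and the grouping by domain are my own assumptions for illustration, not something the report gives you.

```python
from collections import Counter
from urllib.parse import urlparse


def core_sources(cited_urls, min_refs=3):
    """Return {domain: count} for domains cited at least min_refs times."""
    domains = []
    for url in cited_urls:
        host = urlparse(url).netloc.lower()
        # Treat www.example.it and example.it as the same source.
        if host.startswith("www."):
            host = host[4:]
        if host:
            domains.append(host)
    counts = Counter(domains)
    return {d: n for d, n in counts.most_common() if n >= min_refs}


# Hypothetical citation list, one entry per reference in the report.
example_urls = [
    "https://www.example-atelier.it/how-a-movement-is-born",
    "https://example-atelier.it/balance-wheel-materials",
    "https://example-atelier.it/perpetual-calendar",
    "https://it.wikipedia.org/wiki/Orologeria",
    "https://example-magazine.it/article-1",
]

print(core_sources(example_urls))  # {'example-atelier.it': 3}
```

The threshold of 3 mirrors the "3+ references" cut I used above. Lower it to 1 to see every distinct source the report consulted.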
The Pattern of the Sources Gemini Chose to Cite
The 9 “core” sources shared these characteristics:
- Long, self-contained pages: each H2 section explained a concept on its own, without referring out to other pages on the site
- Rich metadata: a descriptive title, a written meta description, schema markup of type Article or Organization
- No paywall or soft-gate (no mailing-list popup blocking the read)
- Fast loading: I spot-checked, the main pages stayed under 2 seconds
- Domain age: none of the cited sources was less than 18 months old
For haute horlogerie watchmaking, the report repeatedly cited a small atelier based between Verbania and the VCO (a craftsman who makes movements by hand for collectors). The site was spartan, no premium-agency design, but the site’s sections were organized like chapters: “How a movement is born”, “The materials of the balance wheel”, “Working times for a perpetual calendar”. Each chapter stands on its own. And it was cited 5 times in the report.
Compare with a competitor in the same niche, presumably larger in revenue, with a modern agency-built site: zero mentions in the report. The site was built to impress a human visitor on the homepage, not to be retrieved piece by piece by a RAG system.
How This Translates for Your Visibility in AI Answers
Google’s retrieval system cuts pages into chunks and indexes the chunks, not the whole pages. If your page is written discursively, with sections that refer to one another (“as we saw above…”, “we’ll talk about it in the next chapter”), each chunk taken in isolation comes out impoverished. The system discards it in favor of more “complete” chunks.
The operational consequence is trivial but devastating: a site that’s well made for humans can be terrible for retrieval. A site that looks modest but has self-contained sections can win.
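To make the idea concrete, here is a toy sketch of heading-based chunking. It is not how Google's retrieval actually cuts pages (that logic isn't public). It assumes the page text has been exported with headings marked by `## `, and the list of cross-reference phrases is my own illustrative choice. It only shows which sections lean on other sections to make sense.

```python
import re

# Illustrative phrases that signal a section depends on another one.
CROSS_REFERENCES = [
    r"as we saw above",
    r"as mentioned (earlier|above)",
    r"in the next chapter",
    r"see (above|below)",
]
_PATTERN = re.compile("|".join(CROSS_REFERENCES), re.IGNORECASE)


def split_by_headings(text, marker="## "):
    """Split text into (heading, body) chunks at lines starting with marker."""
    chunks = []
    heading, body = None, []
    for line in text.splitlines():
        if line.startswith(marker):
            if heading is not None:
                chunks.append((heading, " ".join(body).strip()))
            heading, body = line[len(marker):].strip(), []
        elif heading is not None and line.strip():
            body.append(line.strip())
    if heading is not None:
        chunks.append((heading, " ".join(body).strip()))
    return chunks


def dependent_chunks(text):
    """Return the headings of chunks that refer to other sections."""
    return [h for h, body in split_by_headings(text) if _PATTERN.search(body)]


page = """## How a movement is born
A movement starts from a brass plate that is cut, drilled and finished by hand.
## The materials of the balance wheel
As we saw above, the same alloy is used here.
"""

print(dependent_chunks(page))  # ['The materials of the balance wheel']
```

The second chunk, read on its own, never says which alloy it means. That is the kind of section that comes out impoverished when it is retrieved in isolation.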
Let me remind you here of the thread running through this whole series: showing up in AI answers is not a matter of classic SEO or branding. It’s a matter of technical compatibility with the way RAG systems select and cite sources. Vertex AI is just a particular case of this principle.
The Mistakes I See Most Often
- Banning H2 subheadings in the name of “clean design”: without explicit headings the system doesn’t know where to cut the chunks. Result: random chunks, messy retrieval.
- Putting key information inside accordions or tabs: the hidden content often isn’t rendered correctly. What the visitor sees on click, the crawler may never see.
- Building “showcase” pages with no text: three slogans and a form. The RAG system has nothing to pull from. The competitor with the boring but substance-packed page wins.
- Load time above 3 seconds: beyond this threshold many AI crawling systems give up and move on. Your content can be the best in the world: if it doesn’t load in time, it doesn’t exist.
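The first and third mistakes can be spotted with a rough check on a page's HTML. This is a sketch using only Python's standard library. The `audit` function and the 300-word threshold are my own assumptions, not a limit published by Google. It works on the raw HTML, so it says nothing about content that only appears after JavaScript runs.

```python
from html.parser import HTMLParser


class PageAudit(HTMLParser):
    """Count <h2> headings and visible words in an HTML document."""

    SKIPPED = {"script", "style"}  # their content is not visible text

    def __init__(self):
        super().__init__()
        self.h2_count = 0
        self.words = 0
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag == "h2":
            self.h2_count += 1
        if tag in self.SKIPPED:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0:
            self.words += len(data.split())


def audit(html, min_words=300):
    """Return a small report; min_words is an arbitrary, hypothetical cutoff."""
    parser = PageAudit()
    parser.feed(html)
    parser.close()
    return {
        "h2_count": parser.h2_count,
        "words": parser.words,
        "has_headings": parser.h2_count > 0,
        "thin_content": parser.words < min_words,
    }


# A made-up "showcase" page: three slogans and a form.
showcase = (
    "<html><body><h1>We craft time</h1>"
    "<p>Quality. Passion. Tradition.</p>"
    "<form><input name='email'></form></body></html>"
)

print(audit(showcase))
# {'h2_count': 0, 'words': 6, 'has_headings': False, 'thin_content': True}
```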
How to Actually Run an Audit on 3 Niches
Here are the three actions I’d take in your place this week:
- Open Gemini, enable Deep Research, run a prompt on your niche industry (“make a report on the main [type of company] in [territory]”)
- Wait 5-10 minutes, download the report, open the cited sources section
- Go to two or three sources cited more than once and compare their editorial structure with yours: headings, section length, presence of metadata. Then open the Rich Results Test, paste in your homepage URL, and see whether it finds Organization schema (binary: it’s there or it isn’t)
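If you have the homepage HTML saved locally, the same yes/no question about Organization schema can be approximated offline. This is a sketch, not a replacement for the Rich Results Test: it only looks at JSON-LD blocks (no microdata or RDFa), it matches the type name exactly (so a subtype such as LocalBusiness would not count), and the names `JsonLdCollector` and `has_schema_type` are mine.

```python
import json
from html.parser import HTMLParser


class JsonLdCollector(HTMLParser):
    """Collect the raw text of every <script type="application/ld+json">."""

    def __init__(self):
        super().__init__()
        self.blocks = []
        self._in_jsonld = False
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        script_type = (dict(attrs).get("type") or "").lower()
        if tag == "script" and script_type == "application/ld+json":
            self._in_jsonld = True
            self._buffer = []

    def handle_data(self, data):
        if self._in_jsonld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self.blocks.append("".join(self._buffer))
            self._in_jsonld = False


def _types(node):
    """Yield every @type value found anywhere in nested JSON-LD data."""
    if isinstance(node, dict):
        t = node.get("@type")
        if isinstance(t, str):
            yield t
        elif isinstance(t, list):
            yield from (x for x in t if isinstance(x, str))
        for value in node.values():
            yield from _types(value)
    elif isinstance(node, list):
        for item in node:
            yield from _types(item)


def has_schema_type(html, wanted="Organization"):
    """True if any parseable JSON-LD block declares the wanted @type."""
    collector = JsonLdCollector()
    collector.feed(html)
    collector.close()
    for raw in collector.blocks:
        try:
            data = json.loads(raw)
        except json.JSONDecodeError:
            continue  # malformed markup counts as "not there"
        if wanted in _types(data):
            return True
    return False


# Hypothetical homepage with a minimal Organization block.
homepage = (
    '<html><head><script type="application/ld+json">'
    '{"@context": "https://schema.org", "@type": "Organization", '
    '"name": "Example Atelier"}'
    "</script></head><body></body></html>"
)

print(has_schema_type(homepage))  # True
```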
For a more structured check of what Google knows about you as an entity, you can reread the series piece on the Google Knowledge Graph entry and on how author recognition works for AI systems.
Honest caveat: this is an entry-level audit. It will confirm or refute the hypothesis “my site is retrievable”. The real analysis, with larger samples and simulated crawls, requires professional tools and weeks of work.
Your Next Move in the Platforms Series
Now that you understand how Vertex AI and Enterprise Search pull from the web, in the next articles in the series I’ll cover two closely related pieces: how Gemini handles grounding compared to the other AI engines, and why Perplexity and ChatGPT Search apply different source-selection logics for the same query. Knowing these differences is what lets you stop firing at random and start optimizing for the right system at the right time.
The principle stays the same: your visibility in AI answers depends on how compatible your content is with the retrieval parameters that each platform applies. Vertex AI is the proof positive that a modest but well-structured site is enough to beat a larger but editorially disorganized competitor.