Measuring AI visibility

AI platform visibility: why an aggregate average leads your investment astray

You look at the aggregate figure for your AI visibility, see 26%, and think you’re doing fine. But that number hides the fact that you’re at 8% on ChatGPT and 45% on Perplexity, and your customers mostly use ChatGPT. Averaging across different platforms is like averaging January revenue with July revenue: you get a number that tells you nothing useful. Breaking the data down by platform takes less than an hour and turns your strategy from generic into surgical.

I tracked an Italian B2B brand across five AI engines for four consecutive months. At the end of the period, the citation rates were: ChatGPT 45%, Perplexity 28%, Gemini 12%, Claude 8%, Copilot 7%. These rates shift every month, sometimes by more than 10 points. If you stop at the aggregate number (“the brand shows up in 20% of AI queries”), you’re looking at a blurry snapshot that hides exactly the information you need to decide your budget.

Let me explain why per-platform visibility is the metric that sits upstream of every serious investment in GEO, and how to measure it without thousand-euro-a-month tools.

What it means to be visible “per platform”

Every AI engine is a different system. ChatGPT uses a set of sources and a way of synthesizing them that isn’t the same as Perplexity’s, which in turn isn’t Gemini’s. The crawlers differ, the weight given to explicit citations differs, the underlying indexes differ (some pull from Bing, others from Google, others from a mix).

In research on retrieval and grounding systems, the principle is well established: two retrievers trained on different objectives, or two models with different system prompts, return source sets that overlap only partially, often by less than 50%. It follows that your visibility is not a single quantity. It’s five quantities, one per platform, and each must be measured separately.
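To check that partial overlap on your own data, one simple measure (my choice for illustration, not necessarily the one used in the research) is Jaccard similarity: the sources two engines both cite, divided by all the sources either of them cites. A minimal Python sketch, with hypothetical domain lists standing in for what you would collect by running the same queries on two platforms:

```python
def jaccard_overlap(sources_a: set[str], sources_b: set[str]) -> float:
    """Sources cited by both engines, divided by all sources cited by either."""
    union = sources_a | sources_b
    if not union:
        return 0.0  # neither engine cited anything
    return len(sources_a & sources_b) / len(union)


# Hypothetical domains cited by two engines for the same set of queries
engine_a = {"distributor.example", "trade-journal.example", "brand.example", "wiki.example"}
engine_b = {"trade-journal.example", "forum.example", "competitor.example", "wiki.example"}

print(f"{jaccard_overlap(engine_a, engine_b):.0%}")  # 33%
```

On these illustrative lists the overlap is 33%: two shared domains out of six distinct ones.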

In practice: you can be cited in 70% of the relevant queries on Perplexity and 10% on ChatGPT. The average says 40%. That 40% doesn’t exist in any operational reality: it’s a number that doesn’t tell you where to act.

Why it sits upstream of every other metric

In previous articles I covered metrics such as AI Share of Voice and Query Coverage Rate. They’re useful KPIs, but if you calculate them on an aggregate pool of AI engines you lose the most important signal: the variance between platforms.

Variance is the figure that tells you where your strategy works and where it doesn’t. If you perform well on Perplexity but poorly on ChatGPT, you’re probably strong on backlinks from citation-worthy sites but weak on author entity recognition: ChatGPT weights certain signals more heavily, Perplexity others.
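Putting a number on that variance takes a few lines of Python. This sketch uses the end-of-period rates of the brand tracked in this article; mean, range and population standard deviation are standard descriptive statistics picked for illustration, not a formula from a specific monitoring tool:

```python
import statistics

# End-of-period citation rates (%) for the brand tracked in this article
rates = {"ChatGPT": 45, "Perplexity": 28, "Gemini": 12, "Claude": 8, "Copilot": 7}
values = list(rates.values())

mean = statistics.mean(values)       # the single "aggregate" figure
spread = max(values) - min(values)   # best platform minus worst, in percentage points
stdev = statistics.pstdev(values)    # population standard deviation across the five platforms

print(f"mean {mean:.1f}% | range {spread} pp | std dev {stdev:.1f} pp")
```

The mean is 20%, yet the best and worst platforms are 38 points apart: the aggregate figure hides a spread almost twice its own size.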

Without this breakdown, you spend your GEO budget at random.

Common mistake

The classic one: “I monitor myself on Perplexity because that’s the one I use.”

The test you can run in 45 minutes

You need three things: a spreadsheet, a list of 10-15 queries that one of your potential customers would type, and 45 minutes of your time.

Take the queries one by one and run them on all five platforms: ChatGPT, Perplexity, Gemini, Claude, Copilot. For each query on each platform, record:

  • Cited by name? yes/no
  • Cited with an active link? yes/no
  • Position in the answer: high / medium / low / not present

At the end you have a 15-query × 5-platform matrix. Calculate the percentages by column, one column per platform: that’s your real baseline.
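If you export the spreadsheet as one row per query × platform cell, the column percentages take a few lines of Python. The `Observation` record and its field names are hypothetical, chosen to mirror the three fields listed above; adapt them to your own columns:

```python
from collections import defaultdict
from dataclasses import dataclass

PLATFORMS = ["ChatGPT", "Perplexity", "Gemini", "Claude", "Copilot"]


@dataclass
class Observation:
    """One cell of the matrix: one query run on one platform (hypothetical schema)."""
    query: str
    platform: str
    cited: bool    # cited by name?
    linked: bool   # cited with an active link?
    position: str  # "high" | "medium" | "low" | "not present"


def column_rates(rows: list[Observation]) -> dict[str, dict[str, float]]:
    """Per-platform percentage of queries where the brand is cited / linked."""
    totals: dict[str, int] = defaultdict(int)
    cited: dict[str, int] = defaultdict(int)
    linked: dict[str, int] = defaultdict(int)
    for row in rows:
        totals[row.platform] += 1
        cited[row.platform] += int(row.cited)
        linked[row.platform] += int(row.linked)
    return {
        p: {
            "cited_pct": 100 * cited[p] / totals[p],
            "linked_pct": 100 * linked[p] / totals[p],
        }
        for p in PLATFORMS
        if totals[p]  # skip platforms with no observations yet
    }


# Usage: two queries on two platforms (illustrative values)
rows = [
    Observation("q1", "ChatGPT", True, False, "medium"),
    Observation("q2", "ChatGPT", False, False, "not present"),
    Observation("q1", "Perplexity", True, True, "high"),
    Observation("q2", "Perplexity", True, True, "low"),
]
print(column_rates(rows))
```

Keeping one row per cell, rather than one pre-aggregated row per platform, means the same file can later feed the position and month-on-month analyses.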

A simple yes/no threshold for reading it: if the gap between your best and worst platform exceeds 20 percentage points, you have a consistency problem. It means one channel is pulling the cart and the others aren’t, so the day that channel changes its algorithm, you lose visibility all at once.
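The threshold check is easy to automate. A minimal sketch, assuming per-platform citation rates in percent (such as the column percentages from your matrix); the 20-point default is the threshold suggested above, and the function name is hypothetical:

```python
def consistency_check(
    rates: dict[str, float], threshold_pp: float = 20.0
) -> tuple[bool, str, str, float]:
    """Flag a consistency problem when best and worst platform differ by more than threshold_pp."""
    best = max(rates, key=rates.get)
    worst = min(rates, key=rates.get)
    gap = rates[best] - rates[worst]
    return gap > threshold_pp, best, worst, gap


# End-of-period rates from the case study in this article
rates = {"ChatGPT": 45, "Perplexity": 28, "Gemini": 12, "Claude": 8, "Copilot": 7}
problem, best, worst, gap = consistency_check(rates)
print(problem, best, worst, gap)  # True ChatGPT Copilot 38
```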

To track queries over time, Google Search Console can be useful (to see whether Gemini/AI Overviews are pulling you in), and Google Trends offers an external check on the brand signals AI engines use. A real analysis, done consistently, requires professional tools dedicated to AI monitoring: these tests are an honest first step, not a substitute.

Pro tip

Set your GEO budget based on your weakest platform, not the average.

The test I ran: four months, one brand, five platforms

I opened the article with the numbers; now let me explain the method so you can judge them.

The brand is a manufacturer from the Mirandola biomedical district, in the province of Modena: one of those SMEs that export medical technology across half the world but that, on ChatGPT, barely showed up when tracking began. I built a set of 22 queries matching their typical customers (hospital purchasing managers, European distributors, biomedical engineers). I ran the same 22 queries every four weeks for four months across the five platforms, always on the same weekday and in the same time slot, in non-logged-in sessions.

The result at the end of the period is what you read above: ChatGPT 45%, Perplexity 28%, Gemini 12%, Claude 8%, Copilot 7%. But the interesting figure isn’t the final number; it’s the month-on-month variation. Claude went from 2% to 14% and then dropped to 8%. Gemini stayed flat. Perplexity swung between 22% and 34%.
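Month-on-month variation is simply the difference between consecutive readings, and the swing is the distance between the highest and lowest one. A minimal sketch that uses only the three Claude readings quoted above (the article doesn’t report every intermediate value, so the series is partial):

```python
def month_on_month(series: list[float]) -> tuple[list[float], float]:
    """Change between consecutive monthly readings, plus the total swing (max - min)."""
    if not series:
        return [], 0.0
    deltas = [later - earlier for earlier, later in zip(series, series[1:])]
    return deltas, max(series) - min(series)


# Claude's readings as quoted in the article: 2% -> 14% -> 8%
deltas, swing = month_on_month([2, 14, 8])
print(deltas, swing)  # [12, -6] 12
```

Run on each platform’s column of the query × platform × month matrix, this separates the platforms that are merely low from the ones that are unstable.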

Limitations: a single brand, 22 queries, four months. It’s an indicative test, not a study. But I’ve seen the same pattern, large variance between platforms and large variance over time on the same platform, with other clients in very different sectors: automotive component manufacturers in the Brescia area, wineries in Montefalco in Umbria, boutique hotels in Ogliastra. It’s not an isolated case.

The most common mistakes

When a company starts measuring AI visibility, the wrong patterns are always the same.

Using a single platform as a proxy for all of them. The classic: “I monitor myself on Perplexity because that’s the one I use.” Perplexity cites sources explicitly and is easy to read, but its real market share is a fraction of ChatGPT’s. You’re looking at the wrong platform for your audience.

Measuring once and stopping. Temporal variance on a single platform is high. A one-off test gives you a snapshot that could be contradicted six weeks later. You need a monthly cadence, at minimum.

Ignoring Copilot because “hardly anyone uses it”. In B2B, Copilot is actually far more present than you’d expect, because it’s built into Microsoft 365. If your potential customer works at a large company that uses Teams and Outlook, Copilot is the first AI they encounter.

Confusing “cited” with “cited well”. Being mentioned in the fifth paragraph of a long answer is worth far less than being in the first. A matrix without the “position” field loses half the signal.
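One way to keep the position signal is to weight each mention by where it appears in the answer. The weights below are an assumption for illustration, not an industry standard; tune them to how your audience actually reads AI answers:

```python
# Hypothetical weights: how much a mention is worth at each position in the answer.
POSITION_WEIGHTS = {"high": 1.0, "medium": 0.5, "low": 0.25, "not present": 0.0}


def weighted_visibility(positions: list[str]) -> float:
    """Position-weighted visibility (0-100) for one platform across a set of queries."""
    if not positions:
        return 0.0
    return 100 * sum(POSITION_WEIGHTS[p] for p in positions) / len(positions)


# Same citation rate (2 of 4 queries), very different value
print(weighted_visibility(["high", "high", "not present", "not present"]))  # 50.0
print(weighted_visibility(["low", "low", "not present", "not present"]))    # 12.5
```

Both platforms would show 50% in a plain “cited yes/no” count; the weighted score makes the difference visible.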

Using queries that are too generic. “Best biomedical manufacturers” will always give you big names and very little signal for an SME. The queries that matter are the mid-to-bottom funnel ones: “suppliers of single-use heart valves Mirandola district,” “who makes ECMO catheters in Italy,” “alternatives to [leading competitor] for dialysis.” That’s where the real traffic that converts is, and that’s where the variance between platforms becomes visible and actionable.

Treating Claude as negligible. Claude has a small share but a high-value user base: technicians, lawyers, doctors, researchers looking for reliable answers. In specialized B2B it’s worth far more than the 8% that appears in the raw numbers. If you sell to professional clients, leaving it out of the matrix is a reading error, not a math one.

What can you do right now?

  • Define 15-22 queries representative of your ideal customer. Not the classic SEO keywords: real, long questions with comparative or shortlist intent.
  • Build a query × platform × month matrix. Update it every 30 days, the same day of the month.
  • For each platform, measure three things: presence, active link, position in the answer.
  • Compare yourself against the 3-5 competitors that AI engines cite in your sector. If you’re at 10% on ChatGPT and the top competitor is at 60%, you have a clear direction for where to work.
  • Set your GEO budget based on your weakest platform, not the average. If Gemini is your gap, work on the signals Gemini weights most (presence in the Knowledge Graph, organization schema, mentions in sources already indexed by Google).
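The competitor comparison and the weakest-platform rule can be combined into a simple priority list. A minimal sketch: the ChatGPT figures (10% against 60%) come from the example above, while the other platforms’ numbers and both function names are hypothetical:

```python
def platform_gaps(yours: dict[str, float], competitor: dict[str, float]) -> list[tuple[str, float]]:
    """Gap in percentage points to the top competitor on each platform, largest first."""
    gaps = {p: competitor[p] - yours.get(p, 0.0) for p in competitor}
    return sorted(gaps.items(), key=lambda item: item[1], reverse=True)


def weakest_platform(yours: dict[str, float]) -> str:
    """Platform where your own citation rate is lowest."""
    return min(yours, key=yours.get)


yours = {"ChatGPT": 10, "Perplexity": 30, "Gemini": 15}
top_competitor = {"ChatGPT": 60, "Perplexity": 35, "Gemini": 40}

print(platform_gaps(yours, top_competitor))  # [('ChatGPT', 50), ('Gemini', 25), ('Perplexity', 5)]
print(weakest_platform(yours))               # ChatGPT
```

In this example both views point to ChatGPT; when they disagree, the gap list tells you where a competitor has already proven the visibility is attainable.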

Where all this connects

Visibility in AI answers is not a single metric. It’s a five-dimensional vector, and each dimension tells you a different story about your digital presence. Measuring it in aggregate is like measuring the fever of five people by taking the average: the number exists, but it doesn’t help you treat anyone.

In the following articles in this series I go into the detail of platform-specific metrics: how to read AI Overviews data in Search Console, how to build an AI tracking dashboard broken down by engine, and how to turn the visibility matrix into a monthly scorecard you can share with your team in two minutes.

The operational question to take away is just one: do you really know where the AI cites you, or are you looking at an average?

Chapter 7 · Measuring AI visibility

The author
Roberto Serra at the Senate of the Republic · Palazzo Giustiniani, conference “The power of artificial intelligence”
Roberto Serra

SEO consultant for over 15 years, founder of the Serra SEO Agency (RAANK). He helps multinationals and SMEs stay visible where search is moving: ChatGPT, Perplexity, Gemini and Google's AI Overviews.

As featured in
ANSA Il Sole 24 Ore Le Iene Università di Cagliari La Repubblica