Measuring AI visibility

Hallucination Tracking Report: turning the AI’s mistakes about your brand into data you can manage

Roberto Serra · 25 June 2026 · ~8 min read

ChatGPT is telling your potential customers that you've changed industries, that you've closed, or that you offer services you've never had — and they don't call you to check, they simply move on. The false information the AI produces about your brand isn't random: it follows precise patterns that can be tracked and corrected. Those who start monitoring it systematically turn what looks like an uncontrollable problem into something manageable.

The AI says you’re based on a street you’ve never set foot on. It says you closed in 2023 when you’re open. Hallucinations are data, not random errors — they need to be tracked.

I’ll tell you right away because it’s the thing companies understand last: when ChatGPT, Perplexity or Gemini invent a fact about your brand, that fact doesn’t hang in a vacuum. The prospect deciding whether to call you reads it. And in the meantime you don’t even know that sentence exists.

For three months I tracked the hallucinations that AI engines produced about a thermal spa in Acqui Terme, in the province of Alessandria. Not to write a paper: to build the first hallucination tracking report for a real client, to put on the owners’ table alongside the share of voice numbers. What I’m telling you here is the method that came out of it.

What an AI engine means when it “hallucinates” about your brand

AI hallucinations aren’t bugs in the classic sense. They’re the normal behavior of a generative system when the training data is incomplete, contradictory or outdated. The model produces a confident answer anyway — because it’s designed to do so — and that answer can contain wrong addresses, invented founding years, services you don’t offer, prices you’ve never charged.

Research on grounding AI answers points to a clear principle: when retrieval doesn’t supply the model with enough verified information, the model fills the gaps with statistically plausible patterns. Plausible, not true. It’s the same logic I described when we talked about tokenization and how models build answers word by word: if the authoritative signal about your brand is missing, the model invents a plausible one.

From this follows an operational consequence that few want to accept: every hallucination concerning your company is proof that your entity signal is too weak to withstand the model’s generative pressure. It’s not the fault of “the bad AI”. It’s a symptom.

Why you can no longer afford not to track them

When a customer reads that your thermal spa “closed in 2023”, they don’t call you to ask for confirmation. They go to the competitor the AI cited right afterward, accurately. The damage is silent, asymmetric and cumulative: you never see it, they experience it once and that’s it.

For a company, the problem isn’t eliminating hallucinations: that isn’t possible, at least not today. The problem is knowing them, ranking them by severity, and acting on the worst ones before they do damage for months. It’s exactly what you do with the return rate in an e-commerce store: you don’t bring it to zero, you keep it under control.

Within the AI visibility thread, this is the metric that tells you whether your brand is “well formed” in the model’s eyes. A high share of voice with a high hallucination rate is worth less than a medium share of voice with hallucinations close to zero. Citation volume without data quality is a vanity metric.

The test I ran: 90 days of tracking on Acqui Terme

I’ll walk you through the test transparently, with the limits stated up front.

Setup: I took 30 recurring queries that a potential customer makes when looking for thermal spas in Piedmont. Examples: “best thermal spas in Piedmont open all year round”, “thermal spa with sulphur path Acqui Terme opening hours”, “thermal spa Acqui Terme prices 2026”. All real queries, reconstructed from the client’s Search Console plus three interview sessions with the front office.

Tools used: ChatGPT, Perplexity, Gemini, Copilot. Four engines, same query, same week, repeated every 30 days for three months. In total: 30 queries × 4 engines × 3 cycles = 360 answers generated and archived.
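To make the size of the test concrete, here is a minimal sketch of the collection plan as a list of tasks, one per answer to archive. The three named queries come from the setup above; the other 27 are placeholders I’m assuming for illustration, and the function name is hypothetical.

```python
from itertools import product

# Three real examples from the setup; the remaining 27 are placeholders (assumption).
QUERIES = [
    "best thermal spas in Piedmont open all year round",
    "thermal spa with sulphur path Acqui Terme opening hours",
    "thermal spa Acqui Terme prices 2026",
] + [f"placeholder query {i}" for i in range(4, 31)]

ENGINES = ["ChatGPT", "Perplexity", "Gemini", "Copilot"]
CYCLES = [1, 2, 3]  # one cycle every 30 days


def build_test_plan(queries, engines, cycles):
    """Return one task per (cycle, engine, query) combination."""
    return [
        {"cycle": cycle, "engine": engine, "query": query}
        for cycle, engine, query in product(cycles, engines, queries)
    ]


plan = build_test_plan(QUERIES, ENGINES, CYCLES)
print(len(plan))  # 30 queries x 4 engines x 3 cycles = 360
```

Writing the plan down first makes it obvious when an answer is missing from the archive at the end of a cycle.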

What I measured: for each answer that named the spa I filled in a “hallucination” column using a four-level classification:

Severe: a fact that generates immediate loss (wrong opening hours, “closed”, wrong address, non-existent services passed off as active)
Medium: a fact that creates friction (wrong founding year, old prices, inaccurate description of the therapeutic path)
Minor: an inaccurate but not operationally harmful fact (number of employees, year of a renovation)
Clean: no invented information
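The four levels can be sketched as an ordered scale, where an answer takes the level of its worst wrong fact. The fact labels below are hypothetical names I’m assuming for the examples in the list; this is not a fixed taxonomy.

```python
from enum import IntEnum


class Severity(IntEnum):
    CLEAN = 0
    MINOR = 1
    MEDIUM = 2
    SEVERE = 3


# Hypothetical labels for the example facts in the list above (assumption).
LEVEL_BY_FACT = {
    "opening_hours": Severity.SEVERE,
    "closed": Severity.SEVERE,
    "address": Severity.SEVERE,
    "nonexistent_service": Severity.SEVERE,
    "founding_year": Severity.MEDIUM,
    "old_prices": Severity.MEDIUM,
    "therapeutic_path": Severity.MEDIUM,
    "employee_count": Severity.MINOR,
    "renovation_year": Severity.MINOR,
}


def classify_answer(wrong_facts):
    """Return the worst level among the wrong facts; no wrong facts means CLEAN.

    An unknown label raises KeyError on purpose, so a person decides its level.
    """
    return max((LEVEL_BY_FACT[fact] for fact in wrong_facts), default=Severity.CLEAN)


print(classify_answer(["founding_year", "closed"]).name)  # SEVERE
```

Taking the worst level per answer matches how a prospect reads it: one severe fact is enough to lose the call, whatever else is correct.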

Raw result, to be taken as indicative, not as a study: out of 187 answers that cited the spa, 42% contained at least one medium- or severe-level hallucination. The severe level alone was at 17%: roughly one answer in six contained a fact that, if read by a prospect, would have led them not to call or to call a competitor. The worst figure was on Gemini, the best on Perplexity (probably because it cited the official site more often and aggregators less).

Limit of the test: small sample, specific sector, provincial geography. It’s not a peer-reviewed study. It’s a thermometer: it tells you the fever is there, not exactly how many degrees.

How do you build your hallucination tracking report?

What you need isn’t expensive software. You need a well-made Excel sheet and the discipline to fill it in every month.

The minimum columns of the report:

Date of the query
AI engine (ChatGPT / Perplexity / Gemini / Copilot)
Query tested
Is the brand cited? (yes/no)
Does it contain a hallucination? (yes/no)
Severity (severe / medium / minor)
Type of hallucination (hours / address / prices / services / founding date / other)
User prompt verbatim
Screenshot or copy of the answer
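If you’d rather keep the sheet as a CSV file that Excel can open, the columns above can be sketched as one record type. The field names are my own assumption, and the sample row is invented purely to show the shape, not taken from the test.

```python
import csv
import io
from dataclasses import asdict, dataclass, fields


@dataclass
class ReportRow:
    query_date: str          # ISO date, e.g. "2026-03-02"
    engine: str              # ChatGPT / Perplexity / Gemini / Copilot
    query: str
    brand_cited: bool
    has_hallucination: bool
    severity: str            # severe / medium / minor, empty when there is none
    hallucination_type: str  # hours / address / prices / services / founding date / other
    prompt_verbatim: str
    answer_copy: str         # pasted answer text or a path to the screenshot


def write_report(rows, out):
    """Write the rows as CSV, with one header column per ReportRow field."""
    writer = csv.DictWriter(out, fieldnames=[f.name for f in fields(ReportRow)])
    writer.writeheader()
    for row in rows:
        writer.writerow(asdict(row))


# Invented sample row, for illustration only.
sample = ReportRow(
    "2026-03-02", "Gemini", "thermal spa Acqui Terme prices 2026",
    True, True, "severe", "hours",
    "thermal spa Acqui Terme prices 2026", "(pasted answer text)",
)
buffer = io.StringIO()
write_report([sample], buffer)
header = buffer.getvalue().splitlines()[0]
print(header)
```

To write to disk instead of memory, pass `open("report.csv", "w", newline="")` in place of the `io.StringIO` buffer.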

Every month you calculate two hard numbers: citation rate (the share of tested answers in which you appear) and hallucination rate (the share of citing answers that contain at least one error). You put these two numbers side by side in the monthly report and watch them move over time.
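As a minimal sketch, the two numbers can be computed from the month’s rows like this. I’m assuming each row is a dictionary with two boolean fields, `brand_cited` and `has_hallucination`; the four sample rows are invented for illustration.

```python
def monthly_rates(rows):
    """Return (citation_rate, hallucination_rate) for one month of rows.

    citation_rate      = answers citing the brand / all answers collected
    hallucination_rate = citing answers with at least one error / citing answers
    """
    rows = list(rows)
    cited = [r for r in rows if r["brand_cited"]]
    with_error = [r for r in cited if r["has_hallucination"]]
    citation_rate = len(cited) / len(rows) if rows else 0.0
    hallucination_rate = len(with_error) / len(cited) if cited else 0.0
    return citation_rate, hallucination_rate


# Invented sample: 4 answers, 3 cite the brand, 1 of those contains an error.
sample_rows = [
    {"brand_cited": True, "has_hallucination": False},
    {"brand_cited": True, "has_hallucination": True},
    {"brand_cited": True, "has_hallucination": False},
    {"brand_cited": False, "has_hallucination": False},
]
rates = monthly_rates(sample_rows)
print(f"citation {rates[0]:.0%}, hallucination {rates[1]:.0%}")
```

Counted per answer, the 90-day test above works out to a citation rate of about 52% (187 citing answers out of 360 collected). Note that the hallucination rate uses citing answers as its denominator, not all answers: an engine that never mentions you cannot get your facts wrong.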

The goal isn’t to bring the hallucination rate down right away. The goal is to understand where each hallucination originates: if the model is inventing the address, an entity signal about your location is missing; if it’s inventing the services, you have a confusing or outdated services page; if it’s inventing “closed”, there’s probably an old review or a local article that left a strong imprint in the training data.
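This reasoning can be sketched as a small lookup: count the types of hallucination recorded in the sheet and attach a first hypothesis to the most frequent ones. The labels and hypotheses below only restate the three cases just described; the "closed" label and the function name are assumptions of mine, and anything else still needs a manual look.

```python
from collections import Counter

# First hypotheses, restating the three cases in the paragraph above.
LIKELY_CAUSE = {
    "address": "entity signal about the location is missing",
    "services": "services page is confusing or outdated",
    "closed": "an old review or local article left a strong imprint",
}

NO_HYPOTHESIS = "no hypothesis yet: inspect the sources the answer cites"


def diagnose(hallucination_types):
    """Return (type, count, hypothesis) tuples, most frequent type first."""
    counts = Counter(hallucination_types)
    return [
        (kind, n, LIKELY_CAUSE.get(kind, NO_HYPOTHESIS))
        for kind, n in counts.most_common()
    ]


# Invented sample of types recorded in one month.
for kind, n, hypothesis in diagnose(["address", "closed", "address", "prices"]):
    print(f"{kind} x{n}: {hypothesis}")
```

The output is a starting point for investigation, not a verdict: the same wrong address can come from different sources on different engines.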

A useful entry-level check to start with, even before opening the sheet: your Google Business Profile must be up to date (hours, address, services), and Wikidata must have an entry for your company with the canonical data. If one of these two things is missing, you don’t reduce hallucinations by tracking them; you reduce them by fixing the sources the model reads. These two checks are only a first step: the real analysis requires professional tools and a consolidation of the entity identity, which I’ve already covered in the articles on how the AI recognizes an author entity and on getting into Google’s Knowledge Graph.

The mistakes I see made more and more often

I’ll flag four of them because they’re the ones that, if you avoid them, save you three months of work.

Tracking only when a customer reports the hallucination. For every customer who writes to you angry because “ChatGPT said you were closed”, there are another twenty who simply called the competitor. The complaint signal is the tip of the iceberg, not the metric.

Tracking a single AI engine. ChatGPT can be clean while Gemini is producing horrors about your brand, and your prospects are split between the two in a way you don’t control. You have to watch at least three.

Treating all hallucinations the same way. A wrong founding year is an annoyance. An invented closing time in July for a thermal spa is lost revenue that month. Classifying by severity isn’t methodological fluff: it tells you where to start.

Thinking it’s enough to fix the website. The website is one of the sources the model reads, not the only one. If the wrong fact lives on a spa aggregator portal, in an old review on TripAdvisor or in a local article from 2019, you have to act there. Fixing the website is necessary but not sufficient.

What can you do right away?

Three concrete actions to start this week:

Open ChatGPT, Perplexity and Gemini, run 5 queries a customer of yours would make to find you, save the answers in a document. You have your zero baseline.
Create the Excel sheet with the columns I gave you above, put it on a monthly calendar, assign it to a person. Without a calendar the tracking never starts.
Check Google Business Profile and your presence on Wikidata. If one of the two is missing, you’ve found one of the causes of your hallucinations even before you start tracking them.

Where tracking takes you in the AI visibility thread

Hallucination tracking doesn’t live on its own: it’s a piece of the measurement system that lets you tell, month after month, how you’re showing up in AI answers in an honest and sustainable way. It gives you data quality, while share of voice gives you volume and position rate tells you whether you appear first or last.

In the next articles in this series I’ll explain how to integrate this tracking into the monthly AI visibility scorecard, how to compare your data with the competitive matrix and how to link the hallucination rate to the cost per AI mention, which is the number that actually helps an entrepreneur decide whether to invest in this topic.

The point isn’t to eliminate the AI’s hallucinations. The point is to stop suffering from them without knowing it.