Measuring AI visibility

AI Citation Accuracy Rate: How Often AI Tells the Truth About Your Brand

AI cites you with the wrong location, 2021 prices and a partnership you ended two years ago, and the customer reading that information doesn't call you to check; they simply form a wrong impression of you. More than half of AI citations about brands contain at least one outdated or incorrect piece of data, and almost no one knows it because no one checks them systematically. Measuring how much of the information AI spreads about you is accurate is the starting point for stopping damage you don't even know is being done to you.

AI cites you, but with information that’s three years old: wrong location, outdated pricing, a partnership that ended in 2023. More than half of the mentions you get in ChatGPT and Perplexity answers contain at least one error — and you’re not tracking it.

This is the problem the AI Citation Accuracy Rate measures: the percentage of facts that AI reports correctly when it talks about you. It’s not the same thing as the number of citations. You can be cited everywhere and still have AI telling the world the wrong things about you. In this article I explain how to measure the accuracy rate, what threshold is acceptable, and why it’s the most underrated metric of AI visibility.

What “accuracy” means in the research on citing models

In the world of research on LLMs that generate answers with citations, the problem of accuracy was first formalized by Princeton’s ALCE benchmark. Gao et al. (2023) built a metric to evaluate both content correctness and citation quality together — because an AI can cite the right source and still say the wrong things, or say the right things while citing sources that don’t support them.

Notably, instruction-tuned models (Vicuna-13B and LLaMA-2-Chat) outperform the original LLaMA models in correctness and considerably enhance the citation quality. We observe that while the original LLaMA models are able to copy facts from the context, they struggle with accurately citing the sources or simply do not cite.

Gao et al., 2023

Translated: correctness and citation quality are two separate axes, and the benchmark scores them separately. Improving one doesn’t guarantee improving the other. For your brand this means something very practical: you have to measure two distinct things, not one. Does AI mention you at all? And when it does, is it telling the truth?

Most Italian SMEs stop at the first question. They never ask the second one, and they pay the price: customers calling the farm stay to book the double room AI described, which hasn’t existed since 2024.

Why accuracy sits downstream of everything else

In the previous articles in this series I covered how to measure AI share of voice, citation count and brand mention frequency, all of them volume metrics. The accuracy rate is the quality metric, and it only makes sense once you’ve sorted out the volume.

Here’s how it works. If AI never cites you, you don’t have an accuracy problem, you have an unrecognized-entity problem (I talked about it in Named Entity Recognition). Once you start showing up — because you’ve worked on Author Entity Recognition and on the Google Knowledge Graph — the problem becomes: what is it saying about you?

And here you discover that many of the sources AI uses to talk about you are out of date. The official website has been updated, but the 2022 content on TripAdvisor hasn’t. The Google Business Profile listing says one thing, an old press clipping says another. AI synthesizes by weighting the sources, and spits out a version of you that’s three years old.

Common mistake

You can have the highest citation count in your industry, but if 40% of the time AI says the wrong things about you, you’re amplifying noise, not signal.

The test you can run in 90 minutes

What I’m proposing is a fact-check matrix. The tool is simple: an Excel sheet with ten to fifteen key facts about your brand, tested once a month across all the relevant AI engines.

You choose the facts based on your industry. For a farm stay in the valleys of Trentino, the typical list looks like this:

  • Municipality and exact address
  • Number of rooms and type (doubles, suites, apartments)
  • Restaurant services (breakfast, half board, restaurant open to the public)
  • Pets allowed yes/no
  • Price range per night in high season
  • Distance from the reference railway station
  • Year of opening or renovation
  • Organic or quality certifications (e.g. the Qualità Trentino mark)
  • Languages spoken at reception
  • Seasonal closing period
  • Name of the owner or chef if it’s a communication asset

For each fact, put the same question to ChatGPT, Perplexity, Gemini and Claude, and record each answer in its own column. At the end of the round, calculate: correct facts / total facts = accuracy rate.

Below 80% requires action. Below 60% you’re in territory where AI is doing you more harm than good — because every customer who trusts the answer comes to you with the wrong expectations.
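Once the sheet is filled in, the calculation is easy to script. What follows is a minimal sketch in Python, assuming the matrix has been exported as rows of (fact, engine, correct yes/no). The sample rows, the function names and the verdict labels are illustrative assumptions; only the 80% and 60% thresholds come from the text above.

```python
from collections import defaultdict

# Hypothetical sample of a filled-in fact-check matrix:
# (fact, engine, whether the engine's answer matched reality).
MATRIX = [
    ("address", "ChatGPT", True),
    ("address", "Perplexity", True),
    ("room count", "ChatGPT", False),
    ("room count", "Perplexity", True),
    ("pets allowed", "ChatGPT", True),
    ("pets allowed", "Perplexity", False),
    ("price range", "ChatGPT", False),
    ("price range", "Perplexity", True),
]


def accuracy_rate(rows):
    """Return correct facts / total facts as a fraction between 0 and 1."""
    rows = list(rows)
    if not rows:
        raise ValueError("the matrix is empty")
    correct = sum(1 for _, _, ok in rows if ok)
    return correct / len(rows)


def verdict(rate):
    """Map a rate to the article's thresholds (labels are illustrative)."""
    if rate < 0.60:
        return "harmful"
    if rate < 0.80:
        return "needs action"
    return "acceptable"


def per_engine(rows):
    """Group rows by engine and compute one accuracy rate per engine."""
    grouped = defaultdict(list)
    for row in rows:
        grouped[row[1]].append(row)
    return {engine: accuracy_rate(group) for engine, group in grouped.items()}


if __name__ == "__main__":
    for engine, rate in per_engine(MATRIX).items():
        print(f"{engine}: {rate:.0%} ({verdict(rate)})")
    print(f"overall: {accuracy_rate(MATRIX):.0%}")
```

Keeping one row per fact and engine, rather than one column per engine, makes it simple to add an engine or a monthly round later without changing the functions.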

Pro tip

Build the fact-check matrix this week.

The test I ran on farm stays in Trentino

To write this article I built a fact-check matrix on six farm stays in Vallagarina and Val di Non — the ones that come up first when you ask Perplexity “farm stay with rooms and restaurant near Rovereto.” Ten facts per property, tested on ChatGPT, Perplexity and Gemini. Total: 180 answers verified manually against the official website plus a confirmation phone call to reception.

The summary result:

  • ChatGPT: 36 facts correct out of 60. Accuracy rate 60%.
  • Perplexity: 41 correct out of 60. Accuracy rate 68%.
  • Gemini: 33 correct out of 60. Accuracy rate 55%.

Overall average: 61%. Four errors out of ten.
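The summary figures can be reproduced from the raw counts. A short sketch in Python that uses only the numbers reported above; the variable names are illustrative.

```python
# Correct answers per engine, out of 60 checked facts each
# (6 properties x 10 facts), as reported in the summary above.
CORRECT = {"ChatGPT": 36, "Perplexity": 41, "Gemini": 33}
FACTS_PER_ENGINE = 6 * 10

# Accuracy rate per engine: correct facts / total facts.
rates = {engine: n / FACTS_PER_ENGINE for engine, n in CORRECT.items()}

# Overall rate is computed on all answers pooled together.
total_answers = FACTS_PER_ENGINE * len(CORRECT)
overall = sum(CORRECT.values()) / total_answers

for engine, rate in rates.items():
    print(f"{engine}: {rate:.0%}")
print(f"Overall: {overall:.0%} of {total_answers} answers")
```

The overall figure pools all 180 answers (110 correct) instead of averaging the three percentages; here the two approaches coincide because each engine was tested on the same 60 facts.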

The most frequent errors, in order: price range wrong by one season (45% of errors), incorrect room count because it was counting an old configuration (22%), restaurant service “open to the public” when by now it’s guests only (18%), pets allowed when in fact they no longer are (15%).

To be honest about it: this is an indicative test, not a study. Six properties are a small sample, and the pattern could change in other valleys or other industries. A real analysis, on a structured portfolio of clients, requires professional tools and an ongoing protocol. But the signal is clear enough: if you don’t monitor accuracy, AI has the last word on your brand, and roughly four times out of ten it gets it wrong.

The errors I see most often

When I get into projects with clients who have started measuring the accuracy rate, the errors cluster into four recurring patterns.

The vintage price list. AI picks up prices from a 2022 article on the blog of a regional travel guide. The official website has updated its prices three times, but that third-party page is still indexed and has more authority signals.

The zombie partnership. The property had an agreement with a consortium or a tour operator that has expired. The consortium’s page is still online, and AI keeps citing it as if the collaboration were still active.

The shadow location. The brand moved or opened a second location. AI mixes the two locations, or cites the old one. It happens a lot with restaurants that change streets after a renovation.

The phantom service. “They have a spa” — it never existed. It’s a mix-up between nearby properties. It happens because the names are similar or because a review confused the two.

In all four patterns the problem isn’t the content of your official website. It’s the ecosystem of third-party sources surrounding you. I wrote about this in my piece on Implicit Reference Weight: AI weights sources, it doesn’t invent them. If outdated sources carry more weight than updated ones, the problem is yours.

What can you actually do?

Three actions in order of urgency.

  • Build the fact-check matrix this week. Excel sheet, 10-15 facts, four AI-engine columns. First full round within seven days.
  • Identify the outdated sources AI is using. When you spot an error, ask the engine “which source did you take this information from?”. Perplexity tells you explicitly, and so does ChatGPT in search mode. Those pages need to be updated, by you or by whoever manages them, or displaced by more recent sources.
  • Update the Google Business Profile and Wikidata. They’re two of the structured sources AI consults most often for basic facts (address, hours, services). Once these are aligned, the accuracy rate can rise by 10-15 points with no other intervention.
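For the second action, it helps to log each error together with the page the engine named as its source, then count which pages come up most often. A minimal sketch in Python; the log entries, the example.com and example.org URLs and the function name are invented for illustration and are not data from the test above.

```python
from collections import Counter
from urllib.parse import urlparse

# Hypothetical error log: (fact, engine, source URL the engine cited).
ERROR_LOG = [
    ("price range", "Perplexity", "https://guide.example.com/farm-stays-2022"),
    ("price range", "ChatGPT", "https://guide.example.com/farm-stays-2022"),
    ("room count", "ChatGPT", "https://reviews.example.org/listing/123"),
    ("restaurant", "Perplexity", "https://guide.example.com/where-to-eat"),
]


def sources_to_fix(log):
    """Count errors per source domain, most frequent first."""
    domains = Counter(urlparse(url).netloc for _, _, url in log)
    return domains.most_common()


if __name__ == "__main__":
    for domain, errors in sources_to_fix(ERROR_LOG):
        print(f"{domain}: {errors} error(s)")
```

Sorting by domain rather than by individual page is a design choice: one request to the owner of a site can often fix several outdated pages at once.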

Where to place the accuracy rate in overall measurement

The accuracy rate is the reality check on all the work you do to be visible in AI answers. Citation volume only helps if what gets cited is true.

In the next articles in this series I explain how to integrate it into a dashboard alongside brand mention frequency, citation count and AI share of voice. The fact-check matrix is the foundation: without it, the other metrics tell a partial story.

The author
Roberto Serra

SEO consultant for over 15 years, founder of the Serra SEO Agency (RAANK). He helps multinationals and SMEs stay visible where search is moving: ChatGPT, Perplexity, Gemini and Google's AI Overviews.
