Measuring AI visibility

AI Confidence Indicator: read how much the AI trusts you from the language it uses

The AI cites you but uses phrases like 'it could be among the options' or 'some consider it', and to the customer reading it that conditional signals doubt about you. The question isn't whether you appear: it's how much the AI trusts what it says about you. That level of trust can be raised, and the difference between a lukewarm citation and an assertive one is measured in customers who call instead of moving on.

The AI names you with “it’s possible”, “some report”, “it could be an interesting choice”. That’s a low-confidence indicator — and measuring it tells you exactly what to fix to move up to “it’s among the reference manufacturers in the district”.

Let me explain why this is one of the most underrated KPIs when it comes to visibility in AI answers. In my articles on share of voice and citation accuracy I covered “how often you appear” and “whether they name you correctly”. The confidence indicator answers a different and often more commercially relevant question: in what tone they name you. A customer who reads “brand X is a reference in contemporary furniture from Lissone” clicks. One who reads “brand X could be, among others, a possible option” closes the chat and asks their brother-in-law.

What the confidence indicator actually measures

When an AI model generates an answer, it chooses words that reflect the solidity of the information it has about the sources. It’s not a number shown in the interface: it’s a linguistic signal you can read in plain text within the answer.

In research on attributed retrieval with LLMs, the framework proposed by Hanane Djeddal et al. (2024) in “An Evaluation Framework for Attributed Information Retrieval using Large Language Models” cleanly separates two dimensions that are usually conflated: the correctness of what the AI states and the support the sources give to that statement. Two answers can both be correct, yet one is backed by strong, consistent sources while the other is correct “by chance”, because the AI had to extrapolate from weak signals.

From this follows something very practical for your brand. When source support is high, the AI uses assertive language: “is”, “makes”, “its headquarters is in Lissone”. When support is low and the AI is partly guessing, the model covers itself by inserting linguistic hedges: “it would seem”, “it is said that”, “some consider it”. Translated into practice: the language you see in the answer is the readable proxy for the level of evidence the AI has gathered about you.

The operational consequence is that you can measure confidence without accessing the model’s internal probabilities. You just need to read the answers about you carefully and classify them.

Three confidence levels: the grid I use with clients

To make the concept usable, I reduce the continuum to three levels. Not continuous metrics: decision thresholds.

Assertive. The AI states facts in the present tense, with no conditionals. Example: “Brand X is a manufacturer of contemporary furniture headquartered in Lissone, active since 1978, specialized in modular bookcases”. Zero hedging. Strong, consistent, multiple sources.

Moderate. The AI states but inserts 1-2 hedges on peripheral details. Example: “Brand X makes contemporary furniture in Lissone, and would seem to also work with hospitality contract projects”. The backbone is solid, the marginal details are shaky.

Uncertain. The AI uses conditionals on central claims, cites vaguely, adds disclaimers like “but I recommend you verify”. Example: “Brand X could be a company from the Brianza district that some cite among contemporary furniture manufacturers”. Here source support is weak.
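The three levels can be approximated in code. What follows is a minimal sketch, not a production classifier: the hedge list only contains phrases quoted in this article, and the rule "the first clause that names the brand is the central claim" is my own simplifying assumption. The names HEDGES and classify_answer are hypothetical.

```python
import re

# Hedge phrases quoted in this article; extend the list for your own language and sector.
HEDGES = (
    "would seem", "it is said", "some consider", "some report", "some cite",
    "could", "it's possible", "it is possible", "recommend you verify",
)


def classify_answer(answer: str, brand: str) -> str:
    """Return 'assertive', 'moderate', 'uncertain' or 'absent' for one AI answer.

    Assumption: the first clause that names the brand carries the central claim.
    A hedge inside it means 'uncertain'; hedges only in other clauses mean 'moderate'.
    """
    text = answer.lower().replace("’", "'")
    # Split on punctuation to get rough clauses.
    clauses = [c.strip() for c in re.split(r"[.,;:!?]", text) if c.strip()]

    def hedged(clause: str) -> bool:
        return any(h in clause for h in HEDGES)

    central = next((c for c in clauses if brand.lower() in c), None)
    if central is None:
        return "absent"
    if hedged(central):
        return "uncertain"
    if any(hedged(c) for c in clauses):
        return "moderate"
    return "assertive"
```

Run on the three example sentences above, this rule returns assertive, moderate and uncertain respectively. It will misread answers whose central claim is not in the first clause, so treat it as a first pass before a human read.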

The job isn’t just to classify the last answer you read: it’s to track the progression over time. If in January the AI names you in uncertain language and by June you move to moderate, you’re winning. If you stay stuck at uncertain for six months, you have a signal problem you won’t solve with more blog posts.

Why this KPI sits upstream of conversion

The confidence level isn’t a stylistic quirk: it impacts the decision. A user who reads an assertive AI answer receives an implicit “endorsement”. A user who reads multiple hedges perceives the brand as a backup option. And if the user then asks the same AI “give me your top 3”, it will tend to leave out a brand it mentioned with hedges, because it doesn’t trust it enough to put it at the top.

This is why the confidence indicator speaks directly to the themes I covered in this series. If you’ve worked on the weight of implicit references and on recognition of the author as an entity, what you’ll see on the confidence indicator is their linguistic projection. More consistent authority signals → less hedging in the way the AI describes you.

The test you can run yourself in 30 minutes

Grab pen and paper — seriously, no Excel sheet on the first iteration.

  1. Pick 10 brand queries (“who is Brand X”, “Brand X reviews”, “Brand X makes furniture”) and 10 generic industry queries where you’d want to appear (“best contemporary furniture manufacturers in Brianza”, “furniture companies in Lissone for contract work”, “who makes custom modular bookcases”).
  2. Run the 20 queries on ChatGPT, Claude, Perplexity and Gemini. Save them in a document.
  3. For each answer where you appear, highlight the verbs and adverbs in three different colors. Red: hedges (“it would seem”, “could”, “some report”, “it is said”). Yellow: verbs in the conditional or with “among the other options”. Green: present-tense statements with no hedging.
  4. Count the answers by color. If, for example, 20 answers mention you and you have 3 pure greens, 7 yellows and 10 reds, you’re at the uncertain level. If you have 12 greens, 6 yellows and 2 reds, you’re at assertive. It’s a rough threshold, but it’s an honest starting point.

This is an indicative test, not a formal study: the sample is small and classifying language involves interpretation. For real analysis you need professional AI monitoring tools and a more sophisticated classification. But as a baseline it works: in half an hour you have the number.
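If you later move from paper to a script, the count in step 4 can be sketched as below. The article only gives two worked examples, so the cut-offs (at least 60% green for assertive, at least 50% red for uncertain) are hypothetical values I picked because they reproduce those two examples. The function name overall_level is also an assumption.

```python
from collections import Counter


def overall_level(colors, green_min=0.6, red_min=0.5):
    """Map a list of per-answer colors ('green', 'yellow', 'red') to one level.

    green_min and red_min are illustrative thresholds, not values from a study.
    """
    counts = Counter(colors)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no answers to score")
    if counts["green"] / total >= green_min:
        return "assertive"
    if counts["red"] / total >= red_min:
        return "uncertain"
    return "moderate"


# The two examples from step 4.
first = ["green"] * 3 + ["yellow"] * 7 + ["red"] * 10
second = ["green"] * 12 + ["yellow"] * 6 + ["red"] * 2
print(overall_level(first), overall_level(second))
```

Anything that meets neither threshold falls into moderate, which is a design choice of this sketch: it keeps the two extreme labels for cases where one color clearly dominates.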

The test I ran on the Lissone furniture district

Let me tell you about an instrumental test I ran on ten contemporary furniture manufacturers from the Lissone district and surroundings — a sector I know because I’ve been working in it with two clients since the start of 2025. Revenues between 4 and 22 million, a mix of direct sales, contract and foreign distributors.

I collected 60 AI answers: 6 recurring queries for each of the ten brands, spread across ChatGPT, Perplexity and Gemini. The queries included “who is Brand X furniture Lissone”, “Brand X makes custom pieces”, “Brand X price range” and “best contemporary furniture manufacturers Brianza”, some naming the brand and some not.

The result, classifying the language:

  • 2 out of 10 brands have 70%+ of their answers in assertive language. Not coincidentally, they’re the only two with a clean Wikidata entry and 6-8 indexed trade-press articles over the last 18 months.
  • 5 brands sit in moderate: they’re named in the present tense, but with hedges on the details (price range, year of founding, list of collections).
  • 3 brands are in uncertain: the AI names them with “it would seem to be”, “some cite it”, or refuses to comment on product quality.

The data point that struck me: the three brands in the uncertain zone have a website, catalog and revenue comparable to the five in the moderate zone. But their presence in secondary sources (trade magazines, furniture portals, profiles on B2B platforms) is not comparable. It is a small sample, not generalizable to the whole sector, but the pattern comes back fairly clean: the AI’s confidence tracks the density of consistent signals, not revenue.

The most common mistakes

Reading one answer and drawing conclusions. A single AI answer is noise. You need at least 15-20 per brand, spread across 2-3 engines, before passing a verdict on the confidence level.

Confusing the model’s hedging with hedging about the brand. ChatGPT and Claude tend to have a more cautious default style than Perplexity. If you compare linguistic hedges across engines without normalizing, you’ll convince yourself you have a problem that isn’t there. Always compare your brand with the 3-5 competitors the AI cites in your sector, on the same engine and for the same query.
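One way to do that normalization, sketched under assumptions: each record is a tuple (engine, query, brand, hedge_count) that you filled in by hand or with your own classifier, and the function name hedge_gap_by_engine is hypothetical. For every engine and query where both you and at least one competitor are cited, it subtracts the competitors' average hedge count from yours.

```python
from collections import defaultdict
from statistics import mean


def hedge_gap_by_engine(records, brand):
    """Average (own hedges - competitors' mean hedges) per engine.

    records: iterable of (engine, query, brand_name, hedge_count) tuples.
    Only engine/query pairs where a competitor is also cited are compared.
    """
    own = {}
    rivals = defaultdict(list)
    for engine, query, name, hedges in records:
        if name == brand:
            own[(engine, query)] = hedges
        else:
            rivals[(engine, query)].append(hedges)

    gaps = defaultdict(list)
    for key, hedges in own.items():
        if rivals[key]:  # skip answers where no competitor appears
            gaps[key[0]].append(hedges - mean(rivals[key]))
    return {engine: mean(values) for engine, values in gaps.items()}
```

A gap close to zero means the engine hedges about everyone and the caution is its house style. A clearly positive gap on the same engine and query means the hedging is about you.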

Fixing the language by working on the site. If the AI says “it would seem that Brand X is in Lissone”, the problem isn’t that your homepage doesn’t say “we’re in Lissone”. It’s that third-party sources don’t say it enough. The work has to be done outside your site: entity records, industry citations, structured presences on authoritative directories.

Obsessing over the single adjective. Whether the AI says “contemporary” or “modern” is secondary. Whether it uses the present tense or the conditional is the signal that counts.

What can you actually do?

  • Set up a monthly scorecard with the three zones (assertive, moderate, uncertain) and the percentage of answers per zone.
  • Focus the external work on the details where you see hedging today: if the AI is uncertain about the year of founding, work on Wikidata and a clean entity record; if it’s uncertain about positioning, work on industry citations.
  • Monitor the migration between zones every 90 days. The realistic goal isn’t to move from uncertain to assertive in three months: it’s to shift 2-3 answers from red to yellow, and 2-3 from yellow to green.
  • Compare yourself with the competitors the AI cites in assertive terms: what do they have in the search results that you don’t? Nine times out of ten it’s a combination of trade press, entity records and presences on authoritative portals.
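The scorecard and the 90-day migration check can be kept in a few lines of code once the answers are labeled. This is a rough sketch with hypothetical names (ZONES, scorecard, migration); it assumes you rerun the same queries on the same engines, so each answer can be matched by a stable key such as (engine, query).

```python
from collections import Counter

ZONES = ("assertive", "moderate", "uncertain")


def scorecard(labels):
    """Percentage of answers per zone, rounded to one decimal."""
    counts = Counter(labels)
    total = sum(counts[z] for z in ZONES)
    if total == 0:
        raise ValueError("no labeled answers")
    return {z: round(100 * counts[z] / total, 1) for z in ZONES}


def migration(before, after):
    """Count answers that changed zone between two runs.

    before, after: dicts mapping the same answer key to a zone label.
    Returns {(old_zone, new_zone): number_of_answers}.
    """
    moves = Counter()
    for key in before.keys() & after.keys():
        if before[key] != after[key]:
            moves[(before[key], after[key])] += 1
    return dict(moves)
```

With this in place, the realistic goal described above reads directly off the migration output: a couple of answers under ("uncertain", "moderate") and a couple under ("moderate", "assertive").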

The thread to hold on to

Showing up in AI answers isn’t just about appearing: it’s about how you appear. The confidence indicator gives you a readable thermometer of how much the AI trusts you, month after month. It’s one of the few KPIs where you can observe with the naked eye the signal the engines read — and adjust your aim before you lose more months of visibility.

In the next articles in the series I show how to build a monthly visibility scorecard that integrates the confidence indicator with share of voice and citation accuracy, and how to choose the right tracking tool if you want to automate the classification of language.

Chapter 7 · Measuring AI visibility

The author
Roberto Serra
Photo: Roberto Serra at the Senate of the Republic, Palazzo Giustiniani, conference “The power of artificial intelligence”.

SEO consultant for over 15 years, founder of the Serra SEO Agency (RAANK). He helps multinationals and SMEs stay visible where search is moving: ChatGPT, Perplexity, Gemini and Google's AI Overviews.
