Entities and Knowledge Graph

Entity Confidence Testing: reading the AI’s language to understand how much it trusts your brand

Roberto Serra 25 June 2026·~8 min read

When the AI answers with "it seems to be a company in X" or "it's probably based in Y", it isn't being polite — it's admitting that it doesn't trust you enough to say it with certainty. A customer reading that answer perceives uncertainty, even without understanding the mechanism. And when the model isn't sure, it tends not to recommend you. It takes just 10 minutes to understand how the AI really describes you — and there's a precise way to correct those signals of uncertainty.

When the AI mentions you but says “it seems that”, “probably”, “it might be a manufacturer of…”, it’s giving you a valuable piece of information. This isn’t linguistic courtesy. It’s the model admitting — in its own way — that it isn’t sure who you are.

I analyzed the language of 80 AI responses about Italian SMEs across different sectors to map this pattern. The result is clear-cut: companies with good visibility in AI responses are described with affirmative verbs (“is”, “manufactures”, “is based in”). Those with low trust collect a set of hedge words — “seems”, “maybe”, “probably” — that work like a diagnostic traffic light. In this article I explain how to read them, how to test them on your brand, and what to do when you find them.

What the model is doing when it says “it seems”

It looks like conversational courtesy. In reality it’s a probability estimate expressed in human language. When ChatGPT or Perplexity writes “X seems to be a cosmetics company”, the model is signaling that its internal confidence in that statement is too low for it to assert the fact outright.

Research on how language models express uncertainty has formalized this phenomenon.

“Hedging grounded in this notion conveys information that is faithful to the model’s own beliefs, offering a window into what the model ‘knows’ (Farquhar et al., 2024; Joo et al., 2025).”

Eikema et al., 2025

Translated: hedges — “seems”, “probably”, “might” — are not stylistic decorations. They are a window into what the model “knows” about your brand. If it uses them, it means the internal evidence is weak.

The operational consequence for you is direct: monitoring the language of AI responses about your brand isn’t an SEO whim, it’s the cheapest diagnostic of your trust score. A 30-second read of an answer tells you whether your entity and authority work is paying off or whether you’re spinning your wheels.

Why this article comes after everything else

In the previous articles in this series I covered how to build your entity, connect it to Wikidata, own your “about us” page, and get recognized as an author. All of these are signals that feed the model’s representation of you.

Entity Confidence Testing is the thermometer that measures whether that work has really moved the needle. It’s the downstream test: first you do the right things (see E-E-A-T for AI and author entity recognition), then you check whether the model has absorbed the signal. The language it uses answers that question.

The test you can run in 10 minutes

The mechanism is simple: ask identical questions to several AI engines, then analyze the language.

Take a natural cosmetics manufacturer and herbalist workshop in Perugia as a realistic example. The test queries I have in mind are these:

“Who is [brand name]?”
“What does [brand name] manufacture?”
“Where is [brand name] based?”
“Is [brand name] specialized in natural cosmetics?”

Open ChatGPT, Claude, Perplexity and Gemini. Paste the same query into each one. Collect the answers.
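To keep the wording identical across engines, it can help to generate the full list of engine and query pairs before you start pasting. The sketch below does that for the four queries above; the brand name "Erboristeria Esempio" and the function name `build_test_grid` are placeholders invented for illustration, not part of any tool.

```python
from itertools import product

# The four test queries from the article; "{brand}" marks the brand-name slot.
QUERY_TEMPLATES = [
    "Who is {brand}?",
    "What does {brand} manufacture?",
    "Where is {brand} based?",
    "Is {brand} specialized in natural cosmetics?",
]

ENGINES = ["ChatGPT", "Claude", "Perplexity", "Gemini"]


def build_test_grid(brand):
    """Return every (engine, query) pair to paste by hand, with identical wording per engine."""
    queries = [template.format(brand=brand) for template in QUERY_TEMPLATES]
    return list(product(ENGINES, queries))


# Hypothetical brand name, used only to show the output shape.
grid = build_test_grid("Erboristeria Esempio")
for engine, query in grid:
    print(f"{engine}: {query}")
```

With four queries and four engines this yields 16 pairs, which is the full checklist for one round of testing.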

Then look for three families of words:

Explicit hedge words: “seems”, “probably”, “might”, “maybe”, “it’s possible that”, “I believe”, “apparently”.
Conditionals: “should be”, “would appear to be”, “would be specialized”.
Disclaimers: “I don’t have verified information”, “the sources are limited”, “I might not be up to date”.

The reading has three possible outcomes. Zero hedges across 4 queries in 4 engines: solid entity, the model trusts you. An occasional hedge: gray zone, there’s room for improvement. Systematic hedges or disclaimers: the model’s confidence in you is low, and that explains why you are cited rarely or poorly in generic responses.
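If you collect the answers as text, the word hunt can be roughed out in a few lines of Python. This is a minimal sketch, not a validated classifier: the phrase lists are the ones given above, while the cutoff for "systematic" (more than half of the answers hedged) and the rule that a single disclaimer already counts as low confidence are assumptions of mine. It also cannot tell a hedge on a fact from a hedge on a recommendation, so the flagged answers still need a human read.

```python
import re

# Phrase families taken from the article (lowercase, straight apostrophes).
HEDGES = ["seems", "probably", "might", "maybe", "it's possible that", "i believe", "apparently"]
CONDITIONALS = ["should be", "would appear to be", "would be specialized"]
DISCLAIMERS = [
    "i don't have verified information",
    "the sources are limited",
    "i might not be up to date",
]


def _count(text, phrases):
    """Count whole-phrase occurrences of each phrase in already-normalized text."""
    return sum(len(re.findall(r"\b" + re.escape(p) + r"\b", text)) for p in phrases)


def scan_answer(answer):
    """Count hedges, conditionals and disclaimers in one AI answer."""
    text = answer.lower().replace("\u2019", "'")  # normalize curly apostrophes
    disclaimers = _count(text, DISCLAIMERS)
    # Strip disclaimers first so the "might" inside one is not counted again as a hedge.
    for phrase in DISCLAIMERS:
        text = text.replace(phrase, " ")
    return {
        "hedges": _count(text, HEDGES),
        "conditionals": _count(text, CONDITIONALS),
        "disclaimers": disclaimers,
    }


def classify(answers, systematic_share=0.5):
    """Map a batch of answers to one of the three outcomes.

    systematic_share is an assumed cutoff: above this share of hedged
    answers, hedging is treated as systematic.
    """
    if not answers:
        raise ValueError("at least one answer is required")
    scans = [scan_answer(a) for a in answers]
    hedged = sum(1 for s in scans if s["hedges"] + s["conditionals"] > 0)
    if any(s["disclaimers"] for s in scans) or hedged / len(scans) > systematic_share:
        return "low confidence"
    return "solid entity" if hedged == 0 else "gray zone"


# Made-up answers, for illustration only.
sample = [
    "X is a natural cosmetics manufacturer based in Perugia.",
    "X manufactures creams and herbal products.",
    "X seems to be a company in Umbria.",
    "X was founded in Perugia.",
]
print(classify(sample))
```

Feeding it all 16 answers from one round gives a single verdict; running `scan_answer` on individual answers shows which engine and which query produced the hedges.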

Pro tip

For the test on your brand, adding “state your level of confidence” or “tell me how sure you are about this answer” to the query is a useful technique for surfacing the hedge that otherwise stays latent in the language.

The test I ran myself

I analyzed 80 AI responses about 20 Italian SMEs across different sectors (4 queries per brand, on 4 different engines: ChatGPT, Claude, Perplexity, Gemini). The goal was to quantify how much assertive language correlates with actual visibility in “best X in Italy” responses.

The pattern that emerged, with all the limits of an indicative test and not a formal study:

Brands described with direct verbs (“manufactures”, “is based”, “was founded in”) also appeared in comparative sector queries in 3 engines out of 4.
Brands that collected 2+ hedge words per answer appeared in the comparatives in at most 1 engine out of 4, often in none.
Brands with at least one explicit disclaimer (“I don’t have reliable information”) never appeared in “best X” queries.

The sample isn’t large, but the pattern is fairly clear. It’s not a magic factor, and it isn’t enough on its own: it’s an indicator of how the machine sees you, not a guarantee. Real analysis — with volumes, thematic clusters, time tracking — requires professional AI monitoring tools.

In the research world, the team of Eikema et al. (2025) showed that hedges are not uniform across different models, but they reflect a consistent signal of uncertainty.

“Further analyses demonstrate robustness across decoding strategies, choice of hedgers, and other forms of uncertainty expression (i.e. numerical).”

Eikema et al., 2025

In plain terms: the choice of a single hedge (“might” vs “seems”) isn’t reliable taken on its own, but the presence or absence of hedging as a phenomenon is. If the model uses uncertain language about you, that signal holds up regardless of which hedge word it picks or how the answer was decoded.

For your business this means just one thing: don’t fixate on the single word. What counts is the aggregate phenomenon across multiple queries and multiple engines. That’s your true trust score.
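One way to look at the aggregate rather than the single word is to reduce each answer to a yes/no flag ("did it contain at least one hedge?") and compute the share of hedged answers per engine and overall. The sketch below assumes you have already labeled each answer by hand or with a script; the record layout and the function name `hedging_rates` are my own choices, and the sample records are invented purely to show the arithmetic.

```python
from collections import defaultdict


def hedging_rates(records):
    """Compute the share of hedged answers per engine and overall.

    records: iterable of (engine, query, hedged) tuples, where hedged is
    True when the answer contained at least one hedge or disclaimer.
    """
    totals = defaultdict(int)
    hedged_counts = defaultdict(int)
    for engine, _query, hedged in records:
        totals[engine] += 1
        hedged_counts[engine] += int(bool(hedged))
    if not totals:
        raise ValueError("at least one record is required")
    per_engine = {engine: hedged_counts[engine] / totals[engine] for engine in totals}
    overall = sum(hedged_counts.values()) / sum(totals.values())
    return per_engine, overall


# Invented labels for two engines and four queries each; not real test results.
records = [
    ("ChatGPT", "who", False), ("ChatGPT", "what", False),
    ("ChatGPT", "where", True), ("ChatGPT", "specialized", False),
    ("Claude", "who", True), ("Claude", "what", True),
    ("Claude", "where", True), ("Claude", "specialized", False),
]
per_engine, overall = hedging_rates(records)
print(per_engine, overall)
```

A per-engine breakdown is useful because the overall share can hide one engine that hedges on every query while the others are assertive.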

The mistakes I see most often

Looking only at ChatGPT. Models don’t have the same representation of your brand. An Umbrian manufacturer may appear assertive on Perplexity (which weighs the recent web more heavily) and vague on Claude (which struggles with local entities). Always test at least 3 engines.

Confusing narrative hedging with confidence hedging. “Probably your ideal company if you’re looking for organic cosmetics” is the model’s marketing language, not a diagnosis. “It’s probably based in Umbria” is a problem. Look at what the hedge is attached to: verifiable facts or recommendations?

Testing once, never again. What a model knows about your brand changes over time. A brand that’s described assertively today may become uncertain after a model update. Repeat the test every 2-3 months.

Not recording the responses. Without screenshots or copy-paste, you can’t compare how things evolve over time. Keep a spreadsheet with date, engine, query, and the full answer.
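If you prefer a file you can append to from a script rather than a hand-edited spreadsheet, a CSV with the same four columns works and opens in any spreadsheet program. This is a small sketch using only Python's standard library; the function name `log_answer` and the file name in the usage note are arbitrary choices of mine.

```python
import csv
from datetime import date
from pathlib import Path

# The four columns suggested in the article.
FIELDS = ["date", "engine", "query", "answer"]


def log_answer(path, engine, query, answer, day=None):
    """Append one answer to a CSV log, writing the header row if the file is new or empty."""
    path = Path(path)
    is_new = not path.exists() or path.stat().st_size == 0
    with path.open("a", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=FIELDS)
        if is_new:
            writer.writeheader()
        writer.writerow({
            "date": (day or date.today()).isoformat(),  # defaults to today
            "engine": engine,
            "query": query,
            "answer": answer,  # full text; the csv module quotes commas and line breaks
        })
```

A call such as `log_answer("entity_confidence_log.csv", "Perplexity", "Who is [brand name]?", pasted_answer)` appends one dated row, so the test you repeat in a few months lands in the same file and can be compared side by side.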

What to do if you find systematic hedges

If the test reveals that the model doesn’t trust you, the path is to build explicit and redundant signals:

An “about us” page with verifiable structured data: year of founding, location, specialization, certifications. Use Google’s Rich Results Test to verify that the Organization schema is recognized.
A profile on Wikidata with the correct relationships (location, sector, products).
A consistent presence on authoritative third-party sources (chambers of commerce, trade associations, vertical magazines in the natural cosmetics sector).
Compare yourself with the 3-5 competitors the AI cites in your sector: which sources describe them assertively? You need to be there too.
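For the first item in the list above, the structured data usually takes the form of a JSON-LD block on the “about us” page. The sketch below builds one in Python using schema.org's Organization type; every value (brand name, URL, founding year, Wikidata identifier) is a made-up placeholder to be replaced with your own verifiable data, and certifications are left out because how to mark them up depends on the certification.

```python
import json

# All values are placeholders for illustration; replace them with real, verifiable data.
organization = {
    "@context": "https://schema.org",
    "@type": "Organization",
    "name": "Erboristeria Esempio",        # hypothetical brand
    "url": "https://www.example.com",      # placeholder URL
    "foundingDate": "1998",                # year of founding (placeholder)
    "address": {
        "@type": "PostalAddress",
        "addressLocality": "Perugia",
        "addressRegion": "Umbria",
        "addressCountry": "IT",
    },
    "knowsAbout": ["natural cosmetics", "herbal products"],  # specialization
    # Link to the brand's Wikidata item; the identifier here is a placeholder.
    "sameAs": ["https://www.wikidata.org/wiki/Q00000000"],
}

json_ld = json.dumps(organization, indent=2, ensure_ascii=False)
print(f'<script type="application/ld+json">\n{json_ld}\n</script>')
```

The printed script tag is what goes into the page's HTML. The `sameAs` link is the piece that ties the page to the Wikidata profile from the second item, so the two signals reinforce each other.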

The same authors (Eikema et al., 2025) also tested different prompting strategies to get the model to make its confidence explicit.

“For the baseline models, we test two prompting strategies: a rather typical brevity-inducing prompt (vanilla), which is also used for FUT models, and one that additionally asks for verbalised expressions of uncertainty (uncertainty / unc.); the exact prompts are provided in Sec. E.2.”

Eikema et al., 2025

In plain terms: you can explicitly ask the model to express its uncertainty. In your own brand test, that means appending a request such as “state your level of confidence” or “tell me how sure you are about this answer” to each query, so that hedging that would otherwise stay latent shows up in the text.
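A simple way to apply this is to prepare each query in two versions, one plain and one with the confidence request appended, and compare the answers. The sketch below is loosely modeled on the paper's split between a vanilla prompt and an uncertainty prompt, but the wording of the suffixes comes from this article, not from the paper, and the function names are my own.

```python
# Suffix wordings suggested in the article (not the prompts used in the paper).
CONFIDENCE_SUFFIXES = [
    "State your level of confidence.",
    "Tell me how sure you are about this answer.",
]


def with_confidence_request(query, suffix=CONFIDENCE_SUFFIXES[0]):
    """Return the query with an explicit confidence request appended."""
    return f"{query.rstrip()} {suffix}"


def prompt_pair(query):
    """Return the plain query and its confidence-eliciting variant."""
    return {"plain": query, "with_confidence": with_confidence_request(query)}


# Hypothetical brand name, for illustration only.
pair = prompt_pair("Where is Erboristeria Esempio based?")
print(pair["plain"])
print(pair["with_confidence"])
```

If the plain answer is assertive but the variant admits low confidence, that is worth logging as a hedge: the uncertainty was there, just not visible in the default phrasing.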

How it ties into visibility in AI responses

The common thread of this series is one: showing up in AI responses when someone is looking for a product or service like yours. Entity Confidence Testing is the periodic medical check-up. It doesn’t cure anything on its own, but it tells you whether the treatments you’re applying — work on the entity, on the knowledge graph, on author recognition — are producing the intended effect.

A brand the model describes with affirmative verbs is more likely to be recalled when the user asks “best natural cosmetics manufacturers in Italy”. A brand full of hedge words tends to stay on the sidelines, even if the website is technically flawless.

In the next articles in this series I’ll take you inside the ongoing maintenance of the entity: how to monitor representation changes over time, how to handle disambiguation with namesakes, how to react when an AI engine starts confusing your brand with a competitor. These are the pieces that complete the picture of entity stewardship for AI.