How AI engines think

Is AI inventing things about your brand? It happens when it can’t find reliable data

Roberto Serra · 25 June 2026 · ~10 min read

Ask ChatGPT about your company right now. You might discover that it is telling your potential clients that you offer services you don’t have, that you’re based in a city where you don’t operate, or that you were founded in the wrong year. AI invents when it can’t find reliable data, and this false information is read every day by people who were considering contacting you. The damage is silent but constant. There are concrete ways to correct this information and get AI systems to describe you accurately.

You’ve probably already run this test. You opened ChatGPT, typed your company name, and read the answer. And something didn’t add up. A service you don’t offer. A location you never had. A wrong founding year. Maybe the founder’s name was mangled, or the phone number was never yours.

At that moment you thought: it’s a glitch, it rarely happens, it doesn’t really matter.

It matters. Here’s why.

Every day, your potential clients ask ChatGPT, Perplexity, and Gemini the same question, and they receive the same wrong answer. They don’t know it’s wrong: the AI doesn’t warn them, doesn’t add a footnote, doesn’t distinguish between what it knows and what it invented. They form an impression. They make decisions. And you’re not in the room when it happens.

This phenomenon is called hallucination. It’s one of the most studied structural problems in language model research, and for your brand it has concrete consequences that no communication campaign can correct in real time.

Why models invent instead of admitting they don’t know

Minaee et al. (2025) document it directly in their studies on LLM reliability:

“Hallucination is one of the important factors in measuring how much a large language model is trustworthy.” — Minaee et al., 2025

A model’s trustworthiness is also measured by its ability not to invent. The problem is structural: models aren’t designed to admit uncertainty; they’re designed to generate plausible answers. When they have abundant and consistent data on a topic, the answer is probably correct. When data is scarce, the model fills the gaps with statistical patterns: it generates information that “sounds” correct because it resembles what it knows about similar entities.

For a large brand present in thousands of documents in the training data, hallucinations are rare: the model has too many cross-confirmations to get basic data wrong. For a smaller brand (an SMB, a professional, a niche business), hallucinations are frequent, because the model has few reference points and uses them to reconstruct an image that is plausible but not necessarily true.

This is not a flaw that will simply disappear in the next version of the model. It’s a consequence of how text generation works: the model doesn’t have a list of “things I don’t know”. It only has probability distributions over tokens. When data is missing, that distribution still assigns probability to some continuation, and the model still outputs one.
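A minimal sketch of that point, using made-up scores rather than a real model: softmax turns any set of scores into probabilities that sum to 1, so picking the most likely continuation always returns something, even when the scores are nearly flat. The candidate cities and the score values are illustrative assumptions, not output from any AI engine.

```python
import math


def softmax(logits):
    """Turn arbitrary scores into probabilities that sum to 1."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def most_likely(candidates, logits):
    """Return the highest-probability candidate and its probability."""
    probs = softmax(logits)
    best = max(range(len(candidates)), key=lambda i: probs[i])
    return candidates[best], probs[best]


# Hypothetical continuations for "The company is based in ..."
cities = ["Milan", "Rome", "Turin", "Bologna"]

# Well-documented brand: one continuation clearly dominates.
well_documented = most_likely(cities, [6.0, 1.0, 0.5, 0.2])

# Sparse data: the scores are almost flat, yet a city is still returned.
sparse_data = most_likely(cities, [1.1, 1.0, 0.9, 1.0])

print(well_documented)
print(sparse_data)
```

In both cases a city comes back. Only the underlying probability differs (above 0.9 in the first case, under 0.3 in the second), and that difference is not visible in the text of the answer.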

The forms hallucinations take about your brand

They’re not all the same. Some are easy to identify, others almost invisible.

Invented data are the most obvious: wrong address, nonexistent phone number, incorrect founding year, made-up employee count. The model reconstructed them by analogy with other companies in your sector and of your perceived size.

Phantom services are more insidious. The model attributes services to you that you don’t offer, because they exist in your industry, because you wrote about that topic on your blog, because a partner of yours offers them and the two brands are often mentioned together. The potential client looking for that service calls you. They find out you don’t do it. They wonder why the AI gave them wrong information. Maybe they no longer trust you.

Confusion with namesakes is common: if there’s another brand with a name similar to yours, the information gets mixed up. The model doesn’t do a lookup — it builds an aggregate representation of everything it has read, and two entities with similar names tend to overlap.

Incorrect extrapolation is subtle: you published an article that mentions “artificial intelligence” and now the model describes you as a company that offers AI services. You weren’t saying that — you were analyzing the topic — but the model inferred an association and crystallized it.

Invented quotes are the most serious: the model can generate opinions, reviews, or statements attributed to your brand that you never made. They don’t exist anywhere — they’re linguistic patterns projected onto you. If someone takes them as real, the consequences are hard to contain.

The connection with the knowledge cutoff

These two dynamics — hallucination and knowledge cutoff — are directly connected. As explained in the article on the knowledge cutoff and the obsolescence of information about your brand, the model has a date beyond which it can’t see. But the problem isn’t just obsolescence.

The problem is that the cutoff creates gaps. And the gaps get filled.

You opened a new location six months ago. You discontinued a service. You changed market focus. You have a new CEO. The model doesn’t know, because that information wasn’t yet in the training corpus at the time of the cutoff. And when someone asks, it doesn’t answer “I don’t know”: it answers with what it had, supplemented by what it expects a company like yours to have.

The result can be worse than plainly outdated information. Old information is at least a real snapshot of a past moment. Information built by analogy doesn’t correspond to any real moment.

Why grounding is the structural answer

Gao et al. (2024), in one of the most cited papers on RAG architecture, identify hallucination as one of the systematic risks of AI generation:

“In generating responses, the model may face the issue of hallucination, where it produces content not supported by the retrieved documents.” — Gao et al., 2024

Note the wording: “not supported by the retrieved documents”. When the model has retrieved documents that support the claim, hallucination decreases. When the documents are missing or insufficient, the model goes ahead anyway — but with no real basis.

This is the mechanism that the article on grounding and attributed citation explains in detail: your content must be structured so that the model can anchor to it. Specific data, verifiable claims, information that the model couldn’t generate on its own. Every groundable piece of data you publish is an anchor point that reduces the space available for invention.

The logic is simple: the model invents when it has no data. The solution isn’t to ask the model not to invent — it’s to provide abundant, consistent, structured data across multiple channels. Less room for uncertainty means fewer hallucinations.
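As a rough illustration of this logic, the toy check below marks a claim as grounded only if its key value appears in at least one retrieved document. It is a deliberately naive substring match over hypothetical documents and claims about a fictional brand, not a description of how any AI engine verifies support.

```python
def grounded_claims(claims, documents):
    """Map each claim label to True if its value appears in any document."""
    lowered = [doc.lower() for doc in documents]
    return {
        label: any(value.lower() in doc for doc in lowered)
        for label, value in claims.items()
    }


# Hypothetical retrieved documents about a fictional brand.
documents = [
    "Acme Studio is a design agency founded in 2012 in Bologna.",
    "Acme Studio offers branding and packaging design.",
]

# Hypothetical claims that a generated answer might contain.
claims = {
    "founding year": "2012",
    "city": "Bologna",
    "service": "SEO consulting",  # not present in any document
}

report = grounded_claims(claims, documents)
unsupported = [label for label, ok in report.items() if not ok]
print(unsupported)
```

The claim with no supporting document is exactly the kind of gap that published, specific data is meant to close.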

The systemic risk in multi-agent contexts

There’s a dimension of the problem that goes beyond the single chatbot. Xu et al. (2026), analyzing multi-agent AI systems, identify a propagation risk:

“These interactions breach traditional trust boundaries, where localized malicious inputs or model hallucinations can propagate through the system.” — Xu et al., 2026

Hallucinations don’t stay confined. In an ecosystem where different AI systems consult each other, aggregate, and use one another as sources, wrong information can propagate. A model that has generated a hallucination about your brand can become a source — direct or indirect — for other systems. The problem doesn’t stay isolated within the single exchange.

It follows that periodic monitoring isn’t enough on its own: a hallucination corrected today may well resurface tomorrow because a different system has it in its context. The lasting answer is to reduce the probability of hallucination at the source, providing reliable data that all systems can rely on.

How to monitor hallucinations about your brand

The first step is to know what the AI says about you right now. Don’t trust your memory — run the test.

Open ChatGPT, Gemini and Perplexity. For each one, ask these questions and note every piece of information that doesn’t match reality:

What is [your brand] and what does it do?
Where is [your brand] located?
What services does [your brand] offer?
Who founded [your brand] and when?
How much does [your main service] cost?
What are the opinions about [your brand]?
Who are the typical clients of [your brand]?
Who are the competitors of [your brand]?

For this audit, treat every wrong piece of information as a hallucination. For each one, do a second analysis: does a source with that wrong information exist somewhere on the web, or did the model build it from scratch? If a wrong source exists, correcting it is a priority. If it was built from scratch, the problem is the scarcity of reliable data.
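One possible way to keep this audit repeatable is sketched below: the eight questions become templates, and each wrong answer is logged with a flag that says whether a matching wrong source was found on the web. The `Finding` structure, its field names, and the sample brand are assumptions made for illustration.

```python
from dataclasses import dataclass

QUESTION_TEMPLATES = [
    "What is {brand} and what does it do?",
    "Where is {brand} located?",
    "What services does {brand} offer?",
    "Who founded {brand} and when?",
    "How much does {service} cost?",
    "What are the opinions about {brand}?",
    "Who are the typical clients of {brand}?",
    "Who are the competitors of {brand}?",
]


def build_questions(brand, service):
    """Fill the templates with the brand name and its main service."""
    return [t.format(brand=brand, service=service) for t in QUESTION_TEMPLATES]


@dataclass
class Finding:
    engine: str         # e.g. "ChatGPT", "Gemini", "Perplexity"
    question: str
    wrong_claim: str
    source_found: bool  # True if a web page containing the wrong claim exists

    def next_step(self):
        """Apply the two-way split described in the text."""
        if self.source_found:
            return "correct the wrong source"
        return "publish reliable data to fill the gap"


# Hypothetical brand and finding, for illustration only.
questions = build_questions("Acme Studio", "brand identity design")
finding = Finding(
    engine="ChatGPT",
    question=questions[2],
    wrong_claim="Offers SEO consulting",
    source_found=False,
)
print(finding.next_step())
```

Asking the questions and judging the answers stays manual here; the sketch only standardizes what gets asked and what gets written down.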

This is work I do with clients on AI visibility monitoring: periodically mapping active hallucinations is the starting point for building a correction strategy based on real data, not assumptions.

What to actually do to reduce hallucinations

It isn’t solved with a single action. It’s solved by saturating the information space with reliable data.

Complete Organization schema on the site: official name, address, phone, website, founding year, CEO, sector, services. Every filled-in field is a data point the model can anchor to. Leaving it empty is an invitation to invention.
Updated Google Business Profile: it’s one of the structured sources that AI systems consult most frequently. Address, hours, categories, description — updated and consistent with the site.
Thorough About Us page: not three lines. A page with history, team, services with specific descriptions, real numbers (clients, years of activity, sectors). This page must answer every question a user would ask about your brand’s identity.
Consistent presence across multiple sources: company LinkedIn, industry directories, Crunchbase if applicable. Consistency across multiple sources is the strongest signal against hallucination: the model cross-checks, finds convergence, has no reason to invent.
Regular publication of verifiable data: as analyzed in the article on grounding, content with original data, documented methodologies, and specific results is what the model prefers to anchor to. It’s also what leaves the least room for invention.
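For the first item in this list, a hedged sketch of what a filled-in Organization schema could look like, generated as JSON-LD. All values are placeholders for a fictional company. The mapping choices are assumptions: the CEO is expressed as an `employee` with a `jobTitle`, the sector through `knowsAbout`, and services through `makesOffer`. Other modelings are possible, so check the result against the schema.org vocabulary before publishing.

```python
import json

# Placeholder data for a fictional company; replace every value with real facts.
organization = {
    "@context": "https://schema.org",
    "@type": "Organization",
    "name": "Acme Studio",
    "legalName": "Acme Studio S.r.l.",
    "url": "https://www.example.com",
    "telephone": "+39 051 000 0000",
    "foundingDate": "2012-03-01",
    "address": {
        "@type": "PostalAddress",
        "streetAddress": "Via Esempio 1",
        "addressLocality": "Bologna",
        "postalCode": "40100",
        "addressCountry": "IT",
    },
    # Assumed modeling of "CEO": a Person with a jobTitle.
    "employee": {"@type": "Person", "name": "Jane Doe", "jobTitle": "CEO"},
    # Assumed modeling of "sector" and "services".
    "knowsAbout": ["Branding", "Packaging design"],
    "makesOffer": [
        {
            "@type": "Offer",
            "itemOffered": {"@type": "Service", "name": "Brand identity design"},
        }
    ],
    # Other official profiles, to reinforce consistency across sources.
    "sameAs": ["https://www.linkedin.com/company/example"],
}

json_ld = json.dumps(organization, indent=2, ensure_ascii=False)
script_tag = f'<script type="application/ld+json">\n{json_ld}\n</script>'
print(script_tag)
```

Keeping these facts in one place also makes it easier to reuse exactly the same name, address, and phone number on the Google Business Profile, LinkedIn, and directories.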

The check you need to do this week

Run the 8-question test on the three main AI systems. Document the errors. Classify each one: missing structural data (schema, GBP), wrong sources to correct, information gaps to fill with new content.

Then tackle the most urgent. Not everything at once — identify which hallucination has the greatest impact on a potential client evaluating your brand, and start there.
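The classification and prioritization steps can be sketched as a small triage routine. The 1 to 5 impact score is an assumption introduced here to make "greatest impact" sortable; the sample errors are invented.

```python
from collections import defaultdict

# The three categories named in the text.
CATEGORIES = (
    "missing structural data",
    "wrong source to correct",
    "information gap",
)

# Hypothetical audit results; impact is a subjective 1-5 score (assumption).
errors = [
    {"claim": "Office in Rome", "category": "missing structural data", "impact": 3},
    {"claim": "Offers SEO consulting", "category": "information gap", "impact": 5},
    {"claim": "Founded in 2009", "category": "wrong source to correct", "impact": 2},
]


def triage(errors):
    """Rank errors by impact (highest first) and group claims by category."""
    for error in errors:
        if error["category"] not in CATEGORIES:
            raise ValueError(f"unknown category: {error['category']}")
    ranked = sorted(errors, key=lambda e: e["impact"], reverse=True)
    by_category = defaultdict(list)
    for error in ranked:
        by_category[error["category"]].append(error["claim"])
    return ranked, dict(by_category)


ranked, by_category = triage(errors)
start_here = ranked[0]["claim"]
print(start_here)
```

The first item in `ranked` is the one to tackle this week; the grouping shows which kind of fix (schema, source correction, new content) the rest will need.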

The hallucination problem is never definitively solved because models update, change, and include new sources. But it can be managed: with periodic monitoring, always-updated structured data, and a multi-channel presence strategy that structurally reduces the space available for invention.

Those who understand this have an advantage over those who wait for the problem to solve itself. It doesn’t solve itself. It gets managed, and it’s best managed before the damage is done.

To explore how chain-of-thought reasoning in AI engines influences which sources get selected, or how tool use changes the probability of hallucination in systems with access to live data, the next articles in this series get into the operational mechanics. To understand how multi-step planning and multi-turn conversations can amplify or contain the problem, there are two more levels of analysis that change the way you think about AI visibility.