Every time ChatGPT answers a question in your industry, your brand name has a precise probability of appearing — and that probability is measurable and modifiable. The brands that have pushed it high get cited almost automatically, as if they were the obvious answer; the others are never named, not even when they would be the best choice. Working on those signals methodically is possible right from the start, and every step taken today strengthens your presence in tomorrow's answers.
There’s a difference between being cited by AI and being generated by AI. Being cited means the model found your name in some source and reported it. Being generated means something far deeper: your brand has such a high probability of being the right next token that the model produces it almost mechanically, without looking for it anywhere.
This is the level at which the dominant brands in every industry operate. And it’s attainable, if you understand the mechanism that governs it.
The physical mechanism: how the model chooses words
To understand log-probability, you need to understand how text generation works at the technical level. It’s not a mysterious operation — it’s applied math.
As Minaee et al. (2025) document in their survey on Large Language Models, the process begins the moment you submit a query: “Given an input prompt, the tokenizer translates each token into a corresponding token ID. Decoding refers to the process of text generation using pre-trained LLMs.”
Each word or word fragment is converted into a numeric ID. From that moment on, the model operates only on numbers. It doesn’t “read” the text in the human sense — it computes.
And at the end of each computation, as the same study clarifies: “the model generates logits, which are converted to probabilities using a softmax function.” The logits are raw scores assigned to every possible next token in the vocabulary. The softmax turns them into probabilities that sum to 1. The token with the highest probability is selected — or one among the most probable, depending on the sampling strategy.
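To make the logits-to-probabilities step concrete, here is a minimal sketch of a softmax in plain Python. The three candidate names and their logit values are invented for illustration; a real model scores every token in a vocabulary of tens of thousands of entries.

```python
import math

def softmax(logits):
    """Convert raw scores (logits) into probabilities that sum to 1."""
    peak = max(logits)  # subtracting the max keeps exp() from overflowing
    exps = [math.exp(x - peak) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical logits for three candidate next tokens (made-up values).
candidates = ["Mailchimp", "Brevo", "UnknownTool"]
logits = [5.0, 3.5, 0.5]

probs = softmax(logits)
for name, p in zip(candidates, probs):
    print(f"{name}: {p:.3f}")
```

Note how a gap of a few points in the raw scores turns into a large gap in probability: the softmax is exponential, so the leading candidate takes most of the mass.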
The log-probability is the logarithm of this probability. Because probabilities lie between 0 and 1, log-probabilities are zero or negative, and “higher” means closer to zero. The logarithm is used for computational reasons (summing logarithms is more numerically stable than multiplying many small probabilities), but the concept stays identical: the higher your brand’s log-probability in a given context, the more likely it is to be generated when the model answers a query in that industry.
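A short sketch of why logarithms are used, assuming made-up per-token probabilities for a brand name that is split into three tokens. It shows that adding log-probabilities is the same as multiplying probabilities, and that the sum survives where the product underflows.

```python
import math

# Hypothetical per-token probabilities for a brand name split into 3 tokens.
token_probs = [0.40, 0.90, 0.95]

token_logprobs = [math.log(p) for p in token_probs]  # each value is <= 0
sequence_logprob = sum(token_logprobs)               # adding logs ...
sequence_prob = math.exp(sequence_logprob)           # ... equals multiplying probabilities

# Why logs: the product of many small probabilities underflows to 0.0,
# while the sum of their logarithms stays a finite, usable number.
tiny = [1e-5] * 100
product = math.prod(tiny)
log_sum = sum(math.log(p) for p in tiny)

print(sequence_logprob, sequence_prob)
print(product, log_sum)
```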
Why this is the real KPI of AI visibility
When a user asks “what’s the best email marketing tool?”, the model reaches the point where it has to generate a name. The candidates are all the brands that have some association with that category in the training data. But the probabilities aren’t equal — they’re the direct result of how many times and in what contexts each brand was associated with that category in the sources the model was trained on.
Mailchimp has thousands of consistent associations with “email marketing” in the training data. Its probability in that context is high, so its log-probability sits relatively close to zero. An unknown tool has few or no associations: its probability is close to zero, so its log-probability is a large negative number.
The model doesn’t “choose”: it computes. And it generates the highest-probability token, or one of the most probable, depending on the sampling strategy.
It’s worth noting how the sampling strategy influences which token actually gets selected. Minaee et al. describe top-k sampling:
“Top-k sampling is a technique that uses the probability distribution generated by the language model to select a token randomly from the k most likely options.”
With high temperature (as we saw in the article on Temperature), the model samples with more variability — so even tokens with average probability can occasionally emerge. With low temperature, the token with the highest log-probability almost always wins. In both cases, however, having a high log-probability is the necessary condition for appearing.
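The interaction between temperature and top-k can be sketched in a few lines. This is an illustrative toy, not how any specific model is configured: the logits, the value of k, and the two temperatures are all assumptions, and `sample_top_k` and `share_of_top` are hypothetical helper names.

```python
import math
import random

def sample_top_k(logits, k, temperature, rng):
    """Pick one index from the k highest logits after temperature scaling."""
    top = sorted(range(len(logits)), key=lambda i: logits[i], reverse=True)[:k]
    scaled = [logits[i] / temperature for i in top]
    peak = max(scaled)
    weights = [math.exp(s - peak) for s in scaled]  # unnormalized softmax
    return rng.choices(top, weights=weights, k=1)[0]

candidates = ["Mailchimp", "Brevo", "UnknownTool"]
logits = [5.0, 3.5, 0.5]  # hypothetical values
rng = random.Random(0)    # fixed seed so the run is repeatable

def share_of_top(temperature, trials=2000):
    """Fraction of samples in which the highest-logit candidate is chosen."""
    hits = sum(sample_top_k(logits, 3, temperature, rng) == 0 for _ in range(trials))
    return hits / trials

low = share_of_top(0.2)   # low temperature: the leader wins almost every time
high = share_of_top(1.5)  # high temperature: the others surface more often
print(f"T=0.2 -> {low:.2f}, T=1.5 -> {high:.2f}")
```

In this toy setup the leader still wins most samples at the higher temperature; what changes is how often the runners-up get a turn.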
What builds your brand’s log-probability
Your brand’s log-probability for a category depends above all on one thing: how often, and in what contexts, your brand was associated with that category in the training data.
It’s not a value you can influence directly — you don’t have access to the training. But you can influence the web from which the training is extracted. Every authoritative source that writes “[your brand] is a [your category] tool” adds an association. Every mention in a relevant context shifts the probability distribution in your favor.
From this it follows that the strategy for increasing log-probability isn’t writing a single excellent piece of content — it’s building a frequent, consistent, and distributed association across multiple sources over time.
Three factors amplify the effect:
Message consistency. If your website says “SEO consultant,” LinkedIn says “digital strategist,” and the industry directory says “web agency,” you’re spreading the probability across three different categories. The model doesn’t build a strong association for any of the three. Concentrate everything on one precise association: “[your brand] + [your category] + [your specific target].”
Volume of sources. Your website alone isn’t enough. You need your website, Google Business, LinkedIn, industry directories, vertical media, Wikipedia if relevant, press releases, bios on platforms, marketplace profiles, citations in third-party articles. Every source that replicates the same association is a vote in favor of your log-probability.
Proximity to dominant brands. If your brand appears in the same context as brands with high log-probability — “like Mailchimp, Brevo, and [your brand], email marketing tools allow you to…” — the model builds a category association that raises your probability even for generic queries. It’s not a trick: it’s how training on real data works.
How to measure where you are now
Before building, measure. The test is simple and requires no tools:
- Open 10 fresh conversations with ChatGPT (not the same one — each conversation is an independent sampling)
- In each, ask: “what’s the best [your category]?” or “who are the leading [your category] in Italy?”
- Count how many times your brand appears
The result is a rough estimate of your relative log-probability:
- 0/10: your probability of appearing is nearly zero, so your log-probability is strongly negative; the model hasn’t built the association
- 3-4/10: you’re in the middle band — the association exists but is weak
- 7-10/10: you’re in the high band — the association is solid and the model generates you automatically
Run the same test on your main competitor. The gap between their score and yours is the log-probability gap you need to close. If they’re at 8/10 and you’re at 0/10, it’s not solved with one piece of content — it’s solved with a systematic plan for building associations.
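Ten conversations is a small sample, so it helps to see how wide the uncertainty is. The sketch below is one possible way to read the counts, not part of the original test: it uses the Wilson score interval, a standard statistical formula, with z = 1.96 for roughly 95% confidence. The 0/10 and 8/10 figures are the example scores from the text, and `wilson_interval` is a hypothetical helper name.

```python
import math

def wilson_interval(hits, trials, z=1.96):
    """Approximate 95% Wilson score interval for an appearance rate."""
    if trials <= 0:
        raise ValueError("trials must be positive")
    p = hits / trials
    denom = 1 + z**2 / trials
    center = (p + z**2 / (2 * trials)) / denom
    half = z * math.sqrt(p * (1 - p) / trials + z**2 / (4 * trials**2)) / denom
    return max(0.0, center - half), min(1.0, center + half)

yours = wilson_interval(0, 10)   # your brand: 0 appearances in 10 conversations
theirs = wilson_interval(8, 10)  # competitor: 8 appearances in 10 conversations
gap = 8 / 10 - 0 / 10            # observed gap in appearance rate

print(f"you:    {yours[0]:.2f} to {yours[1]:.2f}")
print(f"them:   {theirs[0]:.2f} to {theirs[1]:.2f}")
print(f"observed gap: {gap:.1f}")
```

With only ten trials, a 0/10 result is still compatible with a true appearance rate of up to roughly 28%, which is why the test is a rough estimate. Running more conversations narrows the interval.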
The operational plan
The concrete target is to bring the “[your brand] + [your category]” association onto at least 15 distinct, authoritative sources, spread over a span of 6-12 months.
- Months 1-2: define the precise target association. One phrase, not variants. Update it on your website, Google Business, LinkedIn, and all existing directories.
- Months 3-6: build the external sources. Guest posts on vertical media, press releases, citations in other authors’ articles, profiles on marketplaces and aggregators. Each piece must contain the explicit association: not just the brand name, but the brand + the category.
- Months 6-12: maintain the cadence. Log-probability is built through repetition over time, not through an initial burst. An external editorial plan with at least 2 new sources a month is the right approach.
Measure each quarter with the 10-conversation test. The goal isn’t perfection — it’s shifting the distribution in your favor relative to competitors.
The key point
The model has no opinions about your brand. It has probabilities. And probabilities are built with data — specifically, with the frequency and consistency with which your brand appears associated with your category in the sources the model is trained on.
This is the physical mechanism that separates the brands AI generates automatically from those it never cites. It’s not luck, it’s not an opaque algorithm — it’s a probability distribution you can shift.
Start with the 10-conversation test today. What you discover is your real starting point.