The AI doesn't decide on its own which category to place your company in: it infers it by looking at who gets cited alongside you. If your name only appears in generic contexts or next to brands from different sectors, then as far as the AI is concerned you don't belong to any specific category, and when someone looks for an expert in your field, you're not among the options. Building the right associations is a precise, repeatable strategy: once you're positioned next to the reference points of your sector, the AI becomes far more likely to cite you in the questions that matter.
Ask ChatGPT “who are the leading digital marketing consultants in Italy?” and look at the list. The brands that appear together in that answer are co-cited — and within the model, they are connected in vector space. If your brand appears next to the sector leaders, the AI associates you with the same category. If you don’t appear, then as far as the AI is concerned you don’t exist in that category.
This is the last technical mechanism of Pillar 1, and it closes a precise journey: from the architecture of the model, to information retrieval, to reasoning, to training, to evaluation metrics. Co-citation is where everything converges — it’s the visible result of how the model learned to classify the world.
The technical principle: how co-citation patterns emerge from training
Within the training corpus, the model doesn’t just learn isolated meanings; it also learns the relationships between entities. When brands, people or concepts appear together across thousands of documents, their embeddings (vector representations) move closer together in vector space.
The mechanism is one of token co-occurrence. As Zhou et al. (2024) document:
“An n-gram is a re-grouping of token-level sequences to measure the co-occurrences of tokens.”
— Zhou et al., 2024
Co-occurrences aren’t only about isolated words, but about sequences. If “Brand A” and “Brand B” appear in the same textual sequence across tens of thousands of documents, the model builds a vector association between the two. This association persists in the model — it was encoded during pre-training and reinforced during fine-tuning.
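To make the idea of co-occurrence concrete, here is a minimal sketch that counts how many documents mention two brands together. It is only a document-level proxy for the signal, not a description of how a language model is actually trained. The brand names and the four-sentence corpus are hypothetical, invented for illustration.

```python
from collections import Counter
from itertools import combinations

# Hypothetical brand names and a toy corpus; none of this is real data.
BRANDS = ["BrandA", "BrandB", "BrandC"]
documents = [
    "BrandA and BrandB both offer email automation.",
    "Comparing BrandA with BrandB for small teams.",
    "BrandC publishes general marketing tips.",
    "BrandA, BrandB and BrandC were at the same conference.",
]


def cooccurrence_counts(docs, brands):
    """Count, for each pair of brands, how many documents mention both."""
    pair_counts = Counter()
    for doc in docs:
        # Naive substring match: good enough for a toy corpus only.
        present = sorted(b for b in brands if b in doc)
        for pair in combinations(present, 2):
            pair_counts[pair] += 1
    return pair_counts


counts = cooccurrence_counts(documents, BRANDS)
for pair, n in counts.most_common():
    print(pair, n)
```

In this toy corpus BrandA and BrandB share three documents, while BrandC shares only one with each of them: the kind of imbalance that, at scale, would be expected to pull two brands together and leave a third on the margin.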
From this follows a structural effect: brands that frequently appear together form a category cluster in vector space. When a user asks “who are the best in sector X?”, the model retrieves the brands in that category’s cluster — not based on an explicit list, but because their embeddings are close to one another and to the concept of “sector X”.
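The notion of a cluster being "close" to a category can be illustrated with cosine similarity. The sketch below uses tiny three-dimensional vectors that are made up for the example; real model embeddings have hundreds or thousands of dimensions and are not directly observable from outside. All names (`LeaderA`, `LeaderB`, `GenericBlog`) are hypothetical.

```python
import math

# Toy 3-dimensional vectors, invented for illustration only.
embeddings = {
    "email marketing": [0.9, 0.1, 0.0],
    "LeaderA": [0.8, 0.2, 0.1],
    "LeaderB": [0.85, 0.15, 0.05],
    "GenericBlog": [0.1, 0.9, 0.2],
}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def nearest(concept, k=2):
    """Return the k names whose vectors are most similar to the concept."""
    target = embeddings[concept]
    scored = [
        (name, cosine(vec, target))
        for name, vec in embeddings.items()
        if name != concept
    ]
    scored.sort(key=lambda item: item[1], reverse=True)
    return [name for name, _ in scored[:k]]


print(nearest("email marketing"))
```

With these assumed vectors, the two leaders rank far above the generic blog when the query concept is "email marketing", which is the geometric picture behind "belonging to a category".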
Your AI positioning is not a property of your website. It’s a property of your co-citations.
Why third-party sources matter more than your own website
Here’s a point many people misunderstand: the model doesn’t weight all sources equally.
The GEO paper by Aggarwal et al. (2023) points in this direction:
“Including citations, quotations from relevant sources, and statistics can significantly boost source visibility.”
— Aggarwal et al., 2023
It’s not enough for your own site to mention your brand alongside the sector leaders. What counts are the co-citations on third-party sources: industry articles, tool comparisons, professional directories, interviews, mentions in vertical media. Those are the co-citations that build the association in the training data — not self-description.
A research group in 2025, in “Generative Engine Optimization: How to Dominate AI Search”, clarifies why this is structural, not accidental:
“AI Search exhibits a systematic and overwhelming bias towards Earned media — third-party, authoritative sources.”
The bias toward Earned media isn’t a bug — it’s the result of deliberate design. Models are trained and optimized to prefer independent sources over self-declarations. This means that a co-citation on your own site carries marginal weight. A co-citation on an authoritative, third-party source carries substantial weight.
The practical consequence is that a co-citation strategy is not a traditional content marketing strategy. It’s a PR and Earned media strategy with a precise technical goal: building the vector associations the AI learned during training.
There’s also an interaction with the metrics analyzed in previous weeks. A co-citation on a high-perplexity source contributes little, because the resulting vector associations are less stable. A co-citation in a structured comparative article with verifiable data, one that scores well on the signals discussed earlier (BLEU/ROUGE and TruthfulQA), strengthens both the vector association and the log-probability of your brand within the model.
What this concretely means for your AI visibility
Three direct effects, all demonstrable:
Who appears alongside you builds your category. If you’re co-cited with the leading email marketing tools, your embedding shifts toward the “email marketing” cluster. If you’re co-cited with generic marketing blogs, your embedding stays generic. The cluster you belong to isn’t the one you claim — it’s the one your co-citations build.
The absence of co-citation is category invisibility. A brand may have an excellent website and outstanding content, but if it never appears in articles that also cite the sector leaders, then as far as the model is concerned it doesn’t belong to that category. This explains why lesser-known brands can surface in AI answers while better-known competitors are missing: the former are co-cited with the cluster, the latter aren’t.
Negative co-citations are harmful. A brand frequently mentioned alongside low-quality sources, in critical contexts, or in unrelated sectors, builds vector associations that pull it away from the target cluster. Monitoring co-citations isn’t just a brand reputation exercise — it’s a technical variable for AI visibility.
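The three effects above suggest a simple audit of the mentions you already have. The sketch below sorts a hand-collected list of mentions into three buckets: co-cited with the target cluster, off-cluster, and negative. The data structure, the source names and the `negative` flag are all assumptions for illustration; in practice you would fill them in by hand or from a monitoring tool of your choice.

```python
# Hypothetical target cluster and mentions; replace with your own data.
TARGET_CLUSTER = {"LeaderA", "LeaderB", "LeaderC"}

mentions = [
    {"source": "industry-review.example",
     "co_mentioned": {"LeaderA", "LeaderB"}, "negative": False},
    {"source": "generic-blog.example",
     "co_mentioned": {"UnrelatedBrand"}, "negative": False},
    {"source": "complaints-forum.example",
     "co_mentioned": {"LeaderA"}, "negative": True},
    {"source": "own-site.example",
     "co_mentioned": set(), "negative": False},
]


def audit(mention_list, cluster):
    """Bucket each mention as in_cluster, off_cluster or negative."""
    report = {"in_cluster": [], "off_cluster": [], "negative": []}
    for m in mention_list:
        if m["negative"]:
            # Negative context takes priority over who else is cited.
            report["negative"].append(m["source"])
        elif m["co_mentioned"] & cluster:
            report["in_cluster"].append(m["source"])
        else:
            report["off_cluster"].append(m["source"])
    return report


report = audit(mentions, TARGET_CLUSTER)
for bucket, sources in report.items():
    print(bucket, sources)
```

Treating a negative mention as its own bucket, even when a leader is cited alongside, reflects the point that a harmful association is not offset by the company it keeps.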
What to do concretely
Identify your target cluster. Ask the leading AI engines this question: “who are the leading [your category] in [your market]?” The brands that appear form the cluster you need to reach. Run the same query on ChatGPT, Perplexity and Gemini — the cluster is stable if it appears on all three.
Build explicit co-citation on third-party sources. Every external mention of your brand is an opportunity to build co-citation. When you earn a mention — press release, guest post, interview, directory — make sure the context also includes the cluster’s brands: “Like [Leader A] and [Leader B], [your brand] also operates in [category].” This isn’t self-promotion — it’s category engineering.
Seek placement in comparative articles. Articles such as “Top 10 [sector]”, “Best [tool] of 2025”, “Comparison between [tool A] and [tool B]” are the formats with the highest impact on co-citation. Being included in these articles on authoritative sources is the direct path into the cluster. The source must be Tier 1-2: vertical media, industry publications, independent reviews.
Produce first-person comparative content. Even though it carries less weight than third-party sources, write articles that compare your service with the sector leaders in an informative, non-disparaging way: “How to choose between [Leader A], [Leader B] and [your brand]: differences, use cases, selection criteria.” You’re building co-citation on a source you control, and giving the AI a ready-made comparison to draw on when a user runs a direct comparative query between you and the leader.
Monitor toxic co-citations. If your brand appears in negative contexts or with low-quality brands, the vector association is active — and harmful. The absence of signal is preferable to negative signal.
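The first step, identifying a cluster that is stable across engines, comes down to a set intersection. Here is a minimal sketch, assuming you have copied the brand names from each engine's answer by hand; the lists below are hypothetical and no engine API is called.

```python
# Brand lists copied manually from each engine's answer (hypothetical data).
answers = {
    "ChatGPT": ["LeaderA", "LeaderB", "LeaderC", "LeaderD"],
    "Perplexity": ["LeaderB", "LeaderA", "LeaderE", "LeaderC"],
    "Gemini": ["LeaderA", "LeaderC", "LeaderB"],
}


def stable_cluster(answers_by_engine):
    """Return the brands that appear in every engine's answer."""
    brand_sets = [set(brands) for brands in answers_by_engine.values()]
    if not brand_sets:
        return set()
    return set.intersection(*brand_sets)


cluster = stable_cluster(answers)
print(sorted(cluster))
```

Brands that appear on only one engine (here `LeaderD` and `LeaderE`) drop out, leaving the names worth targeting for co-citation.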
How to check your current situation
Four practical checks, to be done in order:
- Direct category test: ask ChatGPT “who are the leading [your category] in [your market]?” Does your brand appear? Which other brands are you listed with?
- Co-occurrence test: search for your brand on Google together with the names of three leaders in your sector. Are there pages that mention you all together? On which sources?
- AI category test: ask Perplexity “compare [Leader A] with [Leader B] for [use case]”. Does your brand appear as an alternative or as a point of comparison?
- Cluster distance test: ask ChatGPT “which brands are similar to [your brand]?” and “which brands are similar to [Leader in your sector]?” Do the lists overlap? The embeddings themselves can’t be inspected from outside, so treat the overlap as a rough proxy for the vector distance between your embedding and the cluster.
If you don’t appear in the direct category test or the AI category test, you don’t have enough co-citation with the cluster. If the co-occurrence test returns no results, you don’t have co-citation on authoritative third-party sources. Repeat the tests every three months: the training data is updated over time, so your position relative to the cluster can change.
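The overlap in the cluster distance test can be turned into a number you track from quarter to quarter. One simple option, chosen here as an assumption rather than an established standard for this purpose, is the Jaccard index: the size of the intersection divided by the size of the union. The two lists below are hypothetical answers copied by hand.

```python
def jaccard(list_a, list_b):
    """Jaccard index of two lists of brand names, from 0.0 to 1.0."""
    a, b = set(list_a), set(list_b)
    union = a | b
    if not union:
        return 0.0
    return len(a & b) / len(union)


# Hypothetical answers to "which brands are similar to ...?"
similar_to_you = ["LeaderA", "GenericBlog", "LeaderB", "OtherCo"]
similar_to_leader = ["LeaderA", "LeaderB", "LeaderC"]

overlap = jaccard(similar_to_you, similar_to_leader)
print(f"overlap: {overlap:.2f}")
```

A value near 0 means the two lists share almost nothing; a value that rises over successive checks suggests your brand is being described in the same terms as the leader. Answers vary between runs, so it is worth averaging a few repetitions before reading much into a single score.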
The circle closes: where we started
This pillar started from architecture — from tokens, embeddings, the attention mechanism. Then it analyzed retrieval: how the model finds and ranks sources. Then reasoning: how it builds coherent answers. Then training: how it learned to prefer certain sources and penalize others. Finally the metrics: the technical signals that measure the quality of your content in the model’s eyes.
Co-citation patterns are where everything comes together. Your brand exists in the AI’s answer because its embedding is close to the category cluster — a proximity built by co-citations on authoritative sources, not by content published on your own site.
Building on this foundation requires authority: how your brand is assessed independently of your own claims, how third-party sources represent you, how you build a reputation that models recognize as trustworthy. That’s the theme of Pillar 2: Authority and Credibility for AI. From here, the work is no longer only about content — it’s about who talks about you, how they do it, and why the model should believe them.