If AI engines cite you a hundred times a month, it's likely that 80% of those citations come from two or three specific sources. If you don't know which ones, you're spreading your communication budget across everything without knowing what actually works. Mapping the sources that drive your AI citations turns your strategy from spraying into the crowd to investing where there's already a measurable return.
Of the 150 AI citations collected on your brand over the last 6 months, 3 sites generate 60% of them. Do you know which ones? Mapping the sources is the foundation of everything else.
I’m telling you this because it’s the pattern I see repeating almost boringly every time I open a client’s tracking sheet. The distribution of the sources feeding AI answers about your brand isn’t uniform: a few pages do the bulk of the work, and almost always the brand owner doesn’t even know which ones they are.
In this article I’ll explain how to build this map, why it’s worth every minute you put into it, and what to do with it once you have it in front of you.
What I mean by the source of an AI citation
When Perplexity, ChatGPT with browsing, or Gemini answer a question that concerns you, URLs appear beneath the answer (or linked within it). Those are the grounding sources: the web documents from which the model extracted the information it’s reporting.
The source map is simply the ordered list of those links, collected over an observation period and grouped by domain, author, and content type. It’s a mechanical exercise; it doesn’t require a degree in data science. But it changes everything you do afterward.
Why it’s the prerequisite for any serious measurement
In the previous articles in this series you saw how a citation is counted, how to distinguish a brand mention from a citation with a link, and how to track share of voice in AI answers. All of that is downstream. The source map is upstream: it tells you where the signal comes from.
Without the map, you’re counting the points in the championship without knowing which players scored them. You’re measuring an aggregate result you can neither replicate nor defend. With the map, every citation becomes actionable: you know which page generated the signal, and therefore you know where to reinforce.
The thread is the same as always in my articles: you care about showing up in AI answers, and to show up more you first need to understand which sources already get you cited today.
The underlying principle: the models don’t pick at random
In the research on Retrieval-Augmented Generation (RAG) systems, the documented mechanism is fairly consistent: AI systems select grounding sources based on signals of authority, freshness, and semantic alignment with the query, and, increasingly, on author entity recognition and implicit reference weight. For your business, it follows that the sources picked up about your brand are not random: they are the subset of your web assets (and of those that talk about you) that the system considers most reliable in that context.
The operational consequence is sharp. If you map the sources, you see which assets the system has already “validated.” Those are your thoroughbreds: they need to be updated, expanded, internally linked. Everything else is maintenance spending.
The test you can run in 60 minutes
You need a Google Sheet with 5 columns: date, query, AI engine, source URL, source domain.
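If you keep the log in code rather than by hand, the fifth column can be derived from the fourth. The sketch below is only an illustration: the function name `source_domain`, the example URL, and the sample row are my assumptions, and it treats a leading "www." as noise. A URL pasted without "https://" yields an empty domain, so paste full links.

```python
from urllib.parse import urlsplit


def source_domain(url: str) -> str:
    """Return the lowercase host of a URL, without a leading 'www.'."""
    host = urlsplit(url.strip()).hostname or ""
    return host[4:] if host.startswith("www.") else host


# Hypothetical row with the five columns: date, query, AI engine, source URL, source domain
url = "https://www.example.com/blog/san-bartolo-excursions?utm_source=ai"
row = ["2026-01-15", "best hotel Fano sea view", "Perplexity", url, source_domain(url)]
print(row[4])  # prints: example.com
```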
Procedure:
- Open Perplexity (it’s the one that always gives you the links; the others don’t always). Run 15-20 realistic queries about your sector in which you’d expect to be able to appear. Examples for a producer of organic extra virgin olive oil in Umbria: “best organic extra virgin olive oil Umbria”, “cold-press mills Trasimeno”, “EVO oil corporate gift central Italy”.
- For every answer that cites you or cites a competitor, copy all the source URLs into the sheet.
- Repeat on ChatGPT with browsing enabled and on Gemini, using the same queries.
- At the end of the week, group by domain. Count the occurrences.
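The last step of the procedure is a plain count. As a minimal sketch, assuming the sheet has been exported as a list of rows keyed by column name (the key names and the sample domains are hypothetical), it could look like this:

```python
from collections import Counter


def citations_by_domain(rows):
    """Count logged source URLs per domain, most cited first."""
    return Counter(row["source_domain"] for row in rows).most_common()


# Hypothetical rows; only the columns needed for the count are shown
rows = [
    {"query": "best hotel Fano sea view", "engine": "Perplexity", "source_domain": "hotel.example"},
    {"query": "best hotel Fano sea view", "engine": "Gemini", "source_domain": "guide.example"},
    {"query": "boutique hotel Adriatic for couples", "engine": "Perplexity", "source_domain": "hotel.example"},
]

for domain, count in citations_by_domain(rows):
    print(domain, count)
```

In a spreadsheet, a pivot table on the source domain column gives the same result.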
A three-band reading threshold: if your domain appears in less than 30% of the answers concerning your sector, you have an editorial coverage problem, not a technical SEO one. If it appears in 30-60%, you’re on the right track and the position needs to be defended. Above 60%, you’re already an authority recognized by the system in that semantic field of queries.
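The bands above translate into a few lines of code. This is a sketch under my own assumptions: the function name and the short labels are hypothetical, and I treat exactly 30% and exactly 60% as belonging to the middle band.

```python
def coverage_band(answers_with_domain: int, total_answers: int) -> str:
    """Classify how often your domain appears among the answers about your sector."""
    if total_answers <= 0:
        raise ValueError("total_answers must be positive")
    # Integer comparison avoids floating-point surprises at the 30% and 60% edges
    percent_times_total = answers_with_domain * 100
    if percent_times_total < 30 * total_answers:
        return "editorial coverage problem"
    if percent_times_total <= 60 * total_answers:
        return "on the right track, defend it"
    return "recognized authority"


print(coverage_band(5, 20))   # 25% of answers
print(coverage_band(13, 20))  # 65% of answers
```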
An honest entry-level caveat: this is a first manual step. The real analysis, on large volumes and with longitudinal tracking, requires professional tools. But the first step is enough to tell you whether you’re investing in the right direction.
The test I ran myself: 6 months on Adriatic seaside hotels
To write this series I kept a longitudinal observation over 6 months on a sample of seaside hotels on the central Adriatic, in particular a 4-star boutique hotel in Fano (PU) and five of its direct competitors between Senigallia, Marotta, and Pesaro.
Every two weeks I ran the same set of 12 queries (such as “best hotel Fano sea view”, “family hotel private beach Marche”, “boutique hotel Adriatic for couples”) on Perplexity, ChatGPT with browsing, and Gemini. I collected all the cited sources and mapped them by domain.
The pattern that emerged, across an overall sample of about 480 answers collected:
- 3 domains made up 58% of the citations: the Fano hotel’s own site, a regional tourist guide for the Marche area, and an in-depth article published in 2022 by a national travel magazine.
- The sites of tour operators and aggregators appeared often, but were almost never cited as a primary source on the “boutique” or “experience” queries; they came in on the purely transactional queries.
- A hotel blog page published in 2023 about excursions in San Bartolo Park was the single most cited URL in the whole sample.
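A concentration figure like "3 domains made up 58% of the citations" is the combined share of the top k domains. A minimal sketch of that calculation follows; the function name and the counts are invented for illustration and are not the data from my test.

```python
from collections import Counter


def top_k_share(domain_counts, k=3):
    """Return the fraction of all citations that come from the k most cited domains."""
    total = sum(domain_counts.values())
    if total == 0:
        return 0.0
    top = sum(count for _, count in Counter(domain_counts).most_common(k))
    return top / total


# Hypothetical counts per domain
counts = {"a.example": 5, "b.example": 3, "c.example": 1, "d.example": 1}
print(f"{top_k_share(counts, k=3):.0%}")  # prints: 90%
```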
Stated limitations: a small sample, a single geographic area, only one season fully covered (autumn-winter), and AI engines in constant evolution during the test period. This is an indicative pattern, not a peer-reviewed study. But the signal was clear enough to make me, together with the client, rethink the entire editorial strategy: double down on articles modeled on “San Bartolo excursion” and abandon two thematic threads that didn’t produce a single citation in 6 months.
The mistakes I see most often when I start from scratch with a client
There are four that recur almost predictably.
The first is not mapping at all: you count the total number of citations month over month and stop there. The number grows, we celebrate; the number drops, we worry. Without knowing where those citations come from, you’re working blind.
The second is mapping only your own domain. The citations that concern you often come through third parties: an industry guide, a Wikipedia entry, an old article from a local newspaper. Those absolutely need to be mapped, because they tell you where your authority has been built by others (and where it’s worth strengthening the relationship).
The third is ignoring dated sources. I regularly find articles from 2018-2020 still heavily cited by AI models in 2026. They’re dormant assets: updating the content (with the publisher’s consent or with a new article that references it) is one of the highest-ROI operations of all.
The fourth is not comparing yourself against the 3-5 competitors the AI cites in your sector. If on your semantic field of queries the AI always cites three competitors and never you, the map of those three tells you exactly where you’re missing editorial presence.
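The competitor comparison can be sketched as a set difference: domains that feed answers citing competitors but never answers citing you. This assumes a sixth column, here called `brand_cited`, that is not part of the five-column sheet described earlier; the name, the function, and the sample rows are all hypothetical.

```python
def gap_domains(rows, own_brand):
    """Return domains that back competitor citations but never your own."""
    own = {r["source_domain"] for r in rows if r["brand_cited"] == own_brand}
    others = {r["source_domain"] for r in rows if r["brand_cited"] != own_brand}
    return sorted(others - own)


# Hypothetical rows: one per (answer, cited brand, source domain)
rows = [
    {"brand_cited": "You", "source_domain": "guide.example"},
    {"brand_cited": "Competitor A", "source_domain": "guide.example"},
    {"brand_cited": "Competitor A", "source_domain": "magazine.example"},
    {"brand_cited": "Competitor B", "source_domain": "magazine.example"},
]
print(gap_domains(rows, "You"))  # prints: ['magazine.example']
```

The domains it returns are the outlets where competitors have editorial presence and you don't.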
What to do with the map once you have it
Three concrete actions, in order of priority:
- Double down on your top URLs: the page that generates the most AI citations in your domain is your flagship asset. Update it every 2-3 months, extend the content, add FAQs, reinforce the internal links pointing to it.
- Reclaim third-party sources: if a guide or a magazine cites you and is well positioned as an AI source, build a stable editorial relationship. A collaboration that produces 2 articles a year on that outlet is worth more than 20 guest posts on sites the AI doesn’t consider.
- Audit the pages that are “dead” for AI: your pages that never appear as a source. Decide: do they deserve an editorial relaunch (rewrite, inverted pyramid, recognizable author)? Or are they service pages that are fine as they are? Deciding explicitly is already half the work.
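For the third action, the list of "dead" pages is whatever is on your site but never in the map. A minimal sketch, assuming you have a list of your own URLs (from a sitemap, for example) and the source URLs from the sheet; the function names and example URLs are hypothetical, and the normalization (drop "www.", query strings, and trailing slashes) is a simplifying assumption.

```python
from urllib.parse import urlsplit


def normalize(url):
    """Reduce a URL to host + path so tracking parameters don't hide a match."""
    parts = urlsplit(url)
    host = parts.hostname or ""
    if host.startswith("www."):
        host = host[4:]
    return host + (parts.path.rstrip("/") or "/")


def dead_pages(site_urls, cited_urls):
    """Return the site URLs that never appear among the cited sources."""
    cited = {normalize(u) for u in cited_urls}
    return [u for u in site_urls if normalize(u) not in cited]


site = [
    "https://example.com/",
    "https://example.com/rooms",
    "https://example.com/blog/park/",
]
cited = ["https://www.example.com/blog/park?utm_source=ai"]
print(dead_pages(site, cited))
```

The output is only a candidate list: the relaunch-or-leave decision described above still has to be made page by page.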
Where this map fits into the rest of your measurement
The source map is the starting point. Once you have it, you can tackle the rest of the measurement work that belongs on your AI visibility dashboard: tracking share of voice over time, attributing conversions to AI citations, benchmarking against competitors. Without the map, everything else remains a number without a cause. With the map, every metric ties back to a concrete editorial action.
In the following articles in the series I’ll take you further with longitudinal tracking of share of voice, the attribution model I use to link AI citations to qualified inquiries, and the setup of a monthly dashboard readable by a business owner in 5 minutes.