If you're measuring your AI visibility by opening the browser, typing a question and checking whether you show up, you're collecting anecdotes — not data. The response variations you happen to see in that moment are one in thousands, and they tell you nothing about what your typical customer sees with their specific question. Automating tests across multiple platforms gives you a real snapshot of where you stand — and finally makes strategy decisions grounded in something solid.
I remember when, around 2012, those who could run SQL queries directly on their own Google Analytics database had an unfair advantage over competitors. While everyone else opened the dashboard and consulted pre-packaged reports, a handful of marketers downloaded raw data, cross-referenced it with their CRM and understood things the others couldn’t see. Same tool, different level of access, results on a different planet.
Today it’s the same story with the APIs of ChatGPT, Claude and Gemini: you need automation. If you measure your visibility in AI answers by opening the browser and typing prompts one at a time, you’re stopping at the first layer. It works for understanding how a single engine behaves on a single query. It doesn’t work for understanding patterns, seasonality, model drift, or differences between geography and industry.
Let me explain how to break through that ceiling, starting with a case I followed in an area I know well: the wineries of Monferrato.
Why 30 manual prompts aren’t enough to say “I’m visible”
When an entrepreneur writes to me “I tried asking ChatGPT about my brand and it never comes up”, the first question I ask is: how many times did you ask, with how many variations, across how many engines, on how many different days.
Nine times out of ten the answer is: “three or four prompts, on ChatGPT, in one afternoon”. You can’t decide anything on that sample. AI models generate probabilistic answers: the same question, asked twice an hour apart, can produce different lists of citations. A single query is one data point, not a trend.
The problem is that manually testing 50 prompt variations, across 4 engines, for 7 days, adds up to 1,400 calls. No human does that with a clear head. And this is where automation via API comes in.
What changes when you use APIs instead of the browser
The APIs of OpenAI, Anthropic, Google and Perplexity are programmatic interfaces: instead of opening chat.openai.com and typing, you send an HTTP request with your prompt and get back the response text in a structured format. Cost per call is under one cent for most base models.
The leap isn’t technological. It’s methodological. With a browser interface you think in terms of individual conversations. With APIs you think in terms of response datasets. Everything changes: you can compare, filter, aggregate, monitor over time.
In the world of SEO measurement in the 2010s, exactly this shift happened: first people looked at Google rankings one keyword at a time, then tools arrived that automatically monitored thousands of keywords and changed the trade. It follows that anyone still measuring AI visibility by hand today is doing SEO with the browser in 2010.
“Best producers of certified organic Grignolino DOC Piedmont 2024” is something nobody searches for.
The Monferrato case: wineries, Grignolino, Barbera and 480 prompts
For a few months I’ve been working with a small wine-consulting firm near Casale Monferrato (AL) that follows about ten wineries in the area. They all produce Grignolino and Barbera del Monferrato, some also Cortese and Freisa. They all have a website, product pages, a few mentions in wine guides, and well-maintained Google Business profiles.
The client’s question was simple: “when an enthusiast asks ChatGPT ‘best Grignolino wineries’ or ‘what to visit in Monferrato for wine’, do my producers come up or not?”.
To answer, I built a very simple Python script. Nothing sophisticated: a list of 60 prompts (variations on Grignolino, Barbera del Monferrato, wine tourism in the Alessandria area, food pairings, late harvest, organic wineries), a loop that sends them to the APIs of ChatGPT, Claude, Gemini and Perplexity, and an Excel sheet that saves each response with date, engine, prompt and full text. 60 prompts × 4 engines × 2 repetitions = 480 calls, completed in about two hours, total cost under 4 euros.
What came out of the dataset, in short:
- 6 Monferrato wineries appear at least once. The other 4 never, on any engine.
- On Perplexity the citations are concentrated on 3 sources: a national wine guide, a regional wine-tourism portal, Wikipedia. Anyone not cited there is invisible.
- On ChatGPT the answers are more generalist and often cite the protection Consortiums instead of individual producers.
- Geolocated prompts (“near Casale Monferrato”, “in the province of Alessandria”) completely change the list: some wineries appear only if the query is geographic.
An indicative test, not a scientific study. A sample of 60 prompts is a snapshot of a niche market, not a generalizable statistical base. Still, the pattern is clear enough to make operational decisions.
Choose 30 prompts that a real customer of yours might actually ask an AI.
What you too can automate in an afternoon
You don’t need to be a developer. You need to be clear about what you want to measure and hand the technical part to someone who can write a 50-line script. The minimum components:
- A list of prompts representative of your sector: 30-80 real queries a potential customer would ask an AI. Include geographic variations, intent variations (informational, comparative, transactional), and language variations.
- API keys for the engines that matter to you: ChatGPT/OpenAI, Claude/Anthropic, Gemini/Google, Perplexity. Cumulative cost for a monthly test: 5-30 euros depending on volume.
- A simple database to save the responses: even a Google Sheet works to get started. The key is that each row has date, engine, prompt, and the full response.
- An analysis routine: counting mentions of your brand, of competitors, of the sources cited by Perplexity. The sentiment of the context in which you appear (positive, neutral, comparative).
The real analysis, done properly, still requires professional tools and someone who knows how to read the dataset. What you get with a homemade script is the first level: understanding whether you’re visible and where you’re not. For the “why” you need other pieces of the puzzle, some of which I’ve covered in this series when I explained how AI recognizes author entities in named entity recognition and how implicit citations toward your domain are weighted in implicit reference weight.
The mistakes I see most often when SMEs try to measure by hand
- Too few prompts, a hasty decision. Three queries and they conclude “the AI hates me”. Three queries aren’t even enough to measure the temperature of the room.
- A single engine. Only ChatGPT, and then they discover that on Perplexity (where there are clickable citations and an audience more oriented toward comparison) the situation is different.
- Prompts written in agency-speak, not customer-speak. “Best producers of certified organic Grignolino DOC Piedmont 2024” is something nobody searches for. It’s what competitors search for to show off. The real customer types “light red wine Monferrato recommend”.
- No repetition over time. A single measurement says nothing about stability. Models update, Perplexity’s indexes change, the cited sources rotate. You need at least a monthly pass.
What to do Monday morning
- Choose 30 prompts that a real customer of yours might actually ask an AI. Have a customer write them, not you.
- Decide which engines to monitor. For most Italian SMEs: ChatGPT, Gemini, Perplexity are the bare minimum.
- Assign someone who can program 1-2 days to build the test script and the output sheet. Indicative cost is low, and the result is reusable for years.
- Compare your mentions with the 3-5 competitors the AI cites most often in your sector. That is your benchmark, not an abstract number.
- Rerun the test every month with the same list of prompts. That’s the only way to see whether what you’re doing is working.
Measure to know where to act
The whole thread of this series on how to measure visibility in AI answers leads here: you can’t improve what you don’t measure, and you can’t measure seriously with the browser. APIs exist for this, they cost little, and they give you the data foundation for decisions you’d otherwise make on gut feeling.
In the upcoming articles in this series I’ll go into detail on how to structure a continuous monitoring dashboard, how to compare share of voice between you and your competitors on AI engines, and how to correlate spikes in AI citations with the organic traffic landing on the site. These are the pieces that, put together, give you the full picture.