Measuring AI visibility

Manual Tests on ChatGPT, Claude and Perplexity: The Invisible Ceiling That Blocks You

Roberto Serra 25 June 2026·~7 min read

If you're measuring your AI visibility by opening the browser, typing a question and checking whether you show up, you're collecting anecdotes — not data. The response variations you happen to see in that moment are one in thousands, and they tell you nothing about what your typical customer sees with their specific question. Automating tests across multiple platforms gives you a real snapshot of where you stand — and finally makes strategy decisions grounded in something solid.

I remember when, around 2012, those who could run SQL queries directly on their own Google Analytics database had an unfair advantage over competitors. While everyone else opened the dashboard and consulted pre-packaged reports, a handful of marketers downloaded raw data, cross-referenced it with their CRM and understood things the others couldn’t see. Same tool, different level of access, results on a different planet.

Today it’s the same story with the APIs of ChatGPT, Claude and Gemini: you need automation. If you measure your visibility in AI answers by opening the browser and typing prompts one at a time, you’re stopping at the first layer. It works for understanding how a single engine behaves on a single query. It doesn’t work for understanding patterns, seasonality, model drift, or differences between geography and industry.

Let me explain how to break through that ceiling, starting with a case I followed in an area I know well: the wineries of Monferrato.

Why 30 manual prompts aren’t enough to say “I’m visible”

When an entrepreneur writes to me “I tried asking ChatGPT about my brand and it never comes up”, the first question I ask is: how many times did you ask, with how many variations, across how many engines, on how many different days.

Nine times out of ten the answer is: “three or four prompts, on ChatGPT, in one afternoon”. You can’t decide anything on that sample. AI models generate probabilistic answers: the same question, asked twice an hour apart, can produce different lists of citations. A single query is one data point, not a trend.

The problem is that manually testing 50 prompt variations, across 4 engines, for 7 days, adds up to 1,400 calls. No human does that with a clear head. And this is where automation via API comes in.

What changes when you use APIs instead of the browser

The APIs of OpenAI, Anthropic, Google and Perplexity are programmatic interfaces: instead of opening chat.openai.com and typing, you send an HTTP request with your prompt and get back the response text in a structured format. Cost per call is under one cent for most base models.

The leap isn’t technological. It’s methodological. With a browser interface you think in terms of individual conversations. With APIs you think in terms of response datasets. Everything changes: you can compare, filter, aggregate, monitor over time.

In the world of SEO measurement in the 2010s, exactly this shift happened: first people looked at Google rankings one keyword at a time, then tools arrived that automatically monitored thousands of keywords and changed the trade. It follows that anyone still measuring AI visibility by hand today is doing SEO with the browser in 2010.

Common mistake

“Best producers of certified organic Grignolino DOC Piedmont 2024” is something nobody searches for.

The Monferrato case: wineries, Grignolino, Barbera and 480 prompts

For a few months I’ve been working with a small wine-consulting firm near Casale Monferrato (AL) that follows about ten wineries in the area. They all produce Grignolino and Barbera del Monferrato, some also Cortese and Freisa. They all have a website, product pages, a few mentions in wine guides, and well-maintained Google Business profiles.

The client’s question was simple: “when an enthusiast asks ChatGPT ‘best Grignolino wineries’ or ‘what to visit in Monferrato for wine’, do my producers come up or not?”.

To answer, I built a very simple Python script. Nothing sophisticated: a list of 60 prompts (variations on Grignolino, Barbera del Monferrato, wine tourism in the Alessandria area, food pairings, late harvest, organic wineries), a loop that sends them to the APIs of ChatGPT, Claude, Gemini and Perplexity, and an Excel sheet that saves each response with date, engine, prompt and full text. 60 prompts × 4 engines × 2 repetitions = 480 calls, completed in about two hours, total cost under 4 euros.

What came out of the dataset, in short:

6 Monferrato wineries appear at least once. The other 4 never, on any engine.
On Perplexity the citations are concentrated on 3 sources: a national wine guide, a regional wine-tourism portal, Wikipedia. Anyone not cited there is invisible.
On ChatGPT the answers are more generalist and often cite the protection Consortiums instead of individual producers.
Geolocated prompts (“near Casale Monferrato”, “in the province of Alessandria”) completely change the list: some wineries appear only if the query is geographic.

An indicative test, not a scientific study. A sample of 60 prompts is a snapshot of a niche market, not a generalizable statistical base. Still, the pattern is clear enough to make operational decisions.

Pro tip

Choose 30 prompts that a real customer of yours might actually ask an AI.

What you too can automate in an afternoon

You don’t need to be a developer. You need to be clear about what you want to measure and hand the technical part to someone who can write a 50-line script. The minimum components:

A list of prompts representative of your sector: 30-80 real queries a potential customer would ask an AI. Include geographic variations, intent variations (informational, comparative, transactional), and language variations.
API keys for the engines that matter to you: ChatGPT/OpenAI, Claude/Anthropic, Gemini/Google, Perplexity. Cumulative cost for a monthly test: 5-30 euros depending on volume.
A simple database to save the responses: even a Google Sheet works to get started. The key is that each row has date, engine, prompt, and the full response.
An analysis routine: counting mentions of your brand, of competitors, of the sources cited by Perplexity. The sentiment of the context in which you appear (positive, neutral, comparative).

The real analysis, done properly, still requires professional tools and someone who knows how to read the dataset. What you get with a homemade script is the first level: understanding whether you’re visible and where you’re not. For the “why” you need other pieces of the puzzle, some of which I’ve covered in this series when I explained how AI recognizes author entities in named entity recognition and how implicit citations toward your domain are weighted in implicit reference weight.

The mistakes I see most often when SMEs try to measure by hand

Too few prompts, a hasty decision. Three queries and they conclude “the AI hates me”. Three queries aren’t even enough to measure the temperature of the room.
A single engine. Only ChatGPT, and then they discover that on Perplexity (where there are clickable citations and an audience more oriented toward comparison) the situation is different.
Prompts written in agency-speak, not customer-speak. “Best producers of certified organic Grignolino DOC Piedmont 2024” is something nobody searches for. It’s what competitors search for to show off. The real customer types “light red wine Monferrato recommend”.
No repetition over time. A single measurement says nothing about stability. Models update, Perplexity’s indexes change, the cited sources rotate. You need at least a monthly pass.

What to do Monday morning

Choose 30 prompts that a real customer of yours might actually ask an AI. Have a customer write them, not you.
Decide which engines to monitor. For most Italian SMEs: ChatGPT, Gemini, Perplexity are the bare minimum.
Assign someone who can program 1-2 days to build the test script and the output sheet. Indicative cost is low, and the result is reusable for years.
Compare your mentions with the 3-5 competitors the AI cites most often in your sector. That is your benchmark, not an abstract number.
Rerun the test every month with the same list of prompts. That’s the only way to see whether what you’re doing is working.

Measure to know where to act

The whole thread of this series on how to measure visibility in AI answers leads here: you can’t improve what you don’t measure, and you can’t measure seriously with the browser. APIs exist for this, they cost little, and they give you the data foundation for decisions you’d otherwise make on gut feeling.

In the upcoming articles in this series I’ll go into detail on how to structure a continuous monitoring dashboard, how to compare share of voice between you and your competitors on AI engines, and how to correlate spikes in AI citations with the organic traffic landing on the site. These are the pieces that, put together, give you the full picture.

Chapter 7 · Measuring AI visibility

Continue with the deep dives

40 deep dives across the 5 sections of the chapter.

7.1 Competitive Benchmarking 8 deep dives

Competitor AI Audit: how to reverse engineer your rivals’ AI visibility The competitors winning in AI answers share 3 sources you’re missing Gap Analysis by Query Cluster: the 30 queries that separate you from your competitor are your editorial plan for the next 6 months New entrant detection: how to discover the competitors AI is starting to cite before you do Reverse engineering the competitor the AI cites most: how to turn their pattern into your map Industry Benchmark for AI Visibility: The Number That Gives Meaning to Your Share of Voice Seasonal AI Visibility Pattern: reading the cycles so you don’t mistake seasonality for a problem Bilingual AI visibility: why AI cites you in Italian but you vanish in English (or vice versa)

7.2 KPIs & Metrics 8 deep dives

AI Confidence Indicator: read how much the AI trusts you from the language it uses AI platform visibility: why an aggregate average leads your investment astray AI Referral Traffic: the only AI metric you can already see today in Google Analytics AI Share of Voice: the metric that’s replacing market share AI Mention Sentiment: How the AI Cites You Matters More Than How Often AI Citation Accuracy Rate: How Often AI Tells the Truth About Your Brand AI Recommendation Position: your spot in the AI list is the new ranking Query Coverage Rate: the metric that tells you how often AI really recommends you

7.3 Reporting & Dashboard 8 deep dives

Monthly AI Visibility Scorecard: one page, six numbers, three months of trend Competitive Comparison Matrix: Winning on One AI Platform Isn’t Enough Anymore Which Sources the AI Cites You From: The Map That Tells You Where to Invest Quarterly Trend Analysis: how to tell if your AI visibility is truly growing Hallucination Tracking Report: turning the AI’s mistakes about your brand into data you can manage The format that unlocks the budget for AI visibility The AI visibility report that shifts the conversation with your client AI Alert: set up now the system that warns you when your brand disappears from the answers

7.4 ROI & Business Impact 8 deep dives

AI lead attribution: how to know how many customers really come from ChatGPT and Perplexity When Google Ranking Drops and AI Visibility Rises: the Signal You Must Learn to Read Cost per AI Mention: What It Really Costs You Every Time AI Recommends You What AI Visibility Level Are You At: The Maturity Model Your CEO Understands AI visibility budget: 10k, 30k, 100k a year? Here’s how to actually allocate it AI Visibility Forecasting: predict where your AI visibility will be in 6 months Channel Mix Optimization: How to Rebalance Budget Across AI, SEO and Ads AI Visibility as a Competitive Moat: Why Building It Today Is Worth Double

7.5 Tools 8 deep dives

The prompt framework that turns AI monitoring into comparable data AI Visibility Tracking Tool: I Tested Peec, Otterly and Profound for 3 Months (What Really Changes) Google Search Console already tells you whether you appear in AI Overviews (and almost no one looks) Manual Tests on ChatGPT, Claude and Perplexity: The Invisible Ceiling That Blocks You You are here Mention Mining from AI Answers: Turning Citations into Competitive Intelligence Perplexity Analytics Dashboard: how to measure the AI traffic everyone else ignores Brand24, Mention, Meltwater for AI citation tracking: what actually works today Ask the AI to audit your AI: the self-audit nobody does

The author

Roberto Serra at the Senate of the Republic

Senate of the Republic · Palazzo Giustiniani Conference “The power of artificial intelligence”

Roberto Serra

SEO consultant for over 15 years, founder of the Serra SEO Agency (RAANK). He helps multinationals and SMEs stay visible where search is moving: ChatGPT, Perplexity, Gemini and Google's AI Overviews.

As featured in

Learn more about Roberto Serra →