AI Platforms

Ranking first on Google but invisible on Perplexity? Check your robots.txt

Roberto Serra · 25 June 2026 · ~7 min read

You've ranked first on Google for years, but on Perplexity you don't exist — and it's not the content's fault. Your site is probably blocking the AI search engine without you knowing: a single line in a technical file that your webmaster configured years ago and no one has touched since. The check takes five minutes, and once you've identified that line, reopening your visibility to the buyers who use Perplexity is immediate.

You.com and Phind are AI-powered answer engines: they have fewer users than the big names, but those users are decision makers. The CEO who uses Phind isn’t the teenager who uses TikTok. If you’re there, you’re in front of the right eyes.

The same reasoning applies, on a different scale, to Perplexity. It doesn’t have Google’s numbers but it’s the AI engine that analysts, journalists and B2B buyers use to do quick research. And here comes the problem I want to tell you about today: Perplexity doesn’t read the web through Google. It has its own crawler, called PerplexityBot, and if your site blocks it — even by mistake — you’re simply invisible in those answers. Even if you’ve been ranking first on Google for years.

Let me explain it with a case I saw up close in recent weeks.

A distillery in the Val di Noto that didn’t exist for Perplexity

A contact of mine in Syracuse introduced me to a distillery in the Val di Noto: they produce botanical amari and almond liqueurs with family recipes. Well-built site, carefully maintained Google Business profile, first position for queries like “artisanal Sicilian amaro” and “Val di Noto liqueur distillery”.

I ran a quick test. On ChatGPT I asked “artisanal amari from eastern Sicily”: cited. On Perplexity I asked the exact same thing: zero mentions. Neither in the text, nor in the side sources. On You.com, the same. Only on Andi did it appear, but with a generic listing pulled from a directory.

The difference wasn’t “content quality”. It was a single line of code in the robots.txt added by the hosting provider to reduce server load: it blocked all non-Google bots. PerplexityBot included.

What PerplexityBot does and why it concerns you

Perplexity builds its answers by mixing two things: its own crawler’s index (PerplexityBot, which roams the web the way Googlebot did twenty years ago) and real-time calls to web sources while generating the answer. Both of these need to be able to access your pages.

If PerplexityBot doesn’t get through, two concrete consequences occur: your page doesn’t enter the Perplexity index, and when the model looks for real-time sources to build an answer, your domain comes up as unreadable. The system then picks a source it can read, even if that source is less authoritative than you.

This is a different mechanism from Google. On Google, even if robots.txt blocks the crawler, a URL can still appear in the SERPs when other pages link to it (the famous “indexed page without a description”). On Perplexity, no: no access, no citation. The logic is stricter precisely because the model has to “read” the content to summarize it, not just link to it.

I’m telling you this by inference: there is no academic publication describing PerplexityBot’s exact behavior. What I’m telling you comes from field observation of dozens of domains over the past few months and from the logic of how a RAG system works. It follows that your action doesn’t start from an official guide, it starts from a test you run yourself on your own site.

The thread that ties it all together: visibility in AI answers

In the previous articles in this series I told you about how AI engines think, about E-E-A-T applied to AI, about recognition of the author as an entity. All that work — the semantic structure, the authority, the entity — is worth zero if the crawler can’t read the page.

Accessibility is the ground floor. It sits beneath everything else. If it’s missing, visibility in AI answers is simply impossible, not difficult.

The test you can run in 5 minutes

Open your robots.txt. Go to `https://yoursite.com/robots.txt` and read it. Look for these three things:

A line `User-agent: PerplexityBot` followed by `Disallow: /` (explicit block)
A line `User-agent: *` followed by `Disallow: /` (generic block that hits everyone)
Exclusion rules on key directories (`/blog/`, `/products/`, `/about-us/`)

If you find one of these three things, you have a problem. To run a clean check you can use TechnicalSEO’s robots.txt tester: you paste in the URL, choose the “PerplexityBot” user agent from the menu, and it tells you whether the page is accessible or blocked.
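If you prefer to script the check, here is a minimal Python sketch of the same test. It uses only the standard library's `urllib.robotparser`. The sample robots.txt contents, the `example.com` URLs and the helper name `is_allowed` are illustrative assumptions, not taken from a real site. Python's parser is also only an approximation of how any given crawler reads the file, so treat the result as a first signal.

```python
from urllib.robotparser import RobotFileParser


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if this robots.txt text lets user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Hypothetical "Google only" file, like one a host might add to cut server load.
GOOGLE_ONLY = """\
User-agent: Googlebot
Allow: /

User-agent: *
Disallow: /
"""

# Hypothetical file that only excludes one key directory.
BLOG_BLOCKED = """\
User-agent: *
Disallow: /blog/
"""

for agent in ("Googlebot", "PerplexityBot"):
    allowed = is_allowed(GOOGLE_ONLY, agent, "https://example.com/")
    print(f"{agent}: {'allowed' if allowed else 'blocked'}")
```

With the first sample file, Googlebot passes and PerplexityBot does not, which is exactly the distillery's situation. To test your live file instead of a pasted string, `RobotFileParser` also offers `set_url()` and `read()`, which download the file for you.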

The outcome is binary: either the crawler passes, or it doesn’t. There’s no middle ground.

Second step: ask your hosting provider whether it has anti-bot rules at the firewall level (Cloudflare, Sucuri, proprietary solutions). Many Italian hosts for SMBs block “suspicious user agents” aggressively. PerplexityBot is young, and some firewalls treat it as a scraper. An explicit exception needs to be added.
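While you wait for the host's reply, you can get a rough first signal yourself. The Python sketch below requests the same page once with a browser-style user agent and once with a PerplexityBot-style one, then compares the status codes. Both user agent strings are assumptions (check Perplexity's own documentation for the current one), and the function names are hypothetical. A firewall that filters by IP address will not show up in this test, and some firewalls reject bot strings sent from unknown IPs, so read a 403 as a hint, not as proof.

```python
import urllib.error
import urllib.request

# Assumed user agent strings, for illustration only.
BROWSER_UA = (
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/124.0 Safari/537.36"
)
BOT_UA = (
    "Mozilla/5.0 AppleWebKit/537.36 (KHTML, like Gecko; compatible; "
    "PerplexityBot/1.0; +https://perplexity.ai/perplexitybot)"
)


def fetch_status(url: str, user_agent: str, timeout: float = 10.0) -> int:
    """Return the HTTP status code the server gives to this user agent."""
    request = urllib.request.Request(url, headers={"User-Agent": user_agent})
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as error:
        # 4xx and 5xx answers arrive as exceptions; the code is what we want.
        return error.code


def diagnose(browser_status: int, bot_status: int) -> str:
    """Turn the two status codes into a plain-language verdict."""
    browser_ok = browser_status < 400
    bot_ok = bot_status < 400
    if browser_ok and not bot_ok:
        return "blocked for the bot user agent only"
    if browser_ok and bot_ok:
        return "reachable with both user agents"
    return "page fails even for a browser user agent"


# Usage (performs two real HTTP requests, so it is left commented out):
# url = "https://yoursite.com/"
# print(diagnose(fetch_status(url, BROWSER_UA), fetch_status(url, BOT_UA)))
```

Keeping `diagnose` separate from the network call is a deliberate choice: you can feed it status codes taken from your server logs as well.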

The test I ran myself

I took 18 sites of niche Italian SMBs in food & beverage (wineries, artisanal distilleries, coffee roasters, pasta makers). For each one I did two things: I checked the robots.txt and I ran 3 themed queries on Perplexity, ChatGPT, You.com and Phind.

An indicative result, not a scientific study: 7 sites out of 18 had some form of block that prevented or slowed down PerplexityBot (4 with an explicit disallow, 3 with a hosting firewall that returned a 403 to non-browser user agents). Of those 7, none ever appeared in Perplexity citations. Of the 11 with free access, 6 appeared at least once in cited sources.
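Here are the same counts worked out as shares, in a small Python sketch. It uses only the figures reported above; the variable and function names are mine.

```python
total_sites = 18
explicit_disallow = 4
firewall_403 = 3

blocked = explicit_disallow + firewall_403  # sites with some form of block
open_access = total_sites - blocked         # sites PerplexityBot could reach

cited_among_blocked = 0
cited_among_open = 6


def share(part: int, whole: int) -> float:
    """Percentage of part over whole, rounded to one decimal."""
    return round(100 * part / whole, 1)


print(f"blocked sites: {blocked}/{total_sites} = {share(blocked, total_sites)}%")
print(f"cited, blocked group: {share(cited_among_blocked, blocked)}%")
print(f"cited, open group: {share(cited_among_open, open_access)}%")
```

Roughly four sites in ten were blocked, and a little over half of the reachable ones were cited at least once. With 18 sites these shares are indicative only.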

Small sample, but the pattern is clear. And I saw the same pattern on You.com and Phind too: niche bots are the first to get blocked by the firewalls of Italian hosts, because they aren’t as “famous” as Googlebot.

The real analysis, the kind you do for a serious client, requires professional server log analysis tools and continuous monitoring of AI citations. What I’m offering you here is a first check step, not an audit.

The mistakes I notice most often

The block inherited from the previous agency. The site was built 4 years ago, the agency had put `Disallow: /` during development and forgot to remove it for the non-Google bots. I find this pattern in at least 1 site out of 5.

The overzealous hosting firewall. You have an Italian shared hosting plan that blocks anything that doesn’t have a “human” user agent string. PerplexityBot honestly answers “PerplexityBot” and gets rejected with a 403. The site is perfect, the AI doesn’t see it.

The noindex on cornerstone pages. Robots.txt fine, but then the homepage or the product pages have a meta robots `noindex`. On Google it creates well-known problems; on Perplexity it makes things worse, because a page marked this way can be excluded as a source outright.
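A quick way to spot this case is to scan the page's HTML for the tag. The Python sketch below uses the standard library's `html.parser`, and also accepts the value of an `X-Robots-Tag` response header, which can carry the same directive. The class and function names are hypothetical, and the sample HTML in the usage lines is invented. It only looks at the generic `robots` meta name, not at bot-specific ones.

```python
from html.parser import HTMLParser

# "none" is shorthand for "noindex, nofollow".
BLOCKING = {"noindex", "none"}


def _split(value: str) -> set:
    """Split a comma-separated directive list into lowercase tokens."""
    return {part.strip().lower() for part in value.split(",") if part.strip()}


class RobotsMetaFinder(HTMLParser):
    """Collect directives from <meta name="robots" content="..."> tags."""

    def __init__(self) -> None:
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = {name: (value or "") for name, value in attrs}
        if attributes.get("name", "").strip().lower() == "robots":
            self.directives.update(_split(attributes.get("content", "")))


def has_noindex(html: str, x_robots_tag: str = "") -> bool:
    """True if the page markup or the header value asks not to be indexed."""
    finder = RobotsMetaFinder()
    finder.feed(html)
    return bool((finder.directives | _split(x_robots_tag)) & BLOCKING)


sample = '<head><meta name="robots" content="noindex, follow"></head>'
print(has_noindex(sample))
```

Run it on the homepage and on two or three product pages: those are the pages where a forgotten `noindex` hurts most.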

The aggressive anti-bot lockdown after a DDoS attack. You suffered an attack, the host tightened the firewall rules and never relaxed them. The site survived, but its AI visibility was quietly sacrificed.

What to do concretely, in order

Download your robots.txt and look for the critical lines (5 minutes)
Ask the host about the active firewall rules on bots (one email, 24-48h wait)
If you find a block, add `User-agent: PerplexityBot` followed by `Allow: /` as an explicit exception
Add `User-agent: ClaudeBot` and `User-agent: GPTBot` while you’re at it: same reasoning
After 2-3 weeks (the time it takes the crawler to come back around), rerun the 3 test queries on Perplexity, You.com, Phind
Compare with the 3-5 competitors the AI cites today in your sector: if they show up and you don’t, the problem is almost always upstream (accessibility or entity, not content)
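The robots.txt part of these steps can be sketched in Python as follows: append an explicit `Allow: /` group for each AI crawler, then verify the result with the standard library's parser. The function name and the sample file are hypothetical. The sketch assumes the block comes from a generic `User-agent: *` group; if your file already has a `User-agent: PerplexityBot` group with `Disallow: /`, that group has to be removed by hand, because a second group for the same bot may be ignored.

```python
from urllib.robotparser import RobotFileParser

AI_CRAWLERS = ("PerplexityBot", "ClaudeBot", "GPTBot")


def with_ai_exceptions(robots_txt: str, agents=AI_CRAWLERS) -> str:
    """Return robots_txt with one explicit 'Allow: /' group per agent appended."""
    groups = [f"User-agent: {agent}\nAllow: /" for agent in agents]
    return robots_txt.rstrip("\n") + "\n\n" + "\n\n".join(groups) + "\n"


def can_fetch(robots_txt: str, user_agent: str, url: str) -> bool:
    """Check the patched text with Python's own robots.txt parser."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Hypothetical file with a generic block that hits every bot.
blocked = "User-agent: *\nDisallow: /\n"
patched = with_ai_exceptions(blocked)
print(patched)
```

After the patch, the three named crawlers are allowed while any other bot still falls under the generic block, so the host's original intent of limiting unknown traffic is preserved.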

An honest note: unblocking the crawler doesn’t guarantee you’ll appear. It’s not a magic factor. But it’s the necessary condition. Without it, any other work on authority or presence in the knowledge graph can’t even be evaluated.

The author

Roberto Serra at the Senate of the Republic

Senate of the Republic · Palazzo Giustiniani Conference “The power of artificial intelligence”

Roberto Serra

SEO consultant for over 15 years, founder of the Serra SEO Agency (RAANK). He helps multinationals and SMEs stay visible where search is moving: ChatGPT, Perplexity, Gemini and Google's AI Overviews.
