You've ranked first on Google for years, but on Perplexity you don't exist — and it's not the content's fault. Your site is probably blocking the AI search engine without you knowing: a single line in a technical file that your webmaster configured years ago and no one has touched since. The check takes five minutes, and once you've identified that line, reopening your visibility to the buyers who use Perplexity is immediate.
You.com and Phind are AI-powered answer engines: they have fewer users than the big names, but those users are decision-makers. The CEO who uses Phind isn’t the teenager who uses TikTok. If you’re there, you’re in front of the right eyes.
The same reasoning applies, on a different scale, to Perplexity. It doesn’t have Google’s numbers but it’s the AI engine that analysts, journalists and B2B buyers use to do quick research. And here comes the problem I want to tell you about today: Perplexity doesn’t read the web through Google. It has its own crawler, called PerplexityBot, and if your site blocks it — even by mistake — you’re simply invisible in those answers. Even if you’ve been ranking first on Google for years.
Let me explain it with a case I saw up close in recent weeks.
A distillery in the Val di Noto that didn’t exist for Perplexity
A contact of mine in Syracuse introduced me to a distillery in the Val di Noto: they produce botanical amari and almond liqueurs with family recipes. Well-built site, carefully maintained Google Business profile, first position for queries like “artisanal Sicilian amaro” and “Val di Noto liqueur distillery”.
I ran a quick test. On ChatGPT I asked “artisanal amari from eastern Sicily”: cited. On Perplexity I asked the exact same thing: zero mentions. Neither in the text, nor in the side sources. On You.com, the same. Only on Andi did it appear, but with a generic listing pulled from a directory.
The difference wasn’t “content quality”. It was a single rule in the robots.txt, added by the hosting provider to reduce server load, that blocked every bot except Google’s. PerplexityBot included.
What PerplexityBot does and why it concerns you
Perplexity builds its answers by mixing two things: its own crawler’s index (PerplexityBot, which roams the web the way Googlebot did twenty years ago) and real-time calls to web sources while generating the answer. Both of these need to be able to access your pages.
If PerplexityBot doesn’t get through, two concrete consequences follow: your page doesn’t enter the Perplexity index, and when the model looks for real-time sources to build an answer, your domain comes up as unreadable. The system picks whoever is readable, even if that source is less authoritative than you.
This is a different mechanism from Google. On Google, even if robots.txt blocks the crawler, a URL can still appear in the SERPs when other pages link to it (the famous “indexed, though blocked by robots.txt” result, shown without a description). On Perplexity, no: no access, no citation. The logic is stricter precisely because the model has to “read” the content to summarize it, not just link to it.
I’m telling you this by inference: there is no academic publication describing PerplexityBot’s exact behavior. What I’m telling you comes from field observation of dozens of domains over the past few months and from the logic of how a RAG system works. So your first move isn’t to read an official guide; it’s to run a test yourself on your own site.
The thread that ties it all together: visibility in AI answers
In the previous articles in this series I told you about how AI engines think, about E-E-A-T applied to AI, about recognition of the author as an entity. All that work — the semantic structure, the authority, the entity — is worth zero if the crawler can’t read the page.
Accessibility is the ground floor. It sits beneath everything else. If it’s missing, visibility in AI answers is simply impossible, not difficult.
The test you can run in 5 minutes
Open your robots.txt. Go to `https://yoursite.com/robots.txt` and read it. Look for these three things:
- A line `User-agent: PerplexityBot` followed by `Disallow: /` (explicit block)
- A line `User-agent: *` followed by `Disallow: /` (generic block that hits everyone)
- Exclusion rules on key directories (`/blog/`, `/products/`, `/about-us/`)
If you find one of these three things, you have a problem. To run a clean check you can use TechnicalSEO’s robots.txt tester: you paste in the URL, choose the “PerplexityBot” user agent from the menu, and it tells you whether the page is accessible or blocked.
A binary decision threshold: either it passes, or it doesn’t. There’s no middle ground.
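If you’d rather run the same pass/fail check from your own machine, here is a minimal sketch using Python’s standard `urllib.robotparser`. The `SAMPLE` robots.txt and the `allowed_agents` helper are my own illustrative assumptions, not something taken from a real site. Python’s parser applies the first matching rule, which may not match exactly how each crawler reads the file, so treat the result as an indication.

```python
from urllib.robotparser import RobotFileParser


def allowed_agents(robots_txt: str, page_url: str, agents: list[str]) -> dict[str, bool]:
    """For each user agent, report whether robots_txt lets it fetch page_url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {agent: parser.can_fetch(agent, page_url) for agent in agents}


# Hypothetical robots.txt of the kind described above:
# Google is allowed, every other bot is shut out.
SAMPLE = """\
User-agent: Googlebot
Allow: /

User-agent: *
Disallow: /
"""

if __name__ == "__main__":
    result = allowed_agents(
        SAMPLE, "https://yoursite.com/blog/", ["Googlebot", "PerplexityBot"]
    )
    for agent, ok in result.items():
        print(agent, "passes" if ok else "is blocked")
```

To test a live site instead of a pasted string, `RobotFileParser` also offers `set_url(...)` followed by `read()`, which downloads the file for you.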
Second step: ask your hosting provider whether it has anti-bot rules at the firewall level (Cloudflare, Sucuri, proprietary solutions). Many Italian hosts for SMBs block “suspicious user agents” aggressively. PerplexityBot is young, and some firewalls treat it as a scraper. An explicit exception needs to be added.
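While you wait for the host to reply, you can get a rough first signal yourself by requesting the same page twice, once with a browser-like user agent and once with a bot-like one, and comparing the status codes. This is a sketch under stated assumptions: both user agent strings are illustrative (check Perplexity’s own documentation for the real one), the function names are mine, and a firewall that filters by IP address rather than by user agent won’t show up in a test run from your laptop.

```python
import urllib.error
import urllib.request

# Illustrative user agent strings (assumptions, not official values).
USER_AGENTS = {
    "browser": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
               "(KHTML, like Gecko) Chrome/120.0 Safari/537.36",
    "perplexitybot": "Mozilla/5.0 (compatible; PerplexityBot/1.0; "
                     "+https://perplexity.ai/perplexitybot)",
}


def fetch_status(url: str, user_agent: str, timeout: float = 10.0) -> int:
    """Return the final HTTP status code for url when sent with user_agent."""
    request = urllib.request.Request(url, headers={"User-Agent": user_agent})
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        # 4xx/5xx answers arrive as exceptions; the code is what we want.
        return err.code


def diagnose(browser_status: int, bot_status: int) -> str:
    """Compare the two status codes and name the most likely situation."""
    if browser_status >= 400:
        return "page fails even for a browser"
    if bot_status >= 400:
        return "bot user agent blocked"
    return "no user-agent block seen"
```

For a real check you would call `diagnose(fetch_status(url, USER_AGENTS["browser"]), fetch_status(url, USER_AGENTS["perplexitybot"]))` on one of your key pages. A 200 for the browser and a 403 for the bot is the firewall pattern described above.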
The test I ran myself
I took 18 sites of niche Italian SMBs in food & beverage (wineries, artisanal distilleries, coffee roasters, pasta makers). For each one I did two things: I checked the robots.txt and I ran 3 themed queries on Perplexity, ChatGPT, You.com and Phind.
An indicative result, not a scientific study: 7 sites out of 18 had some form of block that prevented or slowed down PerplexityBot (4 with an explicit disallow, 3 with a hosting firewall that returned a 403 to non-browser user agents). Of those 7, none ever appeared in Perplexity citations. Of the 11 with free access, 6 appeared at least once in cited sources.
Small sample, but the pattern is clear. And I saw the same pattern on You.com and Phind too: niche bots are the first to get blocked by the firewalls of Italian hosts, because they aren’t as “famous” as Googlebot.
The real analysis, the kind you do for a serious client, requires professional server log analysis tools and continuous monitoring of AI citations. What I’m offering you here is a first check step, not an audit.
The mistakes I notice most often
The block inherited from the previous agency. The site was built 4 years ago, the agency had put `Disallow: /` during development and forgot to remove it for the non-Google bots. I find this pattern in at least 1 site out of 5.
The overzealous hosting firewall. You have an Italian shared hosting plan that blocks anything that doesn’t have a “human” user agent string. PerplexityBot honestly answers “PerplexityBot” and gets rejected with a 403. The site is perfect, the AI doesn’t see it.
The noindex on cornerstone pages. The robots.txt is fine, but the homepage or the product pages carry a meta robots `noindex`. On Google this creates well-known problems; on Perplexity it makes the situation worse, because such pages can be excluded as sources outright.
The aggressive anti-bot rules left over after a DDoS attack. You suffered an attack, the host tightened the firewall and never relaxed it. The site survived, but its AI visibility was sacrificed without anyone noticing.
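Of the mistakes above, the `noindex` one is the easiest to check on your own: save the page’s HTML and look for a meta robots tag. Here is a minimal sketch with Python’s standard `html.parser`; the class and function names are my own, and it deliberately ignores the `X-Robots-Tag` HTTP header, which can carry the same directive.

```python
from html.parser import HTMLParser


class RobotsMetaFinder(HTMLParser):
    """Collect the directives found in <meta name="robots" content="..."> tags."""

    def __init__(self) -> None:
        super().__init__()
        self.directives: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        # Attribute values can be None when the attribute has no value.
        attributes = {name: (value or "") for name, value in attrs}
        if attributes.get("name", "").lower() == "robots":
            for part in attributes.get("content", "").split(","):
                if part.strip():
                    self.directives.append(part.strip().lower())


def has_noindex(html: str) -> bool:
    """Return True if the HTML contains a meta robots tag with noindex."""
    finder = RobotsMetaFinder()
    finder.feed(html)
    return "noindex" in finder.directives


if __name__ == "__main__":
    page = '<html><head><meta name="robots" content="noindex, follow"></head></html>'
    print("noindex found" if has_noindex(page) else "no noindex")
```

Run it on the homepage and on two or three product pages: those are the ones where a forgotten `noindex` hurts most.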
What to do concretely, in order
- Download your robots.txt and look for the critical lines (10 minutes)
- Ask the host about the active firewall rules on bots (one email, 24-48h wait)
- If you find a block, add `User-agent: PerplexityBot` followed by `Allow: /` as an explicit exception
- While you’re at it, add the same `Allow: /` exception for `User-agent: ClaudeBot` and `User-agent: GPTBot`: same reasoning
- After 2-3 weeks (the time it takes the crawler to come back around), rerun the 3 test queries on Perplexity, You.com, Phind
- Compare with the 3-5 competitors the AI cites today in your sector: if they show up and you don’t, the problem is almost always upstream (accessibility or entity, not content)
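The explicit exceptions from the list above can be sketched and verified before you upload anything. This is an illustration, not the one correct layout: `add_exceptions` and `is_allowed` are hypothetical helpers of mine, and the starting robots.txt is invented. If your file already contains a `Disallow` group naming one of these bots, edit that group directly instead of prepending a new one, because crawlers may resolve duplicate groups differently.

```python
from urllib.robotparser import RobotFileParser

AI_AGENTS = ["PerplexityBot", "ClaudeBot", "GPTBot"]


def add_exceptions(existing_robots: str, agents: list[str]) -> str:
    """Prepend one 'User-agent / Allow: /' group per agent to a robots.txt."""
    groups = [f"User-agent: {agent}\nAllow: /\n" for agent in agents]
    # Groups are separated by blank lines; the original rules stay untouched.
    return "\n".join(groups) + "\n" + existing_robots


def is_allowed(robots_txt: str, agent: str, url: str) -> bool:
    """Check a robots.txt string with Python's standard parser."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


if __name__ == "__main__":
    # Hypothetical starting point: a generic block that hits everyone.
    existing = "User-agent: *\nDisallow: /\n"
    fixed = add_exceptions(existing, AI_AGENTS)
    print(fixed)
    for agent in AI_AGENTS + ["SomeOtherBot"]:
        state = "passes" if is_allowed(fixed, agent, "https://yoursite.com/blog/") else "is blocked"
        print(agent, state)
```

The generic block stays in place for everyone else; only the three named crawlers get through.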
An honest note: unblocking the crawler doesn’t guarantee you’ll appear. It’s not a magic factor. But it’s the necessary condition. Without it, any other work on authority or presence in the knowledge graph can’t even be evaluated.