Authority and Credibility for AI

Are You Blocking GPTBot in robots.txt? Then You’re Invisible to ChatGPT

If you've blocked the ChatGPT or Claude bots from your site to protect your content, you're also preventing those same AIs from citing you when your potential customers ask questions. It's like blocking Google and expecting to show up in search results — the logic is identical. Meanwhile, your competitors who didn't make that choice get cited in your place every single day. Checking the current situation takes ten minutes — and it could reopen a visibility channel you're handing over to everyone else.

There’s a file on your site you’ve probably never opened. It’s called robots.txt, it sits in the domain root, and it contains instructions for crawlers, the software that scans the web to index content. For years you thought only about Googlebot. Maybe you even tuned that file to control what Google could and couldn’t see.

But today your site isn’t visited only by Google. GPTBot, ClaudeBot and PerplexityBot are crawlers operated by AI companies, and Google-Extended is the robots.txt token Google uses for its AI products. If your robots.txt blocks them, you’re telling ChatGPT, Claude, Perplexity and Gemini: “don’t read my pages”. The result is simple: for them, you don’t exist.

This isn’t a hypothesis. It’s mechanics.

The file that decides whether the AI can read you

For any given crawler and URL, robots.txt comes down to a binary outcome: allow or block. When a compliant AI crawler arrives on your site, the first thing it does is read that file. If it finds a group like User-agent: GPTBot followed by Disallow: /, it goes no further. It doesn’t try to interpret, it makes no exceptions. It turns back, and your content doesn’t enter its index.
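
To see this mechanic in action, here is a minimal sketch using Python’s standard urllib.robotparser module. The helper name is_allowed and the example.com URL are placeholders of mine, and a real crawler may resolve edge cases differently from Python’s parser.

```python
from urllib.robotparser import RobotFileParser

# A robots.txt that blocks GPTBot from the whole site and names no other bot.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if `user_agent` may fetch `url` under the given rules."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


print(is_allowed(ROBOTS_TXT, "GPTBot", "https://example.com/pricing"))     # False
print(is_allowed(ROBOTS_TXT, "Googlebot", "https://example.com/pricing"))  # True
```

The same two lines that stop GPTBot leave Googlebot untouched, which is why a site can look healthy in Search Console while being closed to an AI crawler.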

The problem is that many sites have blocking rules added years ago for generic bots, or copied from templates that block everything except Googlebot and Bingbot. Nobody updated them when the AI crawlers arrived, because nobody thought about it. But the damage is already underway.
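
Here is a sketch of how such a template behaves, again with urllib.robotparser. The template text is an illustrative example I wrote, not one copied from a specific plugin. It names no AI bot at all, yet every crawler without its own group falls through to the wildcard rule.

```python
from urllib.robotparser import RobotFileParser

# Illustrative legacy template: only Googlebot and Bingbot are let in.
LEGACY_TEMPLATE = """\
User-agent: Googlebot
User-agent: Bingbot
Allow: /

User-agent: *
Disallow: /
"""


def blocked_agents(robots_txt, agents, url="https://example.com/"):
    """Return the agents from `agents` that may not fetch `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [agent for agent in agents if not parser.can_fetch(agent, url)]


agents = ["Googlebot", "Bingbot", "GPTBot", "ClaudeBot", "PerplexityBot"]
print(blocked_agents(LEGACY_TEMPLATE, agents))
# ['GPTBot', 'ClaudeBot', 'PerplexityBot']
```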

Gao et al. in 2024, in the reference survey on RAG, describe what it means to optimize indexing for retrieval systems:

“The goal of optimizing indexing is to enhance the quality of the content being indexed. This involves strategies: enhancing data granularity, optimizing index structures, adding metadata, alignment optimization, and mixed retrieval.”

Gao et al., 2024

Five strategies. But none of them can work if the crawler can’t even access the content. Index optimization presupposes that the content is reachable. And if your robots.txt says “don’t come in”, the content doesn’t exist for that system — regardless of how well structured, up to date or authoritative it is.

Not all AI crawlers are the same

Every AI engine has its own bot, and each one respects robots.txt independently. Here are the main ones:

  • GPTBot: OpenAI’s crawler. If you block it, OpenAI can’t collect your pages through it. OpenAI also documents separate user agents, OAI-SearchBot and ChatGPT-User, for ChatGPT’s search and user-triggered browsing, so check the rules that apply to those as well.
  • Google-Extended: Google’s robots.txt token for its AI products rather than a separate crawler. Blocking it doesn’t prevent indexing on Google Search, but it opts your content out of Gemini. According to Google’s documentation, AI Overviews are part of Search and follow the Googlebot rules, not Google-Extended.
  • ClaudeBot: Anthropic’s crawler. If you block it, your content isn’t collected by Anthropic for Claude.
  • PerplexityBot: Perplexity’s crawler. Blocking here is particularly damaging because Perplexity builds its answers around explicitly cited sources.

The logic is simple but not intuitive: blocking a single bot doesn’t make you invisible everywhere, but it makes you invisible on that specific system. And if you block GPTBot and PerplexityBot together, you’ve closed the door to the two channels that, more than any others, are changing the way people search for information.

The paradox: sites ranking well on Google, invisible to the AI

I checked this situation on a sample of 35 Italian B2B sites. 40% had at least one AI bot blocked in robots.txt — almost always without the owner being aware of it. In most cases, the block was a leftover from old configurations or security plugins that add restrictive rules by default.

The striking finding: these sites had no problems on Google. Some were on the first page for their main keywords. But when I tested the same queries on Perplexity or ChatGPT with browsing, they never appeared. Zero citations, zero visibility. Ranked on Google, ghosts to the AI.

Chen et al. in 2025 highlight a principle that has become the foundation of my work:

“We provide actionable guidance for practitioners, emphasizing the critical need to: (1) engineer content for machine scannability and justification.”

Chen et al., 2025

“Machine scannability” means the ability of machines to scan content. That’s the key phrase. If the content isn’t scannable by the AI bot, every other optimization is useless. You can have perfect semantic markup, up-to-date freshness signals and an impeccable page experience, but if the crawler can’t get in, it reads none of it.

Crawlability and RAG: why the block is fatal

I discussed it in the article on RAG: systems like Perplexity and ChatGPT with browsing search for sources in real time before generating an answer. This means your content is evaluated at the exact moment of the query. If the bot can’t access it at that moment, you simply aren’t considered.

But crawlability doesn’t stop at robots.txt. Volpini et al. in 2026 document an aspect that adds a layer of complexity:

“Enhanced pages transform opaque entity URIs into readable, structured information by resolving linked relationships and presenting them as human-readable content.”

Volpini et al., 2026

Pages that transform opaque data into readable, structured content are processed better by AI systems. It follows that letting the bot in isn’t enough: you also have to make sure that what it finds is readable. A page that needs heavy JavaScript to render its content, or that loads the main text via AJAX after the initial load, could look empty to an AI crawler that doesn’t execute scripts or that uses aggressive timeouts.
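
A rough way to approximate what a crawler that doesn’t run JavaScript receives is to extract the text already present in the raw HTML. The sketch below uses only Python’s standard html.parser. The function names and the 200-character threshold are assumptions of mine, not a published standard, so treat the result as a first signal rather than a verdict.

```python
from html.parser import HTMLParser


class _VisibleText(HTMLParser):
    """Collects text nodes, skipping tags whose content is not body text."""

    SKIPPED = {"script", "style", "template", "title"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if text and self._skip_depth == 0:
            self.chunks.append(text)


def text_without_js(raw_html: str) -> str:
    """Return the text present in the HTML before any script runs."""
    parser = _VisibleText()
    parser.feed(raw_html)
    parser.close()
    return " ".join(parser.chunks)


def looks_empty_without_js(raw_html: str, min_chars: int = 200) -> bool:
    """True if the raw HTML carries less text than `min_chars` (arbitrary threshold)."""
    return len(text_without_js(raw_html)) < min_chars


# A typical client-side shell: the body holds only a mount point and a script.
js_shell = (
    '<html><head><title>Shop</title></head>'
    '<body><div id="root"></div><script src="/app.js"></script></body></html>'
)
print(looks_empty_without_js(js_shell))  # True
```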

The block in robots.txt is the most explicit form of self-exclusion. But slow rendering, paywalls and aggressive anti-bot protections are implicit forms of the same problem.

What to check right now

Open your robots.txt — you’ll find it at yourdomain.com/robots.txt. Then look for these lines:

  • User-agent: GPTBot followed by Disallow: /
  • User-agent: ClaudeBot followed by Disallow: /
  • User-agent: PerplexityBot followed by Disallow: /
  • User-agent: Google-Extended followed by Disallow: /

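If you find even one of these combinations, you’re blocking that AI system. Also look for a User-agent: * group followed by Disallow: /, because it applies to every bot that has no group of its own. The fix is simple: remove the Disallow lines for the AI bots you want to authorize, or add an explicit group with Allow: / for them if a wildcard rule is doing the blocking.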
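
The manual check above can be sketched as a small audit script. This is a sketch under assumptions: audit_robots and fetch_robots are names I made up, the sample file is invented, and the verdicts come from Python’s urllib.robotparser, which may differ from each vendor’s own parser in edge cases.

```python
from urllib.request import Request, urlopen
from urllib.robotparser import RobotFileParser

AI_BOTS = ("GPTBot", "ClaudeBot", "PerplexityBot", "Google-Extended")


def audit_robots(robots_txt: str, url: str = "https://example.com/") -> dict[str, bool]:
    """Map each AI bot token to True (allowed) or False (blocked) for `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return {bot: parser.can_fetch(bot, url) for bot in AI_BOTS}


def fetch_robots(domain: str, timeout: float = 10.0) -> str:
    """Download https://<domain>/robots.txt (not called in the demo below)."""
    request = Request(
        f"https://{domain}/robots.txt",
        headers={"User-Agent": "robots-audit-sketch"},  # hypothetical UA string
    )
    with urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8", errors="replace")


# Invented sample: two AI bots blocked explicitly, everyone else only
# kept out of /wp-admin/.
SAMPLE = """\
User-agent: GPTBot
Disallow: /

User-agent: PerplexityBot
Disallow: /

User-agent: *
Disallow: /wp-admin/
"""

if __name__ == "__main__":
    for bot, allowed in audit_robots(SAMPLE).items():
        print(f"{bot}: {'allowed' if allowed else 'BLOCKED'}")
```

To run it against a live site, pass the result of fetch_robots("yourdomain.com") to audit_robots instead of SAMPLE.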

Then check three additional things:

  1. Server-side rendering: is the main content of the page visible in the source HTML, without needing JavaScript? Open the page source — if the body is empty and everything is injected by scripts, the AI crawlers see a blank page.
  2. No blocking interstitials: full-screen cookie banners, subscription popups, paywalls — anything that prevents access to the content before human interaction is a wall for the bots.
  3. HTTP headers: verify that the server doesn’t return an X-Robots-Tag: noindex header for the AI bots. Some security plugins add these headers without you knowing.
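
For the third check, here is a hedged sketch that inspects X-Robots-Tag values. It follows the header syntax Google documents, where a value can be global ("noindex, nofollow") or scoped to one user agent ("googlebot: noindex"). Whether a given AI crawler honors a scoped value is an assumption on my part, and both function names are invented for illustration.

```python
from urllib.request import Request, urlopen

# Directives that carry their own "name: value" syntax, so a colon after
# them does not introduce a user-agent scope.
VALUE_DIRECTIVES = {
    "unavailable_after", "max-snippet", "max-image-preview", "max-video-preview",
}
BLOCKING = {"noindex", "none"}


def blocks_indexing(header_values, bot: str) -> bool:
    """True if any X-Robots-Tag value tells `bot` (or everyone) not to index."""
    bot = bot.lower()
    for value in header_values:
        scope, sep, rest = value.partition(":")
        scope = scope.strip().lower()
        is_scoped = bool(sep) and "," not in scope and scope not in VALUE_DIRECTIVES
        if is_scoped and scope != bot:
            continue  # the rule targets a different crawler
        directives = rest if is_scoped else value
        tokens = {token.strip().lower() for token in directives.split(",")}
        if tokens & BLOCKING:
            return True
    return False


def fetch_x_robots(url: str, user_agent: str, timeout: float = 10.0) -> list:
    """Fetch `url` with the given User-Agent and return its X-Robots-Tag values."""
    request = Request(url, headers={"User-Agent": user_agent})
    with urlopen(request, timeout=timeout) as response:
        return response.headers.get_all("X-Robots-Tag") or []


print(blocks_indexing(["noindex, nofollow"], "GPTBot"))   # True
print(blocks_indexing(["googlebot: noindex"], "GPTBot"))  # False
```

Servers and plugins can vary their response by user agent, so call fetch_x_robots once per bot string you care about rather than once per page.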

These are surface-level checks — they give you a first picture of the situation. But a complete verification requires testing crawlability from each specific AI bot, analyzing the server logs to see which bots arrive and what they receive, and monitoring over time to ensure the configurations don’t change with plugin updates.
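
As a starting point for the log analysis, the sketch below counts requests per AI bot and HTTP status. It assumes the common “combined” access log format used by Apache and Nginx, and the sample lines, including their simplified user-agent strings, are invented. Google-Extended is left out because it is a robots.txt token and does not show up as its own user agent in logs.

```python
import re
from collections import Counter

AI_BOT_TOKENS = ("GPTBot", "ClaudeBot", "PerplexityBot")

# Request line, status, size, referer, user agent (combined log format).
LOG_RE = re.compile(
    r'"[A-Z]+ (?P<path>\S+) [^"]*" (?P<status>\d{3}) \S+ "[^"]*" "(?P<ua>[^"]*)"'
)


def ai_bot_hits(log_lines):
    """Count requests per (bot, HTTP status) for the known AI bot tokens."""
    hits = Counter()
    for line in log_lines:
        match = LOG_RE.search(line)
        if match is None:
            continue  # not a combined-format line
        user_agent = match.group("ua").lower()
        for bot in AI_BOT_TOKENS:
            if bot.lower() in user_agent:
                hits[(bot, int(match.group("status")))] += 1
                break
    return hits


# Invented sample lines with simplified user-agent strings.
SAMPLE_LOG = [
    '203.0.113.7 - - [10/Mar/2025:10:00:00 +0000] "GET /pricing HTTP/1.1" 200 5120 "-" "ExampleAgent/1.0 (compatible; GPTBot/1.0)"',
    '203.0.113.8 - - [10/Mar/2025:10:00:05 +0000] "GET /pricing HTTP/1.1" 403 310 "-" "ExampleAgent/1.0 (compatible; ClaudeBot/1.0)"',
    '203.0.113.9 - - [10/Mar/2025:10:00:09 +0000] "GET / HTTP/1.1" 200 8042 "-" "Mozilla/5.0 (ordinary browser)"',
]

for (bot, status), count in sorted(ai_bot_hits(SAMPLE_LOG).items()):
    print(f"{bot}: HTTP {status} x{count}")
```

A bot that is allowed in robots.txt but keeps receiving 403 responses points to a firewall or anti-bot layer, which is the implicit form of blocking described earlier.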

Crawlability as the prerequisite for everything else

HTTPS guarantees that the channel is secure. Page experience guarantees that the page is fast. The semantic markup guarantees that the content is structured. The freshness signals guarantee that it’s up to date.

But none of these signals matter if the crawler can’t even access the page. Crawlability is level zero — the prerequisite without which everything else doesn’t exist. And the good news is that fixing it is often a matter of minutes: open a file, remove a line, save. The bad news is that if you don’t, you could stay invisible without even knowing why.

The author
Roberto Serra

SEO consultant for over 15 years, founder of the Serra SEO Agency (RAANK). He helps multinationals and SMEs stay visible where search is moving: ChatGPT, Perplexity, Gemini and Google's AI Overviews.
