Authority and Credibility for AI

Are You Blocking GPTBot in robots.txt? Then You’re Invisible to ChatGPT

Roberto Serra · 25 June 2026 · ~7 min read

If you've blocked the ChatGPT or Claude bots from your site to protect your content, you're also preventing those same AIs from citing you when your potential customers ask questions. It's like blocking Google and expecting to show up in search results — the logic is identical. Meanwhile, your competitors who didn't make that choice get cited in your place every single day. Checking the current situation takes ten minutes — and it could reopen a visibility channel you're handing over to everyone else.

There’s a page on your site you’ve probably never opened. It’s called robots.txt, it sits in the domain root, and it contains instructions for crawlers — the software that scans the web to index content. For years you thought about Googlebot. Maybe you even optimized that file to control what Google could and couldn’t see.

But today your site isn’t visited only by Google. GPTBot, ClaudeBot and PerplexityBot are crawlers run by the AI engines that power real-time answers, and Google-Extended is the robots.txt token Google uses to control the use of your content in Gemini. If your robots.txt blocks them, you’re telling ChatGPT, Claude, Perplexity and Gemini: “don’t read my pages”. The result is simple: you don’t exist for them.

This isn’t a hypothesis. It’s mechanics.

The file that decides whether the AI can read you

The robots.txt file works on simple logic: for each user agent and path, a rule either allows or blocks access. An AI crawler that honors the protocol reads that file before fetching anything else on your site. If it finds a group like User-agent: GPTBot followed by Disallow: /, it goes no further. It doesn’t try to interpret, and it makes no exceptions. It turns back, and your content doesn’t enter its index.
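The allow-or-block decision can be reproduced locally. The sketch below is only an illustration: it uses Python's standard urllib.robotparser, which approximates how a compliant crawler reads the file, and the robots.txt content and example.com URL are hypothetical.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, matching the rule quoted above.
ROBOTS_TXT = """\
User-agent: GPTBot
Disallow: /
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt lets user_agent fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# GPTBot is named in a blocking group; Googlebot is not mentioned at all.
gptbot_ok = is_allowed(ROBOTS_TXT, "GPTBot", "https://example.com/pricing")
googlebot_ok = is_allowed(ROBOTS_TXT, "Googlebot", "https://example.com/pricing")
print("GPTBot allowed:", gptbot_ok)
print("Googlebot allowed:", googlebot_ok)
```

A bot that is not named in any group, and has no wildcard group to fall back on, is allowed by default. That is why the same file can shut out one crawler and leave another untouched.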

The problem is that many sites have blocking rules added years ago for generic bots, or copied from templates that block everything except Googlebot and Bingbot. Nobody updated them when the AI crawlers arrived, because nobody thought about it. But the damage is already underway.
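To see how such a template catches bots it never names, here is a minimal sketch. The template text is a hypothetical reconstruction of the "everything except Googlebot and Bingbot" pattern, and Python's urllib.robotparser stands in for the crawlers' own parsers, which may differ in detail.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical legacy template: two search bots allowed, everyone else blocked.
LEGACY_TEMPLATE = """\
User-agent: Googlebot
Allow: /

User-agent: Bingbot
Allow: /

User-agent: *
Disallow: /
"""

AI_BOTS = ["GPTBot", "ClaudeBot", "PerplexityBot", "Google-Extended"]


def blocked_agents(robots_txt, agents, url="https://example.com/"):
    """Return the agents from `agents` that may not fetch `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [agent for agent in agents if not parser.can_fetch(agent, url)]


# None of the AI bots is named, so all of them fall under the "*" group.
caught = blocked_agents(LEGACY_TEMPLATE, AI_BOTS)
print("Blocked by the wildcard rule:", caught)
```

No line in this file mentions an AI bot, which is why a quick text search for "GPTBot" would not reveal the problem.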

Gao et al. in 2024, in the reference survey on RAG, describe what it means to optimize indexing for retrieval systems:

“The goal of optimizing indexing is to enhance the quality of the content being indexed. This involves strategies: enhancing data granularity, optimizing index structures, adding metadata, alignment optimization, and mixed retrieval.”

Gao et al., 2024

Five strategies. But none of them can work if the crawler can’t even access the content. Index optimization presupposes that the content is reachable. And if your robots.txt says “don’t come in”, the content doesn’t exist for that system — regardless of how well structured, up to date or authoritative it is.

Not all AI crawlers are the same

Every AI engine has its own bot, and each one respects robots.txt independently. Here are the main ones:

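GPTBot: OpenAI’s crawler, which OpenAI describes as collecting content for model training. OpenAI documents separate user agents for ChatGPT search (OAI-SearchBot) and for user-triggered browsing (ChatGPT-User), so a rule that blocks them keeps your pages out of ChatGPT answers with browsing enabled.
Google-Extended: not a separate crawler but a robots.txt token that tells Google whether content it has already crawled may be used for Gemini. Blocking it doesn’t prevent indexing on Google Search, but it excludes your content from Gemini. AI Overviews are a Search feature and follow the Googlebot rules instead.
ClaudeBot: Anthropic’s crawler. If you block it, your content isn’t considered by the Claude model.
PerplexityBot: Perplexity’s crawler. Blocking here is particularly damaging because Perplexity attaches explicit source citations to its answers.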

The logic is simple but not intuitive: blocking a single bot doesn’t make you invisible everywhere, but it makes you invisible on that specific system. And if you block GPTBot and PerplexityBot together, you’ve closed the door to the two channels that, more than any others, are changing the way people search for information.

The paradox: sites ranking well on Google, invisible to the AI

I checked this situation on a sample of 35 Italian B2B sites. 40% had at least one AI bot blocked in robots.txt — almost always without the owner being aware of it. In most cases, the block was a leftover from old configurations or security plugins that add restrictive rules by default.

The striking finding: these sites had no problems on Google. Some were on the first page for their main keywords. But when I tested the same queries on Perplexity or ChatGPT with browsing, they never appeared. Zero citations, zero visibility. Ranked on Google, ghosts to the AI.

Chen et al. in 2025 highlight a principle that has become the foundation of my work:

“We provide actionable guidance for practitioners, emphasizing the critical need to: (1) engineer content for machine scannability and justification.”

Chen et al., 2025

“Machine scannability” — the ability of machines to scan content. That’s the key phrase. If the content isn’t scannable by the AI bot, every other optimization is useless. You can have the perfect semantic markup, the freshness signals up to date, an impeccable page experience — but if the crawler can’t get in, it reads none of it.

Crawlability and RAG: why the block is fatal

I discussed it in the article on RAG: systems like Perplexity and ChatGPT with browsing search for sources in real time before generating an answer. This means your content is evaluated at the exact moment of the query. If the bot can’t access it at that moment, you simply aren’t considered.

But crawlability doesn’t stop at robots.txt. Volpini et al. in 2026 document an aspect that adds a layer of complexity:

“Enhanced pages transform opaque entity URIs into readable, structured information by resolving linked relationships and presenting them as human-readable content.”

Volpini et al., 2026

Pages that transform opaque data into readable, structured content are processed better by AI systems. From this follows a deduction: it’s not enough to let the bot in — you also have to make sure that what it finds is readable. A page that requires heavy JavaScript to render the content, or that loads the main text via AJAX after the initial load, could come across as empty to an AI crawler with aggressive timeouts.
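A rough way to estimate what a crawler that doesn't run JavaScript receives is to count the words present in the raw HTML outside script and style tags. The sketch below assumes exactly that simplification. The two sample pages are hypothetical, and real crawlers may render more, or less, than this.

```python
from html.parser import HTMLParser


class VisibleTextExtractor(HTMLParser):
    """Collect text that appears outside <script> and <style> elements."""

    SKIPPED = {"script", "style"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIPPED:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIPPED and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0 and data.strip():
            self.chunks.append(data.strip())


def visible_word_count(html: str) -> int:
    """Count words a non-rendering client would find in the raw HTML."""
    extractor = VisibleTextExtractor()
    extractor.feed(html)
    extractor.close()
    return sum(len(chunk.split()) for chunk in extractor.chunks)


# Hypothetical pages: one server-rendered, one filled in by JavaScript.
SSR_PAGE = "<html><body><h1>Pricing</h1><p>Plans start at 49 euros per month.</p></body></html>"
SPA_PAGE = '<html><body><div id="root"></div><script>render({"text": "Plans start at 49 euros"})</script></body></html>'

print("Server-rendered page:", visible_word_count(SSR_PAGE), "words")
print("Script-injected page:", visible_word_count(SPA_PAGE), "words")
```

A count close to zero on a page that looks full in the browser is a hint that the main text arrives only after scripts run.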

The block in robots.txt is the most explicit form of self-exclusion. But slow rendering, paywalls and aggressive anti-bot protections are implicit forms of the same problem.

What to check right now

Open your robots.txt — you’ll find it at yourdomain.com/robots.txt. Then look for these lines:

User-agent: GPTBot followed by Disallow: /
User-agent: ClaudeBot followed by Disallow: /
User-agent: PerplexityBot followed by Disallow: /
User-agent: Google-Extended followed by Disallow: /

If you find even one of these combinations, you’re blocking that AI system. The fix is simple: remove the Disallow lines for the AI bots you want to authorize.
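For anyone who prefers to script the fix, here is a cautious sketch. It assumes that groups in the file are separated by blank lines, and it only drops a group when that group names nothing but the bots you chose and contains nothing but Disallow: /. The function name and the sample file are hypothetical. The result is verified with Python's urllib.robotparser.

```python
import re
from urllib.robotparser import RobotFileParser


def drop_full_blocks(robots_txt: str, agents: list[str]) -> str:
    """Remove groups that target only `agents` and only say 'Disallow: /'."""
    wanted = {agent.lower() for agent in agents}
    kept = []
    for block in re.split(r"\n\s*\n", robots_txt.strip()):
        named, rules = [], []
        for line in block.splitlines():
            line = line.split("#", 1)[0].strip()  # ignore comments
            if ":" not in line:
                continue
            field, value = (part.strip() for part in line.split(":", 1))
            if field.lower() == "user-agent":
                named.append(value.lower())
            else:
                rules.append((field.lower(), value))
        only_wanted = bool(named) and set(named) <= wanted
        only_full_block = bool(rules) and all(r == ("disallow", "/") for r in rules)
        if not (only_wanted and only_full_block):
            kept.append(block)  # anything else is left exactly as it was
    return "\n\n".join(kept) + "\n"


# Hypothetical file with two blocked AI bots and an unrelated admin rule.
BEFORE = """\
User-agent: GPTBot
Disallow: /

User-agent: PerplexityBot
Disallow: /

User-agent: *
Disallow: /admin/
"""

fixed = drop_full_blocks(BEFORE, ["GPTBot"])
parser = RobotFileParser()
parser.parse(fixed.splitlines())
print(fixed)
print("GPTBot allowed on /:", parser.can_fetch("GPTBot", "https://example.com/"))
```

After the change GPTBot falls under the wildcard group, so it can read the site but still not /admin/. If the wildcard group itself says Disallow: /, removing the named group is not enough and the bot stays blocked.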

Then check three additional things:

Server-side rendering: is the main content of the page visible in the source HTML, without needing JavaScript? Open the page source — if the body is empty and everything is injected by scripts, the AI crawlers see a blank page.
No blocking interstitials: full-screen cookie banners, subscription popups, paywalls — anything that prevents access to the content before human interaction is a wall for the bots.
HTTP headers: verify that the server doesn’t return an X-Robots-Tag: noindex header for the AI bots. Some security plugins add these headers without you knowing.
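The header check can be sketched as follows. The parsing follows the "botname: directives" form that Google documents for X-Robots-Tag. Whether each AI crawler honors this header at all is an assumption here, and the short user-agent strings are simplifications of the real ones.

```python
import urllib.request

# Directives that legitimately contain ":" and are not bot names.
VALUED_DIRECTIVES = {
    "max-snippet", "max-image-preview", "max-video-preview", "unavailable_after",
}


def noindex_applies(header_values, user_agent):
    """Return True if any X-Robots-Tag value sets noindex for user_agent."""
    ua = user_agent.lower()
    for value in header_values:
        prefix, sep, rest = value.partition(":")
        prefix = prefix.strip().lower()
        if sep and "," not in prefix and prefix not in VALUED_DIRECTIVES:
            # "botname: directives" form: skip values scoped to another bot.
            if prefix != ua:
                continue
            directives = rest
        else:
            directives = value  # applies to every bot
        names = {d.strip().lower() for d in directives.split(",")}
        if "noindex" in names or "none" in names:
            return True
    return False


def fetch_x_robots(url, user_agent):
    """Request url with a custom User-Agent and return its X-Robots-Tag values."""
    request = urllib.request.Request(
        url, headers={"User-Agent": user_agent}, method="HEAD"
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return response.headers.get_all("X-Robots-Tag") or []


# Offline examples; call fetch_x_robots("https://yourdomain.com/", "GPTBot")
# to test a live page.
print(noindex_applies(["noindex, nofollow"], "GPTBot"))
print(noindex_applies(["googlebot: noindex"], "GPTBot"))
```

Sending the request with a bot's name as User-Agent also reveals whether a firewall or plugin answers that bot differently from a browser, for example with a 403.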

These are surface-level checks — they give you a first picture of the situation. But a complete verification requires testing crawlability from each specific AI bot, analyzing the server logs to see which bots arrive and what they receive, and monitoring over time to ensure the configurations don’t change with plugin updates.
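As a starting point for the log analysis, the sketch below counts requests per AI bot and HTTP status. It assumes the "combined" access-log format commonly used by Apache and nginx. The sample lines, IP addresses and user-agent strings are hypothetical.

```python
import re
from collections import Counter

# Tokens to look for in the User-Agent field. Google-Extended is a robots.txt
# token, not a user agent, so it never shows up in access logs.
AI_BOT_TOKENS = ["GPTBot", "ClaudeBot", "PerplexityBot"]

# request line, status, size, referer, user agent (combined log format)
LINE_RE = re.compile(
    r'"\S+ (?P<path>\S+) [^"]*" (?P<status>\d{3}) \S+ "[^"]*" "(?P<agent>[^"]*)"'
)


def ai_bot_hits(log_lines):
    """Count requests per (bot token, HTTP status) pair."""
    hits = Counter()
    for line in log_lines:
        match = LINE_RE.search(line)
        if not match:
            continue  # line is not in the assumed format
        agent = match.group("agent").lower()
        for token in AI_BOT_TOKENS:
            if token.lower() in agent:
                hits[(token, int(match.group("status")))] += 1
    return hits


# Hypothetical log excerpt.
SAMPLE_LOG = [
    '203.0.113.7 - - [25/Jun/2026:10:00:00 +0000] "GET /pricing HTTP/1.1" 200 5120 "-" "Mozilla/5.0 (compatible; GPTBot/1.0)"',
    '203.0.113.8 - - [25/Jun/2026:10:00:05 +0000] "GET /blog HTTP/1.1" 403 162 "-" "Mozilla/5.0 (compatible; ClaudeBot/1.0)"',
    '198.51.100.2 - - [25/Jun/2026:10:00:09 +0000] "GET / HTTP/1.1" 200 900 "-" "Mozilla/5.0 (Windows NT 10.0) Firefox/126.0"',
]

hits = ai_bot_hits(SAMPLE_LOG)
for (bot, status), count in sorted(hits.items()):
    print(bot, status, count)
```

A bot that appears only with 403 or 429 responses is being turned away by the server even if robots.txt allows it. A bot that never appears may be blocked upstream or may simply not have visited yet.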

Crawlability as the prerequisite for everything else

HTTPS guarantees that the channel is secure. Page experience guarantees that the page is fast. The semantic markup guarantees that the content is structured. The freshness signals guarantee that it’s up to date.

But none of these signals matter if the crawler can’t even access the page. Crawlability is level zero — the prerequisite without which everything else doesn’t exist. And the good news is that fixing it is often a matter of minutes: open a file, remove a line, save. The bad news is that if you don’t, you could stay invisible without even knowing why.

The author

Roberto Serra at the Senate of the Republic

Senate of the Republic · Palazzo Giustiniani Conference “The power of artificial intelligence”

Roberto Serra

SEO consultant for over 15 years, founder of the Serra SEO Agency (RAANK). He helps multinationals and SMEs stay visible where search is moving: ChatGPT, Perplexity, Gemini and Google's AI Overviews.
