Authority and Credibility for AI

Without HTTPS, Your Site Doesn’t Exist for RAG Systems

Roberto Serra · 25 June 2026 · ~7 min read

Does your site have technical security issues, such as an expired certificate, resources loaded insecurely, or missing configurations? Some of the systems that power ChatGPT don’t penalize these sources: they exclude them outright, without even reading what you’ve written. It’s a door slammed shut before you even step inside. Checking your site’s status takes twenty minutes, and it could reopen a visibility channel you’re losing in silence.

Your site might have excellent content, an impeccable structure, and authority built over years of work. But if it doesn’t serve its pages over HTTPS, to a RAG system it’s as if it doesn’t exist. Not demoted, not penalized — simply ignored.

In my articles on how AI models work, I’ve analyzed architecture, retrieval, reasoning, and training. Everything I’ve told you so far concerns how the model processes information once it has it. But there’s an even earlier step: the content has to reach those models in the first place. And this is where the technical credibility of your site comes in — a set of infrastructural signals that decide whether AI crawlers read you or skip you.

HTTPS is the first of these signals, and also the most binary: either you have it, or you’re out.

Why security is no longer just “the green padlock”

For years, HTTPS was perceived as a matter of user trust — the padlock in the browser, the reassurance for online shoppers. Google made it a ranking factor in 2014, and since then most sites have fallen in line. So far, nothing new.

But with the arrival of RAG systems and AI agents, the security of the communication channel has taken on a different role. It’s no longer just about protecting user data — it’s about guaranteeing the integrity of the source the model is consulting.

Haoyuan Xu et al. (2026) explain it well in their study on the evolution of AI agents:

“In multi-tool systems, security is no longer determined by the legitimacy of isolated API calls but by the integrity of the entire composed action sequence.”

Haoyuan Xu et al., 2026

Translated into the context of visibility: when a RAG system crawls the web to retrieve sources, it doesn’t evaluate only the content of the page — it evaluates the entire security chain through which that content is transmitted. An HTTP site transmits data in plaintext. For a system designed to compose information from multiple sources, an unencrypted source is an integrity risk — and the simplest way to manage that risk is to discard it.

The problem spreads: a weak source contaminates the chain

There’s one aspect many people underestimate. The exclusion isn’t aimed at your site in particular: it’s the entire system protecting itself.

The same paper by Xu et al. documents a structural phenomenon of multi-tool agents:

“These interactions breach traditional trust boundaries, where localized malicious inputs or model hallucinations can propagate through the system.”

Haoyuan Xu et al., 2026

In a system that composes answers from dozens of sources, a compromised input doesn’t stay contained — it spreads. If the crawler retrieves content from an insecure connection and that content was altered in transit (a man-in-the-middle attack), the error enters the generated answer and potentially also influences the interpretation of subsequent sources.

The designers of these systems know this. That’s why pre-retrieval filters have become more aggressive than those of a traditional search engine. Googlebot still indexes HTTP pages: HTTPS is a ranking signal, so they may rank lower, but they are indexed. A RAG system operating in real time has less margin: it can’t afford to validate every single source in depth, so it applies binary filters. HTTPS is the first.
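To make the idea of a binary filter concrete, here is a minimal sketch in Python. It is an illustration under my own assumptions, not the code of any real RAG system: the function names `passes_scheme_filter` and `filter_candidates` are hypothetical, and a production pipeline would certainly check more than the URL scheme.

```python
from urllib.parse import urlsplit


def passes_scheme_filter(url: str) -> bool:
    """Binary check: keep a URL only if it uses https and names a host."""
    parts = urlsplit(url)  # urlsplit lowercases the scheme for us
    return parts.scheme == "https" and bool(parts.netloc)


def filter_candidates(urls):
    """Drop every candidate source that fails the binary check (hypothetical step)."""
    return [url for url in urls if passes_scheme_filter(url)]


candidates = [
    "https://example.com/guide",
    "http://example.com/guide",
    "ftp://example.com/file",
]
kept = filter_candidates(candidates)
```

Note what the sketch does not do: it never downloads the page. The decision is made on the URL alone, which is exactly why the quality of the content never gets a chance to count.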

What “scannability” means for AI

If you’re wondering what HTTPS has to do with visibility in AI answers, the connection is more direct than it seems.

Nick Koudas et al. (2025) published an operational guide for those who want to be visible in AI engines. Among the priority recommendations:

“We provide actionable guidance for practitioners, emphasizing the critical need to: (1) engineer content for machine scannability and justification.”

Nick Koudas et al., 2025

“Machine scannability” isn’t just content structure. It’s the entire chain that allows an automated system to reach, read, and trust your page. HTTPS is the base layer of this chain — without it, the content isn’t securely “scannable,” and a system that has to justify its own sources can’t cite a page whose integrity can’t be verified.

This opens up a broader theme, technical credibility, which is the common thread of the articles that follow in this series: what your site communicates to AI systems before they even read a single word of your content.

Beyond the certificate: the security signals that matter

HTTPS is the minimum requirement. But “minimum” doesn’t mean “sufficient.” There are additional security signals that AI crawlers — and source-evaluation pipelines — pick up on.

Certificate validity. An expired or self-signed certificate, or one with an incomplete chain, generates a TLS error. A browser shows you a warning and lets you proceed. An automated crawler that verifies certificates doesn’t proceed: it closes the connection and moves on to the next source. I’ve seen sites with certificates that had been expired for weeks without anyone noticing, because human traffic kept arriving (users click “proceed anyway”). But bots don’t.
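If you prefer a script to the browser padlock, the check can be sketched with Python’s standard `ssl` module. This is a minimal sketch, not a monitoring tool: the helper names `days_until_expiry`, `fetch_not_after` and `report` are my own, and `yoursite.com` is a placeholder. The default context verifies the certificate the way a strict client would, so an expired or self-signed certificate, or a broken chain, raises an error instead of returning a date.

```python
import socket
import ssl
import time


def days_until_expiry(not_after: str, now: float) -> float:
    """Days between `now` (epoch seconds) and a certificate's notAfter string."""
    return (ssl.cert_time_to_seconds(not_after) - now) / 86400


def fetch_not_after(host: str, port: int = 443, timeout: float = 5.0) -> str:
    """Open a verified TLS connection and return the certificate's notAfter field.

    Raises ssl.SSLCertVerificationError when the certificate is expired,
    self-signed, or its chain cannot be validated.
    """
    context = ssl.create_default_context()
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with context.wrap_socket(sock, server_hostname=host) as tls:
            return tls.getpeercert()["notAfter"]


def report(host: str) -> str:
    """Return a one-line summary for a host (performs a network call)."""
    try:
        remaining = days_until_expiry(fetch_not_after(host), time.time())
        return f"{host}: certificate valid for {remaining:.0f} more days"
    except (ssl.SSLError, OSError) as exc:
        return f"{host}: TLS check failed: {exc}"
```

Calling `print(report("yoursite.com"))` from a scheduled job is enough to catch a failed renewal before a crawler does.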

Mixed content. The page is served over HTTPS, but it loads resources (images, scripts, fonts) over HTTP. A modern browser blocks those HTTP resources or tries to upgrade them to HTTPS. For a crawler evaluating the integrity of the page, mixed content is a signal that the infrastructure isn’t coherent, and an incoherent infrastructure tends to go hand in hand with low overall site quality.
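Mixed content can also be spotted offline, by scanning the HTML for subresources that point to `http://`. The sketch below uses Python’s built-in `html.parser`; the class `MixedContentFinder`, the function `find_mixed_content` and the tag list are my own assumptions, and the scan is deliberately rough: it ignores resources loaded from CSS or JavaScript, which the browser console would still report.

```python
from html.parser import HTMLParser

# Tag/attribute pairs that load a subresource.
# Plain <a href> links are navigation, not mixed content, so they are excluded.
RESOURCE_ATTRS = {
    ("img", "src"), ("script", "src"), ("iframe", "src"),
    ("source", "src"), ("video", "src"), ("audio", "src"),
    ("link", "href"),
}


class MixedContentFinder(HTMLParser):
    """Collect (tag, url) pairs for subresources referenced over plain HTTP."""

    def __init__(self):
        super().__init__()
        self.insecure = []

    def handle_starttag(self, tag, attrs):
        attr_map = dict(attrs)
        # Only treat <link> as a resource when it loads a stylesheet.
        if tag == "link" and "stylesheet" not in (attr_map.get("rel") or "").lower():
            return
        for name, value in attrs:
            if (tag, name) in RESOURCE_ATTRS and value and value.lower().startswith("http://"):
                self.insecure.append((tag, value))


def find_mixed_content(html: str):
    finder = MixedContentFinder()
    finder.feed(html)
    return finder.insecure


sample = (
    '<img src="http://cdn.example.com/a.png">'
    '<script src="https://example.com/app.js"></script>'
    '<a href="http://example.org/">a normal link</a>'
    '<link rel="stylesheet" href="http://example.com/s.css">'
)
problems = find_mixed_content(sample)
```

In this sample only the image and the stylesheet are reported; the outbound link and the HTTPS script are left alone.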

HSTS (HTTP Strict Transport Security). This response header tells the browser, and any client that honors it, to use only HTTPS for the site for a set period (`max-age`), which closes the door to a downgrade to HTTP on later visits. It’s a signal of technical maturity. A site with HSTS active communicates that security isn’t a patch but an architectural choice.
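The header itself is a short list of directives: `max-age`, plus the optional `includeSubDomains` and `preload`. A minimal sketch of how a client could read it follows; `parse_hsts` and `hsts_enabled` are hypothetical helper names, and the parsing is simplified compared with the full grammar in the specification.

```python
def parse_hsts(value: str) -> dict:
    """Parse a Strict-Transport-Security header value into a small policy dict."""
    policy = {"max_age": None, "include_subdomains": False, "preload": False}
    for directive in value.split(";"):
        name, _, arg = directive.strip().partition("=")
        name = name.strip().lower()  # directive names are case-insensitive
        if name == "max-age":
            digits = arg.strip().strip('"')
            if digits.isdigit():
                policy["max_age"] = int(digits)
        elif name == "includesubdomains":
            policy["include_subdomains"] = True
        elif name == "preload":
            policy["preload"] = True
    return policy


def hsts_enabled(headers: dict) -> bool:
    """True when the response headers carry an HSTS policy with a positive max-age."""
    for name, value in headers.items():
        if name.lower() == "strict-transport-security":
            max_age = parse_hsts(value)["max_age"]
            return max_age is not None and max_age > 0
    return False


policy = parse_hsts("max-age=31536000; includeSubDomains; preload")
```

A `max-age` of zero is treated as disabled here, because that value tells clients to forget the policy.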

Redirect chain. If your site does http → https → www → non-www (or any combination), each redirect adds latency and complexity. AI crawlers operate with more aggressive timeouts than Googlebot — a point I explore in depth in the article on page experience for AI. Each redirect is an opportunity for the bot to abandon the crawl.
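Counting the hops can be automated as well. The sketch below follows `Location` headers one request at a time with Python’s standard `http.client`; the names `head` and `redirect_chain` are mine, the hop limit of 10 is an arbitrary assumption, and the `fetch` parameter exists only so the logic can be tried without touching the network.

```python
import http.client
from urllib.parse import urljoin, urlsplit

REDIRECT_CODES = {301, 302, 303, 307, 308}


def head(url: str, timeout: float = 5.0):
    """Send one HEAD request without following redirects; return (status, Location)."""
    parts = urlsplit(url)
    if parts.scheme == "https":
        conn = http.client.HTTPSConnection(parts.netloc, timeout=timeout)
    else:
        conn = http.client.HTTPConnection(parts.netloc, timeout=timeout)
    target = (parts.path or "/") + ("?" + parts.query if parts.query else "")
    try:
        conn.request("HEAD", target)
        response = conn.getresponse()
        return response.status, response.getheader("Location")
    finally:
        conn.close()


def redirect_chain(start_url: str, fetch=head, max_hops: int = 10):
    """Return every URL visited, starting URL first and final URL last."""
    chain = [start_url]
    while len(chain) <= max_hops:
        status, location = fetch(chain[-1])
        if status not in REDIRECT_CODES or not location:
            break
        # Location may be relative, so resolve it against the current URL.
        chain.append(urljoin(chain[-1], location))
    return chain


# Simulated responses for a site that redirects http -> https -> www.
simulated = {
    "http://example.com/": (301, "https://example.com/"),
    "https://example.com/": (301, "https://www.example.com/"),
    "https://www.example.com/": (200, None),
}
chain = redirect_chain("http://example.com/", fetch=lambda url: simulated[url])
hops = len(chain) - 1
```

With the simulated responses the chain has two hops, one more than it needs. Against a real site, call `redirect_chain("http://yoursite.com/")` and aim for a single hop.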

What to check today

These are surface-level checks — they give you a sense of where you stand, not a complete analysis. But they’re the starting point.

Check the SSL certificate: open your site and click on the padlock in the browser. Is the certificate valid? Is the chain complete? If you use Let’s Encrypt, verify that automatic renewal is working: when a renewal fails silently, the expired certificate can go unnoticed for days.
Look for mixed content: open the browser console (F12 → Console) and load your main pages. Every “Mixed Content” warning is a resource being loaded over HTTP on an HTTPS page. Fix them all.
Check HSTS: open the terminal and run `curl -I https://yoursite.com`. Look for the `Strict-Transport-Security` header. If it’s not there, clients are never told to refuse plain HTTP, so a downgrade to an insecure connection remains possible.
Test the redirect chain: use `curl -vL http://yoursite.com` and count the redirects. If there’s more than one (http → https is acceptable, but http → https → www → another version isn’t), streamline the chain.

If you find problems, the good news is that they’re all fixable in a few hours with a targeted technical intervention. The bad news is that as long as they remain, every AI crawler that visits your site can decide to discard you — and you’ll never know, because you won’t receive an error. You simply won’t appear.

HTTPS in the context of technical credibility

What I’ve described in this article is the first layer of a series of technical signals that AI systems evaluate before they even read your content. It’s not the content that’s in question — it’s the container.

The next deep dives cover the other signals in this chain: loading speed and AI crawler timeouts, crawlability specific to AI bots, the semantic markup that helps AI understand the structure of your content, and the freshness signals that indicate whether your information is still current.

Each of these is a filter. HTTPS is the first — and the most brutal, because it’s binary. The others allow for nuance. This one doesn’t: either your site is secure, or to the RAG system it doesn’t exist.

The author

Roberto Serra at the Senate of the Republic

Senate of the Republic · Palazzo Giustiniani Conference “The power of artificial intelligence”

Roberto Serra

SEO consultant for over 15 years, founder of the Serra SEO Agency (RAANK). He helps multinationals and SMEs stay visible where search is moving: ChatGPT, Perplexity, Gemini and Google's AI Overviews.
