Does your site have technical security issues, such as an expired certificate, resources loaded insecurely, or missing configurations? Some of the systems that power ChatGPT don't penalize these sources: they exclude them outright, without reading a single word of what you've written. It's a door slammed shut before you even step inside. Checking your site's status takes twenty minutes, and it could reopen a visibility channel you're silently losing.
Your site might have excellent content, an impeccable structure, and authority built over years of work. But if it doesn’t serve its pages over HTTPS, to a RAG system it’s as if it doesn’t exist. Not demoted, not penalized — simply ignored.
In my articles on how AI models work, I’ve analyzed architecture, retrieval, reasoning, and training. Everything I’ve told you so far concerns how the model processes information once it has it. But there’s an even earlier step: the content has to reach those models in the first place. And this is where the technical credibility of your site comes in — a set of infrastructural signals that decide whether AI crawlers read you or skip you.
HTTPS is the first of these signals, and also the most binary: either you have it, or you’re out.
Why security is no longer just “the green padlock”
For years, HTTPS was perceived as a matter of user trust — the padlock in the browser, the reassurance for online shoppers. Google made it a ranking factor in 2014, and since then most sites have fallen in line. So far, nothing new.
But with the arrival of RAG systems and AI agents, the security of the communication channel has taken on a different role. It’s no longer just about protecting user data — it’s about guaranteeing the integrity of the source the model is consulting.
Haoyuan Xu et al. (2026) explain it well in their study on the evolution of AI agents:
“In multi-tool systems, security is no longer determined by the legitimacy of isolated API calls but by the integrity of the entire composed action sequence.”
Translated into the context of visibility: when a RAG system crawls the web to retrieve sources, it doesn’t evaluate only the content of the page — it evaluates the entire security chain through which that content is transmitted. An HTTP site transmits data in plaintext. For a system designed to compose information from multiple sources, an unencrypted source is an integrity risk — and the simplest way to manage that risk is to discard it.
The problem spreads: a weak source contaminates the chain
There’s one aspect many people underestimate. This isn’t only about your site being penalized; it’s about the entire system protecting itself.
The same paper by Xu et al. documents a structural phenomenon of multi-tool agents:
“These interactions breach traditional trust boundaries, where localized malicious inputs or model hallucinations can propagate through the system.”
In a system that composes answers from dozens of sources, a compromised input doesn’t stay contained — it spreads. If the crawler retrieves content from an insecure connection and that content was altered in transit (a man-in-the-middle attack), the error enters the generated answer and potentially also influences the interpretation of subsequent sources.
The designers of these systems know this. That’s why pre-retrieval filters have become more aggressive than those of a traditional search engine. Googlebot indexes HTTP pages too — it penalizes them in ranking, but it indexes them. A RAG system operating in real time has less margin: it can’t afford to validate every single source, so it applies binary filters. HTTPS is the first.
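To make the idea of a binary gate concrete, here is a minimal Python sketch of a pre-retrieval filter. It is hypothetical: `passes_transport_filter`, `filter_candidates`, and the single HTTPS rule are my own illustrative assumptions, not the code of ChatGPT or any other real system. What it shows is the shape of the decision: a source that fails the gate is dropped before its content is ever read or scored.

```python
from urllib.parse import urlparse


# Hypothetical pre-retrieval gate: the rule is an illustrative assumption,
# not the documented behavior of any real RAG system.
def passes_transport_filter(url: str) -> bool:
    """Binary check: keep a candidate source only if it is served over HTTPS."""
    return urlparse(url).scheme.lower() == "https"


def filter_candidates(urls: list[str]) -> list[str]:
    # Sources that fail the gate are discarded before retrieval,
    # so no relevance score is ever computed for them.
    return [u for u in urls if passes_transport_filter(u)]


candidates = [
    "https://example.com/guide",
    "http://example.org/guide",
    "https://example.net/faq",
]
print(filter_candidates(candidates))
```

Note the contrast with a ranking penalty: there is no score to lower here, only a yes or a no.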
What “scannability” means for AI
If you’re wondering what HTTPS has to do with visibility in AI answers, the connection is more direct than it seems.
Nick Koudas et al. (2025) published an operational guide for those who want to be visible in AI engines. Among the priority recommendations:
“We provide actionable guidance for practitioners, emphasizing the critical need to: (1) engineer content for machine scannability and justification.”
“Machine scannability” isn’t just content structure. It’s the entire chain that allows an automated system to reach, read, and trust your page. HTTPS is the base layer of this chain — without it, the content isn’t securely “scannable,” and a system that has to justify its own sources can’t cite a page whose integrity can’t be verified.
This opens up a broader theme, technical credibility, which is the common thread of my next articles: what your site communicates to AI systems before they read a single word of your content.
Beyond the certificate: the security signals that matter
HTTPS is the minimum requirement. But “minimum” doesn’t mean “sufficient.” There are additional security signals that AI crawlers — and source-evaluation pipelines — pick up on.
Certificate validity. An expired or self-signed certificate, or one with an incomplete chain, causes a TLS error. A browser shows you a warning and lets you proceed. An automated crawler doesn’t: it closes the connection and moves on to the next source. I’ve seen sites with certificates that had been expired for weeks without anyone noticing, because human traffic kept arriving (users click “proceed anyway”). Bots don’t.
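If you want to check certificate validity the way a crawler would, rather than through the browser, here is a minimal sketch using Python’s standard `ssl` module. `fetch_not_after` opens a verified TLS connection, so an expired, self-signed, or incomplete-chain certificate fails the handshake just as it would for a bot; `days_until_expiry` turns the certificate’s `notAfter` field into days remaining. Both function names are my own; the `ssl` and `socket` calls are the standard library’s.

```python
import socket
import ssl
import time
from typing import Optional


def days_until_expiry(not_after: str, now: Optional[float] = None) -> float:
    """Days left before a certificate's notAfter date (negative if expired).

    `not_after` uses the format returned by SSLSocket.getpeercert(),
    e.g. "Jan  5 09:34:43 2018 GMT".
    """
    expires_at = ssl.cert_time_to_seconds(not_after)  # seconds since epoch, UTC
    current = time.time() if now is None else now
    return (expires_at - current) / 86400


def fetch_not_after(host: str, port: int = 443, timeout: float = 5.0) -> str:
    """Open a verified TLS connection and return the certificate's notAfter.

    With the default context, an expired, self-signed, or incomplete-chain
    certificate raises ssl.SSLCertVerificationError during the handshake,
    the same point at which an automated client gives up.
    """
    context = ssl.create_default_context()
    with socket.create_connection((host, port), timeout=timeout) as sock:
        with context.wrap_socket(sock, server_hostname=host) as tls:
            return tls.getpeercert()["notAfter"]
```

Usage, with `yoursite.com` as a placeholder: `days_until_expiry(fetch_not_after("yoursite.com"))`. A small positive number is your early warning that automatic renewal may not be working; an `ssl.SSLCertVerificationError` means crawlers are already being turned away.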
Mixed content. The page is served over HTTPS, but it loads resources (images, scripts, fonts) over HTTP. Modern browsers block these HTTP resources or automatically upgrade them to HTTPS. For a crawler evaluating the integrity of the page, mixed content signals that the infrastructure isn’t coherent, and infrastructural incoherence tends to go hand in hand with lower overall site quality.
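Browser developer tools are the quickest way to spot mixed content, but the check itself is simple enough to sketch. The following Python example, using only the standard library’s `html.parser`, scans a page’s HTML for subresources requested over `http://`. The tag and attribute list is a simplified assumption of mine: it ignores URLs inside CSS, `srcset`, and anything injected by JavaScript, so treat it as a first pass, not a complete audit.

```python
from html.parser import HTMLParser

# (tag, attribute) pairs through which a page pulls in subresources.
# Simplified assumption: URLs inside CSS, srcset, or injected by
# JavaScript are not inspected here.
RESOURCE_ATTRS = {
    ("img", "src"), ("script", "src"), ("iframe", "src"),
    ("audio", "src"), ("video", "src"), ("video", "poster"),
    ("source", "src"), ("embed", "src"), ("object", "data"),
    ("link", "href"),
}
# <link> only loads something for these rel values; rel="canonical",
# rel="alternate", etc. are plain references and are skipped.
FETCHING_LINK_RELS = {"stylesheet", "icon", "preload", "modulepreload"}


class MixedContentFinder(HTMLParser):
    """Collect (tag, url) pairs for subresources requested over plain HTTP."""

    def __init__(self) -> None:
        super().__init__()
        self.insecure: list[tuple[str, str]] = []

    def handle_starttag(self, tag, attrs):
        if tag == "link":
            rels = set((dict(attrs).get("rel") or "").lower().split())
            if not rels & FETCHING_LINK_RELS:
                return
        for name, value in attrs:
            if (tag, name) in RESOURCE_ATTRS and value:
                if value.strip().lower().startswith("http://"):
                    self.insecure.append((tag, value.strip()))


def find_mixed_content(html: str) -> list[tuple[str, str]]:
    finder = MixedContentFinder()
    finder.feed(html)
    finder.close()
    return finder.insecure
```

Feed it the HTML of your main pages; every pair it returns corresponds roughly to a “Mixed Content” warning you would see in the browser console.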
HSTS (HTTP Strict Transport Security). This response header tells the browser, and any crawler that honors it, that the site accepts only HTTPS connections for a set period (the `max-age` directive), ruling out downgrades to HTTP on subsequent visits. It’s a signal of technical maturity: a site with HSTS active communicates that security isn’t a patch but an architectural choice.
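The value of the `Strict-Transport-Security` header is a short list of directives. As an illustration, here is a minimal Python sketch that parses it. The function names are hypothetical; the directive names `max-age` and `includeSubDomains` come from RFC 6797, and `preload` from the browser preload-list convention. One detail worth knowing: clients ignore this header when it arrives over plain HTTP, so it only has an effect on HTTPS responses.

```python
from typing import Optional


def parse_hsts(header_value: str) -> dict:
    """Parse a Strict-Transport-Security header value.

    Directive names are case-insensitive and max-age may be quoted
    (RFC 6797). "preload" is not defined by RFC 6797; it is the opt-in
    token used by browser HSTS preload lists.
    """
    policy: dict = {"max_age": None, "include_subdomains": False, "preload": False}
    for directive in header_value.split(";"):
        name, _, value = directive.partition("=")
        name = name.strip().lower()
        if name == "max-age":
            policy["max_age"] = int(value.strip().strip('"'))
        elif name == "includesubdomains":
            policy["include_subdomains"] = True
        elif name == "preload":
            policy["preload"] = True
    return policy


def hsts_is_active(policy: dict) -> bool:
    # max-age=0 tells clients to forget any stored policy for the host.
    max_age: Optional[int] = policy["max_age"]
    return max_age is not None and max_age > 0
```

For example, `max-age=31536000; includeSubDomains` enforces HTTPS for one year on the domain and all its subdomains.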
Redirect chain. If your site does http → https → www → non-www (or any combination), each redirect adds latency and complexity. AI crawlers operate with more aggressive timeouts than Googlebot — a point I explore in depth in the article on page experience for AI. Each redirect is an opportunity for the bot to abandon the crawl.
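A minimal sketch of counting redirect hops, using only Python’s standard library; `head_request` and `redirect_chain` are illustrative names of my own. It sends HEAD requests without following redirects automatically, so each hop stays visible, and it accepts an injectable fetcher so the logic can be tried without touching the network.

```python
import http.client
from typing import Callable, Optional, Tuple
from urllib.parse import urljoin, urlsplit

# A fetcher takes a URL and returns (status code, Location header or None).
Fetcher = Callable[[str], Tuple[int, Optional[str]]]
REDIRECT_CODES = (301, 302, 303, 307, 308)


def head_request(url: str, timeout: float = 5.0) -> Tuple[int, Optional[str]]:
    """Send a single HEAD request without following redirects."""
    parts = urlsplit(url)
    conn_cls = (http.client.HTTPSConnection if parts.scheme == "https"
                else http.client.HTTPConnection)
    conn = conn_cls(parts.netloc, timeout=timeout)
    try:
        path = parts.path or "/"
        if parts.query:
            path += "?" + parts.query
        conn.request("HEAD", path)
        resp = conn.getresponse()
        return resp.status, resp.getheader("Location")
    finally:
        conn.close()


def redirect_chain(url: str, fetch: Fetcher = head_request, max_hops: int = 10) -> list[str]:
    """Return every URL visited, from the first request to the final response."""
    chain = [url]
    for _ in range(max_hops):
        status, location = fetch(chain[-1])
        if status not in REDIRECT_CODES or not location:
            return chain
        chain.append(urljoin(chain[-1], location))  # Location may be relative
    raise RuntimeError(f"more than {max_hops} redirects starting from {url}")
```

Run it against your own domain, e.g. `redirect_chain("http://yoursite.com/")`: `len(chain) - 1` is the number of redirects, and anything above one is a chain worth streamlining.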
What to check today
These are surface-level checks — they give you a sense of where you stand, not a complete analysis. But they’re the starting point.
- Check the SSL certificate: open your site and click on the padlock in the browser. Is the certificate valid? Is the chain complete? If you use Let’s Encrypt, verify that automatic renewal is working: a certificate that expires overnight can go unnoticed for days.
- Look for mixed content: open the browser console (F12 → Console) and load your main pages. Every “Mixed Content” warning is a resource being loaded over HTTP on an HTTPS page. Fix them all.
- Check HSTS: open the terminal and run `curl -I https://yoursite.com`. Look for the `Strict-Transport-Security` header. If it’s missing, clients are never told to refuse plain HTTP, so your site can still be reached over insecure connections.
- Test the redirect chain: run `curl -vL http://yoursite.com` and count the redirects. One hop (http → https) is acceptable; anything longer, such as http → https → www → another version, should be streamlined so every variant points straight to the final URL.
If you find problems, the good news is that they’re all fixable in a few hours with a targeted technical intervention. The bad news is that as long as they remain, every AI crawler that visits your site can decide to discard you — and you’ll never know, because you won’t receive an error. You simply won’t appear.
HTTPS in the context of technical credibility
What I’ve described in this article is the first layer of a series of technical signals that AI systems evaluate before they even read your content. It’s not the content that’s in question — it’s the container.
The next deep dives cover the other signals in this chain: loading speed and AI crawler timeouts, crawlability specific to AI bots, the semantic markup that helps AI understand the structure of your content, and the freshness signals that indicate whether your information is still current.
Each of these is a filter. HTTPS is the first — and the most brutal, because it’s binary. The others allow for nuance. This one doesn’t: either your site is secure, or to the RAG system it doesn’t exist.