How AI engines think

The semantic distance between you and your customer decides whether AI finds you

Roberto Serra · 25 June 2026 · ~6 min read

Your website talks about integrated solutions for digital transformation, but your customer asks the AI how to sell more online. To the AI, these two phrases are worlds apart — and when it has to pick a source, it picks whoever uses the same words as the customer, not whoever uses industry jargon. You're losing visibility because of a vocabulary problem, not a competence problem. Measuring and closing this gap between how you speak and how your customers speak is precise work — and it radically shifts how often you get found.

You’ve probably already done it. You opened ChatGPT, typed “best companies for [whatever you do] in [your city],” and waited for the answer. Your name wasn’t there. You tried different variations — the name of the service, the industry, the area. Nothing. Then you tried Perplexity, then Gemini. The names that came up were always those of your competitors.

At that point you asked yourself the right question: why them and not me?

One of the answers — not the only one, but one that few people consider — is that your website and your customers’ questions speak two different languages. And for the AI, this difference in language translates into a measurable distance that determines whether you get found or skipped.

How AI measures the “closeness” between texts

AI models don’t compare words. They compare numbers.

Every piece of text, whether a sentence, a paragraph, or an entire page, gets converted into an embedding: a vector of hundreds or thousands of numbers that represents the meaning of the text in a multidimensional space. Zhang et al. (2024) explain it well in their paper on hybrid search:

“Typically, dense embedding vectors are normalized to unit magnitude, with IP distances ranging from zero to one.”
(Efficient and Effective Retrieval of Dense-Sparse Hybrid Vectors)

On this scale, zero means the two texts are identical in meaning, and one means they have nothing in common. Much of AI retrieval, the mechanism by which Perplexity, Bing Chat, and Google’s AI Overviews decide which sources to use in an answer, comes down to this distance.
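To make the zero-to-one scale concrete, here is a minimal sketch in Python. It assumes one common convention, where the distance is one minus the inner product of the two unit-length vectors; I can't confirm this is the exact definition used in the paper. The three-number vectors are invented for illustration and do not come from a real embedding model.

```python
import math

def normalize(vec):
    """Scale a vector to unit length (magnitude 1)."""
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vec]

def ip_distance(a, b):
    """Assumed convention: 1 minus the inner product of the unit vectors."""
    ua, ub = normalize(a), normalize(b)
    return 1.0 - sum(x * y for x, y in zip(ua, ub))

# Hypothetical 3-dimensional "embeddings"; real ones are far longer.
query = [0.9, 0.1, 0.0]
close_page = [0.8, 0.2, 0.0]   # points in almost the same direction as the query
far_page = [0.0, 0.1, 0.9]     # points in a nearly unrelated direction

d_close = ip_distance(query, close_page)
d_far = ip_distance(query, far_page)
print(f"close page: {d_close:.3f}, far page: {d_far:.3f}")
```

With non-negative vectors like these, the result stays between zero and one: the page pointing in the query's direction lands near zero, the unrelated one near one.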

The fascinating part is that texts with similar meaning end up close together even if they use completely different words. “How to increase online sales” and “strategies to grow your ecommerce” are different sequences of characters, but in vector space they almost overlap.

As the survey by Minaee et al. (2025) documents:

“The embedding vectors learned by NLMs define a hidden space where the semantic similarity between vectors can be readily computed as their distance.”
(Large Language Models: A Survey)

“Readily computed” is the key phrase. For the AI, measuring how close your content is to the customer’s question is a trivial operation: it does it in milliseconds across millions of documents. And the winner, the source that gets retrieved and cited in the answer, is almost always the one with the lowest distance.
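The "lowest distance wins" idea can be sketched as a brute-force ranking. This is an illustration only: the URLs and vectors are hypothetical, and production systems generally use approximate nearest-neighbor indexes rather than comparing the query against every document one by one.

```python
import math

def cosine_distance(a, b):
    """1 minus cosine similarity; smaller means closer in meaning."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)

def retrieve(query_vec, index, k=2):
    """Return the k (distance, url) pairs closest to the query, nearest first."""
    scored = [(cosine_distance(query_vec, vec), url) for url, vec in index.items()]
    return sorted(scored)[:k]

# Hypothetical index: URL -> made-up embedding.
index = {
    "competitor.example/why-machines-fail": [0.9, 0.2, 0.1],
    "yoursite.example/iot-platform": [0.2, 0.9, 0.3],
    "other.example/recipes": [0.0, 0.1, 0.9],
}
query_vec = [1.0, 0.1, 0.0]  # made-up embedding of the customer's question

top = retrieve(query_vec, index, k=2)
for distance, url in top:
    print(f"{distance:.3f}  {url}")
```

In this toy index the competitor's page comes first because its vector points in nearly the same direction as the query, regardless of which company is actually more competent.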

The gap nobody measures

This is the part that directly affects your business, and in my experience it’s one of the most widespread and least recognized problems.

Your website says: “IoT platform for predictive condition monitoring of industrial assets with hybrid edge-cloud architecture.”

The plant manager who would be your perfect customer asks Perplexity: “how do I know when a machine is about to break down?”

Both describe the same need. But the vector distance between those two texts is enormous. Your website speaks the language of the engineer who wrote it. The customer speaks the language of someone with a problem to solve. The AI measures the distance, finds sources closer to the query, and cites those.

I tested this pattern on 25 Italian B2B websites, comparing the language of their service pages with the real queries from Google Search Console. In 20 cases out of 25, there was a significant misalignment: the website used internal terminology or technical anglicisms, while customers searched with colloquial, problem-oriented Italian phrases.

The interesting thing is that the misalignment wasn’t uniform. The “About us” pages and blogs tended to use more natural language — but it was the service pages, the ones that should convert, that had the highest vector distance from the customers’ queries.

Two languages, one solution

You don’t have to choose between speaking like an expert and speaking like your customer. You can do both, and in fact you should.

The strategy is pairing: keep the technical jargon for those who search for it (and for your credibility), but always pair it with a phrasing in the customer’s language.

“Our next-generation SIEM platform” becomes “Our next-generation SIEM platform — in practice, the system that protects your company from cyberattacks by monitoring everything that happens on the network, 24 hours a day.”

This way you create two vectors close to two different types of queries: those searching for “enterprise SIEM” and those searching for “how to protect my company from hackers.” With a single paragraph you’re covering two areas of vector space instead of one.
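A rough way to see the effect of pairing is the sketch below. It loudly swaps in a crude stand-in for a real embedding model: plain word counts compared with cosine similarity. That only captures shared vocabulary, not meaning, so treat the numbers as a direction, not a measurement. The function names are my own.

```python
import math
import re
from collections import Counter

def bow(text):
    """Toy stand-in for an embedding: lowercase word counts (assumption)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine_similarity(a, b):
    """Cosine similarity between two word-count vectors (0.0 if either is empty)."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

jargon_only = "Our next-generation SIEM platform"
paired = ("Our next-generation SIEM platform: in practice, the system that "
          "protects your company from cyberattacks by monitoring everything "
          "that happens on the network, 24 hours a day")

expert_query = "enterprise SIEM"
customer_query = "how to protect my company from hackers"

for label, text in [("jargon only", jargon_only), ("paired", paired)]:
    expert = cosine_similarity(bow(expert_query), bow(text))
    customer = cosine_similarity(bow(customer_query), bow(text))
    print(f"{label:12s} expert={expert:.3f} customer={customer:.3f}")
```

The jargon-only sentence shares no words with the customer's question, while the paired version overlaps with both queries. Word counts cannot see that "protect" and "protects", or "hackers" and "cyberattacks", are related; a real embedding model can, which is why the gain from pairing is likely larger in practice than this toy suggests.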

The specific problem with Italian

One thing I often see in Italian companies is literal translation from English. “Digital Transformation Strategy” becomes “Strategia di Trasformazione Digitale” — which sounds good in a corporate document but isn’t how a business owner in Brescia searches on Perplexity. He searches for “how to digitize my company” or “software to manage the company better.”

The literal translation of technical terms creates a vector distance from the way the Italian market actually speaks. And in a context where retrieval is based entirely on semantic closeness, you pay for that distance.

Another pattern I’ve noticed: unexplained acronyms. If your page says “we offer MDR, SOC, and VAPT services” without ever explaining what they mean, you’re talking to people who already know — but people who already know probably aren’t searching for you on Perplexity. The people searching for you are those who have a problem and don’t know the jargon. And their vector is worlds away from yours.

How to measure your gap

The most useful test doesn’t require sophisticated tools:

Open Google Search Console, go to Performance → Queries, and look at the phrases people use to find your site. Then open the corresponding landing pages. How similar is the language of the queries to the language of the pages? If the queries say “how to do X” and your pages say “enterprise solution for managing X,” you have a vector gap.
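If you have many queries and pages, the comparison above can be roughed out in code. The sketch below is a lexical proxy, not a true vector distance: it measures what share of a query's content words also appear on the landing page. The row layout, the stopword list, the 0.5 threshold, and the example URL are all assumptions for illustration; adapt them to however you export your data.

```python
import re

# Small illustrative stopword list (assumption); extend it for real use.
STOPWORDS = {"a", "an", "the", "to", "for", "of", "in", "on",
             "my", "do", "i", "how", "is", "and"}

def content_words(text):
    """Lowercase words in the text, minus stopwords."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    return {w for w in words if w not in STOPWORDS}

def query_coverage(query, page_text):
    """Share of the query's content words found on the page (0.0 to 1.0)."""
    q = content_words(query)
    if not q:
        return 0.0
    return len(q & content_words(page_text)) / len(q)

def find_gaps(rows, threshold=0.5):
    """Return (query, url, coverage) for rows whose coverage is below threshold."""
    gaps = []
    for query, url, page_text in rows:
        coverage = query_coverage(query, page_text)
        if coverage < threshold:
            gaps.append((query, url, round(coverage, 2)))
    return gaps

page = ("IoT platform for predictive condition monitoring of industrial "
        "assets with hybrid edge-cloud architecture")
rows = [  # hypothetical (query, landing page URL, page text) rows
    ("how do i know when a machine is about to break down",
     "yoursite.example/iot-platform", page),
    ("predictive monitoring platform",
     "yoursite.example/iot-platform", page),
]

gaps = find_gaps(rows)
for query, url, coverage in gaps:
    print(f"{coverage:.2f}  {query}  ->  {url}")
```

Here only the plant manager's question is flagged: none of its content words appear on the page, while the jargon query is fully covered. A low score does not prove a semantic gap, but it is a cheap signal of which pages to reread first.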

A second test — the one I often use with clients: take the most important query for your business and search for it on Perplexity. Look at the sources it cites. Read those sources. Compare their language with yours. If the cited sources use words closer to the query than your pages do, you’ve found the gap that makes you invisible.

This mechanism works alongside BM25 in hybrid search, where lexical matching is combined with semantic matching, and it underlies chunk retrieval, where your paragraphs get converted into vectors and compared with the query. In both cases, the distance between your language and your customer’s language is the decisive factor.
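As a rough illustration of hybrid search, the sketch below scores two short texts with a compact BM25 implementation and then mixes that lexical score with a semantic similarity. Several things here are assumptions: the weighted sum is only one possible way to fuse the two signals, the `alpha` weight and the `k1` and `b` defaults are conventional choices, and the semantic similarity values are invented placeholders, not measurements. I am not claiming any specific AI engine works exactly this way.

```python
import math
import re
from collections import Counter

def tokenize(text):
    return re.findall(r"[a-z0-9]+", text.lower())

def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Okapi BM25 score of each document (higher = better lexical match)."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(t) for t in tokenized) / n_docs
    doc_freq = Counter(term for t in tokenized for term in set(t))
    scores = []
    for tokens in tokenized:
        tf = Counter(tokens)
        score = 0.0
        for term in tokenize(query):
            if term not in tf:
                continue
            idf = math.log((n_docs - doc_freq[term] + 0.5)
                           / (doc_freq[term] + 0.5) + 1)
            denom = tf[term] + k1 * (1 - b + b * len(tokens) / avg_len)
            score += idf * tf[term] * (k1 + 1) / denom
        scores.append(score)
    return scores

def hybrid_scores(lexical, semantic, alpha=0.5):
    """Weighted mix of max-normalized BM25 scores and semantic similarities."""
    top = max(lexical) or 1.0  # avoid dividing by zero when nothing matches
    return [alpha * (lex / top) + (1 - alpha) * sem
            for lex, sem in zip(lexical, semantic)]

docs = [
    "IoT platform for predictive condition monitoring of industrial assets",
    "How to know when a machine is about to break down before it stops production",
]
query = "how do I know when a machine is about to break down"

lexical = bm25_scores(query, docs)
semantic = [0.6, 0.9]  # hypothetical similarity values, NOT measured
hybrid = hybrid_scores(lexical, semantic)
for doc, score in zip(docs, hybrid):
    print(f"{score:.2f}  {doc}")
```

The jargon-heavy text gets a BM25 score of zero because it shares no words with the question, so it depends entirely on the semantic signal. The text written in the customer's language is rewarded by both.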

The author

Roberto Serra at the Senate of the Republic, Palazzo Giustiniani, for the conference “The power of artificial intelligence”.

Roberto Serra

SEO consultant for over 15 years, founder of the Serra SEO Agency (RAANK). He helps multinationals and SMEs stay visible where search is moving: ChatGPT, Perplexity, Gemini and Google's AI Overviews.
