How AI engines think

Exact keywords or synonyms? AI needs both (here’s why)

Roberto Serra · 25 June 2026 · ~6 min read

You've rewritten your content to sound more natural and fluent, but you still don't show up on Perplexity, or you've dropped out of places where you used to rank. One likely reason: many AI engines run two search methods in parallel, one that looks for the exact words and one that looks for the meaning. If you optimized for only one, you miss the queries that only the other would have matched. A competitor who balances both can outrank you, even with content less in-depth than yours. Combining the two approaches on the same page is simpler than it seems.

A client asked me something I hear often: “I redid all the content on my site with more natural, more conversational language, like you recommend. But I still don’t show up on Perplexity. What am I doing wrong?”

The problem was subtle. He had stripped out the blunt keywords of old-school SEO and rewritten everything in a conversational style. The content was nicer to read and semantically richer, but it no longer matched the exact keywords users type into their queries. And many RAG systems, contrary to what you might think, use both search methods at the same time.

Hybrid search: two channels in parallel

As I explained in the article on RAG, systems like Perplexity search the web for sources before answering. What I haven’t dug into yet is how they search — and the answer is more interesting than it seems.

The survey by Gao et al. (2024) on RAG describes the mechanism in the section devoted to hybrid retrieval:

“Sparse and dense embedding approaches capture different relevance features and can benefit from each other by leveraging complementary relevance information. Sparse retrieval models can enhance the zero-shot retrieval capability of dense retrieval models and assist dense retrievers in handling queries containing rare entities, thereby improving robustness.”
(Retrieval-Augmented Generation for Large Language Models: A Survey)

In practice, two engines run in parallel:

BM25 (sparse) is a lexical-matching algorithm. It scores your content by how often the exact words of the query appear in it, giving more weight to rare words and less to words found on almost every page. If the user searches for “SEO consultant Milan”, BM25 rewards pages that contain those exact words. It’s fast, precise, and especially strong with rare entities: brand names, niche technical terms, city names.
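To make the lexical channel concrete, here is a minimal BM25 sketch in Python. It assumes the common Okapi BM25 formula with typical defaults (k1 = 1.5, b = 0.75), a naive lowercase tokenizer and no stemming; production engines use their own analyzers and parameters, which aren't public. The three sample pages are invented for illustration.

```python
import math
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Naive tokenizer: lowercase, treat anything that isn't a letter or digit as a separator.
    # No stemming, so "consulting" and "consultant" count as different words.
    cleaned = "".join(c.lower() if c.isalnum() else " " for c in text)
    return cleaned.split()


def bm25_scores(query: str, docs: list[str], k1: float = 1.5, b: float = 0.75) -> list[float]:
    """Okapi BM25 score of every document for the given query."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(t) for t in tokenized) / n_docs

    # Document frequency: in how many documents each word appears at least once.
    df = Counter()
    for tokens in tokenized:
        df.update(set(tokens))

    scores = []
    for tokens in tokenized:
        tf = Counter(tokens)
        length_norm = 1 - b + b * len(tokens) / avg_len
        score = 0.0
        for term in set(tokenize(query)):
            if tf[term] == 0:
                continue  # exact word absent: this term adds nothing
            # Rare words (low df) get a higher idf, hence more weight.
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            score += idf * tf[term] * (k1 + 1) / (tf[term] + k1 * length_norm)
        scores.append(score)
    return scores


docs = [
    "SEO consultant in Milan: audits, strategy and execution.",
    "We help companies improve their organic visibility in the Lombardy metropolitan area.",
    "SEO tips for restaurants in Rome.",
]
scores = bm25_scores("SEO consultant Milan", docs)
for doc, score in zip(docs, scores):
    print(f"{score:.2f}  {doc}")
```

The second page describes a similar service without the words of the query, so it scores exactly zero. Within the first page, “milan” and “consultant” (each found in one page only) weigh more than “seo” (found in two).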

Dense search (embeddings) is the one I told you about in the article on the vector space. It converts query and content into vectors and measures how close their meanings are. “SEO consultant Milan” and “organic visibility expert in the Milan area” are lexically different but semantically close.
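“How close” is usually measured with cosine similarity between the two vectors. A minimal sketch, assuming you already have the vectors: the 4-dimensional values below are invented stand-ins, while a real embedding model returns hundreds or thousands of dimensions.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors: near 1 = same meaning, near 0 = unrelated."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # a zero vector has no direction to compare
    return dot / (norm_a * norm_b)


# Toy "embeddings", invented for illustration only.
query = [0.9, 0.8, 0.1, 0.0]       # "SEO consultant Milan"
paraphrase = [0.8, 0.9, 0.2, 0.1]  # "organic visibility expert in the Milan area"
unrelated = [0.0, 0.1, 0.9, 0.8]   # "pizza recipes"

print(round(cosine_similarity(query, paraphrase), 3))
print(round(cosine_similarity(query, unrelated), 3))
```

The paraphrase shares no keyword with the query, yet its vector points in almost the same direction, which is exactly what the lexical channel cannot see.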

The paper by Zhang et al. (2024) on dense-sparse hybrid retrieval shows the kind of models used on each side of such a system in its experiments:

“Datasets are embedded using Bge/Gte models for dense and Splade/BM25 models for sparse.”
(Efficient and Effective Retrieval of Dense-Sparse Hybrid Vectors)

The end result is a list of sources ranked by combining the scores from both channels, often with a weighting such as 50/50 or 60/40 in favor of semantics (engines like Perplexity don't publish their exact weights). If your content scores near zero on either channel, the combined score drops sharply.
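One common way to combine the two channels is to rescale each one to the 0-1 range and take a weighted sum. The sketch below uses alpha = 0.6 to mirror the 60/40 weighting in favor of semantics; the normalization method, the weight and the four pages' scores are assumptions chosen for illustration, not any specific engine's settings.

```python
def min_max(scores: dict[str, float]) -> dict[str, float]:
    """Rescale one channel's scores to the 0-1 range so the two channels are comparable."""
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {page: 0.0 for page in scores}  # no spread: the channel can't separate pages
    return {page: (s - lo) / (hi - lo) for page, s in scores.items()}


def hybrid_rank(bm25: dict[str, float], dense: dict[str, float],
                alpha: float = 0.6) -> list[tuple[str, float]]:
    """Linear fusion: alpha weighs the semantic channel, (1 - alpha) the lexical one.
    A page missing from a channel gets 0 on that channel."""
    pages = set(bm25) | set(dense)
    lex = min_max({p: bm25.get(p, 0.0) for p in pages})
    sem = min_max({p: dense.get(p, 0.0) for p in pages})
    combined = {p: alpha * sem[p] + (1 - alpha) * lex[p] for p in pages}
    return sorted(combined.items(), key=lambda item: item[1], reverse=True)


# Invented scores for four hypothetical pages.
bm25 = {"both": 8.0, "keywords_only": 9.0, "context_only": 0.5, "neither": 0.0}
dense = {"both": 0.85, "keywords_only": 0.40, "context_only": 0.88, "neither": 0.30}

ranking = hybrid_rank(bm25, dense)
for page, score in ranking:
    print(f"{score:.2f}  {page}")
```

With these numbers the page that is good on both channels ranks first, even though it is not the top scorer on either one. A page at zero on one channel can reach at most 0.6 or 0.4 of the combined scale, whatever it does on the other. Rank-based methods such as reciprocal rank fusion are a common alternative to this score blend; which one a given engine uses isn't public.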

What it means for anyone who wants to be found

The practical consequence is a double requirement that many fail to meet.

Scenario 1: you have the keywords but not the context. Your site repeats “SEO consultant Milan” on every page, with 2015-era keyword density. BM25 finds you. But semantic search sees thin content: few variations, little context and little added value, so your page sits farther from the many ways people phrase the question than a competitor who explains the same concept in depth.
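Scenario 1 also runs into a limit built into BM25: its term-frequency component saturates, so each extra repetition of the same keyword adds less than the one before. A minimal sketch of a single word's contribution, assuming the usual Okapi defaults (k1 = 1.5, b = 0.75), an idf of 1 and a page of exactly average length:

```python
def bm25_term_weight(tf: int, idf: float = 1.0, k1: float = 1.5, b: float = 0.75,
                     doc_len: float = 500, avg_len: float = 500) -> float:
    """Contribution of one query word, repeated tf times, to a page's BM25 score."""
    length_norm = 1 - b + b * doc_len / avg_len
    return idf * tf * (k1 + 1) / (tf + k1 * length_norm)


ceiling = 1.0 * (1.5 + 1)  # idf * (k1 + 1): the limit as tf grows without bound
for tf in [1, 2, 5, 10, 20, 50]:
    print(f"{tf:>2} repetitions -> weight {bm25_term_weight(tf):.3f} (ceiling {ceiling})")
```

One occurrence is worth 1.0 here, fifty are worth about 2.43, and no amount of repetition passes 2.5. Stuffing also makes the page longer, and the length normalization (b) then pulls every word's weight down a little.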

Scenario 2: you have the context but not the keywords. Your site talks about “strategies for improving the organic visibility of professional businesses in the Lombardy metropolitan area”. Semantically very rich. But if the user types “SEO consultant Milan”, BM25 doesn’t find you because those exact words aren’t there.

I analyzed this across 20 B2B service pages, computing a BM25 score (locally) and a semantic score (with an embedding model) for a set of 10 real queries per page. Pages strong on both signals appeared as sources on Perplexity in 60% of cases. Pages strong on only one of the two channels: 20-25%. Pages weak on both: under 5%.
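If you want to run a similar analysis on your own pages, one way to organize it is to log, for each page and query, the two scores and whether the page was cited, then compute a citation rate per group. This is a hypothetical sketch, not the method behind the figures above: the `Observation` structure and the thresholds (`bm25_min=0.5` on a BM25 score normalized to 0-1, `sem_min=0.7` on cosine similarity) are assumptions you would calibrate on your own data.

```python
from dataclasses import dataclass


@dataclass
class Observation:
    page: str
    query: str
    bm25: float      # lexical score, normalized to 0-1 (hypothetical scale)
    semantic: float  # cosine similarity from an embedding model
    cited: bool      # did the page appear as a cited source for this query?


def bucket(obs: Observation, bm25_min: float, sem_min: float) -> str:
    """Classify one observation by how many channels are above their threshold."""
    strong_lex = obs.bm25 >= bm25_min
    strong_sem = obs.semantic >= sem_min
    if strong_lex and strong_sem:
        return "both strong"
    if strong_lex or strong_sem:
        return "one strong"
    return "both weak"


def citation_rate_by_bucket(observations: list[Observation],
                            bm25_min: float = 0.5, sem_min: float = 0.7) -> dict[str, float]:
    """Share of observations in each bucket where the page was cited."""
    totals: dict[str, int] = {}
    cited: dict[str, int] = {}
    for obs in observations:
        key = bucket(obs, bm25_min, sem_min)
        totals[key] = totals.get(key, 0) + 1
        cited[key] = cited.get(key, 0) + int(obs.cited)
    return {key: cited[key] / totals[key] for key in totals}


# Toy records, invented only to show the shape of the output.
data = [
    Observation("a", "q1", 0.8, 0.9, True),
    Observation("a", "q2", 0.7, 0.8, False),
    Observation("b", "q1", 0.9, 0.3, False),
    Observation("c", "q1", 0.1, 0.2, False),
]
print(citation_rate_by_bucket(data))
```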

The difference between 60% and 20% is enormous — and it’s the difference between being cited regularly and showing up now and then by chance.


The sweet spot: keywords inside, context around

The solution isn’t to choose between keywords and semantics. It’s to combine them on the same page, in the same section, often in the same sentence.

A concrete example. Instead of writing:

“Our firm offers SEO consulting in Milan.”

Write:

“Our firm offers SEO consulting in Milan — we help companies and professionals improve their organic visibility on search engines and in AI answers, from strategy to execution.”

The first version has the exact keyword but zero semantic context. The second has the exact keyword (“SEO consulting in Milan”) AND a rich semantic context (“organic visibility”, “search engines”, “AI answers”, “strategy”, “execution”) that covers variations and synonyms of the same query.

A sentence like this works on both channels at the same time. BM25 finds it for the exact keyword. Semantic search finds it for the extended meaning. The combined score is high.


The keywords that matter for BM25

One thing that sets this approach apart from traditional SEO: for BM25, the keywords that matter aren’t the creative long-tail ones. They’re the precise words a user would type in a conversational query to an AI engine.

They’re often more direct than you’d think: “how to choose an accountant”, “best management software for restaurants”, “web agency Padua”. The language of AI queries tends to be more natural than that of Google searches — but the specific keywords (the name of the service, the city, the type of business) are still there, and BM25 looks for them exactly.

Where to find these keywords: Google Search Console (the real queries your site appeared for), Google’s People Also Ask boxes, and above all the queries you type yourself when you look up competitors on AI engines. Those are likely close to the queries your potential clients are running.
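Once you have exported the performance report from Google Search Console, a few lines of Python can pull out the queries that contain the words of your service or city. A sketch under assumptions: the export is a CSV whose query and impressions columns are named `Top queries` and `Impressions` (check the header of your own file and adjust `query_col` / `impressions_col` if it differs), and the sample rows are invented.

```python
import csv
import io


def top_queries(csv_text: str, must_contain: set[str], limit: int = 10,
                query_col: str = "Top queries",
                impressions_col: str = "Impressions") -> list[tuple[str, int]]:
    """Queries containing at least one of the given words, sorted by impressions."""
    matches = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        query = row[query_col].strip().lower()
        if set(query.split()) & must_contain:  # whole-word match only
            impressions = int(row[impressions_col].replace(",", ""))
            matches.append((query, impressions))
    matches.sort(key=lambda item: item[1], reverse=True)
    return matches[:limit]


sample = """Top queries,Clicks,Impressions,CTR,Position
seo consultant milan,12,340,3.53%,8.2
how to choose an seo agency,3,120,2.5%,14.1
pizza near me,0,5,0%,60
"""
result = top_queries(sample, {"seo", "milan"})
for query, impressions in result:
    print(impressions, query)
```

For a real file, pass `open(path, encoding="utf-8-sig").read()` as `csv_text`; the `utf-8-sig` codec also handles a byte-order mark if the file has one.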

How to check the coverage of your pages

Take the 5 most important pages on your site and run a double check:

Lexical check: search the text for the 5 most important exact keywords for that page. Are they there? How many times? Are they in strategic spots — title, first paragraph, heading, conclusion?

Semantic check: read the content while mentally removing the exact keywords. Does it still explain the concept completely? Does it use natural synonyms and variations? Would a reader who didn’t know the keyword still understand what it’s about?
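Both checks can be partly scripted. The sketch below counts each exact keyword and reports which strategic spots contain it, then blanks the keywords out so you can reread the text for the semantic check. It assumes a hypothetical page structure (a dict with `title`, `headings` and `paragraphs`, the last paragraph standing in for the conclusion) and plain case-insensitive matching; judging whether the masked text still makes sense stays a human job.

```python
import re


def _pattern(keyword: str) -> re.Pattern:
    # Whole-phrase, case-insensitive match of the exact keyword.
    return re.compile(r"\b" + re.escape(keyword) + r"\b", re.IGNORECASE)


def lexical_check(page: dict, keywords: list[str]) -> dict:
    """For each exact keyword: total count, and which strategic spots contain it."""
    paragraphs = page["paragraphs"]
    spots = {
        "title": page["title"],
        "first paragraph": paragraphs[0] if paragraphs else "",
        "headings": " ".join(page["headings"]),
        "conclusion": paragraphs[-1] if paragraphs else "",
    }
    full_text = " ".join([page["title"], *page["headings"], *paragraphs])
    report = {}
    for kw in keywords:
        pattern = _pattern(kw)
        report[kw] = {
            "count": len(pattern.findall(full_text)),
            "found_in": [name for name, text in spots.items() if pattern.search(text)],
        }
    return report


def mask_keywords(text: str, keywords: list[str]) -> str:
    """Replace the exact keywords with [...] for the semantic read-through."""
    for kw in sorted(keywords, key=len, reverse=True):  # longest first
        text = _pattern(kw).sub("[...]", text)
    return text


page = {
    "title": "SEO consulting in Milan",
    "headings": ["How we work", "Why organic visibility matters"],
    "paragraphs": [
        "Our firm offers SEO consulting in Milan for companies and professionals.",
        "We improve organic visibility on search engines and in AI answers.",
        "Contact us for SEO consulting in Milan.",
    ],
}
report = lexical_check(page, ["SEO consulting in Milan", "organic visibility"])
masked = mask_keywords(page["paragraphs"][1], ["organic visibility"])
print(report)
print(masked)
```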

This double check gives you a first snapshot of the situation: it’s a starting point, not an exhaustive analysis. A full analysis requires tools that measure BM25 and semantic scores quantitatively, across a broad sample of real queries. But even this manual check tells you whether your pages work on both channels or ignore one of them, and in the sample I analyzed that difference meant going from roughly a 20% to a 60% rate of being cited as a source.

Once the hybrid system has retrieved your pages, they move on to the next phase: chunk retrieval, where the system decides which specific piece of your page to use, and then reranking, which reorders the sources by quality. I cover those in the next articles.


The author

Roberto Serra at the Senate of the Republic (Palazzo Giustiniani), conference “The power of artificial intelligence”

Roberto Serra

SEO consultant for over 15 years, founder of the Serra SEO Agency (RAANK). He helps multinationals and SMEs stay visible where search is moving: ChatGPT, Perplexity, Gemini and Google's AI Overviews.
