You've rewritten your content to be more natural and fluent, but you still don't show up on Perplexity, or you've disappeared from where you used to rank. The problem is that many AI engines run two search methods in parallel: one looks for the exact words, the other looks for the meaning. If you optimized for just one, you're only competing on half of the matching process. A competitor who balanced both can consistently outrank you, even with content less in-depth than yours. Combining the two approaches on the same page is simpler than it seems.
A client asked me something I hear often: “I redid all the content on my site with more natural, more conversational language, like you recommend. But I still don’t show up on Perplexity. What am I doing wrong?”
The problem was subtle. He had stripped out the blunt keywords of old-school SEO and written everything in a conversational style. The content was nicer to read and semantically richer. But he had lost the match with the exact keywords users type into their queries. And many RAG systems, contrary to what you might think, use both search methods at the same time.
Hybrid search: two channels in parallel
As I explained in the article on RAG, systems like Perplexity search the web for sources before answering. What I haven’t dug into yet is how they search — and the answer is more interesting than it seems.
The survey by Gao et al. (2024) on RAG describes the mechanism in the section devoted to hybrid retrieval:
“Sparse and dense embedding approaches capture different relevance features and can benefit from each other by leveraging complementary relevance information. Sparse retrieval models can enhance the zero-shot retrieval capability of dense retrieval models and assist dense retrievers in handling queries containing rare entities, thereby improving robustness.”
(Retrieval-Augmented Generation for Large Language Models: A Survey)
In practice, two engines run in parallel:
BM25 (sparse) is a lexical-matching algorithm. It scores your content on the exact words of the query: how often each one appears, adjusted for the length of the page, with rare words weighing more than common ones. If the user searches for “SEO consultant Milan”, BM25 rewards the pages that contain those words. It’s fast, precise, and especially strong with rare entities: brand names, niche technical terms, city names.
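To make the lexical channel concrete, here is a minimal BM25 scorer. It's a sketch, not what any AI engine actually runs: it assumes plain lowercase tokenization with no stemming, and uses k1 = 1.5 and b = 0.75, which are common textbook defaults.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Lowercase and keep runs of letters/digits; no stemming, so
    # "consulting" and "consultant" count as different words.
    return re.findall(r"[a-z0-9]+", text.lower())


def bm25_scores(query: str, docs: list[str], k1: float = 1.5, b: float = 0.75) -> list[float]:
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(toks) for toks in tokenized) / n_docs
    # Document frequency: how many documents contain each term
    df = Counter(term for toks in tokenized for term in set(toks))
    scores = []
    for toks in tokenized:
        tf = Counter(toks)
        score = 0.0
        for term in tokenize(query):
            if term not in tf:
                continue  # no exact match, no contribution
            # Inverse document frequency: rarer terms weigh more
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            freq = tf[term]
            # k1 caps the gain from repetition; b normalizes by page length
            score += idf * freq * (k1 + 1) / (freq + k1 * (1 - b + b * len(toks) / avg_len))
        scores.append(score)
    return scores


docs = [
    "SEO consultant in Milan for small businesses",
    "Organic visibility expert in the Milan area",
    "SEO tips for beginners",
]
for doc, score in zip(docs, bm25_scores("SEO consultant Milan", docs)):
    print(f"{score:.2f}  {doc}")
```

The first page, which contains all three query words, scores highest; “consultant” appears in only one document, so it carries the most weight. Note that with this tokenizer “consulting” would not match “consultant”: production engines usually apply stemming, so the exact behavior depends on their text analyzer.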
Dense search (embeddings) is the one I told you about in the article on the vector space. It converts query and content into vectors and measures how close their meanings are. “SEO consultant Milan” and “organic visibility expert in the Milan area” are lexically different but semantically close.
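The “closeness” of two meanings is usually measured with cosine similarity between the two vectors. The sketch below assumes made-up 4-dimensional vectors purely for illustration; a real embedding model produces vectors with hundreds of dimensions, and its actual values for these phrases would differ.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    # Dot product divided by the product of the vector lengths:
    # 1.0 means same direction (same meaning), near 0 means unrelated.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Hypothetical toy vectors, invented for this example only
query = [0.9, 0.8, 0.1, 0.0]       # "SEO consultant Milan"
paraphrase = [0.8, 0.9, 0.2, 0.1]  # "organic visibility expert in the Milan area"
unrelated = [0.0, 0.1, 0.9, 0.8]   # "homemade pizza recipes"

print(round(cosine_similarity(query, paraphrase), 3))
print(round(cosine_similarity(query, unrelated), 3))
```

The paraphrase shares almost no words with the query, yet its vector points in nearly the same direction, which is exactly what lets dense search find it when BM25 can't.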
The paper by Zhang et al. (2024) on hybrid search documents how the two approaches are combined in modern systems:
“Datasets are embedded using Bge/Gte models for dense and Splade/BM25 models for sparse.”
(Efficient and Effective Retrieval of Dense-Sparse Hybrid Vectors)
The end result is a list of sources ranked by combining the scores from both channels. The weighting varies from system to system and is rarely published: it can be an even 50/50 split, or something like 60/40 in favor of semantics. Either way, if your content is missing from either channel, the combined score collapses.
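One common way to combine the two channels is a weighted sum of normalized scores. This is a minimal sketch assuming min-max normalization and a hypothetical 0.6 weight for the semantic side; the page names and raw scores are invented, and some systems fuse the two rankings instead of the scores.

```python
def min_max(scores: dict[str, float]) -> dict[str, float]:
    # Rescale one channel's scores to 0..1 so BM25 and cosine values are comparable
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {doc: 1.0 if hi > 0 else 0.0 for doc in scores}
    return {doc: (s - lo) / (hi - lo) for doc, s in scores.items()}


def hybrid_rank(sparse: dict[str, float], dense: dict[str, float],
                alpha: float = 0.6) -> list[tuple[str, float]]:
    # alpha is the weight of the semantic channel; BM25 gets 1 - alpha.
    # A page absent from a channel scores 0 there.
    s, d = min_max(sparse), min_max(dense)
    combined = {
        doc: alpha * d.get(doc, 0.0) + (1 - alpha) * s.get(doc, 0.0)
        for doc in set(s) | set(d)
    }
    return sorted(combined.items(), key=lambda item: item[1], reverse=True)


# Hypothetical raw scores for four pages against one query
bm25 = {"both": 8.0, "keywords_only": 9.0, "context_only": 0.5, "weak": 0.2}
cosine = {"both": 0.86, "keywords_only": 0.55, "context_only": 0.88, "weak": 0.50}

for page, score in hybrid_rank(bm25, cosine):
    print(f"{score:.2f}  {page}")
```

With these made-up numbers, the page that is strong on both channels comes first, while each single-channel page loses a large share of its combined score, even though it tops one of the two channels. Setting `alpha=0.5` gives the even split.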
What it means for anyone who wants to be found
The practical consequence is a double requirement that many fail to meet.
Scenario 1: you have the keywords but not the context. Your site repeats “SEO consultant Milan” on every page, with the keyword density of 2015. BM25 finds you. But semantic search ranks you as thin content: few variations, little context, little added value compared to a competitor who explains the same concept in depth.
Scenario 2: you have the context but not the keywords. Your site talks about “strategies for improving the organic visibility of professional businesses in the Lombardy metropolitan area”. Semantically very rich. But if the user types “SEO consultant Milan”, BM25 doesn’t find you because those exact words aren’t there.
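Scenario 2 can be checked mechanically: list which query words never appear in the page. A minimal sketch, assuming plain lowercase word matching with no stemming; if every word comes back missing, BM25 has nothing to score, whatever the page means.

```python
import re


def words(text: str) -> list[str]:
    # Lowercase word tokens, no stemming
    return re.findall(r"[a-z0-9]+", text.lower())


def missing_query_terms(query: str, page: str) -> list[str]:
    page_words = set(words(page))
    return [w for w in words(query) if w not in page_words]


scenario_2 = ("Strategies for improving the organic visibility of professional "
              "businesses in the Lombardy metropolitan area")
print(missing_query_terms("SEO consultant Milan", scenario_2))
```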
I analyzed this across 20 B2B service pages, computing for each page a BM25 score (with a local implementation) and a semantic score (with an embedding model) against a set of 10 real queries. The pages that had both signals strong showed up as sources on Perplexity in 60% of cases. Pages strong on only one of the two channels: 20-25%. Pages weak on both: under 5%.
The difference between 60% and 20% is enormous — and it’s the difference between being cited regularly and showing up now and then by chance.
The sweet spot: keywords inside, context around
The solution isn’t to choose between keywords and semantics. It’s to combine them on the same page, in the same section, often in the same sentence.
A concrete example. Instead of writing:
“Our firm offers SEO consulting in Milan.”
Write:
“Our firm offers SEO consulting in Milan — we help companies and professionals improve their organic visibility on search engines and in AI answers, from strategy to execution.”
The first version has the exact keyword but zero semantic context. The second has the exact keyword (“SEO consulting in Milan”) AND a rich semantic context (“organic visibility”, “search engines”, “AI answers”, “strategy”, “execution”) that covers variations and synonyms of the same query.
A sentence like this works on both channels at the same time. BM25 finds it for the exact keyword. Semantic search finds it for the extended meaning. The combined score is high.
The keywords that matter for BM25
One thing that sets this approach apart from traditional SEO: for BM25, the keywords that matter aren’t the creative long-tail ones. They’re the precise words a user would type in a conversational query to an AI engine.
They’re often more direct than you’d think: “how to choose an accountant”, “best management software for restaurants”, “web agency Padua”. The language of AI queries tends to be more natural than that of Google searches — but the specific keywords (the name of the service, the city, the type of business) are still there, and BM25 looks for them exactly.
Where to find these keywords: Google Search Console (the real queries), People Also Ask and, above all, the queries you type yourself when you look up competitors on AI engines. Those are very likely the same queries your potential clients are running.
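Once you have collected real queries (for example exported from Search Console, or noted while testing AI engines), a quick tally shows which exact words and two-word phrases keep coming back. A sketch under stated assumptions: the query list and the stop-word set are hypothetical, and reading the export file is left out.

```python
import re
from collections import Counter

# Small, hypothetical stop-word list; extend it for real use
STOP_WORDS = {"a", "an", "the", "for", "in", "to", "how", "of", "my", "is", "what"}


def top_terms(queries: list[str], n: int = 10) -> list[tuple[str, int]]:
    counts = Counter()
    for q in queries:
        words = [w for w in re.findall(r"[a-z0-9]+", q.lower()) if w not in STOP_WORDS]
        counts.update(words)
        # Adjacent pairs (after stop-word removal) catch phrases like "web agency"
        counts.update(" ".join(pair) for pair in zip(words, words[1:]))
    return counts.most_common(n)


queries = [
    "web agency Padua",
    "best web agency in Padua for restaurants",
    "how to choose a web agency Padua",
]
print(top_terms(queries, 5))
```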
How to check the coverage of your pages
Take the 5 most important pages on your site and run a double check:
Lexical check: search the text for the 5 most important exact keywords for that page. Are they there? How many times? Are they in strategic spots — title, first paragraph, heading, conclusion?
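The lexical check lends itself to a small script. This sketch assumes the page has already been split into title, headings and paragraphs (the `Page` class is hypothetical, not tied to any CMS), treats the last paragraph as the conclusion, and counts exact, case-insensitive phrase matches.

```python
import re
from dataclasses import dataclass, field


@dataclass
class Page:
    # Hypothetical page model: map your CMS export or HTML parser onto it
    title: str
    paragraphs: list[str]
    headings: list[str] = field(default_factory=list)


def count_phrase(phrase: str, text: str) -> int:
    # Exact, case-insensitive, whole-word occurrences of the phrase
    pattern = r"\b" + re.escape(phrase) + r"\b"
    return len(re.findall(pattern, text, flags=re.IGNORECASE))


def lexical_check(page: Page, keywords: list[str]) -> dict[str, dict[str, object]]:
    parts = [page.title, *page.headings, *page.paragraphs]
    first = page.paragraphs[0] if page.paragraphs else ""
    last = page.paragraphs[-1] if page.paragraphs else ""  # treated as the conclusion
    return {
        kw: {
            "total": sum(count_phrase(kw, part) for part in parts),
            "title": count_phrase(kw, page.title) > 0,
            "first_paragraph": count_phrase(kw, first) > 0,
            "heading": any(count_phrase(kw, h) > 0 for h in page.headings),
            "conclusion": count_phrase(kw, last) > 0,
        }
        for kw in keywords
    }


page = Page(
    title="SEO consulting in Milan",
    paragraphs=[
        "We offer SEO consulting in Milan for professionals.",
        "Organic visibility takes time and method.",
        "Contact us for SEO consulting in Milan.",
    ],
    headings=["Why organic visibility matters"],
)
for kw, result in lexical_check(page, ["SEO consulting in Milan", "organic visibility"]).items():
    print(kw, result)
```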
Semantic check: read the content while mentally removing the exact keywords. Does it still explain the concept completely? Does it use natural synonyms and variations? Would a reader who didn’t know the keyword still understand what it’s about?
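For the semantic check, it can help to blank out the keywords literally instead of mentally. A minimal sketch: the placeholder and the keyword list are illustrative choices, and what remains still has to be judged by a human reader.

```python
import re


def strip_keywords(text: str, keywords: list[str], placeholder: str = "___") -> str:
    # Replace longer keywords first so "SEO consulting in Milan" is removed
    # as a whole before a shorter keyword such as "Milan" could split it.
    for kw in sorted(keywords, key=len, reverse=True):
        pattern = r"\b" + re.escape(kw) + r"\b"
        text = re.sub(pattern, placeholder, text, flags=re.IGNORECASE)
    return text


sentence = ("Our firm offers SEO consulting in Milan: we help companies and professionals "
            "improve their organic visibility on search engines and in AI answers.")
print(strip_keywords(sentence, ["SEO consulting in Milan", "Milan"]))
```

If the stripped version still tells you what the page offers and to whom, the semantic channel is covered.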
This double check gives you a first snapshot of the situation — it’s a starting point, not an exhaustive analysis. A full analysis requires tools that measure BM25 and semantic scores quantitatively, across a broad sample of real queries. But even with just this manual check you can tell whether your pages are working on both channels or ignoring one of them — and the difference in results, as I was saying, can range from a 20% to a 60% chance of showing up as a cited source.
Once the hybrid system has retrieved your pages, they move on to the next phase: chunk retrieval, where the system decides which specific piece of your page to use, and then reranking, which reorders the sources by quality. I cover those in the next articles.