How AI engines think

After retrieval comes reranking: this is where generic content loses

Roberto Serra · 25 June 2026 · ~8 min read

Even if the AI finds your content among the candidate sources, there is a second selection step that reorders the candidates by relevance to the specific question, and content that covers the topic in general terms gets discarded in favor of content that answers exactly that question. You can pass the first filter and lose at the second without ever understanding why you are not cited. The problem is not the quantity or length of your content: it is precision. Writing content that passes both filters requires a different approach from the one you are probably using.

Your content passed the first filter. The RAG system did the initial retrieval, pulled 20 to 100 candidate chunks, and your piece is among them. Good. But it is not over: the decisive round is still ahead.

There is a second model that takes those chunks and reorders them. Not by vague similarity, but by precise correspondence with the user’s real question. This is where generic content gets demoted, specific and authoritative content rises, and your brand either gains AI visibility or drops out of the context the AI will use to answer.

This is mechanics, not opinion. And if you do not understand how it works, you are optimizing for the wrong filter.

Why retrieval alone is not enough

If you read what I explained about chunk retrieval, you know that the system retrieves an initial pool of blocks based on vector embeddings and lexical search. I also explained why each section must work as a self-contained unit, but passing retrieval is only the first gate.
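How the vector and lexical result lists get merged varies by engine and is rarely documented. One widely used fusion method is reciprocal rank fusion (RRF): each document earns 1 / (k + rank) from every list it appears in. The sketch below assumes two already-ranked lists of hypothetical page names and uses k = 60, a commonly cited default; it illustrates the idea and makes no claim about any specific AI engine's pipeline.

```python
def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Merge several ranked lists into one using reciprocal rank fusion.

    Each document gains 1 / (k + rank) for every list it appears in,
    where rank starts at 1. A higher total means a higher fused position.
    """
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc in enumerate(ranking, start=1):
            scores[doc] = scores.get(doc, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical result lists for the same query.
vector_ranking = ["services-page", "competitor-blog", "marketing-guide"]
lexical_ranking = ["competitor-blog", "marketing-guide", "services-page"]

print(reciprocal_rank_fusion([vector_ranking, lexical_ranking]))
# ['competitor-blog', 'services-page', 'marketing-guide']
```

In this toy case the services page, first in the vector list but last in the lexical one, still lands second overall: being in the pool says little about whether a page actually answers the question.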

Retrieval is fast and approximate by design: it compares the vector representation of your content with that of the query and decides whether you are “close enough”. The problem is that “close enough” includes a lot of garbage. An “SEO Services” page that mentions ecommerce is semantically close to the query “how to choose an SEO consultant for an ecommerce store”, but it does not answer the question.
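To see why “close enough” lets generic pages through, here is a deliberately crude sketch. It replaces neural embeddings with plain word counts (an assumption made only to keep the example self-contained) and uses a hypothetical similarity cut-off of 0.2. The services page shares “SEO”, “consultant” and “ecommerce” with the query but never mentions choosing, and it still clears the threshold.

```python
import math
import re
from collections import Counter


def bow_vector(text: str) -> Counter:
    """Toy 'embedding': lowercase word counts, standing in for a neural embedding."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


query = "how to choose an SEO consultant for an ecommerce store"
services_page = "SEO services: our SEO consultant team works with ecommerce, travel and finance"

score = cosine(bow_vector(query), bow_vector(services_page))
THRESHOLD = 0.2  # hypothetical "close enough" cut-off
print(f"similarity={score:.2f}, retrieved={score >= THRESHOLD}")
# similarity=0.31, retrieved=True
```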

This is where reranking comes in.

How reranking works — the second filter

After retrieval you still have a wide pool of candidates: 20, 50, sometimes 100 chunks. Passing them all to the generative model would be impractical, both because of context window limits and because it would introduce too much noise. The reranker solves this problem.
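The context window constraint can be pictured as a token budget. The sketch below assumes each candidate already carries a relevance score and an approximate token count (both invented for the example) and greedily keeps the best-scoring chunks that still fit. Many pipelines simply keep a fixed top-k instead; this is one plausible way to express the limit, not a documented engine behavior.

```python
from dataclasses import dataclass


@dataclass
class Chunk:
    text: str
    score: float  # relevance score, e.g. assigned by a reranker
    tokens: int   # approximate size of the chunk in tokens


def pack_context(chunks: list[Chunk], token_budget: int) -> list[Chunk]:
    """Greedily keep the highest-scoring chunks that still fit the budget."""
    selected: list[Chunk] = []
    used = 0
    for chunk in sorted(chunks, key=lambda c: c.score, reverse=True):
        if used + chunk.tokens <= token_budget:
            selected.append(chunk)
            used += chunk.tokens
    return selected


# Invented numbers, only to show the mechanics.
candidates = [
    Chunk("competitor: 7 criteria for choosing", score=0.91, tokens=300),
    Chunk("services page", score=0.32, tokens=350),
    Chunk("generic marketing guide", score=0.55, tokens=400),
]
print([c.text for c in pack_context(candidates, token_budget=700)])
# ['competitor: 7 criteria for choosing', 'generic marketing guide']
```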

Unlike retrieval, which uses separate representations (query embedding + document embedding), the reranker is typically a cross-encoder: it takes the query and the chunk together, processes them in a single pass, and assigns a relevance score based on the direct interaction between the two texts. It is computationally expensive — but precise.
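The difference between the two stages is easiest to see in code. In this toy sketch, `embed` turns each text into a set of words on its own, the way a bi-encoder encodes query and document separately, while `pair_score` reads both texts together and rewards query word pairs that recur in the same order in the chunk. All function names are hypothetical and both scorers are stand-ins: a real cross-encoder is a transformer that processes the concatenated query and chunk in a single pass.

```python
import re


def tokens(text: str) -> list[str]:
    """Lowercase words, ignoring digits and punctuation."""
    return re.findall(r"[a-z]+", text.lower())


def embed(text: str) -> frozenset[str]:
    """Stage-1 stand-in for a bi-encoder: each text is encoded independently,
    so document representations can be computed ahead of time."""
    return frozenset(tokens(text))


def retrieval_score(query_vec: frozenset[str], doc_vec: frozenset[str]) -> float:
    """Cheap comparison of two precomputed representations (Jaccard overlap)."""
    union = query_vec | doc_vec
    return len(query_vec & doc_vec) / len(union) if union else 0.0


def pair_score(query: str, chunk: str) -> float:
    """Stage-2 stand-in for a cross-encoder: looks at query and chunk together
    and measures how many consecutive query word pairs recur in the chunk."""
    q, c = tokens(query), tokens(chunk)
    q_pairs = set(zip(q, q[1:]))
    c_pairs = set(zip(c, c[1:]))
    return len(q_pairs & c_pairs) / len(q_pairs) if q_pairs else 0.0


def retrieve_then_rerank(query: str, corpus: list[str],
                         pool_size: int = 50, top_k: int = 5) -> list[str]:
    doc_vecs = {doc: embed(doc) for doc in corpus}  # could be precomputed offline
    q_vec = embed(query)
    # Stage 1: fast and approximate; keeps a wide pool of candidates.
    pool = sorted(corpus, key=lambda d: retrieval_score(q_vec, doc_vecs[d]),
                  reverse=True)[:pool_size]
    # Stage 2: slower and pairwise; reorders only the pool and keeps the best few.
    return sorted(pool, key=lambda d: pair_score(query, d), reverse=True)[:top_k]


corpus = [
    "SEO services: our SEO consultant team works with ecommerce, travel and finance",
    "How to choose an SEO consultant for your ecommerce store: 7 criteria",
    "A complete guide to digital marketing for ecommerce",
]
print(retrieve_then_rerank("how to choose an SEO consultant for an ecommerce store",
                           corpus, pool_size=3, top_k=1))
```

The design choice the sketch mirrors: stage 1 compares precomputed representations across the whole corpus, while stage 2 must be recomputed for every query and chunk pair, which is why it runs only on the shortlisted pool.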

The topic is studied in depth in the research literature. A passage I find relevant comes from Ji et al. (2025):

“In addition to serving as a decision assessor for evaluating or re-ranking candidate outputs to improve alignment without parameter updates, RMs can also function as a utility function to guide prompt construction.”

In plain terms: the scoring model (a reward model, or RM, in the paper’s terminology) does not just reorder candidates. It evaluates their quality against the original query and can guide the very construction of the context the generator will receive. It is an arbiter that decides which content deserves to enter the final answer.

The practical consequence is that the reranker penalizes content that “talks about the topic” but does not answer the question. And it rewards whoever answers in a direct, specific and authoritative way.

The dimensions the reranker evaluates

Not all reranking systems work the same way.

Several types of rerankers are active in commercial pipelines: some are models specialized for relevance evaluation, others are general-purpose LLMs used as judges. They change over time and vary between AI engines, so any test is a sample, not an absolute truth.

What all these approaches have in common is that they evaluate three fundamental dimensions:

Topical relevance: does the chunk answer the specific question, or does it only talk about the topic in a broad sense? “We provide SEO services for ecommerce” does not answer “how to choose”. “To evaluate an SEO consultant for ecommerce, ask for these 5 pieces of data” does.

Contextual pertinence: does the chunk carry the context needed to be understood on its own? As I explained in the article on chunk retrieval, each block must work as a self-contained unit. The reranker penalizes chunks that assume knowledge external to the text.

Diversity: in more sophisticated systems, the reranker also balances the variety of sources, avoiding five near-identical chunks on the same concept. If you have three pages saying the same thing, often only one passes.
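One well-known way to trade relevance against redundancy is Maximal Marginal Relevance (MMR), which picks chunks one at a time, rewarding relevance and penalizing similarity to what has already been picked. Which diversity method a given AI engine uses, if any, is generally not public; the relevance scores, the pairwise similarity values and the weight `lam` below are assumptions for illustration.

```python
from typing import Callable


def mmr_select(relevance: dict[str, float],
               similarity: Callable[[str, str], float],
               k: int, lam: float = 0.7) -> list[str]:
    """Greedy Maximal Marginal Relevance selection.

    At each step pick the candidate maximising
    lam * relevance - (1 - lam) * (max similarity to already selected items).
    """
    # Deterministic order: most relevant first, so ties go to higher relevance.
    candidates = sorted(relevance, key=relevance.get, reverse=True)
    selected: list[str] = []
    while candidates and len(selected) < k:
        def mmr_score(doc: str) -> float:
            redundancy = max((similarity(doc, s) for s in selected), default=0.0)
            return lam * relevance[doc] - (1 - lam) * redundancy
        best = max(candidates, key=mmr_score)
        selected.append(best)
        candidates.remove(best)
    return selected


# Hypothetical scores: two of your pages say almost the same thing.
relevance = {"your-page-a": 0.90, "your-page-a-copy": 0.89, "other-source": 0.70}
PAIR_SIMILARITY = {frozenset({"your-page-a", "your-page-a-copy"}): 0.95}


def similarity(x: str, y: str) -> float:
    return PAIR_SIMILARITY.get(frozenset({x, y}), 0.1)


print(mmr_select(relevance, similarity, k=2))  # ['your-page-a', 'other-source']
```

With `lam=1.0` the redundancy penalty vanishes and the two near-duplicate pages would both be selected; lowering `lam` is what lets the less relevant but different source through.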

A concrete example: who survives reranking

User query: “how to choose an SEO consultant for an ecommerce store?”

Retrieval pulls three candidates: your “SEO Services” page (it mentions SEO, consulting, and ecommerce among the sectors served), a competitor’s blog post (“7 criteria for choosing the right SEO consultant for your ecommerce store”), and a generic guide to digital marketing for ecommerce.

The reranker compares each chunk with the query. Your page talks about what you do, it does not answer “how to choose”. The competitor answers with specific criteria. The guide is broad but not focused on the decision.

Result: the competitor takes first position and your content is demoted. Not because your service is worse, but because your content does not answer the question the user asked.

This pattern repeats every time you have an “umbrella” page: it covers the topic but answers no specific question. That is exactly the profile the reranker penalizes.

The connection with AI visibility

Understanding reranking changes the way you think about AI visibility. Being indexed is not enough. Being relevant in a vague sense is not enough. You must be the content that best answers the specific question the user is asking — and you must do it in the block the reranker examines.

Gao et al. (2024) put it sharply:

“Re-ranking the retrieved information to relocate the most relevant content to the edges of the prompt is a key strategy.”

“Edges of the prompt” means the beginning and the end of the context the generator receives. The system not only reorders content but places the most relevant pieces in the privileged positions of the context window, the ones the generative model tends to weigh most. If your content does not pass reranking, it does not simply end up at the bottom of the list: it does not enter the context at all.
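What placing content at the edges can look like: given chunks already sorted from most to least relevant, alternate them between the front and the back of the context so the weakest ones end up in the middle. This is a minimal sketch of the idea in the Gao et al. quote, assuming the generator attends most to the start and end of its input; the function name is hypothetical and this is not the documented behavior of any particular engine.

```python
def place_at_edges(chunks_by_relevance: list[str]) -> list[str]:
    """Reorder chunks (sorted most to least relevant) so the strongest sit at
    the start and end of the context and the weakest drift to the middle."""
    front: list[str] = []
    back: list[str] = []
    for i, chunk in enumerate(chunks_by_relevance):
        # Even indices (the 1st, 3rd, ... chunk) fill the front; odd indices fill the back.
        (front if i % 2 == 0 else back).append(chunk)
    # The back list is reversed so the 2nd most relevant chunk ends up last.
    return front + back[::-1]


print(place_at_edges(["c1", "c2", "c3", "c4", "c5"]))
# ['c1', 'c3', 'c5', 'c4', 'c2']
```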

It follows that optimizing for reranking means optimizing for the answer. Not for the topic, not for the sector, not for the keyword: for the answer to the specific question the user will ask. It is the same logic as hybrid search, applied at an even finer level of precision.

What to do concretely to survive reranking

Map questions, not topics: for each page of your site, identify the specific question it answers. Not “our SEO services” (topic) but “how to choose an SEO consultant for an ecommerce store” (question). If you cannot formulate a specific question, that page is vulnerable.
The answer goes in the first paragraph: the reranker evaluates the correspondence between query and content. If the answer to the question is in the introduction, the relevance score tends to be high. If it is buried in the fifth item of a list far down the page, the score drops sharply.
Headings as questions or direct answers: “How to choose an SEO consultant for ecommerce: 5 concrete criteria” beats “Our SEO services for ecommerce” in practically every reranking system, because the correspondence with the user’s query is already explicit in the heading.
Eliminate pre-answer filler: generic introductions about the sector, unnecessary historical context, transitional sentences — everything that precedes the actual answer lowers the chunk’s signal-to-noise ratio. The reranker sees it.
Create specific pages instead of umbrella pages: if you have a “Services” page with 10 services listed, the reranker does not know which question it answers. Ten specific pages — one for each service, each answering “what it is, who it is for, when to use it” — multiply your chances of being selected for the relevant queries.
Do not rely on a single test: reranking systems vary between AI engines, and because generation is stochastic, the same query gives different results at different times. Test on multiple platforms, multiple times, and look for recurring patterns, not single instances.
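Counting recurring patterns can be as simple as logging which sources each answer cites and measuring how often each one appears across runs. In the sketch below the function name, the runs and the domains are invented placeholders (the `.example` domains are reserved for documentation); in practice you would record one list per query, per engine, per repetition.

```python
from collections import Counter


def citation_share(runs: list[list[str]]) -> dict[str, float]:
    """Fraction of runs in which each source was cited at least once,
    ordered from most to least frequently cited."""
    if not runs:
        return {}
    counts: Counter = Counter()
    for cited in runs:
        counts.update(set(cited))  # a source counts once per run
    return {source: n / len(runs) for source, n in counts.most_common()}


# Hypothetical log: the same query, repeated on several engines and days.
runs = [
    ["competitor.example", "guide.example"],
    ["competitor.example"],
    ["competitor.example", "yoursite.example"],
    ["guide.example", "competitor.example"],
]
for source, share in citation_share(runs).items():
    print(f"{source}: cited in {share:.0%} of runs")
```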

How to check where you are losing

The quickest test: take the query you want to appear for and search it on three different AI engines. Note the cited sources. Then read those sources the way a reranker would: do they answer the question directly? How do they compare with your content?

If the cited sources are more specific than your page, the problem is clear. The reranker is doing its job, and you are losing at the second filter despite having passed the first.

The deeper test: for each key page, identify the question it should answer. Then look at your first paragraph — does it answer directly? If it takes more than 30 seconds to get to the answer, that page has a structural problem that reranking will expose mercilessly.
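The 30-second check can be roughed out with a script. This hypothetical heuristic counts the words a reader gets through before the first sentence that mentions every key term of the target question, then converts that into seconds at an assumed reading speed of 230 words per minute. It is a proxy for your own review, not a model of how any reranker scores text, and picking good key terms still takes judgment.

```python
import re
from typing import Optional

WORDS_PER_MINUTE = 230  # assumed average reading speed


def words_before_answer(page: str, key_terms: list[str]) -> Optional[int]:
    """Words read before the first sentence containing all key terms
    (case-insensitive substring match). None if no sentence qualifies."""
    words_read = 0
    for sentence in re.split(r"(?<=[.!?])\s+", page.strip()):
        lowered = sentence.lower()
        if all(term.lower() in lowered for term in key_terms):
            return words_read
        words_read += len(sentence.split())
    return None


def seconds_to_answer(page: str, key_terms: list[str]) -> Optional[float]:
    """Estimated reading time before the answer, in seconds."""
    words = words_before_answer(page, key_terms)
    return None if words is None else words / WORDS_PER_MINUTE * 60


page = (
    "The ecommerce market keeps growing. Many brands struggle with visibility. "
    "To choose an SEO consultant for ecommerce, ask for these 5 pieces of data."
)
terms = ["choose", "SEO consultant", "ecommerce"]
print(words_before_answer(page, terms), round(seconds_to_answer(page, terms), 1))
# 10 2.6
```

At 230 words per minute, 30 seconds corresponds to roughly 115 words, so a result well above that is a signal to move the answer up.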

As I explain in the article on grounding and citation, the pipeline does not end here — but if you lose at reranking, you never reach the next step.

Start from your most important page: can you formulate in one sentence the question it answers? If yes, is that answer in the first paragraph? If the answer to either question is no, you have found exactly where to intervene.