How AI engines think

Vertical AI models: if you’re not in their data, you don’t exist in their world

Roberto Serra · 25 June 2026 · ~7 min read

If you work in fields like healthcare, finance, or law, there are specialized AIs that professionals use every day — and those AIs were trained on specific sources, not on the entire web. If you're not in those sources, you don't exist for those models: you never show up in their recommendations, not even when you'd be the most qualified choice. Your competitors who understood where to stake their ground are gathering clients from these vertical AIs while you're not even in the game. Knowing where to position yourself in these circuits is a precise strategy, not a complicated one.

There’s a category of AI models that your marketing probably ignores. They’re not ChatGPT, they’re not Claude, they’re not Gemini. They’re the models fine-tuned on vertical datasets — trained specifically for healthcare, finance, law, real estate. And if you operate in one of these sectors, there’s a concrete risk: those models don’t know you. Even if you’re the most cited professional in your field.

This article explains the technical mechanism of fine-tuning, why it produces a structural bias toward the dataset’s sources, and what to do so you’re not invisible in the models that matter in your sector.

The mechanism: what happens during fine-tuning

Fine-tuning is not a superficial update. It’s a separate training phase that rewrites the model’s internal priorities.

As documented by Xu et al. (2026), “Supervised Fine-Tuning (SFT) represents the foundational approach to adapting LLMs for tool use”.

SFT works like this: you take a base model (an open-weight model such as Llama, or a hosted model such as GPT through its provider’s fine-tuning service) and retrain it on labeled examples specific to the domain. The model learns to respond in the way the fine-tuning dataset considers “correct”.

The process follows a precise sequence. Describing LLaMA-2 Chat, Minaee et al. (2025) note that, once pre-training is done, “an initial version of LLaMA-2 Chat is built via supervised fine-tuning”.

First comes pre-training on enormous, generic corpora, then SFT on specific examples that steer the model toward the behaviors and content of the target domain. Fine-tuning doesn’t replace pre-training — it specializes it. But this specialization has an effect that is often underestimated.

When SFT happens on a vertical dataset, the model develops preferences. It learns to recognize certain formats, certain authors, certain sources as “good exemplary answers”, and it learns to ignore or penalize whatever doesn’t fit that pattern. The same tends to apply to retrieval: when the model is paired with an information retrieval system, that system is typically built on or tuned to the same dataset, so the search phase itself is also steered toward the sources the model already knows.

The technical result is clear: a model fine-tuned on clinical data learns that “correct answer” means citing PubMed, WHO guidelines, ministerial protocols. A model fine-tuned on legal data learns that “correct answer” means citing case law, codes, opinions from authorities. It doesn’t do this because someone explicitly told it to ignore you — it does it because you were never part of its definition of an “exemplary answer”.
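To make the mechanism concrete, here is a deliberately tiny sketch of how repeated SFT updates shift a model's preferences. It is a toy, not a real language model: the "model" is just three numbers (one score per source), and the source names, dataset, and learning rate are all hypothetical. What it does share with real SFT is the update rule: a gradient step on the cross-entropy loss for each labeled example.

```python
import math

SOURCES = ["pubmed", "who_guidelines", "personal_site"]  # hypothetical labels


def softmax(logits):
    """Turn raw scores into probabilities that sum to 1."""
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sft_step(logits, target_index, lr=0.5):
    """One gradient step on the cross-entropy loss for one labeled example."""
    probs = softmax(logits)
    # d(-log p[target]) / d(logit_i) = p_i - 1 if i is the target, else p_i.
    return [
        z - lr * (p - (1.0 if i == target_index else 0.0))
        for i, (z, p) in enumerate(zip(logits, probs))
    ]


# "Pre-trained" state: no preference among the three sources.
logits = [0.0, 0.0, 0.0]

# Hypothetical vertical dataset: every exemplary answer cites PubMed or WHO,
# and none of them cites a personal site.
dataset = ["pubmed", "who_guidelines", "pubmed", "pubmed", "who_guidelines"] * 4

for cited in dataset:
    logits = sft_step(logits, SOURCES.index(cited))

preferences = dict(zip(SOURCES, softmax(logits)))
for name, p in preferences.items():
    print(f"{name}: {p:.3f}")
```

Nobody tells the toy to penalize the personal site. Its probability falls simply because it is never the labeled answer, which is the same dynamic described above.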

From the mechanics to the impact: what follows for your visibility

That is the technical mechanism, as documented in the sources. From it follows a deduction with direct implications for anyone working in vertical sectors.

If SFT rewrites the model’s preferences based on the training data, then visibility in a fine-tuned model doesn’t depend on how authoritative you are in absolute terms — it depends on where you are authoritative within the dataset. A doctor with 20 years of experience and a well-written site doesn’t exist for a clinical model if they’ve never published on PubMed. A law firm with important clients doesn’t exist for a legal model if it doesn’t appear in case law or law journals.

This is different from the problem of visibility in generic models like ChatGPT or Claude. There, as we saw in the article on how pre-training decides what an AI model knows, the mechanism is presence in the pre-training corpus — which is vast, varied, less controllable. In vertical models the mechanism is narrower and therefore easier to influence. The fine-tuning dataset of a clinical model is probably of manageable size: PubMed, UpToDate, a few specialized journals, national and international guidelines. If you get into those sources, you’re within the model’s perimeter.

In generic models you compete with millions of sites. In vertical models you compete with a few dozen sector databases. It’s a different game — and for many specialized operators, it’s the most important game.

The current landscape: where vertical models exist

Not every sector has active, accessible fine-tuned models. But the main ones are already here.

In the medical and clinical field there are models like Med-PaLM (Google) and a series of open source models trained on clinical literature. The chatbots used by hospitals and telemedicine platforms are almost always fine-tuned models, not generic ones.

In finance the reference is BloombergGPT, a model Bloomberg trained on a mix of its own financial corpora and general-purpose text. Risk analysis tools, compliance platforms, and assistants for financial advisors use vertical models with proprietary datasets.

In the legal field growth is rapid: large law firms and legaltech platforms are adopting models fine-tuned on national and international case law. The base dataset is often made up of codes, rulings, and doctrinal opinions.

In real estate and insurance the vertical models are more fragmented, but the trend is clear: every platform that uses AI to generate valuations, risk analyses, or responses to clients is carrying out fine-tuning on sector data.

If your sector is on this list, the question is not “should I worry about it?” — the answer is already yes. The question is “what can I do?”

What to do: a practical strategy for vertical sectors

The strategy is structured on three levels, in order of priority.

First level: identify the likely fine-tuning sources in your sector. Fine-tuning datasets aren’t always publicly documented. But you can make reasonable inferences: look at which literature the model cites when it answers questions in your sector. Test the model with specific queries and observe the sources that emerge. The sources cited most frequently and consistently are the most likely candidates for the fine-tuning dataset, or for the retrieval index behind the model. This is an exercise worth doing once, methodically.
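If you collect the model's answers in a text file or spreadsheet, the tally itself takes a few lines of code. The sketch below assumes the answers contain explicit URLs and counts how many answers cite each domain. The sample answers and URLs are invented for illustration. A model that cites by journal name rather than by link would need a different extraction step.

```python
import re
from collections import Counter
from urllib.parse import urlparse

# Simple URL matcher: stops at whitespace and closing brackets.
URL_PATTERN = re.compile(r"https?://[^\s)\]>]+")


def cited_domains(answer):
    """Return the distinct domains linked in one answer."""
    domains = set()
    for url in URL_PATTERN.findall(answer):
        host = urlparse(url.rstrip(".,;")).netloc.lower()
        if host.startswith("www."):
            host = host[4:]
        if host:
            domains.add(host)
    return domains


def tally_sources(answers):
    """Count in how many answers each domain is cited at least once."""
    counts = Counter()
    for answer in answers:
        counts.update(cited_domains(answer))
    return counts


# Hypothetical answers copied by hand from a vertical model.
answers = [
    "See https://pubmed.ncbi.nlm.nih.gov/123/ and https://www.who.int/guidelines.",
    "According to https://pubmed.ncbi.nlm.nih.gov/456/ the dosage is ...",
    "WHO guidance (https://www.who.int/publications) recommends ...",
    "Reference: https://pubmed.ncbi.nlm.nih.gov/789/",
]

ranking = tally_sources(answers)
for domain, n in ranking.most_common():
    print(f"{domain}: cited in {n} of {len(answers)} answers")
```

Counting per answer rather than per link keeps one link-heavy answer from dominating the ranking, which matches the "frequently and consistently" criterion.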

Second level: publish in the fine-tuning sources, not just on your own site. This is where the most important break with traditional SEO happens. Optimizing your site isn’t enough if the model doesn’t include sites in its definition of an “exemplary answer”. You have to be present where the model learned to look for answers. For the doctor, that means publishing in journals indexed in PubMed or contributing to guidelines. For the lawyer, that means having opinions cited in case law or articles in legal journals. For the financial advisor, that means analyses on platforms that the financial model recognizes.

Third level: adapt the content format to the dataset’s format. Fine-tuning datasets have recognizable structures. Medical papers have abstracts, methodology, conclusions. Legal opinions have a specific argumentative structure. Financial reports have standardized sections. When you produce content in your sector, adopting the format that fine-tuning recognizes as a “well-structured answer” increases the probability that your contribution will be used as a positive example — or cited in a response.
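A quick way to hold yourself to this is a checklist you can run against a draft. The sketch below only looks for expected section headings as standalone lines. The list of sections for a medical paper comes from the description above and is an assumption, so replace it with the author guidelines of the journal or platform you are targeting.

```python
# Assumed section names; take the real ones from the target source's guidelines.
EXPECTED_SECTIONS = {
    "medical_paper": ["abstract", "methodology", "conclusions"],
}


def missing_sections(draft, expected):
    """Return the expected headings that never appear as a standalone line."""
    headings = {line.strip().lower().rstrip(":") for line in draft.splitlines()}
    return [name for name in expected if name not in headings]


# Hypothetical draft with two of the three sections in place.
draft = """Abstract
We review outcomes in ...

Methodology
We searched ...
"""

gaps = missing_sections(draft, EXPECTED_SECTIONS["medical_paper"])
print("Missing sections:", gaps)
```

A heading check says nothing about the quality of the content. It only catches the case where a strong piece is written in a shape the target source would not recognize.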

This reasoning connects to what I explained about the role of human feedback in steering models’ preferences and about how constitutional constraints filter content: fine-tuning isn’t the only filter, but in vertical models it’s the main filter.

How to check your situation today

Before building a strategy, you need to know where you stand.

Identify the vertical models active in your sector. Search “[sector] AI model” or “[sector] LLM” on Google Scholar and on arxiv.

If you find an accessible model (even via API or platform), test it with the same questions you ask ChatGPT. Does your brand show up? If not, the model doesn’t know you.

Identify the sources the model cites in its answers. Compile a list: it’s your priority list for content distribution.

Verify your presence in those sources. Not generic presence — presence in the specific sources of the dataset.

The last point ties into the question of training data deduplication. Being cited once isn’t enough, and neither is the same text copied across many pages: deduplication can collapse redundant occurrences into one. Multiple, distinct mentions spread across the sector’s authoritative sources carry a different weight than a single mention.
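The effect of deduplication on your mentions can be shown with a simplified example. The sketch below removes exact duplicates after normalizing case, punctuation, and whitespace. That is an assumption made for brevity: real training pipelines often use fuzzier near-duplicate detection, and the name and sentences here are made up.

```python
import hashlib
import re


def normalize(text):
    """Lowercase, drop punctuation, collapse whitespace so trivial variants collide."""
    text = re.sub(r"[^\w\s]", "", text.lower())
    return re.sub(r"\s+", " ", text).strip()


def deduplicate(passages):
    """Keep only the first occurrence of each normalized passage."""
    seen = set()
    kept = []
    for passage in passages:
        digest = hashlib.sha256(normalize(passage).encode("utf-8")).hexdigest()
        if digest not in seen:
            seen.add(digest)
            kept.append(passage)
    return kept


# Hypothetical mentions of the same (made-up) author across four pages.
mentions = [
    "Dr. Rossi's protocol reduced readmissions.",   # journal article
    "Dr. Rossi's protocol reduced readmissions.",   # verbatim syndicated copy
    "dr rossi's protocol  reduced readmissions",    # reformatted copy
    "A guideline panel adopted the protocol proposed by Dr. Rossi.",  # distinct text
]

survivors = deduplicate(mentions)
print(f"{len(mentions)} mentions in, {len(survivors)} out")
```

The three copies of the same sentence count as one. Only the independently written mention adds a second data point, which is why distributed presence matters more than syndication.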

Generic models and vertical models are two separate games with different rules. If you operate in healthcare, finance, or law, your AI visibility doesn’t depend only on how present you are on the web in general — it depends on how present you are in the specific sources that defined what’s “correct” for that model. Identifying those sources and building presence in them is the first concrete action to take this week.
