
Named Entity Recognition: how AI decides whether your brand is “someone” or just text

Roberto Serra · 25 June 2026

Before it even evaluates the quality of your content, the AI has to recognize your brand as a name: a real entity, not a generic phrase like “our industrial solutions”. If that recognition doesn’t kick in, you’re background noise to the model: text among text, with no identity. It’s not a question of how much you write or how good you are; it’s the first filter, the one that comes before everything else. In ten minutes, with free tools, you can test whether your brand passes it.

Imagine the owner of a small engineering company typing into ChatGPT “best industrial component manufacturers in Lombardy”. The AI replies with five names. His isn’t there. Not because he’s worse than the competitors mentioned — but because, to the AI model, his brand literally does not exist as a recognized entity. It’s just one string of text like any other, not “someone worth talking about”.

This is the mechanism I’m going to explain to you today. It’s called Named Entity Recognition, and it decides upstream whether your brand has the right to appear in AI answers or whether it stays background noise.

What “entities” are to an AI model

When ChatGPT, Claude or Perplexity read a document, they don’t see words: they see chunks of text that need to be labeled. Some are common nouns, others are proper nouns — and the latter trigger a specific kind of recognition.

In the research world, Villena et al. (2024) define the mechanism like this:

“One interesting task in natural language processing (NLP) is named entity recognition (NER), which seeks to detect mentions of relevant information in documents.”

Villena et al. (2024)

In plain terms: the system doesn’t “read” everything the same way. It actively looks for mentions of relevant information (people, companies, places, products) and separates them from the rest.

The operational consequence is simple. If on your site the brand “TecnoImpianti Soluzioni Industriali” appears only as ordinary text, with no structural signals identifying it as an organization, to the model it’s indistinguishable from “our industrial solutions”. It doesn’t make it onto the shelf of recognized entities, and when the AI looks for “who to turn to”, it doesn’t find you.

The same authors spell out what an entity means:

“NER is the task of finding spans of text that constitute named entities (anything that can be referred to with a proper name) and tagging the entity type.”

Villena et al. (2024)

In practice, an entity is anything that can be called by a proper name and assigned a category (person, company, place).
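NER datasets often represent "spans plus a type" with BIO tagging. Each token gets B-TYPE (it begins an entity), I-TYPE (it continues one) or O (it is outside any entity), and the tags are then decoded into labeled spans. The minimal sketch below decodes hand-written tags for a sentence of the kind discussed in this article. The tags are illustrative, not the output of a real model. The tag names follow common convention rather than any specific tool.

```python
from typing import List, Optional, Tuple

def bio_to_spans(tokens: List[str], tags: List[str]) -> List[Tuple[str, str]]:
    """Decode token-level BIO tags into (entity text, entity type) pairs."""
    spans: List[Tuple[str, str]] = []
    current: List[str] = []
    current_type: Optional[str] = None
    for token, tag in zip(tokens, tags):
        if tag.startswith("B-") or (tag.startswith("I-") and tag[2:] != current_type):
            # A B- tag, or an I- tag that can't continue the open entity: start a new one.
            if current:
                spans.append((" ".join(current), current_type))
            current, current_type = [token], tag[2:]
        elif tag.startswith("I-"):
            # Same type as the open entity: extend it.
            current.append(token)
        else:
            # "O": close any open entity.
            if current:
                spans.append((" ".join(current), current_type))
            current, current_type = [], None
    if current:
        spans.append((" ".join(current), current_type))
    return spans

# Illustrative, hand-written tags (not produced by a real model).
tokens = ["Studio", "Associato", "Rossi", "provides", "tax", "consulting", "in", "Padua"]
tags = ["B-ORG", "I-ORG", "I-ORG", "O", "O", "O", "O", "B-LOC"]
print(bio_to_spans(tokens, tags))
# [('Studio Associato Rossi', 'ORG'), ('Padua', 'LOC')]
```

A sentence with no proper names decodes to an empty list. In that case there is simply nothing to anchor.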

Why NER sits upstream of everything else I’ve explained to you

In previous articles I told you about how models break text into tokens and how every concept lives in a vector space. Named Entity Recognition comes first: it’s the filter that decides what deserves to become an anchored concept and what stays free-flowing text.

Let me give you the example I always use with clients. If you write “Studio Associato Rossi provides tax consulting in Padua”, an NER system labels “Studio Associato Rossi” as ORGANIZATION and “Padua” as LOCATION (some tag sets, including the one used by spaCy’s English models, label cities GPE instead). They become anchors. The AI now knows there’s an organization with that name in a specific place, and it can cite it when someone asks about accountants in Padua.

If instead the text reads “our firm provides tax consulting in our city”, no entity is extracted. To the model there’s no specific subject to cite. That is also why the inverted-pyramid structure only works when its subjects are identifiable.
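Even a deliberately crude heuristic reproduces the contrast between the two sentences. The sketch below is a toy, not how spaCy or any LLM works. It treats runs of capitalized words as entity candidates, and it only illustrates that a sentence without proper names offers nothing to anchor.

```python
import re
from typing import List

def capitalized_runs(sentence: str) -> List[str]:
    """Toy candidate extractor: return maximal runs of capitalized words."""
    words = re.findall(r"[\w']+", sentence)
    candidates: List[str] = []
    run: List[str] = []
    for word in words:
        if word[0].isupper():
            run.append(word)          # extend the current capitalized run
        else:
            if run:
                candidates.append(" ".join(run))
            run = []                  # a lowercase word ends the run
    if run:
        candidates.append(" ".join(run))
    return candidates

print(capitalized_runs("Studio Associato Rossi provides tax consulting in Padua"))
# ['Studio Associato Rossi', 'Padua']
print(capitalized_runs("our firm provides tax consulting in our city"))
# []
```

A real NER model also weighs context. For instance, it would not treat a capitalized, sentence-initial "Our" as an entity, which this toy would.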

The four categories that systems recognize most often are documented explicitly:

“The four most common entity types are person, location, organization, and geopolitical entity.”

Villena et al. (2024)

For an Italian B2B brand, the two entity types that really matter are ORGANIZATION (the company name) and PERSON (the names of the founder, the CEO and content authors). If neither gets extracted, you’re invisible as a subject.

The test you can run in 10 minutes without being technical

Here’s the procedure I have all my clients run as a first check. It doesn’t replace a real audit, but in 10 minutes it tells you whether you have an obvious problem.

Open displaCy ENT, a free online demo from Explosion (the makers of spaCy) that runs an entity recognizer on any text you paste: demos.explosion.ai/displacy-ent.

Paste a typical sentence from your homepage or your “about us” page into the box. Something like: “Automeccanica Lecce has been producing spare parts for machine tools since 1978 and works with the Polytechnic University of Turin”.

Click Parse. The tool colors the entities it finds and labels them. Three possible outcomes:

Your brand is colored as ORG: you’re recognizable. A good starting signal.
Your brand is colored, but with a label other than ORG (PERSON, PRODUCT, or MISC in some models): recognized but misclassified. An ambiguity problem.
Your brand isn’t colored: the system doesn’t see it as an entity. There’s work to do here.
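The three outcomes can be written down as a small decision rule. The sketch below assumes you have copied the highlighted entities from the demo by hand into (text, label) pairs. The function name and the exact, case-insensitive comparison are illustrative choices. Real brand mentions may need fuzzier matching.

```python
from typing import List, Tuple

def brand_outcome(brand: str, entities: List[Tuple[str, str]]) -> str:
    """Classify the test result for one brand.

    Returns "recognized" (only ever labeled ORG), "misclassified"
    (found, but with at least one non-ORG label) or "not_extracted".
    """
    # Collect every label given to an exact (case-insensitive) brand mention.
    labels = {label for text, label in entities if text.casefold() == brand.casefold()}
    if not labels:
        return "not_extracted"
    if labels == {"ORG"}:
        return "recognized"
    return "misclassified"

# Hypothetical labels for the example sentence, typed in by hand (not real displaCy output).
found = [("Automeccanica Lecce", "PERSON"), ("1978", "DATE"),
         ("the Polytechnic University of Turin", "ORG")]
print(brand_outcome("Automeccanica Lecce", found))  # misclassified
```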

Be careful: this is an entry-level test. It uses a generalist English model, it doesn’t know every Italian SME, and it doesn’t capture what ChatGPT or Gemini actually know about your brand. A real analysis requires professional tools and cross-engine comparisons. But if you fail even this test, one thing is certain: the basic signals aren’t there.

After the NER test, also check the schema side. Open Google’s Rich Results Test, paste your homepage URL, and search the report for the word “Organization”. If it appears, you’re giving the web a clear structured signal. If it doesn’t appear, you’re leaving the recognition of your brand as an organization to chance.
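If you prefer to check your own HTML directly, the sketch below collects JSON-LD blocks (`<script type="application/ld+json">`) and reports whether any of them declares an `Organization`. It uses only the Python standard library and makes several assumptions:

- the markup is JSON-LD (Microdata and RDFa are ignored);
- only the literal type `Organization` counts, so subtypes such as `LocalBusiness` would have to be added by hand;
- it does not replace Google's own validation.

```python
import json
from html.parser import HTMLParser
from typing import Any, List

class JsonLdCollector(HTMLParser):
    """Collects the raw text of every <script type="application/ld+json"> block."""

    def __init__(self) -> None:
        super().__init__()
        self.blocks: List[str] = []
        self._inside = False

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._inside = True
            self.blocks.append("")

    def handle_endtag(self, tag):
        if tag == "script":
            self._inside = False

    def handle_data(self, data):
        if self._inside:
            self.blocks[-1] += data  # script content may arrive in several chunks

def _types(node: Any) -> List[str]:
    """Recursively gather every @type value in a JSON-LD tree (including @graph)."""
    found: List[str] = []
    if isinstance(node, dict):
        t = node.get("@type")
        if isinstance(t, str):
            found.append(t)
        elif isinstance(t, list):
            found.extend(x for x in t if isinstance(x, str))
        for value in node.values():
            found.extend(_types(value))
    elif isinstance(node, list):
        for item in node:
            found.extend(_types(item))
    return found

def has_organization(html: str) -> bool:
    """True if any JSON-LD block on the page declares @type Organization."""
    parser = JsonLdCollector()
    parser.feed(html)
    parser.close()
    for block in parser.blocks:
        try:
            data = json.loads(block)
        except json.JSONDecodeError:
            continue  # malformed JSON-LD is skipped (and is itself worth fixing)
        if "Organization" in _types(data):
            return True
    return False

page = '<script type="application/ld+json">{"@type": "Organization", "name": "TecnoImpianti"}</script>'
print(has_organization(page))  # True
```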

The test I ran on 40 Italian SMEs

I took 40 Italian B2B SME websites (engineering firms, professional practices, niche e-commerce) and ran the homepage and “about us” page through displaCy ENT. Then I ran the same check with professional multilingual NER tools.

Result: out of 40 brands, 14 (35%) were consistently recognized as ORG. Another 11 (27.5%) were recognized but with an ambiguous label: sometimes ORG, sometimes MISC, sometimes PERSON. The remaining 15 (37.5%) were not extracted as entities in any test.
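The shares can be recomputed from the raw counts. This is plain arithmetic on the three figures reported above. The exact values are 35%, 27.5% and 37.5%, and together they cover all 40 sites.

```python
# Counts from the 40-site test described above.
counts = {"recognized as ORG": 14, "ambiguous label": 11, "not extracted": 15}
total = sum(counts.values())  # 40 sites
shares = {group: 100 * n / total for group, n in counts.items()}
for group, share in shares.items():
    print(f"{group}: {share:.1f}%")
# recognized as ORG: 35.0%
# ambiguous label: 27.5%
# not extracted: 37.5%
```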

The pattern that recurred among the 15 “invisible” brands: names that were too descriptive (“Soluzioni Ambientali Integrate”), no Organization schema on the homepage, and no Wikidata entry linked to the brand.

Two honest caveats. First, this is an indicative test, not an academic study, and the sample isn’t large. Second, results vary across tools and AI engines. Even so, the pattern around “descriptive-without-schema” brands is too clear to be random.

The mistakes I notice most often

In the audit briefs I receive, four patterns keep coming up.

Indistinguishable descriptive brand. The name is a generic phrase like “Consulenza Ambientale Professionale Srl”. The NER system reads it as the description of a service, not as a proper name. Solution: use the short, distinctive name in the opening position, followed by the description.

No Organization schema. The homepage contains no structured markup saying “this page is about an organization with this name, this address, this website”. To the web it’s just one page like any other.

Brand name used in too many variant forms. “TecnoImpianti”, “Tecno Impianti S.r.l.”, “Tecnoimpianti Soluzioni” on the same page. The recognizer doesn’t know which is the canonical form. Pick one and be consistent.

Founder with no identity. The CEO or owner never appears with first and last name alongside their professional title and company. This also connects to what I wrote about the author being recognized as an entity: if the person isn’t an entity, the company loses a piece of credibility in the model’s eyes — a theme that ties directly to how E-E-A-T works for AI.
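The "too many variant forms" problem is easy to check mechanically. The sketch below is a hypothetical helper, not part of any tool mentioned here. It folds case, spaces, punctuation and a few Italian legal-form suffixes into a comparison key. The suffix list (S.r.l., S.p.A., S.n.c., S.a.s.) is an assumption you should extend. It then flags every mention that looks like the brand but isn't written in the canonical form you chose.

```python
import re
from typing import List

# Assumed, non-exhaustive list of Italian legal-form suffixes to ignore.
LEGAL_SUFFIX = re.compile(
    r"\b(s\.?\s?r\.?\s?l|s\.?\s?p\.?\s?a|s\.?\s?n\.?\s?c|s\.?\s?a\.?\s?s)\.?\s*$",
    re.IGNORECASE,
)

def brand_key(name: str) -> str:
    """Fold a mention into a key: legal suffix, case, spaces and punctuation removed."""
    name = LEGAL_SUFFIX.sub("", name.strip())
    return re.sub(r"[\W_]+", "", name.casefold())

def audit_mentions(canonical: str, mentions: List[str]) -> List[str]:
    """Return mentions that look like the same brand but differ from the canonical form."""
    core = brand_key(canonical)
    return [m for m in mentions if brand_key(m).startswith(core) and m != canonical]

# Hypothetical mentions collected from title, H1, footer and contacts.
mentions = ["TecnoImpianti", "Tecno Impianti S.r.l.", "Tecnoimpianti Soluzioni",
            "Polytechnic University of Turin"]
print(audit_mentions("TecnoImpianti", mentions))
# ['Tecno Impianti S.r.l.', 'Tecnoimpianti Soluzioni']
```

The prefix match is a deliberate simplification. It can produce false positives for short brand names that happen to be the start of unrelated words.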

What to do concretely, in order of priority

You don’t need a revolution. You need five moves, in this order:

Test the key sentences from your homepage and “about us” page on displaCy ENT. Note which entities get recognized and which don’t.
Open the Rich Results Test on your homepage. If you don’t find “Organization” in the report, add Organization structured data with legal name, website, address and logo.
Normalize the brand name: a single canonical form, used consistently across title, H1, footer, contacts.
Create or complete the brand’s Wikidata entry and link the official URL. It’s the most direct way to tell the web “this string is a real entity, here are its attributes”.
Compare the result with the 3-5 competitors the AI cites when you ask the generic question about your sector. If they show up and you don’t, the gap is almost always in the four points above.
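The schema step and the Wikidata step meet in a single piece of markup. The schema.org Organization type has properties for name, legal name, website, logo and address. `sameAs` is the usual place to point at the brand's Wikidata entry, and `founder` ties the company to a named Person. The sketch below generates such a JSON-LD block with the standard library. Every value in it (company data, URLs, the Wikidata ID, the founder's name) is a placeholder.

```python
import json
from typing import Dict, Optional

def organization_jsonld(name: str, legal_name: str, url: str, logo: str,
                        street: str, locality: str, postal_code: str, country: str,
                        wikidata_id: Optional[str] = None,
                        founder: Optional[str] = None) -> str:
    """Build a schema.org Organization description as a JSON-LD string."""
    data: Dict[str, object] = {
        "@context": "https://schema.org",
        "@type": "Organization",
        "name": name,              # the single canonical brand form
        "legalName": legal_name,
        "url": url,
        "logo": logo,
        "address": {
            "@type": "PostalAddress",
            "streetAddress": street,
            "addressLocality": locality,
            "postalCode": postal_code,
            "addressCountry": country,
        },
    }
    if wikidata_id:
        # Link the site's entity to the shared-knowledge entry.
        data["sameAs"] = [f"https://www.wikidata.org/wiki/{wikidata_id}"]
    if founder:
        data["founder"] = {"@type": "Person", "name": founder}
    return json.dumps(data, ensure_ascii=False, indent=2)

# Every value below is a placeholder for illustration.
markup = organization_jsonld(
    name="TecnoImpianti",
    legal_name="TecnoImpianti Soluzioni Industriali S.r.l.",
    url="https://www.example.com",
    logo="https://www.example.com/logo.png",
    street="Via Esempio 1", locality="Milano", postal_code="20100", country="IT",
    wikidata_id="Q00000000",  # placeholder, not a real Wikidata item
    founder="Mario Rossi",
)
print(f'<script type="application/ld+json">\n{markup}\n</script>')
```

The printed `<script>` block goes in the homepage's HTML. The same canonical name should then appear in title, H1, footer and contacts.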

The authors of the paper point out a detail that matters even for those who don’t do professional NER:

“For training NER models, one needs a large corpus of task-specific annotated text, but constructing an annotated corpus is both time-consuming and expensive.”

Villena et al. (2024)

In practice: building a custom entity recognizer is expensive, and that’s why modern AI models rely on shared knowledge (Wikidata, schema.org, citations in authoritative sources). If you want to be recognized as an entity, your brand has to be documented outside your own site, not just inside it.
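A quick way to see whether shared knowledge already contains your brand is Wikidata's public search API, the `wbsearchentities` action of the MediaWiki API. The sketch below builds the query URL and keeps results whose label matches the brand exactly. The exact-match rule is an assumption. The network call sits in a separate function, so the parsing can be tried offline. A hit only tells you that an item with that label exists, not that it's yours.

```python
import json
from typing import Any, Dict, List
from urllib.parse import urlencode
from urllib.request import Request, urlopen

API = "https://www.wikidata.org/w/api.php"

def search_url(brand: str, language: str = "it", limit: int = 10) -> str:
    """Build a wbsearchentities query for Wikidata items matching the brand."""
    params = {"action": "wbsearchentities", "search": brand, "language": language,
              "type": "item", "limit": limit, "format": "json"}
    return f"{API}?{urlencode(params)}"

def exact_matches(response: Dict[str, Any], brand: str) -> List[str]:
    """Return the IDs of results whose label equals the brand (case-insensitive)."""
    return [hit["id"] for hit in response.get("search", [])
            if hit.get("label", "").casefold() == brand.casefold()]

def lookup(brand: str) -> List[str]:
    """Live query (needs network access); Wikimedia asks clients for a descriptive User-Agent."""
    req = Request(search_url(brand),
                  headers={"User-Agent": "brand-entity-check/0.1 (contact@example.com)"})
    with urlopen(req, timeout=10) as resp:
        return exact_matches(json.load(resp), brand)
```

Called as `lookup("TecnoImpianti")`, the function returns the IDs of items labeled exactly like the brand, or an empty list when there are none. An empty list means the brand isn't documented in this piece of shared knowledge yet.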

NER isn’t a magic factor and it isn’t enough on its own. But it sits upstream: if you’re absent here, every later investment in content and authority delivers less than it could.