Entities and Knowledge Graph

Is your brand in Italian and in English the same entity to AI? Probably not

Roberto Serra 25 June 2026·~8 min read

Your brand exists in Italian and in English — but to AI they are two different, disconnected entities. In the foreign markets you're trying to gain a foothold in, ChatGPT and Perplexity don't know that the English version of your site is you: same company, same story, but zero structural connection between the two versions. You're doing international marketing on foundations that are invisible to the models. The missing link can be fixed in half a day — and it unlocks AI visibility across every market you're investing in.

I tested 15 Italian wine brands that export to the US and UK: 12 out of 15 are invisible to AI when you ask in English, visible in Italian. Here’s why.

The query was simple. I asked ChatGPT, Claude and Perplexity — in Italian — “what are the best producers of Amarone della Valpolicella that export to the United States.” I then reopened a clean session and asked, in English, “best Amarone della Valpolicella producers exporting to the US.” Same intent, same search, different language.

The result: 3 wineries out of 15 appeared in both versions. The other 12 existed in Italian — in English they simply vanished, almost always replaced by the same 4-5 big names that dominate the English-language press. It’s not a wine-quality problem. It’s a multilingual entity matching problem: to AI, the Italian site and the English site of those wineries were two different entities, two disconnected identities, two reputations that didn’t add up.

In this article I’ll explain why this happens, what the research says and what you can do in practice to stitch the two halves of your brand back together — especially if you export or are about to.

What “multi-language entity matching” means for an AI model

When an AI model reads “Tenuta Santa Maria della Valpolicella” on a .it site and “Santa Maria Valpolicella Estate” on a .com site, it has to decide one thing: is this the same producer or are they two different companies? This decision is called cross-lingual entity matching and it’s the mechanism by which the system unifies — or fragments — your brand’s authority.

The problem stems from how entity recognition systems are built across multiple languages. In multilingual NER (Named Entity Recognition) research, Mayhew et al. (2024), in the paper “Universal NER: A Gold-Standard Multilingual Named Entity Recognition Benchmark”, document a principle that is central to understanding your visibility problem: entity recognition is trained language by language, on annotations produced separately by native speakers across multiple iterations. There is no automatic step that tells the model “this entity in Italian corresponds to that one in English.” Each language is a world annotated on its own.

From this follows a direct consequence for your business. The model learns to recognize “Tenuta Santa Maria” as an entity when it reads a text in Italian, and to recognize “Santa Maria Estate” as an entity when it reads a text in English. But that they are the same node in the knowledge graph is not an automatic inference: it’s a connection that has to come from outside — from Wikidata, from structured schema, from About pages that state it explicitly.

If that connection isn’t there, the authority you’ve built in Italian — reviews, citations from Gambero Rosso, interviews in trade publications — stays confined to the Italian version. The American user who asks in English gets a list you’re not on.
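
To make the idea concrete, here is a deliberately simplified sketch in Python. It is not how any AI model resolves entities internally: a hand-written alias table stands in for the external link (Wikidata labels, schema, About pages), and the node IDs are invented for illustration.

```python
# Toy illustration only: real systems resolve entities with learned models,
# not a hand-written dictionary. All node IDs here are hypothetical.

def build_alias_index(entities):
    """Map every known surface form (lowercased) to its entity ID."""
    index = {}
    for entity_id, names in entities.items():
        for name in names:
            index[name.lower()] = entity_id
    return index


def resolve(index, mention):
    """Return the entity ID for a mention, or None if it is unknown."""
    return index.get(mention.lower())


# Without an explicit link, the two names live under two separate nodes.
unlinked = build_alias_index({
    "node-it": ["Tenuta Santa Maria"],
    "node-en": ["Santa Maria Estate"],
})

# With an external link (for example multilingual labels), both names share one node.
linked = build_alias_index({
    "node-brand": ["Tenuta Santa Maria", "Santa Maria Estate"],
})

same_unlinked = resolve(unlinked, "Tenuta Santa Maria") == resolve(unlinked, "Santa Maria Estate")
same_linked = resolve(linked, "Tenuta Santa Maria") == resolve(linked, "Santa Maria Estate")
print(same_unlinked, same_linked)  # prints: False True
```

The only difference between the two cases is the data supplied from outside. The lookup logic is identical, which is exactly the point: the link has to be declared somewhere.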

Why this mechanism sits upstream of everything else

If you’ve followed the series, in earlier articles I explained how AI recognizes entities in text (NER) and how it disambiguates them (entity disambiguation). Multilingual matching sits one layer above both: it’s not about a single occurrence, it’s about the brand’s identity as a single node in the knowledge graph.

It’s also closely tied to the topic of embeddings in vector space. Two different strings (“Tenuta Santa Maria”, “Santa Maria Estate”) end up at different points in the vector space. Without an explicit signal saying “they’re the same thing,” AI treats them as close but distinct, and for a user searching for “best Italian wine producer” that distance is enough to exclude you.
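
A rough way to picture “close but distinct” is the sketch below. It assumes a crude stand-in for an embedding (counts of character trigrams), not the learned vectors a real model uses, so the numbers it prints are only illustrative.

```python
import math
from collections import Counter


def trigram_vector(text):
    """Count character trigrams of the lowercased text (a crude stand-in for an embedding)."""
    padded = f"  {text.lower()} "
    return Counter(padded[i:i + 3] for i in range(len(padded) - 2))


def cosine_similarity(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[k] * b[k] for k in a if k in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


italian = trigram_vector("Tenuta Santa Maria")
english = trigram_vector("Santa Maria Estate")

similarity = cosine_similarity(italian, english)
identical = cosine_similarity(italian, italian)
print(f"IT vs EN: {similarity:.2f}, IT vs IT: {identical:.2f}")
```

The two names share enough fragments to score above zero, but they never reach the score a name gets against itself. Nothing in the vectors alone says they are the same company.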

The test you can run in 10 minutes

You don’t need a technical audit to figure out whether you have this problem. All it takes is ten minutes and a glass of wine (you pick which).

1. Open Wikidata and search for your company’s name. If a record exists, check the table of labels at the top of the page, in particular the “Also known as” column: are the English, German, French variants there? If only Italian is present, AI is starting out with half the information.
2. On your Italian homepage and your English one, open the source code (right-click → view source) and search for the string `hreflang`. If you don’t find it on both, the signal “these two pages represent the same company in different languages” isn’t reaching the engines.
3. Search the homepage code for the word `sameAs`. You should see a list of URLs: social profiles, Wikipedia, Wikidata. If the list exists on the Italian site but not the English one (or vice versa), you’re telling the AI engine that the two sites belong to different parties.
4. Open ChatGPT, Claude and Perplexity in clean sessions. Ask in Italian “who is [your company name].” Then ask in English “who is [your company name].” Compare: same information? Same facts about founding year, location, main products? If the answers diverge, you have two fragmented entities.

Rule of thumb: if you fail 2 or more of the 4 checks, the probability that AI treats you as two distinct brands is high.
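
If you’d rather not read source code by eye, the `hreflang` and `sameAs` checks can be roughly automated. The sketch below uses only Python’s standard library and runs on two HTML strings that I made up (the URLs and the Wikidata ID `Q0` are placeholders). It is a minimal sketch under assumptions: it only looks at `<link>` tags and top-level JSON-LD objects, and it applies a stricter rule than the manual check by requiring the two `sameAs` lists to be identical.

```python
import json
from html.parser import HTMLParser


class SignalParser(HTMLParser):
    """Collect hreflang values and JSON-LD sameAs URLs from one HTML page."""

    def __init__(self):
        super().__init__()
        self.hreflangs = set()
        self.same_as = set()
        self._in_jsonld = False
        self._buffer = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "link" and attrs.get("hreflang"):
            self.hreflangs.add(attrs["hreflang"].lower())
        elif tag == "script" and attrs.get("type") == "application/ld+json":
            self._in_jsonld = True
            self._buffer = []

    def handle_data(self, data):
        if self._in_jsonld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self._in_jsonld = False
            try:
                block = json.loads("".join(self._buffer))
            except json.JSONDecodeError:
                return
            # Simplification: only top-level objects are inspected, not nested @graph arrays.
            for item in block if isinstance(block, list) else [block]:
                if not isinstance(item, dict):
                    continue
                value = item.get("sameAs", [])
                if isinstance(value, str):
                    value = [value]
                elif not isinstance(value, list):
                    value = []
                self.same_as.update(v for v in value if isinstance(v, str))


def audit(html):
    """Return (hreflang values, sameAs URLs) found in one page."""
    parser = SignalParser()
    parser.feed(html)
    return parser.hreflangs, parser.same_as


def failed_checks(it_html, en_html):
    """List which of the two markup checks fail for a pair of homepages."""
    it_langs, it_same = audit(it_html)
    en_langs, en_same = audit(en_html)
    failures = []
    if not (it_langs and en_langs):
        failures.append("hreflang missing on at least one homepage")
    if not it_same or it_same != en_same:
        failures.append("sameAs missing or different between the two homepages")
    return failures


# Hypothetical pages: the Italian one is marked up, the English one is not.
IT_HTML = """<html><head>
<link rel="alternate" hreflang="it" href="https://example.com/it/" />
<link rel="alternate" hreflang="en" href="https://example.com/en/" />
<script type="application/ld+json">
{"@type": "Organization", "sameAs": ["https://www.wikidata.org/wiki/Q0"]}
</script></head></html>"""
EN_HTML = "<html><head><title>Example Wines</title></head></html>"

for problem in failed_checks(IT_HTML, EN_HTML):
    print(problem)
```

With these two sample pages both checks fail, which by the rule of thumb above already puts the brand in the “probably two entities” zone. To use it on a real site you would paste in the saved source of your own two homepages.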

The test I ran (with caveats)

Back to the 15 Valpolicella wine brands I opened with. Small sample, non-random selection (I took wineries that openly export to the US, with estimated foreign revenue between 50,000 and 500,000 euros). It’s an indicative test, not a study.

Of the 12 wineries invisible in English:

- 11 had no Wikidata record with multilingual labels
- 9 didn’t have `hreflang` correctly implemented between the IT and EN versions
- 12 out of 12 didn’t have cross-lingual `sameAs` in their Organization schema

Correlation doesn’t prove causation, but the pattern is clear: all 3 wineries visible in both languages had Wikidata records with multilingual labels and `sameAs` that explicitly linked the two language versions. Real analysis requires professional tools and larger samples, but as a first signal the pattern is enough to get started.

The mistakes I see most often

When I work with brands that export, these are the recurring patterns:

- Two sites, two disconnected About Us pages. The Italian “About us” page tells one story, the English one tells another (often simplified). Neither explicitly states “this is the English version of [brand].” AI has no anchors to merge the two narratives.
- Social media split by language without `sameAs`. An Instagram account “@cantinarossi” for the Italian market, “@rossiwines” for abroad. Legitimate, but without a `sameAs` declaration on the site the two profiles remain distinct entities.
- Wikidata ignored or with a single label. The record exists, but only in Italian. No English label, no alias. To an English-speaking user asking Perplexity, you’re a brand with no pedigree.
- Organization schema duplicated but not linked. The .it site has its schema with one `@id`, the .com site has its own with a different `@id`. No mutual reference. Two records, two identities.
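
The single-label Wikidata pattern is easy to spot programmatically. The sketch below assumes you have already saved your item’s JSON (Wikidata serves it per item through its `Special:EntityData` endpoint, wrapped in an outer `entities` object) and passes the inner entity object to a small helper. The sample record is invented, and the function names are mine, not part of any Wikidata library.

```python
def label_coverage(entity, languages):
    """Return, per language, the label and aliases found in a Wikidata-style entity dict."""
    labels = entity.get("labels", {})
    aliases = entity.get("aliases", {})
    report = {}
    for lang in languages:
        report[lang] = {
            "label": labels.get(lang, {}).get("value"),
            "aliases": [alias["value"] for alias in aliases.get(lang, [])],
        }
    return report


def missing_languages(entity, languages):
    """List the languages for which the record has no label at all."""
    report = label_coverage(entity, languages)
    return [lang for lang, info in report.items() if info["label"] is None]


# Hypothetical record that only has an Italian label and alias.
sample_entity = {
    "labels": {"it": {"language": "it", "value": "Cantina Rossi"}},
    "aliases": {"it": [{"language": "it", "value": "Rossi Vini"}]},
}

print(missing_languages(sample_entity, ["it", "en", "de"]))  # prints: ['en', 'de']
```

Any language that shows up in that list is a market where your record has no label for AI to match against.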

What to do concretely

You don’t need a six-month project. You need a well-spent week:

1. Create or update your company’s Wikidata record with labels in Italian, English and, where relevant, German, French, Spanish. Add common aliases (the name as the foreign press uses it).
2. Implement `hreflang` between all pairs of translated pages (homepage IT ↔ homepage EN, product page IT ↔ product page EN). Check that every annotation is reciprocal: each page should list itself and its counterpart. The Rich Results Test validates structured data rather than `hreflang`, so use a crawler or a dedicated hreflang checker for this step.
3. In the Organization schema of both sites, add a `sameAs` block that includes: the URL of the version in the other language, Wikipedia (if it exists), Wikidata, main social profiles. The same block, identical, on both sites.
4. Write an About page in English that explicitly states the continuity: “Cantina Rossi (also known as Rossi Wines for international markets) is an Italian winery based in Valpolicella.” One sentence, but it’s the textual anchor AI latches onto.
5. Compare yourself with the 3-5 competitors AI cites in your sector when you ask in English: almost always you’ll see they have a robust Wikidata and cross-lingual `sameAs`. That’s no coincidence.
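
For the `hreflang` and `sameAs` steps, one way to keep the two sites from drifting apart is to generate the markup from a single source instead of writing it by hand twice. The sketch below is one possible approach, not a required one: the URLs, the `@id`, the Wikidata ID `Q0` and the social handles are placeholders, and `alternateName` is the schema.org property I chose to carry the international name.

```python
import json
from html import escape

# Hypothetical language versions of the same homepage.
VERSIONS = {"it": "https://example.com/it/", "en": "https://example.com/en/"}


def hreflang_tags(versions):
    """Build one <link rel="alternate"> tag per language version.

    The full list goes on every version, so each page references itself
    and its counterpart.
    """
    return [
        f'<link rel="alternate" hreflang="{escape(lang)}" href="{escape(url)}" />'
        for lang, url in versions.items()
    ]


def organization_jsonld(org_id, name, alternate_names, same_as):
    """Build a schema.org Organization block meant to be pasted unchanged on every site."""
    return json.dumps(
        {
            "@context": "https://schema.org",
            "@type": "Organization",
            "@id": org_id,
            "name": name,
            "alternateName": alternate_names,
            "sameAs": same_as,
        },
        indent=2,
        ensure_ascii=False,
    )


tags = hreflang_tags(VERSIONS)
block = organization_jsonld(
    org_id="https://example.com/#organization",  # one shared @id for both sites
    name="Cantina Rossi",
    alternate_names=["Rossi Wines"],
    same_as=[
        "https://example.com/it/",
        "https://example.com/en/",
        "https://www.wikidata.org/wiki/Q0",  # placeholder item ID
        "https://www.instagram.com/cantinarossi/",
        "https://www.instagram.com/rossiwines/",
    ],
)

print("\n".join(tags))
print(f'<script type="application/ld+json">\n{block}\n</script>')
```

Because both sites paste the same output, the `@id` and the `sameAs` list cannot diverge, which removes the “two records, two identities” pattern by construction.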

It’s not a magic mechanism. `hreflang` and `sameAs` alone won’t make you appear if you have nothing relevant to say in English. But if you’ve already done the work in Italian — authority, press mentions, reviews — this is the stitch that carries that authority back into the English version of your brand, where it wasn’t reaching until now.

Where it fits in your AI visibility work

Multilingual matching is one of the most underrated pieces in the work of getting visibility in AI answers. I see it often in companies that invested well in E-E-A-T on the domestic market and then are surprised not to show up abroad: the authority was there, but it was registered to a “semantic legal entity” different from the one the English-speaking AI was querying.

In the upcoming articles in this series we’ll talk about how to build a solid Google Knowledge Graph entry, how to use Wikidata as a semantic backbone for your brand, and how `sameAs` becomes the glue that holds all your profiles together. If your brand speaks more than one language, start here: first unify the identity, then invest in the rest.