Entities and Knowledge Graph

sameAs: the glue that holds your identity together for AI

Roberto Serra · 25 June 2026 · ~8 min read

You keep LinkedIn up to date, your Google Business profile is polished, and you may even have a Wikidata entry, yet ChatGPT describes you in a confused, fragmented way, as if you were three different companies. It's not a flaw in the model: without an explicit link in your site's code, those three profiles are three separate identities for the AI, and the credibility you've built on each platform never adds up. A single technical field in your site's markup, with five URLs, is all it takes to unify them into one recognizable entity.

You have a clean brand, a LinkedIn presence rich with updates, a Google Business profile polished with photos and reviews, a Wikidata page that is minimal but in order. And yet, when you ask ChatGPT or Perplexity who you are, you get fragmented answers: here it describes you as a “fashion company”, there as an “artisan workshop”, and elsewhere the connection between the two is missing entirely.

The reason is almost always the same: for AI engines, without an explicit signal, those three profiles are three different entities. There is nothing in your site’s code that says “this LinkedIn page is me, and this Wikidata entry is also me”. And when the AI tries to reconstruct your identity, it pieces together disconnected fragments instead of a single figure.

Today I’ll walk you through the simplest and most underrated piece: the `sameAs` property of the Organization schema. It is literally the glue that binds your profiles into a single entity.

What “unifying an entity” means for an AI model

The problem AI engines face when talking about an SME brand is exactly what in research is called entity resolution: figuring out whether two records (a LinkedIn page and a Wikidata one, or two entries in different directories) describe the same thing or two different things.

In database research, Maciejewski and colleagues (2025) studied exactly this problem, from an angle that matters to anyone working with heterogeneous data like the kind AI systems collect across the web.

“To address these challenges, we focus on Schema-agnostic Progressive Entity Resolution. Velocity is addressed by the progressive functionality, which yields results before processing all input data through a pay-as-you-go functionality. Volume is addressed by Filtering, which restricts the computational cost to the most similar entity profiles, disregarding those dissimilar. Variety is addressed by the schema-agnostic functionality, which represents every entity profile through a concatenation of all attribute values, regardless the respective attribute names.”

Jakub Maciejewski et al., 2025

Translated into plain English: the researchers tackle the problem with a schema-agnostic approach, meaning one that does not require all the data to share the same structure. It is “progressive” because it returns results before all the input has been processed, and it filters the work down to the most similar profiles instead of comparing everything with everything.
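To make the schema-agnostic idea concrete, here is a minimal sketch in Python. It is not the algorithm from the paper: it only illustrates the step the quote describes (concatenating all attribute values and ignoring the attribute names) and then compares profiles with a plain Jaccard similarity, which is my own choice for the illustration. The three records and their field names are hypothetical.

```python
import re


def profile_tokens(record: dict) -> set[str]:
    """Schema-agnostic view: join every attribute value, ignore the keys."""
    text = " ".join(str(value) for value in record.values())
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a: set[str], b: set[str]) -> float:
    """Share of tokens the two profiles have in common (0.0 to 1.0)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


# Hypothetical records: same tailor on two platforms, plus a namesake.
# Note that the field names differ from record to record.
linkedin = {"company": "Sartoria Rossi", "hq": "Florence", "industry": "bespoke tailoring"}
wikidata = {"label": "Sartoria Rossi", "located_in": "Florence", "description": "tailoring workshop"}
namesake = {"name": "Rossi Auto", "city": "Milan", "sector": "car repair"}

same = jaccard(profile_tokens(linkedin), profile_tokens(wikidata))
other = jaccard(profile_tokens(linkedin), profile_tokens(namesake))
print(f"same brand: {same:.2f}, namesake: {other:.2f}")
```

The point of the sketch is the failure mode: similarity is a guess. Two workshops with similar names in the same city would score high too, which is exactly where inference goes wrong and an explicit link helps.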

From this follows a very concrete business implication: when the AI tries to merge your Wikidata entry, your Crunchbase profile and your site, it does so without those records sharing the same schema. It looks for connection signals. If you provide them explicitly, the AI recognizes you as a single entity on the first pass. If you don’t, the AI gets there by inference, sometimes correctly, very often not.

The `sameAs` property in schema.org is exactly the explicit signal we’re talking about.
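As a rough illustration of what that signal looks like, the sketch below builds an `Organization` JSON-LD block with a `sameAs` array using Python’s standard `json` module. The brand name, the domain and every profile URL are placeholders I made up for the example; the helper name `organization_jsonld` is also hypothetical.

```python
import json


def organization_jsonld(name: str, url: str, same_as: list[str]) -> str:
    """Serialize a minimal schema.org Organization with a sameAs array."""
    data = {
        "@context": "https://schema.org",
        "@type": "Organization",
        "name": name,
        "url": url,
        "sameAs": same_as,
    }
    return json.dumps(data, indent=2, ensure_ascii=False)


# Placeholder values: replace with your own live, canonical profile URLs.
markup = organization_jsonld(
    name="Sartoria Esempio",
    url="https://www.example.com/",
    same_as=[
        "https://www.linkedin.com/company/sartoria-esempio",
        "https://www.wikidata.org/wiki/Q00000000",
        "https://www.crunchbase.com/organization/sartoria-esempio",
        "https://www.facebook.com/sartoria.esempio",
        "https://www.instagram.com/sartoria.esempio",
    ],
)

# The JSON-LD goes inside a script tag in the page's HTML.
print(f'<script type="application/ld+json">\n{markup}\n</script>')
```

Generating the block from a list, instead of hand-editing JSON, makes it easier to keep the array in sync with the profiles you actually maintain.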

Why sameAs sits upstream of all your authority work

In the previous articles in this series I covered Author Entity Recognition and E-E-A-T applied to AI: both start from the assumption that the AI engine knows who you are. Not as a media personality, but as an entity that can be identified with certainty.

If your identity is fragmented, all the work of building authority gets scattered. Imagine spending 12 months getting your brand cited in industry magazines, and then discovering that the AI attributes half of those citations to a “phantom” entity not connected to your site.

`sameAs` doesn’t magically solve this problem, but it’s the first building block. It plants an anchor in your site that tells the AI model: “everything you find at these external URLs belongs to the same identity”.

The test you can run in 5 minutes

Open Google’s Rich Results Test, paste in your homepage URL and look in the extracted schema for the `Organization` object. Check one thing only: is there a `sameAs` property? And if so, how many URLs does it contain?

A rough decision threshold, in three tiers:

Zero URLs in sameAs, or the property missing: you’re in the situation most Italian SMEs start from. It’s a starting point, not a disaster, but it needs fixing.
From 1 to 4 URLs: you’re doing something; typically there’s LinkedIn and little else. You’re halfway there.
From 5 URLs upward with the main profiles covered (LinkedIn, Wikidata, Google Business, Crunchbase, relevant social networks, industry directories): you’re giving the AI engine a clear picture.
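If you prefer to script the same check, the sketch below pulls the JSON-LD blocks out of a page’s HTML with Python’s standard library, collects the `sameAs` URLs of any `Organization` object, and maps the count onto the tiers above. It is not what the Rich Results Test does internally, only a local approximation. The function names and tier labels are hypothetical, treating anything between one and four URLs as the middle tier is my assumption, and the sketch only counts URLs: it does not check whether the main profiles are covered. It also ignores subtypes such as `LocalBusiness` and objects whose `@type` is a list.

```python
import json
from html.parser import HTMLParser


class JsonLdCollector(HTMLParser):
    """Collect the raw text of every <script type="application/ld+json">."""

    def __init__(self):
        super().__init__()
        self._in_jsonld = False
        self._buffer = []
        self.blocks = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._in_jsonld = True
            self._buffer = []

    def handle_data(self, data):
        if self._in_jsonld:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_jsonld:
            self._in_jsonld = False
            self.blocks.append("".join(self._buffer))


def same_as_urls(html: str) -> list[str]:
    """Return the sameAs URLs of every Organization found in the page."""
    collector = JsonLdCollector()
    collector.feed(html)
    urls = []
    for raw in collector.blocks:
        try:
            data = json.loads(raw)
        except json.JSONDecodeError:
            continue  # skip malformed blocks instead of failing
        if isinstance(data, dict):
            items = data.get("@graph", [data])
        elif isinstance(data, list):
            items = data
        else:
            continue
        for item in items:
            if isinstance(item, dict) and item.get("@type") == "Organization":
                value = item.get("sameAs", [])
                urls.extend([value] if isinstance(value, str) else value)
    return urls


def tier(count: int) -> str:
    """Map a sameAs URL count onto the three tiers (labels are made up)."""
    if count == 0:
        return "missing"
    if count < 5:
        return "partial"
    return "rich"


# Hypothetical page with two profile URLs.
sample_html = """
<html><head>
<script type="application/ld+json">
{"@context": "https://schema.org", "@type": "Organization",
 "name": "Sartoria Esempio",
 "sameAs": ["https://www.linkedin.com/company/sartoria-esempio",
            "https://www.facebook.com/sartoria.esempio"]}
</script>
</head></html>
"""
urls = same_as_urls(sample_html)
print(len(urls), tier(len(urls)))
```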

This check is entry level. It tells you whether the basic signal is there, but it doesn’t tell you whether the AI model is using it well: for that you need deeper analysis with professional tools.

The test I ran on 10 Italian brands

I took 10 bespoke artisan tailoring brands in Tuscany, five based in Florence and five across Prato, Siena and other cities in the region. Five of these brands had `sameAs` filled in within the Organization schema with at least 5 external profiles. The other five did not — either they didn’t have the property at all, or they had only Facebook.

I asked ChatGPT, Claude and Perplexity the same query: “who is [brand name]” for all 10. The pattern that emerged (indicative test, small sample but a clear signal):

On the 5 brands with a rich sameAs: 13 answers out of 15 correctly merged the site, the main reference social profile and any Wikidata entry into a single coherent description. The other two answers were a bit thin but still coherent.
On the 5 brands without sameAs: 6 answers out of 15 mixed the brand’s data with that of a namesake or a different workshop. In three cases the AI engine generated a description that blended two distinct tailoring shops as if they were a single business.

It’s not a study, it’s an operational survey of 10 brands and 30 queries. But the signal is clear enough to justify an hour of work from your developer to fix the schema.

The mistakes I keep running into (even on sites you wouldn’t suspect)

Let me walk you through four patterns I’ve seen repeatedly on SME sites, so that when you open your own code you know what to look for.

The sameAs with only Facebook. The developer put in the Facebook page URL “because we had it” and stopped there. Result: the AI engine has one extra link, but all the other profiles (LinkedIn company, Google Business, Wikidata if it exists, industry directories) stay disconnected.

The sameAs on dead or redirected URLs. A classic case: the schema contains a link to a Twitter profile from 4 years ago, since abandoned, or to a Crunchbase page that’s no longer updated. It doesn’t do huge damage, but it signals neglect that doesn’t help.

The sameAs with non-canonical URLs. A link to the owner’s personal LinkedIn profile instead of the company page. A link to a subdomain of a platform instead of its main domain. The AI engine still records the connection, but the unified entity becomes imprecise.

The lack of a reciprocal link on Wikidata. You put Wikidata among your `sameAs`, good. But the Wikidata entry has no property pointing back at you: neither P856 (official website) nor the properties dedicated to external identifiers. The link is one-directional, and AI engines that rely heavily on Wikidata don’t complete the loop.
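Two of these patterns can be caught mechanically from the list of URLs alone. The sketch below is a minimal linter, under the assumption that LinkedIn personal profiles live under `/in/` and company pages under `/company/`; the function name and warning texts are hypothetical. Dead or abandoned profiles and the Wikidata back-link are left out, because they need a network request or a look at the Wikidata entry itself.

```python
from urllib.parse import urlparse


def _host(url: str) -> str:
    """Lowercased host without a leading 'www.'."""
    return urlparse(url).netloc.lower().removeprefix("www.")


def lint_same_as(urls: list[str]) -> list[str]:
    """Return human-readable warnings for a sameAs list."""
    warnings = []
    # Pattern 1: the array exists but contains nothing except Facebook.
    if urls and {_host(u) for u in urls} == {"facebook.com"}:
        warnings.append("only Facebook is listed")
    # Pattern 3: a personal LinkedIn profile where a company page belongs.
    for u in urls:
        if _host(u) == "linkedin.com" and urlparse(u).path.startswith("/in/"):
            warnings.append(f"personal LinkedIn profile instead of a company page: {u}")
    return warnings


# Hypothetical sameAs list taken from a site's Organization schema.
for warning in lint_same_as([
    "https://www.linkedin.com/in/mario-rossi",
    "https://www.facebook.com/sartoria.esempio",
]):
    print(warning)
```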

What to do in practice

An operational audit in three steps, starting before you involve your developer.

First step, the inventory. Make a list of every public profile of your brand: LinkedIn company, Google Business (business.google.com), Wikidata if an entry exists, Crunchbase, Facebook, Instagram, YouTube, Twitter/X if active, Italian and foreign industry directories where you’re listed. They must be live, canonical, up-to-date URLs.

Second step, adding to the schema. The developer adds the `sameAs` property inside the `Organization` block of the homepage’s JSON-LD schema, with an array containing all the URLs from the inventory. Then you go back through the Rich Results Test to verify the schema is valid.

Third step, if you have a Wikidata entry: open the entry, verify that the official website is filled in and that the external identifiers (LinkedIn, Crunchbase, etc.) are present. The link must be bidirectional to close the loop.
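The Wikidata side of the third step can be sketched as a check on the entry’s JSON. The example assumes the entity follows the shape of Wikidata’s entity JSON (claims, then property ID, then `mainsnak`, `datavalue`, `value`) and works on a trimmed, made-up entity so it runs without network access. The function names, the sample entity and the domain are hypothetical, and the check covers only P856, not the external-identifier properties.

```python
from urllib.parse import urlparse


def official_websites(entity: dict) -> list[str]:
    """Collect the P856 (official website) values from an entity dict."""
    urls = []
    for claim in entity.get("claims", {}).get("P856", []):
        value = claim.get("mainsnak", {}).get("datavalue", {}).get("value")
        if isinstance(value, str):
            urls.append(value)
    return urls


def links_back(entity: dict, site_url: str) -> bool:
    """True if any official website on the entry points at the site's host."""
    def host(url: str) -> str:
        return urlparse(url).netloc.lower().removeprefix("www.")

    return any(host(u) == host(site_url) for u in official_websites(entity))


# Trimmed, made-up entity: real entries carry many more fields.
sample_entity = {
    "claims": {
        "P856": [
            {"mainsnak": {"datavalue": {"value": "https://www.example.com/", "type": "string"}}}
        ]
    }
}

print(links_back(sample_entity, "https://example.com/"))
```

Comparing hosts instead of full URLs is a deliberate simplification, so that a trailing slash or a `www.` prefix does not produce a false alarm.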

Once that’s done, compare against the 3-5 competitors the AI cites when you run a generic query about your industry in Tuscany. In my experience, nine times out of ten the brands that appear have a richer `sameAs` than the brands that don’t. It’s not the only factor, but it’s a recurring signal.

Where this piece fits into your AI visibility strategy

`sameAs` on its own won’t get you into AI answers. It’s a precondition: it’s needed so that all the work you do on authority, on content, on the inverted-pyramid structure of your articles gets attributed to your single identity and not scattered across parallel entities.

In the upcoming articles in this series I’ll show you how to build a minimal but clean Wikidata entry (it’s the logical next step), how to use the Organization schema to describe your business so the AI understands it on the first pass, and how to connect authors and companies in the knowledge graph when you publish content under a personal byline.

The thread stays the same: making your digital identity so readable that when a user asks ChatGPT or Perplexity “who does X in Florence”, the engine has zero doubts about who you are and what you do.