Entities and Knowledge Graph

Rich Entity Attributes: why AI cites only “fat” entities in detail

Roberto Serra · 25 June 2026 · ~9 min read

Ask Perplexity for the best yacht refitting shipyards in Italy: the first two or three are described with founding year, certifications and specializations, while the others are brushed off with "among the others that also operate...", which amounts to near invisibility. It's not a matter of who does better work: the model has little to say about a company with few declared data points, and when it has little to say it doesn't recommend you. Filling in the right information in the right places is simpler than it seems, and it can move your name from the bottom of the list to the opening citation.

Ask Perplexity “best yacht refitting shipyards in Italy”. Look at the answer. The two or three companies cited at the top have one thing in common: a rich description. Not a single line, not a generic claim. Founding year, location, type of work (painting, carpentry, systems), size of the vessels handled, awards or recognitions, names of partner shipyards.

The others — the ones the answer brushes off with “among the other shipyards that also operate…” — have skeletal profiles. Three attributes, four at most. And the model, having nothing to say, mentions them in passing and moves on.

This is the difference between an attribute-rich entity and a poor one. In the earlier articles of the Entities and Knowledge Graph series I explained what an entity is, how it gets disambiguated and how it gets linked to Wikidata. Today I’ll show you the mechanism that decides whether the model cites you in detail or leaves you in the shadows: the number and quality of attributes the AI manages to retrieve about your brand.

What an entity’s attributes are for an AI model

An entity, for an LLM, is not a name. It’s a node with everything attached to it. If we’re talking about a shipyard in La Spezia, the attributes useful to the model are: founder, founding year, exact location, sector (refitting? new construction? both?), types of yachts handled (sail, motor, megayacht), maximum lengths the slipway can take, RINA or Lloyd’s certifications, industry awards, main products/services, partnerships with international designers or brokers.

The more these attributes are present — and repeated consistently across schema markup on the site, Wikidata, LinkedIn profiles, industry directories, press articles — the more the model “knows” about you. And the more it fills the answer with useful information when someone runs a query in your sector.

The principle documented by Knowledge Graph research

In the world of Knowledge Graph research, a recent strand deals with how to automatically enrich the entities of a graph starting from the literature and the web. The paper KARMA by Yuxing Lu et al. (2025) describes a multi-agent framework in which several specialized LLMs read documents, extract new attributes, verify them against each other and then integrate them into the existing graph. The logic stated by the authors is that manual curation is reliable but doesn’t scale, while an automated pipeline with cross-checks manages to balance accuracy, consistency and usability in enriching the graph.

The implication is that large knowledge graphs, including the ones that feed the LLMs answering people who search for “yacht refitting shipyard Liguria”, are less and less populated by hand. They are increasingly enriched by pipelines that read the web, extract attributes, compare them across multiple sources, and keep only the ones that pass the consistency checks.
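
As a rough illustration of that cross-check step, the sketch below keeps an attribute only when a minimum number of sources agree on its value. It is not KARMA's implementation: the function name `consolidate`, the source names, the sample values and the two-source threshold are all assumptions made for the example.

```python
from collections import Counter

def consolidate(claims_by_source, min_agreement=2):
    """Keep an attribute only if at least `min_agreement` sources give the same value.

    `claims_by_source` maps a source name to its {attribute: value} claims.
    The threshold of 2 is an illustrative assumption, not a documented rule.
    """
    votes = {}
    for claims in claims_by_source.values():
        for attribute, value in claims.items():
            # Normalize case and whitespace so "La Spezia" and "la spezia " match.
            normalized = " ".join(str(value).lower().split())
            votes.setdefault(attribute, Counter())[normalized] += 1

    accepted = {}
    for attribute, counter in votes.items():
        value, count = counter.most_common(1)[0]
        if count >= min_agreement:
            accepted[attribute] = value
    return accepted

# Hypothetical claims about one shipyard, as three sources might state them.
sources = {
    "website": {"location": "La Spezia", "founded": "1987", "sector": "yacht refitting"},
    "linkedin": {"location": "Lerici", "founded": "1987"},
    "directory": {"location": "Liguria", "founded": "1987", "sector": "Yacht Refitting"},
}

print(consolidate(sources))
```

With this sample data the founding year and the sector survive, while the location is dropped because the three sources disagree.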

The operational consequence for your business is direct: if your brand doesn’t publish attributes in a machine-readable form (schema markup, Wikidata, structured external profiles), these pipelines have nothing to collect about you. And the entity stays skeletal forever, because no one will curate it by hand anymore.

Translated into practice: the AI engine can’t “see you as rich” if the web doesn’t describe you as rich in a consistent way. And consistency across sources is not an accessory, it’s the main selection criterion.

Why rich attributes sit upstream of the AI citation

In the previous articles I showed you how the model figures out that you exist thanks to Named Entity Recognition and how it links you to a unique node through Wikidata. Recognition is just the entry door. What happens next — the fact that the AI cites you in a dense and convincing way — depends on how many attributes it has available to “tell your story”.

An entity with 3 attributes (name, city, generic sector) produces vague answers: “there’s also Shipyard X in La Spezia”. An entity with 20 attributes produces dense answers: “Shipyard X, founded in 1987 in La Spezia, specialized in refitting motor yachts up to 50 meters, with RINA certification”. Guess which of the two pushes the user to click.
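
A toy way to picture the difference: the function below assembles a description from whatever attributes an entity record contains. This is not how an LLM generates text; it is only a sketch of the idea that the output can be no richer than the input. The function `describe` and the field names are hypothetical.

```python
def describe(entity):
    """Build a one-line description from whatever attributes are available."""
    parts = [entity["name"]]
    if "founding_year" in entity and "city" in entity:
        parts.append(f"founded in {entity['founding_year']} in {entity['city']}")
    elif "city" in entity:
        parts.append(f"based in {entity['city']}")
    if "specialization" in entity:
        parts.append(f"specialized in {entity['specialization']}")
    if "certification" in entity:
        parts.append(f"with {entity['certification']} certification")
    return ", ".join(parts)

# Three attributes: name, city, generic sector.
skeletal = {"name": "Shipyard X", "city": "La Spezia", "sector": "nautical"}

# The same entity with a few more attributes filled in.
rich = {
    **skeletal,
    "founding_year": 1987,
    "specialization": "refitting motor yachts up to 50 meters",
    "certification": "RINA",
}

print(describe(skeletal))
print(describe(rich))
```

The skeletal record can only yield the name and the city, because the generic sector adds nothing worth saying. The richer record yields the dense sentence quoted above.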

The same mechanism of embeddings and vector space I described earlier in this series is at work here: the more context the model has about the entity, the closer your brand ends up to specific queries (“megayacht refitting Liguria”, “RINA-certified shipyard La Spezia”), not just generic queries.
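
Real embeddings need a trained model, so the sketch below swaps in something much cruder: the share of a query's terms that also appear in the entity's profile text. It is a stand-in for embedding similarity, not the real thing, but it shows the same direction of effect. The function names and both profile texts are invented for the example.

```python
import re

def tokens(text):
    """Lowercased set of words and numbers in the text."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def query_coverage(query, profile):
    """Fraction of the query's terms that also appear in the profile."""
    query_terms = tokens(query)
    if not query_terms:
        return 0.0
    return len(query_terms & tokens(profile)) / len(query_terms)

# Hypothetical profile texts for the same shipyard.
sparse_profile = "Shipyard X, La Spezia, nautical sector"
rich_profile = (
    "Shipyard X, founded in 1987 in La Spezia, Liguria, specialized in "
    "megayacht and motor yacht refitting, RINA-certified shipyard"
)

for query in ("megayacht refitting Liguria", "RINA-certified shipyard La Spezia"):
    print(query,
          query_coverage(query, sparse_profile),
          query_coverage(query, rich_profile))
```

On both specific queries the richer profile covers every query term, while the sparse one covers none of the first query and only part of the second.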

The test you can run in ten minutes

You need concrete proof of how rich your entity is in the AI’s eyes. Do this:

Open Google’s Rich Results Test, paste your homepage URL, look for the “Organization” entry in the output. Count the populated attributes: name, url, logo, foundingDate, founder, address, sameAs, award, knowsAbout, description. If you have fewer than 7, your entity is skeletal.
Go to Wikidata, search for the brand name. If you don’t have an entry, your entity doesn’t exist in the open graph. If you do have one, count the properties (P17 country, P571 inception date, P159 headquarters, P452 industry, P1448 official name, P856 website). Fewer than 10 properties is too few.
Open ChatGPT or Perplexity and ask “tell me about [brand name]”. If the answer is three lines and contains only obvious data (sector, city), you’re at the basic level. If it includes founder, year, specific specializations, partnerships, you’re already above average.
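
The first step of the test can also be scripted. The sketch below uses only Python's standard library to pull JSON-LD blocks out of a page and list which of the ten attributes named above are populated in the Organization record. The names `JsonLdExtractor` and `populated_organization_attributes` are made up for this example, the sample HTML is invented, and the sketch assumes a top-level object whose `@type` is exactly "Organization" (it does not handle `@graph` wrappers or subtypes).

```python
import json
from html.parser import HTMLParser

# The ten attributes named in the first step of the test.
KEY_ATTRIBUTES = ["name", "url", "logo", "foundingDate", "founder",
                  "address", "sameAs", "award", "knowsAbout", "description"]

class JsonLdExtractor(HTMLParser):
    """Collect the raw contents of <script type="application/ld+json"> tags."""

    def __init__(self):
        super().__init__()
        self.blocks = []
        self._inside = False

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") == "application/ld+json":
            self._inside = True
            self.blocks.append("")

    def handle_data(self, data):
        if self._inside:
            self.blocks[-1] += data

    def handle_endtag(self, tag):
        if tag == "script":
            self._inside = False

def populated_organization_attributes(html):
    """Return the key attributes with a non-empty value in the Organization record."""
    parser = JsonLdExtractor()
    parser.feed(html)
    for raw in parser.blocks:
        data = json.loads(raw)
        # A page may declare one object or a list of objects.
        for item in data if isinstance(data, list) else [data]:
            if isinstance(item, dict) and item.get("@type") == "Organization":
                return [key for key in KEY_ATTRIBUTES if item.get(key)]
    return []

# Invented page with the bare-minimum markup: name, url, logo.
sample_html = """
<html><head><script type="application/ld+json">
{"@context": "https://schema.org", "@type": "Organization",
 "name": "Shipyard X", "url": "https://example.com",
 "logo": "https://example.com/logo.png"}
</script></head></html>
"""

found = populated_organization_attributes(sample_html)
print(len(found), found)
```

The sample page yields three populated attributes, well under the seven mentioned above.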

A rule-of-thumb threshold: below 10 verifiable attributes across schema + Wikidata, you’re poor. Above 15, you start to be interesting food for the model. Between 10 and 15 you’re in the middle: visible, but not yet rich.
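
The thresholds are easy to encode once the two counts are in hand. In the sketch below the helper names are hypothetical, and three things are assumptions: the Wikidata record is taken to have the shape returned by Wikidata's entity JSON export (an `entities` object keyed by item ID, each with a `claims` object keyed by property ID), the total is a plain sum that ignores attributes present in both places, and the label for the 10 to 15 range is my own.

```python
def count_wikidata_properties(entity_json, qid):
    """Count the distinct properties (P17, P571, ...) declared on one item."""
    return len(entity_json["entities"][qid]["claims"])

def richness_level(schema_attributes, wikidata_properties):
    """Classify an entity by its total count of verifiable attributes."""
    total = schema_attributes + wikidata_properties
    if total < 10:
        return "poor"
    if total > 15:
        return "rich"
    return "in between"

# Invented record with a placeholder item ID and empty statement lists.
sample_entity = {
    "entities": {
        "Q00000000": {
            "claims": {"P17": [], "P571": [], "P159": [], "P856": []}
        }
    }
}

wikidata_count = count_wikidata_properties(sample_entity, "Q00000000")
print(wikidata_count, richness_level(3, wikidata_count))
```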

The reverse engineering I did on the Ligurian shipyards

I tried three different queries on Perplexity, all touching the Ligurian nautical sector, to understand what the AI engine actually cites when a user searches:

“best yacht refitting shipyards in Italy”
“La Spezia shipyards motor yacht refitting”
“megayacht refit companies Liguria”

Out of nine shipyards cited in total, between Genoa and La Spezia, I checked two things: the Organization record via schema markup on their homepage and the presence of a Wikidata entry.

The pattern I saw: the three shipyards cited at the top of the answers had on average 12-15 attributes filled in across schema and Wikidata, plus company LinkedIn profiles with a rich description and industry directories (SuperYacht Times, Boat International) with dedicated pages. The six shipyards cited at the tail, or only as part of a list, had fewer than 6 attributes, often with no Wikidata entry at all.

Two honest disclaimers: the sample is small and the queries change their answers over time. This isn’t a study, it’s an indicative test. The real analysis is done with professional tools that track AI citations across hundreds of queries in the sector and aggregate the patterns over months. But the direction is clear: those who have attributes get cited in detail. Those who don’t get named and forgotten.

The mistakes I see most often

When I analyze the entity of an Italian SME, I almost always come across the same four patterns.

Attributes only on the site, none on Wikidata and external profiles. The shipyard has a nice “about us” page with founder, history, awards. But none of this exists outside its own domain. The model sees a single source and discards it as self-referential.

Bare-bones Organization schema. Only name, url, logo. No foundingDate, no sameAs, no award. That’s 70% of the cases I see in the nautical and craft sectors: the site is nice, the schema is the bare minimum produced by the default WordPress plugin.

Inconsistency across sources. On the site the location is “La Spezia”, on LinkedIn it’s “Lerici”, on Wikidata (if it exists) it’s “Liguria”. The model receives three different signals and considers the entity unreliable.

Overly generic attributes. “Sector: nautical”. Useless. The model wants “refitting of motor yachts between 30 and 50 meters, specialized in electrical systems and stainless steel carpentry”. Specificity is an attribute in itself.

What to do concretely for your brand

For each key attribute of your brand (founder, founding year, exact location, main products/services, certifications, awards, partnerships) check that it’s present in three places:

Organization schema markup on the homepage (foundingDate, founder, address, award, knowsAbout, sameAs fields)
Wikidata entry with populated properties (if you don’t have one, create it or have it created by someone with editorial experience on Wikidata — it’s not trivial)
Consistent external profiles: company LinkedIn, Google Business Profile, authoritative industry directories (for nautical shipyards: SuperYacht Times, Boat International, RINA register)
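
For the first of these three places, here is a minimal sketch of what a fuller Organization record could look like, generated with Python's `json` module. The property names are real schema.org properties. Every value (company name, founder, award, URLs, the Wikidata ID) is a placeholder to be replaced with your own verified data.

```python
import json

# All values below are placeholders, not data about a real company.
organization = {
    "@context": "https://schema.org",
    "@type": "Organization",
    "name": "Shipyard X",
    "url": "https://www.example.com",
    "foundingDate": "1987",
    "founder": {"@type": "Person", "name": "Mario Rossi"},
    "address": {
        "@type": "PostalAddress",
        "addressLocality": "La Spezia",
        "addressRegion": "Liguria",
        "addressCountry": "IT",
    },
    "award": ["Example Refit Award 2023"],
    "knowsAbout": [
        "motor yacht refitting",
        "electrical systems",
        "stainless steel carpentry",
    ],
    # sameAs ties the site to the external profiles listed above.
    "sameAs": [
        "https://www.wikidata.org/wiki/Q00000000",
        "https://www.linkedin.com/company/example-shipyard",
    ],
}

# Wrap the JSON in the script tag that goes in the homepage <head>.
snippet = (
    '<script type="application/ld+json">\n'
    + json.dumps(organization, indent=2, ensure_ascii=False)
    + "\n</script>"
)
print(snippet)
```

Whatever you put here should match Wikidata and the external profiles word for word, since agreement between sources is the point of the exercise.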

Compare with the 3-5 shipyards the AI cites first in the queries of your sector. Count how many attributes they have and how many you have. The difference is exactly the distance that separates you from the citation at the top.
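
That comparison can be kept in a small script instead of a spreadsheet. The sketch below assumes you have already collected, by hand or with a script, the set of populated attributes for yourself and for each competitor. The function name `attribute_gap` and all the sample data are hypothetical.

```python
def attribute_gap(own, competitors):
    """Compare your attribute set with each competitor's.

    Returns, per competitor, how many more attributes they publish
    and which of their attributes you are missing.
    """
    report = {}
    for name, attributes in competitors.items():
        report[name] = {
            "count_difference": len(attributes) - len(own),
            "missing": sorted(attributes - own),
        }
    return report

# Invented attribute sets for illustration.
own = {"name", "url", "logo", "address"}
competitors = {
    "Competitor A": {"name", "url", "logo", "address",
                     "foundingDate", "founder", "award"},
    "Competitor B": {"name", "url", "address", "sameAs", "knowsAbout"},
}

for name, gap in attribute_gap(own, competitors).items():
    print(name, gap["count_difference"], gap["missing"])
```

The `missing` list doubles as a to-do list: these are the fields to populate first.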

Visibility in AI answers goes through here

I told you at the start that the difference between a “fat” entity and a “skeletal” one decides whether the AI cites you in detail or leaves you in the shadows. The principle documented by Knowledge Graph research confirms it: automatic enrichment systems work on what they find online, and if they don’t find consistent attributes about you, they can’t make them up — in fact, they discard contradictory signals.

Visibility in AI answers is not a switch. It’s a progressive accumulation of verifiable attributes that the model gathers about you. Every property you add to Wikidata, every field you populate in the schema, every consistent mention in industry directories, is one more piece the model will use when it talks about your brand.

In the next articles of the series I’ll go into the detail of the complete Organization schema, of how to build an entity-to-entity relationship map, and of how to run a periodic entity audit to keep the profile always rich. These are the operational steps that make concrete what I’ve described to you here.