Entities and Knowledge Graph

When the AI stops understanding that “we” means you: the coreference problem

Roberto Serra · 25 June 2026 · ~9 min read

If your copy uses "we", "our property", "the service" instead of the brand name, the AI loses the thread between your best claims and your name. The longer the page, the higher the risk: you are building authority that doesn't attach to you. The fix is surgical: it's done on the copy you already have, and it puts every merit back where it belongs.

I remember when Google, between 2013 and 2015, really started to understand pronouns. Before that, if you wrote “Villa San Marco is a boutique hotel in Venice. It offers rooms overlooking the lagoon”, to the search engine that “offers” was almost floating in a vacuum. The keyword was “Villa San Marco”, full stop. Everything in the following sentences, if you didn’t repeat the name, became decorative text.

Then came Hummingbird, then BERT, and Google learned to connect pronouns to their subjects. AI does the same thing today, but at a much more advanced level: when ChatGPT or Perplexity read the page of your boutique hotel, they have to understand that “the hotel”, “the property”, “our service”, “they”, “we” are all references to the same entity, namely your brand.

If the connection breaks, the best claims you make about yourself (rooms overlooking the Grand Canal, breakfast with products from the Rialto market, multilingual concierge) don’t get associated with your name. And when a user asks “what is a good boutique hotel overlooking the lagoon in Venice”, the system doesn’t cite you.

This technical problem has a precise name: coreference resolution. In this article I explain what it is, why it sits upstream of all the entity optimization work, and what you can change in your copy tonight so you don’t waste signal.

What a model really sees when it reads “we”

In the world of NLP research, coreference resolution is the task of identifying when two or more expressions in a text refer to the same real-world entity. “Villa San Marco”, “the hotel”, “our property”, “we”, “they” can all point to the same subject, or to different subjects: it depends on the context, on the order of the sentences, on the distance between the pronoun and the last proper noun.
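To make "the distance between the pronoun and the last proper noun" tangible, here is a minimal sketch that measures it in words. It is not a coreference model: it only matches strings, and the list of references and the function name `antecedent_distances` are my own assumptions for illustration.

```python
import re

BRAND = "Villa San Marco"  # example brand used in this article
# Assumed list of pronominal references and paraphrases; adapt it to your copy.
REFERENCES = ["we", "our", "they", "the hotel", "the property"]


def antecedent_distances(text, brand=BRAND, references=REFERENCES):
    """Return (reference, words since the last brand mention) pairs.

    The distance is None when no brand mention precedes the reference.
    """
    pattern = re.compile(
        r"(?P<brand>" + re.escape(brand) + r")"
        r"|(?P<ref>\b(?:" + "|".join(re.escape(r) for r in references) + r")\b)",
        re.IGNORECASE,
    )
    last_brand_end = None
    distances = []
    for match in pattern.finditer(text):
        if match.group("brand"):
            # Remember where the most recent brand mention ended.
            last_brand_end = match.end()
        elif last_brand_end is None:
            distances.append((match.group("ref").lower(), None))
        else:
            # Count the words between the brand mention and this reference.
            gap = text[last_brand_end:match.start()]
            distances.append((match.group("ref").lower(), len(gap.split())))
    return distances
```

Run on a full page, the largest numbers point to the spots where the thread is most stretched. String matching will miscount in places (a "they" that refers to your guests, for example), so treat the output as a rough map rather than a measurement.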

Liu et al. (2025), in the paper Enhancing Coreference Resolution with Pretrained LMs, document that large language models like GPT-3 and PaLM have made enormous progress on few-shot learning tasks, but that handling extended context and resolving references in long texts remain critical pain points. The paper explicitly cites frameworks such as LQCA (Long Question Coreference Adaptation) created to handle references in extended documents, because the problem is not solved.

Translated into practice for you, running a boutique hotel in Venice: when the “About us” page of your site is 1,200 words long and contains the property’s name only in the first paragraph, the AI model reading it has to keep that name anchored to all the “we” and “the property” that follow. The longer the text, the more that thread is stretched and risks snapping.

The operational consequence is simple: the quality of your claims matters less if the connection to your brand name is lost along the way.

Why coreference sits upstream of all the rest of the entity work

In the previous articles of this series I talked to you about Named Entity Recognition, Entity Disambiguation and Entity Salience. All of these mechanisms start from an implicit assumption: that the AI model manages to anchor every sentence of your text to the right entity.

Coreference resolution is exactly that anchoring. It’s the invisible work the model does before deciding whether you are a relevant entity, before disambiguating “Villa San Marco” from twenty other villas with the same name, before assigning you a high or low salience score.

If you write twenty sentences all with “our property offers…” and “we provide…”, the model has two options: it anchors everything to the brand name mentioned only once at the top of the page (long, fragile thread), or it treats those claims as floating, attributed to a generic subject “boutique hotel in Venice”.

The second option is a disaster for visibility in AI answers. Because when a user asks Perplexity “what is a boutique hotel in Venice with a lagoon view from the terrace”, the system is looking for claims associated with specific entities, not with generic subjects. If your “we have a terrace overlooking the northern lagoon” is not anchored to your brand name, it doesn’t count as yours.

This problem also connects directly to the work on the vector representation of content: when the subject of a sentence is an unresolved “we”, its embedding carries the claim but not your brand name, so the sentence weighs less in the grounding of the answer.

Common mistake

Brand name only in the H1 and in the footer: the entire body flows in the first person plural.

The mistake I see most often on hospitality sites

In the world of Italian tourism, especially in the charming boutique hotels of art cities like Venice, there’s a pattern that repeats almost identically: the “About us” page and the “our philosophy” page are written in the first person plural, warm, evocative, and the brand name appears two or three times in the entire text, almost always as a final signature.

The result, from the point of view of an AI model that has to extract entities and claims:

“We are a Venetian family that has been welcoming travelers for three generations”: the subject “we” is floating, not anchored.
“Our breakfast is made with products from the Rialto market”: “our” has to trace back to the brand name, which is two paragraphs above.
“We offer private gondola tours at sunset”: same problem, and here you lose a very specific claim, potentially citable by the AI when a user asks “hotel in Venice with experiences on the lagoon”.

To a language model, this page is a ribbon of beautiful but poorly attributed claims. The thread between the brand name and the property’s distinctive features is too long, and it snaps.

Pro tip

For service pages, open with a sentence like “[Brand name] + [specific service]”: “Villa San Marco offers a restaurant overlooking the lagoon”, not “Our restaurant overlooks the lagoon”.

What changes if you repeat the brand name

The operational takeaway, derived directly from the principle documented in the paper by Liu et al. (2025), is this: on key pages, repeat the brand name explicitly every 2-3 paragraphs instead of relying on pronouns and paraphrases.

It’s not a matter of old-style SEO keyword stuffing. It’s a matter of reducing the distance between pronoun and antecedent, so that even a model with a limited context window, or a model that processes the page in chunks, keeps seeing the anchor.
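The chunking point can be sketched in a few lines. This is a simulation under my own assumptions: real systems split pages in different ways, the fixed size of 120 words is arbitrary, and `chunks_without_brand` is a hypothetical helper, not part of any tool.

```python
def chunks_without_brand(text, brand, chunk_words=120):
    """Split text into fixed-size word chunks and return the indexes
    of the chunks that never mention the brand.

    Limitation: a brand name cut in half by a chunk boundary is
    reported as missing from both chunks.
    """
    words = text.split()
    missing = []
    for index, start in enumerate(range(0, len(words), chunk_words)):
        chunk = " ".join(words[start:start + chunk_words])
        if brand.lower() not in chunk.lower():
            missing.append(index)
    return missing
```

Every index it returns is a piece of the page that, read in isolation, has no anchor to your name. Repeating the brand every 2-3 paragraphs is a way of keeping that list short.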

Concretely, on the “About us” page of the boutique hotel in Venice:

First sentence of every new section: the full brand name.
Within the section: you can use “the property”, “the hotel”, “we”, without fear.
New section (new H2): you start over from the full name.
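The rule above can be checked mechanically. The sketch below assumes you have already copied the page into a mapping of H2 heading to section text; the sentence split is deliberately naive and `sections_missing_anchor` is a name I made up for the example.

```python
import re


def sections_missing_anchor(sections, brand):
    """Return the headings whose first sentence does not name the brand.

    sections: mapping of H2 heading -> section body text.
    """
    missing = []
    for heading, body in sections.items():
        # Naive split: the first sentence ends at the first ., ! or ?
        # that is followed by whitespace.
        first_sentence = re.split(r"(?<=[.!?])\s+", body.strip(), maxsplit=1)[0]
        if brand.lower() not in first_sentence.lower():
            missing.append(heading)
    return missing
```

The headings it returns are the sections where you should start over from the full name.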

This holds even more strongly for pages that describe specific services (the spa, the restaurant, the suite rooms overlooking the Grand Canal, the private tours). Every service must be explicitly anchored to the brand name, otherwise the AI can extract the claim (“spa with treatments based on lagoon algae”) but not connect it to you.

How to check coreference on your pages

An entry-level check you can do in ten minutes, without paid tools:

Open your “About us” page or the page of a key service.
Count the times the full brand name appears and the times a pronoun or paraphrase appears (“we”, “the property”, “the hotel”, “our service”).
If the ratio is below 1 to 5 (less than one brand name for every five pronominal references), you’re in the danger zone.
Open displaCy ENT, paste one paragraph at a time, and look at how many entities it recognizes. If the brand name doesn’t appear as an entity in a whole paragraph, that paragraph is “floating” for the AI.
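If you prefer not to count by hand, the counting step can be sketched like this. The list of pronominal references is an assumption of mine (extend it with the paraphrases you actually use), and the function names are hypothetical; only the 1-to-5 threshold comes from the check above.

```python
import re

# Assumed list of pronominal references and paraphrases; adapt it to your copy.
PRONOMINAL = ["we", "our", "us", "they", "the property", "the hotel", "the service"]


def count_references(text, brand, references=PRONOMINAL):
    """Return (brand mentions, pronominal references) found in text."""
    brand_count = len(re.findall(re.escape(brand), text, flags=re.IGNORECASE))
    ref_pattern = r"\b(?:" + "|".join(re.escape(r) for r in references) + r")\b"
    ref_count = len(re.findall(ref_pattern, text, flags=re.IGNORECASE))
    return brand_count, ref_count


def in_danger_zone(brand_count, ref_count, refs_per_brand=5):
    """True when there is less than one brand mention
    for every refs_per_brand pronominal references."""
    if ref_count == 0:
        return False
    return brand_count * refs_per_brand < ref_count
```

Paste the text of one page into `count_references`, then pass the two numbers to `in_danger_zone`. Like the manual count, it is indicative: it matches words, it does not resolve who "we" or "they" actually is.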

It’s an indicative test, not a study. The sample is your page, the tool is public and designed for NER, not for true coreference resolution. But the pattern it gives back is consistent with what AI models see when they process your text. The real analysis requires professional tools and dedicated coreference models.

The mistakes I see most often

Brand name only in the H1 and in the footer: the entire body flows in the first person plural. The AI sees a long sequence of “we” and struggles to anchor it.
Storytelling paragraphs 8-10 lines long without ever repeating the name: typical of the emotional pages of charming hotels. The thread is stretched too far.
Service sections described with “our”: “our restaurant”, “our spa”, “our tours”. The anchor rests entirely on the possessive, which is the most fragile type of reference for a model.
Testimonials and reviews detached from the brand: “guests love our rooftop” without ever naming the property in full. The positive citation doesn’t anchor to the right entity.

What to do on key pages

Operational actions, in order:

Rewrite the H2s so that at least 50% contain the brand name.
Insert the full brand name as the first subject of every new section.
For service pages, open with a sentence like “[Brand name] + [specific service]”: “Villa San Marco offers a restaurant overlooking the lagoon”, not “Our restaurant overlooks the lagoon”.
Compare your page with those of the 3-5 Venetian boutique hotels that the AI cites when you run queries like “boutique hotel in Venice with a terrace on the Grand Canal”: look at their brand name density in the body.
Review the testimonials: every quoted review must have the brand name next to it, not just “we had a wonderful time”.
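Two of these actions, the H2 share and the testimonials review, are easy to verify once the headings and the quoted reviews are in a list. A minimal sketch, with hypothetical function names and a plain substring match as the only test:

```python
def h2_brand_share(headings, brand):
    """Return the fraction of H2 headings that contain the brand name."""
    if not headings:
        return 0.0
    named = sum(1 for heading in headings if brand.lower() in heading.lower())
    return named / len(headings)


def unanchored_testimonials(testimonials, brand):
    """Return the quoted reviews that never name the brand."""
    return [quote for quote in testimonials if brand.lower() not in quote.lower()]
```

A share of 0.5 or more meets the 50% target from the first action. The testimonials that come back from the second function are the ones that need the brand name next to them.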

This work is not a magic factor. It won’t take you from invisible to the first name the AI cites. But it closes a silent gap that can explain months of good content that doesn’t generate mentions in AI answers.

The thread that ties it all together

Coreference resolution is the bridge between your content and your entity in the knowledge graph of AI models. If the bridge holds, every claim you make works for your visibility in AI answers. If the bridge breaks, you produce excellent content that, however, doesn’t attach to your brand.

In the next articles of this series I tackle the work on Entity Salience (how much the model considers you “the subject” of your text), on complete Schema Organization (how to declare your brand in a machine-readable way), and on sameAs as the glue of identity (how to tie your site to the external profiles that confirm who you are).

The connection with author entity recognition that we saw in the authority pillar is direct: if the model doesn’t anchor a pronoun to the right subject in the body, imagine how it fares attributing the authorship of an article to a specific author.