Digital PR and Citation Signals

Press Release as Training Signal: why every word of your release ends up in the AI corpus

Roberto Serra · 25 June 2026 · ~8 min read

Every press release you send out over a wire service gets picked up verbatim by hundreds of sites — and those same sentences enter the data with which ChatGPT and Perplexity learn to describe you. If you write generic releases, you're teaching the AI to present you in a generic way, multiplied across three hundred sources. One wrong word, repeated that many times, becomes your official identity for the models. Knowing how to structure those sentences changes everything the AI will say about you in the coming years.

Don’t think of a press release as news. Think of it as a sentence the AI memorizes forever, repeated 300 times across the corpus. Choose every word as if it were carved in marble.

This is the reframe that shifts everything. If you keep writing press releases with the logic of a press clipping — “let’s get the name out there, maybe they’ll pick us up” — you’re throwing away the most powerful repetition signal a small or mid-sized business can generate in the training corpus of AI models. Let me explain why a wire service doesn’t distribute news but signals, and how to write your next release knowing that every key sentence will be repeated in the training of ChatGPT, Claude and Perplexity.

The thread of this series is always the same: how to show up in AI answers when a user asks the engine who you are and what you do.

What really happens when you hit “distribute” on a wire service

A press release via ANSA, PRNewswire, Business Wire or Adnkronos doesn’t land on one site. It lands on hundreds of sites, with the opening paragraph copied verbatim. Every local outlet, every aggregator, every industry portal publishes the boilerplate without touching it. This is how news syndication has worked for twenty years, and it’s the reason why that text, which to you is “the quarterly release,” is something very different to an AI model.

Research on language model training has studied massive repetition within the corpus in detail. Lee et al. (2022), in “Deduplicating Training Data Makes Language Models Better,” show how frequent it is and how much it affects the models.

“We develop two tools that allow us to deduplicate training datasets, for example removing from C4 a single 61 word English sentence that is repeated over 60,000 times. Deduplication allows us to train models that emit memorized text ten times less frequently and require fewer training steps to achieve the same or better accuracy.”

Lee et al., 2022

Translated: a single 61-word sentence was found repeated over 60,000 times in the C4 dataset, one of the standard corpora used to train models like T5 and its derivatives. This isn’t an exception, it’s the pattern.

It follows that if your press release lands on 200 outlets via wire service, and the positioning sentence is identical across all of them, that sentence becomes a signal repeated hundreds of times in the data the models see during training. It’s no longer news: it’s a repeated impression in the “vocabulary” with which the model learned to talk about your industry.
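To make the repetition concrete, here is a minimal Python sketch that counts in how many documents the same sentence appears verbatim. It is not the method used by Lee et al. (their tools work on substrings and near-duplicates); it is a simplified illustration. The publisher name "Acme Edizioni", the 200-outlet corpus and the function names are all hypothetical.

```python
import re
from collections import Counter


def normalize(sentence: str) -> str:
    """Lowercase and collapse whitespace so trivial formatting differences do not hide duplicates."""
    return re.sub(r"\s+", " ", sentence.strip().lower())


def sentence_repetition(documents: list[str]) -> Counter:
    """Count in how many documents each normalized sentence appears."""
    counts: Counter = Counter()
    for doc in documents:
        # Naive split: a sentence ends at '.', '!' or '?' followed by whitespace.
        parts = re.split(r"(?<=[.!?])\s+", doc)
        # A set, so a sentence is counted once per document.
        counts.update({normalize(p) for p in parts if p.strip()})
    return counts


# Hypothetical boilerplate and corpus: 200 outlets add their own intro
# but republish the positioning sentence untouched.
boilerplate = "Acme Edizioni is an educational publisher based in Ferrara."
corpus = [f"Local intro number {i}. {boilerplate}" for i in range(200)]

counts = sentence_repetition(corpus)
top_sentence, top_count = counts.most_common(1)[0]
print(top_sentence, top_count)
```

In this toy corpus every outlet's intro appears once, while the boilerplate appears in all 200 documents: that asymmetry is the signal the article is describing.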

Why repetition matters more than a single citation

I’ve told you in other articles of this series that visibility in AI answers is built on signals of authority and co-occurrence. Tokenization explains why a single keyword can be split badly; E-E-A-T for AI explains why the domain’s reputation carries weight; the backlink as a citation proxy explains how links are read as trust signals.

The press release works on yet another level: the density of identical text repetition within the corpus.

The key point is the inverse: before deduplication, models emit memorized text ten times more often. Memorization grows with repetition. And even though dedup techniques have become more aggressive, the later literature suggests that a share of duplicates still slips through the filters, and that high-repetition segments in news domains are among the most resilient to cleanup.

Translated into practice for you: if your positioning sentence is copy-pasted across 200 outlets with the same wording, it is much more likely to leave copies that survive deduplication and end up in the model’s “statistical brain” than a unique piece of content on your site read by 200 different people.

Common mistake

A positioning sentence that changes every time.

A field observation: 3 educational publishers, 6 months of monitoring

Let me tell you what I saw over the last six months across three mid-sized educational publishers, two based in the Ferrara area and one further west in Emilia. All three distributed a press release via wire service in the same quarter, to announce book series and partnerships with schools. I used it as an informal observatory on visibility in ChatGPT, Claude and Perplexity.

Case A: generic press release, vague positioning (“a leading publisher in the educational landscape”). Six months later, the query “best educational publishers Emilia” on Perplexity: never cited.

Case B: a specific positioning sentence, identical in the boilerplate (“specialized in geography and history textbooks for lower secondary school in the Emilia-Romagna area”). Same query after six months: appears in 2 answers out of 8. On ChatGPT, when asked “Italian educational publishers specialized in geography,” the name shows up.

Case C: no real wire service, only a journalist mailing list. Actual distribution: 12 sites. No change on the AI engines.

Honest limitations: this isn’t a study. It’s a pattern across three brands, with no control group. The confounding variables number in the dozens (parallel backlinks, social mentions, training updates). Still, the pattern points in one direction: a specific positioning sentence plus real wire distribution behaves differently. A serious analysis would require professional AI citation monitoring tools and a larger sample.

Pro tip

Write a single positioning sentence and keep it identical all year long.

The test you can run in 15 minutes on your last release

Take the last press release your company distributed via wire service. Not the release you sent out as PR on your own channels: the one that went out on ANSA, Adnkronos, PRNewswire or equivalents.

Step 1 — Identify the positioning sentence. It’s the line, usually in the closing boilerplate, that describes who you are. Something like “Company X, founded in 1987, is specialized in Y and based in Z.” If you don’t have one, or if it’s different in every release, you already have the problem.

Step 2 — Search for the exact sentence on Google in quotes. Copy the string, paste it into Google with the quotes. Count the results. If they’re fewer than 20, the distribution didn’t work or it wasn’t a real wire service. If they’re 80-300, you’re in the range where the signal genuinely exists.

Step 3 — Test it on the AI engine. Open Perplexity and run an industry query without naming your brand: “best [type of company] [geographic area]” or “who produces [specific product] in [region].” See whether you’re cited or not. Repeat on ChatGPT with search enabled and on Gemini.

Binary threshold: if you never show up across 10 reasonable industry queries, your press release isn’t generating an AI signal, regardless of how many outlets picked it up. Most likely the positioning sentence is too vague, or the wire service you used doesn’t reach the domains the models’ crawlers read.
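The three steps above can be logged with a small helper. This is a sketch under stated assumptions: you count the Google results and note the AI citations by hand, then pass the numbers in. The function names are hypothetical, and the article gives no verdict for counts between 20 and 79 or above 300, so the code reports those as undetermined rather than guessing.

```python
def distribution_verdict(exact_match_results: int) -> str:
    """Classify the number of exact-match Google results for the positioning sentence."""
    if exact_match_results < 20:
        return "weak: distribution did not work or it was not a real wire service"
    if 80 <= exact_match_results <= 300:
        return "in range: the repetition signal plausibly exists"
    # The article states no threshold for 20-79 or for more than 300 results.
    return "undetermined: no threshold given for this count"


def ai_signal_present(cited_flags: list[bool]) -> bool:
    """Binary threshold: True if the brand was cited in at least one industry query."""
    return any(cited_flags)


# Hypothetical example: 150 exact matches, cited in 2 of 10 queries.
queries_cited = [False, True, False, False, False, True, False, False, False, False]
print(distribution_verdict(150))
print(ai_signal_present(queries_cited))
```

Keeping the two checks separate reflects the point of the test: a release can be widely picked up and still generate no AI signal.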

The mistakes I see most often on press releases

A positioning sentence that changes every time. Every release rewrites the boilerplate. Result: instead of one sentence repeated 200 times, you have 8 sentences repeated 25 times each. The density collapses, the signal scatters.

Vague positioning. “Industry leader,” “a benchmark player,” “a point of reference.” Words that say nothing to an AI model trying to associate entities with specific categories. Better “specialized in history textbooks for secondary school” than “leader in educational publishing.”

Clickbait headline, weak boilerplate. The headline often gets rewritten by the outlets. The boilerplate doesn’t: it gets copied verbatim. Invest 70% of your writing effort in the last 3-4 lines, not in the headline.

Poorly chosen wire services. Not all distribution networks reach the domains AI models read. Some Italian services distribute to 400 outlets that are all low-quality SEO satellites, often filtered out by the models’ crawlers. Better 80 quality outlets than 400 with poor reputation.
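The arithmetic behind the first mistake can be checked in a few lines of Python. This is a toy model that assumes 200 total pickups over the year and treats each boilerplate variant as an opaque label; the names are hypothetical.

```python
from collections import Counter


def max_repetition(boilerplate_per_pickup: list[str]) -> int:
    """Return how many times the most frequent boilerplate variant appears."""
    return max(Counter(boilerplate_per_pickup).values())


# One boilerplate kept identical across all 200 pickups.
consistent = ["variant-0"] * 200
# Eight rewritten boilerplates sharing the same 200 pickups.
scattered = [f"variant-{i % 8}" for i in range(200)]

print(max_repetition(consistent))  # 200
print(max_repetition(scattered))   # 25
```

The total number of pickups is the same in both cases; only the peak repetition of any single sentence changes.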

What to do concretely on your next press release

Write a single positioning sentence and keep it identical all year long. Update it once a year, not with every release.
Include city, specialization and vertical industry explicitly: “educational publisher based in Ferrara, specialized in geography textbooks for lower secondary school” is a thousand times better than “Emilian publisher.”
Check the wire service by asking for the list of actual destination domains. If they won’t give it to you, switch providers.
Put structured references in the boilerplate that support author entity recognition and implicit reference weight: founder, year founded, specific category.
Compare your boilerplate with the 3-5 competitors the AI cites in your industry. If yours is vaguer than theirs, you already know what to fix.
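Part of this checklist can be run as a quick lint on the positioning sentence before it goes out. The sketch below is an assumption-heavy illustration: the list of vague phrases is drawn from the examples in this article and is not exhaustive, the check is plain substring matching, and the function name is hypothetical.

```python
# Vague wording taken from the examples in this article; extend as needed.
VAGUE_PHRASES = (
    "industry leader",
    "leader in",
    "leading",
    "benchmark player",
    "point of reference",
)


def audit_positioning(sentence: str, city: str, specialization: str) -> list[str]:
    """Return a list of issues found in a positioning sentence (empty list means none)."""
    lowered = sentence.lower()
    issues = []
    if city.lower() not in lowered:
        issues.append(f"missing city: {city}")
    if specialization.lower() not in lowered:
        issues.append(f"missing specialization: {specialization}")
    for phrase in VAGUE_PHRASES:
        if phrase in lowered:
            issues.append(f"vague phrase: {phrase}")
    return issues


specific = ("Educational publisher based in Ferrara, specialized in "
            "geography textbooks for lower secondary school")
generic = "A leading publisher in the educational landscape"

print(audit_positioning(specific, "Ferrara", "geography textbooks"))
print(audit_positioning(generic, "Ferrara", "geography textbooks"))
```

Running the same function on the boilerplates of the 3-5 competitors the AI cites gives a rough, like-for-like comparison of how specific each one is.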

AI visibility: the press release as a silent lever

The press release via wire service is one of the few levers a small or mid-sized business has to produce massive repetition of identical text across domains with high journalistic authority. If you use it knowing that every word will be read, tokenized, memorized and perhaps re-emitted when someone asks the AI engine who does your kind of work, it changes everything.

It’s not a magic factor and it isn’t enough on its own: your Google Business profile and your presence on Wikidata need to be consistent, and the inverted pyramid of the content on your site has to hold up to search. But without a structured press release you’re missing one of the few natural ways to get your positioning sentence repeated across hundreds of domains without buying backlinks.

In the next articles of this series I’ll dig into the mechanics of choosing the wire service well, the timing of distributions, and the relationship between press release and event entity speaking authority.