Content Structure for AI

Your videos have no chapters? The AI can’t cite the right part

Do you have long videos without chapters? To the AI they are indistinct blocks it can't use: it can't isolate the relevant part and in most cases it ignores them. Every valuable answer you gave in those videos produces zero citations. Adding chapters with descriptive titles takes five minutes per video and turns each section into an independent piece of content the AI can find and use — without redoing anything.

You have a 45-minute video where you explain everything a potential client should know about your service. You published it on YouTube, embedded it on your site, shared it on social media. And when someone asks an AI engine “how does service X work in industry Y”, that video doesn’t show up. Not even a fragment.

The reason isn’t that the content lacks value. To the retrieval system, that video is a monolithic block: a title, a description, maybe an automatic transcript. It isn’t citable, not because the AI ignores it entirely, but because it offers no handholds for extracting the relevant part. If the answer to the user’s query is at minute 23, but the system has no way of knowing that minute 23 covers that specific topic, the entire piece of content gets treated as background noise.

The solution exists and is within everyone’s reach: video chapters with timestamps. Not the decorative ones you occasionally see in YouTube descriptions. Chapters with descriptive titles, paired with a segmented transcript, that turn a long video into a collection of standalone, citable chunks.

Why a video without chapters is content that can’t be broken up

The principle is the same one that applies to any long piece of text, and that research on RAG (retrieval-augmented generation) systems frames as a question of granularity:

“Choosing the appropriate retrieval granularity during inference can be a simple and effective strategy to improve the retrieval and downstream task performance of dense retrievers.”

Gao et al., 2024

The granularity of the retrieved content can materially change retrieval performance. In plain terms: if the system can retrieve a 300-token block focused on a specific topic instead of a 5,000-token block that covers everything, the answer is more likely to be precise. And with precision comes a higher probability that your content gets selected and cited.

A video without chapters, from a retrieval standpoint, is like a web page without headings: a wall of text where the system doesn’t know where one topic begins and another ends. YouTube’s automatic transcript produces exactly this — a continuous stream of words with no semantic breakpoints. Even if the crawler indexes it, that monolithic text competes poorly against rival content that is already segmented into precise blocks.

How timestamps become metadata for retrieval

When you add chapters with timestamps to a YouTube video description, you’re doing more than improving navigation for the user. You’re creating structured metadata that indexing systems can read and associate with specific portions of the content.

The survey by Gao et al. states it explicitly:

“Chunks can be enriched with metadata information such as page number, file name, author, category timestamp.”

Gao et al., 2024

Chunks get enriched with metadata — and the timestamp is one of them. It’s not a marginal technical detail. A video chapter with a timestamp and a descriptive title is a chunk with three fundamental properties: a defined beginning and end (the timestamps), a semantic label (the chapter title) and specific content (the corresponding portion of transcript). These are the same properties that make a section with a well-written heading a high-value chunk on a web page.
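To make those three properties concrete, here is a minimal sketch of how a chapter could be represented as a chunk with metadata. The class name, the field names and the example URL are all assumptions made for illustration; real indexing pipelines define their own schemas.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ChapterChunk:
    """Hypothetical record for one video chapter treated as a retrievable chunk."""
    start_s: int  # defined beginning: seconds from the start of the video
    end_s: int    # defined end, in seconds
    title: str    # semantic label: the chapter title
    text: str     # specific content: the matching portion of the transcript

    def as_record(self, video_url: str) -> dict:
        """Flatten the chunk into text plus metadata, as an indexer might store it."""
        return {
            "text": self.text,
            "metadata": {
                "title": self.title,
                "start_s": self.start_s,
                "end_s": self.end_s,
                # Placeholder deep link; the "?t=" form is an assumption.
                "source": f"{video_url}?t={self.start_s}",
            },
        }


# Example: the chapter that starts at minute 23 (23 * 60 = 1380 seconds).
chunk = ChapterChunk(
    start_s=1380,
    end_s=1560,
    title="How the predictive analytics method works for the retail sector",
    text="Transcript text for this chapter goes here.",
)
record = chunk.as_record("https://example.com/videos/service-overview")
```

The point of the sketch is that the title and the time range travel with the text, so a system that retrieves the text also knows what it is about and where it sits in the video.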

The difference is that most websites already have headings on their pages, however much they could be improved. Most videos, on the other hand, have no chapters. This means the gap between those who use them and those who don’t is enormous — and the competitive advantage for whoever acts first is proportional to that gap.

The chapter title is your video heading

Here lies the step many people skip. Adding chapters isn’t enough if the titles are generic. “Introduction”, “Part 2”, “Conclusions” are the video equivalent of those “Learn more” and “Find out more” headings I told you about in the article on title hierarchy in web pages — the AI reads them and finds no information about what that section contains.

The chapter title has to work like a query that the segment answers. Not “Our method” but “How the predictive analytics method works for the retail sector”. Not “Case study” but “How client X cut operating costs by 30% in 6 months”. The principle is identical to that of self-contained sections: the title tells the system what is being discussed, and the system uses that information to decide whether that block is relevant to the user’s query.
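If you want to audit the titles you already have, a rough heuristic like the following can flag the generic ones. It is only a sketch: the list of generic labels and the four-word minimum are illustrative assumptions, not thresholds taken from any retrieval system.

```python
import re

# Assumed examples of labels that say nothing about the segment's content.
GENERIC_PATTERNS = [
    r"intro(duction)?",
    r"part \d+",
    r"conclusions?",
    r"our method",
    r"case study",
]


def is_generic_title(title: str, min_words: int = 4) -> bool:
    """Return True if the title looks like a generic label rather than a query."""
    normalized = title.strip().lower()
    if any(re.fullmatch(pattern, normalized) for pattern in GENERIC_PATTERNS):
        return True
    # Very short titles rarely name both a topic and a context.
    return len(normalized.split()) < min_words


titles = [
    "Introduction",
    "Part 2",
    "How client X cut operating costs by 30% in 6 months",
]
flagged = [t for t in titles if is_generic_title(t)]
```

Anything the check flags still needs a human rewrite; the heuristic only tells you where to look.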

I analyzed 25 YouTube channels of Italian B2B companies a few weeks ago, before writing this article. Of those that published videos over 15 minutes long, only 3 used chapters. And of those 3, only one had truly descriptive titles. The other two had variations of “Part 1, Part 2, Part 3”. The field is practically empty.

Segmented transcript: the piece that closes the loop

Chapters alone create the structure. But structure without text content doesn’t generate citable chunks. I covered this in the article on video and podcast transcripts — text is the only currency the retrieval system knows how to spend. The video stays invisible until it gets converted into text.

The difference between a monolithic transcript and a transcript segmented by chapters is the same as the difference between a page without headings and a page with descriptive headings. The monolithic transcript is a single block. The segmented transcript is a series of mini-articles, each tied to a chapter, each with its own topic, each citable independently.

In practice this means taking one step beyond simple automatic transcription. Take the text generated by YouTube or by your transcription service, cut it at the points corresponding to the chapters, clean up each segment by removing filler words and repetitions, and publish it all on the page with headings that mirror the chapter titles. At that point you’ve turned a 45-minute video into 8-10 standalone sections, each with its own descriptive heading, each with 300-500 words of focused content.
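The cutting step can be sketched in a few lines. The example assumes the transcript is available as timed cues, pairs of (start in seconds, text), and that the chapters are pairs of (start in seconds, title); both shapes and the short filler list are assumptions, and the automatic cleanup is no substitute for an editorial pass.

```python
import re
from bisect import bisect_right

# Assumed minimal filler list; extend it for your own speakers.
FILLERS = re.compile(r"\b(?:um|uh|erm)\b[,.]?\s*", re.IGNORECASE)


def clean(text: str) -> str:
    """Remove a few common filler words and collapse extra whitespace."""
    return re.sub(r"\s+", " ", FILLERS.sub("", text)).strip()


def segment_transcript(cues, chapters):
    """Group timed cues under the chapter they fall in.

    cues: list of (start_s, text), sorted by time.
    chapters: list of (start_s, title), sorted by time.
    """
    starts = [start for start, _ in chapters]
    buckets = [[] for _ in chapters]
    for cue_start, text in cues:
        # Index of the last chapter that starts at or before this cue.
        idx = max(bisect_right(starts, cue_start) - 1, 0)
        buckets[idx].append(clean(text))
    return [
        {"heading": title, "start_s": start, "text": " ".join(lines)}
        for (start, title), lines in zip(chapters, buckets)
    ]


cues = [
    (0, "Um, welcome."),
    (5, "Here is what the service covers."),
    (1380, "The method starts from uh sales data."),
    (1400, "Then it forecasts demand."),
]
chapters = [
    (0, "What the service covers and who it is for"),
    (1380, "How the predictive analytics method works for the retail sector"),
]
sections = segment_transcript(cues, chapters)
```

Each item in `sections` maps directly to one heading and one block of text on the page.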

Why the timestamp adds an extra signal

There’s an aspect of timestamps that goes beyond simple segmentation. Timestamps carry temporal information that the more advanced retrieval systems can exploit:

“Assigning different weights to document timestamps during retrieval can achieve time-aware RAG, ensuring the freshness of knowledge and avoiding outdated information.”

Gao et al., 2024

In the survey, the timestamp is the document’s date: it lets the system weight content by how recent it is, favoring fresh knowledge over outdated information. Applied to video, the relevant signal is the publication or update date attached to each chapter’s chunk, which tells the system when that content was created. In a sector where information changes rapidly, this signal can make the difference between being cited and being discarded in favor of more recent content.
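The survey does not prescribe a formula for this weighting. One common way to picture it, offered here purely as an assumption, is an exponential decay that halves a chunk’s relevance score after a fixed number of days; the function name and the one-year half-life are hypothetical.

```python
from datetime import date


def time_aware_score(similarity: float, published: date, today: date,
                     half_life_days: float = 365.0) -> float:
    """Scale a relevance score by recency (assumed exponential decay)."""
    age_days = max((today - published).days, 0)
    recency = 0.5 ** (age_days / half_life_days)
    return similarity * recency


today = date(2024, 1, 1)
# An older chunk that matches the query slightly better...
older = time_aware_score(0.80, date(2022, 1, 1), today)
# ...against a fresh chunk that matches slightly worse.
newer = time_aware_score(0.70, date(2024, 1, 1), today)
```

Under these assumptions the newer chunk ranks first even though its raw similarity is lower, which is the effect the quote describes.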

The beauty is that every time you update a video, or publish a new one that covers the same topics in updated chapters, you’re telling the system: “this is the most recent version of my answer on this topic”. It’s a signal that a blog post left untouched since publication doesn’t send: a video with updated chapters combines content freshness with precise segmentation.

What to do with your next videos

Take the next video you publish, or the most recent one if you don’t have any coming up soon. Look at the content and identify the 5-8 moments where the topic changes. For each one, write a title that is a specific answer or question, not a generic label. Then add the timestamps in the YouTube description in the format that activates the platform’s native chapters: 00:00 for the first, then one line per topic change, in ascending order, with at least three chapters of at least 10 seconds each.
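If you keep your chapters in a spreadsheet or a script, the description lines can be generated and checked automatically. The sketch below assumes chapters are stored as (start in seconds, title) pairs; the function names are hypothetical, and the checks mirror YouTube’s published requirements for chapters (first timestamp at 00:00, at least three timestamps in ascending order, each chapter at least 10 seconds long).

```python
def format_timestamp(seconds: int) -> str:
    """Format seconds as MM:SS, or H:MM:SS for videos longer than an hour."""
    hours, rem = divmod(seconds, 3600)
    minutes, secs = divmod(rem, 60)
    if hours:
        return f"{hours}:{minutes:02d}:{secs:02d}"
    return f"{minutes:02d}:{secs:02d}"


def build_chapter_lines(chapters) -> str:
    """Return the chapter block to paste into the video description."""
    starts = [start for start, _ in chapters]
    if len(chapters) < 3:
        raise ValueError("at least three chapters are needed")
    if starts[0] != 0:
        raise ValueError("the first chapter must start at 00:00")
    if any(later - earlier < 10 for earlier, later in zip(starts, starts[1:])):
        raise ValueError("chapters must be ascending and at least 10 seconds long")
    return "\n".join(f"{format_timestamp(s)} {title}" for s, title in chapters)


description_block = build_chapter_lines([
    (0, "What the service covers and who it is for"),
    (95, "How onboarding works in the first 30 days"),
    (1380, "How the predictive analytics method works for the retail sector"),
])
```

The chapter titles in the example are placeholders; the validation is what saves you from publishing a list that the platform silently ignores.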

Do the same with the transcript: segment it at the chapter boundaries, clean up each block, and publish it on the page of your site where the video is embedded, giving each section a heading that mirrors the chapter title. As I explained when discussing infographics with parallel text and informative captions, every non-textual element needs its anchor in text. For videos, that anchor is the transcript segmented by chapters.

This is a first step you can take on your own. For a systematic strategy — optimizing chapter titles for retrieval, VideoObject schema markup with the segments, integration with the overall structure of the site — you need a big-picture view and tools that analyze how your video content is actually processed by AI crawlers. But even with descriptive chapters and a segmented transcript you’re turning invisible content into a collection of chunks that the AI can find, evaluate and cite individually.
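To give an idea of what the VideoObject markup with segments can look like, here is a hedged sketch that builds JSON-LD with one schema.org Clip per chapter. The function name, every URL and every value are placeholders, and the property set is a minimal assumption; check the current structured-data guidelines before publishing anything.

```python
import json


def video_jsonld(name, description, upload_date, page_url, thumbnail_url,
                 chapters, duration_s):
    """Build a VideoObject dict with one Clip per (start_s, title) chapter."""
    clips = []
    for i, (start, title) in enumerate(chapters):
        # A chapter ends where the next one starts; the last ends with the video.
        end = chapters[i + 1][0] if i + 1 < len(chapters) else duration_s
        clips.append({
            "@type": "Clip",
            "name": title,
            "startOffset": start,
            "endOffset": end,
            "url": f"{page_url}?t={start}",  # placeholder deep link
        })
    return {
        "@context": "https://schema.org",
        "@type": "VideoObject",
        "name": name,
        "description": description,
        "uploadDate": upload_date,
        "thumbnailUrl": thumbnail_url,
        "hasPart": clips,
    }


data = video_jsonld(
    name="How service X works in industry Y",
    description="A walkthrough of the service, chapter by chapter.",
    upload_date="2024-01-15",
    page_url="https://example.com/videos/service-overview",
    thumbnail_url="https://example.com/thumbs/service-overview.jpg",
    chapters=[
        (0, "What the service covers and who it is for"),
        (1380, "How the predictive analytics method works for the retail sector"),
    ],
    duration_s=2700,
)
markup = json.dumps(data, indent=2, ensure_ascii=False)
```

The resulting `markup` string is what would go inside a `script` tag of type `application/ld+json` on the page that hosts the video and its segmented transcript.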

And since almost no one does it, the advantage goes entirely to whoever starts now.

The author
Roberto Serra

SEO consultant for over 15 years, founder of the Serra SEO Agency (RAANK). He helps multinationals and SMEs stay visible where search is moving: ChatGPT, Perplexity, Gemini and Google's AI Overviews.
