Exploring 1.2.7 Extended Audio Description (Prerecorded)

Part of the series: Exploring WCAG Level AAA

The short version: When a video’s audio track leaves too little room for audio description, this criterion requires a version of the video that pauses to make space for fuller descriptions.

What does 1.2.7 require?

The official language: “Where pauses in foreground audio are insufficient to allow audio descriptions to convey the sense of the video, extended audio description is provided for all prerecorded video content in synchronized media.”

To understand what this is asking for, you need to know where it sits in the WCAG structure. At Level AA, Success Criterion 1.2.5 requires audio description for prerecorded video. Audio description is a narrated track that describes what’s happening on screen, filling the gaps between dialogue and other audio so that blind and low-vision users can follow the visual content.

Standard audio description works by fitting descriptions into the natural pauses in a video’s audio. When a character finishes speaking and the next one hasn’t started yet, a describer slips in a sentence or two. When the music fades before a scene change, there’s room to say what’s visible.

That works well enough in a lot of videos. But it breaks down in videos where the audio is dense. Think of a fast-paced documentary where the narrator barely stops. Or a training video where an instructor talks through every step. Or a news segment with almost no silence. The pauses that exist might be two seconds long, barely enough to say “she walks to the window,” let alone describe what she’s doing there and why it matters.

Extended audio description solves this by pausing the video, delivering a complete description, then resuming playback. It’s a more disruptive technique, but in dense content, it’s the only way to give blind users enough information to follow what’s happening.
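
To see why a two-second gap runs out so quickly, it helps to estimate how long a description takes to speak. The sketch below is illustrative only: the 160 words-per-minute rate, the `Cue` class, and the function names are assumptions made for this example, not values taken from WCAG or from any player.

```python
from dataclasses import dataclass

# Assumed narration speed; real describers and speech engines vary.
WORDS_PER_MINUTE = 160


@dataclass
class Cue:
    """One description and the silent gap (in seconds) available for it."""
    text: str
    gap_seconds: float


def spoken_seconds(text: str, wpm: int = WORDS_PER_MINUTE) -> float:
    """Estimate how long the text takes to narrate at the given rate."""
    return len(text.split()) * 60 / wpm


def pause_needed(cue: Cue) -> float:
    """Seconds the video must pause so the description can finish (0 if it fits)."""
    return max(0.0, spoken_seconds(cue.text) - cue.gap_seconds)


short = Cue("She walks to the window.", gap_seconds=2.0)
longer = Cue(
    "She walks to the window, lifts the blind, and watches a van pull away.",
    gap_seconds=2.0,
)
print(pause_needed(short))   # 0.0  (fits in the gap)
print(pause_needed(longer))  # 3.25 (the video has to wait)
```

Under these assumptions the five-word description just fits, while the fuller one needs the video to hold for a little over three seconds.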

Able Player’s extended audio description demo shows what this looks like in practice. Toggle the description on and watch what happens to the runtime. The described version is longer than the non-described version, not because anything was added to the video, but because the video keeps stopping to make room for narration. That difference in duration is a useful way to grasp what extended audio description does to a piece of content.
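
The runtime difference is plain arithmetic: the described version lasts as long as the original plus all the time the video spends paused. A minimal sketch with made-up numbers (these durations are hypothetical, not measurements of the Able Player demo):

```python
def extended_runtime(original_seconds: float, pauses: list[float]) -> float:
    """Runtime of the extended version: original length plus every pause."""
    return original_seconds + sum(pauses)


# Hypothetical 3-minute video with four pauses for description.
pauses = [3.25, 6.0, 4.5, 8.25]
total = extended_runtime(180.0, pauses)
print(total)          # 202.0
print(total - 180.0)  # 22.0 extra seconds, all spent on narration
```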

This criterion applies to the same category of content as 1.2.5: prerecorded synchronized media, meaning video with audio. It doesn’t apply to audio-only content or live video, and it isn’t triggered by video that has no speech to work around or no meaningful visual content to describe.

Who does this help, and how?

Extended audio description primarily serves people who are blind or have low vision and rely on audio to understand video content.

For these users, standard audio description at the AA level makes a real difference. But when that description can’t fit within the available pauses, it either gets cut short or is left out entirely. The user follows along, hearing dialogue and ambient sound, but missing the visual thread that ties everything together. They might understand what people are saying without knowing where they are, what they’re looking at, or what’s happening around them.

That gap can be minor. In a talking-head interview, most of what matters is the speech. But in many videos it’s significant. A training video for a software tool might show a series of clicks and screen changes while the instructor describes what to do next. If the description can’t keep up with the action, the viewer loses the visual scaffolding that makes the instructions meaningful.

Extended audio description closes that gap. It gives describers the time they need to convey what’s on screen, without compressing everything into two-word fragments.

Why did this land at AAA?

I’m not aware of the working group’s reasoning behind individual placement decisions, so what follows is informed interpretation rather than inside knowledge.

The core tension here is between completeness and disruption.

Standard audio description, which is the AA requirement, is designed to be as unobtrusive as possible. It fits into gaps that already exist in the content. Viewers who don’t need it barely notice it. Extended audio description breaks that model entirely. It stops the video. For sighted viewers watching the same content, that pause is a jarring interruption. For content creators, it raises questions about how to handle the interruption gracefully and how to signal to users what’s happening.

Production is also more complex. Extended audio description isn’t just a matter of writing a better script. It requires identifying every moment where a description won’t fit, scripting a complete description for each, producing a synchronized version of the video that handles the pause and resume correctly, and testing it to make sure the timing holds up. Not every video player supports this. Not every production pipeline is built for it.

The result is a technique that serves a real need but asks a lot from content producers, works best in a relatively narrow set of circumstances (dense audio content where visual context matters), and creates a noticeably different viewing experience. Those factors together explain why WCAG places it at AAA.

That said, there are situations where extended audio description is not optional in any meaningful sense. A training video for a job skill, a government video explaining a legal process, an educational video for a course a student can’t opt out of — for blind users, these videos either work or they don’t. Extended audio description is what makes them work.

Common pitfalls

Assuming 1.2.5 compliance covers 1.2.7. It doesn’t. 1.2.5 requires that audio description is provided, but not that it’s complete. A video can have a description track that satisfies the AA requirement while still leaving out visual content that didn’t fit within the available pauses. 1.2.7 exists precisely to address that gap, requiring a version of the video that pauses to make room for a fuller description when the audio is too dense to allow one.

Assuming all videos need extended audio description. The criterion only triggers when standard audio description is insufficient. Many videos have enough natural pauses that a skilled describer can cover everything without stopping playback. Applying extended audio description to every video adds unnecessary complexity. The question to ask is whether a blind user, relying only on the audio track including the description, would understand what’s happening on screen.

Player support. Not all video players handle extended audio description well. The technique requires the player to pause on cue and resume after the description ends. Some players support this natively. Many don’t. If you’re implementing extended audio description, confirm your player handles it before investing time in production.
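
To make “pause on cue and resume” concrete, here is a toy scheduler that turns a list of description cues into the sequence of actions a player would have to carry out. It is a sketch under stated assumptions: the event names, the `DescriptionCue` fields, and the idea that each cue arrives with a known narration length are all hypothetical, and no real player’s API is being modeled.

```python
from dataclasses import dataclass


@dataclass
class DescriptionCue:
    """Hypothetical cue: when it starts, how long the gap is, how long narration runs."""
    start: float      # video time (s) at which the description begins
    gap: float        # silent time (s) available in the original audio
    narration: float  # time (s) needed to speak the description


def schedule(cues: list[DescriptionCue]) -> list[tuple[str, float]]:
    """Return (action, video_time) pairs a player would need to perform."""
    events: list[tuple[str, float]] = []
    for cue in sorted(cues, key=lambda c: c.start):
        events.append(("describe", cue.start))
        if cue.narration > cue.gap:
            # Narration outlasts the gap: hold the video at the end of the gap,
            # then resume from the same position so no content is skipped.
            hold_at = cue.start + cue.gap
            events.append(("pause", hold_at))
            events.append(("resume", hold_at))
    return events


cues = [
    DescriptionCue(start=12.0, gap=2.0, narration=1.5),   # fits, no pause
    DescriptionCue(start=30.0, gap=2.0, narration=5.25),  # needs a pause
]
for action, at in schedule(cues):
    print(f"{action:8s} at {at:.2f}s")
```

A player that cannot perform the pause and resume steps on its own would need a separately edited video file with the pauses built in.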

Poor description quality. Extended audio description gives describers more time, but more time doesn’t guarantee better output. A long description that buries the most important visual details under irrelevant ones isn’t useful. The same principles that make standard audio description good — objective, prioritized, well-timed, written in plain language — apply here.

No way to turn it off. Extended audio description pauses the video, which can be disorienting for users who don’t need it and aren’t expecting it. Best practice is to offer users a way to switch between the standard version and the extended audio description version, rather than forcing everyone through the same experience.

How to test it

This section is aimed at accessibility professionals performing conformance evaluations.

Start by identifying which videos on the page fall within scope. This criterion applies to prerecorded synchronized media, meaning video with audio. Audio-only content like podcasts falls outside the scope of this criterion. Videos with no speech, including background videos with only music or ambient sound, have no dialogue for a describer to work around, so extended audio description is not needed for them.

Does extended audio description apply?

Next, check whether the video already has standard audio description satisfying 1.2.5. If it doesn’t, the video fails at the AA level; record that as a separate finding.

If standard audio description is present, watch the video with the description track active and ask: does this description actually convey the sense of the video? Pay attention to moments where visual information seems to be missing or compressed into something too brief to be useful. Listen for descriptions that were clearly cut short. Watch for scenes where something significant happens on screen that the description skips or summarizes inadequately.

If the description is complete and the visual content is covered, 1.2.7 doesn’t apply. The standard description is sufficient.

If visual content is missing or inadequately described due to insufficient pauses in the audio, 1.2.7 applies and you move to the next question.

Is extended audio description provided?

Check whether the page offers a version of the video with extended audio description. This might be a separate version of the video, a player that supports description playback with automatic pause and resume, or a link to an alternative.

If no extended audio description version exists, that’s a failure.

Does the extended audio description version work correctly?

Play the extended audio description version and verify the following.

Does the video pause when needed?

When a description is too long to fit a natural gap, the video should pause automatically to deliver the full description, then resume.

Does playback resume cleanly?

After each description, the video should pick up where it left off without skipping or overlapping with the next moment of audio.
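
If you write down the pause and resume positions while testing, whether by hand or from a player’s event log, the pause and resume checks reduce to a few comparisons. A minimal sketch, assuming a hypothetical log format with one record per extended description and an arbitrary 0.1-second tolerance:

```python
from dataclasses import dataclass


@dataclass
class Observation:
    """Hypothetical test record for one extended description."""
    paused_at: float          # video time (s) when playback stopped
    resumed_at: float         # video time (s) when playback restarted
    narration_finished: bool  # was the description complete before resuming?


def resume_problems(log: list[Observation], tolerance: float = 0.1) -> list[str]:
    """List timing problems found in the log; an empty list means none were seen."""
    problems = []
    for i, obs in enumerate(log, start=1):
        drift = obs.resumed_at - obs.paused_at
        if drift > tolerance:
            problems.append(f"description {i}: skipped {drift:.2f}s of video")
        elif drift < -tolerance:
            problems.append(f"description {i}: replayed {-drift:.2f}s of video")
        if not obs.narration_finished:
            problems.append(f"description {i}: narration cut off by resume")
    return problems


log = [
    Observation(paused_at=32.0, resumed_at=32.0, narration_finished=True),
    Observation(paused_at=75.5, resumed_at=77.0, narration_finished=False),
]
print(resume_problems(log))
```

An empty result only means the timing held up; whether the descriptions say the right things is still a human call.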

Does the description cover what it needs to?

For each moment that required extended description, confirm that the description actually conveys the visual information. This takes two perspectives: a sighted reviewer who watches the original video and checks the description against what is on screen, and, where possible, a blind reviewer who can say whether the description is enough to follow the video.

Can users choose between versions?

Check whether users who don’t need extended audio description can turn it off or switch to a standard version. This isn’t a hard requirement for the criterion, but if the extended version is the only one available, it’s worth noting as a usability issue.

Expected results

Pass

The video’s audio track already conveys all meaningful visual content, making audio description unnecessary.
The video has standard audio description that fully covers all meaningful visual content, making extended audio description unnecessary.
Visual content could not be conveyed within natural pauses, and an extended audio description version is provided that pauses the video to deliver complete descriptions, then resumes cleanly.

Fail

A video contains important visual content but has no audio description, and the audio is dense enough that standard description would be insufficient.
Standard audio description is present but incomplete due to insufficient pauses, and no extended audio description version exists.
An extended audio description version exists but the video doesn’t pause correctly, or descriptions are cut off, or the resume timing is broken.
The extended audio description is present but doesn’t actually address the content that was missing from the standard description.
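
The pass and fail conditions above can be read as a small decision procedure. The sketch below encodes them; the field names are hypothetical, and every boolean stands for a human judgment made while watching the video, not something a script can detect.

```python
from dataclasses import dataclass


@dataclass
class Findings:
    """Hypothetical record of a reviewer's judgments for one video."""
    audio_conveys_visuals: bool     # main audio already covers the visuals
    pauses_sufficient: bool         # a complete description fits in natural pauses
    extended_version_exists: bool
    extended_plays_correctly: bool  # pauses, finishes descriptions, resumes
    extended_covers_gaps: bool      # fills what standard description missed


def evaluate_1_2_7(f: Findings) -> str:
    """Return 'pass' or 'fail' for 1.2.7, following the expected results above."""
    if f.audio_conveys_visuals or f.pauses_sufficient:
        return "pass"  # extended description is not needed
    if not f.extended_version_exists:
        return "fail"
    if f.extended_plays_correctly and f.extended_covers_gaps:
        return "pass"
    return "fail"


dense_without_extended = Findings(
    audio_conveys_visuals=False,
    pauses_sufficient=False,
    extended_version_exists=False,
    extended_plays_correctly=False,
    extended_covers_gaps=False,
)
print(evaluate_1_2_7(dense_without_extended))  # fail
```

Whether a missing standard description fails 1.2.5 is a separate finding and is deliberately left out of this sketch.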

Automated testing and AI

No automated tools test this criterion. Testing requires human judgment throughout: determining whether audio description is needed, evaluating whether standard description is sufficient when it exists, and verifying that an extended version is complete and accurate when one is provided.

Some automated tools flag missing audio description tracks and can catch failures at 1.2.5. But evaluating description quality, identifying gaps, and assessing whether the visual sense of a video is conveyed through audio alone all require a person who understands both what’s on screen and what a user relying on audio would experience.

AI-assisted tools are making inroads on audio description generation. Some tools can produce draft descriptions from video content, which can help with production. But generating extended audio description involves understanding pacing, scene structure, and what visual information is actually meaningful, which current tools handle unevenly. A human describer and a human reviewer remain essential.