Accessible Audio Descriptions for HTML5 Video

By James Edwards

A client recently asked me to produce an accessible video player, and one of the features she was very keen to have is audio descriptions. Audio descriptions are intended for people who are blind or have impaired vision, providing additional spoken information to describe important visual details.

Traditionally, audio-described videos have to be made specially, with the audio encoded in a separate track of the single video file. It takes pretty specialised video-editing equipment to encode these audio tracks, and that raises the bar for most content producers beyond a practical level.

All the audio-described content I’ve seen on the web is like this. For example, BBC iPlayer has a selection of such content, but the video player doesn’t give you control over the relative volumes, and you can’t turn the audio descriptions off — you can only watch separate described or non-described versions of the programme.


Enter HTML5

The HTML5 video specification does provide an audioTracks object, which would make it possible to implement an on/off button, and to control the audio and video volumes separately. But its browser support is virtually non-existent — at the time of writing, only IE10 supports this feature.

In any case, what my client wanted was audio descriptions in a separate file, which could be added to a video without needing to create a separate version, and which would be easy to make without specialised software. And of course, it had to work in a decent range of browsers.

So my next thought was to use a MediaController, which is a feature of HTML5 audio and video that allows you to synchronise multiple sources. However, browser support for this is equally scant — at the time of writing, only Chrome supports this feature.

But you know — even without that support, it’s clearly not a problem to start two media files at the same time; it’s just a case of keeping them in sync. So can we use existing, widely-implemented features to make that work?

Video Events

The video API provides a number of events we can hook into, which should make it possible to synchronise audio playback with events from the video:

  • The "play" event (which fires when the video is played).
  • The "pause" event (which fires when the video is paused).
  • The "ended" event (which fires when the video ends).
  • The "timeupdate" event (which fires continually while the video is playing).

It’s the "timeupdate" event that’s really crucial. The frequency at which it fires is not specified, and in practice it varies considerably — but as a rough, overall average, it amounts to 3–5 times per second, which is enough for our purposes.

I’ve seen a similar approach being tried to synchronise two video files, but it isn’t particularly successful, because even tiny discrepancies are very obvious. But audio descriptions generally don’t need to be so precisely in sync — a delay of 100ms either way would be acceptable — and playing audio files is far less work for the browser anyway.

So all we need to do is use the video events we have, to lock the audio and video playback together:

  • When the video is played, play the audio.
  • When the video is paused, pause the audio.
  • When the video ends, pause the video and audio together.
  • When the time updates, set the audio time to match the video time, if they’re different.
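
As a sketch, the four rules above can be expressed as a pure dispatch function that maps a video event to the action taken on the audio. The syncAction name and its state argument are my own inventions for illustration; the actual event wiring appears in the full script later in the article.

```javascript
// Illustrative sketch only: maps each video event to the action we take
// on the audio track when no native MediaController is managing playback.
// Returns 'play', 'pause', 'resync' or null (nothing to do).
function syncAction(eventType, state) {
  if (state.hasController) return null; // a native controller handles sync
  switch (eventType) {
    case 'play':
      return state.audioPaused ? 'play' : null;
    case 'pause':
    case 'ended': // on 'ended' the real script also pauses the video
      return state.audioPaused ? null : 'pause';
    case 'timeupdate':
      return state.audioTime !== state.videoTime ? 'resync' : null;
    default:
      return null;
  }
}
```

The 'timeupdate' case uses a strict comparison here; as explained in a moment, the comparison works better when done in whole seconds.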

After some experimentation, I discovered that the best results are achieved by comparing the time in whole seconds, like this:

if(Math.ceil(audio.currentTime) != Math.ceil(video.currentTime))
  audio.currentTime = video.currentTime;

This seems counter-intuitive, and initially I had assumed we’d need as much precision as the data provides, but that doesn’t seem to be the case. By testing it using a literal audio copy of the video’s soundtrack (i.e. so the audio and video both produce identical sound), it’s easy to hear when the synchronisation is good or bad. Experimenting on that basis, I got much better synchronisation when rounding the figures than when comparing them precisely.
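
To see the effect of the whole-second comparison in isolation, it can be wrapped in a small predicate (the needsResync name is mine, not part of the final script):

```javascript
// Sketch of the comparison described above: only resync when the two
// times fall in different whole seconds (the name needsResync is mine).
function needsResync(audioTime, videoTime) {
  return Math.ceil(audioTime) !== Math.ceil(videoTime);
}

needsResync(4.2, 4.6); // false: both round up to 5, so leave the audio alone
needsResync(4.2, 5.1); // true: 5 vs. 6, so snap the audio to the video time
```

Small sub-second differences are tolerated rather than corrected, which avoids the constant nudging that made precise comparisons sound worse.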

So here’s the final script. If the browser supports MediaController then we just use that, otherwise we implement manual synchronisation, as described:

var video = document.getElementById('video');
var audio = document.getElementById('audio');

if(typeof(window.MediaController) === 'function') {
  var controller = new MediaController();
  video.controller = controller;
  audio.controller = controller;
} else {
  var controller = null;
}

video.volume = 0.8;
audio.volume = 1;

video.addEventListener('play', function() {
  if(!controller && audio.paused) {
    audio.play();
  }
}, false);

video.addEventListener('pause', function() {
  if(!controller && !audio.paused) {
    audio.pause();
  }
}, false);

video.addEventListener('ended', function() {
  video.pause();
  audio.pause();
}, false);

video.addEventListener('timeupdate', function() {
  if(!controller && audio.readyState >= 4) {
    if(Math.ceil(audio.currentTime) != Math.ceil(video.currentTime)) {
      audio.currentTime = video.currentTime;
    }
  }
}, false);

Note that this script creates the MediaController through scripting, whereas it’s also possible to define a controller declaratively, using the static "mediagroup" attribute:

<video mediagroup="foo"> ... </video>
<audio mediagroup="foo"> ... </audio>

If we did that, then it would work without JavaScript in Chrome. It would sync the media sources, but the user would have no control over the audio (including not being able to turn it off), because the browser wouldn’t know what the audio represents. This is the case in which it would be better to have the audio encoded into the video, because then it could appear in the audioTracks object, and the browser could recognise that and be able to provide native controls.

But since we have no audioTracks data, that’s rather a moot point! So if scripting is not available, the audio simply won’t play.

Here’s the final demo, which will work in any recent version of Opera, Firefox, Chrome, Safari, or IE9 or later:

This is just a simple proof-of-concept demo, of course — there’s no initial feature detection, and it only has the basic controls provided by the native "controls" attribute. For a proper implementation it would need custom controls, to provide (among other things) a button to switch the audio on and off, and separate volume sliders. The interface should also be accessible to the keyboard, which is not the case in some browsers’ native controls. And it would need to handle buffering properly — as it is, if you seek past the point where the video has preloaded, the audio will continue to play freely until the video has loaded enough to bring it back into sync.
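
To give a flavour of how proper buffering support might work, here is a sketch of the kind of check a custom player could run before letting both sources play. The helper name and the three-second margin are my own assumptions; a real implementation would read the ranges from the video.buffered TimeRanges object inside "progress" event handlers.

```javascript
// Sketch only: decide whether playback can safely continue, given the
// parts of the video that are already cached. Ranges are plain
// [start, end] pairs here so the logic stands alone; in a real player
// they would come from video.buffered. The 3-second margin is arbitrary.
function canKeepPlaying(bufferedRanges, currentTime, margin) {
  if (margin === undefined) margin = 3;
  for (var i = 0; i < bufferedRanges.length; i++) {
    var start = bufferedRanges[i][0];
    var end = bufferedRanges[i][1];
    if (start <= currentTime && currentTime + margin <= end) {
      return true; // enough data cached ahead of the playhead
    }
  }
  return false; // pause both sources and wait for more to preload
}

// e.g. the first 30 seconds are cached, plus a chunk around a seek target:
var cached = [[0, 30], [60, 75]];
canKeepPlaying(cached, 10); // true: 10s-13s lies within the 0-30s range
canKeepPlaying(cached, 29); // false: only one second is cached ahead
```

When the check fails, the player would pause both video and audio, keep monitoring progress events, and resume once a few more seconds have preloaded.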

I might also mention that the descriptions themselves are hardly up to professional standards! That’s my voice you can hear, recorded and converted using Audacity. But such as it is, I think it makes an effective demonstration of how low the technical barrier to entry is with this approach. I didn’t have to edit the video, and I made the audio in an hour with free software.

As a proof of concept, I’d say it was pretty successful — and I’m sure my client will be very pleased!

  • liza

    good information

  • Great post, I always look at these and think they must take so long to write. Appreciate it!

  • Thanks, I’m glad you like it.

    What really takes the time with posts like this is researching, making and testing the demo. I spent a day on that, but only a couple of hours actually writing the blog post.

    The trick is to write about things that you’re working on anyway, so that research and development time is an investment in knowledge and skills.

  • Please don’t forget about captioning as one of the important components of videos – it is not mentioned in your article. Accessibility is not only for blind people. There are 50 million deaf and hard-of-hearing people in the USA who need access to audio via captions and transcripts. I have an audio accessibility website – you can click on my name to go to the website to learn more. Thanks!

    • Nobody’s forgetting about captioning, it’s just not what this particular article is about.

      I was going to add captions to this demo, but it seemed counter-productive, because (as you say) captions and descriptions are intended for different audiences.

      Captions are a solved problem in any case, so it doesn’t need mentioning in this particular context (ie. when talking about technical limitations with HTML5 video), because there are no technical limitations.

  • Great post!
    This is exactly what I was looking for in HTML5. I got your blog post as a reference from another writer/researcher/expert.

    Do you think the future of HTML5 video with this concept will be safe (safe in the sense, sync between a/v will be proper)?
    Also @mediagroup will be an alternative for this hack, right?

    Consider my questions as a newbie’s. I am only a beginner.

    thank you James.

    • That’s a good question. As long as the media sources don’t have to buffer (i.e. they’ve already preloaded when you play, or your connection is fast enough to never be a problem) then the synchronisation is totally solid.

      It’s not absolute of course — because the speed of timing events is not consistent, the difference in time between the two sources can be plus or minus maybe a quarter of a second. That’s way too much for something like DJ mixing, but easily acceptable for audio descriptions.

      But the real problem comes when buffering has to happen. If you seek a long way forward so the video has to buffer for a few seconds, then the audio will keep playing until the video catches up. Or if you start playing the video before the audio has loaded enough, then the audio won’t start playing until it has. The native MediaController doesn’t have this problem, because it locks the sources together, so if one has to buffer then the other one pauses.

      But even without native controllers, those problems are fixable. In the client project I’m working on, I’ve developed this idea much further. What I do is monitor the “progress” events, which give information on which parts of the video are in cache, and use that data to detect when the video has to buffer. When that happens, I pause the video and the audio, and then continue to monitor progress events until a few more seconds have preloaded, then play again. If the user seeks, a similar process occurs (and there are additional seeking events and properties to detect that too).

      But ultimately, this is just a hack to cope with lack of support for MediaController — because only Chrome implements that, and its implementation is quite buggy anyway (in fact, in my client project, Chrome uses my solution and not a native controller, because it works better!)

      However given time, when most browsers have a solid and reliable implementation of native media controllers, then hacks like this won’t be necessary.
