Accessible JavaScript: Beyond the Mouse

    James Edwards

    In my last article for SitePoint, I questioned whether AJAX scripting techniques can be made accessible to screen readers, and discovered that, for the most part, they can’t. It’s disappointing to do that — to point out a problem and not be able to offer any answers. But I really had no choice, because as far as I could tell, there weren’t any concrete solutions to offer. (Although since then other developers have pushed the envelope further; of particular significance is the work that Gez Lemon and Steve Faulkner are doing in this area.)

    But accessibility isn’t always difficult! I’m very sensitive to the fact that it’s seen by many people as a load of problems, when in fact accessibility is merely another design challenge that, in general, is no more difficult or problematic than any other. AJAX is a particularly awkward example. Most of the time, though, providing for accessibility really isn’t that hard.

    You can’t always get what you want; but if you try sometimes, you just might find, you get what you need.

    — Rolling Stones

    In this article, I’d like to provide a little gratification to those attempting to make their web applications accessible. To achieve this, I’ll talk about some of the more basic, solvable issues relating to JavaScript accessibility, by way of an introduction to device-independent scripting.

    Keyboard Navigation?

    Most of us use a mouse for the majority of our graphic interface navigation, but some people can’t, and must therefore navigate using the keyboard instead. For a person who has a hand tremor, for example, the precision control required to use a mouse effectively may simply be impossible. For users of assistive technologies such as screen readers, the keyboard is the primary method of interaction. After all, it’s rather difficult to use a mouse when you can’t see the pointer!

    Providing for keyboard access also creates better usability, because many people who can use a mouse nonetheless prefer to use a keyboard for certain tasks or at certain times. These tend to be power users — people who are generally more familiar with how their computers work, and expect to be able to interact with functionality using either the mouse or the keyboard as their needs dictate.

    If you’re not in the habit of navigating sites with the keyboard, try it now! Spend some time on your own site, and on other sites you visit regularly, to get a feel for what it’s like to surf without a mouse. Discover where difficulties arise, and think about how those issues could be avoided.

    Device Independence!

    Referring to "keyboard" access is ever-so-slightly misleading, because it’s not just the keyboard we’re talking about per se. We’re talking about trying to provide for device independence, so that whatever a user’s mode of interaction, they’re able to use a script.

    Mouse events, for example, may not be generated by a mouse at all. They might arise from the movement of a trackball, or the analog stick on a handheld gaming console. Focus events might be generated by a keyboard user navigating with the Tab key, or as the result of navigation commands spoken by an Opera user making use of the browser’s voice control functionality.

    In theory, we would like to be able to support any mode of interaction, regardless of the input device. But in practice, all these forms of interaction generally boil down to one of two basic types: "mouse" (clicking on, or moving an interface element) and "keyboard" (providing input or instructions via character input). These deal with two fairly discrete subsets of the events exposed by the browser, ignoring the majority of programmatic events (loading, errors, etc.).

    Three Pillars

    I’m going to assume that you’re already quite familiar with scripting for mouse events, and look only at scripting for keyboard events. (If you need an introduction to events, and a detailed coverage of the real-world use of modern JavaScript techniques, you might like to check out my book.) To that end, there are three core things that I want to discuss — three "pillars" you might say — that together provide a foundation for device independence:

    1. Provide accessible interactive elements.
    2. Choose appropriate trigger elements.
    3. Aim to pair scripting hooks, not event hooks.

    These terms may not make sense now, but they will by the time you finish reading this article.

    I’d also like you to bear in mind, as we go through these points, that catering to accessibility is about providing equivalence, which is not the same as equality. It doesn’t necessarily matter if we provide different paths for different users, so long as everyone has a path to an equivalent end result.
    When we look at some practical examples later on, we’ll see how even radically different approaches can result in solid equivalence overall.

    Providing Accessible Interactive Elements

    First and foremost, if we want to capture input from the keyboard, we’ll need to use elements that can accept the focus: primarily links (<a>) and form controls (<input>, <select>, <textarea> and <button>). Note that it’s also possible to assign focus to the <area> elements in an image-map, a <frame> or <iframe>, in some cases an <object> (depending on what type of data it embeds), and in most browsers, the document or documentElement itself.

    The only events we can handle for these interactions are events that the keyboard can actually generate: primarily focus, blur (triggered when the currently focused element loses focus), click (activating a link or button with the keyboard is programmatically the same as clicking it with a mouse), and the three key-action events, keydown, keyup and keypress.

    In addition to these direct input events, we can use programmatic events — that is, events that fire indirectly in response to state changes. Examples of programmatic events include the infamous window.onload event and the onreadystatechange event of an XMLHttpRequest object.

    We can also use events that are mode independent, i.e. those for which the user’s mode of interaction doesn’t have any effect on how or when they fire, such as the submit event of a form.

    However — and this is a significant caveat — that doesn’t mean we have to consign mouse-specific events to the trash, nor relegate non-focusable elements to the sidelines altogether. It just means we’ll have to rethink our approach to some tasks. Remember, it’s about equivalence, not equality. All paths are good, so long as every user can access at least one of them.

    Choosing Appropriate Trigger Elements

    I’m using the term "trigger element" to refer to any element that’s used to trigger a behavioral response. A trigger element is something a user interacts with in order to cause something else to happen. It could be a simple link to "Add a tag" to a photo on flickr:


    Or it could comprise a series of icons at the top of a photo, designed to allow users to perform actions like adding a photo to their favorites:


    But as we’ve already noted, the choice of elements we have available to implement these triggers is limited.

    Now, the <button> element is a particular favourite of mine because it’s so amazingly flexible: it can be styled as much as any other element, it can contain other HTML, it can be enabled or disabled and report that state to user-agents, and it can work as an active trigger element without having a value. However, like all form controls, its only valid context is inside a <form>.

    By contrast, the problem with using links as triggers is that although you can have them appear any way you like, they always have to have a value of some kind: a link with nothing in its href attribute is not accessible to the keyboard.

    The generally accepted best practice is to use progressive enhancement — include a default href attribute that points to equivalent, non-scripted functionality — but that’s not necessarily appropriate when we’re working in an entirely scripted environment (for example, in dealing with a link which itself was generated with scripting, in an application that caters to non-script users elsewhere). This situation often results in the need for links to have "#" or "javascript:void(null)", or a similar — essentially junk — href.
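    When a real fallback page does exist, the wiring is simply a matter of running the scripted path and then cancelling the link’s default navigation. Here is a minimal sketch; the helper name makeTriggerHandler and the action showTagForm are assumptions for illustration, not part of any library:

    ```javascript
    // Sketch of a progressively enhanced link trigger. The returned
    // handler runs the scripted action, then cancels the default
    // navigation in both the W3C and legacy event models.
    function makeTriggerHandler(action) {
      return function(e) {
        action();                                      // the scripted path
        if (e && e.preventDefault) e.preventDefault(); // W3C model
        return false;                                  // legacy model
      };
    }

    // Usage, given markup like <a id="add-tag" href="/tags">Add a tag</a>
    // (a hypothetical href pointing at the non-scripted equivalent):
    //   addEvent(document.getElementById('add-tag'), 'click',
    //            makeTriggerHandler(showTagForm));
    // Non-script users simply follow the real href instead.
    ```

    Because the href is a genuine URL, the link degrades gracefully; no junk value is ever needed.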

    All of this is somewhat beside the point, though, as our choice of element should be based on what the trigger actually is, and on what it does. We can’t just use a <button> for convenience, and to avoid the problem with links, or vice versa. We have to consider semantics, and try to make sure that a trigger element is what it appears to be, and that its appearance is consistent with its function.

    This is not always easy; the flickr icons example is a particularly tricky one. Let’s look at that again:


    The overall appearance of these icons suggests that they’re buttons, like the toolbar buttons in Photoshop or MS Office. But functionally speaking, the first three are scripted actions, while the last one is actually a link to another page.

    So, should the first three be <button> elements while the last is an <a>? Maybe "all sizes" should be a separate link that’s not part of this toolbar at all?

    What about the "Add a tag" link?


    Shouldn’t that be — and look like — a button, since it’s a scripted action, not a page view? (And, while we’re at it, shouldn’t it do something if JavaScript is not available …?)

    Perhaps the overall conclusion in this case is that flickr’s interface design, like so much of the Web 2.0 genre, is just a little haphazard and not properly thought through.

    But all of this really does matter — semantics aren’t just an exercise in navel-gazing. The choice of elements matters a great deal to user agents, as they depend on markup semantics to identify what the content is, which in turn, matters to ordinary users hoping to use that content effectively.

    In case you still feel that this is nothing more than an academic discussion of semantic purity, let’s look at a practical example of why trigger element choice matters in the real world: Opera’s keyboard navigation.

    Opera uses different keys for navigating form elements than it does for navigating links (form elements use the Tab key, while link navigation uses "A" and "Q" for "next anchor" and "previous anchor" respectively). So if we use interface elements that look like buttons for links, or vice versa, we’ll create a cognitive and usability problem for Opera users who navigate using the keyboard.
    As another example, let’s examine what Basecamp does in its Writeboard application:


    "Edit this page" looks like a button, so we should be able to Tab to it just like any other; but we can’t, because it isn’t a button at all. It’s a styled link.

    Perhaps it should be a <button> after all, since that’s what it looks like. Or should it just be (and look like) a simple link, since what it actually does is load a whole new page? In this case, I think the latter.

    Like I said, this aspect is not always easy, but it has to be considered if an application is to be as intuitive with the keyboard as it is with the mouse. In general, I think that links should be used for actions that load a new page without posting any data (i.e. GET requests), and that buttons or other appropriate form widgets should be used for everything else. (What is an application, after all, other than a complex form?) This view is echoed by the HTTP 1.1 specification, which states that GET requests should not be used for actions that will change a resource, such as deleting, creating, or updating content.

    But in all cases, a trigger element must look like what it is.

    Looking for Behavioral Pairing, not Event Pairing

    The HTML Techniques for WCAG 1.0 suggest that a good approach to catering for device independence is to provide redundant input events — two handlers for the same element that "pair" together. The examples it gives include pairing keydown with mousedown, or using keyup to go with mouseup.

    However, this is the wrong way of looking at the issue of providing device independence, because keyboard and mouse events are conceptually different things and, in many cases, behave completely differently. We’ll see this difference in a moment, in the first of our practical examples.

    I think it’s more helpful to think in terms of behavioral pairing, rather than event pairing. If you have a piece of functionality that’s driven by, say, a mousedown event, don’t think, "How can I use a keydown event to make this work?" Simply think, "How can I make this work from the keyboard?"

    Am I splitting hairs? I don’t think so. When it’s thought of in this way, the question leads to different answers. The first question asks about a specific approach, which may or may not turn out to work; the second question simply asks if there is an approach; it’s open to any compatible solution. In the last of our practical examples — Drag ‘n’ Drop — we’ll see just how dramatic that difference in thinking can be.

    Some Practical Examples

    Let’s look at some practical examples. I’m not going to delve too deeply into code here. This is just a basic review of some different types of scripting as they’re implemented for the mouse; we’ll also give some thought to how we might implement them for the keyboard.

    Simple Rollovers and Revealing Content

    A simple rollover effect might consist of a color or background-image change on a link. You’re probably more than familiar with links that have block display applied, along with :hover and :focus pseudo-classes, so that they can have background swaps without the need for JavaScript.

    Scripted rollovers are generally just as easily extended to the keyboard, provided that they use proper links or other focusable elements (not just plain text content elements, like a <span> or <td>). In our first example, we’ll add a simple effect to a single element, triggered by toggling a class name. These examples use a hypothetical addEvent function; substitute your favorite event-binding helper when you apply the code in your own work:

    addEvent(link, 'mouseover', function() {
      link.className = 'rollover';
    });
    addEvent(link, 'mouseout', function() {
      link.className = '';
    });

    We can simply add a pair of focus and blur handlers to do the same job for people navigating via keyboard:

    addEvent(link, 'focus', function() {
      link.className = 'rollover';
    });
    addEvent(link, 'blur', function() {
      link.className = '';
    });

    When it comes to handling events on groups of elements, the situation is more complicated, due to the fact that focus events don’t bubble. Bubbling occurs when an element passes an event that fires on it up to its parent element. While we could handle a mouse event on any element using a single document-level listener (a technique that’s sometimes known as event delegation), we can’t do the same for events that don’t bubble:

    addEvent(document, 'mouseover', function(e) {
      var target = typeof e.target != 'undefined'
          ? e.target : e.srcElement;
      //"target" is whatever node the event bubbles up from
    });

    This approach works because mouse events bubble up from the point at which they happen; however, since focus events don’t bubble, such a function would only handle events that occur on the document node.

    If we wanted to capture events on each of a group of elements, we’d have to iterate through the elements and bind a listener to each one individually:

    var links = list.getElementsByTagName('a');
    for (var i = 0; i < links.length; i++) {
      addEvent(links[i], 'focus', function() {
        //and so on ...
      });
    }

    Bear in mind that the exact translation of mouse to keyboard behaviors is not necessarily appropriate, because the usability concerns are often very different between these two kinds of behaviors. Consider the open and close timers in a DHTML menu; these are necessary for the mouse, but undesirable for the keyboard. After all, it’s not possible for users to "slip off the edge" of the menu when navigating with their keyboards, so all the timers offer is useless pauses to the menu’s actions.

    AJAX and Other RPC Scripting

    The core of AJAX scripting deals with programmatic events, such as the onreadystatechange event of an XMLHttpRequest object, or the load event of an iframe that’s being used for data retrieval. The user’s mode of interaction doesn’t affect the behavior of these events, so we don’t need to consider each mode of interaction specially.
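    By way of illustration, a request object might be created with an era-appropriate factory like the one below. The environment object is a parameter here so the sketch is self-contained; in a browser you would simply pass window. This is a sketch, not a complete transport layer:

    ```javascript
    // Returns a request object, trying the native constructor first
    // and the legacy ActiveX control as a fallback; null if neither exists
    function createRequest(env) {
      if (env.XMLHttpRequest) return new env.XMLHttpRequest();
      if (env.ActiveXObject) return new env.ActiveXObject('Microsoft.XMLHTTP');
      return null;
    }

    // Usage in a browser:
    //   var request = createRequest(window);
    //   request.onreadystatechange = function() {
    //     // fires the same way whether the request was triggered
    //     // by mouse, keyboard or voice
    //     if (request.readyState == 4) { /* insert the response */ }
    //   };
    //   request.open('GET', url, true);
    //   request.send(null);
    ```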

    However, we do have two important points to consider.

    Firstly, and most obviously, we should consider how those processes are triggered in the first place. If a request or process is to be initiated by a user action, we must ensure that the action can be triggered by keyboard users. The solution is simply a matter of choosing an appropriate trigger element, as we’ve already discussed.

    The second issue requires the careful construction of response HTML, to ensure that we maintain a usable tab order. If we create new content in response to a user action, and that new content is itself interactive, we must ensure that it’s inserted at a logical point in the HTML.

    For example, say we have a User Preferences form in which users specify their personal details. In this case, they must provide country of origin information:

    <label for="country" id="country-selector">  
      <span>Country: </span>  
      <select id="country">  
        <option value=""></option>  
        <option value="uk">UK</option>  
        <option value="au">Australia</option>  
      </select>  
    </label>  
    <input type="button" value="Save details" id="save-button" />

    We could attach to the select element an onchange event listener that runs code to create a secondary select that allows users to choose a county or state as appropriate. However, we’d want that secondary select to be accessible to the keyboard user immediately, so we should insert it in the correct place — after the first label, before the button:

    var button = document.getElementById('save-button');  
    button.parentNode.insertBefore(newselect, button);

    This example assumes that the new selector and its label have already been created, and saved to the object reference newselect.
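    As a sketch of that missing step, the secondary control might be built along these lines. The element ids, the label text, and the doc parameter (pass document in a browser) are all assumptions for illustration:

    ```javascript
    // Builds the label/select pair referred to above as "newselect"
    function createRegionSelector(doc) {
      var label = doc.createElement('label');
      label.htmlFor = 'region';

      var caption = doc.createElement('span');
      caption.appendChild(doc.createTextNode('County/State: '));
      label.appendChild(caption);

      var select = doc.createElement('select');
      select.id = 'region';
      label.appendChild(select);

      return label;
    }

    // In a browser: var newselect = createRegionSelector(document);
    ```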

    Drag ‘n’ Drop

    Drag ‘n’ Drop functionality requires complicated scripting at the best of times, whether or not you’re trying to make it accessible! At first glance, the task of making this functionality accessible looks impossible, because the dynamo of drag ‘n’ drop is the mousemove event, for which there is no keyboard equivalent. But with a bit of lateral thinking, it can be done!

    Imagine our application contains a vertical list or column of boxes that users can drag ‘n’ drop to re-order. The user’s mouse picks up an object, moves it, then snaps it to a new position; the end result of the actions is simply a change in the order of the objects — the one the user dragged has moved up or down by x number of spaces. Couldn’t we achieve that same outcome using commands generated by the up and down arrow keys?

    Indeed we could, but to do so, we’d need a trigger element for the keyboard: a focusable element (either the draggable object itself, or something inside it) that can handle events from the arrow keys.
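    Whatever the input device, the end result is just an index change, so the reordering itself can live in one shared routine that both the mouse and keyboard handlers call. Here is a minimal sketch; all of the names are assumptions, not taken from any actual drag ’n’ drop script:

    ```javascript
    // Shifts the item at "index" by "offset" places within the list,
    // returning its new index (unchanged if the move falls off either end)
    function moveItem(list, index, offset) {
      var target = index + offset;
      if (target < 0 || target >= list.length) return index;
      var item = list.splice(index, 1)[0];
      list.splice(target, 0, item);
      return target;
    }

    // A keyboard trigger inside the draggable box then maps arrow keys
    // to the same routine the mouse handlers use:
    //   addEvent(icon, 'keypress', function(e) {
    //     var key = e.keyCode || e.which;
    //     if (key == 38) pos = moveItem(order, pos, -1); // up arrow
    //     if (key == 40) pos = moveItem(order, pos, 1);  // down arrow
    //   });
    ```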

    In the image below, you can see a box that indicates mouse behaviors. The darker strip at the top is the trigger element for the mouse. Users click on this area and move their mice in order to drag the box around; hence, the principal active event for this behavior is mousemove:


    Now if we add a link or button inside the draggable element, and style it to look like a graphical icon, that icon can be used as the trigger element for the keyboard. In this case, the principal active event for the behavior is keypress:


    From this example, we can see the futility of event pairing. There is very little functional similarity between mousemove and keypress events, yet those were the two events we needed to provide for mouse and keyboard users. The conceptual journey we stepped through in order to make this functionality work for the keyboard showed how we can achieve the ultimate goal — equivalent functionality. The details of implementation are just that — details.
    These pictures are taken from an actual script, which is too large to reproduce here, but if you’d like to download and play with it you can find it on my web site.

    Accessibility is not a Feature

    In my imagination, there is no complication.

    — Kylie Minogue

    Designing for accessibility is like building the foundations of a house — easy if you do it from the start, but very difficult to hack in afterwards.

    Clearly the best approach is to consider accessibility right from the project’s initiation — to recognize that accessibility is a design consideration, not a feature. Indeed, Joe Clark’s evaluation of Basecamp’s accessibility makes the point that if you look at accessibility as a feature, you’ll probably just leave it out. "Most developers are gonna leave it out anyway; most developers don’t know the first thing about accessibility or even that it’s important." That’s skeptical, sure, but nonetheless, it’s true.

    With that quote in mind, I’d like to finish by giving you an example of something cool and inspirational, something that really does exemplify best practice in this area. It isn’t new (it’s more than a year old, having been developed and presented by Derek Featherstone at Web Essentials 2005), but its sheer grace and simplicity still bowl me over: it’s the Semantic, Accessible Crossword Puzzle.

    We can’t all be as talented as Derek! But on a practical, everyday level, I hope I’ve begun to demonstrate that device-independent scripting really isn’t that difficult or complex. It may be different from the way we’re used to working, but all it really takes is a little extra thought.