Children of The DOM

Close node relationships in the DOM have always been problematic, because most interpretations of the DOM include whitespace text-nodes, which scripts don’t usually care about.

It’s right that they should be included, of course, because it’s not up to implementations to decide whether this or that node is important. Nevertheless, whitespace text-nodes are usually not important; they just get in the way, complicating what should be simple relationships like firstChild and nextSibling.

Here’s a simple markup example to demonstrate:

<ul>
  <li>list-item 1</li>
  <li>list-item 2</li>
  <li>list-item 3</li>
</ul>

So the firstChild of that <ul> element is not the first <li> element; it’s the whitespace (i.e. the line-break and indentation) between the <ul> and <li> tags. Likewise, the nextSibling of that first list-item is not the second list-item; it’s the whitespace text-node in-between.
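
To see this for yourself, a small helper can list what a node's childNodes collection actually contains. This is only a sketch: the function name describeChildren is made up for illustration, and it relies on nothing beyond the standard childNodes and nodeName properties.

```javascript
// Hypothetical helper: returns the nodeName of every child node,
// including text and comment nodes.
function describeChildren(node)
{
  var names = [];
  for(var i = 0; i < node.childNodes.length; i ++)
  {
    names.push(node.childNodes[i].nodeName);
  }
  return names;
}
```

Called on the <ul> above, in a browser that includes whitespace text-nodes, it returns seven entries rather than three: "#text", "LI", "#text", "LI", "#text", "LI", "#text".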

The Classic Solution

This is nothing new, and traditionally we’ve had three basic ways of dealing with it. The first is to use a collection-based reference like this:

var item = list.getElementsByTagName('li')[0];

The second approach is to iterate past the unwanted nodes, using a nodeType test to determine when we have the node we want:

var item = list.firstChild;
while(item && item.nodeType != 1)
{
  item = item.nextSibling;
}

The third and most brute-force solution is simply to remove the unwanted nodes altogether, using a recursive function like this (which also removes comment nodes):

function clean(element)
{
  for(var x = 0; x < element.childNodes.length; x ++)
  {
    var child = element.childNodes[x];
    if(child.nodeType == 8
      || (child.nodeType == 3 && !/\S/.test(child.nodeValue)))
    {
      element.removeChild(element.childNodes[x --]);
    }
    if(child.nodeType == 1)
    {
      clean(child);
    }
  }
}

The Element Traversal Solution

These solutions all work, but there is a much simpler and easier way of getting the element references we want, using a surprisingly little-known set of references defined in DOM3 Element Traversal.

The Element Traversal specification defines four new references, which only relate to element nodes, effectively ignoring all other types:

  • firstElementChild
  • lastElementChild
  • nextElementSibling
  • previousElementSibling

So now we can get those list-item references in a much more straightforward way, and it doesn’t matter how many whitespace text-nodes (or anything else) are in-between:

var item = list.firstElementChild;
var item2 = item.nextElementSibling;

The specification also defines a childElementCount property, which is equivalent to childNodes.length when all non-element nodes are disregarded.
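
As a rough sketch of how these properties fit together, the following walks a parent's element children using only firstElementChild and nextElementSibling. The function name elementChildren is an assumption for illustration, not something the specification defines.

```javascript
// Hypothetical helper: collects the element children of a parent,
// skipping text and comment nodes entirely.
function elementChildren(parent)
{
  var elements = [];
  for(var el = parent.firstElementChild; el; el = el.nextElementSibling)
  {
    elements.push(el);
  }
  return elements;
}
```

For the list in the earlier markup example, elementChildren(list).length and list.childElementCount should both give 3.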

The Real-World Solution?

So can we rely on these properties? Will they work in the browsers we code for? The answer is “yes” for the most part. Older versions of IE are the usual story, but for IE9 or later, or any reasonably-recent version of any other major browser, we find that all these properties are supported, and have been for quite a while.

PPK’s DOM Compatibility tables give us the low-down, and show that we don’t need to worry at all about lack of browser support — unless we have to support IE8.

So I guess it’s one of those things, just like selector queries used to be — if older browsers are an issue, then libraries can provide the fallback, or you can continue to use the traditional solutions we’ve always relied on. But if you’re lucky enough not to have to think about those older browsers, then the Element Traversal properties will certainly make life easier.
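
A fallback along those lines might look something like this. It's a minimal sketch rather than any particular library's implementation: the name nextElement is hypothetical, and it simply falls back to the nodeType iteration from the classic approach when nextElementSibling isn't available.

```javascript
// Hypothetical fallback: returns the next sibling that is an element,
// or null if there isn't one.
function nextElement(node)
{
  // use the native property where the browser defines it
  if(typeof node.nextElementSibling != 'undefined')
  {
    return node.nextElementSibling;
  }
  // otherwise step past text and comment nodes manually
  var sibling = node.nextSibling;
  while(sibling && sibling.nodeType != 1)
  {
    sibling = sibling.nextSibling;
  }
  return sibling;
}
```

Because the manual branch checks nodeType rather than assuming anything about whitespace, it should behave the same whichever view of the DOM a browser has.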

I could also point out that earlier versions of IE have a different view of the DOM — unlike all other browsers, they don’t include whitespace text-nodes. So at a pinch, you could always do something like this:

function firstChild(element)
{
  //using pre-defined browser variable
  if(isie)
  {
    return element.firstChild;
  }
  return element.firstElementChild;
}

A browser test is appropriate for that, rather than simply testing whether firstElementChild is defined, because lack of support for that property doesn’t necessarily indicate an implementation where whitespace isn’t included. The difference is unique to IE, so it’s IE we’d have to test for.

The Common-Sense DOM

To me, these Element Traversal properties are something of a breeze of common-sense in W3C specifications, ratifying in standards the practical view that most of us have of the DOM. They’re certainly a lot more approachable than DOM2 Traversal ever was (anyone here using TreeWalker? No, I didn’t think so!). The underlying problem that DOM Traversal tried to solve is that implementations can’t know which types of node a script will care about, yet it tried to solve this problem by continuing to treat all types of node as equal.

But all nodes are not equal — it’s elements that count — and the Element Traversal specification puts them center-stage.

  • http://dashmedia.com.au Jon

    I’m loving the article titles on SitePoint lately, meh the content’s useful as well.

  • Les

    This is what really made me boil with the DOM and Javascript, was the “unseen” child elements, ie whitespace, etc. Why has the W3C decided (in their collective wisdom?) to pollute the API with additional methods?

    Rather a better idea would be to simply ignore the whitespace, etc at source when traversing the DOM, rather simpler to have the browser strip out all the whitespace, etc when building the webpage in memory.

    Oh, but that would be too easy, wouldn’t it?! Crying out …. loud :(

    • http://www.brothercake.com/ James Edwards

      Just because something is unseen, doesn’t mean it’s unimportant, and it’s not up to browsers to decide. If that whitespace exists in the document, then it should be represented in the DOM.

      For example, I worked on a script recently where I need to refer to the whitespace in a document, in order to build a dynamic representation of the DOM that had accurate line-numbers.

      But methods like these solve the problem — you can get to the content you care about in the simplest way that’s appropriate.

  • https://github.com/buyog/lmnt Ryan

    There is no “lastElementSibling” method… I think the one you intended to include is previousElementSibling. ;)

    The W3C spec (http://www.w3.org/TR/ElementTraversal) also defines a fifth useful method, childElementCount, which gives a child count (like childNodes.length) restricted to nodeType-1 children, but since a plurality of browsers allow you to get that same information with the length property of the non-spec “children” attribute, it’s arguably less useful to know about.

    • http://www.brothercake.com/ James Edwards

      Yeah lastElementSibling was a typo, thanks.

      The children collection was included in the WHATWG DOM specification, and has been implemented by most current browsers, but I wouldn’t care to rely on it unless it becomes part of the formal standard. The problem with non-standard collections like that is that they’re not future safe.

  • http://alexgrande.com Alex Grande

    It turns out there is a thing called “TreeWalker” that will allow you to traverse over the DOM without using recursion and filter the type of DOM element you want returned.

    Here’s the link on MDN: https://developer.mozilla.org/en-US/docs/DOM/document.createTreeWalker

    But that doesn’t say much. Here’s a snippet I wrote after I interviewed at a certain social network and I got points off for not knowing this obscure method! https://github.com/grandecomplex/_snippets/blob/master/treewalker.js

    • http://www.brothercake.com/ James Edwards

      Yeah TreeWalker is pretty obtuse, and I’m not surprised it didn’t catch on. I get the sense that the Element Traversal spec which defines these new element properties is an attempt to solve the same problem that DOM2 Traversal tried to solve with TreeWalker, but in a far more sane and coherent way!

  • wing

    How about like this?
    function getFirstChild(element){

    return element.firstElementChild?element.firstElementChild:element.firstChild;
    }

    • http://www.brothercake.com/ James Edwards

      But how do you know which browsers will need it? Just because a browser doesn’t support firstElementChild, doesn’t mean it has a view of the DOM which doesn’t include whitespace.

    • http://www.brothercake.com/ James Edwards

      Come to think of it, you could combine the iteration technique I mentioned at the start, with a test for firstElementChild, and then you’d have a function which is flexible to either view of the DOM, without needing a browser condition:

      function getFirstChild(element)
      {
        element = element.firstElementChild || element.firstChild;
        while(element && element.nodeType !== 1)
        {
          element = element.nextSibling;
        }
        return element;
      }
      
  • http://www.integralist.co.uk/ Integralist
    • http://www.brothercake.com/ James Edwards

      That script is making the same incorrect assumption as we discussed earlier. It assumes that if a browser doesn’t support previousElementSibling, that it therefore has a view of the DOM in which whitespace is not included. Such an assumption would hold up in IE, but nowhere else.