Removing Useless Nodes From the DOM

For the third article in this series on short-and-sweet functions, I’d like to show you a simple function that I find indispensable when working with the HTML DOM. The function is called clean(), and its purpose is to remove comments and whitespace-only text nodes. The function takes a single element reference as its argument, and removes all those unwanted nodes from inside it. The function operates directly on the element in question, because when an object is passed to a function in JavaScript, the function receives a reference to the original object, not a copy of it. Here’s the clean() function’s code:

function clean(node)
{
  for(var n = 0; n < node.childNodes.length; n ++)
  {
    var child = node.childNodes[n];
    if
    (
      child.nodeType === 8 
      || 
      (child.nodeType === 3 && !/\S/.test(child.nodeValue))
    )
    {
      node.removeChild(child);
      n --;
    }
    else if(child.nodeType === 1)
    {
      clean(child);
    }
  }
}

So to clean those unwanted nodes from inside the <body> element, you would simply do this:

clean(document.body);

Alternatively, to clean the entire document, you could do this:

clean(document);

Although the usual reference would be an Element node, it could also be another kind of element-containing node, such as a #document. The function is also not restricted to working with HTML, and can operate on any other kind of XML DOM.

Why Clean the DOM

When working with the DOM in JavaScript, we use standard properties like firstChild and nextSibling to get relative node references. Unfortunately, complications can arise when whitespace is present in the DOM, as shown in the following example.

<div>
  <h2>Shopping list</h2>
  <ul>
    <li>Washing-up liquid</li>
    <li>Zinc nails</li>
    <li>Hydrochloric acid</li>
  </ul>
</div>

For most modern browsers (apart from IE8 and earlier), the previous HTML code would result in the following DOM structure.

DIV
+ #text ("\n\t")
+ H2
| + #text ("Shopping list")
+ #text ("\n\t")
+ UL
| + #text ("\n\t\t")
| + LI
| | + #text ("Washing-up liquid")
| + #text ("\n\t\t")
| + LI
| | + #text ("Zinc nails")
| + #text ("\n\t\t")
| + LI
| | + #text ("Hydrochloric acid")
| + #text ("\n\t")
+ #text ("\n")

The line breaks and tabs inside that tree appear as whitespace #text nodes. So, for example, if we started with a reference to the <h2> element, then h2.nextSibling would not refer to the <ul> element. Instead, it would refer to the whitespace #text node (the line break and tab) that comes before it. Or, if we started with a reference to the <ul> element, then ul.firstChild would not be the first <li>; it would be the whitespace before it.

HTML comments are also nodes, and most browsers also preserve them in the DOM – as they should, because it’s not up to browsers to decide which nodes are important and which are not. But it’s very rare for scripts to actually want the data in comments. It’s far more likely that comments (and intervening whitespace) are unwanted “junk” nodes.

There are several ways of dealing with these nodes. For example, by iterating past them:

var ul = h2.nextSibling;
while(ul.nodeType !== 1)
{
  ul = ul.nextSibling;
}
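
The loop above assumes that an element sibling eventually turns up; if there isn't one, ul becomes null and the next check throws. As a rough sketch of a safer variant, the same idea can be wrapped in a helper that stops at the end of the sibling list. The name nextElement is hypothetical, not part of the DOM. (Browsers from IE9 onwards also provide a built-in nextElementSibling property that does much the same job.)

```javascript
// Hypothetical helper: returns the next sibling that is an element
// (nodeType 1), or null if there is no such sibling.
function nextElement(node) {
  var sibling = node.nextSibling;
  // Skip comments, whitespace and any other non-element nodes,
  // stopping when we run out of siblings
  while (sibling && sibling.nodeType !== 1) {
    sibling = sibling.nextSibling;
  }
  return sibling || null;
}

// Browser-only usage; the guard lets the snippet load outside a browser too
if (typeof document !== 'undefined') {
  var heading = document.querySelector('h2');
  var list = heading ? nextElement(heading) : null;
}
```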

The simplest, most practical approach is to remove them. So that’s what the clean() function does – effectively normalizing the element’s subtree, to create a model that matches our practical use of it, and is the same across browsers. Once the <div> element from the original example is cleaned, the h2.nextSibling and ul.firstChild references will point to the expected elements. The cleaned DOM is shown below.

DIV
+ H2
| + #text ("Shopping list")
+ UL
| + LI
| | + #text ("Washing-up liquid")
| + LI
| | + #text ("Zinc nails")
| + LI
| | + #text ("Hydrochloric acid")

How The Function Works

The clean() function is recursive, meaning that it calls itself. Recursion is a very powerful feature, and means that the function can clean a subtree of any size and depth. The key to the recursive behavior is the final branch of the if statement, which is repeated below.

else if(child.nodeType === 1)
{
  clean(child);
}

So, each of the element’s children that is itself an element is passed to clean(). Then, the element children of that child are passed to clean(). This continues until all of the descendants are cleaned.

Within each invocation of clean(), the function iterates through the element’s childNodes collection, removing any #comment nodes (which have a nodeType of 8), or #text nodes (with a nodeType of 3) whose value is nothing but whitespace. The regular expression is actually an inverse test, looking for nodes which don’t contain non-whitespace characters.

The function doesn’t remove all whitespace, of course. Any whitespace that is part of a #text node which also contains non-whitespace text is preserved. So, the only #text nodes to be affected are those which are only whitespace.

Note that the iterator has to query childNodes.length every time, rather than saving the length in advance, which is usually more efficient. We have to do this because we’re removing nodes as we go along, which obviously changes the length of the collection. For the same reason, the counter is decremented (n --) after each removal, so that the node which moves into the vacated position isn’t skipped.
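
Iterating from the last child to the first is one way to avoid adjusting the counter after each removal, because removing a node never shifts the position of the nodes still to be visited. The following is a minimal sketch of that alternative, not the function described in this article; the name cleanBackwards is made up for illustration.

```javascript
// Hypothetical variant of clean() that walks the children in reverse
function cleanBackwards(node) {
  // The length is read once, because nodes at lower indexes are
  // unaffected when a node at a higher index is removed
  for (var n = node.childNodes.length - 1; n >= 0; n--) {
    var child = node.childNodes[n];
    if (
      child.nodeType === 8 ||
      (child.nodeType === 3 && !/\S/.test(child.nodeValue))
    ) {
      // Comment, or text node containing nothing but whitespace
      node.removeChild(child);
    } else if (child.nodeType === 1) {
      // Element: clean its own children in the same way
      cleanBackwards(child);
    }
  }
}

// Browser-only usage; the guard lets the snippet load outside a browser too
if (typeof document !== 'undefined' && document.body) {
  cleanBackwards(document.body);
}
```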

Frequently Asked Questions (FAQs) about Removing Useless Nodes from the DOM

What is a DOM Node in JavaScript?

In JavaScript, a Document Object Model (DOM) Node is an interface from which various types of DOM API objects inherit. This allows these various objects to be treated similarly, as they share the same properties and methods. Nodes can be elements, text, comments, and the document itself. Each node can have a parent, children, and siblings, forming a tree-like structure known as the DOM tree.

How can I identify a useless node in the DOM?

Identifying a useless node in the DOM can be subjective and depends on the specific requirements of your web application. Generally, a useless node could be an empty text node, an unused element, or a comment node that is no longer needed. You can use various DOM properties and methods such as nodeType, nodeName, and nodeValue to inspect a node and determine if it is useless.
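
As one possible illustration, the definition of "useless" used by the clean() function (comments and whitespace-only text) can be written as a small predicate. The name isJunkNode is an assumption made for this sketch; your own application may need a different rule.

```javascript
// Hypothetical predicate: true for comments and whitespace-only text nodes
function isJunkNode(node) {
  var TEXT_NODE = 3;
  var COMMENT_NODE = 8;
  if (node.nodeType === COMMENT_NODE) {
    return true;
  }
  // /\S/ matches any non-whitespace character, so the negated test is
  // true when the value contains none at all (including the empty string)
  return node.nodeType === TEXT_NODE && !/\S/.test(node.nodeValue);
}

// Browser-only usage: count the junk nodes directly inside <body>
if (typeof document !== 'undefined' && document.body) {
  var junkCount = 0;
  for (var i = 0; i < document.body.childNodes.length; i++) {
    if (isJunkNode(document.body.childNodes[i])) {
      junkCount++;
    }
  }
}
```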

What is the difference between removeChild() and remove() methods in JavaScript?

The removeChild() method is used to remove a child node from a parent node. It requires you to select the parent node first, then call the method with the child node as an argument. On the other hand, the remove() method is called directly on the node you want to remove. It does not require a reference to the parent node. However, it’s important to note that the remove() method is not supported in Internet Explorer.
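
If a script has to run both in browsers that have remove() and in those that don't, one option is to test for the method before using it. The sketch below is an illustration only; removeNode is a hypothetical helper name.

```javascript
// Hypothetical helper: uses remove() where available, and falls back
// to parentNode.removeChild() otherwise
function removeNode(node) {
  if (typeof node.remove === 'function') {
    node.remove();
  } else if (node.parentNode) {
    // No remove() method, but the node is attached to a parent
    node.parentNode.removeChild(node);
  }
  // A node with no parent is already detached, so there is nothing to do
}

// Browser-only usage; the guard lets the snippet load outside a browser too
if (typeof document !== 'undefined') {
  var target = document.getElementById('node');
  if (target) {
    removeNode(target);
  }
}
```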

How can I remove all child nodes from a parent node?

To remove all child nodes from a parent node, you can use a while loop in combination with the removeChild() method. Here’s a simple example:

let parentNode = document.getElementById('parent');
while (parentNode.firstChild) {
  parentNode.removeChild(parentNode.firstChild);
}

Can I remove a node without referencing its parent?

Yes, you can remove a node without referencing its parent by using the remove() method. This method is called directly on the node you want to remove. Here’s an example:

let node = document.getElementById('node');
node.remove();

How can I remove a specific text node within a div?

To remove a specific text node within a div, you need to first select the div, then iterate over its child nodes. When you find the text node you want to remove, you can use the removeChild() method. Here’s an example:

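let div = document.getElementById('div');
// Iterate backwards so that removing a node doesn't skip the one after it
for (let i = div.childNodes.length - 1; i >= 0; i--) {
  let node = div.childNodes[i];
  // nodeType 3 is a text node
  if (node.nodeType === 3 && node.nodeValue === 'text to remove') {
    div.removeChild(node);
  }
}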

What happens if I try to remove a node that does not exist?

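It depends on how the removal is attempted. If the reference you hold is null (for example, because getElementById() found nothing), calling remove() on it throws a TypeError. Calling removeChild() with a node that is not a child of that parent, including one that has already been removed, throws a NotFoundError. Calling remove() on a node that has already been detached does nothing. To avoid the errors, check that the node exists, and is still attached to the expected parent, before trying to remove it.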
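
One way to perform that check is sketched below. The helper name removeIfChild is hypothetical, and the sketch assumes you already hold references to both the parent and the candidate child.

```javascript
// Hypothetical helper: removes child from parent only if the relationship
// really exists, and reports whether anything was removed
function removeIfChild(parent, child) {
  if (!parent || !child || child.parentNode !== parent) {
    // Missing reference, or the node is not (or no longer) a child
    return false;
  }
  parent.removeChild(child);
  return true;
}

// Browser-only usage; the guard lets the snippet load outside a browser too
if (typeof document !== 'undefined') {
  var container = document.getElementById('parent');
  var candidate = document.getElementById('node');
  var wasRemoved = removeIfChild(container, candidate);
}
```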

Can I remove an element by its class or id?

Yes, you can remove an element by its class or id using the querySelector() or getElementById() methods to select the element, then calling the remove() method. Here’s an example:

let element = document.querySelector('.class');
element.remove();

How can I remove a node and all its descendants?

To remove a node and all its descendants, you can simply call the remove() method on the node. This will remove the node and all its child nodes.

Can I undo the removal of a node?

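There is no built-in undo for removing a node. However, a removed node is not destroyed as long as you keep a reference to it (removeChild() returns the removed node), so you can insert that same node back into the DOM later. Alternatively, before removing a node, you can copy it using cloneNode(true), which creates a copy of the node and its descendants. If you need to restore the node later, you can append the cloned node back into the DOM.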
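
As a rough illustration of keeping a reference rather than cloning, the sketch below remembers where a node was and returns a function that puts the same node back. The name detach is hypothetical, and the sketch assumes that the node is attached when detach() is called and that its former next sibling is still in place when the node is restored.

```javascript
// Hypothetical helper: removes node and returns a function that restores it
function detach(node) {
  var parent = node.parentNode;
  var next = node.nextSibling; // null when node is the last child
  parent.removeChild(node);
  return function restore() {
    // insertBefore() with a null reference node appends to the end
    parent.insertBefore(node, next);
  };
}

// Browser-only usage; the guard lets the snippet load outside a browser too
if (typeof document !== 'undefined') {
  var item = document.querySelector('li');
  if (item) {
    var undo = detach(item);
    undo(); // the same <li> is back in its original position
  }
}
```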