What is the shortest, simplest DOM treewalker?

I am having trouble understanding the treewalker code I have read so far, so I am asking for the shortest, simplest example.

All I want is to change every occurrence of the string xyzxyzxyz into abcabcabc, anywhere in the document.

document.body.innerHTML = document.body.innerHTML.replaceAll("xyzxyzxyz","abcabcabc");

Thanks, and with a DOM treewalker (as a concept)?

Well let’s try it out.

Here’s some HTML content, with a range of sections in different places to replace.

<div>xyzxyzxyz into abcabcabc</div>
<div>
  <p>xyzxyzxyz into abcabcabc</p>
</div>
<div>
  <div>Some other content</div>
  <div>And some xyzxyzxyz in the middle.</div>
</div>

Let’s start walking the tree.

const els = document.body.childNodes;
els.forEach(function (el) {
  console.log(el);
});

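That goes through all of the top-level nodes of the tree. Note that childNodes contains text nodes (including the whitespace between elements) as well as elements.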

To walk through other nested elements, we want to place that code in a function, so that the code can be called again.

Here’s placing that code into a function:

function walkElems(node) {
  const els = node.childNodes;
  els.forEach(function (el) {
    console.log(el);
  });
}
walkElems(document.body);

And now, when a node has children, we can walk those child nodes too.

function walkElems(node) {
  const els = node.childNodes;
  els.forEach(function (el) {
    console.log(el);
    if (el.hasChildNodes()) { // el.childNodes is always truthy, even when empty
      walkElems(el);
    }
  });
}
walkElems(document.body);

That’s now walking the tree.
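To see the order in which that recursion visits things, here is a minimal sketch that uses plain objects as stand-ins for DOM nodes, so it can also run outside a browser. The tree data, the name property, and the collectNames helper are made up for illustration; only the childNodes/forEach shape mirrors the code above.

```javascript
// Plain objects standing in for DOM nodes (hypothetical data, not a real DOM).
const tree = {
  name: "body",
  childNodes: [
    { name: "div#1", childNodes: [{ name: "text A", childNodes: [] }] },
    {
      name: "div#2",
      childNodes: [
        { name: "p", childNodes: [{ name: "text B", childNodes: [] }] },
      ],
    },
  ],
};

// Same shape as walkElems: visit each child, then recurse into it.
function collectNames(node, visited = []) {
  node.childNodes.forEach(function (el) {
    visited.push(el.name);
    collectNames(el, visited);
  });
  return visited;
}

const order = collectNames(tree);
console.log(order.join(" > "));
```

Each child is visited before its own children and before its next sibling, which is a depth-first walk in document order.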

But we want to do something. When it’s a text node, we want to replace any content that matches the string.

function walkElems(node) {
  const els = node.childNodes;
  els.forEach(function (el) {
    if (el.hasChildNodes()) { // el.childNodes is always truthy, even when empty
      walkElems(el);
    }
    if (el.nodeType === Node.TEXT_NODE && el.nodeValue.includes("xyzxyzxyz")) {
      el.nodeValue = el.nodeValue.replace(/xyzxyzxyz/g, "abcabcabc");
    }
  });
}
walkElems(document.body);

The regular expression with the g flag in the replace call is there only so that every match in a text node is replaced, not just the first one.
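As a quick illustration of that point, here is a small sketch comparing the three common ways to call it. The sample text is made up; replace and replaceAll are standard String methods (replaceAll needs a reasonably recent browser or Node version).

```javascript
const text = "xyzxyzxyz and again xyzxyzxyz";

// A plain string pattern only replaces the first match.
const firstOnly = text.replace("xyzxyzxyz", "abcabcabc");

// A regular expression with the g flag replaces every match.
const withRegex = text.replace(/xyzxyzxyz/g, "abcabcabc");

// replaceAll with a plain string pattern also replaces every match.
const withReplaceAll = text.replaceAll("xyzxyzxyz", "abcabcabc");

console.log(firstOnly);
console.log(withRegex);
console.log(withReplaceAll);
```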

It’s fair to say that you can’t get a tree walker much simpler than that. https://jsfiddle.net/o5tzyu81/3/

You could remove the && el.nodeValue.includes("xyzxyzxyz") part to try and make it simpler, but then it would be replacing every single piece of text even when it didn’t need replacing, which seems kinda wasteful.

Thanks a lot, dear Paul.

Why is there a “node” keyword there? Or is it just a parameter name?

Also, I think I need to understand nodeType and nodeValue properties.

Instead of el.nodeType === Node.TEXT_NODE, I could have used el.nodeType === 3, but the former helps to make it easier to understand.

Here’s a list of the types of nodes: https://developer.mozilla.org/en-US/docs/Web/API/Node/nodeType
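For a rough feel of those numbers, here is a small lookup sketch covering a few of the most common values. The NODE_TYPE_NAMES table and nodeTypeName helper are hypothetical names for illustration; in a browser you would normally compare against the constants on Node directly.

```javascript
// A few of the numeric nodeType values and the names of the matching
// constants on Node (for example, Node.TEXT_NODE is 3).
const NODE_TYPE_NAMES = {
  1: "ELEMENT_NODE",
  3: "TEXT_NODE",
  8: "COMMENT_NODE",
  9: "DOCUMENT_NODE",
  11: "DOCUMENT_FRAGMENT_NODE",
};

// Hypothetical helper: turn a nodeType number into a readable name.
function nodeTypeName(nodeType) {
  return NODE_TYPE_NAMES[nodeType] || "other";
}

console.log(nodeTypeName(3));
console.log(nodeTypeName(1));
```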

I mean to ask, why is it walkElems(node) and not, say, walkElems(elements)?

The only place that walkElems(node) occurs, is in the first line where the function is defined.

function walkElems(node) {
  ...
}

That is a function definition, which is made clear by the line starting with the word function. walkElems is the name of the function, and node is the function parameter.

Whenever that walkElems function is called with an argument, such as the following:

walkElems(document.body);

The walkElems(…) is a call to the walkElems function, and document.body is the argument that the function is called with. Inside of the walkElems function (the one that’s at the top of the code), that document.body argument is received by the function as the function parameter called node.

As to why it’s node and not elements, that’s because the function is only ever called with one node, that being the parent node of whatever it is that we are walking through.

Why is it node and not element then? It would be weird to use element for the walkElems when we also have el for element in the forEach.

Also, the use of childNodes and nodeType and nodeValue clearly tells us that it’s a node we are dealing with.
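To make the point that the parameter name is only a label, here is a minimal sketch with two functions that differ in nothing but that name. The function names and the fakeBody object are made up for illustration; fakeBody is a plain-object stand-in, not a real DOM node.

```javascript
// Identical bodies; only the parameter name differs.
function countChildrenA(node) {
  return node.childNodes.length;
}

function countChildrenB(whatever) {
  return whatever.childNodes.length;
}

// Plain-object stand-in for document.body (hypothetical data).
const fakeBody = { childNodes: ["a", "b", "c"] };

// The argument is the same object in both calls; each function simply
// receives it under its own parameter name.
console.log(countChildrenA(fakeBody));
console.log(countChildrenB(fakeBody));
```

Choosing node over whatever changes nothing for the JavaScript engine; it only tells the reader what kind of value is expected.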

JFTR though, the problem with the innerHTML approach is that it rebuilds everything inside document.body, so it will void any references to the current document such as event listeners… so you’d better just replace the text content of actual text nodes.

Using the tree walker API you can use a node filter for this:

const walker = document.createTreeWalker(
  document.body, NodeFilter.SHOW_TEXT)

let node

while ((node = walker.nextNode())) {
  // replaceAll, so that every match in the node is replaced, not just the first
  node.textContent = node.textContent.replaceAll(
    'xyzxyzxyz', 'abcabcabc')
}

The brief I was given was “this string, anywhere in the document.” :wink:

(And yes, the original post at 5 AM this morning was hyperbolic, and meant to emphasize the point: be careful what you ask for.)

I managed to understand the treewalker code by Paul Wilkins, but not the treewalker code by m3g4p0p (the problem is mine, of course). It’s just more complicated for me, although slightly shorter.

A good question might be what is the difference between a string and a text node (JavaScriptwise)?

A node is an object that holds multiple properties. A “text node” is a Node of a certain type (strictly speaking, what the walker is reading are Text objects. Text inherits from CharacterData, which inherits from Node, which itself has a base class, EventTarget). Objects in this class chain represent parts of the DOM, and contain properties and methods related to that.

A string is a primitive type within JavaScript. It can also be wrapped in a String object. Strings are not part of the Node/EventTarget chain, and as such do not have the same properties or methods.
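Here is a rough sketch of that difference. The fakeTextNode object is a plain-object stand-in with made-up data (so the snippet can run outside a browser), not a real Text node.

```javascript
// A primitive string and its String object wrapper.
const primitive = "xyzxyzxyz";
const wrapped = new String("xyzxyzxyz");

const primitiveType = typeof primitive; // "string"
const wrappedType = typeof wrapped;     // "object"

// Strings are immutable: replace returns a new string
// and leaves the original untouched.
const replaced = primitive.replace("xyz", "abc");

// Plain-object stand-in for a text node: an object that *holds* a string.
const fakeTextNode = { nodeType: 3, nodeValue: "xyzxyzxyz" };

// Changing the text means assigning a new string to the object's property.
fakeTextNode.nodeValue = fakeTextNode.nodeValue.replace("xyz", "abc");

console.log(primitiveType, wrappedType);
console.log(primitive, replaced, fakeTextNode.nodeValue);
```

This is why the walker code assigns back to nodeValue or textContent: the string itself cannot be changed in place, only swapped for a new one on the node.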

Just to expand on that, createTreeWalker also takes a filter parameter

const walker = document.createTreeWalker(
  document.body,
  NodeFilter.SHOW_TEXT,
  /* filter */
  {
    acceptNode(node) {
      return (node.textContent.includes('xyzxyzxyz'))
        ? NodeFilter.FILTER_ACCEPT
        : NodeFilter.FILTER_SKIP;
    }
  },
  false
)

This would give you an iterator (of sorts) that only goes through nodes that have text content containing ‘xyzxyzxyz’
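To show what the acceptNode callback decides on its own, here is a minimal sketch that runs the same test against plain-object stand-ins for text nodes. The local FILTER_ACCEPT and FILTER_SKIP constants, the fakeTextNodes data, and the use of Array.prototype.filter are assumptions for illustration; a real TreeWalker does this filtering internally while it walks the DOM.

```javascript
// Local copies of the numeric values of NodeFilter.FILTER_ACCEPT (1)
// and NodeFilter.FILTER_SKIP (3), so this runs outside a browser.
const FILTER_ACCEPT = 1;
const FILTER_SKIP = 3;

// Same shape as the filter object passed to createTreeWalker.
const textFilter = {
  acceptNode(node) {
    return node.textContent.includes("xyzxyzxyz")
      ? FILTER_ACCEPT
      : FILTER_SKIP;
  },
};

// Plain-object stand-ins for text nodes (hypothetical data).
const fakeTextNodes = [
  { textContent: "xyzxyzxyz into abcabcabc" },
  { textContent: "Some other content" },
  { textContent: "And some xyzxyzxyz in the middle." },
];

// Keep only the nodes the filter accepts.
const accepted = fakeTextNodes.filter(
  (node) => textFilter.acceptNode(node) === FILTER_ACCEPT
);

console.log(accepted.map((node) => node.textContent));
```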

It could be turned into a more generic function.

const makeWalker = function(type, predicate) {
  return document.createTreeWalker(
    document.body, 
    NodeFilter[`SHOW_${type.toUpperCase()}`],
    (predicate === undefined)
      ? null // null is default and will match everything 
      : {
          acceptNode(node) {
            return (predicate(node) === true)
              ? NodeFilter.FILTER_ACCEPT
              : NodeFilter.FILTER_SKIP;
          }
        },
    false
  )
}

const walker = makeWalker(
  'text', 
  (node) => node.textContent.includes('xyzxyzxyz')
)

The shame is walkers and nodeIterators don’t appear to return an iterable object.

For instance you can’t use a spread operator with them and do [...walker].forEach((node) => dosomething)

I did find a polyfill here for NodeIterators which fixes that using a generator function. I’ve amended it to work with TreeWalkers.

if (typeof TreeWalker.prototype[Symbol.iterator] !== 'function') {
  TreeWalker.prototype[Symbol.iterator] = function* () {
    while (true) {
      const next = this.nextNode();
      if (next === null) break;
      yield next;
    }
  };
}

You could then do something like this

const nodesIterable = makeWalker(
  'text', 
  (node) => node.textContent.includes('xyzxyzxyz')
)

for (let node of nodesIterable) {
  // replaceAll, so that every match in the node is replaced, not just the first
  node.textContent = node.textContent.replaceAll('xyzxyzxyz', 'abcabcabc')
}
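For anyone who wants to try the generator idea without a browser, here is a rough sketch using a stand-in class. FakeWalker and its data are hypothetical; it models only the nextNode() method, which returns null once there is nothing left, as TreeWalker's does.

```javascript
// Hypothetical stand-in for a TreeWalker: only nextNode() is modelled.
class FakeWalker {
  constructor(nodes) {
    this.nodes = nodes;
    this.index = 0;
  }

  // Return the next node, or null when there are none left.
  nextNode() {
    return this.index < this.nodes.length ? this.nodes[this.index++] : null;
  }
}

// A generator method on Symbol.iterator makes instances iterable.
FakeWalker.prototype[Symbol.iterator] = function* () {
  let next;
  while ((next = this.nextNode()) !== null) {
    yield next;
  }
};

const fakeWalker = new FakeWalker([
  { textContent: "first" },
  { textContent: "second" },
]);

// Spread and for...of now work.
const firstPass = [...fakeWalker].map((node) => node.textContent);

// The walker keeps its position, so a second pass finds nothing left.
const secondPass = [...fakeWalker];

console.log(firstPass, secondPass.length);
```

One thing to keep in mind with this design is that iterating consumes the walker: it would have to be reset (or recreated) before it could be walked again.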

Sorry to the OP, this is not shortest or simplest and more than likely I would have gone with Paul’s recursive walk the dom approach. I did think it was kind of interesting though.

Here is a codepen of my experimentations.

I tried to understand whether “node” is a keyword built into the JavaScript language.

@bendqh1,

It can be quite helpful to use console.dir to familiarise yourself with the node tree.

So using the following example

html

<div>
  <p>Some text here</p>
</div>

js

const div = document.querySelector('div');
console.dir(div);

If you then check in your console, you can see a whole list of properties, including:

▼ div
  ...
  nodeName: 'DIV'
  nodeType: 1 // element node
  ...
  ▼ firstElementChild: p // paragraph
    ...
    nodeName: 'P'
    nodeType: 1 // element node
    ...
    ▼ firstChild: text
     nodeName: '#text'
     nodeType: 3 // text node
     textContent: 'Some text here'

Here are the different nodeTypes for reference

Node is an interface defined by the DOM, not a keyword of the JavaScript language.

Unlike a keyword, it’s not reserved, so you might define your own class Node (say)… whether that’s a good idea is another matter though. :-)

Today I learned that there is an actual tree walker built into the browser’s DOM API.
