How to create a list of values with tree walker?
Date: 2022-03-13

JavaScript
#1

In this webpage there are names of Wikipedia special pages, i.e. pages of the “Special:” namespace.
The names are scattered throughout this long webpage.

I can match the names, download them, and sort them in a list via the shell, this way:

curl https://en.wikipedia.org/wiki/Help:Special_page -s | grep -oP 'Special:\K[a-zA-Z0-9]*' | sort -u > special_page_names

JavaScript

I ask primarily for the sake of learning and experimentation.
Is there a way to copy the names to the clipboard, similarly filtered (as with grep and sort), via a JavaScript tree walker?

const regex = /Special:\K[a-zA-Z0-9]*/
const walker = document.createTreeWalker(
  document.body, 
  NodeFilter.SHOW_TEXT
)
let node;
while ((node = walker.nextNode())) {
    // CODE FOR COPYING SPECIAL PAGE NAMES TO CLIPBOARD COMES HERE
}
#2

Well, firstly I would adjust that while loop so that it doesn’t assign a variable inside the condition.

        let node = walker.nextNode();
        while (node) {
            // CODE FOR COPYING SPECIAL PAGE NAMES TO CLIPBOARD COMES HERE
            ...
            node = walker.nextNode();
        }

As for the regex, JavaScript regular expressions don’t support \K. Instead we use a capture group.

        const regex = /Special:([a-zA-Z0-9]*)/
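To see what the capture group gives you, here is a small sketch that runs on a made-up sample string rather than the real page. It also shows a lookbehind, which is the closest JavaScript gets to \K; lookbehind is part of ES2018, so older browsers may not have it. Note that I've used + instead of * here (my own change, not part of the code above) so that a bare "Special:" with nothing after it is not matched.

```javascript
// Sample text standing in for one text node (made up for illustration).
const sample = "See Special:RecentChanges and Special:Search for details.";

// Capture group: match[0] is the whole match, match[1] is the name only.
const withGroup = sample.match(/Special:([a-zA-Z0-9]+)/);
const firstName = withGroup ? withGroup[1] : null;

// Lookbehind: "Special:" must precede the match but is not part of it.
// With the g flag, match() returns every match as a plain array.
const withLookbehind = sample.match(/(?<=Special:)[a-zA-Z0-9]+/g) || [];

console.log(firstName);
console.log(withLookbehind);
```

The capture group version works everywhere, which is why the rest of this post sticks with it.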

Then I would use that loop for populating an array of special lines.

        const walker = document.createTreeWalker(document.body, NodeFilter.SHOW_TEXT);
        const specialLines = [];
        const regex = /Special:([a-zA-Z0-9]*)/
        let node = walker.nextNode();
        while (node) {
            if (regex.test(node.textContent)) {
                specialLines.push(node.textContent);
            }
            node = walker.nextNode();
        }

From there, we use map to get the regex capture group out of each line.

        const specialTerms = specialLines.map(function getTerm(line) {
            return line.match(regex)[1];
        });
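One thing to be aware of: match with a non-global regex only returns the first match, so a text node that contains two names only contributes one. If that matters, a possible variation is matchAll with the g flag. This is just a sketch; extractNames and sampleLines are names I've made up for the example, and the + quantifier is my own tweak to skip empty names.

```javascript
// Hypothetical helper: returns every name found in an array of strings.
function extractNames(lines) {
    // matchAll requires the g flag; + skips a bare "Special:".
    const regex = /Special:([a-zA-Z0-9]+)/g;
    return lines.flatMap(function getTerms(line) {
        // Each match is an array; index 1 holds the capture group.
        return Array.from(line.matchAll(regex), function getGroup(match) {
            return match[1];
        });
    });
}

// Made-up sample lines standing in for text node contents.
const sampleLines = [
    "Special:Search and Special:Random",
    "no names here",
    "Special:Log"
];
console.log(extractNames(sampleLines).join(" "));
```

With this, lines without a name simply contribute nothing, so the separate test step before pushing is no longer strictly needed.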

Then it’s just a matter of copying that specialTerms array to the clipboard, with a similar output to console.log in case writing to the clipboard doesn’t work.

        if (navigator && navigator.clipboard && navigator.clipboard.writeText) {
            navigator.clipboard.writeText(specialTerms.join(" "));
        }
        console.log(specialTerms.join(" "));
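Keep in mind that writeText returns a promise, and that promise can reject, for example when the page isn't focused because you're typing in the dev tools console, or when permission is denied. The clipboard API is also only available in secure contexts. Here's a rough sketch of handling that; copyText is a hypothetical helper of mine, and the clipboard object is passed in as a parameter only so that the function can be tried outside a browser too.

```javascript
// Hypothetical helper; resolves to true when the text was written,
// false when the clipboard is missing or the write was rejected.
async function copyText(text, clipboard) {
    if (!clipboard || typeof clipboard.writeText !== "function") {
        return false;
    }
    try {
        await clipboard.writeText(text);
        return true;
    } catch (error) {
        // For example, the page is not focused or permission was denied.
        return false;
    }
}

// In a browser, pass navigator.clipboard; elsewhere this resolves to false.
const clipboard = typeof navigator !== "undefined" ? navigator.clipboard : undefined;
copyText("Search Random Log", clipboard).then(function report(copied) {
    console.log(copied ? "Copied to clipboard" : "Clipboard unavailable, use the console output");
});
```

If you run it from the console and the copy fails, clicking on the page first to give it focus is usually worth a try.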

Here’s the full code.

        const walker = document.createTreeWalker(document.body, NodeFilter.SHOW_TEXT);
        const specialLines = [];
        const regex = /Special:([a-zA-Z0-9]*)/
        let node = walker.nextNode();
        while (node) {
            if (regex.test(node.textContent)) {
                specialLines.push(node.textContent);
            }
            node = walker.nextNode();
        }
        const specialTerms = specialLines.map(function getTerm(line) {
            return line.match(regex)[1];
        });
        if (navigator && navigator.clipboard && navigator.clipboard.writeText) {
            navigator.clipboard.writeText(specialTerms.join(" "));
        }
        console.log(specialTerms.join(" "));

All of that could be done inside of the while loop, but doing it the way I’ve done above helps to reduce complexity of what’s going on.
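For comparison, here is roughly what the single-loop version might look like. I've also used it to cover the sort -u part of the original question: a Set drops duplicates, and sort() orders the result. This is only a sketch. collectNames is a made-up name, it accepts anything with a nextNode() method so it can be tried with a stand-in object, and the default sort() compares UTF-16 code units, so the order may differ from the shell's sort depending on your locale.

```javascript
// Hypothetical helper: works with anything that has a nextNode() method
// returning objects with a textContent string, such as a TreeWalker.
function collectNames(walker) {
    const regex = /Special:([a-zA-Z0-9]+)/g;
    // A Set stores each name only once.
    const names = new Set();
    let node = walker.nextNode();
    while (node) {
        for (const match of node.textContent.matchAll(regex)) {
            names.add(match[1]);
        }
        node = walker.nextNode();
    }
    return Array.from(names).sort();
}

// Only runs in a browser, where document exists.
if (typeof document !== "undefined") {
    const walker = document.createTreeWalker(document.body, NodeFilter.SHOW_TEXT);
    console.log(collectNames(walker).join("\n"));
}
```

Joining with "\n" gives one name per line, similar to the special_page_names file from the shell command.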

