Problem with chrome extension that results in endless loop

Hey there,

I am currently trying to code a JavaScript Chrome extension to highlight some words and add their definitions within a tooltip. Everything seems to work, except that certain definitions I want to add also contain words that are supposed to be highlighted on the original page. The result is a kind of loop: some words within the tooltip are highlighted, a new tooltip is created over the first one, and so on.

I first mapped all the words I wanted to replace to regular expressions and stored them with the corresponding HTML to replace them with:

<span class='tooltip'>the<span class='tooltipcontent'>This is a test to check if this extension works and shows this content as a tooltip.</span></span>

I then used a TreeWalker to get the nodes I was interested in (the text nodes) and stored them in a list:

function walkDoc(target){
  let treeWalker = document.createTreeWalker(target, NodeFilter.SHOW_TEXT, {
    acceptNode : function(node){
      if(node.textContent.length === 0){
        return NodeFilter.FILTER_SKIP;
      }
      return NodeFilter.FILTER_ACCEPT;
    }
  }, false);
  console.log('treewalker created');

  nodeList = [];
  console.log('nodelist created');

  while(treeWalker.nextNode()){
    nodeList.push(treeWalker.currentNode);
  }
  console.log(nodeList);
  i = 1;

  nodeList.forEach(function(n){
    console.log('node '+i);
    replaceNode(n);
    i++;
  });
}

Finally, I created a replacement function that takes each node from the list, checks whether one of the desired words is in it, and replaces it like so:

function replaceNode(node){
  if(node.nodeName !== '#text' || node.parentNode.nodeName === 'SCRIPT' || node.parentNode.nodeName === 'STYLE'
     || node.parentNode.className == 'tooltip' || node.parentNode.className == 'tooltipcontent'){
    console.log('error node type');
    return;
  }

  let content = node.textContent;
  console.log('current content:'+content);

  for(let [word, def] of glossary){
    let regex = rgx.get(word);
    content = content.replace(regex, def);
  }

  if(content !== node.textContent){
    let newSpan = document.createElement('span');
    newSpan.innerHTML = content;
    node.parentNode.replaceChild(newSpan, node);
    console.log('node replaced');
  }
}

So the problem is that I get several tooltips when I should get just one if a word from the list is in the text. I understand where this comes from, but I don’t really know how to avoid it. I have tried to skip the nodes with the class name ‘tooltip’ or ‘tooltipcontent’, but it does not seem to work. I have also tried to skip the nodes whose inner HTML contains the word ‘tooltip’ or ‘tooltipcontent’, but with no result.

I usually try my best to find a solution on my own, but this time I feel that I am missing something simple and I can’t figure out what. Would anyone be able to help me find a solution?

Thank you

Hi @mattertonn, can you provide some dummy values for glossary and rgx so we can reproduce the issue? BTW you have a couple of implicit globals in your code… that’s probably not causing the bug in question, but should still be avoided (and would throw an error in strict mode).
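
To illustrate the strict-mode point, here is a minimal sketch. The function names are made up for the example and are not part of the extension code; the only thing taken from the original is the undeclared `nodeList` assignment.

```javascript
function withImplicitGlobal () {
  'use strict'
  // `nodeList` is never declared with let/const/var, so in strict mode
  // this assignment throws instead of silently creating a global.
  nodeList = []
  return nodeList
}

function withDeclaredLocal () {
  'use strict'
  // Declared locally: nothing leaks into the global scope.
  const nodeList = []
  nodeList.push('some node')
  return nodeList
}

let errorName = null
try {
  withImplicitGlobal()
} catch (err) {
  errorName = err.name
}

console.log(errorName) // ReferenceError
console.log(withDeclaredLocal().length) // 1
```

Declaring `nodeList` and `i` with `let` or `const` inside `walkDoc` should be enough to get rid of the implicit globals.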

Hi @m3g4p0p and thank you so much for your time,

Here are some values for glossary and their regex equivalents for rgx:

let glossary = new Map();
glossary.set('the'," <span class='tooltip'>the<span class='tooltipcontent'>This is a test to check if this extension works and shows this content as a  tooltip.</span></span> ");
glossary.set('it', " <span class='tooltip'>it<span class='tooltipcontent'>This is a test to check if this extension works and shows this content as a tooltip.</span></span> ");
glossary.set('a', " <span class='tooltip'>la<span class='tooltipcontent'>This is a test to check if this extension works and shows this content as a tooltip.</span></span> ");
glossary.set('to', " <span class='tooltip'>not<span class='tooltipcontent'>This is a test to check if this extension works and shows this content as a tooltip.</span></span> ");

let rgx = new Map();
for(let word of glossary.keys()){
  let r = "\\W+("+word+")\\W+";
  rgx.set(word, new RegExp(r, 'gi'));
}

And also thanks for spotting the implicit globals, I will fix that. To be frank, I was expecting this issue from the beginning, but my lack of experience keeps me from working around it.

Thanks

Thanks! Well, the problem is that inside the for ... of loop, you’re potentially updating the content several times, but without verifying again that the matched content isn’t a child of a tooltip. A possible solution would be to return (or break) from the loop after the first replacement, and then walk the newly created span again:

function replaceContent (node, content) {
  const newSpan = document.createElement('span')
  
  newSpan.innerHTML = content
  node.parentNode.replaceChild(newSpan, node)
  
  return newSpan
}

function replaceNode (node) {
  if (
    node.nodeName !== '#text' ||
    node.parentNode.nodeName === 'SCRIPT' ||
    node.parentNode.nodeName === 'STYLE' ||
    node.parentNode.className === 'tooltip' ||
    node.parentNode.className === 'tooltipcontent'
  ) {
    console.log('error node type')
    return
  }

  console.log('current content:' + node.textContent)

  for (const [word, def] of glossary) {
    const regex = rgx.get(word)
    const newContent = node.textContent.replace(regex, def)

    if (newContent !== node.textContent) {
      const newSpan = replaceContent(node, newContent)
      console.log('node replaced')
      return walkDoc(newSpan)
    }
  }
}

Or, considering the regex works on word boundaries (which, incidentally, will cause the script to break on any non-English website…), split the content on those boundaries and, instead of running a regex per word, use a map function to do a direct word-for-word comparison.
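
To make the difference concrete, here is a DOM-free sketch that compares the chained replacement from the original loop with a single pass over the split words. The glossary entries are shortened stand-ins, the regex is simplified to `\b` boundaries, and the function names are hypothetical; it only counts how many tooltip spans each approach ends up producing.

```javascript
// Shortened stand-in entries (hypothetical values, not the real glossary)
const glossary = new Map([
  ['the', "<span class='tooltip'>the<span class='tooltipcontent'>Definition to show</span></span>"],
  ['to', "<span class='tooltip'>to<span class='tooltipcontent'>Another definition</span></span>"]
])

// Chained replacement: every iteration re-scans the HTML that earlier
// iterations inserted, so words inside a definition get wrapped as well.
function replaceChained (text) {
  let content = text
  for (const [word, def] of glossary) {
    content = content.replace(new RegExp('\\b' + word + '\\b', 'gi'), def)
  }
  return content
}

// Single pass: each token of the ORIGINAL text is looked up exactly once,
// and inserted definitions are never scanned again.
function replaceSinglePass (text) {
  return text
    .split(/\b/)
    .map(token => glossary.get(token.toLowerCase()) || token)
    .join('')
}

// Counts outer tooltip spans (does not match class='tooltipcontent')
function countTooltips (html) {
  return (html.match(/class='tooltip'/g) || []).length
}

const sample = 'go to the shop'
console.log(countTooltips(replaceChained(sample))) // 3 (one is nested inside a definition)
console.log(countTooltips(replaceSinglePass(sample))) // 2
```

In the chained version the "to" inside "Definition to show" is wrapped too, which is the nested tooltip described in the question.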


Oh yeah didn’t think of that! That looks much nicer then indeed…

function replaceNode (node) {
  if (node.nodeName !== '#text') return

  const container = document.createElement('div')
  const fragment = document.createDocumentFragment()

  container.innerHTML = node.textContent
    .split(/\b/)
    .map(word => glossary.get(word.toLowerCase()) || word)
    .join('')

  fragment.append(...container.childNodes)
  node.parentNode.replaceChild(fragment, node)
}

I didn’t quite get the remark about non-English websites though… if you’re referring to non-Latin characters, then neither version will work.


JavaScript’s regex matches \W on any non-word character, but by definition “non-word” is equivalent to [^a-zA-Z0-9_], so accented and non-Latin letters count as non-word characters too.
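
A small sketch of what that means in practice, and one possible way around it using Unicode property escapes (the `u` flag with `\p{L}` etc.). The `tokenize` helper is a hypothetical name, and this assumes an engine that supports property escapes, which current Chrome and Node versions do.

```javascript
// \b only knows [A-Za-z0-9_], so it cuts an accented word in two:
const asciiTokens = 'le caf\u00e9'.split(/\b/)
console.log(asciiTokens) // [ 'le', ' ', 'caf', 'é' ]

// Hypothetical helper: alternating runs of "word" and "non-word" characters,
// where a word character is any Unicode letter, mark, number or underscore.
function tokenize (text) {
  return text.match(/[\p{L}\p{M}\p{N}_]+|[^\p{L}\p{M}\p{N}_]+/gu) || []
}

console.log(tokenize('le caf\u00e9')) // [ 'le', ' ', 'café' ]
```

The output of `tokenize` could be fed to the same map/join step as the split(/\b/) version, since joining the tokens gives back the original text.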

Thank you so much guys,

I just tried the first solution and it seems to work as wanted. I would not have thought of walking the new span again myself, this is good.

I will take some time to try and fully understand the second one later in the week.

Thank you for your time anyway :)
