Selecting and Modifying Text That Is Not Unique

I’m trying to select and modify text on my webpage that is not unique. For instance, suppose I have multiple sibling HTML elements that contain the same text. If I select the text in one of them and then try to modify it using a regex (e.g. wrap it in span tags, replace it with different text, delete it), the regex only finds the first match, which may or may not be the text I selected.
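
To make the problem concrete, here is a minimal sketch on a plain string. The helper names `findOccurrences` and `occurrenceIndexAt` are made up for illustration and are not from any library. A single search always lands on the first hit, whereas a character offset (which in a browser could plausibly come from the Selection API, e.g. `anchorOffset`) identifies which occurrence was actually meant.

```javascript
// Hypothetical helper: start index of every occurrence of `needle` in `text`.
const findOccurrences = function (text, needle) {
  const indices = []
  if (needle === '') return indices

  let from = text.indexOf(needle)
  while (from !== -1) {
    indices.push(from)
    from = text.indexOf(needle, from + needle.length)
  }
  return indices
}

// Hypothetical helper: which occurrence (0-based) starts at `offset`,
// or -1 if no occurrence starts there.
const occurrenceIndexAt = function (text, needle, offset) {
  return findOccurrences(text, needle).indexOf(offset)
}

const text = 'Total: 10 | Total: 20 | Total: 30'

// a single search only ever reports the first match
const firstMatch = text.search(/Total/)

// the offset of the selection tells us which match was meant
const allMatches = findOccurrences(text, 'Total')
const picked = occurrenceIndexAt(text, 'Total', 12)
```

Here `firstMatch` is the position of the first "Total" only, while `picked` says the offset belongs to the second one. On a real page the same idea applies per text node rather than to one long string.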

I did some research into using the TreeWalker API for this, but the examples I saw weren’t very helpful. Please suggest the best solution for situations where the same selected text exists in multiple HTML elements.

Is this similar to what you had in mind?

It could do with a bit of a cleanup/refactoring

const getDuplicates = function() {
  // text content seen so far; a Set avoids false hits on
  // inherited object keys such as 'constructor'
  const seen = new Set()
  
  // true for whitespace-only text nodes e.g. '\n  '
  const isNewLine = function (node) {
    return /^\s*$/.test(node.textContent)
  }
  
  return document.createNodeIterator(
    document.body,
    NodeFilter.SHOW_TEXT,
    (node) => {
      // ignore new line text nodes e.g. '\n '
      if (isNewLine(node)) return NodeFilter.FILTER_SKIP

      // if a node with the same text content has already been seen
      // accept it as a duplicate
      if (seen.has(node.textContent)) {
        return NodeFilter.FILTER_ACCEPT
      }
      // first appearance, so record it and skip
      seen.add(node.textContent)
      return NodeFilter.FILTER_SKIP
    }
  )
}

// get the iterator
const duplicatesIterator = getDuplicates()

let currentNode
let duplicateNodes = []

// iterate and store in an array
while (currentNode = duplicatesIterator.nextNode()) {
  duplicateNodes.push(currentNode)
}

// creates a span around a text element
const createSpan = function(textContent) {
  const span = document.createElement('span')
  
  span.className = 'duplicate'
  span.textContent = textContent
  return span
}

// loop through duplicates and wrap a span
duplicateNodes.forEach(
  (node) => node.replaceWith(createSpan(node.textContent))
)

https://codepen.io/rpg2019/pen/KKRGLGP

While I’m sure rpg’s code works great…

Why are you doing this with Javascript? Feels like we’re missing a piece of the situation/problem statement here…
If it’s your webpage… go into the HTML in your HTML-editor of choice, find the duplicates and… do whatever you were going to do with them?

That would be far too easy! :slight_smile:

Glad I did this version too then :banghead:

const walkTheDom = function (node, callback) {
  callback(node)
  node = node.firstChild
  while (node) {
    walkTheDom(node, callback)
    node = node.nextSibling
  }
}

const isValidTextNode = function (node) {
  // matches whitespace-only text e.g. '\n  '
  const isNewLine = /^\s*$/
  return (node.nodeType === 3 && !isNewLine.test(node.textContent))
}

const getDuplicates = function (root = document.body) {
  // text content seen so far; a Set avoids false hits on
  // inherited object keys such as 'constructor'
  const seen = new Set()
  const duplicates = []
  
  walkTheDom(root, (node) => {
    if (!isValidTextNode(node)) return
    
    // if a node with the same text content has already been seen
    // store it as a duplicate
    if (seen.has(node.textContent)) duplicates.push(node)

    // record the text content
    seen.add(node.textContent)
  })
  
  return duplicates
}

// creates a span around a text element
const createSpan = function(textContent) {
  const span = document.createElement('span')
  
  span.className = 'duplicate'
  span.textContent = textContent
  return span
}

// loop through duplicates and wrap a span
getDuplicates(document.body).forEach(
  (node) => node.replaceWith(createSpan(node.textContent))
)

Hi all, thanks for your replies, especially you, rpg_digital. The reason for creating this functionality is for presentation purposes. Suppose I want to highlight some info on my page on the fly while giving a presentation to a group of people in a video conference. Being able to highlight or increase the size of any piece of info (especially dynamically loaded info) that is deemed important to the group would be very useful, as it can draw the attention of all meeting attendees to the highlighted info.


document.body.innerHTML = document.body.innerHTML.replaceAll(new RegExp(window.getSelection().toString(), "ig"), "<span class='emphasize'>$&</span>")

?

Unless your selected text is likely to match a classname or HTML component…
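
That caveat can be shown on plain strings. The following is only a sketch: `escapeRegExp` and `emphasizeInHtml` are hypothetical helpers written for this example, with `emphasizeInHtml` standing in for the `innerHTML` replacement. Escaping the selection keeps characters such as parentheses from being read as regex syntax, but it does nothing to stop the pattern from matching inside tags and attributes.

```javascript
// Hypothetical helper: escape characters that have special meaning in a RegExp.
const escapeRegExp = function (str) {
  return str.replace(/[.*+?^${}()|[\]\\]/g, '\\$&')
}

// Hypothetical helper: mirrors the innerHTML approach on a plain HTML string.
const emphasizeInHtml = function (html, selected) {
  const rx = new RegExp(escapeRegExp(selected), 'ig')
  // $& inserts the text that was actually matched
  return html.replace(rx, "<span class='emphasize'>$&</span>")
}

// selection contains regex special characters, but only matches visible text
const safe = emphasizeInHtml('<p>Price (USD)</p>', 'price (usd)')

// selection also matches the class attribute value, so the markup is corrupted
const broken = emphasizeInHtml('<p class="note">A note</p>', 'note')
```

In the second call the span is injected into the `class` attribute as well as the visible text, which is why working on text nodes tends to be safer than working on the HTML string.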

I think there is an issue with replacing body.innerHTML, in that you will lose all event listeners attached to elements on the page.

That, and there seems to be some weirdness with ending up with orphaned elements. I need to do some looking into that.

Anyway, I have taken m_hutley’s getSelection and replaceAll and adapted it.

const walkTheDom = function (node, callback) {
  callback(node)
  node = node.firstChild
  while (node) {
    walkTheDom(node, callback)
    node = node.nextSibling
  }
}

const isValidTextNode = function (node) {
  // matches whitespace-only text e.g. '\n  '
  const isNewLine = /^\s*$/
  return (node.nodeType === 3 && !isNewLine.test(node.textContent))
}

const getTextParentNodes = function (root = document.body) {
  const parents = []
  
  walkTheDom(root, (node) => {
    if (isValidTextNode(node)) parents.push(node.parentElement)
  })
  
  return parents
}


const highlightDuplicates = function (strgToMatch) {
  const parents = getTextParentNodes()
  // escape regex special characters so the selection is matched literally
  const escaped = strgToMatch.replace(/[.*+?^${}()|[\]\\]/g, '\\$&')
  const strgRx = new RegExp(`\\b${escaped}\\b`, 'ig')

  parents.forEach((parent) => {
    parent.innerHTML = parent.innerHTML.replaceAll(
      strgRx, `<span class="emphasise">${strgToMatch}</span>`
    )
  })
}

const removeHighlights = function (strgToMatch) {
  const spans = document.querySelectorAll('.wrapper span.emphasise')

  spans.forEach((span) => {
    if (span.textContent !== strgToMatch) return
    
    span.replaceWith(document.createTextNode(strgToMatch))
  })
}

const highlightHandler = function () {
  const selection = window.getSelection()
  const selectedString = selection.toString().trim()
  
  if (selectedString === '') return

  const anchorNode = selection.anchorNode
  
  // if selection is wrapped in a span, remove spans for that word/words
  if (anchorNode.parentElement.matches('span.emphasise')) {
    removeHighlights(anchorNode.textContent)
  } else {
    highlightDuplicates(selectedString)
  }
}

const highlightBtn = document.querySelector('#toggle-highlight')
highlightBtn.addEventListener('click', highlightHandler)

The code could do with some refactoring. There’s a question mark over how I am selecting nodes. But for now here is a codepen.

Currently you need to select whole words to highlight all instances. You can also select multiple words.

The highlight button is a toggle, so if you select a highlighted word or part of it, it will remove the span tags for those duplicate words. Does that make sense?

I’m sure it still needs work.

This turned into a little project for me.

First off I wanted to match on text selection rather than having to click a button first. I also wanted the selection to expand to full words automatically.

I did try using the built-in selection.modify, but in the end opted for using while loops to trim and expand the selections. In particular, I wanted selections with leading or trailing spaces to trim to the word rather than grab adjacent words.
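
For readers who want the gist of the trim-then-expand idea, here is a rough sketch on a plain string. It is not the code from the pen: `expandToWords` and `isWordChar` are hypothetical names, and it assumes the selection is given as a start/end offset pair into a single piece of text.

```javascript
// Hypothetical helper: true if `ch` is a word character (letter, digit, underscore).
const isWordChar = (ch) => ch !== undefined && /\w/.test(ch)

// Hypothetical helper: given a text and a selected [start, end) range,
// trim surrounding whitespace, then grow both ends to word boundaries.
const expandToWords = function (text, start, end) {
  // trim leading / trailing whitespace so adjacent words are not grabbed
  while (start < end && /\s/.test(text[start])) start++
  while (end > start && /\s/.test(text[end - 1])) end--

  // nothing but whitespace was selected
  if (start === end) return [start, end]

  // expand outwards while the neighbouring character is part of a word
  while (start > 0 && isWordChar(text[start - 1])) start--
  while (end < text.length && isWordChar(text[end])) end++

  return [start, end]
}

const sentence = 'Highlight matching text quickly'

// a partial selection ("tching te") grows to the full words
const [from, to] = expandToWords(sentence, 12, 21)
const expanded = sentence.slice(from, to)

// a selection with surrounding spaces (" matching ") shrinks to the word
const [trimFrom, trimTo] = expandToWords(sentence, 9, 19)
const trimmed = sentence.slice(trimFrom, trimTo)
```

With these inputs, `expanded` is "matching text" and `trimmed` is "matching". A real implementation would also have to map the offsets back onto DOM ranges, which this sketch leaves out.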

This was a project in itself.

Expand selection to full words


I have that gut feeling that this could have been achieved in just a few lines of code.

Then the match duplicates script which imports expand selection

Highlight matching text

  1. Selections work backwards and forwards and can include inline elements (spans, italics, etc.).
  2. Making a selection that includes already highlighted text removes the inner spans and wraps that selection in a new highlighting span.
  3. Selecting inside highlighted text removes the highlight from all matches.
  4. Double clicking is currently disabled as there are a few niggles. Very odd in Firefox: if you select inside an inline element, window.getSelection actually selects the previous sibling. If you give that inline element a display of inline-block, it selects correctly.

Just a bit of fun.

Hi all, thanks for all of your input. I have been swamped with work and did not have time to do much else. I will analyze the code and come back with questions.
