How to delete all HTML elements that include a certain special character?

I am trying to create a script that deletes any word in an entire webpage if that word includes a special character (a colon), whether at the start of the word, somewhere in the middle, or at the end.

I have tried this:

document.querySelectorAll("*").forEach( (element)=>{

    if ( element.innerHTML.includes(':') ) {

        element.style.display = 'none';

    }

});

But it hides everything on the webpage, not just those words.

Please kindly share with us what’s wrong with my code.

Using textContent instead of innerHTML gives the same result.

Selecting * will also match the body node.
The body node contains all text in the page.
The text contains a colon.
So the body is hidden.

Code’s doing exactly what you told it to do.
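
To make that concrete, here is a minimal sketch, assuming you only want to hide elements that hold their text directly (elements with no child elements). The helper name isLeafWithColon is made up for illustration and is not from the original post. Elements with mixed content, such as a paragraph containing a link, are skipped by this check.

```javascript
// Hypothetical helper: true only for elements that have no child elements
// and whose own text contains a colon.
function isLeafWithColon(element) {
  return element.children.length === 0 && element.textContent.includes(':');
}

// Only touch the DOM when one is available (i.e. in a browser).
if (typeof document !== 'undefined') {
  document.body.querySelectorAll('*').forEach((element) => {
    if (isLeafWithColon(element)) {
      element.style.display = 'none';
    }
  });
}
```

Because the body and other containers have child elements, they no longer match, so only the innermost elements with a colon are hidden.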


Use regular expressions like this:

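var el = document.body.querySelectorAll("*");
for (let z = 0; z < el.length; z++) {
  // Note: assigning textContent replaces all of an element's child nodes with a single text node.
  el[z].textContent = el[z].textContent.replace(/\s\w*:\w*\s/g, " ").replace(/(:\w*\s|\s\w*:)/g, "");
}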

Thank you very much for the regex example. I should note, though, that words which include colons, such as recipes:chinese_stir_fried_vegetables, still appear.

Thanks for notifying that.
I understand the point behind what you wrote, and indeed if I put the list container instead of “body”, everything in the list still gets deleted (i.e. also things without a colon prefix).

Should I still use the includes() method here at all?

Okay, thanks to the helpful comments above, I have understood what I actually need to do:

After specifying the area I want to work in (in the case above, the entire body area of the document),

I just need to delete all <li> elements that contain a colon, so all I did for a successful test of the example was change ("*") to ("li").
(Then, by default, all <li> elements in the document that contain a colon are deleted.)

Hi @bendqh1, yes you can use includes(), but for a more robust approach you might iterate not over HTML elements (as returned by querySelectorAll()) but over the actual text nodes in the document; if a given text node includes a colon, hide its parent element. You can use a tree walker for this like so:

const walker = document.createTreeWalker(
  // The root node
  document.body,
  // Only walk text nodes
  NodeFilter.SHOW_TEXT,
  // Further filter to only those text 
  // nodes containing a colon
  {
    acceptNode (node) {
      return node.textContent.includes(':')
        ? NodeFilter.FILTER_ACCEPT
        : NodeFilter.FILTER_REJECT
    }
  }
)

// Hide the parent elements for all these nodes
let node

while ((node = walker.nextNode())) {
  node.parentElement.style.display = 'none'
}

This way you can be sure to hide only the direct parent elements of the text nodes found.


The goal posts keep moving! The title of this thread is about deleting elements, the text of the original post is about deleting words and now this thread is about deleting <li> elements :grinning:

My post #4 is for deleting words containing a colon.

I find my code does delete words containing underscores and one colon. For the regular expression to work as required, words can contain only letters, numbers, underscores and one colon.
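
For words that contain other characters as well, a looser pattern may be easier to reason about. The following is a minimal sketch, assuming a "word" means any run of non-whitespace characters. The name removeColonWords is hypothetical, and the sketch edits text nodes rather than an element's textContent so that child elements stay in place. Skipping SCRIPT and STYLE parents is also an assumption, made so that inline code and CSS are left alone.

```javascript
// Hypothetical helper: drop every whitespace-delimited word that contains
// at least one colon, then collapse the runs of spaces left behind.
function removeColonWords(text) {
  return text.replace(/\S*:\S*/g, '').replace(/ {2,}/g, ' ');
}

// Only touch the DOM when one is available (i.e. in a browser).
if (typeof document !== 'undefined') {
  const skippedParents = new Set(['SCRIPT', 'STYLE']);
  const walker = document.createTreeWalker(document.body, NodeFilter.SHOW_TEXT);
  let node;

  while ((node = walker.nextNode())) {
    const parent = node.parentElement;
    if (parent && skippedParents.has(parent.tagName)) continue;
    if (node.nodeValue.includes(':')) {
      // Changing nodeValue edits the text in place without touching siblings.
      node.nodeValue = removeColonWords(node.nodeValue);
    }
  }
}
```

With this pattern a word such as recipes:chinese_stir_fried_vegetables is removed whole, as is a word with several colons.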

Here’s one way of deleting <li> elements containing one or more colons:

var el = document.body.querySelectorAll("LI");
for (let z = 0; z < el.length; z++) {
  if (el[z].innerText.includes(":")) el[z].parentElement.removeChild(el[z]);
}

Also need to make sure it’s actually a colon. : is a colon. is not. Can you tell the difference? A regex compiler can.

Try running the line through “What Unicode character is this?” (babelstone.co.uk) and see whether it is actually a colon or not…

(This often happens with “quotation marks” as well… Macs (and forums, apparently!) in particular are fond of using ‘fancy quotes’ which aren’t actually the quotation mark symbol.)
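
The same check can be done in code by printing the code point of each character. This is a minimal sketch: toCodePoints is a hypothetical helper, and the fullwidth colon (U+FF1A) and the curly quote (U+201C) are just examples of look-alikes picked for illustration, not necessarily the characters on your page.

```javascript
// Hypothetical helper: list the Unicode code point of every character.
function toCodePoints(text) {
  return Array.from(text, (ch) =>
    'U+' + ch.codePointAt(0).toString(16).toUpperCase().padStart(4, '0')
  );
}

console.log(toCodePoints(':'));      // ASCII colon: [ 'U+003A' ]
console.log(toCodePoints('\uFF1A')); // fullwidth colon: [ 'U+FF1A' ]
console.log(toCodePoints('\u201C')); // left curly quote: [ 'U+201C' ]

// A look-alike is a different character, so includes(':') does not find it.
console.log('\uFF1A'.includes(':')); // false
```

If the page turns out to use a look-alike, the character searched for has to be that exact one.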


I’m confused, but if the aim is to delete any word starting with a semicolon within a string:
if (/(^|\s);\w+/.test(str)) {…}
Within the block:
let result = str.replace(/(^|\s);\w+/g, "$1");
Then collapse the double spaces left behind with:
str = result.replace(/ {2,}/g, " ");
str is now the original string minus the words which start with a ;.
If the offending words only occur within paragraph elements, then select those P elements with:
document.getElementsByTagName("P");
Then iterate over the resulting collection, passing each node to a function containing the above code.
Each item in the collection is a reference to the actual element in the document, so changing its textContent (or innerText) inside the function changes it in the document.
Or, thinking outside the box, copy the HTML file and save it as a text file.
Then filter that file through the replace commands, adding the multiline (m) flag along with the global (g) flag.
Both flags are lowercase in JavaScript.
This info should put you on the right track.
Anything to do with text is most easily accomplished with regular expressions.
