
jQuery Removing Bad Characters in HTML

By Sam Deering


I previously wrote about using jQuery to Strip All HTML Tags From a Div. This time the goal is to remove all bad characters from an HTML string (which may have been provided by a $.getScript() call or similar).

This is how you can easily clean up your HTML and remove bad characters. It can be useful when you get HTML from somewhere and want to .match() for strings, but .match() throws an error because of bad characters. We can do this using a regex and still retain our HTML tags, like so:

//clean up string/HTML (remove bad chars but keep html tags)
//the ^ must be the first character inside the brackets to negate the class
rawData = rawData.replace(/[^<>a-zA-Z 0-9]+/g,'');
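
To see what a whitelist replace of this kind does, here is a minimal sketch on a made-up string. The sample input and the name `stripBadChars` are assumptions for illustration, not part of the original snippet. The `^` has to be the first character inside the brackets for the class to be negated; anywhere else it is just a literal caret.

```javascript
// Hypothetical helper: keep only <, >, letters, digits and spaces.
function stripBadChars(rawData) {
  // [^...] is a negated class: it matches every character NOT listed.
  return rawData.replace(/[^<>a-zA-Z 0-9]+/g, '');
}

// Made-up sample containing a null character and a non-breaking space.
var sample = '<p>Hello\u0000 World\u00A0123</p>';
var cleaned = stripBadChars(sample);
console.log(cleaned); // <p>Hello World123<p>
```

Note that the slash in `</p>` is stripped as well, because `/` is not on the list of allowed characters.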

If we want to be more specific, we can also keep other characters that commonly appear in HTML (slashes, quotes, equals signs and so on) and strip everything else:

//clean up HTML ready to be used with match() statement
//the hyphen is escaped so it is a literal character and not a range
rawData = rawData.replace(/[^\/"_+\-<>=a-zA-Z 0-9]+/g,'');
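
One detail worth checking in a character class like this is the hyphen. Between two characters, an unescaped hyphen defines a range, so `+-<` covers every character code from `+` to `<`, which includes `,`, `.`, `:` and `;` along with the digits. The sketch below compares that with an escaped hyphen; the sample string and variable names are assumptions for illustration.

```javascript
// Made-up sample with punctuation that sits between '+' and '<' in ASCII.
var input = 'a,b.c:d;e-f+g';

// Unescaped: "+-<" is a range from '+' (0x2B) to '<' (0x3C).
var rangeClass = /[^+-<a-z]+/g;
// Escaped: '+', '-' and '<' are three separate literal characters.
var literalClass = /[^+\-<a-z]+/g;

var keptByRange = input.replace(rangeClass, '');
var keptByLiteral = input.replace(literalClass, '');

console.log(keptByRange);   // a,b.c:d;e-f+g
console.log(keptByLiteral); // abcde-f+g
```

Escaping the hyphen (or placing it first or last in the class) keeps the whitelist limited to the characters that are actually listed.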

cleanHTML() Function

I wrote this little function to help with the process of cleaning up HTML so that it is ready for running a regex on it.

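/* clean up HTML for use with .match() statement or regex */
var JQUERY4U = {};
JQUERY4U.UTIL = 
{
	cleanUpHTML: function(html) {
		//convert every single quote to a double quote (a string pattern only replaces the first match)
		html = html.replace(/'/g,'"');
		//remove everything not on the whitelist; /, -, [ and ] are escaped so they are read as literal characters
		html = html.replace(/[^\/"_+\-?!<>\[\]{}()=*.|a-zA-Z 0-9]+/g,'');
		return html;
	}
};
//usage: 
var cleanedHTML = JQUERY4U.UTIL.cleanUpHTML(htmlString);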
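
As a rough illustration of the whole flow, the sketch below cleans a made-up HTML string and then runs .match() on the result. It uses a standalone `cleanUpHTML` function so that the block runs on its own; the input string and the `href` pattern are assumptions, not something from a real page.

```javascript
// Standalone sketch of the same idea, so this block runs by itself.
function cleanUpHTML(html) {
  return html
    .replace(/'/g, '"') // normalise every single quote, not just the first
    .replace(/[^\/"_+\-?!<>\[\]{}()=*.|a-zA-Z 0-9]+/g, '');
}

// Made-up input: single quotes plus a null character and a tab.
var htmlString = "<a href='page.html'\u0000>Read more\t</a>";
var cleanedHTML = cleanUpHTML(htmlString);
console.log(cleanedHTML); // <a href="page.html">Read more</a>

// With the stray characters gone, a simple pattern can pull out the href value.
var found = cleanedHTML.match(/href="([^"]+)"/);
console.log(found ? found[1] : null); // page.html
```

Because the quotes are normalised first, the pattern passed to .match() only has to handle double-quoted attributes.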

More Copy and Paste Regex Examples


Sam Deering has 15+ years of programming and website development experience. He was a website consultant at Console, ABC News, Flight Centre, Sapient Nitro, and the QLD Government and runs a tech blog with over 1 million views per month. Currently, Sam is the Founder of Crypto News, Australia.
