jQuery Removing Bad Characters in HTML

Sam Deering

I previously wrote about using jQuery to Strip All HTML Tags From a Div. This time the goal is to remove all bad characters from an HTML string (which may have been provided by a $.getScript() call or similar).

Here is an easy way to clean up your HTML and remove bad characters. It is useful when you get the HTML from somewhere and want to run .match() against it, but .match() throws an error because of the bad characters. We can do this with a regex and still retain our HTML tags, like so:

//clean up string/HTML (remove bad chars but keep html tags)
//"<" and ">" must be in the whitelist, otherwise the tags are stripped as well
rawData = rawData.replace(/[^<>a-zA-Z 0-9]+/g,'');
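To make the effect of a whitelist replace like this concrete, here is a minimal sketch run against a made-up string. The names rawSample and stripToBasics are hypothetical, and the pattern assumes the angle brackets are part of the whitelist so that tags survive.

```javascript
// Hypothetical sample input; the non-ASCII characters stand in for "bad" data.
var rawSample = '<p>Hello, world! \u00a9 2012 \u2013 caf\u00e9</p>';

// Whitelist approach: keep angle brackets, letters, digits and spaces,
// and delete every run of anything else.
function stripToBasics(input) {
  return input.replace(/[^<>a-zA-Z 0-9]+/g, '');
}

var cleanedSample = stripToBasics(rawSample);
console.log(cleanedSample); // → "<p>Hello world  2012  caf<p>"
```

Note that the slash in the closing tag is removed too, because "/" is not in the whitelist. Punctuation and accented letters also disappear, so this is only suitable when the text content itself does not need to be preserved exactly.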

If we want to be more specific, we can widen the whitelist so that other characters commonly found in markup (/, ", _, +, - and =) are kept as well:

//clean up HTML ready to be used with match() statement
//the hyphen is escaped so it is read as a literal "-" and not as a range
rawData = rawData.replace(/[^<>\/"_+\-=a-zA-Z 0-9]+/g,'');
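One detail worth knowing when building character classes like the one above: an unescaped hyphen between two characters defines a range, so "+-=" means "everything from + to =", which includes the digits, the comma, the colon and more. The short sketch below (variable names are illustrative only) shows the difference between the range and the escaped, literal form.

```javascript
// Unescaped hyphen: a range from "+" (U+002B) to "=" (U+003D).
var rangeClass = /[+-=]/;

// Escaped hyphen: exactly three literal characters, "+", "-" and "=".
var literalClass = /[+\-=]/;

console.log(rangeClass.test('5'));   // true, "5" falls inside the range
console.log(literalClass.test('5')); // false
console.log(literalClass.test('-')); // true
```

Escaping the hyphen, or placing it first or last in the class, avoids whitelisting characters by accident.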

cleanUpHTML() Function

I wrote this little function to help with cleaning up the HTML so it is ready to be used with a regex.

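/* clean up HTML for use with .match() statement or regex */
var JQUERY4U = {};
JQUERY4U.UTIL =
{
	cleanUpHTML: function(html) {
		//swap every single quote for a double quote (a string pattern only replaces the first one)
		html = html.replace(/'/g,'"');
		//strip anything outside the whitelist; "-", "[" and "]" are escaped so they are read literally
		html = html.replace(/[^<>\/"_+\-?!\[\]{}()=*.|a-zA-Z 0-9]+/g,'');
		return html;
	}
};
//usage:
var cleanedHTML = JQUERY4U.UTIL.cleanUpHTML(htmlString);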
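As a rough end-to-end illustration, the sketch below cleans a made-up snippet and then runs .match() on the result. The standalone function cleanUpHTMLSketch, the sample markup and the href pattern are all assumptions for the example, and the whitelist escapes "-", "[" and "]" so they are treated as literal characters.

```javascript
// Sketch of the cleaning step; the exact whitelist is an assumption.
function cleanUpHTMLSketch(html) {
  return html
    .replace(/'/g, '"') // normalise all single quotes to double quotes
    .replace(/[^<>\/"_+\-?!\[\]{}()=*.|a-zA-Z 0-9]+/g, ''); // drop everything else
}

// Hypothetical markup with a non-breaking space and an arrow mixed in.
var htmlString = "<a href='page.html'>Read\u00a0more \u2192</a>";
var cleanedHTML = cleanUpHTMLSketch(htmlString);
console.log(cleanedHTML); // → <a href="page.html">Readmore </a>

// match() returns null when nothing matches, so guard before indexing.
var hrefMatch = cleanedHTML.match(/href="([^"]+)"/);
var href = hrefMatch ? hrefMatch[1] : null;
console.log(href); // → page.html
```

Because the quotes are normalised first, the pattern passed to .match() only has to deal with double-quoted attributes.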

More Copy and Paste Regex Examples