Hello,

I want to create a JavaScript function that checks whether a string containing HTML is valid and well-formed.

This is for a Vista gadget that pulls HTML content from an RSS feed. To be approved for inclusion in the Windows Live Gallery, a gadget must validate and sanitize all external input. I believe I have the sanitization working fine, but my experience with JavaScript, and particularly with regular expressions, is a bit limited. I want to make sure I get it right and don't reject input that is actually valid.

Here is an example of the type of function I need...

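// Returns the match array on success, or null on failure.
function isValid(str) {
    // The g and i flags are dropped: the pattern is anchored and has no letters.
    var regexp = /^[\d\-\(\)\s]{6,14}$/;
    return regexp.exec(str);
}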

This function accepts only strings of 6 to 14 characters made up of digits, whitespace, parentheses, and dashes. It rejects most HTML because it does not permit angle brackets, letters, and so on.
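To make that behavior concrete, here is a quick sketch of what the function accepts and rejects. The sample strings are made up for illustration, and the function is repeated (without the g and i flags, which have no effect on this pattern) so the snippet runs on its own.

```javascript
// Same pattern as in the example function, repeated so this runs standalone.
function isValid(str) {
    var regexp = /^[\d\-\(\)\s]{6,14}$/;
    return regexp.exec(str); // match array on success, null on failure
}

// Hypothetical sample inputs, for illustration only.
var samples = ["(555) 123-4567", "12345", "<p>Hello</p>"];
var results = [];
for (var i = 0; i < samples.length; i++) {
    // Convert the exec() result to a plain true/false.
    results.push(isValid(samples[i]) !== null);
}
// results is [true, false, false]:
//   "(555) 123-4567" is 14 allowed characters, so it passes
//   "12345" is too short (5 characters)
//   "<p>Hello</p>" contains angle brackets and letters
```

The last case is exactly the problem: any HTML fragment fails this kind of character whitelist.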

What I'm looking for is a regular expression that accepts any well-formed HTML, including any characters that can legitimately appear in it. Again, my concern is rejecting valid input because of a character I didn't anticipate.

Does anyone have any insight into the best way to accomplish this?

Thanks!