For example, if a user is allowed to post an article to a wiki or a similar site, how do you ensure the output is well-formed HTML? A single stray, unclosed <strong> tag can make the rest of the page appear bold. I found some suggestions that look decent, but I'd like to hear what the brilliant minds at SitePoint do too!
I think Tidy is the best option. It is bundled with PHP 5 and later, so you won't need to install it from PECL.
I remember a similar thread from a year or so ago, and HTML Purifier was the group consensus. There was some issue with Tidy, but I forget what it was. I'm assuming you also have strip_tags() in place.
Correcting malformed HTML is not an easy task. The way I deal with it is to use Tidy together with some type of HTML filter class. Some people suggest HTML Purifier, but I have never used it. I use HTML_Safe from PEAR. Together with the Tidy extension, it works really well.
Thanks for the suggestions. The HTML allowed is fairly restrictive, but tags can be nested, which is the main challenge. I wanted something fairly lightweight, so I'm going with a mixture of home-grown code to remove unwanted tags and attributes and HTML fixer to correct the malformed HTML.
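For anyone finding this thread later, here is a rough sketch of the general idea described above: keep a whitelist of tags and attributes, and track open tags on a stack so that nested tags left unclosed get closed. It is written in Python with only the standard library to keep it self-contained. It is not the home-grown code or HTML fixer mentioned above, and the whitelist, the `Sanitizer` class, and the `sanitize()` function are all made-up names for illustration. Treat it as a starting point, not a vetted security filter.

```python
from html import escape
from html.parser import HTMLParser

# Hypothetical whitelist: tag name -> set of attributes allowed on that tag.
ALLOWED = {
    "p": set(), "strong": set(), "em": set(),
    "ul": set(), "li": set(), "a": {"href"},
}
VOID = {"br"}  # allowed tags that never take a closing tag


def attr_ok(tag, name, value):
    """Keep an attribute only if it is whitelisted for this tag."""
    if name not in ALLOWED[tag]:
        return False
    if name == "href":
        # Assumption: only plain http(s) links are wanted.
        return bool(value) and value.lower().startswith(("http://", "https://"))
    return True


class Sanitizer(HTMLParser):
    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.out = []    # sanitized output fragments
        self.stack = []  # allowed tags that are currently open

    def handle_starttag(self, tag, attrs):
        if tag in VOID:
            self.out.append(f"<{tag}>")
            return
        if tag not in ALLOWED:
            return  # drop the tag itself; its text content is kept
        kept = "".join(
            f' {name}="{escape(value)}"'
            for name, value in attrs
            if attr_ok(tag, name, value)
        )
        self.out.append(f"<{tag}{kept}>")
        self.stack.append(tag)

    def handle_endtag(self, tag):
        if tag not in self.stack:
            return  # stray or disallowed closing tag: ignore it
        # Close everything opened after the matching tag, then the tag itself.
        while self.stack:
            open_tag = self.stack.pop()
            self.out.append(f"</{open_tag}>")
            if open_tag == tag:
                break

    def handle_data(self, data):
        self.out.append(escape(data, quote=False))

    def result(self):
        # Close whatever the author left open at the end of the input.
        while self.stack:
            self.out.append(f"</{self.stack.pop()}>")
        return "".join(self.out)


def sanitize(html):
    parser = Sanitizer()
    parser.feed(html)
    parser.close()  # flushes any buffered trailing text
    return parser.result()
```

For example, `sanitize('<strong>oops')` returns `<strong>oops</strong>`, so the stray tag from the original question can no longer bleed into the rest of the page. Disallowed tags are dropped but their text is kept, which may or may not be what you want for a wiki.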