Frankly, you shouldn't just ignore the garbage getting pumped into your code from Word. Even today (!) I run into sites that, on my Linux machine, render characters as ?s when there's ZERO reason for that.
Every document created in Windows that I have ever viewed in vi was filled with ^M and "smart quotes" and other crap.
HTMLvalidator may have given you a good idea of how to validate the rendered source, but I support Jason's idea of cleaning the garbage out before saving to the DB in the first place, if possible. On Windows machines the web page may seem fine. On other machines those same browsers may not bother trying to ignore or convert the funky chars.
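For what it's worth, here's a rough sketch of the kind of cleanup I mean, in Python. The function name `clean_word_text` and the replacement table are my own made-up example, not anything from Jason's post or from HTMLvalidator, and the table is nowhere near exhaustive. It assumes the text has already been decoded to a proper Unicode string before it gets here.

```python
# Hypothetical helper: swap common Word "smart" punctuation for plain ASCII
# and normalize Windows line endings before the text goes into the DB.
# The mapping below is illustrative only; extend it for your own content.
WORD_CHARS = {
    "\u2018": "'",    # left single quote
    "\u2019": "'",    # right single quote / apostrophe
    "\u201c": '"',    # left double quote
    "\u201d": '"',    # right double quote
    "\u2013": "-",    # en dash
    "\u2014": "--",   # em dash
    "\u2026": "...",  # ellipsis
}
_TABLE = str.maketrans(WORD_CHARS)


def clean_word_text(text: str) -> str:
    """Return text with CR/CRLF turned into LF and smart punctuation flattened."""
    # The ^M you see in vi is a carriage return (\r) from Windows line endings.
    text = text.replace("\r\n", "\n").replace("\r", "\n")
    return text.translate(_TABLE)


if __name__ == "__main__":
    print(clean_word_text("\u201cHello\u201d\r\nit\u2019s me\u2026"))
```

If you'd rather keep the curly quotes, skip the table and just make sure they get stored and served as real UTF-8. The point is to deal with it once on the way in, not on every page view.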
(btw I think it's nice when a software vendor can help a member out with particular software. Thanks for posting, HTMLvalidator, and welcome to SitePoint. Just be sure not to cross the spam line or the mods will hunt you down and keep a trophy! :)
Metrolyrics claims to send out its pages as UTF-8. This is what I get:
QuiÃ©n dice cuÃ¡l es la bandera que sobre un pedazo de tierra ondea
quiÃ©n decide quiÃ©n tiene el poder de limitar mi caminar dime quiÃ©n
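Someone's likely getting that text from a Windows program: that "Ã©" pattern is what UTF-8 bytes look like when something along the way reads them as a single-byte Windows encoding.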
I hit the back button and find another site with actually readable information.
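In case anyone wants to see how lyrics end up looking like that, here's a small Python sketch that reproduces the effect. It's a guess at one common cause (UTF-8 bytes decoded as Windows-1252 somewhere in the pipeline), not a diagnosis of what Metrolyrics actually does on their end.

```python
# Reproduce the garbling: encode as UTF-8, then wrongly decode as cp1252.
# Assumption: the site's pipeline does something like this at some step.
original = "Qui\u00e9n dice cu\u00e1l es la bandera"

# The accented e is the two bytes C3 A9 in UTF-8; read as cp1252 they
# become two separate characters, and the accented a (C3 A1) likewise.
garbled = original.encode("utf-8").decode("cp1252")

# When no bytes were lost, reversing the two steps recovers the text.
repaired = garbled.encode("cp1252").decode("utf-8")

if __name__ == "__main__":
    print(garbled)
    print(repaired)
```

The reverse trick only works if nothing got stripped or replaced with ?s along the way, so it's a band-aid. The real fix is to keep everything UTF-8 from the form through the DB to the page.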