The LogValidator from the World Wide Web Consortium (W3C) is a useful tool that lets webmasters and administrators regularly check the validity of the documents on their web sites, large or small.
The tool is available from CPAN as the Perl module W3C::LogValidator, and can also be downloaded and installed manually (see the links below).
LogValidator is built from several modules and can be run from cron on a schedule of your choice. Its modules include checks for HTML, XHTML and CSS validity. By analyzing your web server logs, the program can be configured to find the most popular pages on your site, run them through the W3C validator(s), and report which of those documents are invalid. Invalid documents can then be fixed by hand or with a tool such as Tidy, also available from the W3C.
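To illustrate the idea of picking "most popular pages" from a log, here is a minimal Python sketch. It is not LogValidator's own code (LogValidator is written in Perl). It assumes an Apache-style Common Log Format access log, counts only successful GET requests, and uses a hypothetical suffix filter to decide which paths count as documents worth validating.

```python
import re
from collections import Counter

# Common Log Format: host ident authuser [date] "request" status bytes
CLF_PATTERN = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" (?P<status>\d{3}) \S+'
)

# Hypothetical filter: treat these suffixes (and directory URLs) as documents.
DOC_SUFFIXES = (".html", ".htm", ".xhtml", "/")


def top_documents(log_lines, limit=10):
    """Return up to `limit` (path, hits) pairs, most-requested first."""
    hits = Counter()
    for line in log_lines:
        match = CLF_PATTERN.match(line)
        if not match:
            continue  # skip malformed or non-CLF lines
        if match["method"] != "GET" or match["status"] != "200":
            continue  # only count successful page fetches
        path = match["path"].split("?", 1)[0]  # ignore query strings
        if path.endswith(DOC_SUFFIXES):
            hits[path] += 1
    return hits.most_common(limit)


if __name__ == "__main__":
    sample = [
        '192.0.2.1 - - [10/Oct/2004:13:55:36 -0700] "GET /index.html HTTP/1.1" 200 2326',
        '192.0.2.2 - - [10/Oct/2004:13:56:01 -0700] "GET /index.html HTTP/1.1" 200 2326',
        '192.0.2.3 - - [10/Oct/2004:13:57:12 -0700] "GET /style.css HTTP/1.1" 200 512',
        '192.0.2.4 - - [10/Oct/2004:13:58:40 -0700] "GET /about/ HTTP/1.1" 200 1024',
        '192.0.2.5 - - [10/Oct/2004:13:59:02 -0700] "GET /missing.html HTTP/1.1" 404 0',
    ]
    print(top_documents(sample))  # [('/index.html', 2), ('/about/', 1)]
```

The resulting list is what you would then hand to a validator; the stylesheet and the 404 are filtered out before counting.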
For large sites, LogValidator can help reveal problems in template engine output: if the same errors keep appearing on newly published pages, the templates are a likely cause. Additionally, if your site is large and you intend to validate all of its pages regularly, you can download and install the W3C validators locally on your network and point the LogValidator configuration file at that local installation.
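One way to spot template-level problems is to look for the same error message across many pages. The Python sketch below assumes you already have validation results as a hypothetical dictionary mapping each page path to a list of error messages; that input format is an assumption for illustration, not LogValidator's actual report format.

```python
from collections import defaultdict


def recurring_errors(errors_by_page, min_pages=2):
    """Map each error message to the sorted list of pages it appears on,
    keeping only errors seen on at least `min_pages` distinct pages."""
    pages_by_error = defaultdict(set)
    for page, messages in errors_by_page.items():
        for message in set(messages):  # count each error once per page
            pages_by_error[message].add(page)
    return {
        message: sorted(pages)
        for message, pages in pages_by_error.items()
        if len(pages) >= min_pages
    }


if __name__ == "__main__":
    # Hypothetical validator results, keyed by page path.
    results = {
        "/news/2004/10/a.html": ['end tag for "ul" omitted', 'there is no attribute "border"'],
        "/news/2004/10/b.html": ['end tag for "ul" omitted'],
        "/about.html": ['required attribute "alt" not specified'],
    }
    print(recurring_errors(results))
    # {'end tag for "ul" omitted': ['/news/2004/10/a.html', '/news/2004/10/b.html']}
```

An error that shows up on every new page in a section points at the shared template rather than at individual documents.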
The configuration file for LogValidator is well documented and easy to customize for your needs. Results can be written to raw files, produced as HTML reports, or sent by email. The tool and its configuration files can also be set up to scan multiple sites and email each site's report to its respective web administrator.
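The per-site email routing described above can be sketched in Python with the standard library's `email` package. This is only an illustration of the idea, not how LogValidator implements it: the `admins` mapping from host name to administrator address and the `sender` address are hypothetical, and the function builds messages without sending them.

```python
from email.message import EmailMessage
from urllib.parse import urlsplit


def build_site_reports(invalid_urls, admins, sender):
    """Group invalid document URLs by host and build one email per site.

    `admins` is a hypothetical mapping of host name -> administrator address.
    Hosts with no configured administrator are skipped.
    """
    by_host = {}
    for url in invalid_urls:
        host = urlsplit(url).hostname  # lowercased, port stripped
        if host is None:
            continue  # relative or malformed URL: no site to attribute it to
        by_host.setdefault(host, []).append(url)

    messages = []
    for host, urls in sorted(by_host.items()):
        recipient = admins.get(host)
        if recipient is None:
            continue
        msg = EmailMessage()
        msg["From"] = sender
        msg["To"] = recipient
        msg["Subject"] = f"Validation report for {host}: {len(urls)} invalid document(s)"
        msg.set_content("\n".join(urls) + "\n")
        messages.append(msg)
    return messages
```

Each resulting message could then be delivered with `smtplib.SMTP(...).send_message(msg)` against whatever mail server your network uses.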
LogValidator Home –> http://www.w3.org/QA/Tools/LogValidator/
LogValidator Manual –> http://www.w3.org/QA/Tools/LogValidator/Manual
LogValidator Sample Configuration File –> http://dev.w3.org/cvsweb/perl/modules/W3C/LogValidator/samples/logprocess.conf?only_with_tag=HEAD