Opera: Just 4.13% of Web’s Code is Valid
A new study from Opera finds that the overwhelming majority of web sites don’t adequately support web standards. The good news is that, compared to previous studies, more web sites are valid today. The bad news is that just 4.13% of the URLs in the study’s sample, which covered more than 3.5 million web pages, passed the W3C validator.
The results came from Opera’s “Metadata Analysis and Mining Application” (MAMA), a search engine that “indexes the markup, style, scripting and the technology used while creating Web pages.” Opera engineers and data miners can then ask the search engine questions like, “how many sites use CSS?” (the answer is 80.4%) or, “how many markup errors does the average site have?” (it’s 47).
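Opera has not described how MAMA answers these questions internally, so the following is only a minimal sketch of the kind of aggregate query involved, run over a tiny made-up index. The `PageRecord` fields, the helper names and the sample rows are all hypothetical and are not taken from MAMA or its data.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class PageRecord:
    """One indexed page. Hypothetical schema, not MAMA's actual one."""
    url: str
    uses_css: bool
    markup_errors: int
    passes_validation: bool


def share(records: List[PageRecord], predicate: Callable[[PageRecord], bool]) -> float:
    """Return the percentage of records for which predicate is true."""
    if not records:
        return 0.0
    matching = sum(1 for record in records if predicate(record))
    return 100.0 * matching / len(records)


def mean_errors(records: List[PageRecord]) -> float:
    """Return the average number of markup errors per page."""
    if not records:
        return 0.0
    return sum(record.markup_errors for record in records) / len(records)


# Illustrative rows only; these are not figures from the study.
SAMPLE = [
    PageRecord("http://example.com/a", True, 12, False),
    PageRecord("http://example.com/b", True, 0, True),
    PageRecord("http://example.org/c", False, 60, False),
    PageRecord("http://example.net/d", True, 8, False),
]

if __name__ == "__main__":
    print(f"{share(SAMPLE, lambda r: r.uses_css):.1f}% of pages use CSS")
    print(f"{mean_errors(SAMPLE):.1f} markup errors per page on average")
    print(f"{share(SAMPLE, lambda r: r.passes_validation):.1f}% of pages pass validation")
```

Each headline figure in the study is, in this simplified view, a percentage or an average computed over the whole set of indexed pages.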
Opera is vague about what it will do with this tool, other than continue to analyze the data and post additional findings, but a press release hints at the possibility of making it publicly available to web developers.
“MAMA will help Web developers find examples of usage of features and functions, look at trends and gather data to justify technology to their clients or managers,” wrote Opera. “This will also encourage standards bodies to take into account developers’ suggestions about what is happening on the Web in reality and will eventually raise the quality and interoperability of specifications, the Web and browsers.”
Ars Technica, which posted an analysis of the findings today, also seems to think that the search tool Opera writes about will be made public.
The MAMA study findings released today also revealed that roughly 50% of pages sporting “W3C Valid” buttons aren’t actually valid. The reason, surmises Opera, is that keeping pages valid is not easy, and many developers find it difficult to keep up with evolving standards. The key takeaway, says Opera, “is that people are BAD at this ‘HTML thing.’ Improper tag nesting is rampant, and misspelled or misplaced element and attribute names happen all the time.” Silly, casual mistakes are very easy to make, Opera adds, and everyone makes them.
One problem, though, is that the tools people use to create web pages also turn out ugly, invalid code. MAMA looked at validation as it relates to web page editors and content management systems and found that, with the exception of Apple’s iWeb (for which an impressive 81.91% of URLs pass validation), the results were generally dismal. According to MAMA, just 0.55% of pages made with Microsoft FrontPage are valid, and just 3.44% of those made with Adobe Dreamweaver.
For content management systems, the results weren’t much better. Just 9.0% of WordPress pages were valid, along with 12.74% of Typo pages and 6.45% of Joomla pages. Google’s Blogger fared the worst, with a paltry 0.30% of URLs passing validation.
There is a lot of interesting information to dig through in the full results, and Opera is promising to release more as part of a planned “long, multi-part saga.”