Thanks clevatreva. I didn't know about that setting. I had it in XHTML mode, with the Strict doctype. I would have thought that putting it into XHTML Strict mode would have done just that: make sure the output was XHTML Strict.
Also, this doesn't fix other problems with its XHTML Strict support. For instance:
Code:
<html>
<body>
<img />
</body>
</html>
Becomes this:
Code:
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta name="generator"
content="HTML Tidy for Linux/x86 (vers 1st November 2002), see www.w3.org" />
<title></title>
</head>
<body>
<img />
</body>
</html>
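This is exactly the same problem (an inline element directly inside body), and the "enclose block text in paragraphs" setting does not help. In XHTML Strict, images are not allowed as direct children of the body element; they have to sit inside a block-level element such as <p> or <div>.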
Also, surely Tidy should know that all images must have a "src" attribute and an "alt" attribute? I would have assumed that this would be an easy thing to enforce.
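To show how little it would take to catch the image rules mentioned above, here is a rough Python sketch that checks Tidy's output after the fact. This is not something Tidy does; check_images is a made-up name, it only covers those two rules, and it assumes the output is well-formed XML that Python's standard xml.etree.ElementTree can parse.

```python
import xml.etree.ElementTree as ET

# ElementTree prefixes every tag with the namespace Tidy puts on <html>.
XHTML_NS = "{http://www.w3.org/1999/xhtml}"

# The Tidy output quoted above, pasted in as test data.
TIDY_OUTPUT = """<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta name="generator"
content="HTML Tidy for Linux/x86 (vers 1st November 2002), see www.w3.org" />
<title></title>
</head>
<body>
<img />
</body>
</html>"""


def check_images(xhtml: str) -> list[str]:
    """Hypothetical helper: list the image problems found in an XHTML string."""
    root = ET.fromstring(xhtml)
    problems = []
    body = root.find(f"{XHTML_NS}body")
    if body is None:
        return ["no body element found"]
    # Rule 1: in XHTML 1.0 Strict, img may not be a direct child of body.
    for child in body:
        if child.tag == f"{XHTML_NS}img":
            problems.append("img is a direct child of body")
    # Rule 2: every img needs both a src and an alt attribute.
    for img in root.iter(f"{XHTML_NS}img"):
        for attr in ("src", "alt"):
            if attr not in img.attrib:
                problems.append(f"img is missing required attribute '{attr}'")
    return problems


for problem in check_images(TIDY_OUTPUT):
    print(problem)
```

Run against the output above, it reports all three problems. Two hand-written rules are obviously nowhere near real validation against the DTD, but they show that these particular cases are easy to detect.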
Check this out:
Code:
<html>
<body>test</body>
<head>
<title>hello</title>
</head>
</html>
Is changed to this:
Code:
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta name="generator"
content="HTML Tidy for Linux/x86 (vers 1st November 2002), see www.w3.org" />
<title></title>
<title>hello</title>
</head>
<body>
<p>test</p>
</body>
</html>
The output ends up with two <title> elements: the empty one Tidy inserted and the original one! This is a bit buggy. I wonder if it would accept two title elements if I gave it two...
Code:
<html><head>
<title>First Title</title>
<title>Second Title</title>
</head></html>
Becomes this:
Code:
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta name="generator"
content="HTML Tidy for Linux/x86 (vers 1st November 2002), see www.w3.org" />
<title>First Title</title>
<title>Second Title</title>
</head>
<body>
</body>
</html>
Yep, it does.
It gives a warning, but surely, since the result can't be XHTML Strict, it should fix the problem, or at least halt and give an error.
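Just to illustrate the "halt and give an error" behaviour I mean, here is a small Python sketch that refuses output with the wrong number of title elements. It is a post-check run on Tidy's output, not part of Tidy; require_single_title and StrictCheckError are names I made up, and it assumes the output parses as XML with Python's standard xml.etree.ElementTree.

```python
import xml.etree.ElementTree as ET

# ElementTree prefixes every tag with the namespace Tidy puts on <html>.
XHTML_NS = "{http://www.w3.org/1999/xhtml}"

# The two-title Tidy output quoted above, pasted in as test data.
TWO_TITLES = """<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta name="generator"
content="HTML Tidy for Linux/x86 (vers 1st November 2002), see www.w3.org" />
<title>First Title</title>
<title>Second Title</title>
</head>
<body>
</body>
</html>"""


class StrictCheckError(Exception):
    """Hypothetical error type: the output breaks a rule we know about."""


def require_single_title(xhtml: str) -> str:
    """Return the document unchanged, or raise if head lacks exactly one title."""
    root = ET.fromstring(xhtml)
    head = root.find(f"{XHTML_NS}head")
    titles = [] if head is None else head.findall(f"{XHTML_NS}title")
    if len(titles) != 1:
        raise StrictCheckError(
            f"expected exactly one title element, found {len(titles)}"
        )
    return xhtml


try:
    require_single_title(TWO_TITLES)
except StrictCheckError as err:
    print("rejected:", err)
```

The point is only that the output gets rejected instead of being passed along with a warning that is easy to miss.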
Tidy appears to be just fixing the limited set of problems which it knows how to fix, rather than checking the document against the relevant DOCTYPE.
I think these limitations have serious implications for anyone thinking of using Tidy to filter their markup to make it XHTML compliant.
What we need is a tool like Tidy that validates the document against the DOCTYPE, so that the output is always valid against that document type no matter what is input.