  1. #1
    One website at a time mmj's Avatar
    Join Date
    Feb 2001
    Location
    Melbourne Australia
    Posts
    6,282
    Mentioned
    1 Post(s)
    Tagged
    0 Thread(s)

    HTML Tidy problems

    Hello,

    I'm having problems with Tidy. I have it set to output XHTML Strict. I input the following:

    Code:
    <html>
    
    This is a test
    
    </html>
    This is the output I get:

    Code:
    <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
        "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
    
    <html xmlns="http://www.w3.org/1999/xhtml">
    <head>
        <title>foo</title>
    </head>
    
    <body>
        This is a test
    </body>
    Those of you who know XHTML will quickly recognise that the text "This is a test" can't go directly inside <body>; in XHTML Strict it needs to be inside a block-level element such as <p>.

    Tidy warns me about this, but it doesn't actually fix it. Does anybody know of a configuration option that forces all of Tidy's output to be XHTML compliant, so that it actually fixes things like this?
    [mmj] My magic jigsaw
    ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
    The Bit Depth Blog Twitter Contact me
    Neon Javascript Framework Jokes Android stuff

  2. #2
    SitePoint Enthusiast
    Join Date
    Jan 2004
    Location
    Glasgow, Scotland
    Posts
    60
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    I tried your test page in Dreamweaver's HTML cleaner and got the same result. I then ran it through Dreamweaver's validator as XHTML Strict and got no errors. But when I checked it with the W3C validator, the missing markup was picked up.

    Just bad programming on the part of the developers.

  3. #3
    SitePoint Evangelist ClevaTreva's Avatar
    Join Date
    Jan 2004
    Location
    Chipping Campden, UK
    Posts
    403
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    Hi

    From the HTML tidy quickref documentation:

    enclose-text Top
    Type: Boolean
    Default: no
    Example: y/n, yes/no, t/f, true/false, 1/0
    This option specifies if Tidy should enclose any text it finds in the body element within a <P> element. This is useful when you want to take existing HTML and use it with a style sheet.
    If you use something like WebCoder V4 (or the v5 beta), it has HTML Tidy built in, and all the options can be set from the functions menu.
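
    If you run Tidy from a script rather than from an editor, the same option can be passed on the command line. This is only a rough sketch: it assumes a tidy binary on the PATH that accepts configuration options in the form --name value, and the helper names tidy_command and run_tidy are made up for illustration. Check the flags against the quickref for your Tidy version before relying on them.

```python
import shutil
import subprocess


def tidy_command(enclose_text=True):
    """Build a tidy command line (hypothetical helper).

    Assumes tidy accepts configuration options as --name value pairs.
    """
    yn = "yes" if enclose_text else "no"
    return [
        "tidy", "-quiet",
        "--output-xhtml", "yes",      # ask for XHTML output
        "--doctype", "strict",        # request the Strict doctype
        "--enclose-text", yn,         # wrap bare text in <body> in <p>
        "--enclose-block-text", yn,   # same for text inside block elements
    ]


def run_tidy(html, enclose_text=True):
    """Pipe html through tidy; return its output, or None if tidy is missing."""
    if shutil.which("tidy") is None:
        return None
    # tidy exits non-zero when it only has warnings, so do not use check=True
    result = subprocess.run(
        tidy_command(enclose_text), input=html, capture_output=True, text=True
    )
    return result.stdout
```

    Calling run_tidy("<html>This is a test</html>") should then give output with the text wrapped in a <p>, assuming the installed Tidy honours enclose-text as the quickref describes.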





    Trevor
    "Good artists copy, great artists steal."
    - Pablo Picasso
    The image of ClevaTreva is drawn by Rhys, and is a GOOD likeness

  4. #4
    One website at a time mmj's Avatar
    Join Date
    Feb 2001
    Location
    Melbourne Australia
    Posts
    6,282
    Mentioned
    1 Post(s)
    Tagged
    0 Thread(s)
    Thanks, ClevaTreva. I didn't know about that setting. I had it in XHTML mode, with the Strict doctype. I would have thought that putting it into XHTML Strict mode would have done just that: make sure the output was XHTML Strict.

    Also, this doesn't fix other problems with its XHTML Strict support. For instance:

    Code:
    <html>
    <body>
    
    <img />
    
    </body>
    </html>
    Becomes this:

    Code:
    <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
        "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
    <html xmlns="http://www.w3.org/1999/xhtml">
      <head>
        <meta name="generator"
        content="HTML Tidy for Linux/x86 (vers 1st November 2002), see www.w3.org" />
        <title></title>
      </head>
      <body>
        <img />
      </body>
    </html>
    This is exactly the same problem (an inline element directly in the body), and it is not helped by the "enclose block text in paragraphs" setting. In XHTML Strict, images are not allowed directly inside the body element; they have to sit inside a block-level element.

    Also, surely Tidy should know that all images must have a "src" attribute and an "alt" attribute? I would have assumed that this would be an easy thing to enforce.
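
    A check along these lines is not hard to sketch outside Tidy. The following is a rough illustration only, not a real validator: it uses Python's standard xml.etree.ElementTree, assumes the input is already well-formed XHTML with no undefined entities, and hard-codes the list of elements that the XHTML 1.0 Strict DTD allows directly inside <body>. The names body_problems, BODY_CHILDREN and SAMPLE are invented for this example.

```python
import xml.etree.ElementTree as ET

XHTML = "{http://www.w3.org/1999/xhtml}"

# Elements the XHTML 1.0 Strict DTD permits as direct children of <body>
BODY_CHILDREN = {
    "p", "h1", "h2", "h3", "h4", "h5", "h6", "div", "ul", "ol", "dl",
    "pre", "hr", "blockquote", "address", "fieldset", "table", "form",
    "noscript", "ins", "del", "script",
}


def body_problems(xhtml):
    """Return a list of problems found in <body> (illustrative subset only)."""
    root = ET.fromstring(xhtml)
    body = root.find(XHTML + "body")
    if body is None:
        return ["no <body> element"]
    problems = []
    # Text before the first child element sits directly in <body>
    if body.text and body.text.strip():
        problems.append("text directly in <body>")
    for child in body:
        name = child.tag.replace(XHTML, "")
        if name not in BODY_CHILDREN:
            problems.append(f"<{name}> directly in <body>")
        # Text after a child element also sits directly in <body>
        if child.tail and child.tail.strip():
            problems.append("text directly in <body>")
    # src and alt are both required on <img> in XHTML 1.0 Strict
    for img in body.iter(XHTML + "img"):
        for attr in ("src", "alt"):
            if attr not in img.attrib:
                problems.append(f"<img> missing {attr}")
    return problems


SAMPLE = """
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
    "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
  <head><title></title></head>
  <body>
    <img />
  </body>
</html>
""".strip()

for problem in body_problems(SAMPLE):
    print(problem)
```

    Run against the Tidy output above, this reports the misplaced <img> and both missing attributes, which is the sort of thing I would have expected Tidy itself to catch.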

    Check this out:

    Code:
    <html>
    <body>test</body>
    <head>
    <title>hello</title>
    </head>
    </html>
    Is changed to this:

    Code:
    <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
        "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
    <html xmlns="http://www.w3.org/1999/xhtml">
      <head>
        <meta name="generator"
        content="HTML Tidy for Linux/x86 (vers 1st November 2002), see www.w3.org" />
        <title></title>
        <title>hello</title>
      </head>
      <body>
        <p>test</p>
      </body>
    </html>
    It ends up with two <title> elements! This is a bit buggy. I wonder if it would accept two title elements if I had given it two...

    Code:
    <html><head>
    
    <title>First Title</title>
    <title>Second Title</title>
    
    </head></html>
    Code:
    <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
        "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
    
    <html xmlns="http://www.w3.org/1999/xhtml">
      <head>
        <meta name="generator"
        content="HTML Tidy for Linux/x86 (vers 1st November 2002), see www.w3.org" />
    
        <title>First Title</title>
    
        <title>Second Title</title>
      </head>
    
      <body>
      </body>
    </html>
    Yep. It does give a warning, but surely, since the result can't be valid XHTML Strict, it should fix the problem, or at least halt and give an error.
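
    To show what "halt and give an error" could look like, here is a small sketch of a post-processing step that refuses output unless it has exactly one <title>. It is an assumption-laden illustration, not part of Tidy: it uses Python's standard xml.etree.ElementTree, expects well-formed XHTML, and the function names count_titles and require_single_title are hypothetical.

```python
import xml.etree.ElementTree as ET

XHTML = "{http://www.w3.org/1999/xhtml}"


def count_titles(xhtml):
    """Count <title> elements that are direct children of <head>."""
    root = ET.fromstring(xhtml)
    head = root.find(XHTML + "head")
    if head is None:
        return 0
    return len(head.findall(XHTML + "title"))


def require_single_title(xhtml):
    """Return the document unchanged, or raise if <head> lacks exactly one <title>."""
    n = count_titles(xhtml)
    if n != 1:
        raise ValueError(f"expected exactly one <title> in <head>, found {n}")
    return xhtml
```

    Passing the two-title output above through require_single_title raises a ValueError instead of quietly handing back an invalid document.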

    Tidy appears to be just fixing the limited set of problems which it knows how to fix, rather than checking the document against the relevant DOCTYPE.

    I think these limitations have great implications for anyone thinking of using Tidy to filter their markup to make it XHTML compliant.

    What we need is a tool like Tidy that validates the document against the DOCTYPE, so that the output is always valid against that document type no matter what is input.

  5. #5
    SitePoint Evangelist ClevaTreva's Avatar
    Join Date
    Jan 2004
    Location
    Chipping Campden, UK
    Posts
    403
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    Hi

    I agree Tidy is not so hot. Treat it as your first line of checking, but always run the result through the W3C validator as well. Even that only checks against the standards; it still won't tell you when your markup is semantically incorrect.

    Personally, I don't use tidy at all.





  6. #6
    100% Windoze-free earther's Avatar
    Join Date
    Feb 2003
    Location
    Linuxland
    Posts
    2,788
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    I use Tidy and have been quite happy with it. I rarely get an error when validating at W3C for XHTML 1.0 Transitional.

    There is an archived public mailing list, html-tidy@w3.org. The folks on the NoteTab list have also been very helpful with Tidy. They walked me through setting up a clip and a configuration file to get it to do what I wanted.

    You might want to take this discussion to those lists.

