  1. #1
    SitePoint Enthusiast
    Join Date
    Dec 2005
    Posts
    46

    Widgets and well-formed HTML

    Hey, I'm working on the widgets for my site and I'm finding it quite difficult, if not impossible, to make the HTML source well-formed. I can't get the tab indentation right, mostly because of nested identical tags; nested divs, for example, are the biggest problem. The problem is that I can't indent a whole block of output at once; I have to indent each line. Here is the source:

    HTML Code:
    <html xmlns="http://www.w3.org/1999/xhtml">
    	<head>
    	<title>This is the News page</title>
    
    	</head>
    
    	<body>
    		<div id=header>
    	HelloHeader <!-- Bad -->
    	</div> <!-- Not good -->
    
    		<div id=container>
    		<div id=nav> <!-- No worky -->
    	<li><a href="/index.php?action=DisplayHome">Home</a></li>
    <li><a href="/index.php?action=DisplayLogin">Login</a></li> 
    
    	</div>
    
    		<div id=content>
    	
    	</div>
    
    		<div id=footer>
    
    	This is the footer
    	</div>
    
    	</div>
    	</body>
    </html>
    Here is the code for the divs:

    PHP Code:
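    public function tagObject($array, $html_id = NULL)
    {
        // open the wrapper div (one tab deep)
        $this->output .= "\t<div id=$html_id>\n";
        foreach ($array as $value)
        {
            // only the FIRST line of each child's output receives this tab
            $this->output .= "\t" . $value->output . "\n";
        }
        $this->output .= "\t</div>\n";
    }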
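    The problem occurs when $value->output itself contains more divs: the extra tab is added only before the first line of that nested block, so its inner lines end up under-indented.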

    I hope you can understand what I am talking about (since I have a hard time describing the problem). How do you fix this?
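    One way around this is to indent every line of a child's output rather than just its first line. Below is a minimal sketch, assuming a simplified, hypothetical DivTag that mirrors tagObject() above; the tagText() method and the quoted id attribute are assumptions for the example, not the original code:

    ```php
    <?php
    // Hypothetical, simplified DivTag modeled on tagObject() above.
    // Children are any objects with a public $output string.
    class DivTag
    {
        public $output = '';

        // Prefix every line of $block with $levels tabs. With the /m
        // modifier, ^ matches at the start of each line, so a nested
        // block keeps its own inner indentation and gains one more level.
        private static function indent($block, $levels = 1)
        {
            return preg_replace('/^/m', str_repeat("\t", $levels), rtrim($block, "\n"));
        }

        public function tagObject(array $children, $html_id)
        {
            $this->output .= "<div id=\"$html_id\">\n";
            foreach ($children as $child) {
                $this->output .= self::indent($child->output) . "\n";
            }
            $this->output .= "</div>\n";
        }

        public function tagText($text, $html_id)
        {
            $this->output .= "<div id=\"$html_id\">\n" . self::indent($text) . "\n</div>\n";
        }
    }

    $footer = new DivTag();
    $footer->tagText('This is the footer', 'footer');

    $container = new DivTag();
    $container->tagObject(array($footer), 'container');

    echo $container->output;
    ```

    Each object renders itself as if it sat at depth zero, and its parent shifts the whole block one level to the right, so no object needs to know how deeply it is nested.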

  2. #2
    SitePoint Wizard Ren
    Join Date
    Aug 2003
    Location
    UK
    Posts
    1,060
    I don't bother trying to get the output nicely indented. If I have to inspect the generated HTML, I run it through htmltidy (or xmltidy) first.

    There is a PHP extension for HTML Tidy, so that may help if you want your HTML to always be nicely nested.
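    A minimal sketch of using the tidy extension for this, assuming ext/tidy is installed (the function falls back to the unmodified markup when it isn't). The helper name prettyPrint() and the option choices are illustrative:

    ```php
    <?php
    // Pretty-print generated markup with PHP's tidy extension, if available.
    function prettyPrint($html)
    {
        if (!extension_loaded('tidy')) {
            return $html; // tidy not available: leave the markup as it is
        }
        $config = array(
            'indent'       => true, // indent nested block-level elements
            'output-xhtml' => true, // emit XHTML
            'wrap'         => 0,    // don't hard-wrap long lines
        );
        $clean = tidy_repair_string($html, $config, 'utf8');
        return $clean === false ? $html : $clean;
    }

    $out = prettyPrint('<div id="nav"><ul><li><a href="/index.php?action=DisplayHome">Home</a></li></ul></div>');
    echo $out;
    ```

    Tidy treats its input as a whole document, so a fragment like this comes back wrapped in html/head/body; tidy's show-body-only option changes that.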

  3. #3
    SitePoint Enthusiast
    Join Date
    Dec 2005
    Posts
    46
    Do you know how much of a performance hit using HTMLTidy all the time would cause? I'd rather get the HTML to look right without it, but if I can't do that without HTMLTidy, then I'll give it a go.

  4. #4
    SitePoint Enthusiast Ilija Studen
    Join Date
    Oct 2003
    Location
    Serbia
    Posts
    27
    Why is it so important? You are making your PHP code a mess just to have clean, well-indented HTML. There must be a good reason for that.

  5. #5
    SitePoint Wizard stereofrog
    Join Date
    Apr 2004
    Location
    germany
    Posts
    4,324
    mijokijo, "well formed" != "nicely indented".

  6. #6
    SitePoint Zealot johno
    Join Date
    Sep 2003
    Location
    Bratislava, Slovakia
    Posts
    184
    It's really not worth the effort, but if you really want to: http://johno.jsmf.net/knowhow/pretty...preview-en.php
    Annotations support for PHP5
    TC/OPT™ Group Leader

  7. #7
    SitePoint Guru
    Join Date
    May 2003
    Location
    virginia
    Posts
    988
    Yeah, for better performance (technically) you'd remove all the extra whitespace and line breaks! One long string of HTML. Nice and "clean".

    Matt
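    For reference, a naive sketch of that kind of whitespace stripping with preg_replace(). The helper name is hypothetical, and the pattern also removes whitespace inside <pre> and <textarea> and between inline elements, where it can matter:

    ```php
    <?php
    // Naive sketch: remove whitespace that sits between two tags.
    function collapseWhitespace($html)
    {
        return trim(preg_replace('/>\s+</', '><', $html));
    }

    $min = collapseWhitespace("<div id=\"footer\">\n\t<p>This is the footer</p>\n</div>\n");
    echo $min, "\n"; // <div id="footer"><p>This is the footer</p></div>
    ```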

  8. #8
    Non-Member
    Join Date
    Jan 2003
    Posts
    5,748
    For even better performance, you could compress the file in question before the server sends it.
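    In PHP the usual route is output buffering with ob_gzhandler, which compresses the page only when the browser's Accept-Encoding header allows it and sends the matching headers. The sketch below shows that call in a comment and, so the effect is visible, compresses a string directly with gzencode(); the repeated markup is made-up sample data:

    ```php
    <?php
    // In a real page, this one line (before any output) lets PHP gzip the
    // response when the browser's Accept-Encoding allows it:
    //     ob_start('ob_gzhandler');

    // To make the effect visible here, compress a string directly with zlib.
    $html = str_repeat("<li><a href=\"/index.php?action=DisplayHome\">Home</a></li>\n", 50);

    $gz = gzencode($html, 6); // gzip format, compression level 6

    echo strlen($html), ' bytes -> ', strlen($gz), " bytes\n";
    var_dump(gzdecode($gz) === $html); // decompressing restores the original
    ```

    If you send gzencode() output yourself, you also have to check Accept-Encoding and send a Content-Encoding: gzip header, which is why ob_gzhandler is usually the simpler choice.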

  9. #9
    SitePoint Enthusiast
    Join Date
    Dec 2005
    Posts
    46
    So what you're saying is either it's not possible with PHP, or you don't know how? Because there are plenty of websites that do show pretty HTML in their source. Of course, the ones I know of don't use PHP.

  10. #10
    SitePoint Guru
    Join Date
    May 2003
    Location
    virginia
    Posts
    988
    You'd have to know the depth of each tag in reference to the body while you're putting your application views together, OR have a post filter clean it all up. I think that you could get HTMLSax to do the latter. Have a look at the examples in the package.

    http://pear.php.net/package/XML_HTMLSax3
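    The first option, knowing each tag's depth while the view is assembled, can be sketched as a recursive renderer that receives the current depth as a parameter. The array format used for the tree (tag, attrs, children) is a hypothetical structure chosen for this example:

    ```php
    <?php
    // Hypothetical tree format: a node is either a text string or
    // array('tag' => ..., 'attrs' => array(...), 'children' => array(...)).
    function renderNode($node, $depth = 0)
    {
        $pad = str_repeat("\t", $depth); // one tab per nesting level

        if (is_string($node)) {
            return $pad . htmlspecialchars($node) . "\n";
        }

        $attrs = '';
        foreach ($node['attrs'] ?? array() as $name => $value) {
            $attrs .= ' ' . $name . '="' . htmlspecialchars($value) . '"';
        }

        $out = $pad . '<' . $node['tag'] . $attrs . ">\n";
        foreach ($node['children'] ?? array() as $child) {
            $out .= renderNode($child, $depth + 1); // children go one level deeper
        }
        return $out . $pad . '</' . $node['tag'] . ">\n";
    }

    $page = array('tag' => 'div', 'attrs' => array('id' => 'container'), 'children' => array(
        array('tag' => 'div', 'attrs' => array('id' => 'footer'), 'children' => array('This is the footer')),
    ));

    echo renderNode($page);
    ```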

  11. #11
    SitePoint Member
    Join Date
    Mar 2006
    Posts
    22
    Are you looking for an automated solution like Tidy? If not, I don't suppose something like
    PHP Code:
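    $tabs = str_repeat( "\t", INDENT_LEVEL_IN_TABS );
    // normalise line endings first, so no newline gets indented twice
    $inner = str_replace( array( "\r\n", "\r" ), "\n", rtrim( $innerHtml ) ); // $innerHtml: the nested markup (placeholder)
    echo "<div>\n",
         $tabs, str_replace( "\n", "\n" . $tabs, $inner ), // indent every line, including the first
         "\n</div>\n";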
    would do? Of course, if you want to increment and decrement the level, you shouldn't make it a constant (though it would be a pain doing "i++; output(); i--;" yourself for each nesting level).

  12. #12
    Non-Member
    Join Date
    Jan 2003
    Posts
    5,748
    On a more serious note (see the other post I made), what you need to do, if you feel you have to, is walk the document tree and push each node in turn to a formatter. The formatter is typically a visitor, i.e.:

    PHP Code:
    class Walker {
        private $children;

        public function __construct($children) {
            $this->children = $children;
        }

        public function walk($formatter /* visitor */) {
            foreach ($this->children as $child) {
                $formatter->pushNode($child);
                // recurse into the child's own children, if it has any
                // (with DOM nodes, childNodes is a property, not a method)
                if ($child->hasChildNodes()) {
                    $walker = new Walker($child->childNodes);
                    $walker->walk($formatter);
                }
            }
        }
    }

    class Formatter {
        public function pushNode($node) {
            // ... do as you wish
        }
    }

    This way, you have no need to work out the indentation level on a per-node basis, as the depth is taken care of from within the formatter; I've used this approach many, many times before and it's an eighth wonder of the world in my view...

    The formatter would retain the newly formatted node and append it to the document as it goes; the document is therefore constructed (again) by the formatter. To know which node to append the newly formatted node to, you need to traverse the current structure of the document.

    I've not yet once come across a more elegant, cleaner method than this, so that's the route I'd take.
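    A walker that only calls pushNode() never tells the formatter when a node's children are finished, so the formatter cannot know when to close a tag or step back out a level. A minimal sketch of one way to let the formatter track depth itself is to have the walker also report leaving a node. The leaveNode() method and the IndentingFormatter class are additions for this example, not part of the code above:

    ```php
    <?php
    // Walker that reports entering (pushNode) and leaving (leaveNode) each node.
    class Walker
    {
        private $children;

        public function __construct($children)
        {
            $this->children = $children;
        }

        public function walk($formatter)
        {
            foreach ($this->children as $child) {
                $formatter->pushNode($child);
                if ($child->hasChildNodes()) {
                    $walker = new Walker($child->childNodes);
                    $walker->walk($formatter);
                }
                $formatter->leaveNode($child);
            }
        }
    }

    // Visitor that keeps the current depth and writes indented markup.
    class IndentingFormatter
    {
        public $output = '';
        private $depth = 0;

        public function pushNode(DOMNode $node)
        {
            $pad = str_repeat("\t", $this->depth);
            if ($node instanceof DOMElement) {
                $attrs = '';
                foreach ($node->attributes as $attr) {
                    $attrs .= ' ' . $attr->name . '="' . htmlspecialchars($attr->value) . '"';
                }
                $this->output .= $pad . '<' . $node->tagName . $attrs . ">\n";
                $this->depth++; // children of this element go one level deeper
            } elseif ($node instanceof DOMText && trim($node->data) !== '') {
                $this->output .= $pad . htmlspecialchars(trim($node->data)) . "\n";
            }
        }

        public function leaveNode(DOMNode $node)
        {
            if ($node instanceof DOMElement) {
                $this->depth--;
                $this->output .= str_repeat("\t", $this->depth) . '</' . $node->tagName . ">\n";
            }
        }
    }

    $dom = new DOMDocument();
    $dom->loadXML('<body><div id="container"><div id="footer">This is the footer</div></div></body>');

    $formatter = new IndentingFormatter();
    $walker = new Walker($dom->documentElement->childNodes);
    $walker->walk($formatter);
    echo $formatter->output;
    ```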
    Last edited by Dr Livingston; Apr 11, 2006 at 11:12. Reason: ...

  13. #13
    SitePoint Enthusiast
    Join Date
    Dec 2005
    Posts
    46
    I'm digging your example, Dr. Livingston. XoloX's example is more compact, but less readable.

    What I'm wondering is where do I put that code? Here are a couple of examples:

    PHP Code:
    <?php
    class Page
    {
        public $title;
        public $dao;

        public function __construct($title, $dao)
        {
            $this->title = $title;
            $this->dao = $dao;
        }

        public function buildPage($content)
        {
            $html = new HtmlTag();
            $head = new HeadTag();
            $title = new TitleTag();
            $css = new LinkTag();
            $body = new BodyTag();
            $header = new DivTag();
            $contentdiv = new DivTag();
            $container = new DivTag();
            $footer = new DivTag();
            $navdiv = new DivTag();
            $nav = new DisplayNavigation($this->dao);
            $nav->execute();
            $navview = $nav->getView();
            $nav = array($navview->render());

            $header->tagText('HelloHeader', 'header');
            $divs = array($content);
            $contentdiv->tagObject($divs, 'content');
            $footer->tagText('This is the footer', 'footer');
            $navdiv->tagObject($nav, 'nav');
            $divs = array($navdiv, $contentdiv, $footer);
            $container->tagObject($divs, 'container');
            $divs = array($header, $container);
            $body->tagObject($divs);
            $title->tagText($this->title);
            $css->tagText('indigo');
            $divs = array($title, $css);
            $head->tagObject($divs);
            $headbody = array($head, $body);
            $html->tagObject($headbody);
            $html->renderPage();
        }
    }
    ?>
    Basically this wraps the page around the content. I put all the HTML elements that are at the same level in arrays and pass them on to the next element (off-topic: are these what you call 'nodes'? I'm not familiar with that term. *noobie programmer*). So an array of objects gets passed to the next HTML tag.

    The HtmlTag code looks like this:

    PHP Code:
    <?php
    class HtmlTag
    {
        public $output;

        public function __construct() {}

        public function tagObject($array)
        {
            $this->output .= "<html xmlns=\"http://www.w3.org/1999/xhtml\">\n";
            foreach ($array as $value)
            {
                $this->output .= $value->output . "\n";
            }
            $this->output .= "</html>";
        }

        public function renderPage()
        {
            print $this->output;
        }
    }
    ?>
    The question is, where do I use the walk function? After thinking about it, it looks like I'd put it in HtmlTag (and perhaps the other *Tag classes), since tagObject() and walk() look similar. Then walk() would make its way to every object and format it correctly. I'd probably have to change the way the *Tag classes are coded, though.

  14. #14
    SitePoint Wizard kyberfabrikken
    Join Date
    Jun 2004
    Location
    Copenhagen, Denmark
    Posts
    6,157
    Unless you need really fine-grained control for some reason, having an object per DOM element in the HTML document seems impractical to me. If you really must, why not use the DOM API? Incidentally, this will also solve your indentation problems, since the output formatting can be controlled.
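    A minimal sketch of what that looks like with PHP's DOMDocument: build the tree with createElement() and appendChild(), set formatOutput, and let saveXML() handle the indentation (libxml indents with spaces rather than the tabs used above). The element structure loosely mirrors the news page from the first post:

    ```php
    <?php
    $dom = new DOMDocument('1.0', 'UTF-8');
    $dom->formatOutput = true; // ask libxml to indent the serialized output

    $html = $dom->appendChild($dom->createElement('html'));
    $head = $html->appendChild($dom->createElement('head'));
    // createElement()'s second argument becomes the text content;
    // an & in it must be escaped by hand (createTextNode() avoids that)
    $head->appendChild($dom->createElement('title', 'This is the News page'));

    $body = $html->appendChild($dom->createElement('body'));

    $container = $body->appendChild($dom->createElement('div'));
    $container->setAttribute('id', 'container');

    $footer = $container->appendChild($dom->createElement('div', 'This is the footer'));
    $footer->setAttribute('id', 'footer');

    $xml = $dom->saveXML();
    echo $xml;
    ```

    Elements are nested by appending them to their parent, so the code can be written top-down in the same order as the page, and no object has to track its own depth.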

  15. #15
    Non-Member
    Join Date
    Jan 2003
    Posts
    5,748
    Before you implement a Walker, you need a structure first. It doesn't matter what that structure is composed of (it could be an (X)HTML document or something else), so long as there is a parent/child relationship.

    The advantage of pushing each node is that you have control of what you do with that node; You could decorate the visitor for example, so if one decorator isn't suitable, you let the decorated visitor push it to another, and so on...

    PHP Code:
    $walker = new Walker($dom->documentElement->childNodes);
    $formatter = new FormatOne(/* decorate */ new FormatTwo());
    $walker->walk($formatter);
    // ...
    class FormatOne {
        private $decorated;

        public function __construct($decorated) {
            $this->decorated = $decorated;
        }

        public function pushNode($node) {
            if (!$this->canHandle($node)) { // canHandle(): placeholder for "some condition"
                // let something else process this node
                $this->decorated->pushNode($node);
            } else {
                // process your node here
            }
        }
        // ...
    }
    As Kyber has suggested, you could use the DOM as an alternative as well; I recall you can adjust what indentation is to be applied to the DOM tree...
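    The decorator idea can be sketched on its own, without the walker. In this minimal example (class and property names are hypothetical), ElementFormatter handles element nodes and hands anything else, here an HTML comment, to the formatter it decorates; a plain foreach over the child nodes stands in for the Walker:

    ```php
    <?php
    // A formatter handles the nodes it understands and passes the rest on.
    interface NodeFormatter
    {
        public function pushNode(DOMNode $node);
    }

    // End of the chain: records whatever nobody else handled.
    class FallbackFormatter implements NodeFormatter
    {
        public $log = array();

        public function pushNode(DOMNode $node)
        {
            $this->log[] = 'other: ' . $node->nodeName;
        }
    }

    // Handles elements only; delegates every other node type.
    class ElementFormatter implements NodeFormatter
    {
        public $handled = array();
        private $decorated;

        public function __construct(NodeFormatter $decorated)
        {
            $this->decorated = $decorated;
        }

        public function pushNode(DOMNode $node)
        {
            if (!($node instanceof DOMElement)) {
                $this->decorated->pushNode($node); // not ours: pass it on
                return;
            }
            $this->handled[] = $node->tagName; // "process your node here"
        }
    }

    $dom = new DOMDocument();
    $dom->loadXML('<div id="nav"><!-- menu --><a href="/index.php?action=DisplayHome">Home</a></div>');

    $fallback = new FallbackFormatter();
    $formatter = new ElementFormatter($fallback);
    foreach ($dom->documentElement->childNodes as $node) {
        $formatter->pushNode($node);
    }
    ```

    Adding another node type later means writing one more formatter and wrapping it around the chain, without touching the existing ones.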

  16. #16
    SitePoint Enthusiast
    Join Date
    Dec 2005
    Posts
    46
    *head explodes*

    kyber, the reason I have that (admittedly awkward) layout for building the page is that Dr Livingston suggested it.

    I have to say I am not at all familiar with the vocabulary being used. Structure? Node? Visitor?

    The DOM stuff looks pretty involved. I'll definitely have to read more of the reference, but it seems more applicable to XML than (x)HTML.

  17. #17
    SitePoint Wizard kyberfabrikken
    Join Date
    Jun 2004
    Location
    Copenhagen, Denmark
    Posts
    6,157
    As already noted a couple of times, I think your original problem is a moot point, but if you really want to do it, htmltidy is the way to go. Keep in mind that it reformats your code, so occasionally it may break your design (esp. if you have messy markup in the first place).

    The DOM is quite verbose, so it really isn't suitable for creating whole pages. It's equally suitable for XML and HTML. The good thing about DOM is that it's the same interface you use for manipulating the document on the client side (JavaScript), so you might as well pick it up sooner rather than later. Since the syntax is a bit awkward, it may help to create some wrapper/helper functions for dealing with it. In a recent thread, I posted a class for doing that:
    http://www.sitepoint.com/forums/showthread.php?t=251226
    The PEAR class XML_FastCreate does something similar.
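    To give an idea of what such a helper can look like, here is a minimal sketch of one. It is not the class from the linked thread; the el() function and its argument order are hypothetical. Strings become text nodes, and nested calls read in the same order as the resulting markup:

    ```php
    <?php
    // Hypothetical helper: el($dom, $name, $attributes, ...$children).
    // Strings become text nodes; DOMNode children are appended as they are.
    function el(DOMDocument $dom, $name, array $attrs = array(), ...$children)
    {
        $element = $dom->createElement($name);
        foreach ($attrs as $attr => $value) {
            $element->setAttribute($attr, $value);
        }
        foreach ($children as $child) {
            $element->appendChild(is_string($child) ? $dom->createTextNode($child) : $child);
        }
        return $element;
    }

    $dom = new DOMDocument('1.0', 'UTF-8');
    $dom->formatOutput = true;
    $dom->appendChild(
        el($dom, 'div', array('id' => 'nav'),
            el($dom, 'ul', array(),
                el($dom, 'li', array(), el($dom, 'a', array('href' => '/index.php?action=DisplayHome'), 'Home')),
                el($dom, 'li', array(), el($dom, 'a', array('href' => '/index.php?action=DisplayLogin'), 'Login'))
            )
        )
    );

    $out = $dom->saveXML($dom->documentElement);
    echo $out;
    ```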

  18. #18
    SitePoint Wizard Ren
    Join Date
    Aug 2003
    Location
    UK
    Posts
    1,060
    Another possibly applicable approach: if you use XSLT as the basis of your views, then
    <xsl:output indent="yes" />

    in the stylesheet will indent the output nicely.
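    A minimal sketch of that with PHP's xsl extension (XSLTProcessor). The <nav>/<item> input document and the stylesheet are made up for this example; the script exits early if the extension isn't installed:

    ```php
    <?php
    if (!class_exists('XSLTProcessor')) {
        exit("This sketch needs PHP's xsl extension.\n");
    }

    // Stylesheet with indent="yes": the XSLT processor indents the result.
    $xsl = new DOMDocument();
    $xsl->loadXML(
        '<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">'
        . '<xsl:output method="xml" indent="yes" omit-xml-declaration="yes"/>'
        . '<xsl:template match="/nav">'
        . '<div id="nav"><ul>'
        . '<xsl:for-each select="item">'
        . '<li><a href="{@href}"><xsl:value-of select="."/></a></li>'
        . '</xsl:for-each>'
        . '</ul></div>'
        . '</xsl:template>'
        . '</xsl:stylesheet>'
    );

    // The view's data, as a small XML document.
    $data = new DOMDocument();
    $data->loadXML(
        '<nav><item href="/index.php?action=DisplayHome">Home</item>'
        . '<item href="/index.php?action=DisplayLogin">Login</item></nav>'
    );

    $proc = new XSLTProcessor();
    $proc->importStylesheet($xsl);
    $out = $proc->transformToXml($data);
    echo $out;
    ```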

  19. #19
    SitePoint Enthusiast
    Join Date
    Dec 2005
    Posts
    46
    I've meditated on this issue for a bit and have an idea of how to fix a couple of issues.

    First, I'm going to go about the tag objects a bit differently. I'm thinking that I can make an XHtml class that calls the individual tag objects to do their job.

    PHP Code:
    <?php

    class XHtml
    {
        public $output;

        public function __construct() {}

        public function html($string)
        {
            $html = new HtmlTag();
            $this->output .= $html->tagString($string);
        }

        public function body($string)
        {
            $body = new BodyTag();
            $this->output .= $body->tagString($string);
        }

        // etc.
    }
    ?>
    Thus, I can simplify the use of the tag objects. It also simplifies the interface between the objects. What do you think of this idea?

    I was also wondering how to go about using the DOM API to fix my original problem, as well as to allow for future growth of my project. The question is in the implementation. Should I include DOM function calls in each of the different tag objects, appending them to the root as I go, or simply run the Walker/Formatter idea that Dr. Livingston suggested over all the output before printing it to the page? The first option only seems useful if I want to further simplify the use of the tag objects. Currently, I have to assemble everything sort of backwards, which is confusing and difficult to follow. Would it be possible to fix this problem by using DOM as well?

    Something like:
    PHP Code:
    <?php

    class HtmlTag
    {
        public $output;
        public $dom; // DOMDocument object

        public function __construct($dom)
        {
            $this->dom = $dom;
        }

        public function tagString($string)
        {
            $this->output .= $this->dom->createTextNode($string);
            $this->dom->appendChild($this->output);
        }

        public function renderPage()
        {
            print $this->output;
        }
    }

    ?>
    I'm not sure if that'd work, since I've never messed with DOM before.

    How far off am I?
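    For comparison, here is a minimal sketch of a DOM-backed tag object that does run. The main changes from the attempt above: a DOMText node is an object, so it is appended to an element instead of being concatenated onto a string, and the output comes from saveXML(). The DomTag name, its constructor arguments and getElement() are assumptions for this example:

    ```php
    <?php
    // Hypothetical DOM-backed tag object.
    class DomTag
    {
        private $dom;     // DOMDocument that owns every node
        private $element; // the element this object wraps

        public function __construct(DOMDocument $dom, $name, ?DOMNode $parent = null)
        {
            $this->dom = $dom;
            $this->element = $dom->createElement($name);
            // attach to the given parent, or make this the document root
            ($parent ?? $dom)->appendChild($this->element);
        }

        public function tagString($string)
        {
            // text nodes are appended to the element, not concatenated
            $this->element->appendChild($this->dom->createTextNode($string));
        }

        public function getElement()
        {
            return $this->element;
        }

        public function renderPage()
        {
            $this->dom->formatOutput = true; // let libxml indent the output
            print $this->dom->saveXML($this->dom->documentElement);
        }
    }

    $dom = new DOMDocument('1.0', 'UTF-8');
    $html = new DomTag($dom, 'html');
    $body = new DomTag($dom, 'body', $html->getElement());
    $footer = new DomTag($dom, 'div', $body->getElement());
    $footer->tagString('This is the footer');
    $html->renderPage();
    ```

    Because each tag is attached to its parent when it is created, the page can be built top-down, in the same order it appears in the output.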

