  1. #1
    Mincer (SitePoint Wizard, joined Mar 2001, London | UK)

    DOMXML and accessing this XML data [Harry??]

    Right.

    I thought I'd have a go at rewriting my SETI stats scripting using the XML data that Berkeley provide, as an exercise in DOM/XML processing and as some groundwork for when they move to the next project.

    The structure of the XML file is like this:
    Code:
    <?xml version="1.0" encoding="iso-8859-1" ?>
    <!DOCTYPE groupstats SYSTEM "http://setiathome.ssl.berkeley.edu/xml/groupstats.dtd">
    <groupstats>
    <name>Phoenix Rising</name>
    <url>forums.teamphoenixrising.net/</url>
    <nummembers>495</nummembers>
    <numresults>2070017</numresults>
    <totalcpu> 1952.726 years</totalcpu>
    <founder>
    <name>riddlermarc</name>
    <url>forums.teamphoenixrising.net/index.php?s=</url>
    <profile>Profile goes here</profile>
    </founder>
    <topmembers>
    <member>
    <name>Alta Rica</name>
    <url>forums.teamphoenixrising.net/index.php</url>
    <profile>Profile goes here</profile>
    <numresults>100691</numresults>
    <totalcpu> 48.526 years</totalcpu>
    <avecpu>4 hr 13 min 18.0 sec</avecpu>
    <datelastresult>Mon Mar 31 17:58:24 2003</datelastresult>
    <country>United Kingdom</country>
    </member>
    <member>
    <name>WWW.TEAMPHOENIXRISING.NET</name>
    <url>www.teamphoenixrising.net</url>
    <numresults>100437</numresults>
    <totalcpu> 72.284 years</totalcpu>
    <avecpu>6 hr 18 min 16.3 sec</avecpu>
    <datelastresult>Thu Feb 20 00:46:05 2003</datelastresult>
    <country>Denmark</country>
    </member>
    </topmembers>
    </groupstats>
    I'm basically trying to grab the data for the team, then each member in turn to process and enter into a database.

    My unsuccessful attempt at getting at the data can be seen below, but as the documentation for DOMXML is still pretty basic, I can't seem to fathom it. I did find one article on devarticles, but it's quite old so all the functions have changed.
    PHP Code:
    $file = './tpr_team_corrected.xml' ;
    $raw_xml = file_get_contents( $file ) ;

    if( ! $dom = domxml_open_mem( $raw_xml ) )
    {
        die( "Error while parsing the document" ) ;
    }

    $root = $dom->document_element() ;
    $top_members = $root->get_elements_by_tagname( 'topmembers' ) ;
    $children = $top_members[0]->child_nodes() ;

    echo "<pre>" ;

    foreach( $children AS $child )
    {
        if( $child->type == XML_ELEMENT_NODE )
        {
            $member = $child->get_elements_by_tagname( 'member' ) ;
            print_r( $member ) ;
        }
    }

    echo "</pre>" ;
    This just prints 2 empty arrays.

    Any help would be greatly appreciated.

    Matt. :)

    EDITED: Added newlines to profile fields to stop it killing the page width.. And is it me, or does this damn textarea thing f*** up the code something proper!

    EDIT 2: I've cut out the profile fields now as the forums don't take a blind bit of notice as to whether I want URLs parsed or not.
    Last edited by Mincer; Apr 8, 2003 at 03:32.

  2. #2
    HarryF (SitePoint Wizard, joined Nov 2000, Switzerland)

    Think the problem is here;

    PHP Code:
    $top_members[0]->child_nodes() 
    $top_members[0] may be a text node, in which case it won't have children.
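
Another possibility worth checking, offered as a sketch rather than a confirmed diagnosis: `get_elements_by_tagname()` searches below the node it is called on, so asking a `<member>` element for `member` elements would come back empty, which would match the two empty arrays. The example below illustrates the same behaviour with PHP 5's DOM extension (`DOMDocument`). That is an assumption, since this thread uses the PHP 4 domxml functions, and the inline XML is a cut-down sample.

```php
<?php
// Assumption: PHP 5+ DOM extension, not the PHP 4 domxml API from the thread.
$xml = '<topmembers>'
     . '<member><name>Alta Rica</name></member>'
     . '<member><name>WWW.TEAMPHOENIXRISING.NET</name></member>'
     . '</topmembers>';

$dom = new DOMDocument();
$dom->loadXML($xml);

$top = $dom->getElementsByTagName('topmembers')->item(0);

$nested = array(); // how many <member> elements sit below each <member>
$names  = array(); // text of the <name> element below each <member>

foreach ($top->childNodes as $child) {
    if ($child->nodeType !== XML_ELEMENT_NODE) {
        continue; // skip whitespace text nodes
    }
    // $child is already a <member>, so it has no <member> descendants.
    $nested[] = $child->getElementsByTagName('member')->length;
    // Asking for a tag that does sit below <member> finds it.
    $names[] = $child->getElementsByTagName('name')->item(0)->textContent;
}

print_r($nested);
print_r($names);
```

If that is what is happening, looping over the children of each `<member>` (or asking for `name`, `url` and so on directly) gets at the data.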

    In general it's better to run a loop each time you use child_nodes(). Here's something that gets you the children of a <member /> element;

    PHP Code:
    <?php
    $file = 'seti.xml';
    $raw_xml = file_get_contents($file);
    if ( ! $dom = domxml_open_mem($raw_xml) ) {
        die( "Error while parsing the document" );
    }

    $root = $dom->document_element();
    $top_members = $root->get_elements_by_tagname( 'topmembers' );

    topMembers($top_members);

    // Walk every <topmembers> element found under the root
    function topMembers ($top_members) {
        foreach ( $top_members as $top_member ) {
            $children = $top_member->child_nodes();
            topChildren($children);
        }
    }

    // Descend only into <member> elements; text nodes have no tagname
    function topChildren ($children) {
        foreach ( $children as $child ) {
            if ( isset ( $child->tagname ) && $child->tagname == 'member' ) {
                $memberChildren = $child->child_nodes();
                getMemberContents($memberChildren);
            }
        }
    }

    // Dump each child node of a <member>
    function getMemberContents($memberChildren) {
        foreach ( $memberChildren as $memberChild ) {
            echo ( '<pre>' );
            print_r($memberChild);
            echo ( '</pre>' );
        }
    }
    ?>
    It's better to use a function for at least each level you go down the document, otherwise you've quickly got a mess on your hands.

    Having said that, a far better approach would be to use XPath, for example...

    PHP Code:
    <?php
    $setiDoc = file_get_contents('seti.xml');
    $dom = domxml_open_mem($setiDoc);
    $ctx = $dom->xpath_new_context();

    $members =& $ctx->xpath_eval("//member/descendant-or-self::*");

    echo ( '<pre>' );
    print_r($members);
    echo ( '</pre>' );
    ?>
    This will return a list of objects, something like:

    member
    name
    url
    profile
    etc.
    member
    name
    url
    profile
    etc.

    That way you can process them as a list, perhaps starting a new table row every time you encounter a "member" element.

    Good XPath tutorial here: http://www.zvon.org/xxl/XPathTutoria.../examples.html
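
For anyone reading this on PHP 5 or later, where the domxml functions are no longer part of the core, the flat-list idea might look roughly like this with `DOMDocument` and `DOMXPath`. This is a sketch under that assumption, not the domxml code from this thread. The inline XML is a cut-down sample and `$rows` is just an illustrative name.

```php
<?php
// Assumption: PHP 5+ DOM extension (DOMDocument / DOMXPath).
$xml = '<groupstats><topmembers>'
     . '<member><name>Alta Rica</name><numresults>100691</numresults>'
     . '<country>United Kingdom</country></member>'
     . '<member><name>WWW.TEAMPHOENIXRISING.NET</name><numresults>100437</numresults>'
     . '<country>Denmark</country></member>'
     . '</topmembers></groupstats>';

$dom = new DOMDocument();
if (!$dom->loadXML($xml)) {
    die('Error while parsing the document');
}

$xpath = new DOMXPath($dom);

// The query yields each <member> followed by its descendant elements,
// in document order.
$rows = array();
foreach ($xpath->query('//member/descendant-or-self::*') as $node) {
    if ($node->nodeName === 'member') {
        $rows[] = array(); // start a new row at every <member>
        continue;
    }
    // Any other element belongs to the most recent <member>.
    $rows[count($rows) - 1][$node->nodeName] = trim($node->textContent);
}

print_r($rows);
```

Each entry in `$rows` then holds one member's fields keyed by element name, which maps fairly directly onto a table row or a database insert.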

  3. #3
    Mincer

    Thanks Harry. I'll have a look at the second method.

    I've just managed to fathom how I needed to recurse the tree for each member. I put in the filter for the profile field as it dived deeper.

    PHP Code:
    $file = './tpr_team_corrected_cut.xml' ;
    $raw_xml = file_get_contents( $file ) ;
    if( ! $dom = domxml_open_mem( $raw_xml ) )
    {
        die( "Error while parsing the document" ) ;
    }
    $root = $dom->document_element() ;
    $top_members = $root->get_elements_by_tagname( 'topmembers' ) ;
    $children = $top_members[0]->child_nodes() ;
    echo "<pre>" ;
    foreach( $children AS $child )
    {
        if( $child->node_type() == XML_ELEMENT_NODE )
        {
            $member_info = $child->child_nodes() ;
            foreach( $member_info AS $item )
            {
                if( $item->node_type() == XML_ELEMENT_NODE )
                {
                    if( $item->tagname != 'profile' )
                    {
                        $temp = $item->first_child() ;
                        $property = $temp->content ;
                        echo $property . "<br>\n" ;
                    }
                }
            }
        }
    }
    echo "</pre>" ;
    I'll read up some more...

    Matt.

  4. #4
    Mincer

    Ok, another question. If I use the xpath method to do this:

    PHP Code:
    // url of team xml stats
    $file = './tpr_team.xml' ;
    // grab the xml
    $raw_xml = file_get_contents( $file ) ;
    // fix the bad ampersands in Berkeley's xml
    $raw_xml = preg_replace( '/&(?![0-9a-z]{1,6};)/is', '&amp;', $raw_xml ) ;
    // create a DOM object from the xml
    $dom = domxml_open_mem( $raw_xml ) ;
    // create an array of member data objects from just the member elements
    $members_ctx = $dom->xpath_new_context() ;
    $members =& $members_ctx->xpath_eval( "//member" ) ;
    echo '<pre>' ;
    foreach( $members->nodeset AS $member )
    {
        // bits to go here
    }
    echo '</pre>' ;
    How can I do the same sort of thing to get the next level down, i.e. each child node of the member element? Is it possible to create another context from the member elements? (I know it's after a string.) Or is this a bad way to do it, IYHO?

    I like the compartmentalisation of getting each member then looping through to get its properties, rather than going through a long list of elements and trapping each member when I hit another name node.

    I'll get there in the end. I know I could get something that works, but that's not the point now, is it? :D

    Matt. :)
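
On the ampersand fix in the code above: the negative lookahead leaves anything that already looks like `&name;` alone, but its character class has no `#`, so a numeric reference such as `&#38;` would be escaped a second time. A possible variant, as a sketch only (the helper name `escape_bare_ampersands` is made up for illustration and the sample string is invented):

```php
<?php
// Hypothetical helper: escape "&" unless it already starts an entity
// or character reference (&name; , &#123; or &#x1F;).
function escape_bare_ampersands($xml)
{
    return preg_replace(
        '/&(?!(?:[a-z][a-z0-9]*|#[0-9]+|#x[0-9a-f]+);)/i',
        '&amp;',
        $xml
    );
}

$raw = '<url>index.php?s=1&act=2</url><name>Tom &amp; Jerry &#38; co</name>';
echo escape_bare_ampersands($raw), "\n";
```

Only the bare `&` in the query string is rewritten; the existing `&amp;` and `&#38;` pass through unchanged.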

  5. #5
    HarryF

    When you call the xpath_eval method, you can provide it a second argument which is a DOM object you've fetched, for example;

    PHP Code:
    <?php
    $setiDoc = file_get_contents('seti.xml');
    $dom = domxml_open_mem($setiDoc);
    $ctx = $dom->xpath_new_context();

    $members_set =& $ctx->xpath_eval("//member");

    foreach ( $members_set->nodeset as $member ) {
        $name_set =& $ctx->xpath_eval("name", $member);
        echo ( '<pre>' );
        print_r($name_set);
        echo ( '</pre>' );
    }
    ?>
    What you wanted?
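
The same per-member pattern can be sketched with PHP 5's `DOMXPath`, whose `query()` method also accepts a context node as its second argument. This assumes the newer DOM extension rather than the domxml functions used here. The inline XML is a cut-down sample and `$members` / `$record` are illustrative names.

```php
<?php
// Assumption: PHP 5+ DOM extension (DOMDocument / DOMXPath).
$xml = '<topmembers>'
     . '<member><name>Alta Rica</name><country>United Kingdom</country></member>'
     . '<member><name>WWW.TEAMPHOENIXRISING.NET</name><country>Denmark</country></member>'
     . '</topmembers>';

$dom = new DOMDocument();
$dom->loadXML($xml);
$xpath = new DOMXPath($dom);

$members = array();
foreach ($xpath->query('//member') as $member) {
    $record = array();
    // Relative expression: "*" selects only the element children
    // of the <member> passed as the context node.
    foreach ($xpath->query('*', $member) as $field) {
        $record[$field->nodeName] = trim($field->textContent);
    }
    $members[] = $record;
}

print_r($members);
```

Because the inner expression is relative, each member is handled on its own, which keeps the compartmentalised "one member, then its properties" structure.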

  6. #6
    Mincer

    Quote Originally Posted by HarryF
    When you call the xpath_eval method, you can provide it a second argument which is a DOM object you've fetched..

    ..What you wanted?
    Great Harry, thanks. I'm sure this will be a whole lot easier when there's some solid documentation. In the meantime, I'll have to pester you.

    Matt.

  7. #7
    HarryF

    BTW - just slapped up a quick article which may help...

