  1. #1
    SitePoint Enthusiast
    Join Date
    Aug 2003
    Location
    NYC
    Posts
    36
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)

    php5 need something like innerHTML instead of nodeValue

    I am using cURL and xpath to get data from an html page and everything is going well until I encounter a node like this:

    Code:
    <td id="foo"> 
    The first bit of Data I want
    <br>The second bit of Data I want
    <br>The third bit of Data I want</td>
    So I curl the page and set up my XPath like this:

    Code:
    $fooNode = $xpath->evaluate("/html/body//td[@id='foo']");
    $fooString = $fooNode->item(0)->nodeValue;
    echo $fooString;
    which gives me something like:
    "The first bit of Data I wantThe second bit of Data I wantThe third bit of Data I want"

    as a result, with no way to separate the data. (Before you ask: the data above is just an example, so I can't explode the string on "The".)

    What I would like instead is some way to return the node with its markup intact, sort of like innerHTML in JS, so I can explode it into an array on the "<br>" tag, like:

    "The first bit of Data I want<br>The second bit of Data I want<br>The third bit of Data I want"

    Is there any way to save a node as a string with the <br> tags intact?

    Forgive me if this has been answered before, I did look around the site and Googled all last night to no avail.

  2. #2
    We're from teh basements.
    Join Date
    Apr 2007
    Posts
    1,205
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    $fooBits = explode("<br>",$fooString);
    print_r($fooBits);

  3. #3
    SitePoint Enthusiast
    Join Date
    Aug 2003
    Location
    NYC
    Posts
    36
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    As I stated, that is exactly what I would do if nodeValue returned a string with the <br>s intact, but it doesn't. nodeValue strips out all the tags.
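    To make the problem concrete, here is a minimal sketch of that behaviour. The markup is a shortened, hypothetical stand-in for the fetched page; nodeValue returns the concatenated text of the cell and the <br> elements contribute nothing to it.

```php
<?php
// Hypothetical markup standing in for the page fetched with cURL.
$html = '<html><body><table><tr>'
      . '<td id="foo">First<br>Second<br>Third</td>'
      . '</tr></table></body></html>';

$dom = new DOMDocument();
$dom->loadHTML($html);

$xpath = new DOMXPath($dom);
$td = $xpath->query("//td[@id='foo']")->item(0);

// nodeValue is the text content of the element and all of its descendants;
// the <br> elements hold no text, so the three pieces run together.
$text = $td->nodeValue;
echo $text, "\n";
```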

  4. #4
    Floridiot joebert's Avatar
    Join Date
    Mar 2004
    Location
    Kenneth City, FL
    Posts
    823
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    I'm not familiar with the "evaluate" method, but if it returns SimpleXMLElement objects, as SimpleXML's xpath() method does, you can use the asXML() method of the SimpleXMLElement object.

    http://www.php.net/manual/function.s...ment-asXML.php

    Code:
    echo $fooNode[0]->asXML();
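    For what it's worth, DOMXPath returns a DOMNodeList of DOMNode objects rather than SimpleXMLElement objects, so asXML() is not available on the result directly. A rough sketch of one way to bridge the two, assuming the SimpleXML extension is loaded, is simplexml_import_dom(). The sample markup is made up for illustration, and note that this gives the outer XML (the <td> tag included), not just the inner content.

```php
<?php
$dom = new DOMDocument();
$dom->loadXML('<table><tr><td id="foo">First<br/>Second<br/>Third</td></tr></table>');

$xpath = new DOMXPath($dom);
$td = $xpath->query("//td[@id='foo']")->item(0);

// Wrap the DOM element as a SimpleXMLElement so that asXML() can be called on it.
$sxe = simplexml_import_dom($td);

// asXML() serializes the element itself plus its children.
$outer = $sxe->asXML();
echo $outer, "\n";
```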

  5. #5
    @php.net Salathe's Avatar
    Join Date
    Dec 2004
    Location
    Edinburgh
    Posts
    1,397
    Mentioned
    64 Post(s)
    Tagged
    0 Thread(s)
    The following isn't concrete (more a basic proof of concept; in other words, untested), but you could probably get at the "innerXML" (or innerHTML, if you adapted the following function slightly) like this:

    innerXML function
    PHP Code:
    function innerXML($node)
    {
        $doc  = $node->ownerDocument;
        $frag = $doc->createDocumentFragment();
        foreach ($node->childNodes as $child)
        {
            $frag->appendChild($child->cloneNode(TRUE));
        }
        return $doc->saveXML($frag);
    }

    Quicky example
    PHP Code:
    $dom = new DOMDocument();
    $dom->loadXML('
    <table>
    <tr>
        <td id="foo">
            The first bit of Data I want
            <br/>The second bit of Data I want
            <br/>The third bit of Data I want
        </td>
    </tr>
    </table>
    ');

    $node = $dom->getElementsByTagName('td')->item(0);
    echo innerXML($node);
    Salathe
    Software Developer and PHP Manual Author.

  6. #6
    SitePoint Enthusiast
    Join Date
    Aug 2003
    Location
    NYC
    Posts
    36
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)

    I tried both the asXML() method and the innerXML($node) function, but neither worked. I had tried asXML() before but abandoned it, since it seems to be just for SimpleXML objects and not DOMDocument objects. Would love to be proven wrong though!

    I think the createDocumentFragment() path looks the most promising, but I am having trouble finding the documentation for it, or useful examples of it in a tutorial. Any suggestions?

    Thanks again for all your help. I promise to post the solution once I find it, as I think many people would like to know how to do this.

  7. #7
    SitePoint Enthusiast
    Join Date
    Aug 2003
    Location
    NYC
    Posts
    36
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    Jumped the gun a bit: your function, Salathe, seems to be working.
    innerXML function
    PHP Code:
    function innerXML($node)
    {
        $doc  = $node->ownerDocument;
        $frag = $doc->createDocumentFragment();
        foreach ($node->childNodes as $child)
        {
            $frag->appendChild($child->cloneNode(TRUE));
        }
        return $doc->saveXML($frag);
    }

    I used this code instead:
    PHP Code:
    $dom = new DOMDocument();
    $dom->loadXML('
    <html>
    <body>
    <table>
    <tr>
        <td id="foo">
            The first bit of Data I want
            <br/>The second bit of Data I want
            <br/>The third bit of Data I want
        </td>
    </tr>
    </table>
    </body>
    </html>
    ');
    $xpath = new DOMXPath($dom);
    $node = $xpath->evaluate("/html/body//td[@id='foo']");
    $nameObject = innerXML($node->item(0));
    echo $nameObject;

    gives this result:
    PHP Code:
    The first bit of Data I want
    The second bit of Data I want
    The third bit of Data I want
    My goal is to end up with three distinct strings:

    $firstString = "The first bit of Data I want";
    $secondString = "The second bit of Data I want";
    $thirdString = "The third bit of Data I want";

    What type of data does innerXML() return, and how do I get at the three bits of data in it? Thanks again, I think I am really close here!
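    One possible way to get three distinct strings without going through serialized markup at all is to select the text nodes of the cell directly with XPath's text() node test. This is only a sketch under the assumption that each bit of data is a direct text child of the <td>; the $bits array name is made up for the example.

```php
<?php
$dom = new DOMDocument();
$dom->loadXML('<html><body><table><tr><td id="foo">
    The first bit of Data I want
    <br/>The second bit of Data I want
    <br/>The third bit of Data I want
</td></tr></table></body></html>');

$xpath = new DOMXPath($dom);

// text() matches only the direct text children of the <td>, so the <br/>
// elements act as separators between the matched nodes.
$bits = array();
foreach ($xpath->query("/html/body//td[@id='foo']/text()") as $textNode) {
    $bit = trim($textNode->nodeValue);
    if ($bit !== '') {          // skip whitespace-only text nodes
        $bits[] = $bit;
    }
}

print_r($bits);
```

    If a bit of data were wrapped in another element (a <b> or <span>, say), its text would not be a direct child of the <td> and this query would miss it.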

  8. #8
    SitePoint Enthusiast
    Join Date
    Aug 2003
    Location
    NYC
    Posts
    36
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    Got it working. I did some more reading and realized that saveXML() outputs a string, so I viewed the source and saw that it was using <br /> instead of <br> tags. I exploded on that and was home free. Thanks for all your help! Here is the generalized code I used to get the results.

    PHP Code:
    function innerXML($node)
    {
        // Copy every child of $node into a fragment, then serialize the fragment.
        $doc  = $node->ownerDocument;
        $frag = $doc->createDocumentFragment();
        foreach ($node->childNodes as $child)
        {
            $frag->appendChild($child->cloneNode(TRUE));
        }
        return $doc->saveXML($frag);
    }

    $dom = new DOMDocument();
    $dom->loadXML('
    <html>
    <body>
    <table>
    <tr>
        <td id="foo">
            The first bit of Data I want
            <br />The second bit of Data I want
            <br />The third bit of Data I want
        </td>
    </tr>
    </table>
    </body>
    </html>
    ');

    $xpath = new DOMXPath($dom);
    $node = $xpath->evaluate("/html/body//td[@id='foo']");

    $dataString = innerXML($node->item(0));
    $dataArr = explode("<br />", $dataString);

    $dataUno = $dataArr[0];
    $dataDos = $dataArr[1];
    $dataTres = $dataArr[2];

    echo "firstdata = $dataUno<br />seconddata = $dataDos<br />thirddata = $dataTres<br />";
    which yields:

    PHP Code:
    firstdata = The first bit of Data I want
    seconddata = The second bit of Data I want
    thirddata = The third bit of Data I want

    Thanks again and hope this helps someone else!
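    A note for anyone adapting this: the exact spelling of the line-break tag in the serialized string depends on the serializer and the source markup (<br>, <br/> and <br /> all turn up), so a literal explode() can silently fail to split. A minimal sketch of a more tolerant split uses preg_split() and trims each piece; the sample string below is hypothetical.

```php
<?php
// Hypothetical serialized fragment mixing several spellings of the tag.
$dataString = "\n    First bit\n    <br/>Second bit\n    <br />Third bit<BR>Fourth bit\n";

// Split on <br>, <br/> or <br />, case-insensitively.
$parts = preg_split('~<br\s*/?>~i', $dataString);

// Strip the surrounding whitespace and newlines from each piece.
$parts = array_map('trim', $parts);

print_r($parts);
```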

  9. #9
    Twitter: @AnthonySterling silver trophy AnthonySterling's Avatar
    Join Date
    Apr 2008
    Location
    North-East, UK.
    Posts
    6,111
    Mentioned
    3 Post(s)
    Tagged
    0 Thread(s)
    This would work also...
    PHP Code:
    <?php
    function getNodeInnerHTML(DOMNode $oNode)
    {
        // Import each child into a scratch document, then serialize it as HTML.
        $oDom = new DOMDocument();
        foreach ($oNode->childNodes as $oChild)
        {
            $oDom->appendChild($oDom->importNode($oChild, true));
        }
        return $oDom->saveHTML();
    }
    ?>
    @AnthonySterling: I'm a PHP developer, a consultant for oopnorth.com and the organiser of @phpne, a PHP User Group covering the North-East of England.
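    As a further variation, on PHP 5.3.6 and later DOMDocument::saveHTML() accepts an optional node argument, so the children can be serialized in place without a second document. The sketch below assumes that PHP version or newer; the function name innerHTML and the sample markup are made up for illustration.

```php
<?php
// Hypothetical helper: concatenates the HTML serialization of each child node.
function innerHTML(DOMNode $node)
{
    $html = '';
    foreach ($node->childNodes as $child) {
        // saveHTML($child) serializes just that node (PHP 5.3.6+).
        $html .= $node->ownerDocument->saveHTML($child);
    }
    return $html;
}

$dom = new DOMDocument();
$dom->loadHTML('<html><body><table><tr>'
    . '<td id="foo">First<br>Second<br>Third</td>'
    . '</tr></table></body></html>');

$xpath = new DOMXPath($dom);
$td = $xpath->query("//td[@id='foo']")->item(0);

$inner = innerHTML($td);
echo $inner, "\n";

// The HTML serializer writes <br>, so a plain explode works here.
$pieces = explode('<br>', $inner);
```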

