  1. #1
    Smola (SitePoint Addict)
    Join Date: Mar 2005
    Posts: 260

    XML Semantics and XPath: Adjacent Text and Element Nodes

    Hello all,

    I am doing some experimenting with XPath via the PHP SimpleXMLElement class. I want to perform certain queries to retrieve various elements (clearly, since I'm using XPath queries to do it). However, I am running into an issue when I appear to have adjacent text and element nodes.

    (X)HTML has no problem parsing this and allowing access via JavaScript:
    Code XML:
    <p>The color <span class="color">orange</span> has always been my favorite color.</p>

    I have looked at the W3C specification for XML (just to verify that this is valid markup, even though I know it is) and found this definition:
    3.2.2 Mixed Content
    [Definition: An element type has mixed content when elements of that type may contain character data, optionally interspersed with child elements.]
    Ok, so maybe the problem is with my XPath queries? This is what I tried ($p is an instance of SimpleXMLElement representing the p element):
    Code PHP:
    $content = $p->xpath('child::*');   //get all children of p. returns a SimpleXML object containing the text 'orange'
    $content = $p->xpath('child::text()');   //get all text nodes which are children of p. returns 2 SimpleXMLElement objects representing just the span element!

    Any similar queries targeting the same elements return the same thing. So, in the first case (get all children), only a text node containing 'orange' seems to be recognized, but in the second (all children that are text nodes), 2 copies of the span element itself seem to be the only things recognized! The rest of the text, which I thought would be contained in two text nodes, is never recognized. I am way confused right now. Thoughts?
    Humbly,

    Smola
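
A minimal sketch for narrowing down what SimpleXML actually hands back for each of these queries. As far as I can tell, `SimpleXMLElement::xpath()` only ever returns `SimpleXMLElement` objects, so a matched text node cannot be represented as itself; the output reported in this thread suggests it comes back wrapped as its enclosing element instead. The variable names below are just for illustration.

```php
<?php
// Same markup as in the post above.
$p = new SimpleXMLElement(
    '<p>The color <span class="color">orange</span> has always been my favorite color.</p>'
);

$elements = $p->xpath('child::*');       // element children only
$texts    = $p->xpath('child::text()');  // text-node children
$all      = $p->xpath('child::node()');  // element and text children

printf("child::*      -> %d result(s)\n", count($elements));
printf("child::text() -> %d result(s)\n", count($texts));
printf("child::node() -> %d result(s)\n", count($all));

// Print the element name behind each result to see what each wrapper points at.
foreach ($all as $i => $item) {
    printf("result %d is wrapped as <%s>\n", $i, $item->getName());
}
```

The counts show that XPath does find both text nodes; the confusing part is only how SimpleXML presents them.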

  2. #2
    SitePoint Zealot
    Join Date: Apr 2005
    Location: London
    Posts: 163
    I'm not sure what you want to select.. you could try

    //p/child::node()

    if you want to select both text nodes and elements?
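
For comparison, here is a rough sketch of the same `child::node()` query run through PHP's DOM extension (`DOMDocument` and `DOMXPath`) rather than SimpleXML. DOM has a separate class for text nodes, so each child can be reported as what it is. The `$parts` array and the `text:`/`element:` labels are made up for this example.

```php
<?php
$doc = new DOMDocument();
$doc->loadXML(
    '<p>The color <span class="color">orange</span> has always been my favorite color.</p>'
);
$xpath = new DOMXPath($doc);

$parts = [];
foreach ($xpath->query('//p/child::node()') as $node) {
    // XML_TEXT_NODE is a constant defined by the DOM extension.
    $kind = ($node->nodeType === XML_TEXT_NODE) ? 'text' : 'element';
    $parts[] = $kind . ': ' . $node->nodeValue;
}

print_r($parts);
```

The children come back in document order: the leading text, the span, then the trailing text.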

  3. #3
    Smola (SitePoint Addict)
    jurn: I want access to all of the content. Ideally I would like to see a text node with the text up to the span, the span, then another text node with the rest of the text. I just need to be able to access all of the data. I tried your suggestion and after running that XPath query this is what was returned:
    PHP Code:
    Array
    (
        [0] => SimpleXMLElement Object
            (
                [span] => orange
            )

        [1] => SimpleXMLElement Object
            (
                [@attributes] => Array
                    (
                        [class] => color
                    )

                [0] => orange
            )

        [2] => SimpleXMLElement Object
            (
                [span] => orange
            )

    )

    It seems to have quite a bit of trouble recognizing the text nodes at all...
    Humbly,

    Smola

  4. #4
    Mittineague (Programming Team)
    Join Date: Jul 2005
    Location: West Springfield, Massachusetts
    Posts: 17,036
    Maybe ->nodeValue or ->textContent would do it?

  5. #5
    Smola (SitePoint Addict)
    Mittineague: I am not sure in what context to use your suggestion. Neither SimpleXMLElement nor DOM (I checked just in case) seems to include either of those as methods. I also checked the XPath documentation at w3.org and couldn't find them either. Am I missing something?
    Humbly,

    Smola

  6. #6
    Mittineague (Programming Team)
    :d'oh:

    Sorry about that, I've been working with XPath in JavaScript and forgot to shift gears.

    I'll put together a test case and get back ASAP

  7. #7
    Mittineague (Programming Team)
    I tried with SimpleXML but couldn't get it to work unless I added explicit <text> elements around the text.

    But I was closer than I thought. Try:
    Code PHP:
    <?php
    $xmlstr = <<<XML
    <p>The color <span class="color">orange</span> has always been my favorite color.</p>
    XML;
     
    $doc = new DOMDocument;
    $doc->loadXML($xmlstr);
    $xpath = new DOMXPath($doc);
    $query = '//p';
    $ptags = $xpath->query($query);
    foreach ($ptags as $ptag)
    {
    	echo $ptag->nodeValue . "<br />\n";
    }
    ?>

  8. #8
    Smola (SitePoint Addict)
    Thanks Mittineague! I really wanted to stick with SimpleXML, but if it doesn't work there's not much I can do. I'll just have to work with the solution you provided using DOM constructs. Thanks again for your time!
    Humbly,

    Smola
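
If keeping SimpleXML for the rest of the code matters, one possible middle ground is to cross over to DOM only for the mixed-content element. `dom_import_simplexml()` is a standard PHP function that returns the DOM node behind a `SimpleXMLElement`; everything else here (the `$pieces` array and the `[span]` label format) is just an illustrative assumption, not something from the thread.

```php
<?php
$p = new SimpleXMLElement(
    '<p>The color <span class="color">orange</span> has always been my favorite color.</p>'
);

// Get the DOMElement that backs the SimpleXML object.
$domP = dom_import_simplexml($p);

$pieces = [];
foreach ($domP->childNodes as $child) {
    if ($child->nodeType === XML_ELEMENT_NODE) {
        // Element child: record its tag name and its text content.
        $pieces[] = '[' . $child->nodeName . '] ' . $child->textContent;
    } else {
        // Text child: record the raw text, whitespace included.
        $pieces[] = $child->nodeValue;
    }
}

print_r($pieces);
```

This walks the children in document order, so the text before the span, the span itself, and the text after it each show up as a separate entry.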

