  1. #1
    Visible Ninja bronze trophy
    JeffWalden's Avatar
    Join Date
    Sep 2002
    Location
    Los Angeles
    Posts
    1,709
    Mentioned
    5 Post(s)
    Tagged
    0 Thread(s)

    Parse XML that Contains CDATA XML

    I'm attempting to parse some XML using PHP and normally would just use simplexml_load_string() to read the data. However, the XML that I need to parse includes CDATA and another embedded XML document.

    Code:
    <?xml version="1.0" encoding="utf-8"?>
    <webRequest>
        <id>160810</id>
        <request>
            <merchantShortName>ReServe</merchantShortName>
            <serviceName>reservationManagementServices</serviceName>
            <actionName>diningResListGet</actionName>
        </request>
        <authentication>
            <username>lynn</username>
            <password>lynn</password>
        </authentication>
        <content>
        <![CDATA[<?xml version="1.0" encoding="UTF-8" ?>
            <diningResListGetRequest>
                <dateRangeFilter>
                    <fromDate>2015-10-15</fromDate>
                    <toDate>2015-10-15</toDate>
                </dateRangeFilter>
                <siteNameFilter>
                    <matchCriterion>EqualTo</matchCriterion>
                    <stringValue>Frontier Vineyards</stringValue>
                </siteNameFilter>
                <maxReturned>50</maxReturned>
                <servicePeriodFilter>Dinner</servicePeriodFilter>
                <modifiedDateTimeFilter>
                    <fromDateTime>2012-04-01T12:00:00</fromDateTime>
                </modifiedDateTimeFilter>
            </diningResListGetRequest>
        ]]></content>
    </webRequest>
    When I attempt to parse this XML using simplexml_load_string() I receive the error: Warning: simplexml_load_string(): Entity: line 2: parser error : XML declaration allowed only at the start of the document.

    What's the trick to getting at the data contained in the embedded XML document using PHP? I'm likely not fully understanding what I'm looking at so my Google searches haven't turned up anything useful. Not even sure what that second embedded XML document is called. Any help is appreciated!
    TAKE A WALK OUTSIDE YOUR MIND.

  2. #2
    SitePoint Addict
    Join Date
    Aug 2006
    Location
    Nantwich, Cheshire
    Posts
    278
    Mentioned
    1 Post(s)
    Tagged
    0 Thread(s)
    I'm no expert on this, but according to this page (http://php.chinaunix.net/manual/sl/f...oad-string.php, look at the comments from around June 2008) it seems you need to escape the contents of the file before calling simplexml_load_string() if CDATA is present. There is also talk of a LIBXML_NOCDATA option for that function, though from what I read it might make things worse rather than better.
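
    For reference, here is a minimal sketch of passing that option, which PHP exposes as the constant LIBXML_NOCDATA, as the third argument of simplexml_load_string(). The payload is a shortened, hypothetical version of the one in the question. As far as I can tell the flag mainly changes how CDATA shows up in dumps such as print_r(); casting the element to a string returns the embedded text either way.

```php
<?php
// Hypothetical, shortened payload modelled on the one in the question.
$xml = '<?xml version="1.0" encoding="utf-8"?>
<webRequest>
    <content><![CDATA[<?xml version="1.0" encoding="UTF-8" ?>
<diningResListGetRequest><maxReturned>50</maxReturned></diningResListGetRequest>]]></content>
</webRequest>';

// LIBXML_NOCDATA asks libxml to merge CDATA sections into ordinary text nodes.
$outer = simplexml_load_string($xml, 'SimpleXMLElement', LIBXML_NOCDATA);

// The string cast of <content> is the embedded document as plain text,
// not yet a parsed tree.
$inner = (string) $outer->content;
echo $inner, "\n";
```

    The embedded document is still just a string at this point, so it has to be parsed separately to get at its elements.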
    http://www.firenza.net - my homage to a car from the 1970s

  3. #3
    Always A Novice bronze trophy
    K. Wolfe's Avatar
    Join Date
    Nov 2003
    Location
    Columbus, OH
    Posts
    2,079
    Mentioned
    53 Post(s)
    Tagged
    0 Thread(s)
    SimpleXML was always the quicker, less-featured parser; use DOM instead. Here's an example of how to use it with CDATA: http://stackoverflow.com/questions/6...-using-php-dom

  4. #4
    Visible Ninja bronze trophy
    JeffWalden's Avatar
    Thanks guys, that helps a bit. The issue I'm running into now regardless of which parsing tool I use is the second XML declaration within the CDATA element. The parser is puking on that section.

    Warning: simplexml_load_string(): Entity: line 2: parser error : XML declaration allowed only at the start of the document in /xml-parse.php on line 19
    Warning: simplexml_load_string(): <?xml version="1.0" encoding="utf-8"?> in /xml-parse.php on line 19

    This is continuing to use simplexml_load_string(). I received the same error when using DOMDocument().

    My next thought was to simply strip out the CDATA and additional XML declaration from the XML given that it's useless to me in terms of parsing the data:

    PHP Code:
    $rawXML = str_replace('<![CDATA[<?xml version="1.0" encoding="UTF-8" ?>', "", $rawXML);
    $rawXML = str_replace(']]>', "", $rawXML);
    My resulting XML document looks as such:

    Code:
    <?xml version="1.0" encoding="utf-8"?>
    <webRequest>
        <id>160810</id>
        <request>
            <merchantShortName>ReServe</merchantShortName>
            <serviceName>reservationManagementServices</serviceName>
            <actionName>diningResListGet</actionName>
        </request>
        <authentication>
            <username>lynn</username>
            <password>lynn</password>
        </authentication>
        <content>
        
            <diningResListGetRequest>
                <dateRangeFilter>
                    <fromDate>2015-10-15</fromDate>
                    <toDate>2015-10-15</toDate>
                </dateRangeFilter>
                <siteNameFilter>
                    <matchCriterion>EqualTo</matchCriterion>
                    <stringValue>Frontier Vineyards</stringValue>
                </siteNameFilter>
                <maxReturned>50</maxReturned>
                <servicePeriodFilter>Dinner</servicePeriodFilter>
                <modifiedDateTimeFilter>
                    <fromDateTime>2012-04-01T12:00:00</fromDateTime>
                </modifiedDateTimeFilter>
            </diningResListGetRequest>
        </content>
    </webRequest>
    Everything looks okay to me here, but I'm still receiving that error about the second declaration. I double and triple checked that I'm not feeding the original XML into the simplexml_load_string() but rather am feeding the replaced variable. Full code is below in case I'm missing something.

    PHP Code:
    $rawXML = file_get_contents("php://input");
    $rawXML = str_replace('<![CDATA[<?xml version="1.0" encoding="UTF-8" ?>', "", $rawXML);
    $rawXML = str_replace(']]>', "", $rawXML);
    $xml = simplexml_load_string($rawXML);
    Any additional thoughts on this error?

  5. #5
    SitePoint Member
    Join Date
    Feb 2009
    Posts
    17
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    Will any of these suffice?

    PHP Code:
    <?php

    $xml = <<<XML
    <?xml version="1.0" encoding="utf-8"?>
    <webRequest>
        <id>160810</id>
        <request>
            <merchantShortName>ReServe</merchantShortName>
            <serviceName>reservationManagementServices</serviceName>
            <actionName>diningResListGet</actionName>
        </request>
        <authentication>
            <username>lynn</username>
            <password>lynn</password>
        </authentication>
        <content>
        <![CDATA[<?xml version="1.0" encoding="UTF-8" ?>
            <diningResListGetRequest>
                <dateRangeFilter>
                    <fromDate>2015-10-15</fromDate>
                    <toDate>2015-10-15</toDate>
                </dateRangeFilter>
                <siteNameFilter>
                    <matchCriterion>EqualTo</matchCriterion>
                    <stringValue>Frontier Vineyards</stringValue>
                </siteNameFilter>
                <maxReturned>50</maxReturned>
                <servicePeriodFilter>Dinner</servicePeriodFilter>
                <modifiedDateTimeFilter>
                    <fromDateTime>2012-04-01T12:00:00</fromDateTime>
                </modifiedDateTimeFilter>
            </diningResListGetRequest>
        ]]></content>
    </webRequest>
    XML;


    // 1. DOM: load the outer document and read <content> by tag name.
    $DOMDocument = new DOMDocument();
    $DOMDocument->loadXML( $xml );

    $DOMNodeList = $DOMDocument->getElementsByTagName( 'content' );

    echo $DOMNodeList->item( 0 )->textContent;
    echo "\n------------------------------------\n";

    // 2. DOM + XPath: the same node, selected with an XPath query.
    $DOMXPath         = new DOMXPath( $DOMDocument );
    $xpathDomNodeList = $DOMXPath->query( '//webRequest/content' );
    echo $xpathDomNodeList->item( 0 )->textContent;

    echo "\n-------------end of dom-----------------------\n";

    // 3. SimpleXML + XPath.
    $simpleXMLElement         = new SimpleXMLElement( $xml );
    $simpleXMLXPathedElements = $simpleXMLElement->xpath( '//webRequest/content' );
    echo (string) $simpleXMLXPathedElements[0];

    // Actually this just works with no special treatment (string replacements etc.)
    echo $simpleXMLElement->content;
    Last edited by pbyrne84; Sep 3, 2013 at 12:01. Reason: There should be no need for any special action as it is valid xml
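
    To get at the elements inside the embedded document rather than just its text, one option is to parse in two stages: load the outer document, pull the CDATA text out of <content>, then load that text as a document of its own. This is only a sketch under assumptions: the payload is a shortened version of the one in the question, and the variable names are made up for illustration. Note the trim() on the inner text, since the whitespace between <content> and the CDATA section would otherwise sit in front of the inner XML declaration.

```php
<?php
// Shortened, hypothetical copy of the request from the question.
$outerXml = <<<'XML'
<?xml version="1.0" encoding="utf-8"?>
<webRequest>
    <id>160810</id>
    <content>
    <![CDATA[<?xml version="1.0" encoding="UTF-8" ?>
        <diningResListGetRequest>
            <dateRangeFilter>
                <fromDate>2015-10-15</fromDate>
                <toDate>2015-10-15</toDate>
            </dateRangeFilter>
            <maxReturned>50</maxReturned>
        </diningResListGetRequest>
    ]]></content>
</webRequest>
XML;

// Stage 1: parse the outer envelope. The CDATA section is plain text here.
$outer = simplexml_load_string($outerXml);

// Stage 2: take the text of <content>, strip surrounding whitespace so the
// inner XML declaration is the very first thing in the string, and parse it.
$innerXml = trim((string) $outer->content);
$inner    = simplexml_load_string($innerXml);

echo $inner->getName(), "\n";
echo $inner->dateRangeFilter->fromDate, "\n";
echo $inner->maxReturned, "\n";
```

    This avoids the str_replace() step entirely, at the cost of a second parse.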

  6. #6
    SitePoint Member
    Join Date
    Feb 2009
    Posts
    17
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    Quick question: are you trying to merge the two documents into one so that everything can be parsed in a single pass, even though the content document is a separate one? They can be merged, and your code works fine for that. Here is your example in isolation.

    PHP Code:
    $xml = <<<XML
    <?xml version="1.0" encoding="utf-8"?>
    <webRequest>
        <id>160810</id>
        <request>
            <merchantShortName>ReServe</merchantShortName>
            <serviceName>reservationManagementServices</serviceName>
            <actionName>diningResListGet</actionName>
        </request>
        <authentication>
            <username>lynn</username>
            <password>lynn</password>
        </authentication>
        <content>
        <![CDATA[<?xml version="1.0" encoding="UTF-8" ?>
            <diningResListGetRequest>
                <dateRangeFilter>
                    <fromDate>2015-10-15</fromDate>
                    <toDate>2015-10-15</toDate>
                </dateRangeFilter>
                <siteNameFilter>
                    <matchCriterion>EqualTo</matchCriterion>
                    <stringValue>Frontier Vineyards</stringValue>
                </siteNameFilter>
                <maxReturned>50</maxReturned>
                <servicePeriodFilter>Dinner</servicePeriodFilter>
                <modifiedDateTimeFilter>
                    <fromDateTime>2012-04-01T12:00:00</fromDateTime>
                </modifiedDateTimeFilter>
            </diningResListGetRequest>
        ]]></content>
    </webRequest>
    XML;



    $rawXML = str_replace('<![CDATA[<?xml version="1.0" encoding="UTF-8" ?>', "", $xml);
    $rawXML = str_replace(']]>', "", $rawXML);
    $xml = simplexml_load_string($rawXML);

    echo( $xml->saveXML() );
    If I add a blank line at the start of the XML I get the same error, "simplexml_load_string(): Entity: line 2: parser error : XML declaration allowed only at the", so it may be worth trimming the XML.
    PHP Code:
    $xml = <<<XML

    <?xml version="1.0" encoding="utf-8"?>
    <webRequest>
        <id>160810</id>
        <request>
            <merchantShortName>ReServe</merchantShortName>
            <serviceName>reservationManagementServices</serviceName>
            <actionName>diningResListGet</actionName>
        </request>
        <authentication>
            <username>lynn</username>
            <password>lynn</password>
        </authentication>
        <content>
        <![CDATA[<?xml version="1.0" encoding="UTF-8" ?>
            <diningResListGetRequest>
                <dateRangeFilter>
                    <fromDate>2015-10-15</fromDate>
                    <toDate>2015-10-15</toDate>
                </dateRangeFilter>
                <siteNameFilter>
                    <matchCriterion>EqualTo</matchCriterion>
                    <stringValue>Frontier Vineyards</stringValue>
                </siteNameFilter>
                <maxReturned>50</maxReturned>
                <servicePeriodFilter>Dinner</servicePeriodFilter>
                <modifiedDateTimeFilter>
                    <fromDateTime>2012-04-01T12:00:00</fromDateTime>
                </modifiedDateTimeFilter>
            </diningResListGetRequest>
        ]]></content>
    </webRequest>
    XML;



    $rawXML = str_replace('<![CDATA[<?xml version="1.0" encoding="UTF-8" ?>', "", $xml);
    $rawXML = str_replace(']]>', "", $rawXML);
    $xml = simplexml_load_string($rawXML);

    echo( $xml->saveXML() );
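
    A smaller way to see the same thing, as a rough sketch: the snippet below uses a tiny made-up document, prepends a newline to it, and collects the libxml errors with libxml_use_internal_errors() instead of letting them surface as warnings. It then parses the trimmed string for comparison.

```php
<?php
// Collect libxml errors in memory instead of emitting PHP warnings.
libxml_use_internal_errors(true);

// Tiny hypothetical document; only the leading newline matters here.
$body = '<?xml version="1.0" encoding="utf-8"?><webRequest><id>160810</id></webRequest>';
$withLeadingNewline = "\n" . $body;

// The declaration is no longer the first thing in the input, so parsing fails.
$failed = simplexml_load_string($withLeadingNewline);
$errors = libxml_get_errors();
foreach ($errors as $error) {
    echo trim($error->message), ' (line ', $error->line, ")\n";
}
libxml_clear_errors();

// After trim() the declaration is back at the very start and parsing succeeds.
$parsed = simplexml_load_string(trim($withLeadingNewline));
echo $parsed->id, "\n";
```

    If the raw request body arrives with a stray newline or space in front of it, this is the failure it would trigger, regardless of what is inside the CDATA section.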

  7. #7
    Visible Ninja bronze trophy
    JeffWalden's Avatar
    Seriously? Doing a trim() on $rawXML solved the issue. Thanks for the help!

    ::stops banging head against wall::

