simpleXML no longer reliably parses feed

Hi

For a couple of years the code below has been working OK parsing an external XML feed and inserting data into a database. The size of the file has grown considerably since the date on which the script was written and I find an increasing amount of errors (PHP Warning: SimpleXMLElement::__construct() [<a href=‘simplexmlelement.–construct’>simplexmlelement.–construct</a>]: I/O warning : failed to load external entity) logged on the server.

I would like to ask for advice and assistance in updating what I have to hopefully improve its performance. The server that supplies the feed does seem to take a while to return any content, presumably it has to receive a request for the specific content for the user then render the XML and return it. In a browser it can take up to 10 seconds to do this even if the request is set to return just one set of results. I am guessing that this is the reason for the errors, the whole thing must time out before the feed is loaded into memory.

Your thoughts and examples would be welcomed.

regards

Colin

&lt;?php

ini_set('display_errors', 1);
ini_set('log_errors', 1);
ini_set('error_log', dirname(__FILE__) . '/error_log.txt');
error_reporting(-1);

include ('conn.php'); // Database connection AND URL for feed



try{
  $feed = new SimpleXMLElement($xml, null, true);
}catch(Exception $e){
  echo $e-&gt;getMessage();
  exit;
}

$sql = 'INSERT INTO property (`id`, `date`, `ref`, `price`, `currency`, `price_freq`, `part_ownership`, `leasehold`, `type`, `country`, `town`, `province`, `beds`, `baths`, `pool`, `desc`) VALUES ';

foreach($feed-&gt;property as $property){
  $sql .= sprintf(
    //"\
(%d, '%s', '%s', %d),",
    "\
(%d, '%s', '%s', '%d', '%s', '%s', '%d', '%d', '%s', '%s', '%s', '%s', '%d', '%d', '%d', '%s'),",
   $property-&gt;id,
   mysql_real_escape_string($property-&gt;date),
   mysql_real_escape_string($property-&gt;ref),
   $property-&gt;price,
   mysql_real_escape_string($property-&gt;currency),
   mysql_real_escape_string($property-&gt;price_freq),
   $property-&gt;part_ownership,
   $property-&gt;leasehold,
   mysql_real_escape_string($property-&gt;type-&gt;en),
   mysql_real_escape_string($property-&gt;country),
   mysql_real_escape_string($property-&gt;town),
   mysql_real_escape_string($property-&gt;province),
   $property-&gt;beds,
   $property-&gt;baths,
   $property-&gt;pool,
   mysql_real_escape_string($property-&gt;desc-&gt;en)
  );
}

$sql = rtrim($sql, ',') . ';';

echo $sql;

if(!mysql_query($sql)){
  echo '&lt;h1 style="color: red;"&gt;Error&lt;/h1&gt;&lt;p&gt;', mysql_error(), '&lt;p&gt;';
}
else
{
echo "Records have been inserted into the database";
}
?&gt;

Hi Colin,

PHP’s XMLReader extension is probably what you want. It lets you parse XML without having to load the whole file into memory. You might also find this [URL=“http://www.sitepoint.com/forums/showthread.php?720925-Parsing-a-large-(1-GB)-nested-XML-file”]sitepoint thread helpful reading.

Thanks for the reply.

I did actually read the thread you suggested, interestingly it was Anthony Sterling who helped me put together the existing code.

Can I combine XMLReader and simpleXML I wonder? The latter seems so much easier to work with.

That seems to be what they’re doing in the last code examples in the thread - using XMLReader to navigate the whole file, and SimpleXML to load in small sections to process. What’s the structure of the XML file you’re working with? Is it deeply nested, or the size is just down to the sheer number of records?

If I had scrolled down to the next example I would have noticed that :blush: I will take a look at it and see if it can be of use. Thanks.

OK so I now have the following which is working. Appreciate a second or even third pair of eyes to check it over though :slight_smile:

&lt;?php
include ('conn.php'); // Connect to the database

$url = 'http://localhost/xml-reader/test.xml'; // URL for feed.

$reader = new XMLReader();
$reader-&gt;open($url);



                                    ////////////////////////////////////////   INSERT PROPERTY DETAILS  //////////////////////////////////////////////////////////

$sql = "INSERT INTO feed_property (`propertyref`, `property_type`, `status`, `listed_date`, `ref`) VALUES ";


while($reader-&gt;read()) {
    if($reader-&gt;nodeType == XMLReader::ELEMENT and $reader-&gt;name == 'property') {
        $doc = new DOMDocument('1.0','UTF-8');
        $property = simplexml_import_dom($doc-&gt;importNode($reader-&gt;expand(), true));
        
        
       $sql .= sprintf(
    "\
('%s', '%s', '%s', '%s', '%s')\
,",
      $db-&gt;real_escape_string($property-&gt;id),
      $db-&gt;real_escape_string($property-&gt;type-&gt;uk),
      $db-&gt;real_escape_string($property-&gt;status),
      $db-&gt;real_escape_string($property-&gt;listed_date),
      $db-&gt;real_escape_string($property-&gt;ref)
     
  );

  }    
}
$sql = rtrim($sql, ',') .  'ON DUPLICATE KEY UPDATE property_type = VALUES (property_type), status = VALUES (status), listed_date = VALUES (listed_date), ref = VALUES (ref)';

echo $sql;


  if(!$result = $db-&gt;query($sql)){
    die('&lt;h1 style="color: red;"&gt;Error&lt;/h1&gt;&lt;p&gt;There was an error running the query [' . $db-&gt;error . ']&lt;/p&gt;');
}

else
{

  echo '&lt;h1 style="color: red;"&gt;Property details successfully added/updated!&lt;/h1&gt;';
  echo '&lt;h1 style="color: red;"&gt;Total Rows Affected: ' . $db-&gt;affected_rows . '&lt;/h1&gt;';
}

?&gt;

Next problem is I need to add a further 2 Inserts to the DB from the feed. Currently these take the form of:

$sql = "INSERT INTO feed_characteristics (`propertyref`, `category`, `value`, `uid` ) VALUES ";
$sqlxml = "";
$sqlxmlarr = array();



foreach($xml-&gt;property as $property) {

 $propertyid = (string)$property-&gt;id;
 $number = 1;
 foreach($property-&gt;characteristics-&gt;category as $category) {

   
  
  $cat = (string)$category-&gt;name-&gt;uk;
  $sqlxml = "('".$propertyid."','".$cat."',";
  


   $i = 0;
   foreach($category-&gt;value as $value) {
   $uid = "$propertyid-$number"; 
   $val = (string)$value-&gt;uk;
   $sqlxmlarr[] = $sqlxml."'".$val."','".$uid."') , ";
   $i++;
   $number++;
  }
  
 
   
 }
  
}

foreach($sqlxmlarr as $sqlchk) $sql .= $sqlchk;


$sql = rtrim($sql, ' , ') .  ' ON DUPLICATE KEY UPDATE category = VALUES (category), value = VALUES (value)'; 


AND

$sql = 'INSERT INTO feed_images (`propertyref`, `url`, `prime`, `pid`) VALUES ';

$sqlxmlarr = array();

foreach($xml-&gt;property as $property) {

 $propertyid = (string)$property-&gt;id;
 
 
  
  $i = 0;
foreach ($property-&gt;images-&gt;image as $image) {
  
   $url = (string)$image-&gt;url;
   $prime = (int)$image-&gt;primary;
   $imageid = (int)$image['id'];
   $pid = "$propertyid-$imageid";
   $sqlxmlarr[] = "('".$propertyid." ', '".$url."', '".$prime."', '".$pid."') , ";
   $i++;
  
  }
  
 

}

foreach($sqlxmlarr as $sqlchk) $sql .= $sqlchk;
 

$sql = rtrim($sql, ' , ') .  ' ON DUPLICATE KEY UPDATE url = VALUES (url), prime = VALUES (prime)';


Plus there is a delete statement added onto the end of all that. I hope you can appreciate why I wanted to speed this whole thing up now! I really need some guidance to turn this whole thing into an efficient script.

TIA

Colin

Hi Colin,

I think I’d be tempted to move the DB queries to separate functions to keep the main loop tidy and easy to read:


include ('conn.php'); // Connect to the database
include ('functions.php');

$url = 'http://localhost/xml-reader/test.xml'; // URL for feed.

$reader = new XMLReader();
$reader->open($url);

////////////////////////////////////////   INSERT PROPERTY DETAILS  //////////////////////////////////////////////////////////

while($reader->read()) {
    if($reader->nodeType == XMLReader::ELEMENT and $reader->name == 'property') {
        $doc = new DOMDocument('1.0','UTF-8');
        $property = simplexml_import_dom($doc->importNode($reader->expand(), true));

        if( !add_property($db, $property) ) {
            die('<h1 style="color: red;">Error</h1><p>There was an error running the query [' . $db->error . ']</p>');
        }
        if( !add_characteristics($db, $property) ) {
            die('<h1 style="color: red;">Error</h1><p>There was an error running the query [' . $db->error . ']</p>');
        }
        if( !add_images($db, $property) {
            die('<h1 style="color: red;">Error</h1><p>There was an error running the query [' . $db->error . ']</p>');
        }
    }
}

echo '<h1 style="color: red;">Property details successfully added/updated!</h1>';
echo '<h1 style="color: red;">Total Rows Affected: ' . $db->affected_rows . '</h1>';

The functions are pretty much the same as the code you provided, but it’s possible to remove some unused loops and variables to make them a little more readable:


function add_property($db, $property){
    $sql = "INSERT INTO feed_property (`propertyref`, `property_type`, `status`, `listed_date`, `ref`) VALUES ";

    $sql .= sprintf("\
('%s', '%s', '%s', '%s', '%s')\
,",
        $db-&gt;real_escape_string($property-&gt;id),
        $db-&gt;real_escape_string($property-&gt;type-&gt;uk),
        $db-&gt;real_escape_string($property-&gt;status),
        $db-&gt;real_escape_string($property-&gt;listed_date),
        $db-&gt;real_escape_string($property-&gt;ref)
    );

    $sql = rtrim($sql, ',') .  'ON DUPLICATE KEY UPDATE property_type = VALUES (property_type), status = VALUES (status), listed_date = VALUES (listed_date), ref = VALUES (ref)';

    return $db-&gt;query($sql)
}


function add_characteristics($db, $property){
    $sql = "INSERT INTO feed_characteristics (`propertyref`, `category`, `value`, `uid` ) VALUES ";

    $propertyid = (string)$property-&gt;id;
    $number = 1;
    foreach($property-&gt;characteristics-&gt;category as $category)
    {
        $cat = (string)$category-&gt;name-&gt;uk;
        $sqlxml = "('".$propertyid."','".$cat."',";

        foreach($category-&gt;value as $value)
        {
            $uid = "$propertyid-$number";
            $val = (string)$value-&gt;uk;

            $sql .= $sqlxml."'".$val."','".$uid."') , ";
            $number++;
        }
    }
    $sql = rtrim($sql, ' , ') .  ' ON DUPLICATE KEY UPDATE category = VALUES (category), value = VALUES (value)';

    return $db-&gt;query($sql);
}


function add_images($db, $property) {
    $sql = 'INSERT INTO feed_images (`propertyref`, `url`, `prime`, `pid`) VALUES ';

    $property_id = (string)$property-&gt;id;
    foreach ($property-&gt;images-&gt;image as $image)
    {
        $url     = (string)$image-&gt;url;
        $prime   = (int)$image-&gt;primary;
        $imageid = (int)$image['id'];
        $pid     = "$property_id-$imageid";

        $sql .= "('".$propertyid." ', '".$url."', '".$prime."', '".$pid."') , ";
    }
    $sql = rtrim($sql, ' , ') .  ' ON DUPLICATE KEY UPDATE url = VALUES (url), prime = VALUES (prime)';

    return $db-&gt;query($sql);
}

Thanks Fretburner I will take a look at this and report back. Hang in there please, I appreciate your help.

Colin

Sorry but this is not working. The loop through the property nodes does not work.

Hi Colin, do you have any more details? What exactly is the error?

If I echo out the sql it just returns 1 property instead of multiples.

Sure, because you’re looping through the XML file and pulling out one property to process at a time. Are you saying that you want to batch up the database queries to execute them together?

OK I have an error with the images.

There was an error adding images [You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near ‘(’ ', ‘http://someimages.com/image’ at line 1]

The property id (just shows as ’ ') is not being inserted into the query, so I think the error is because the on duplicate update fails.

query: (’ ', ‘http://someimages.com/image.asp?Id=X1000227’, ‘0’, ‘4491-2’) Not sure why the propertyid is not being inserted because it forms the first part of the PID ‘4491-2’ and that is clearly working?

Sorry, my fault… it’s down to a typo in the code I posted. In the add_images function


$sql .= "('".$propertyid." ', '".$url."', '".$prime."', '".$pid."') , ";

should be


$sql .= "('".$property_id." ', '".$url."', '".$prime."', '".$pid."') , ";

I spent ages looking through this and even when you showed me it I still had to look hard to see the missing underscore LOL. I still get the error though?

There was an error adding images [You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near '(‘4491’, ‘http://someimages.com/image’ at line 1]

Is it possible to post the whole SQL statement that is causing the problem?

OK solved that one the ON Duplicate was inside the loop.

I now have this, which on quick testing seems to work.

&lt;?php


// ADD PROPERTY

function add_property($db, $property){
    $sql = "INSERT INTO feed_property (`propertyref`, `property_type`, `status`, `listed_date`, `ref`) VALUES ";
    
    $sql .= sprintf("\
('%s', '%s', '%s', '%s', '%s')\
,",
        $db-&gt;real_escape_string($property-&gt;id),
        $db-&gt;real_escape_string($property-&gt;type-&gt;uk),
        $db-&gt;real_escape_string($property-&gt;status),
        $db-&gt;real_escape_string($property-&gt;listed_date),
        $db-&gt;real_escape_string($property-&gt;ref)
    );
    
    $sql = rtrim($sql, ',') .  'ON DUPLICATE KEY UPDATE property_type = VALUES (property_type), status = VALUES (status), listed_date = VALUES (listed_date), ref = VALUES (ref)';
    
    return $db-&gt;query($sql);
}


//ADD CHARACTERISTICS

function add_characteristics($db, $property){
    $sql = "INSERT INTO feed_characteristics (`propertyref`, `category`, `value`, `uid` ) VALUES ";

    $propertyid = (string)$property-&gt;id;
    $number = 1;
    foreach($property-&gt;characteristics-&gt;category as $category):
    
        $cat = (string)$category-&gt;name-&gt;uk;
        $sqlxml = "('".$propertyid."','".$cat."',";

        foreach($category-&gt;value as $value):
        
            $uid = "$propertyid-$number"; 
            $val = (string)$value-&gt;uk;
            
            $sql .= $sqlxml."'".$val."','".$uid."') , ";
            $number++;
        endforeach;
    endforeach;
    
    $sql = rtrim($sql, ' , ') .  ' ON DUPLICATE KEY UPDATE category = VALUES (category), value = VALUES (value)';
   
    return $db-&gt;query($sql);
} 


//ADD IMAGES

function add_images($db, $property) {
    $sql = "INSERT INTO feed_images (`propertyref`, `url`, `prime`, `pid`) VALUES ";

    $property_id = (string)$property-&gt;id;
    foreach ($property-&gt;images-&gt;image as $image):
    
        
        $url     = (string)$image-&gt;url;
        $prime   = (int)$image-&gt;primary;
        $imageid = (int)$image['id'];
        $pid     = "$property_id-$imageid";
        
        $sql .= "('".$property_id." ', '".$url."', '".$prime."', '".$pid."') , ";
        
    endforeach;
    
    $sql = rtrim($sql, ' , ') .  ' ON DUPLICATE KEY UPDATE url = VALUES (url), prime = VALUES (prime)';

  

    
    return $db-&gt;query($sql);
}



//DELETE SOLD PROPERTY 

function remove_property($db, $property) {
    $sql = "SELECT `propertyref` FROM `feed_property` WHERE `status` = 'Sold' OR `status` = 'Off Market' ";
    $result = mysqli_query($db, $sql);
    
    

if (isset($result)):
    while ($row = $result-&gt;fetch_array(MYSQLI_ASSOC)):
        $ref1 = $db-&gt;real_escape_string($row['propertyref']);


        $query1 = "DELETE FROM `feed_property`, `feed_images`, `feed_characteristics`\
"
            . "USING `feed_property` INNER JOIN `feed_images` INNER JOIN `feed_characteristics`\
"
            . "WHERE feed_property.propertyref = '$ref1'\
"
            . " AND feed_images.propertyref =  '$ref1'\
"
            . " AND feed_characteristics.propertyref =  '$ref1'; ";
    
   
endwhile;
endif;

 return $db-&gt;query($sql);
 
}

?&gt; 

Thanks to fretburner for the guidance. I will test this later to see if it performs faster than the old simplexml only parser. If you see a mistake of feel that you could improve on this then please chip in. I need this to run as fast and as efficiently as possible.

Thanks

Almost there but I have a problem with deleting sold properties. Code below:

//DELETE SOLD PROPERTY 

function remove_property($db, $property) {
    $sql = "SELECT `propertyref` FROM `feed_property` WHERE `status` = 'Sold' OR `status` = 'Off Market' ";
    $result = mysqli_query($db, $sql);
    
    

if (isset($result)):
    while ($row = $result-&gt;fetch_array(MYSQLI_ASSOC)):
        $ref1 = $db-&gt;real_escape_string($row['propertyref']);


            $query1 = "DELETE FROM `feed_property`, `feed_images`, `feed_characteristics`\
"
            . "USING `feed_property` INNER JOIN `feed_images` INNER JOIN `feed_characteristics`\
"
            . "WHERE feed_property.propertyref = '$ref1'\
"
            . " AND feed_images.propertyref = feed_property.propertyref\
"
            . " AND feed_characteristics.propertyref = feed_property.propertyref; ";
    
   
endwhile;
endif;

 return $db-&gt;query($sql);
 
}

No errors it just does not delete?

Thanks

Colin

The main problem is that you’re looping over all the properties that need deleting, but you’re only actually executing the last query as your $db->query() call is outside of the loop.

Also, I’m not sure if your first query will work correctly, as you’re calling the mysqli_query but passing in $db which is a mysqli object. It would probably be better to be consistent and stick with the OO interface:

$result = $db->query($sql);

The whole function might look something like this:


function remove_property($db) {
    $sql = "SELECT `propertyref` FROM `feed_property` WHERE `status` = 'Sold' OR `status` = 'Off Market' ";
    $result = $db-&gt;query($sql);

    if (isset($result)):
        while ($row = $result-&gt;fetch_array(MYSQLI_ASSOC)):
            $ref1 = $db-&gt;real_escape_string($row['propertyref']);

            $query1 = "DELETE FROM `feed_property`, `feed_images`, `feed_characteristics`\
"
            . "USING `feed_property` INNER JOIN `feed_images` INNER JOIN `feed_characteristics`\
"
            . "WHERE feed_property.propertyref = '$ref1'\
"
            . " AND feed_images.propertyref = feed_property.propertyref\
"
            . " AND feed_characteristics.propertyref = feed_property.propertyref; ";

            $db-&gt;query($query1);

        endwhile;
    endif;
}

It’s a little more complicated to handle query errors, as you’re executing more than one query. You could always return an array of any errors from the function so you can log them or output them to the screen from your main script;