SitePoint Sponsor

User Tag List

Page 1 of 4 1234 LastLast
Results 1 to 25 of 92

Hybrid View

  1. #1
    SitePoint Evangelist ColinHughes's Avatar
    Join Date
    Sep 2004
    Location
    Spain
    Posts
    473
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)

    simpleXML no longer reliably parses feed

    Hi

    For a couple of years the code below has been working OK parsing an external XML feed and inserting data into a database. The size of the file has grown considerably since the date on which the script was written and I find an increasing amount of errors (PHP Warning: SimpleXMLElement::__construct() [<a href='simplexmlelement.--construct'>simplexmlelement.--construct</a>]: I/O warning : failed to load external entity) logged on the server.

    I would like to ask for advice and assistance in updating what I have to hopefully improve its performance. The server that supplies the feed does seem to take a while to return any content, presumably it has to receive a request for the specific content for the user then render the XML and return it. In a browser it can take up to 10 seconds to do this even if the request is set to return just one set of results. I am guessing that this is the reason for the errors, the whole thing must time out before the feed is loaded into memory.

    Your thoughts and examples would be welcomed.

    regards

    Colin

    PHP Code:
    <?php

    ini_set
    ('display_errors'1);
    ini_set('log_errors'1);
    ini_set('error_log'dirname(__FILE__) . '/error_log.txt');
    error_reporting(-1);

    include (
    'conn.php'); // Database connection AND URL for feed



    try{
      
    $feed = new SimpleXMLElement($xmlnulltrue);
    }catch(
    Exception $e){
      echo 
    $e->getMessage();
      exit;
    }

    $sql 'INSERT INTO property (`id`, `date`, `ref`, `price`, `currency`, `price_freq`, `part_ownership`, `leasehold`, `type`, `country`, `town`, `province`, `beds`, `baths`, `pool`, `desc`) VALUES ';

    foreach(
    $feed->property as $property){
      
    $sql .= sprintf(
        
    //"\n(%d, '%s', '%s', %d),",
        
    "\n(%d, '%s', '%s', '%d', '%s', '%s', '%d', '%d', '%s', '%s', '%s', '%s', '%d', '%d', '%d', '%s'),",
       
    $property->id,
       
    mysql_real_escape_string($property->date),
       
    mysql_real_escape_string($property->ref),
       
    $property->price,
       
    mysql_real_escape_string($property->currency),
       
    mysql_real_escape_string($property->price_freq),
       
    $property->part_ownership,
       
    $property->leasehold,
       
    mysql_real_escape_string($property->type->en),
       
    mysql_real_escape_string($property->country),
       
    mysql_real_escape_string($property->town),
       
    mysql_real_escape_string($property->province),
       
    $property->beds,
       
    $property->baths,
       
    $property->pool,
       
    mysql_real_escape_string($property->desc->en)
      );
    }

    $sql rtrim($sql',') . ';';

    echo 
    $sql;

    if(!
    mysql_query($sql)){
      echo 
    '<h1 style="color: red;">Error</h1><p>'mysql_error(), '<p>';
    }
    else
    {
    echo 
    "Records have been inserted into the database";
    }
    ?>

  2. #2
    Community Advisor bronze trophy
    fretburner's Avatar
    Join Date
    Apr 2013
    Location
    Brazil
    Posts
    1,438
    Mentioned
    45 Post(s)
    Tagged
    13 Thread(s)
    Hi Colin,

    PHP's XMLReader extension is probably what you want. It lets you parse XML without having to load the whole file into memory. You might also find this sitepoint thread helpful reading.

  3. #3
    SitePoint Evangelist ColinHughes's Avatar
    Join Date
    Sep 2004
    Location
    Spain
    Posts
    473
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    Thanks for the reply.

    I did actually read the thread you suggested, interestingly it was Anthony Sterling who helped me put together the existing code.

    Can I combine XMLReader and simpleXML I wonder? The latter seems so much easier to work with.

  4. #4
    Community Advisor bronze trophy
    fretburner's Avatar
    Join Date
    Apr 2013
    Location
    Brazil
    Posts
    1,438
    Mentioned
    45 Post(s)
    Tagged
    13 Thread(s)
    Quote Originally Posted by ColinHughes View Post
    Can I combine XMLReader and simpleXML I wonder? The latter seems so much easier to work with.
    That seems to be what they're doing in the last code examples in the thread - using XMLReader to navigate the whole file, and SimpleXML to load in small sections to process. What's the structure of the XML file you're working with? Is it deeply nested, or the size is just down to the sheer number of records?

  5. #5
    SitePoint Evangelist ColinHughes's Avatar
    Join Date
    Sep 2004
    Location
    Spain
    Posts
    473
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    If I had scrolled down to the next example I would have noticed that I will take a look at it and see if it can be of use. Thanks.

  6. #6
    SitePoint Evangelist ColinHughes's Avatar
    Join Date
    Sep 2004
    Location
    Spain
    Posts
    473
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)

    My code so far

    OK so I now have the following which is working. Appreciate a second or even third pair of eyes to check it over though

    PHP Code:
    <?php
    include ('conn.php'); // Connect to the database

    $url 'http://localhost/xml-reader/test.xml'// URL for feed.

    $reader = new XMLReader();
    $reader->open($url);



                                        
    ////////////////////////////////////////   INSERT PROPERTY DETAILS  //////////////////////////////////////////////////////////

    $sql "INSERT INTO feed_property (`propertyref`, `property_type`, `status`, `listed_date`, `ref`) VALUES ";


    while(
    $reader->read()) {
        if(
    $reader->nodeType == XMLReader::ELEMENT and $reader->name == 'property') {
            
    $doc = new DOMDocument('1.0','UTF-8');
            
    $property simplexml_import_dom($doc->importNode($reader->expand(), true));
            
            
           
    $sql .= sprintf(
        
    "\n('%s', '%s', '%s', '%s', '%s')\n,",
          
    $db->real_escape_string($property->id),
          
    $db->real_escape_string($property->type->uk),
          
    $db->real_escape_string($property->status),
          
    $db->real_escape_string($property->listed_date),
          
    $db->real_escape_string($property->ref)
         
      );

      }    
    }
    $sql rtrim($sql',') .  'ON DUPLICATE KEY UPDATE property_type = VALUES (property_type), status = VALUES (status), listed_date = VALUES (listed_date), ref = VALUES (ref)';

    echo 
    $sql;


      if(!
    $result $db->query($sql)){
        die(
    '<h1 style="color: red;">Error</h1><p>There was an error running the query [' $db->error ']</p>');
    }

    else
    {

      echo 
    '<h1 style="color: red;">Property details successfully added/updated!</h1>';
      echo 
    '<h1 style="color: red;">Total Rows Affected: ' $db->affected_rows '</h1>';
    }

    ?>
    Next problem is I need to add a further 2 Inserts to the DB from the feed. Currently these take the form of:

    PHP Code:
    $sql "INSERT INTO feed_characteristics (`propertyref`, `category`, `value`, `uid` ) VALUES ";
    $sqlxml "";
    $sqlxmlarr = array();



    foreach(
    $xml->property as $property) {

     
    $propertyid = (string)$property->id;
     
    $number 1;
     foreach(
    $property->characteristics->category as $category) {

       
      
      
    $cat = (string)$category->name->uk;
      
    $sqlxml "('".$propertyid."','".$cat."',";
      


       
    $i 0;
       foreach(
    $category->value as $value) {
       
    $uid "$propertyid-$number"
       
    $val = (string)$value->uk;
       
    $sqlxmlarr[] = $sqlxml."'".$val."','".$uid."') , ";
       
    $i++;
       
    $number++;
      }
      
     
       
     }
      
    }

    foreach(
    $sqlxmlarr as $sqlchk$sql .= $sqlchk;


    $sql rtrim($sql' , ') .  ' ON DUPLICATE KEY UPDATE category = VALUES (category), value = VALUES (value)'
    AND

    PHP Code:
    $sql 'INSERT INTO feed_images (`propertyref`, `url`, `prime`, `pid`) VALUES ';

    $sqlxmlarr = array();

    foreach(
    $xml->property as $property) {

     
    $propertyid = (string)$property->id;
     
     
      
      
    $i 0;
    foreach (
    $property->images->image as $image) {
      
       
    $url = (string)$image->url;
       
    $prime = (int)$image->primary;
       
    $imageid = (int)$image['id'];
       
    $pid "$propertyid-$imageid";
       
    $sqlxmlarr[] = "('".$propertyid." ', '".$url."', '".$prime."', '".$pid."') , ";
       
    $i++;
      
      }
      
     

    }

    foreach(
    $sqlxmlarr as $sqlchk$sql .= $sqlchk;
     

    $sql rtrim($sql' , ') .  ' ON DUPLICATE KEY UPDATE url = VALUES (url), prime = VALUES (prime)'
    Plus there is a delete statement added onto the end of all that. I hope you can appreciate why I wanted to speed this whole thing up now! I really need some guidance to turn this whole thing into an efficient script.

    TIA

    Colin

  7. #7
    Community Advisor bronze trophy
    fretburner's Avatar
    Join Date
    Apr 2013
    Location
    Brazil
    Posts
    1,438
    Mentioned
    45 Post(s)
    Tagged
    13 Thread(s)
    Hi Colin,

    I think I'd be tempted to move the DB queries to separate functions to keep the main loop tidy and easy to read:
    PHP Code:
    include ('conn.php'); // Connect to the database
    include ('functions.php');

    $url 'http://localhost/xml-reader/test.xml'// URL for feed.

    $reader = new XMLReader();
    $reader->open($url);

    ////////////////////////////////////////   INSERT PROPERTY DETAILS  //////////////////////////////////////////////////////////

    while($reader->read()) {
        if(
    $reader->nodeType == XMLReader::ELEMENT and $reader->name == 'property') {
            
    $doc = new DOMDocument('1.0','UTF-8');
            
    $property simplexml_import_dom($doc->importNode($reader->expand(), true));
            
            if( !
    add_property($db$property) ) {
                die(
    '<h1 style="color: red;">Error</h1><p>There was an error running the query [' $db->error ']</p>');
            }
            if( !
    add_characteristics($db$property) ) {
                die(
    '<h1 style="color: red;">Error</h1><p>There was an error running the query [' $db->error ']</p>');
            }
            if( !
    add_images($db$property) {
                die(
    '<h1 style="color: red;">Error</h1><p>There was an error running the query [' $db->error ']</p>');
            }
        }    
    }

    echo 
    '<h1 style="color: red;">Property details successfully added/updated!</h1>';
    echo 
    '<h1 style="color: red;">Total Rows Affected: ' $db->affected_rows '</h1>'
    The functions are pretty much the same as the code you provided, but it's possible to remove some unused loops and variables to make them a little more readable:
    PHP Code:
    function add_property($db$property){
        
    $sql "INSERT INTO feed_property (`propertyref`, `property_type`, `status`, `listed_date`, `ref`) VALUES ";
        
        
    $sql .= sprintf("\n('%s', '%s', '%s', '%s', '%s')\n,",
            
    $db->real_escape_string($property->id),
            
    $db->real_escape_string($property->type->uk),
            
    $db->real_escape_string($property->status),
            
    $db->real_escape_string($property->listed_date),
            
    $db->real_escape_string($property->ref)
        );
        
        
    $sql rtrim($sql',') .  'ON DUPLICATE KEY UPDATE property_type = VALUES (property_type), status = VALUES (status), listed_date = VALUES (listed_date), ref = VALUES (ref)';
        
        return 
    $db->query($sql)

    PHP Code:
    function add_characteristics($db$property){
        
    $sql "INSERT INTO feed_characteristics (`propertyref`, `category`, `value`, `uid` ) VALUES ";

        
    $propertyid = (string)$property->id;
        
    $number 1;
        foreach(
    $property->characteristics->category as $category)
        {
            
    $cat = (string)$category->name->uk;
            
    $sqlxml "('".$propertyid."','".$cat."',";

            foreach(
    $category->value as $value)
            {
                
    $uid "$propertyid-$number"
                
    $val = (string)$value->uk;
                
                
    $sql .= $sqlxml."'".$val."','".$uid."') , ";
                
    $number++;
            }
        }
        
    $sql rtrim($sql' , ') .  ' ON DUPLICATE KEY UPDATE category = VALUES (category), value = VALUES (value)';
        
        return 
    $db->query($sql);

    PHP Code:
    function add_images($db$property) {
        
    $sql 'INSERT INTO feed_images (`propertyref`, `url`, `prime`, `pid`) VALUES ';

        
    $property_id = (string)$property->id;
        foreach (
    $property->images->image as $image)
        {
            
    $url     = (string)$image->url;
            
    $prime   = (int)$image->primary;
            
    $imageid = (int)$image['id'];
            
    $pid     "$property_id-$imageid";
            
            
    $sql .= "('".$propertyid." ', '".$url."', '".$prime."', '".$pid."') , ";
        }
        
    $sql rtrim($sql' , ') .  ' ON DUPLICATE KEY UPDATE url = VALUES (url), prime = VALUES (prime)';
        
        return 
    $db->query($sql);


  8. #8
    SitePoint Evangelist ColinHughes's Avatar
    Join Date
    Sep 2004
    Location
    Spain
    Posts
    473
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    Thanks Fretburner I will take a look at this and report back. Hang in there please, I appreciate your help.

    Colin

  9. #9
    SitePoint Evangelist ColinHughes's Avatar
    Join Date
    Sep 2004
    Location
    Spain
    Posts
    473
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    Sorry but this is not working. The loop through the property nodes does not work.

  10. #10
    Community Advisor bronze trophy
    fretburner's Avatar
    Join Date
    Apr 2013
    Location
    Brazil
    Posts
    1,438
    Mentioned
    45 Post(s)
    Tagged
    13 Thread(s)
    Quote Originally Posted by ColinHughes View Post
    Sorry but this is not working. The loop through the property nodes does not work.
    Hi Colin, do you have any more details? What exactly is the error?

  11. #11
    SitePoint Evangelist ColinHughes's Avatar
    Join Date
    Sep 2004
    Location
    Spain
    Posts
    473
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    OK I have an error with the images.

    There was an error adding images [You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near '(' ', 'http://someimages.com/image' at line 1]

    The property id (just shows as ' ') is not being inserted into the query, so I think the error is because the on duplicate update fails.

    query: (' ', 'http://someimages.com/image.asp?Id=X1000227', '0', '4491-2') Not sure why the propertyid is not being inserted because it forms the first part of the PID '4491-2' and that is clearly working?

  12. #12
    Community Advisor bronze trophy
    fretburner's Avatar
    Join Date
    Apr 2013
    Location
    Brazil
    Posts
    1,438
    Mentioned
    45 Post(s)
    Tagged
    13 Thread(s)
    Sorry, my fault.. it's down to a typo in the code I posted. In the add_images function
    PHP Code:
    $sql .= "('".$propertyid." ', '".$url."', '".$prime."', '".$pid."') , "
    should be
    PHP Code:
    $sql .= "('".$property_id." ', '".$url."', '".$prime."', '".$pid."') , "

  13. #13
    SitePoint Evangelist ColinHughes's Avatar
    Join Date
    Sep 2004
    Location
    Spain
    Posts
    473
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    I spent ages looking through this and even when you showed me it I still had to look hard to see the missing underscore LOL. I still get the error though?

    There was an error adding images [You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near '('4491', 'http://someimages.com/image' at line 1]

  14. #14
    SitePoint Evangelist ColinHughes's Avatar
    Join Date
    Sep 2004
    Location
    Spain
    Posts
    473
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    If I echo out the sql it just returns 1 property instead of multiples.

  15. #15
    Community Advisor bronze trophy
    fretburner's Avatar
    Join Date
    Apr 2013
    Location
    Brazil
    Posts
    1,438
    Mentioned
    45 Post(s)
    Tagged
    13 Thread(s)
    Quote Originally Posted by ColinHughes View Post
    If I echo out the sql it just returns 1 property instead of multiples.
    Sure, because you're looping through the XML file and pulling out one property to process at a time. Are you saying that you want to batch up the database queries to execute them together?

  16. #16
    SitePoint Evangelist ColinHughes's Avatar
    Join Date
    Sep 2004
    Location
    Spain
    Posts
    473
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    OK solved that one the ON Duplicate was inside the loop.

  17. #17
    SitePoint Evangelist ColinHughes's Avatar
    Join Date
    Sep 2004
    Location
    Spain
    Posts
    473
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    All seems to work perfectly now, thank you so much.

    I understand what you say about only running the query on the last property only.

    This will be run from a cronjob so outputting errors on the page is not really needed.

    I just need to get it on the server and see if it will perform well without timing out as per the old version.

    Thanks again

    Colin

  18. #18
    SitePoint Evangelist ColinHughes's Avatar
    Join Date
    Sep 2004
    Location
    Spain
    Posts
    473
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    Final working code:
    PHP Code:
    function remove_property($db$property) {
        
    $sql "SELECT `propertyref` FROM `feed_property` WHERE `status` = 'Sold' OR `status` = 'Off Market' ";
        
    $result $db->query($sql);

        if (isset(
    $result)):
            while (
    $row $result->fetch_array(MYSQLI_ASSOC)):
                
    $ref1 $db->real_escape_string($row['propertyref']);

                
    $query1 "DELETE FROM `feed_property`, `feed_images`, `feed_characteristics`\n"
                
    "USING `feed_property` INNER JOIN `feed_images` INNER JOIN `feed_characteristics`\n"
                
    "WHERE feed_property.propertyref = '$ref1'\n"
                
    " AND feed_images.propertyref = feed_property.propertyref\n"
                
    " AND feed_characteristics.propertyref = feed_property.propertyref; ";
                
                
    $db->query($query1);

            endwhile;
        endif;
        
        return 
    $db->query($sql);


  19. #19
    Community Advisor bronze trophy
    fretburner's Avatar
    Join Date
    Apr 2013
    Location
    Brazil
    Posts
    1,438
    Mentioned
    45 Post(s)
    Tagged
    13 Thread(s)
    Sounds good. Just a couple last things: You don't need to pass $property into the function, as it's not used. And secondly, there's no point in that last query in the return statement - it's just re-executing the select statement. It's not actually necessary to return anything at all from a function.

    I'd be interested to hear if the new code processes your XML file OK.

  20. #20
    SitePoint Evangelist ColinHughes's Avatar
    Join Date
    Sep 2004
    Location
    Spain
    Posts
    473
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    If I use the code as given it does not delete properties??

  21. #21
    SitePoint Evangelist ColinHughes's Avatar
    Join Date
    Sep 2004
    Location
    Spain
    Posts
    473
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    Without that return $db->query($sql); it does not delete??

  22. #22
    Community Advisor bronze trophy
    fretburner's Avatar
    Join Date
    Apr 2013
    Location
    Brazil
    Posts
    1,438
    Mentioned
    45 Post(s)
    Tagged
    13 Thread(s)
    Quote Originally Posted by ColinHughes View Post
    Without that return $db->query($sql); it does not delete??
    How are you calling the function from your main loop? And does it still work if you
    PHP Code:
    return TRUE
    ?

  23. #23
    SitePoint Evangelist ColinHughes's Avatar
    Join Date
    Sep 2004
    Location
    Spain
    Posts
    473
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    PHP Code:
    return TRUE
    I had just tried that successfully when you posted. All working now so I will load it up and try it live. Thanks.

  24. #24
    SitePoint Evangelist ColinHughes's Avatar
    Join Date
    Sep 2004
    Location
    Spain
    Posts
    473
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    Well it works just fine locally but as soon as it goes up on the server with a real URL for the feed this is the result:

    [10-Jul-2013 00:32:53] PHP Warning: XMLReader:pen(http://...) [<a href='xmlreader.open'>xmlreader.open</a>]: failed to open stream: HTTP request failed! HTTP/1.1 500 Internal Server Error in /home/solvilla/public_html/admin/feed_scripts/sales/insert_feeds.php on line 14
    [10-Jul-2013 00:32:53] PHP Warning: XMLReader:pen() [<a href='xmlreader.open'>xmlreader.open</a>]: Unable to open source data in /feed_scripts/sales/insert_feeds.php on line 14
    [10-Jul-2013 00:32:53] PHP Warning: XMLReader::read() [<a href='xmlreader.read'>xmlreader.read</a>]: Load Data before trying to read in /feed_scripts/sales/insert_feeds.php on line 17

    AARGH!!!

  25. #25
    Community Advisor bronze trophy
    fretburner's Avatar
    Join Date
    Apr 2013
    Location
    Brazil
    Posts
    1,438
    Mentioned
    45 Post(s)
    Tagged
    13 Thread(s)
    Morning Colin, ready for round 2?

    This sounds like a problem with the path to your XML file. It might even be something as simple as the path still being set for your localhost test environment - I've done that many times before myself!


Tags for this Thread

Bookmarks

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •