SimpleXMLElement's capacity

PHPycho · October 4, 2010, 4:43am

Dear All

I would like to know whether there is any limit on the size of XML that SimpleXMLElement can handle effectively.

Can it handle a 50MB XML file?
If not, which XML parser should I use?

Thanks in advance for any suggestions.

PHPycho · October 5, 2010, 11:45am

Great Anthony!!
That works

But I have one problem. I have thousands of records, so populating them all as objects first and looping over them afterwards may take a lot of time.

It would be more effective if we could operate on the nodes while reading them with XMLReader.

Any idea?

Really appreciate your help.

Thanks

PHPycho · October 5, 2010, 10:55am

Hi Anthony
I tried to use your code, but it’s showing duplicated, empty values.
Any idea??

AnthonySterling · October 5, 2010, 8:16am

Ooo, this looks like fun. I’ll fire up the IDE.

Saying that, I reckon Salathe will beat me to it.

Mittineague · October 4, 2010, 6:25pm

The XML snippet looks OK to me

<?xml version="1.0" encoding="UTF-8"?>
<bronboek_basic_xml docformat="1.0">
<catalog>
    <product>
		<isbn>0.25</isbn>
		<auteur>KAART</auteur>
		<titel>DIV.  KAARTEN</titel>
		<levcode>SABRA</levcode>
		<editie></editie>
		<pagina></pagina>
		<nur>0</nur>
		<gewicht>0</gewicht>
		<prijs>0,25</prijs>
		<adatum>13-3-2003</adatum>
		<eenheid></eenheid>
		<catsoort>2</catsoort>
		<boeksoort>10</boeksoort>
		<berichtcode>0</berichtcode>
		<bindcode>0</bindcode>
		<btwcode>2</btwcode>
		<btwmode>1</btwmode>
	</product>
......................

True, 700 lines is more than a few, but with some determination looking at them in a syntax highlighted editor you should spot the problem. eg.

<?xml version="1.0" encoding="UTF-8"?>
<bronboek_basic_xml docformat="1.0">
<catalog>
    <product>
		<isbn>0.25</isbn>
		<auteur><![CDATA[KAART & KAART]]></auteur>
		<titel>DIV.  KAARTEN & KAARTEN</titel>
		<levcode>SABRA</levcode>
		<editie></editie>
......................

^ I’m wondering if something needs to be inside CDATA ??
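
For what it’s worth, here is a minimal sketch of why a bare ampersand matters. The sample title is made up for illustration; the point is only that a raw & makes the document not well-formed, while escaping it or wrapping it in CDATA both parse.

```php
<?php
// Hypothetical sample value; only the element name comes from the snippet above.
$title = 'DIV.  KAARTEN & KAARTEN';

// Collect libxml errors quietly instead of emitting PHP warnings.
libxml_use_internal_errors(true);

// A raw ampersand in text content: the parser rejects the document.
$broken = simplexml_load_string('<titel>' . $title . '</titel>');
var_dump(false === $broken);

// Option 1: escape the text before it goes into the XML.
$escaped = simplexml_load_string(
  '<titel>' . htmlspecialchars($title, ENT_XML1 | ENT_QUOTES, 'UTF-8') . '</titel>'
);

// Option 2: wrap the text in a CDATA section.
$cdata = simplexml_load_string('<titel><![CDATA[' . $title . ']]></titel>');

echo (string) $escaped, "\n";
echo (string) $cdata, "\n";
```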

PHPycho · October 5, 2010, 7:54am

I think I should go for XMLReader.
Can anybody help me read the following XML with XMLReader:

<?xml version="1.0" encoding="UTF-8"?>
<bronboek_basic_xml docformat="1.0">
<catalog>
    <product>
        <isbn>0.25</isbn>
        <auteur>KAART</auteur>
        <titel>DIV.  KAARTEN</titel>
        <levcode>SABRA</levcode>
        <editie></editie>
        <pagina></pagina>
        <nur>0</nur>
        <gewicht>0</gewicht>
        <prijs>0,25</prijs>
        <adatum>13-3-2003</adatum>
        <eenheid></eenheid>
        <catsoort>2</catsoort>
        <boeksoort>10</boeksoort>
        <berichtcode>0</berichtcode>
        <bindcode>0</bindcode>
        <btwcode>2</btwcode>
        <btwmode>1</btwmode>
    </product>
......................
</catalog>
</bronboek_basic_xml>

I want to read isbn, auteur, titel etc of product node.

I found it a bit difficult with XMLReader.

I have tried the following approach:

$xml_reader = new XMLReader();
$xml_reader->XML($xml_string);
while($xml_reader->read()){
  
    if($xml_reader->name == "catalog" && $xml_reader->nodeType == XMLReader::ELEMENT){
       
        while($xml_reader->read()){
            echo $xml_reader->name . '<br />';
        }        
    }    

}

Can anybody suggest the proper way of getting product node values?

Thanks

Mittineague · October 4, 2010, 7:31am

AFAIK SimpleXML loads the entire XML into memory (tree rather than event).

So if your ini settings for memory are too low it won’t work, although I would guess you would get a memory error. But I don’t know, I never tried with a large XML file.

That error message suggests the XML isn’t well formed, but I suppose that could happen if the memory shut down and truncated the file prematurely.

Can you throw more memory at it?

With a file that size, IMHO if you don’t need to work with the DOM a SAX parser would be better.
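
As a rough sketch of the event-based (SAX-style) approach, PHP’s built-in XML Parser extension can be fed the document in chunks, so the whole tree never sits in memory. The sample XML and the 16-byte chunk size are assumptions for illustration; with a real file you would read chunks with fread() instead.

```php
<?php
// Hypothetical sample; element names are taken from the XML posted in this thread.
$xml = '<catalog>'
  . '<product><isbn>0.25</isbn><titel>DIV.  KAARTEN</titel></product>'
  . '<product><isbn>0.50</isbn><titel>ATLAS</titel></product>'
  . '</catalog>';

$titles = array();
$in_titel = false;

$parser = xml_parser_create('UTF-8');
// Case folding is on by default and would upper-case every tag name.
xml_parser_set_option($parser, XML_OPTION_CASE_FOLDING, 0);

xml_set_element_handler(
  $parser,
  // Start tag: begin collecting text when a <titel> opens.
  function($parser, $name, $attributes) use (&$titles, &$in_titel){
    if('titel' === $name){
      $in_titel = true;
      $titles[] = '';
    }
  },
  // End tag: stop collecting when </titel> closes.
  function($parser, $name) use (&$in_titel){
    if('titel' === $name){
      $in_titel = false;
    }
  }
);

// Text may arrive in several pieces, so append rather than assign.
xml_set_character_data_handler($parser, function($parser, $data) use (&$titles, &$in_titel){
  if($in_titel){
    $titles[count($titles) - 1] .= $data;
  }
});

// Feed the parser small chunks to mimic streaming from a file.
foreach(str_split($xml, 16) as $chunk){
  xml_parse($parser, $chunk, false);
}
xml_parse($parser, '', true);

print_r($titles);
```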

PHPycho · October 5, 2010, 9:24am

I didn’t get your point, Anthony.

Can you suggest how to read the above XML using XMLReader?

any help is much appreciated.

Thanks

EDIT:
Thanks for the code. I will try it and let you know.

AnthonySterling · October 5, 2010, 9:23am

Strange.

So far, I have…


<?php
error_reporting(-1);
ini_set('display_errors', true);

function load_xml($file){
  $reader = new XMLReader();
  $reader->open($file);
  return $reader;
}

$document = load_xml('products.xml');

while($document->read()){
  if('product' === $document->name && $document->nodeType === XMLReader::ELEMENT){
    while($document->read()){
      if('product' === $document->name && $document->nodeType === XMLReader::END_ELEMENT){
        break;
      }
      printf("%s = %s\n", $document->name, $document->value);
      /*
        isbn = 
        #text = 0.25
        isbn = 
        #text = 
        
        auteur = 
        #text = KAART
        auteur = 
        #text = 
      */
    }
  }
}

?>

As you can see, it’s giving me repeating elements, and I cannot see why. I’m going to grab a coffee and come back to it in 5 minutes.
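
A likely explanation, offered as a sketch rather than a diagnosis: read() stops on every node, not only on opening tags. It visits the closing tag too (same name, empty value), and the whitespace between elements shows up as a #text node. Printing the node type next to the name makes this visible. The inline sample and the $names lookup table are made up for illustration.

```php
<?php
// Hypothetical sample without whitespace between elements.
$xml = '<catalog><product><isbn>0.25</isbn><auteur>KAART</auteur></product></catalog>';

$reader = new XMLReader();
$reader->XML($xml);

// Human-readable labels for the node types that matter here.
$names = array(
  XMLReader::ELEMENT                => 'ELEMENT',
  XMLReader::TEXT                   => 'TEXT',
  XMLReader::END_ELEMENT            => 'END_ELEMENT',
  XMLReader::WHITESPACE             => 'WHITESPACE',
  XMLReader::SIGNIFICANT_WHITESPACE => 'SIGNIFICANT_WHITESPACE',
);

$seen = array();
while($reader->read()){
  $type = isset($names[$reader->nodeType]) ? $names[$reader->nodeType] : 'OTHER';
  $seen[] = $type . ':' . $reader->name;
  printf("%-12s %s = %s\n", $type, $reader->name, $reader->value);
}
```

Each child element produces three stops (ELEMENT, TEXT, END_ELEMENT), so filtering on nodeType inside the inner loop should get rid of the repeats.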

AnthonySterling · October 4, 2010, 9:23am

Nope. It supports namespaces just fine, at least the reading of them anyway.

PHPycho · October 4, 2010, 9:22am

Here is the sample:

<?xml version="1.0" encoding="UTF-8"?>
<bronboek_basic_xml docformat="1.0">
<catalog>
    <product><isbn>0.25</isbn><auteur>KAART</auteur><titel>DIV.  KAARTEN</titel><levcode>SABRA</levcode><editie></editie><pagina></pagina><nur>0</nur><gewicht>0</gewicht><prijs>0,25</prijs><adatum>13-3-2003</adatum><eenheid></eenheid><catsoort>2</catsoort><boeksoort>10</boeksoort><berichtcode>0</berichtcode><bindcode>0</bindcode><btwcode>2</btwcode><btwmode>1</btwmode></product>
......................

Thanks

Mittineague · October 4, 2010, 8:56am

Can you post or attach a small portion of the XML file?

I’m too tired to remember. Doesn’t SimpleXML choke on namespaces?

PHPycho · October 4, 2010, 8:46am

More research:
I used the following code:

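// Collect libxml errors instead of emitting PHP warnings.
libxml_use_internal_errors(true);
$library = simplexml_load_string($large_xml_string);
// Compare against false: a valid but empty element also evaluates as falsy.
if (false === $library) {
    echo "Failed loading XML<br />";
    foreach (libxml_get_errors() as $error) {
        echo $error->message . '<br />';
    }
}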

And got the following errors:

Failed loading XML
StartTag: invalid element name
StartTag: invalid element name
error parsing attribute name
attributes construct error
Couldn’t find end of Start Tag Eagle line 700
Input is not proper UTF-8, indicate encoding ! Bytes: 0x89 0x6E 0x73 0x3C

I hope this helps you work out what is wrong.
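
The last error points at bytes that are not valid UTF-8. One way to track them down, sketched here under the assumption that the XML fits in memory as a string, is to test each line with preg_match() and the /u modifier, which returns false for invalid UTF-8. The function name find_invalid_utf8_lines and the sample string are made up for illustration.

```php
<?php
// Returns the 1-based numbers of the lines that are not valid UTF-8.
function find_invalid_utf8_lines($xml_string){
  $bad = array();
  foreach(explode("\n", $xml_string) as $index => $line){
    // With /u, preg_match() returns false when the subject is not valid UTF-8.
    if(false === preg_match('//u', $line)){
      $bad[] = $index + 1;
    }
  }
  return $bad;
}

// Hypothetical sample: line 2 contains a stray 0x89 byte, as in the error above.
$sample = "<a>ok</a>\n<b>Itali\x89ns</b>\n<c>ok</c>";

print_r(find_invalid_utf8_lines($sample));
```

Once the offending lines are known, the file can be converted from its real encoding or the encoding declaration corrected. Which encoding the file actually uses is not something the error message tells you.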

PHPycho · October 4, 2010, 7:59am

Thanks all for the great responses.

I also tried to add the following code at the top:
set_time_limit(0);
ini_set('memory_limit', '555555M');

but still the same error.

Maybe this is due to a limitation in the SimpleXML parsing model.
Maybe I should take a look at XMLReader.

Thanks

AnthonySterling · October 4, 2010, 7:41am

On that note, there’s always XMLReader.

Mittineague · October 6, 2010, 4:41am

Going by the code snippet in post #10, each <product> is on a single line.

Hence

Couldn’t find end of Start Tag Eagle line 700

And the fact that XMLReader is choking at 695 suggests that you still have an XML error in that area of the file.

Double check that area again or post it here if you can’t see anything obviously wrong with it.
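
To narrow it down, something along these lines should report where the parser gives up. It is only a sketch: the inline XML is a made-up stand-in with a deliberate error on line 3, and with the real file you would call open() instead of XML().

```php
<?php
// Hypothetical sample: the bare & on line 3 is not well-formed.
$xml = "<catalog>\n"
  . "<product><titel>A</titel></product>\n"
  . "<product><titel>B & C</titel></product>\n"
  . "<product><titel>D</titel></product>\n"
  . "</catalog>";

// Collect libxml errors so they can be inspected after reading.
libxml_use_internal_errors(true);

$reader = new XMLReader();
$reader->XML($xml);

// read() returns false at the end of the document and also on a parse error;
// @ hides the generic warning PHP raises in the error case.
while(@$reader->read()){
  // Normal processing would go here.
}

$errors = libxml_get_errors();
foreach($errors as $error){
  printf("line %d, column %d: %s\n", $error->line, $error->column, trim($error->message));
}
libxml_clear_errors();
```

If the loop ends without any collected errors, the reader really did reach the end of the document.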

PHPycho · October 6, 2010, 4:24am

Thanks a lot AnthonySterling.
I got the XMLReader working.
But…

It was able to read only 695 products, though we have around 50,000.
What could be the cause? Is there a flag to set for large XML files in XMLReader?

Thanks

AnthonySterling · October 5, 2010, 11:22am

It works, but I bloody hate it.


<?php
error_reporting(-1);
ini_set('display_errors', true);

function load_xml($file){
  $reader = new XMLReader();
  $reader->open($file);
  return $reader;
}

$document = load_xml('products.xml');

$products = array();

while($document->read()){
  if('product' === $document->name && $document->nodeType === XMLReader::ELEMENT){
    $product = new stdClass;
    $property = null; // no child element seen yet for this product
    while($document->read()){
      if('product' === $document->name && $document->nodeType === XMLReader::END_ELEMENT){
        array_push($products, $product);
        break;
      }
      switch($document->nodeType){
        case XMLReader::ELEMENT:
          $property = $document->name;
          $product->{$property} = '';
        break;
        case XMLReader::TEXT:
          if(null !== $property){
            $product->{$property} = $document->value;
            $property = null;
          }
        break;
      }
    }
  }
}

print_r(
  $products
);

/*
  Array
  (
      [0] => stdClass Object
          (
              [isbn] => 0.25
              [auteur] => KAART
              [titel] => DIV.  KAARTEN
              [levcode] => SABRA
              [editie] => 
              [pagina] => 
              [nur] => 0
              [gewicht] => 0
              [prijs] => 0,25
              [adatum] => 13-3-2003
              [eenheid] => 
              [catsoort] => 2
              [boeksoort] => 10
              [berichtcode] => 0
              [bindcode] => 0
              [btwcode] => 2
              [btwmode] => 1
          )
      [1] => stdClass Object
          (
              [isbn] => 0.25
              [auteur] => KAART
              [titel] => DIV.  KAARTEN
              [levcode] => SABRA
              [editie] => 
              [pagina] => 
              [nur] => 0
              [gewicht] => 0
              [prijs] => 0,25
              [adatum] => 13-3-2003
              [eenheid] => 
              [catsoort] => 2
              [boeksoort] => 10
              [berichtcode] => 0
              [bindcode] => 0
              [btwcode] => 2
              [btwmode] => 1
          )
  )
*/
?>

salathe · October 4, 2010, 11:48am

Some of those errors are pretty clear. You could try and fix those.

PHPycho · October 5, 2010, 11:49am

I think we should operate on the following code:

if('product' === $document->name && $document->nodeType === XMLReader::END_ELEMENT){
        print_r($product);
        break;
      }

Am I right?
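
To make the idea concrete, here is a sketch of handling each product as soon as it has been read, instead of collecting everything in an array first. It takes a slightly different route from the code above: readOuterXml() grabs one product’s markup and simplexml_load_string() turns just that fragment into an object. The function name each_product, the callback, and the sample XML are made up for illustration.

```php
<?php
// Calls $callback once per <product>, passing it a SimpleXMLElement.
// Returns the number of products seen.
function each_product(XMLReader $reader, callable $callback){
  $count = 0;
  while($reader->read()){
    if('product' === $reader->name && $reader->nodeType === XMLReader::ELEMENT){
      // Only this one product is held in memory as an object.
      $product = simplexml_load_string($reader->readOuterXml());
      $callback($product);
      $count++;
    }
  }
  return $count;
}

// Hypothetical sample; element names are taken from the XML in this thread.
$xml = '<catalog>'
  . '<product><isbn>0.25</isbn><titel>DIV.  KAARTEN</titel></product>'
  . '<product><isbn>0.50</isbn><titel>ATLAS</titel></product>'
  . '</catalog>';

$reader = new XMLReader();
$reader->XML($xml);

$titles = array();
$count = each_product($reader, function($product) use (&$titles){
  // Replace this with the real work, e.g. a database insert.
  $titles[] = (string) $product->titel;
});

printf("%d products: %s\n", $count, implode(', ', $titles));
```

Each fragment is parsed a second time by SimpleXML, which costs some speed but keeps memory use flat however many products the file holds.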
