I have a 1+GB XML file that I need to parse into a database. The XML file has many nested elements - I’ve posted a sample of the XML file at http://pastebin.com/7Wzzaxg1 (but this XML file continues for hundreds of thousands of rows).
What I need to do is export all of this data from the XML file into a database. I’m trying to figure out how to process the file in segments - for example, on one run-through, I’ll want to extract the child elements of Album (id, name, sample url, upc, artist ID, label id, category id). On another run-through, I’ll want to grab all Artist data (id, name, url). On yet another run-through, I’ll need the “data” element in addition to the album ID to which it belongs (Album ID, Data ID, Data Name, Data Sample URL).
Unfortunately, since this file is so huge, I’m unable to use SimpleXML parsing - I’m forced to use XMLReader (which streams the input file rather than loading it all into memory) or xml_parse/fopen (such as in this example http://www.ustrem.org/en/articles/large-xml-files-in-php-en/ ). I can’t seem to figure out how to handle these nested elements easily, though. For example - since there are a number of nodes called “name” at different levels, I often end up matching too many results or the wrong tags.
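For what it’s worth, here is a minimal sketch of the kind of approach I’ve been trying: stream with XMLReader, but only match at the container level (e.g. `album`), then expand just that one subtree into SimpleXML so the nested `name` elements are scoped and unambiguous. The element names (`album`, `artist`, `id`, `name`) are placeholders based on my structure, and the inline XML here just stands in for the real 1+GB file (which I’d open with `$reader->open('file.xml')` instead):

```php
<?php
// Stand-in for the real file; with the real file use $reader->open('file.xml')
$xml = <<<XML
<catalog>
  <album>
    <id>101</id>
    <name>First Album</name>
    <artist><id>7</id><name>Some Artist</name></artist>
  </album>
  <album>
    <id>102</id>
    <name>Second Album</name>
    <artist><id>8</id><name>Other Artist</name></artist>
  </album>
</catalog>
XML;

$reader = new XMLReader();
$reader->XML($xml);

$albums = [];
while ($reader->read()) {
    // Only react to <album> start tags; ignore every other node the stream emits
    if ($reader->nodeType === XMLReader::ELEMENT && $reader->localName === 'album') {
        // Expand just this subtree into a DOM node, then wrap it in SimpleXML
        $node = simplexml_import_dom(
            (new DOMDocument())->importNode($reader->expand(), true)
        );
        // Paths are now scoped to this one album, so <name> is unambiguous
        $albums[] = [
            'id'     => (string) $node->id,
            'name'   => (string) $node->name,         // album-level <name>
            'artist' => (string) $node->artist->name, // nested artist <name>
        ];
    }
}
$reader->close();
print_r($albums);
```

Since the inner `<name>` elements are only ever reached through `$node->artist->name`, they no longer collide with the album-level `<name>` - but I’m not sure whether expanding each subtree like this is the right way to do it at this scale, or whether the xml_parse callback approach would be better.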
Does anyone have any suggestions on how to handle this parsing?