Help with SimpleXML

Hello, I’m completely new to XML. I’ve been reading up on SimpleXML and am having trouble parsing a file because the elements I need seem to be at the same level.

Here’s a snippet:

<dict>
	<key>entries</key>
	<array>

		<dict>
			<key>source</key>
			<string>Add Book…</string>
			<key>target</key>
			<string>Añadir libro…</string>
		</dict>
			
	</array>
	<key>sourceLanguage</key>
	<string>en</string>
	<key>targetLanguage</key>
	<string>es</string>
</dict>

I need to get the string under source (“Add Book…”) and the string under target (“Añadir libro…”), then put these into a table. The MySQL part I can do no problem, but I can’t figure out how to get them out of the XML. I was using something like this:

$xml = new SimpleXMLElement($xmlstr);

foreach ($xml->xpath('//dict') as $dict) {
    echo $dict->key, ' <strong> content: </strong> ', $dict->string, '<br />';
}

But that only gives me the first key and string objects.

Any help would be appreciated.

The XML structure is quite bad. If possible, I’d change it to:


<Dictionary>
    <Entries SourceLanguage="en">
        <Entry>
            <Source>Add Book...</Source>
            <Target Language="es">A&#241;adir libro...</Target>
            <Target Language="de">Buch hinzuf&#252;gen</Target>
        </Entry>
    </Entries>
</Dictionary>

That way your system could cope with many languages, and SimpleXML will be much more comfortable navigating your data.
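To illustrate how the suggested layout could be read, here is a minimal sketch. It assumes the hypothetical <Dictionary> structure above (not the real file), and the $translations array is just a name picked for the example.

```php
<?php
// Hypothetical document following the suggested layout (not the poster's real file).
$xmlstr = <<<XML
<Dictionary>
    <Entries SourceLanguage="en">
        <Entry>
            <Source>Add Book...</Source>
            <Target Language="es">A&#241;adir libro...</Target>
            <Target Language="de">Buch hinzuf&#252;gen</Target>
        </Entry>
    </Entries>
</Dictionary>
XML;

$xml = new SimpleXMLElement($xmlstr);

// Build source => [language => translation].
$translations = [];
foreach ($xml->Entries->Entry as $entry) {
    $source = (string) $entry->Source;
    foreach ($entry->Target as $target) {
        // Attributes are read with array syntax on the element.
        $translations[$source][(string) $target['Language']] = (string) $target;
    }
}

print_r($translations);
```

Each <Entry> maps directly onto one row per target language, which is what makes this layout easier to walk than the key/string pairs.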

The current XML is written as if it’s simply variables dumped out by a program, rather than how data should naturally exist in a file.

An alternative to the above is:

<Dictionary>
    <Entries>
        <Entry>
            <Term Language="en">Add Book...</Term>
            <Term Language="es">A&#241;adir libro...</Term>
            <Term Language="de">Buch hinzuf&#252;gen</Term>
        </Entry>
    </Entries>
</Dictionary>

This would mean that it’s slightly more code to get, say, the Spanish for “Add Book…”, but it’d be easier to find the German for “Añadir libro”, or the English for “Buch hinzufügen”.
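A rough sketch of a lookup in any direction with this second layout might look like the following. The translate() function is a hypothetical helper written for this example, and the XML is the suggested structure, not the real file.

```php
<?php
// Hypothetical document following the alternative layout.
$xmlstr = <<<XML
<Dictionary>
    <Entries>
        <Entry>
            <Term Language="en">Add Book...</Term>
            <Term Language="es">A&#241;adir libro...</Term>
            <Term Language="de">Buch hinzuf&#252;gen</Term>
        </Entry>
    </Entries>
</Dictionary>
XML;

$xml = new SimpleXMLElement($xmlstr);

// translate() is a hypothetical helper: find $term in language $from
// and return the matching term in language $to, or null if not found.
function translate(SimpleXMLElement $xml, $term, $from, $to)
{
    foreach ($xml->Entries->Entry as $entry) {
        // Index this entry's terms by their Language attribute.
        $terms = [];
        foreach ($entry->Term as $t) {
            $terms[(string) $t['Language']] = (string) $t;
        }
        if (isset($terms[$from], $terms[$to]) && $terms[$from] === $term) {
            return $terms[$to];
        }
    }
    return null;
}

echo translate($xml, 'Buch hinzufügen', 'de', 'en'), "\n";
```

Comparing the strings in PHP rather than inside an XPath expression avoids quoting problems when a term contains an apostrophe.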

You could use XPath to query for the source and target values. Given the XML presented in the first post, the code could look like this:

$xml = new SimpleXMLElement($xmlstr);

foreach ($xml->array->dict as $dict) {
    $source = current($dict->xpath("./key[.='source']/following-sibling::string"));
    $target = current($dict->xpath("./key[.='target']/following-sibling::string"));
    echo "'{$source}' translates to '{$target}'\n";
}

Note that SimpleXMLElement::xpath() returns an array, so we just use the function current() to get the first value (which assumes there is one!).
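If that assumption is a worry, one possible way to guard against a missing value is sketched below. The valueForKey() helper and the second, incomplete <dict> are made up for the example; the [1] predicate limits the match to the nearest following <string>.

```php
<?php
// Sample data in the original format; the second <dict> is deliberately
// missing its target to show the guard (it is not from the real file).
$xmlstr = <<<XML
<dict>
    <key>entries</key>
    <array>
        <dict>
            <key>source</key>
            <string>Add Book...</string>
            <key>target</key>
            <string>Añadir libro...</string>
        </dict>
        <dict>
            <key>source</key>
            <string>Orphan</string>
        </dict>
    </array>
</dict>
XML;

$xml = new SimpleXMLElement($xmlstr);

// valueForKey() is a hypothetical helper: return the <string> that directly
// follows the <key> named $name, or null when there is no such pair.
function valueForKey(SimpleXMLElement $dict, $name)
{
    $nodes = $dict->xpath("key[.='{$name}']/following-sibling::string[1]");
    // empty() covers both an empty array and a false/null error result.
    return empty($nodes) ? null : (string) $nodes[0];
}

$pairs = [];
foreach ($xml->array->dict as $dict) {
    $source = valueForKey($dict, 'source');
    $target = valueForKey($dict, 'target');
    if ($source === null || $target === null) {
        continue; // skip incomplete entries instead of printing blanks
    }
    $pairs[$source] = $target;
}

print_r($pairs);
```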

Jake is quite right that you could probably make life easier if re-arranging the XML document is an option. However, if not, then the above could provide a hint as to the right direction to move in.

I can’t change the format of the XML

Sorry, I hadn’t had a chance to try this before, but now I have.

Salathe, the code you provided produces the following error:

Warning: Invalid argument supplied for foreach() in parserTest.php on line 29

Line 29 being this line:

foreach ($xml->array->dict as $dict) {

I tried modifying the code as such:

foreach ($xml->array as $array) {
    $source = current($array->dict->xpath("./key[.='source']/following-sibling::string"));
    $target = current($array->xpath("./key[.='target']/following-sibling::string"));
    echo "'{$source}' translates to '{$target}'\n";
}

And this produces nothing, no output, not even the “translates to” which should output once you enter the foreach loop.

I was able to get some success with the following code:

foreach ($xml->dict as $dict) {
    $source = current($dict->array->dict->xpath("./key[.='source']/following-sibling::string"));
    $target = current($dict->array->dict->xpath("./key[.='target']/following-sibling::string"));
    echo "'{$source}' translates to '{$target}'\n";
}

But it only returns the first <dict> content, none of the others.

Since you have this badly structured XML, the solution isn’t flexible.

I assumed that your XML is structured like this when there are multiple entries (if it’s different, the code will have to be changed accordingly). By the way, I changed from Spanish to Italian, because testing gave me errors with the Spanish characters:


<dict>
    <key>entries</key>
    <array>
 
        <dict>
            <key>source</key>
            <string>Add Book...</string>
            <key>target</key>
            <string>Aggiungi libro...</string>
        </dict>
           
    </array>
    <key>sourceLanguage</key>
    <string>en</string>
    <key>targetLanguage</key>
    <string>it</string>
    <key>entries</key>
    <array>
 
        <dict>
            <key>source</key>
            <string>Remove Book...</string>
            <key>target</key>
            <string>Cancella libro...</string>
        </dict>
           
    </array>
    <key>sourceLanguage</key>
    <string>en</string>
    <key>targetLanguage</key>
    <string>it</string>
</dict>

The following code gives me this result:


Array ( [0] => SimpleXMLElement Object ( [key] => Array ( [0] => source [1] => target ) [string] => Array ( [0] => Add Book... [1] => Aggiungi libro... ) ) [1] => SimpleXMLElement Object ( [key] => Array ( [0] => source [1] => target ) [string] => Array ( [0] => Remove Book... [1] => Cancella libro... ) ) )

source Add Book...
target Aggiungi libro...
source Remove Book...
target Cancella libro...

Code


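<?php
  // Third argument true = treat the first argument as a path/URL, not XML data
  $xml = new SimpleXMLElement('http://www.galleons.it/test4/test.xml', null, true);
  // Select every <dict> inside an <array> under the root <dict>
  $result = $xml->xpath('/dict/array/dict');
  print_r($result);
  echo '<br /><br />';
  foreach ($result as $node) {
    // key[0]/string[0] hold the source pair, key[1]/string[1] the target pair
    echo $node->key[0] . ' ' . $node->string[0] . '<br />';
    echo $node->key[1] . ' ' . $node->string[1] . '<br />';
  }
?>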

I hope you’ll be able to get the info you need from this?
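If you would rather not hard-code the [0] and [1] indexes, a possible variation is to pair every <key> with the <string> at the same position. This is only a sketch: it assumes each <key> inside the inner <dict> is followed by a <string> (no other value types), and the inline XML stands in for the test file.

```php
<?php
// Inline stand-in for the test file, so the example runs without a URL.
$xmlstr = <<<XML
<dict>
    <key>entries</key>
    <array>
        <dict>
            <key>source</key>
            <string>Add Book...</string>
            <key>target</key>
            <string>Aggiungi libro...</string>
        </dict>
    </array>
</dict>
XML;

$xml = new SimpleXMLElement($xmlstr);

$entries = [];
foreach ($xml->xpath('/dict/array/dict') as $node) {
    $entry = [];
    // Assumption: the nth <key> belongs to the nth <string>.
    $count = min(count($node->key), count($node->string));
    for ($i = 0; $i < $count; $i++) {
        $entry[(string) $node->key[$i]] = (string) $node->string[$i];
    }
    $entries[] = $entry;
}

print_r($entries);
```

Each element of $entries is then an array with 'source' and 'target' keys, which is convenient for building the database insert.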

Hello Guido,

If I substitute your code:

$xml = new SimpleXMLElement('http://www.galleons.it/test4/test.xml', null, true);

for this:

 $xmlG = new SimpleXMLElement('uploads/xmlTestFile.xml', null, true);

This is the exact same file, just loaded from a local copy I have, and I get an “invalid argument for foreach()” error. It’s weird that the other code I have works fine with this file, so there must be something strange I’m missing.

Try with a URL that starts with ‘http://’. I’m not sure that relative paths work.

(EDIT)

I was able to get all the lines by adding plist/ to the path:
/plist/dict/array/dict

So I think that should do it!

(end edit)
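For anyone hitting the same thing, here is a small sketch of why the extra path step matters. It assumes a file whose root element is <plist> wrapping the outer <dict>; the inline XML is a shortened stand-in with the DOCTYPE line left out for brevity.

```php
<?php
// Shortened stand-in for a plist-style file (DOCTYPE omitted for brevity).
$xmlstr = <<<XML
<?xml version="1.0" encoding="UTF-8"?>
<plist version="1.0">
<dict>
    <key>entries</key>
    <array>
        <dict>
            <key>source</key><string>Add Book...</string>
            <key>target</key><string>Aggiungi libro...</string>
        </dict>
        <dict>
            <key>source</key><string>Remove Book...</string>
            <key>target</key><string>Cancella libro...</string>
        </dict>
    </array>
</dict>
</plist>
XML;

$xml = new SimpleXMLElement($xmlstr);

// The object returned by the constructor is the root element itself.
echo 'Root element: ', $xml->getName(), "\n";

// An absolute path must start at the real root, so this finds nothing...
$withoutRoot = $xml->xpath('/dict/array/dict');
// ...while this matches both inner <dict> elements.
$withRoot = $xml->xpath('/plist/dict/array/dict');

echo 'Without plist/: ', empty($withoutRoot) ? 0 : count($withoutRoot), "\n";
echo 'With plist/: ', count($withRoot), "\n";
```

The same reasoning applies to property access: with a <plist> root, $xml->dict is the outer <dict>, so the entries are under $xml->dict->array->dict.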

Hmm… there’s something I’m missing. It works with your file, but it won’t work with mine, even if I use a relative or an http path. In fact print_r($result) doesn’t return anything at all.

Again, I don’t think it’s my document since the other code produces output.

Ok, we’re getting somewhere. If I copy your file into a local document, it works. But your code will not work with my file. The differences I see in the headers of the two files are:

yours:

<?xml version="1.0" encoding="utf-8" ?>
<dict>
...

Mine:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE plist PUBLIC "-//Apple//DTD PLIST 1.0//EN" "http://www.apple.com/DTDs/PropertyList-1.0.dtd">
<plist version="1.0">
<dict>
...

Problem is I can’t touch the xml file so I can’t remove any of those lines that might be causing the discrepancy.

I suppose you wrote the ‘edit’ part later? So you resolved the problem?

Yes, sorry I wasn’t clear (doing 6 things at the same time). Thanks a lot Guido, as usual, you are a prince! I’ve struggled with this for days. :blush:

Hello again. I am getting the following error when SimpleXML finds characters with an umlaut, the two little dots used over vowels in German and Scandinavian languages (for example, ü):

Warning: simplexml_load_file() [function.simplexml-load-file]: uploads/DE.xml:3299: parser error : internal error

The line in question is this:

<string>Diese Seriennummer ist für eine ältere Version des Programms. Bitte besuchen…</string>

As far as I can tell, there is nothing weird with the line other than the umlauts.

Any ideas?

Thanks.
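One thing that might be worth checking first is whether the file's bytes really are UTF-8, since a parser can choke on accented characters when the declared encoding does not match the actual one. Whether that is the cause here is not known from the thread; the sketch below is only a diagnostic. isValidUtf8() is a hypothetical helper, and the sample string stands in for the contents of uploads/DE.xml.

```php
<?php
// isValidUtf8() is a hypothetical helper: with the /u modifier, preg_match()
// returns false instead of 1 when the subject is not valid UTF-8.
function isValidUtf8($bytes)
{
    return preg_match('//u', $bytes) === 1;
}

// Collect libxml errors instead of emitting PHP warnings.
libxml_use_internal_errors(true);

// Stand-in for file_get_contents('uploads/DE.xml'): "für" stored as a
// Latin-1 byte (\xFC) inside a document that declares UTF-8.
$bytes = "<?xml version=\"1.0\" encoding=\"UTF-8\"?>\n<string>f\xFCr</string>";

if (!isValidUtf8($bytes)) {
    echo "The data is not valid UTF-8 despite its declaration\n";
}

$xml = simplexml_load_string($bytes);
if ($xml === false) {
    // Show every parser message with its line number.
    foreach (libxml_get_errors() as $error) {
        echo "Line {$error->line}: ", trim($error->message), "\n";
    }
    libxml_clear_errors();
}
```

If the check reports invalid UTF-8, the file would need converting to UTF-8 (or its declared encoding corrected) before parsing; if it passes, the problem lies elsewhere.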