Reading xml correctly

I’ve got this xml


<notes>

<note>
<title>Dag 1</title>
<details>
<detail>Test A</detail>
<detail>Test B</detail>
</details>
</note>

<note>
<title>Dag 2</title>
<details>
<detail>Test C</detail>
<detail>Test D</detail>
<detail>Test E</detail>
</details>
</note>

</notes> 

Each ‘note’ has several ‘titles’ and each ‘title’ has different numbers of ‘details’.

The output I want to have is this:

Dag 1
Test A
Test B

Dag 2
Test C
Test D
Test E

But instead I’m getting this in my browser:

Dag 1
Test A Test B
Dag 2
Test C Test D Test E

I suspect it’s a mistake in the way I parse the xml in my php code, but the line breaks which are supposed to be added after each ‘Test’ line are missing from the page source. The source shows:

Dag 1<br />

Test A
Test B
<br />
Dag 2<br />

Test C
Test D
Test E
<br />

Where of course it should be this:

Dag 1<br />

Test A<br />
Test B<br />

Dag 2<br />

Test C<br />
Test D<br />
Test E<br />

Can someone tell me where I’m going wrong with the php part? This is the php part which parses the xml:

<?php
  $doc = new DOMDocument();
  $doc->load( 'text.xml' );
  
  $notes = $doc->getElementsByTagName( "note" );
  foreach( $notes as $value )
  {
  $titles = $value->getElementsByTagName( "title" );
  $details = $value->getElementsByTagName( "details" );

  $title = $titles->item(0)->nodeValue;
  $detail = $details->item(0)->nodeValue;

  echo $title."<br />\n";
  echo $detail."<br />\n";
  }
  ?>

I prefer SimpleXML myself.


$string = '<notes>
    <note>
        <title>Dag 1</title>
        <details>
            <detail>Test A</detail>
            <detail>Test B</detail>
        </details>
    </note>
    <note>
        <title>Dag 2</title>
        <details>
            <detail>Test C</detail>
            <detail>Test D</detail>
            <detail>Test E</detail>
        </details>
    </note>
</notes>
';

$xml = new SimpleXMLElement($string);

foreach($xml->note as $note){
    printf('<h2>%s</h2>', $note->title);
    foreach($note->details->detail as $detail){
        printf('%s<br />', $detail);
    }
}

/*
    <h2>Dag 1</h2>
    Test A<br />
    Test B<br />
    <h2>Dag 2</h2>
    Test C<br />
    Test D<br />
    Test E<br />
*/

Still, I’m curious why I can’t get a break tag to appear after every ‘Test A-E’ line.

Tried this e.g.:

<?php

$dom = new DomDocument();
$dom->prevservWhiteSpace = false;

if (!@$dom->load("text.xml")) {
    echo "text.xml doesn't exist!\n";
    return;
}

$imageList = $dom->getElementsByTagName('details');
$imageCnt  = $imageList->length;

for ($idx = 0; $idx < $imageCnt; $idx++) {
    print $imageList->item($idx)->nodeValue . "<br />\n";
}

?>

It finds all the detail nodes, but it only adds the break tag after the final ‘Test’ line of each individual ‘details’ node. Apparently ‘nodeValue’ stands for all ‘detail’ nodes inside each ‘details’ node combined? How can I also get the break tag after each ‘detail’ node value?


Test A
Test B
<br />

Test C
Test D
Test E
<br />

I’ve already fixed it by using this code

<?php
  $doc = new DOMDocument();
  $doc->load( 'text.xml' );
  
  $notes = $doc->getElementsByTagName( "note" );

  foreach( $notes as $note )
{

$title = $note->getElementsByTagName( "title" );

echo $title->item(0)->nodeValue . "<br />\n";
  
$details = $note->getElementsByTagName( "detail" );

foreach( $details as $detail )
{
echo $detail->nodeValue . "<br />\n";
}

}
  ?>

The thing I don’t understand though is why I have to use item(0) when echoing the title but mustn’t use it when echoing the detail. How exactly does getElementsByTagName work regarding that? As I understand it, it’s forming an array of all elements (xml tags)? Why then can’t I use something like $title[0]?

But anyway, when to use item(0) and when not to use?

getElementsByTagName returns a DOMNodeList object, not an array. It’s similar to an array in that it contains multiple ordered values, you can access a value by a numeric index, you can find out how many things are in it, and you can loop over it, but the syntax is a bit different. Use item() if you want a specific element from the ordered list. If you want all items from the list, use a loop like foreach.
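
To make that concrete, here is a minimal sketch. It assumes an inline copy of one note from the question, loaded with loadXML() instead of from text.xml, purely so the snippet runs on its own. It shows length, item() and foreach on the same DOMNodeList.

```php
<?php
// Inline copy of one <note> from the question, so the sketch runs without text.xml.
$xml = '<notes><note><title>Dag 1</title><details>'
     . '<detail>Test A</detail><detail>Test B</detail>'
     . '</details></note></notes>';

$doc = new DOMDocument();
$doc->loadXML($xml);

// getElementsByTagName() returns a DOMNodeList object, not a PHP array.
$details = $doc->getElementsByTagName('detail');

var_dump($details instanceof DOMNodeList); // bool(true)
var_dump(is_array($details));              // bool(false)

// How many nodes are in the list.
echo $details->length, "\n";               // 2

// One specific node, by zero-based index.
echo $details->item(0)->nodeValue, "\n";   // Test A

// An index past the end gives null rather than an error.
var_dump($details->item(5));               // NULL

// Every node in turn; foreach hands you each node directly, so no item() is needed.
$values = [];
foreach ($details as $detail) {
    $values[] = $detail->nodeValue;
}
echo implode(', ', $values), "\n";         // Test A, Test B
```

So item(0) is what you reach for when you want exactly one node out of the list, and foreach is what you reach for when you want all of them.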

Now, in the xml you presented to us, each <note> node always has a single <title> node as a descendant within it. So, it doesn’t really matter if you don’t use a foreach loop on the list of <title> nodes, because there’s only one. But, if you might have multiple <title> nodes in each of the <note> nodes, you’d better use a loop. You actually said in your original post that each note has several titles, but your xml doesn’t reflect this.
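
As a rough sketch of that multiple-title case: the XML below is a hypothetical variation, not the file from the question, in which the first note carries two titles. The nested foreach picks up both, where item(0) would only ever return the first.

```php
<?php
// Hypothetical variation of the XML: the first <note> has two <title> elements.
$xml = '<notes>'
     . '<note><title>Dag 1</title><title>Day 1</title></note>'
     . '<note><title>Dag 2</title></note>'
     . '</notes>';

$doc = new DOMDocument();
$doc->loadXML($xml);

$lines = [];
foreach ($doc->getElementsByTagName('note') as $note) {
    // Looping copes with one title, several titles, or none at all,
    // whereas item(0) would skip any title after the first.
    foreach ($note->getElementsByTagName('title') as $title) {
        $lines[] = $title->nodeValue;
    }
}

echo implode("<br />\n", $lines), "<br />\n";
```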

You’re right, I actually meant that the main ‘notes’ node had several ‘note’ nodes. Each ‘note’ node has one title which has one or more ‘details’.

The thing I still am not clear about is:

$title = $note->getElementsByTagName( "title" );
echo $title->item(0)->nodeValue . "<br />\n";

Now I have to use item() to get to the nodeValue of $title

But:

$details = $note->getElementsByTagName( "detail" );
foreach( $details as $detail )
{
echo $detail->nodeValue . "<br />\n";
}

Now I mustn’t use item(). Why is that?

Edit: O, wait (correct me if I’m wrong) but I just read about ‘foreach’ and as I understand it, it already takes the (node)value of the first $detail. And continues to do so as long as there are values in that array. It’s like ‘foreach’ is already using item(0) the first time around and increases item() by 1 each time. Is that correct?

By the way, I’m fairly new to xml/php but am unclear about ‘element’. Is that (a) the xml tag (b) the node or (c) the nodevalue? (d) something else?

Yes, these loops are equivalent, and the foreach version is much easier to read.


$listOfDetailElements = $note->getElementsByTagName( "detail" );
foreach( $listOfDetailElements as $detailElement )
{
    echo $detailElement->nodeValue . "<br />\n";
}

for ($i = 0; $i < $listOfDetailElements->length; $i++) {
    $detailElement = $listOfDetailElements->item($i);
    echo $detailElement->nodeValue . "<br />\n";
}

An element is a node, but a specialized type of node with additional characteristics and functionality. All elements are also nodes, but not all nodes are also elements. Kinda like how a female is always a person, but a person isn’t always a female.

So, it would have been more precise of me to say “element” instead of “node”.
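
A small sketch may help here. It assumes an inline, indented copy of one <details> block from the question, loaded with loadXML() so it runs on its own. It lists the child nodes of <details> and shows that only some of them are elements; the rest are the whitespace text nodes sitting between the tags.

```php
<?php
// Inline, indented copy of one <details> block from the question.
$xml = "<details>\n  <detail>Test A</detail>\n  <detail>Test B</detail>\n</details>";

$doc = new DOMDocument();
$doc->loadXML($xml);

$details = $doc->documentElement;

// childNodes lists every child node: the <detail> elements
// and the whitespace text between them.
$kinds = [];
foreach ($details->childNodes as $child) {
    $kinds[] = ($child instanceof DOMElement) ? 'element' : 'text';
}
echo implode(' ', $kinds), "\n"; // text element text element text

// In PHP, nodeValue of an element is the text of all its descendants joined
// together, whitespace included. That is why echoing it for <details> prints
// every 'Test' line in one go, followed by a single <br />.
var_dump($details->nodeValue);
```

This is also why asking for the <detail> elements and looping over them gives one value per line, while asking for <details> gives one combined string per note.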

A $detailElement corresponds closely to a <detail> tag in the xml text.

Maybe if you spend some time learning about the DOM it will make more sense. http://www.w3schools.com/dom/default.asp