Line Breaks in XML

Hi,

I have an XML file which is rendered into HTML via a PHP script.

Within the XML file I have a <TEXT> element that can contain several paragraphs of text. However, when I try to create a line break within the <TEXT> element, the XML throws an error because of the <br /> tag.

Simply put, I want to add a line break within an XML element.

I understand that XML is more of a storage medium and that it shouldn't contain display data, but as it is being rendered directly to HTML and styled using CSS, there is no XSL stylesheet accompanying it.

Any ideas? :confused:

Are all the break tags self-closing, i.e. <br /> and not <br>?
AFAIK, self-closing tags are well-formed and should not cause an error.
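A quick way to confirm the self-closing point is to ask libxml directly. This is a minimal check, assuming the SimpleXML extension is available (it is enabled in most PHP builds); `isWellFormed()` is a hypothetical helper made up for this example. A bare `<br>` fails because it is never closed before `</TEXT>`, while `<br />` is a valid empty element.

```php
<?php
// Minimal well-formedness check using SimpleXML.
// libxml_use_internal_errors(true) keeps libxml from printing warnings for bad input.
libxml_use_internal_errors(true);

// Hypothetical helper: true if the string parses as XML, false otherwise.
function isWellFormed(string $xml): bool
{
    $doc = simplexml_load_string($xml); // returns false on a parse error
    libxml_clear_errors();              // discard the collected error list
    return $doc !== false;
}

var_dump(isWellFormed('<TEXT>first part<br />second part</TEXT>')); // bool(true)
var_dump(isWellFormed('<TEXT>first part<br>second part</TEXT>'));   // bool(false)
```

Note that being well-formed only means the parser accepts it; the script's own handlers still decide what happens to the BR element.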

Using a self-closing <br /> tag displays the XML, but it seems to trick the parser into thinking the XML tag has closed, and it then begins to output empty markup. The PHP that renders the XML is:

<?php
for($x=0;$x<count($story_array);$x++){
    echo "\t<h2 class='story'>" . $story_array[$x]->headline . "</h2>\n";
    echo "\t<em class='author'>" . $story_array[$x]->author .", ". $story_array[$x]->position . "</em>\n";
    echo "\t<p class='date'>" . $story_array[$x]->whatdate . "</p>\n";
    echo "\t<p class='text'>" . $story_array[$x]->text . "</p>\n";
    echo "\t\t\n";
}
?>

The story_array is an array storing variables with the XML ‘paths’ in it. One instance of the XML is:

<CONTENT>

<ARTICLE>

<TITLE>Article Number One</TITLE>

<AUTHOR>A.N Other</AUTHOR>

<POSITION>President</POSITION>

<DATE>21st April 2008</DATE>

<TEXT>Lorem Ipsum is simply dummy text of the printing and typesetting industry. Lorem Ipsum has been the industry's standard dummy text ever since the 1500s, when an unknown printer took a galley of type and scrambled it to make a type specimen book. It has survived not only five centuries, but also the leap into electronic typesetting, <br /> remaining essentially unchanged. It was popularised in the 1960s with the release of Letraset sheets containing Lorem Ipsum passages, and more recently with desktop publishing software like Aldus PageMaker including versions of Lorem Ipsum.</TEXT>

</ARTICLE>

With the line break in place as above, the resulting HTML looks like this:


<h2 class='story'>Article Number One</h2>
	<em class='author'>A.N Other, President</em>
	<p class='date'>21st April 2008</p>
	<p class='text'>Lorem Ipsum is simply dummy text of the printing and typesetting industry. Lorem Ipsum has been the industry's standard dummy text ever since the 1500s, when an unknown printer took a galley of type and scrambled it to make a type specimen book. It has survived not only five centuries, but also the leap into electronic typesetting, </p>

		
	<h2 class='story'></h2>
	<em class='author'>, </em>
	<p class='date'></p>
	<p class='text'> remaining essentially unchanged. It was popularised in the 1960s with the release of Letraset sheets containing Lorem Ipsum passages, and more recently with desktop publishing software like Aldus PageMaker including versions of Lorem Ipsum.</p>

As you can see, it breaks the line but then starts to regurgitate empty HTML tags after the <br /> tag.

Thanks in advance :slight_smile:

It looks like the br tag is somehow throwing off the node “depth” in the XML parser. What are you using to parse the XML?
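To see what the br tag does, here is a small trace using the same event-based xml_parser functions and the same `*`-joined path idea as the script posted below (the variable names are made up for the example). The character-data handler fires once for the text before `<br />` and again for the text after it, because the BR start and end events sit in between. In a handler that overwrites `->text` and runs `$counter++` on every TEXT chunk, the second chunk would land in a new story entry with no headline, author or date, which matches the output above.

```php
<?php
// Trace the character data PHP's event-based parser reports inside <TEXT> when it contains <br />.
$xml = '<CONTENT><ARTICLE><TEXT>before break<br />after break</TEXT></ARTICLE></CONTENT>';

$path = '';       // e.g. "*CONTENT*ARTICLE*TEXT", built like a "*"-joined tag path
$textChunks = []; // one entry per character-data call made while the path ends at TEXT

$parser = xml_parser_create(); // case folding is on by default, so <br /> arrives as "BR"
xml_set_element_handler(
    $parser,
    function ($parser, string $name, array $attrs) use (&$path) {
        $path .= "*$name"; // entering an element: extend the path
    },
    function ($parser, string $name) use (&$path) {
        $path = substr($path, 0, strrpos($path, '*')); // leaving it: drop the last segment
    }
);
xml_set_character_data_handler(
    $parser,
    function ($parser, string $data) use (&$path, &$textChunks) {
        if ($path === '*CONTENT*ARTICLE*TEXT') {
            $textChunks[] = $data;
        }
    }
);
xml_parse($parser, $xml, true);

// At least two separate entries: "before break" and "after break".
var_dump($textChunks);
```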

I am using a PHP script in the head of the document:


<!-- XML Parsing -->
<?php

$xml_file = "HomeContent.xml";

$xml_headline_key = "*CONTENT*ARTICLE*TITLE";
$xml_description_key = "*CONTENT*ARTICLE*AUTHOR";
$xml_position_key = "*CONTENT*ARTICLE*POSITION";
$xml_date_key = "*CONTENT*ARTICLE*DATE";
$xml_text_key = "*CONTENT*ARTICLE*TEXT";

$story_array = array();

$counter = 0;
class xml_story{
    var $headline, $author, $position, $whatdate, $text;
}

function startTag($parser, $data){
    global $current_tag;
    $current_tag .= "*$data";
}

function endTag($parser, $data){
    global $current_tag;
    $tag_key = strrpos($current_tag, '*');
    $current_tag = substr($current_tag, 0, $tag_key);
}

function contents($parser, $data){
    global $current_tag, $xml_headline_key, $xml_description_key, $xml_position_key, $xml_date_key, $xml_text_key, $counter, $story_array;
    switch($current_tag){
        case $xml_headline_key:
            $story_array[$counter] = new xml_story();
            $story_array[$counter]->headline = $data;
            break;
        case $xml_description_key:
            $story_array[$counter]->author = $data;
            break;
        case $xml_position_key:
            $story_array[$counter]->position = $data;
            break;
        case $xml_date_key:
            $story_array[$counter]->whatdate = $data;
            break;
        case $xml_text_key:
            $story_array[$counter]->text = $data;
            $counter++;
            break;
    }
}

$xml_parser = xml_parser_create();

xml_set_element_handler($xml_parser, "startTag", "endTag");

xml_set_character_data_handler($xml_parser, "contents");

$fp = fopen($xml_file, "r") or die("Could not open file");

$data = fread($fp, filesize($xml_file)) or die("Could not read file");

if(!(xml_parse($xml_parser, $data, feof($fp)))){
    die("Error on line " . xml_get_current_line_number($xml_parser));
}

xml_parser_free($xml_parser);

fclose($fp);

?>

<!-- End XML Parsing -->

<!-- Output code for HTML -->

<html>
<head>
<title>XML</title>
</head>
<body>

<?php
for($x=0;$x<count($story_array);$x++){
    echo "\t<h2 class='story'>" . $story_array[$x]->headline . "</h2>\n";
    echo "\t<em class='author'>" . $story_array[$x]->author .", ". $story_array[$x]->position . "</em>\n";
    echo "\t<p class='date'>" . $story_array[$x]->whatdate . "</p>\n";
    echo "\t<p class='text'>" . $story_array[$x]->text . "</p>\n";
    echo "\t\t\n";
}
?>
</body>
</html>

<!-- End Output code -->

From what I can see, you have two choices that might work with that script.
First, if you have control over the XML file content, you can try putting the text (with other tags in it) inside a CDATA (character data) section so the XML parser won't interpret it, i.e. it will treat the tags as text rather than as tags. E.g.:

<DATE>21st April 2008</DATE>

<TEXT>
<![CDATA[
Lorem Ipsum is simply dummy text of the printing and typesetting industry. Lorem Ipsum has been the industry's standard dummy text ever since the 1500s, when an unknown printer took a galley of type and scrambled it to make a type specimen book. It has survived not only five centuries, but also the leap into electronic typesetting, <br /> remaining essentially unchanged. It was popularised in the 1960s with the release of Letraset sheets containing Lorem Ipsum passages, and more recently with desktop publishing software like Aldus PageMaker including versions of Lorem Ipsum.
]]>
</TEXT>

</ARTICLE>
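With the CDATA version, the parser never sees a BR element at all. The minimal trace below (same path-tracking approach, variable names made up for the example) shows that `<br />` comes through as plain character data inside TEXT and that only ARTICLE and TEXT are reported as elements. It appends with `.=` rather than `=`, because a SAX-style character-data handler can be called more than once for a single run of text.

```php
<?php
// With the markup inside CDATA, "<br />" is reported as character data, not as an element.
$xml = '<ARTICLE><TEXT><![CDATA[first part<br />second part]]></TEXT></ARTICLE>';

$path = '';
$text = '';
$elementsSeen = [];

$parser = xml_parser_create(); // tag names arrive upper-cased (default case folding)
xml_set_element_handler(
    $parser,
    function ($parser, string $name, array $attrs) use (&$path, &$elementsSeen) {
        $path .= "*$name";
        $elementsSeen[] = $name; // record every element the parser reports
    },
    function ($parser, string $name) use (&$path) {
        $path = substr($path, 0, strrpos($path, '*'));
    }
);
xml_set_character_data_handler(
    $parser,
    function ($parser, string $data) use (&$path, &$text) {
        if ($path === '*ARTICLE*TEXT') {
            $text .= $data; // append: the handler may be called more than once
        }
    }
);
xml_parse($parser, $xml, true);

echo $text, "\n";                       // first part<br />second part
echo implode(',', $elementsSeen), "\n"; // ARTICLE,TEXT
```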

The other is to either add the “non-tree” tags to the contents function or use a “default” case, so that when the parser encounters a $current_tag that isn't one of the "$xml_…_key" values, it won't break the switch-case. E.g.:

    ....
        case $xml_text_key:
            $story_array[$counter]->text = $data;
            $counter++;
            break;

        default:
            break;

I’m unsure how the switch-case default would behave here. My thinking is that the script might not break, but the text after the br tag would get skipped. There may be a way to have it go back to the same node and continue, but I can’t think of how it might be done ATM. And you would need to find a way to add the tags back into the output.
In any case, hopefully you can wrap the extra markup tags in CDATA and you will be good to go. This will also take care of other HTML markup in the XML, like bold, underline, span, etc., and IMHO it is the best way to treat the tags in this use case, because as far as the XML file is concerned, they are not tags but text.
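If editing the XML isn't an option, the "go back to the same node and add the tags back" idea can be done with the same kind of handlers. This is a rough sketch under my own assumptions, not a drop-in replacement for the script above: it creates a story when an ARTICLE opens instead of on TITLE, appends character data with `.=` instead of overwriting it, and writes `<br />` back into the text when a BR element opens inside TEXT. Only TITLE and TEXT are handled to keep it short; AUTHOR, POSITION and DATE would follow the same pattern, and other inline tags (b, span, etc.) would each need similar handling, which is why CDATA is usually simpler.

```php
<?php
// Sketch: rebuild TEXT across several character-data calls and re-insert <br />.
class xml_story
{
    public $headline = '', $author = '', $position = '', $whatdate = '', $text = '';
}

$xml = '<CONTENT><ARTICLE><TITLE>Article Number One</TITLE>'
     . '<TEXT>first part<br />second part</TEXT></ARTICLE>'
     . '<ARTICLE><TITLE>Article Number Two</TITLE><TEXT>only part</TEXT></ARTICLE></CONTENT>';

$story_array = [];
$current_tag = '';
$textKey = '*CONTENT*ARTICLE*TEXT';

$parser = xml_parser_create(); // case folding on: tag names arrive upper-cased
xml_set_element_handler(
    $parser,
    function ($parser, string $name, array $attrs) use (&$current_tag, &$story_array, $textKey) {
        if ($name === 'ARTICLE') {
            $story_array[] = new xml_story(); // one object per <ARTICLE>
        } elseif ($name === 'BR' && $current_tag === $textKey) {
            // still inside TEXT: write the line break back into the text
            $story_array[count($story_array) - 1]->text .= '<br />';
        }
        $current_tag .= "*$name";
    },
    function ($parser, string $name) use (&$current_tag) {
        $current_tag = substr($current_tag, 0, strrpos($current_tag, '*'));
    }
);
xml_set_character_data_handler(
    $parser,
    function ($parser, string $data) use (&$current_tag, &$story_array, $textKey) {
        if (!$story_array) {
            return; // nothing to fill before the first <ARTICLE>
        }
        $story = $story_array[count($story_array) - 1];
        switch ($current_tag) {
            case '*CONTENT*ARTICLE*TITLE':
                $story->headline .= $data;
                break;
            case $textKey:
                $story->text .= $data; // append, never overwrite
                break;
            // AUTHOR, POSITION and DATE would follow the same pattern
        }
    }
);

if (!xml_parse($parser, $xml, true)) {
    die('Error on line ' . xml_get_current_line_number($parser));
}

foreach ($story_array as $story) {
    echo $story->headline, ': ', $story->text, "\n";
}
// Article Number One: first part<br />second part
// Article Number Two: only part
```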

Many thanks, this seems to work perfectly.

Thanks again for your help! :smiley: