Php echo xml string containing <br/> tag

ljakab001 · August 20, 2017, 1:00pm

I have an XML file where some of the paragraphs have a <br/> tag inside.

I am trying to echo the requested strings, but nodeValue ignores the <br/> tag and the string is echoed without a break.

Example:
This is the teaser from the XML file:
<?xml version="1.0" encoding="UTF-8"?>
<body>
<div>
<p>-This is the first part of the string.<br/>-This is the second part of the string</p>
</div>
</body>

And here is my php script:

$xmlDoc = new DOMDocument();
$xmlDoc->load("example.xml");

$searchNode = $xmlDoc->getElementsByTagName("p");

foreach ($searchNode as $searchNodik) {
    $dialogue = $searchNodik->nodeValue;

    echo "<p>$dialogue</p>";
}

And this echoes the strings on a single line, ignoring the break tag, like:

-This is the first part of the string.-This is the second part of the string

I have spent a couple of days browsing the net for a solution, but without success.

Any hint on how to solve this, please?

marklenon95 · August 20, 2017, 4:52pm

Did you try:
$xmlDoc->formatOutput = true ?

ljakab001 · August 20, 2017, 6:34pm

No, but I’ve tried this now as you suggested, but no change.

Dormilich · August 21, 2017, 7:52am

Well, this is what nodeValue is supposed to do.

zack1 · August 21, 2017, 4:50pm

You need the inner html, not just the stripped value. I couldn’t find exact docs but this should lead you to the right solution. https://stackoverflow.com/questions/6286362/php-dom-get-nodevalue-html-without-stripping-tags
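
A minimal sketch of what that linked approach boils down to, assuming the XML from the first post: serialize each child node of the paragraph with DOMDocument::saveXML() and join the pieces. The helper name innerXml() is made up for this example; it is not a built-in.

```php
<?php
// Hypothetical helper: returns the markup inside $node, tags included.
function innerXml(DOMNode $node): string
{
    $out = '';
    foreach ($node->childNodes as $child) {
        // saveXML() with a node argument serializes only that node
        $out .= $node->ownerDocument->saveXML($child);
    }
    return $out;
}

$xml = <<<XML
<?xml version="1.0" encoding="UTF-8"?>
<body>
<div>
<p>-This is the first part of the string.<br/>-This is the second part of the string</p>
</div>
</body>
XML;

$xmlDoc = new DOMDocument();
$xmlDoc->loadXML($xml);

$paragraphs = [];
foreach ($xmlDoc->getElementsByTagName('p') as $p) {
    // the <br/> survives because the children are serialized, not flattened
    $paragraphs[] = '<p>' . innerXml($p) . '</p>';
}
echo implode("\n", $paragraphs), "\n";
```

Because the child nodes are serialized rather than reduced to their text, the <br/> ends up in the echoed paragraph and the browser renders the break.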

Dormilich · August 21, 2017, 4:55pm

That’s because innerHTML is not part of the DOM; it is a JS convenience property. So you would have to create a workaround specific to the implementation.

zack1 · August 21, 2017, 4:59pm

This is a PHP script, no JS going on here.

ljakab001 · August 22, 2017, 11:12am

Thank you all, guys, I’ll try the innerHTML and will see. I thought innerHTML would handle only an HTML tag inside another HTML tag, not the whole content including strings.

Dormilich · August 22, 2017, 12:19pm

innerHTML is not part of the DOM and hence not available in PHP.

ljakab001 · August 22, 2017, 12:29pm

Yep, I’ve just realised this. I’m trying to figure this out and have recently been playing with regular expressions; maybe they could help me. But if someone could find an easy solution for splitting the sentences into two lines, I’d be very grateful.

Dormilich · August 22, 2017, 12:52pm

$ex = <<<HTML
<body>
  <div>
    <p>-This is the first part of the string.<br>-This is the second part of the string</p>
  </div>
</body>
HTML;
// load HTML string
$doc = new DOMDocument;
$doc->loadHTML($ex);
// get first paragraph
$p = $doc->getElementsByTagName('p')->item(0);
// convert DOMNodeList to array to run it through the array functions
$children = iterator_to_array($p->childNodes);
// filter off anything that is not a text node
$text = array_filter($children, function (DOMNode $node) {
    return $node->nodeType === XML_TEXT_NODE;
});
// convert text nodes to text strings
$text = array_map(function (DOMText $node) {
    return $node->data;
}, $text);
// voila
print_r($text);

Well, obviously that only works as intended as long as there are only empty child elements inside.
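
One way to lift that restriction is to walk the paragraph recursively. This is only a sketch, assuming every <br> should become a newline and any other element should contribute just its text; the function name textWithBreaks() is made up for the example.

```php
<?php
// Hypothetical helper: collects the text of $node, turning every <br> into "\n".
function textWithBreaks(DOMNode $node): string
{
    $out = '';
    foreach ($node->childNodes as $child) {
        if ($child->nodeType === XML_TEXT_NODE) {
            $out .= $child->nodeValue;
        } elseif ($child->nodeType === XML_ELEMENT_NODE) {
            $out .= strtolower($child->nodeName) === 'br'
                ? "\n"
                : textWithBreaks($child); // descend into <span>, <b>, ...
        }
    }
    return $out;
}

$doc = new DOMDocument;
$doc->loadHTML('<p>-First part.<br>-Second <b>part</b></p>');
$p = $doc->getElementsByTagName('p')->item(0);

// one array entry per line of the paragraph
$lines = explode("\n", textWithBreaks($p));
print_r($lines);
```

To print the result in a page, something like echo nl2br(htmlspecialchars(textWithBreaks($p))); turns the newlines back into break tags while escaping the text.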

ljakab001 · August 22, 2017, 1:16pm

Uuuuuups, Dormilich, you are the Guru!!! This works great, thank you so much, you’ve made my day !

Dormilich · August 22, 2017, 1:56pm

This is just knowing the holy triad of array functions (map, filter & reduce).

ljakab001 · August 22, 2017, 1:57pm

Maybe next year, I am still a beginner. Thank you a lot!

Mittineague · August 22, 2017, 7:11pm

I have a feeling that what may have been missed is knowing that the DOM has “invisible” text nodes. eg.

<p>foo<br>bar</p>

is seen as

<p><text>foo</text><br><text>bar</text></p>

That is, the crucial part of Dormilich’s code is the lines that pick out the text nodes.
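
A small sketch to make those text nodes visible, assuming the same markup as above: list the children of the paragraph by node name and compare with what nodeValue returns.

```php
<?php
$doc = new DOMDocument;
$doc->loadHTML('<p>foo<br>bar</p>');
$p = $doc->getElementsByTagName('p')->item(0);

$summary = [];
foreach ($p->childNodes as $child) {
    // nodeName is "#text" for text nodes and the tag name for elements
    $summary[] = $child->nodeName . ($child instanceof DOMText ? ': ' . $child->data : '');
}
print_r($summary);        // three children: text, br, text

// nodeValue of the paragraph is just the text nodes glued together
echo $p->nodeValue, "\n"; // foobar
```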

John_Betong · August 23, 2017, 3:20am

Try this:

<?php 
	declare(strict_types=1);
	error_reporting(-1);
	ini_set('display_errors', '1');

//===========================================================
function extractP($remaining="extractP()", $pStart="<p ", $pEnd="</p>")
{
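	$iStart = strpos($remaining, $pStart);
	// look for the closing tag only after the opening tag
	$iEnd   = ($iStart === false) ? false : strpos($remaining, $pEnd, $iStart);

	// strpos() returns 0 for a match at offset 0, so compare against false explicitly
	if ($iStart !== false && $iEnd !== false):
		$iEnd += strlen($pEnd); // position just past the closing tag
		echo substr($remaining, $iStart, $iEnd - $iStart);
		$remaining = substr($remaining, $iEnd);
	else:
		$remaining = NULL;
	endif;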

	return $remaining;
}///==================================================================


$test = <<< ____TMP
<?xml version="1.0" encoding="UTF-8"?>
<body>
<div>
<p>-This is the first part of the string.<br/>-This is the second part of the string</p>
<h1> h1 - This will be ignored </h1>
<p style="color:green;">-THIS IS THE FIRST PART OF THE STRING.<BR/><span style="color:red;">red</span><br>-THIS IS THE SECOND PART OF THE STRING</p>
<h2> h2 - This will be ignored </h2>
</div>
</body>
____TMP;

$title = 'title goes here';
$page = <<< ____TMP
<!DOCTYPE HTML>
<html lang="en">
<head>
<title> $title </title>
</head>
<body>
____TMP;
echo $page;
	echo '<h1>' .$title .'</h1>';
	echo '<hr>';
	
	$remaining = $test;

	while ($remaining):
		$remaining = extractP( $remaining, '<p', '</p>' );
	endwhile;	

echo '</body></html>';

Output:

-This is the first part of the string.
-This is the second part of the string

-THIS IS THE FIRST PART OF THE STRING.
red
-THIS IS THE SECOND PART OF THE STRING

Dormilich · August 23, 2017, 12:25pm

@John_Betong Just be aware that this needs the HTML code to be XML-conformant. A missing </p>, which is valid in HTML, will make it show unintended text.

John_Betong · August 23, 2017, 1:23pm

Many thanks for raising the issue.

The function could be amended to check if $iStart is valid and has no corresponding $iEnd value. Another test for alternative p ending values could be performed.

I am currently tapping on a tablet, so it is not easy to test the search against valid HTML that lacks a closing tag. I am guessing that another opening p tag will close the previous one?

Also, upon reflection, I think a problem could arise if another p tag is opened immediately after the first closing p tag. Hopefully I can check tomorrow.

Reminds me of GIGO

Dormilich · August 23, 2017, 1:25pm

Any block-level element will close the paragraph.

John_Betong · August 23, 2017, 1:40pm

That’s a lot of options

Edit:
I wonder if the JavaScript solution caters for the missing closing tag?
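
For the DOMDocument approach shown earlier in the thread, a quick check along these lines should settle it. This is a sketch with made-up sample markup, assuming the HTML parser behind loadHTML() (libxml) closes an open paragraph when the next p or a block-level element starts.

```php
<?php
// Both paragraphs lack </p>; the second one is ended by a block-level <div>.
$html = '<body><p>first<br>paragraph<p>second paragraph<div>a div</div></body>';

$doc = new DOMDocument;
// libxml may emit warnings on sloppy markup; collect them instead of printing
libxml_use_internal_errors(true);
$doc->loadHTML($html);
libxml_clear_errors();

$texts = [];
foreach ($doc->getElementsByTagName('p') as $p) {
    $texts[] = $p->nodeValue;
}
// if the parser closed the paragraphs itself, there are two separate entries
print_r($texts);
```

If the array holds two entries and the div text is in neither, the parser has supplied the missing closing tags, which a plain strpos() search cannot do.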
