PHP echo XML string containing <br/> tag

I have an XML file in which some of the paragraphs have a <br/> tag inside.

I am trying to echo the requested strings, but nodeValue ignores the <br/> tag and the string is echoed without a break.

Example:
This is an excerpt from the XML file:
<?xml version="1.0" encoding="UTF-8"?>
<body>
<div>
<p>-This is the first part of the string.<br/>-This is the second part of the string</p>
</div>
</body>

And here is my php script:

$xmlDoc = new DOMDocument();
$xmlDoc->load("example.xml");

$searchNode = $xmlDoc->getElementsByTagName("p");

foreach ($searchNode as $searchNodik) {
    $dialogue = $searchNodik->nodeValue;

    echo "<p>$dialogue</p>";
}

And this echoes the strings on one line only, ignoring the break tag, like this:

-This is the first part of the string.-This is the second part of the string

I have spent a couple of days browsing the net to find a solution, but without success.

Any hint how to solve this, please?

Did you try:
$xmlDoc->formatOutput = true ?

No, but I’ve tried this now as you suggested, but no change.

Well, this is what nodeValue is supposed to do.

You need the inner html, not just the stripped value. I couldn’t find exact docs but this should lead you to the right solution. https://stackoverflow.com/questions/6286362/php-dom-get-nodevalue-html-without-stripping-tags

That’s because innerHTML is not part of the DOM (it is rather a JS convenience method), so you would have to create a workaround specific to the implementation.

This is a PHP script, no JS going on here.

Thank you all, guys, I’ll try innerHTML and see. I thought innerHTML would handle only an HTML tag inside another HTML tag, not the whole content including strings.

innerHTML is not part of the DOM and hence not available in PHP.
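For what it’s worth, a rough stand-in for innerHTML can be built from methods DOMDocument does have. The sketch below is only one possible workaround: `innerXml()` is a made-up helper name, not a built-in, and the inline `loadXML()` string stands in for the `example.xml` file from the question. It serialises each child of the paragraph with `saveXML()`, so the <br/> survives.

```php
<?php
// Hypothetical helper (not part of PHP): serialises the children of a node,
// keeping child tags such as <br/> that nodeValue drops.
function innerXml(DOMNode $node): string
{
    $out = '';
    foreach ($node->childNodes as $child) {
        // saveXML() with a node argument serialises just that node
        $out .= $node->ownerDocument->saveXML($child);
    }
    return $out;
}

$xmlDoc = new DOMDocument();
// Assumption: inline string used instead of load("example.xml") to keep this self-contained
$xmlDoc->loadXML('<body><div><p>-This is the first part of the string.<br/>-This is the second part of the string</p></div></body>');

foreach ($xmlDoc->getElementsByTagName('p') as $p) {
    echo '<p>' . innerXml($p) . "</p>\n";
}
```

Since the text nodes are serialised as XML, special characters come out escaped, which is usually what you want when echoing into an HTML page.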

Yep, I’ve just realised this. I’m trying to figure this out and have recently been playing with regular expressions; maybe they could help me. But if someone could find an easy solution for splitting the sentences into two lines, I’d be very grateful.

$ex = <<<HTML
<body>
  <div>
    <p>-This is the first part of the string.<br>-This is the second part of the string</p>
  </div>
</body>
HTML;
// load HTML string
$doc = new DOMDocument;
$doc->loadHTML($ex);
// get first paragraph
$p = $doc->getElementsByTagName('p')->item(0);
// convert DOMNodeList to array to run it through the array functions
$children = iterator_to_array($p->childNodes);
// filter off anything that is not a text node
$text = array_filter($children, function (DOMNode $node) {
    return $node->nodeType === XML_TEXT_NODE;
});
// convert text nodes to text strings
$text = array_map(function (DOMText $node) {
    return $node->data;
}, $text);
// voila
print_r($text);

Well, obviously that only works as intended as long as there are only empty child elements inside.
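If the paragraph may contain non-empty child elements (a <span>, say), one possible variation is to collect the text nodes at any depth with XPath instead of looking only at direct children. This is just a sketch with a made-up sample string, and it assumes that every text node should become its own line, which may not suit inline markup in the middle of a sentence.

```php
<?php
$doc = new DOMDocument;
// Assumption: sample markup invented for illustration
$doc->loadXML('<p>-first part.<br/><span>red</span><br/>-second part</p>');

$xpath = new DOMXPath($doc);
$lines = [];
// .//text() selects text nodes at any depth below the context node
foreach ($xpath->query('.//text()', $doc->documentElement) as $textNode) {
    $lines[] = $textNode->data;
}

// escape each piece, then join the pieces with a line break for output
echo implode("<br>\n", array_map('htmlspecialchars', $lines));
```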

Uuuuuups, Dormilich, you are the Guru!!! This works great, thank you so much, you’ve made my day !

This is just knowing the holy triad of array functions (map, filter & reduce).
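For anyone who hasn’t met the three yet, here is a small sketch on a plain array of strings (the sample data is made up, and the arrow functions need PHP 7.4 or later): filter drops the unwanted items, map transforms each remaining item, and reduce folds them into a single value.

```php
<?php
// Assumption: sample data invented for illustration
$parts = ['-This is the first part of the string.', '', '-This is the second part of the string'];

// filter: keep only the non-empty strings
$nonEmpty = array_filter($parts, fn($s) => $s !== '');

// map: escape every remaining string for HTML output
$escaped = array_map('htmlspecialchars', $nonEmpty);

// reduce: fold the strings into one, separated by <br>
$html = array_reduce(
    $escaped,
    fn($carry, $s) => $carry === '' ? $s : $carry . '<br>' . $s,
    ''
);

echo "<p>$html</p>";
```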

Maybe next year, I am still a beginner. Thank you a lot! :slight_smile:

I have a feeling that what may have been missed is knowing that the DOM has “invisible” text nodes. eg.

<p>foo<br>bar</p>

is seen as

<p><text>foo</text><br><text>bar</text></p>

That is, a crucial part of Dormilich’s code is the lines that deal with the “text” nodes.
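You can see those text nodes for yourself by walking the children of the paragraph. A minimal sketch, using `loadXML()` on the small example above (with the <br> written as <br/> so it is well-formed XML):

```php
<?php
$doc = new DOMDocument;
$doc->loadXML('<p>foo<br/>bar</p>');

$seen = [];
foreach ($doc->documentElement->childNodes as $child) {
    // nodeName is "#text" for text nodes and the tag name for elements
    $seen[] = $child->nodeName;
    echo $child->nodeName, ' (nodeType ', $child->nodeType, ")\n";
}
```

The paragraph has three children: a text node, the br element, and another text node.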

Try this:

<?php 
	declare(strict_types=1);
	error_reporting(-1);
	ini_set('display_errors', '1');

//===========================================================
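function extractP($remaining="extractP()", $pStart="<p ", $pEnd="</p>")
{
	$iStart = strpos($remaining, $pStart);
	// look for the closing tag only after the opening tag
	$iEnd   = ($iStart === false) ? false : strpos($remaining, $pEnd, $iStart);

	// strict comparison: offset 0 is a valid match, only false means "not found"
	if($iStart !== false && $iEnd !== false):
		$iNext = $iEnd + strlen($pEnd); // first character after the closing tag
		echo substr($remaining, $iStart, $iNext - $iStart);
		$remaining = substr($remaining, $iNext);
	else:
		$remaining = NULL;	
	endif;

	return $remaining;
}///==================================================================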


$test = <<< ____TMP
<?xml version="1.0" encoding="UTF-8"?>
<body>
<div>
<p>-This is the first part of the string.<br/>-This is the second part of the string</p>
<h1> h1 - This will be ignored </h1>
<p style="color:green;">-THIS IS THE FIRST PART OF THE STRING.<BR/><span style="color:red;">red</span><br>-THIS IS THE SECOND PART OF THE STRING</p>
<h2> h2 - This will be ignored </h2>
</div>
</body>
____TMP;

$title = 'title goes here';
$page = <<< ____TMP
<!DOCTYPE HTML>
<html lang="en">
<head>
<title> $title </title>
</head>
<body>
____TMP;
echo $page;
	echo '<h1>' .$title .'</h1>';
	echo '<hr>';
	
	$remaining = $test;

	while ($remaining):
		$remaining = extractP( $remaining, '<p', '</p>' );
	endwhile;	

echo '</body></html>';

**Output:**

-This is the first part of the string.
-This is the second part of the string

-THIS IS THE FIRST PART OF THE STRING.
red
-THIS IS THE SECOND PART OF THE STRING

@John_Betong just be aware that this requires the HTML code to be XML-conformant. A missing </p>, which is valid in HTML, will make it show unintended text.


Many thanks for raising the issue.

The function could be amended to check if $iStart is valid and has no corresponding $iEnd value. Another test for alternative p ending values could be performed.

I’m currently typing on a tablet, so it is not easy to test how the function copes with valid HTML that lacks a closing tag. I am guessing that another opening p tag will close the previous one?

Also, upon reflection, I think a problem could arise if another p tag is opened immediately after the first closing p tag. I will hopefully check tomorrow.

Reminds me of GIGO :slight_smile:

Any block-level element will close the paragraph.
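A quick way to check this is to feed such markup to `DOMDocument::loadHTML()` and see what comes out. The sketch below uses made-up sample markup and only covers the case where a second opening p tag follows an unclosed one. As far as I know, the libxml-based HTML parser closes the first paragraph on its own, but treat that as something to verify on your own PHP build.

```php
<?php
// Assumption: sample markup invented for illustration, neither <p> is closed
$html = '<p>first paragraph<p>second paragraph';

// collect parser warnings internally instead of printing them
libxml_use_internal_errors(true);

$doc = new DOMDocument;
$doc->loadHTML($html);

$paragraphs = [];
foreach ($doc->getElementsByTagName('p') as $p) {
    $paragraphs[] = $p->textContent;
}

print_r($paragraphs);
```

If the parser repairs the markup as expected, each paragraph ends up as its own element, which is something the strpos() approach cannot do.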


That’s a lot of options :frowning:

Edit:
I wonder if the JavaScript solution caters for the missing closing tag?