DomDocument -> loadHTMLFile problems

Hey everyone,

I’ve been experimenting with DomDocument and playing around with files, but I’ve run into a problem. I’m trying to remotely load one of my webpages, but I’m getting some errors thrown at me. My code is:

<?php
// Read the page into a string (a local copy for now).
$remote = file_get_contents('remotefile.html');

$doc = new DOMDocument();
$file = $doc->loadHTML($remote);
$cells = $doc->getElementsByTagName('td');

// Print the text of every <td class="title"> cell.
foreach ($cells as $cell)
{
    if ($cell->getAttribute('class') == 'title')
    {
        echo $cell->nodeValue . '<br />';
    }
}
?>

remotefile.html is not actually a remote file right now: when I started getting these errors, I downloaded the file and placed it in the same directory to see whether loading it locally would take care of them (it didn’t). Anyway, the errors I am getting are:

Warning: DOMDocument::loadHTML() [function.DOMDocument-loadHTML]: Unexpected end tag : img in Entity, line: 37 in test.php on line 27
Warning: DOMDocument::loadHTML() [function.DOMDocument-loadHTML]: htmlParseEntityRef: expecting ';' in Entity, line: 317 in test.php on line 27
Warning: DOMDocument::loadHTML() [function.DOMDocument-loadHTML]: htmlParseEntityRef: expecting ';' in Entity, line: 323 in test.php on line 27
Warning: DOMDocument::loadHTML() [function.DOMDocument-loadHTML]: Unexpected end tag : img in Entity, line: 603 in test.php on line 27
Warning: DOMDocument::loadHTML() [function.DOMDocument-loadHTML]: htmlParseStartTag: invalid element name in Entity, line: 603 in test.php on line 27
Warning: DOMDocument::loadHTML() [function.DOMDocument-loadHTML]: Opening and ending tag mismatch: a and b in Entity, line: 604 in test.php on line 27
Warning: DOMDocument::loadHTML() [function.DOMDocument-loadHTML]: Unexpected end tag : img in Entity, line: 610 in test.php on line 27
Warning: DOMDocument::loadHTML() [function.DOMDocument-loadHTML]: htmlParseStartTag: misplaced <body> tag in Entity, line: 665 in test.php on line 27

The PHP docs say that “HTML does not have to be well-formed to load”. Are these errors saying that the page is too badly formed to load? Or is there something else going on?

Can you post the html file you’re trying to load?

loadHTML expects valid markup, and I’m afraid most pages aren’t.

You can alter the code to suppress markup errors:


$file = @$doc->loadHTML($remote);
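
As an alternative to the @ operator, libxml’s own error handling can collect the warnings instead of printing them, so you can still look at them if you want to. A minimal sketch, assuming a small inline sample in place of the real remotefile.html (the sample markup is made up for illustration):

```php
<?php
// Hypothetical sample standing in for the contents of remotefile.html.
$remote = '<table><tr><td class="title">First</td><td>Other</td></img></tr></table>';

// Collect libxml warnings internally instead of emitting PHP warnings.
$previous = libxml_use_internal_errors(true);

$doc = new DOMDocument();
$loaded = $doc->loadHTML($remote);

// Fetch whatever the parser complained about, then restore the old setting.
$errors = libxml_get_errors();
libxml_clear_errors();
libxml_use_internal_errors($previous);

foreach ($errors as $error) {
    // Each entry is a LibXMLError object with message, line, level, etc.
    echo trim($error->message) . ' (line ' . $error->line . ')' . PHP_EOL;
}
```

Which messages show up (if any) depends on the libxml version PHP was built against, so treat the list as diagnostic output rather than something to rely on.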

SilverB.

From PHP.net

Unlike loading XML, HTML does not have to be well-formed to load

Are you saying it does require valid markup from experience?

Indeed, “HTML does not have to be well-formed to load”, which suggests the document will still load and only emit warnings, as opposed to the errors you would get with XML.

I have always suppressed these warnings in the past and successfully traversed the DOM.

Every error you have posted is a markup warning. However, if you cannot traverse the DOM, we have another issue altogether. :)
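
A quick way to check whether the DOM can still be traversed is to feed loadHTML some deliberately sloppy markup and see what comes back. This is only a sketch; the markup below is invented for the test and is not your page:

```php
<?php
// Hypothetical, deliberately sloppy markup: a stray </img> and a bare ampersand.
$html = '<html><body><table><tr>'
      . '<td class="title">Fish & Chips</img></td>'
      . '<td class="other">ignored</td>'
      . '<td class="title">Second</td>'
      . '</tr></table></body></html>';

$doc = new DOMDocument();
// Suppress the markup warnings; the return value still reports success or failure.
$loaded = @$doc->loadHTML($html);

// Collect the text of every <td class="title"> cell.
$titles = [];
foreach ($doc->getElementsByTagName('td') as $cell) {
    if ($cell->getAttribute('class') === 'title') {
        $titles[] = $cell->nodeValue;
    }
}

var_dump($loaded);
print_r($titles);
```

If $loaded is true and $titles contains the cells you expect, the warnings are cosmetic and the parser has recovered from the bad markup.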

SilverB.