This is mentioned in the comments of the PHP documentation, and I have issues with it as well: I can’t load as a file any document that has that function in its code.
With that in the document, it fails to load the XML and outputs a bunch of errors to libxml_get_errors(). If I delete the link tag altogether it’s fine, and the same goes if I substitute it with:
That works just fine. Are there any workarounds I can try? As far as I know, the second code slice isn’t valid markup and can generate errors of its own, or am I wrong?
You don’t need to say ‘echo’ in your PHP tag; just urlencode($whatever) should be enough.
The ‘echo’ function outputs its result to the screen, and that’s why you’re having a problem, I think.
I just tried removing the echo and it still didn’t work. It seems the best solution (as it’s the only one I can figure out) is to load the file as a DOM object and then import the DOM object into SimpleXML. Is this the best solution?
// Parse the file with the forgiving HTML parser, then hand the tree to SimpleXML.
$DOMfile = new DOMDocument();
$DOMfile->loadHTMLFile($uri);
$xml = simplexml_import_dom($DOMfile);
If you are already loading it as a DOMDocument, then you don’t even need to bother with SimpleXML. Just work with the DOMDocument/DOMElement methods; they are more powerful than SimpleXML anyway.
Can you give a more complete example of the XML that you’re trying to load, and how you do it with SimpleXML? Under the hood, DOM and SimpleXML use the same parser.
Oh, it’s a standard HTML file. The code at the top of the thread is exactly the code it was having issues with. I personally find SimpleXML easier to work with, and it seems a little faster. I do know DOM, I just prefer SimpleXML. If they really are the same parser, why would one choke on the code and the other not? FYI, DOM’s “load” fails on it as well; only DOM’s “loadHTMLFile” works.
DOMDocument’s loadHTML is designed to try to fix broken HTML if it can; SimpleXML does not do that extra work for you. That’s why it works with DOMDocument and not with SimpleXML.
If it works with DOMDocument for you, then just use DOM; there is no need to use SimpleXML. You have already created the DOMDocument object and already done the most time-consuming work of parsing your HTML string, so from that point on, working with the DOMDocument is faster than creating another SimpleXML object.
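To make the difference concrete, here is a minimal sketch. The sample markup is made up for illustration (it is not the file from this thread); it has an unclosed br tag, which is fine in HTML but not in XML.

```php
<?php
// Collect libxml errors instead of emitting PHP warnings.
libxml_use_internal_errors(true);

// Hypothetical sample markup: the <br> is never closed.
$markup = '<html><head><title>Demo</title></head>'
        . '<body><p>Line one<br>Line two</p></body></html>';

// SimpleXML parses strictly as XML and gives up on the mismatched tags.
$strict = simplexml_load_string($markup);
$strictErrors = libxml_get_errors();
libxml_clear_errors();

// DOMDocument::loadHTML() uses the forgiving HTML parser instead.
$dom = new DOMDocument();
$loaded = $dom->loadHTML($markup);
libxml_clear_errors();

// The repaired tree can still be imported into SimpleXML if you prefer its API.
$xml = simplexml_import_dom($dom);
$title = (string) $xml->head->title;

var_dump($strict === false, count($strictErrors) > 0, $loaded, $title);
```

The strict parse returns false and leaves its complaints in libxml_get_errors(), while the HTML parse succeeds and the title can be read either through DOM or through the imported SimpleXML object.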
Basically, I think the reason it does not work with SimpleXML is that you are not giving it a valid XML file.
<a href="cart.php?manufacturer=wellington&partNumber=xs 505">
is not valid XML because, at the very least, you need some root tag, and the a tag is also not closed.
Try adding the / before the closing > of the a tag, like this:
<a href="cart.php?manufacturer=wellington&partNumber=xs 505"/>
This may or may not fix the error, I am not sure. But again, since you are giving it HTML and not valid XML as input, you may be better off using DOMDocument, just as you have already discovered.
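A quick way to check guesses like this is to feed each variant to the strict parser on its own. The sketch below does that; is_well_formed() is a hypothetical helper written for this post, not a PHP built-in. As far as I can tell, the bare & in the query string is a problem for an XML parser too, so the third variant escapes it as &amp; (and the space as %20).

```php
<?php
// Collect libxml errors instead of emitting PHP warnings.
libxml_use_internal_errors(true);

// Hypothetical helper: true when libxml accepts the string as well-formed XML.
function is_well_formed(string $xml): bool
{
    $ok = simplexml_load_string($xml) !== false;
    libxml_clear_errors();
    return $ok;
}

// The tag as posted: never closed, bare ampersand.
$unclosed = '<a href="cart.php?manufacturer=wellington&partNumber=xs 505">';

// Self-closed as suggested above, but the ampersand is still bare.
$selfClosed = '<a href="cart.php?manufacturer=wellington&partNumber=xs 505"/>';

// Self-closed, with the ampersand escaped and the space percent-encoded.
$escaped = '<a href="cart.php?manufacturer=wellington&amp;partNumber=xs%20505"/>';

$results = [
    'unclosed'   => is_well_formed($unclosed),
    'selfClosed' => is_well_formed($selfClosed),
    'escaped'    => is_well_formed($escaped),
];

var_dump($results);
```

Only the last variant should come back true, which suggests that closing the tag alone is not enough for SimpleXML.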
Sorry for the incomplete code. For the record, the full tag is:
<a href="cart.php?manufacturer=wellington&partNumber=<?php echo urlencode("xs 505");?>">Add to cart</a>
which doesn’t work, and,
<a href="cart.php?manufacturer=wellington&partNumber=xs 505">Add to cart</a>
which does work. Is that broken XML/HTML? It validates in Dreamweaver.
The PHP I’m working with is designed to grab the content within the <title> tag, and the content of the div with the class ‘grid_10 content’ inside the div with the class ‘container_16’. I will eventually edit the code to pick up any div whose class contains ‘content’: I’m using 960.gs for layout, some content divs are not grid_10, and I don’t want the code to depend on whether or not the div sits inside the ‘container_16’ div. The code I’m using is:
It has a .php file extension, and yes, it contains HTML. That is not the only PHP in the file; there are a number of PHP includes for templating. It doesn’t have problems with that PHP code, just the urlencode.
If I’m understanding correctly, the double quotes used in the PHP are the cause of your problem. With the unprocessed PHP in it, the document is neither XML nor HTML, and that makes the parsers hiccup.
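It may be worth testing this before converting every urlencode() call. The sketch below feeds the strict XML parser the raw source with both quote styles, then the markup as it would look after PHP has run. The ampersand is written as &amp; here on purpose, so that the embedded PHP tag is the only thing being tested; that escaping is my assumption, not what the original file contains.

```php
<?php
// Collect libxml errors instead of emitting PHP warnings.
libxml_use_internal_errors(true);

$prefix = '<a href="cart.php?manufacturer=wellington&amp;partNumber=';
$suffix = '">Add to cart</a>';

// Raw source, as a parser sees it when it reads the .php file from disk.
$doubleQuoted = $prefix . '<?php echo urlencode("xs 505");?>' . $suffix;
$singleQuoted = $prefix . '<?php echo urlencode(\'xs 505\');?>' . $suffix;

// Rendered output, as it looks once PHP has executed the urlencode() call.
$rendered = $prefix . urlencode('xs 505') . $suffix;

$parses = [];
foreach (compact('doubleQuoted', 'singleQuoted', 'rendered') as $name => $markup) {
    $parses[$name] = simplexml_load_string($markup) !== false;
    libxml_clear_errors();
}

var_dump($parses);
```

As far as I can tell, a strict XML parser rejects a literal < inside an attribute value whichever quotes the PHP uses, so both raw variants fail and only the rendered one parses. If that holds for your file, parsing the page output rather than the source file would be another way around the problem.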
I’ll have to keep that in mind for the future; it’s good to know. I haven’t tested it yet, though, since all of my urlencode calls use double quotes rather than single quotes, and switching all of them would be a huge task that I don’t want to take on if I don’t have to. I have modified the code and resorted to using DOM. Here goes:
So, it grabs the file named in the variable $uri, grabs the title and puts it into the variable $title, and puts the content of the div whose class contains ‘content’ into the variable $body. I run $body through mysql_escape_string before I put it into my database. Thanks for the input!
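For anyone landing here later, this is a minimal sketch of the approach just described. It is not the poster's actual code: the sample HTML stands in for the file named in $uri, and the XPath query is one possible way to match a div whose class list contains ‘content’.

```php
<?php
// Collect libxml errors instead of emitting PHP warnings.
libxml_use_internal_errors(true);

// Hypothetical stand-in for the contents of the file named in $uri.
$html = <<<'HTML'
<html><head><title>Wellington parts</title></head>
<body><div class="container_16">
<div class="grid_10 content"><p>Add to cart</p></div>
</div></body></html>
HTML;

$dom = new DOMDocument();
$dom->loadHTML($html); // for a real file: $dom->loadHTMLFile($uri);
libxml_clear_errors();

$xpath = new DOMXPath($dom);

// Text of the first <title> element.
$title = trim($xpath->evaluate('string(//title)'));

// Match "content" as a whole class token, wherever the div sits in the page.
$query = '//div[contains(concat(" ", normalize-space(@class), " "), " content ")]';
$div = $xpath->query($query)->item(0);

// Serialize the div's children to get its inner HTML.
$body = '';
if ($div !== null) {
    foreach ($div->childNodes as $child) {
        $body .= $dom->saveHTML($child);
    }
}

var_dump($title, $body);
```

Padding the class attribute with spaces before matching keeps the query from also hitting classes such as ‘content_wide’. On current PHP versions mysql_escape_string no longer exists, so a prepared statement (PDO or mysqli) would be the way to store $body.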