Try preg_match_all instead of preg_match and see where that gets ya. (Hint: preg_match_all returns a multidimensional array in its matches. You want index 1 of the first dimension.)
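For anyone unsure what that array looks like, here's a minimal sketch. The $html string is made up for illustration and is not the markup from the original question; the pattern assumes plain, attribute-free li tags.

```php
<?php
// Hypothetical sample markup, assumed for illustration only.
$html = '<ul><li>stock1</li><li>stock2</li></ul>';

// With the default PREG_PATTERN_ORDER flag:
//   $matches[0] holds the full matches (tags included),
//   $matches[1] holds whatever the first capture group caught.
$count = preg_match_all('~<li>(.*?)</li>~s', $html, $matches);

print_r($matches[0]); // the complete <li>...</li> strings
print_r($matches[1]); // just the text between the tags
```

So index 1 of the first dimension is the list of captured values, and the return value is the number of full matches.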
$yourHTML = <<<TEXT
<html>
<head><title>Test HTML</title></head>
<body>
test text
<ul><li>stock1</li><li>stock2</li></ul>
</body>
</html>
TEXT;
// Create a new DOM document and load the HTML
$DomDoc = new DOMDocument();
$DomDoc->loadHTML( $yourHTML );
// Create a DOM Xpath so we can query the data
$XPath = new DOMXPath( $DomDoc );
// Query the string and grab all li tags.
// query() returns false on an invalid expression and an empty
// list when nothing matches, so check $stocks->length for results.
$stocks = $XPath->query( "//li" );
// Perform some logic on each piece of data returned
foreach( $stocks as $stock ) {
    // access the name of the nodes, in your case "li"
    echo $stock->nodeName;
    // access the value of the node, in your case each "stock"
    echo $stock->nodeValue;
}
LOL well, I’m unsure what the application (or intent) of this script is, or even where the HTML comes from. If there’s a possibility of nested tags inside the li (wanted? expected?), that could easily be handled within the foreach: strip them or parse them. Personally I prefer XPath when working with markup like HTML or XML. Each to his own, I suppose.
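To make the "strip it or parse it" idea concrete, here's a rough sketch of what could go inside that foreach. The markup and the variable names ($allText, $ownText) are my own assumptions, not something from the original script.

```php
<?php
// Hypothetical markup: an <li> with a nested tag in it.
$html = '<ul><li>stock2 <b>up 3%</b></li></ul>';

$doc = new DOMDocument();
$doc->loadHTML($html);
$xpath = new DOMXPath($doc);
$li = $xpath->query('//li')->item(0);

// nodeValue concatenates the text of every descendant,
// so the <b> tag is dropped but its text is kept.
$allText = $li->nodeValue;

// To keep only the text sitting directly inside the <li>,
// walk the child nodes and skip anything that is not a text node.
$ownText = '';
foreach ($li->childNodes as $child) {
    if ($child->nodeType === XML_TEXT_NODE) {
        $ownText .= $child->nodeValue;
    }
}
$ownText = trim($ownText);

echo $allText, PHP_EOL;
echo $ownText, PHP_EOL;
```

Which of the two you want depends on whether the nested markup is part of the data or just decoration.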
I’m going to put my 2c in here and agree with Activeseven.
If you’re going to parse some HTML, XML, whateverML (I made that last one up all by myself ;)) then using regular expressions is not the best way to do it.
Any kind of DOM parsing should ideally be done by (you guessed it) a DOM Parser.
While it may add a little extra complexity to your code, it will certainly be a better solution as you’ll have much more control over how everything works, including any markup that may occur inside of product descriptions.
I expanded a little on Activeseven’s example to show that it’s not too complex:
$yourHTML = <<<TEXT
<html>
<head><title>Test HTML</title></head>
<body>
test text
<ul>
<li>stock1</li>
<li>stock2 <p>Some para</p></li>
<li>stock3 <p><strong>has</strong> <em>nested</em> <span style="color:red"><em>nodes</em></span></p></li>
</ul>
</body>
</html>
TEXT;
// Create a new DOM document and load the HTML
$DomDoc = new DOMDocument();
$DomDoc->loadHTML( $yourHTML );
// Create a DOM Xpath so we can query the data
$XPath = new DOMXPath( $DomDoc );
// Query the string and grab all li tags.
// query() returns false on an invalid expression and an empty list when nothing matches, so check $stocks->length.
$stocks = $XPath->query( "//li" );
// Perform some logic on each piece of data returned
foreach( $stocks as $stock ) {
    $inner_html = get_inner_html($stock);
    printf("<pre>%s</pre>\n\n", print_r($inner_html, true));
}
// iterate through a node to get all child nodes
function get_inner_html( $node ) {
    $innerHTML = '';
    $children = $node->childNodes;
    foreach ($children as $child) {
        $innerHTML .= $child->ownerDocument->saveXML( $child );
    }
    return $innerHTML;
}
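// $tagname is the tag you're after, e.g. 'li'
$tagname = 'li';
// double quotes so $tagname is interpolated; ~ as delimiter avoids escaping the slash
preg_match_all("~<$tagname>(.*?)</$tagname>~s", $yourHTML, $matches);
foreach($matches[1] as $matchtext) {
    echo $matchtext;
}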
… personally, I find that a lot easier than invoking an entirely separate class of object, storing the data at least 2 times, etc… but whatever works for you.
I agree that the regex looks a lot easier on the surface, and if you’re dealing with something that’s always going to conform to a specific format then that’s probably fine. But as soon as one of the li tags has an attribute on it, for example, the pattern stops matching that item.
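Here's a quick side-by-side of that failure case, as a sketch only. The class="hot" attribute and the variable names are invented for the demo; the regex is the simple literal-tag style from above.

```php
<?php
// Hypothetical markup: the second <li> carries a class attribute.
$html = '<ul><li>stock1</li><li class="hot">stock2</li></ul>';

// The literal-tag regex only matches the attribute-free item.
preg_match_all('~<li>(.*?)</li>~s', $html, $matches);
$fromRegex = $matches[1];

// XPath matches on the element name, so attributes make no difference.
$doc = new DOMDocument();
$doc->loadHTML($html);
$xpath = new DOMXPath($doc);

$fromDom = [];
foreach ($xpath->query('//li') as $li) {
    $fromDom[] = $li->nodeValue;
}

print_r($fromRegex); // only stock1
print_r($fromDom);   // stock1 and stock2
```

You could loosen the pattern to allow attributes, but every new quirk in the markup means another tweak to the regex, which is the point being made here.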