Help with the syntax of a command

Hello all
I have a description string $product_desc which is like this

test text
<li>stock</li>
<li>stock2</li>

I want to use it in a PHP page.
I have these commands:

function getTextBetweenTags($string, $tagname, $i)
{
    echo $string.' - '.$tagname.'<br>';
    $matches = array();
    $pattern = "/<$tagname>(.*?)<\/$tagname>/";
    preg_match($pattern, $string, $matches);
    //print_r($matches);
    return $matches;
}
$content = getTextBetweenTags($product_desc, 'li', 1);
foreach ($content as $item)
{
    echo $item.'<br>';
}
?>

but it doesn't seem to work correctly. It just returns

test text

stock
stock2
- li
stock

stock

What am I doing wrong?

EDIT: No it’s not. Misread the output. Please hold.

Thought I'd edited that post… anyway

Use preg_match_all instead of preg_match, see where that gets ya (hint: preg_match_all returns a multidimensional array in its matches. You want index 1 of the first dimension.)
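
To make that hint concrete, here is a minimal sketch of the difference between the two functions. It assumes the sample description from the first post; the `$product_desc` value below is a stand-in, and the variable names `$first` and `$all` are only for illustration.

```php
<?php
// Sample input copied from the first post; a stand-in for the real value.
$product_desc = "test text\n<li>stock</li>\n<li>stock2</li>";
$pattern = "/<li>(.*?)<\\/li>/";

// preg_match() stops after the first match.
// $first[0] is the full match, $first[1] is the captured text.
preg_match($pattern, $product_desc, $first);

// preg_match_all() collects every match, grouped by capture index.
// $all[0] holds the full matches, $all[1] holds the captured text.
preg_match_all($pattern, $product_desc, $all);

// Walk the captured text only.
foreach ($all[1] as $text) {
    echo $text, "\n";
}
```

With this input the loop prints stock and stock2, each on its own line.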

Thanks for your reply

If I use preg_match_all it gives me
Array
Array

Does this mean that it reads the 2 values that I want?
How can I see the text and not the array?

print_r on $content, and you should see what it’s doing. (look at it in Source view)

Ok the code so far is
<?php
function getTextBetweenTags($string, $tagname)
{
    $matches = array();
    $pattern = "/<$tagname>(.*?)<\\/$tagname>/";
    preg_match_all($pattern, $string, $matches);
    //print_r($matches);
    return $matches;
}

$content = getTextBetweenTags($product_desc, 'li');
foreach ($content as $item)
{
    print_r($content);
    echo $item.'<br>';
}
?>

and the result is

Array ( [0] => Array ( [0] =>
stock
[1] =>
stock 2
) [1] => Array ( [0] => stock [1] => stock 2 ) ) Array ( [0] => Array ( [0] =>
stock
[1] =>
stock 2
) [1] => Array ( [0] => stock [1] => stock 2 ) )

Is this normal? How can I get only the stock and stock 2 text?

I ended up with this code

<?php

if (!function_exists('getTextBetweenTags'))
{
    function getTextBetweenTags($string, $tagname)
    {
        $matches = array();
        $pattern = "/<$tagname>(.*?)<\\/$tagname>/";
        preg_match_all($pattern, $string, $matches);
        //print_r($matches);
        return $matches;
    }
}
$content = getTextBetweenTags($product_desc, 'li');
$content = implode($content);
print_r($content);
?>

but I get an error “Notice: Array to string conversion”
how can I fix this?

If I may suggest an alternate method:


$yourHTML = <<<TEXT
<html>
<head><title>Test HTML</title></head>
<body>
test text
<ul><li>stock1</li><li>stock2</li></ul>
</body>
</html>
TEXT;

// Create a new DOM document and load the HTML
$DomDoc = new DOMDocument();
$DomDoc->loadHTML( $yourHTML );

// Create a DOM Xpath so we can query the data
$XPath =  new DOMXPath( $DomDoc );

// Query the string and grab all li tags.
// You can perform a null check on $stocks to ensure you
// got results.
$stocks = $XPath->query( "//li" );

// Perform some logic on each piece of data returned
foreach( $stocks as $stock ) {
	// access the name of the nodes, in your case "li"
	echo $stock->nodeName;
	
	// access the value of the node, in your case each "stock"
	echo $stock->nodeValue;
}
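
On the "check that you got results" comment: as far as I know, DOMXPath::query() returns false for an invalid expression and an empty list when nothing matches, so the check is on false and length rather than null. A rough sketch follows. The input fragment (with a deliberately stray closing tag) and the `$values` array are made up for illustration, and collecting libxml warnings this way is my own addition, not part of the code above.

```php
<?php
// Hypothetical fragment with a stray closing tag, to show error handling.
$html = '<ul><li>stock1</li><li>stock2</li></ul></div>';

// Collect libxml parse warnings instead of printing them to the page.
$previous = libxml_use_internal_errors(true);

$doc = new DOMDocument();
$doc->loadHTML($html);

// Discard the collected warnings and restore the previous setting.
libxml_clear_errors();
libxml_use_internal_errors($previous);

$xpath  = new DOMXPath($doc);
$stocks = $xpath->query('//li');

// Only loop when the query succeeded and actually found nodes.
$values = [];
if ($stocks !== false && $stocks->length > 0) {
    foreach ($stocks as $stock) {
        $values[] = $stock->nodeValue;
    }
}

print_r($values);
```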

Which works fine, seven, until someone sticks some HTML in the middle of the LI. Let’s stick to stuff that works in all instances, shall we? :wink:

ripper:
Your print_r should have actually been:
Array ( [0] => Array ( [0] =>
<li>stock</li>
[1] =>
<li>stock 2</li>
) [1] => Array ( [0] => stock [1] => stock 2 ) )

(You didn't view it in Source view, did you?)

$content[1] holds the two values you want.
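
A minimal sketch of that, which also accounts for the "Array to string conversion" notice from the earlier implode attempt. The `$content` array below is typed out by hand to mirror the print_r output above; the ", " separator is just an example.

```php
<?php
// Shape returned by preg_match_all() for the sample description.
$content = [
    ['<li>stock</li>', '<li>stock 2</li>'], // index 0: full matches
    ['stock', 'stock 2'],                   // index 1: captured text
];

// implode($content) would try to turn each inner array into a string,
// which is what triggers "Array to string conversion".
// Joining the inner array of captures avoids that.
$joined = implode(', ', $content[1]);

echo $joined; // stock, stock 2
```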

LOL well, I'm unsure of what the application (or intent) is here with this script, or even the origin of the HTML. That said, if tags could turn up in nodeValue (wanted? expected?), that logic could easily be handled within the foreach: strip it or parse it. Personally I prefer XPath when working with markup like HTML or XML. Each to his own I suppose :slight_smile:

I’m going to put my 2c. of opinion in here and agree with Activeseven.

If you’re going to parse some HTML, XML, whateverML (I made that last one up all by myself ;)) then using regular expressions is not the best way to do it.

Any kind of DOM parsing should ideally be done by (you guessed it) a DOM Parser.

While it may add a little extra complexity to your code, it will certainly be a better solution as you’ll have much more control over how everything works, including any markup that may occur inside of product descriptions.

I expanded a little on Activeseven’s example to show that it’s not too complex :wink:

$yourHTML = <<<TEXT
<html>
<head><title>Test HTML</title></head>
<body>
test text
<ul>
    <li>stock1</li>
    <li>stock2 <p>Some para</p></li>
    <li>stock3 <p><strong>has</strong> <em>nested</em> <span style="color:red"><em>nodes</em></span></p></li>
</ul>
</body>
</html>
TEXT;

 // Create a new DOM document and load the HTML
$DomDoc = new DOMDocument();
$DomDoc->loadHTML( $yourHTML );

 // Create a DOM Xpath so we can query the data
$XPath =  new DOMXPath( $DomDoc );

// Query the string and grab all li tags.
// You can perform a null check on $stocks to ensure you got results.
$stocks = $XPath->query( "//li" );

 // Perform some logic on each piece of data returned
foreach( $stocks as $stock ) {
    $inner_html = get_inner_html($stock);
    printf("<pre>%s</pre>\n\n", print_r($inner_html, 1));
}  

 //iterate through a node to get all child nodes
function get_inner_html( $node ) {
    $innerHTML= '';
    $children = $node->childNodes;
    foreach ($children as $child) {
        $innerHTML .= $child->ownerDocument->saveXML( $child );
    }

    return $innerHTML;
} 

As opposed to


preg_match_all("/<$tagname>(.*?)<\\/$tagname>/",$yourHTML,$matches);
foreach($matches[1] AS $matchtext) {
  echo $matchtext;
}

… personally, I find that a lot easier than invoking an entirely separate class of object, storing the data at least 2 times, etc… but whatever works for you.

I agree that the regex on the surface is a lot easier, and if you’re dealing with something that’s always going to conform to a specific format then that’s probably fine, but as soon as one of the <li>'s has an attribute on it for example, it will fall over.
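
To illustrate the attribute point, here is a rough sketch. The input string and the looser pattern are made up for illustration and are not from anyone's code in this thread; the looser pattern would still not cope with nested lists.

```php
<?php
// Hypothetical input: the second item carries a class attribute.
$html = '<ul><li>stock1</li><li class="new">stock2</li></ul>';

// The exact-tag pattern from the thread only matches a bare <li>.
preg_match_all("/<li>(.*?)<\\/li>/", $html, $strict);

// A looser pattern that tolerates attributes on the opening tag.
// preg_quote() guards against regex metacharacters in the tag name.
$tag = preg_quote('li', '/');
preg_match_all("/<{$tag}\\b[^>]*>(.*?)<\\/{$tag}>/s", $html, $loose);

// The DOM approach does not care about attributes at all.
$doc = new DOMDocument();
$doc->loadHTML($html);
$xpath = new DOMXPath($doc);

$found = [];
foreach ($xpath->query('//li') as $li) {
    $found[] = $li->nodeValue;
}

print_r($strict[1]); // only stock1
print_r($loose[1]);  // stock1 and stock2
print_r($found);     // stock1 and stock2
```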

if (!function_exists('getTextBetweenTags'))
{
    function getTextBetweenTags($string, $tagname)
    {
        $matches = array();
        $pattern = "/<$tagname>(.*?)<\\/$tagname>/";
        preg_match_all($pattern, $string, $matches);
        //print_r($matches);
        return $matches;
    }
}
$content = getTextBetweenTags($product_desc, 'li');
//$content1 = implode($content);
print_r($content[1]);

The string I have is

test text
<li>stock</li>
<li>stock2</li>

The result in source code is

Array
(
[0] => stock
[1] => stock2
)

It’s the string I want but I want only the text stock, stock 2 and not the Array ( … )

Edit: I am using Joomla, and the value I want is the $product_desc from VirtueMart.

so… foreach it to walk through the array, or implode and echo it… print_r was to show you the structure of the output from match_all.

Finally it’s over

Thank you all for your help.

the finished code is

<?php

if (!function_exists('getTextBetweenTags'))
{
    function getTextBetweenTags($string, $tagname)
    {
        $matches = array();
        $pattern = "/<$tagname>(.*?)<\\/$tagname>/";
        preg_match_all($pattern, $string, $matches);
        //print_r($matches);
        return $matches;
    }
}

$content = getTextBetweenTags($product_desc, 'li');

//print_r($content[1]);
foreach ($content[1] as $matchtext)
{
    echo " $matchtext <br />\n";
}

?>

Thanks again :tup: