preg_match question

How can I capture a certain block from a page's source? In this case it is a single line, for example:

<a href="/documents/ManWhoStareAtGoats-626511" class="img" title="This is awesome movie-626511">

Now, this piece of markup will be output dynamically, with different content between the tags

<a href="

and

class="img"

so obviously I need code to capture anything between <a href=" and class="img" title="This is awesome movie-626511">.
I have a function for fetching the text between tags, but these tags are more advanced.


 function getTextBetweenTags($string, $tagname){
  $pattern = "/<$tagname>(.*?)<\\/$tagname>/";
  preg_match($pattern, $string, $matches);
  return $matches[1];
 }

// [^"]+ stops at the closing quote, so the match cannot run on into a later link.
if (preg_match('/<a href="([^"]+)" class="img"/', $string, $matches)) {
  echo $matches[1];
}

Try to understand what this is doing, as this is not really more “advanced” than what you had before.
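
To make that concrete, here is a small sketch that runs a pattern of this kind against the sample line from the question. The hard-coded `$html` string and the variable names are only for illustration. The capture group here is `[^"]+`, which stops at the first closing quote, and the return value of `preg_match` is checked before `$matches` is read.

```php
<?php
// Sample line taken from the question; in practice $html would be the fetched page.
$html = '<a href="/documents/ManWhoStareAtGoats-626511" class="img" title="This is awesome movie-626511">';

// ( ... ) is the capture group; [^"]+ means "one or more characters that are not a double quote".
$found = preg_match('/<a href="([^"]+)" class="img"/', $html, $matches);

// preg_match returns 1 on a match and 0 on no match, so check before reading $matches[1].
$href = ($found === 1) ? $matches[1] : null;

echo $href, "\n"; // /documents/ManWhoStareAtGoats-626511
```

Everything outside the parentheses has to match literally, so only links whose `href` is immediately followed by `class="img"` are picked up.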

I’m so gonna get corrected on this, but I’ll give it a whack…

XML parsing is always an option :wink:

Your code gives you the matches… foreach over the matches and strip the tags out? (preg_match each on "[\"|'][\w]*[\"|']" to get the values, preg_match on "\s[\w]+=" to get the keys, and then use substr/trim to slice the excess characters off?)

Edit: Oh… you were being specific… I was trying to generalize the function.

Thanks for the fast reply, it worked of course.

Now that’s interesting.
Can you give an example based on my code above?
Thanks in advance.

I should point out I have NO idea if this is going to work or not, and I’m probably making myself look like a fool :wink:


function getTextInsideTags($string, $tagname){
  $attributes = array();
  // Grab everything between "<tagname" and the closing ">" of the first matching tag.
  $pattern = "/<$tagname\\b(.*?)>/s";
  if (preg_match($pattern, $string, $matches)) {
    // Collect every name="value" or name='value' pair inside that tag.
    $pairs = "/([\\w-]+)\\s*=\\s*([\"'])(.*?)\\2/s";
    preg_match_all($pairs, $matches[1], $found, PREG_SET_ORDER);
    foreach ($found AS $pair) {
      // $pair[1] is the attribute name, $pair[3] the value without its quotes.
      $attributes[$pair[1]] = $pair[3];
    }
  }
  return $attributes;
}

The drawback is that you’re then calling preg_match several times and the process just becomes that much slower.

If you’re going to parse XML (when XML parsing is necessary) you should use PHP’s native XML parsing methods rather than make one of your own, which is more likely to break.
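
As a minimal sketch of what "native" can look like, the SimpleXML extension reads a well-formed XML string directly. The `$xml` document and its element and attribute names are invented for this example; only `simplexml_load_string` and the attribute access are PHP's own.

```php
<?php
// Hypothetical XML snippet, made up for illustration.
$xml = '<documents><doc id="626511" title="This is awesome movie"/></documents>';

$root = simplexml_load_string($xml);
if ($root === false) {
    echo "Could not parse XML.\n";
} else {
    foreach ($root->doc as $doc) {
        // Attributes are read with array syntax; cast to string to get plain values.
        $id = (string) $doc['id'];
        $title = (string) $doc['title'];
        echo $id, ': ', $title, "\n";
    }
}
```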

If you need to parse HTML, don’t use regular expressions. Since PHP 5, PHP has shipped with a DOM library that can parse HTML documents, which makes the markup much easier to work with.

<?php

$doc = new DOMDocument();

/*
 * You can also use a URL if the fopen wrappers are enabled.
 * You may also want to use the @ error suppressor to ignore
 * markup warnings/errors.
 */
if (@$doc->loadHTMLFile('foobar.html')) {
  $xpath = new DOMXPath($doc);
  foreach ($xpath->query('//a[@class="img"]') as $node) {
    if ($node->hasAttribute('href')) {
      $href = $node->getAttribute('href');
      // ...
    }
  }
}
else {
  echo "<p>Unable to open document.</p>\n";
}

Of course, you can modify the code to retrieve other attributes. You could also import to SimpleXML.
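
For instance, a sketch along these lines collects both `href` and `title`, then hands the same tree to SimpleXML. It assumes the markup is already in a string (the `$html` value is the sample line from the question wrapped in a minimal page), and the `$links` array layout is just one possible choice.

```php
<?php
// Sample markup from the question, loaded from a string instead of a file.
$html = '<html><body><a href="/documents/ManWhoStareAtGoats-626511" class="img" '
      . 'title="This is awesome movie-626511">x</a></body></html>';

$doc = new DOMDocument();
$links = array();
$firstHref = null;

if (@$doc->loadHTML($html)) {
    $xpath = new DOMXPath($doc);
    foreach ($xpath->query('//a[@class="img"]') as $node) {
        // getAttribute returns an empty string when the attribute is missing.
        $links[] = array(
            'href'  => $node->getAttribute('href'),
            'title' => $node->getAttribute('title'),
        );
    }

    // The same tree can be passed to SimpleXML if that API is preferred.
    $sx = simplexml_import_dom($doc);
    $firstHref = (string) $sx->body->a['href'];
}

print_r($links);
```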