preg_match question

jovalex · April 9, 2010, 10:32am

How can I capture a certain block from a page's source, in this case a single line, for example:

This piece of markup is output dynamically, with different content each time between <a href=" and class="img". Obviously I need code that captures anything between <a href=" and class="img" title="This is awesome movie-626511">.
I have a function that fetches the text between tags, but these tags are somewhat more advanced.


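 function getTextBetweenTags($string, $tagname){
  // preg_quote() keeps special characters in $tagname from altering the pattern
  $tag = preg_quote($tagname, '/');
  $pattern = "/<$tag>(.*?)<\\/$tag>/";
  // Return null instead of reading $matches[1] when nothing matched
  return preg_match($pattern, $string, $matches) ? $matches[1] : null;
 }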

Raffles · April 9, 2010, 11:08am

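// [^"]+ stops at the closing quote, so the match cannot run on into a later link on the same line
if (preg_match('/<a href="([^"]+)" class="img"/', $string, $matches)) {
  echo $matches[1];
}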

Try to understand what this is doing, as this is not really more “advanced” than what you had before.
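
To make the behaviour concrete, here is a minimal sketch run against a sample line in the shape described in the question. The href values in $string are invented for illustration; only class="img" and the title text come from the original post. It compares a greedy .+ capture with a [^"]+ capture, which stops at the first closing quote, and shows preg_match_all for the case where several links are wanted.

```php
<?php
// Hypothetical input: two links on one line; the href values are made up.
$string = '<a href="/movies/626511" class="img" title="This is awesome movie-626511">'
        . '<a href="/movies/2" class="img" title="Another">';

// Greedy .+ runs on to the LAST '" class="img"' found on the line.
preg_match('/<a href="(.+)" class="img"/', $string, $greedy);

// [^"]+ cannot cross a double quote, so it ends with the first href value.
preg_match('/<a href="([^"]+)" class="img"/', $string, $strict);

// preg_match_all collects every matching link instead of only the first.
preg_match_all('/<a href="([^"]+)" class="img"/', $string, $all);
$hrefs = $all[1];
```

With a single link per line both patterns give the same result; the difference only shows up when the line holds more than one.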

StarLion · April 9, 2010, 11:18am

I’m so gonna get corrected on this but I’ll give it a whack…

XML parsing is always an option

Your code gives you the matches… foreach the matches and strip the tags out? (Preg match each on "[\"'][\w]*[\"']" to get the values, preg match on "\s[\w]+=" to get the keys, and then use substr/trim to slice the excess characters off?)

Edit: Oh… you were being specific… I was trying to generalize the function.

jovalex · April 9, 2010, 11:18am

Thanks for the fast reply, it worked of course.

jovalex · April 9, 2010, 11:21am

Now that’s interesting.
Can you give an example based on the code I posted above?
Thanks in advance.

StarLion · April 9, 2010, 11:33am

I should point out I have NO idea if this is going to work or not, and I’m probably making myself look like a fool.


function getTextInsideTags($string, $tagname){
  $attributes = array();
  // Capture everything between "<tagname" and ">" for the first matching tag
  $pattern = '/<' . preg_quote($tagname, '/') . '\b([^>]*)>/';
  if (preg_match($pattern, $string, $matches)) {
    // Pair each attribute name with its quoted value in a single pass
    $attrPattern = '/\s([\w-]+)\s*=\s*(["\'])(.*?)\2/';
    preg_match_all($attrPattern, $matches[1], $pairs, PREG_SET_ORDER);
    foreach ($pairs as $pair) {
      // $pair[1] is the attribute name, $pair[3] the value without its quotes
      $attributes[$pair[1]] = $pair[3];
    }
  }
  return $attributes;
 }

Raffles · April 9, 2010, 12:01pm

The drawback is that you’re then calling preg_match several times and the process just becomes that much slower.

If you’re going to parse XML (when XML parsing is necessary) you should use PHP’s native XML parsing methods rather than make one of your own, which is more likely to break.
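
As a rough sketch of what the native route can look like, the snippet below uses SimpleXML, one of PHP's built-in XML extensions. It assumes the input is well-formed XML, which real-world HTML often is not; the <links> wrapper and the href values are invented for illustration.

```php
<?php
// Hypothetical, well-formed XML fragment; element and href values are made up.
$xml = '<links>'
     . '<a href="/movies/626511" class="img" title="This is awesome movie-626511">poster</a>'
     . '<a href="/about" class="text">about</a>'
     . '</links>';

$root = simplexml_load_string($xml);
$found = array();
if ($root !== false) {
    // XPath selects only the anchors whose class attribute is exactly "img".
    foreach ($root->xpath('//a[@class="img"]') as $a) {
        // Attributes come back as SimpleXMLElement objects, so cast to string.
        $found[] = array(
            'href'  => (string) $a['href'],
            'title' => (string) $a['title'],
        );
    }
}
```

If the markup is not well-formed, simplexml_load_string() returns false and $found stays empty.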

dyer85 · April 10, 2010, 10:28am

If you need to parse HTML, don’t use regular expressions. Since PHP 5, PHP has provided a DOM library that can parse HTML documents, which makes handling markup much easier.

<?php

$doc = new DOMDocument();

/*
 * You can also use a URL if the fopen wrappers are enabled.
 * You may also want to use the @ error suppressor to ignore
 * markup warnings/errors.
 */
if (@$doc->loadHTMLFile('foobar.html')) {
  $xpath = new DOMXPath($doc);
  foreach ($xpath->query('//a[@class="img"]') as $node) {
    if ($node->hasAttribute('href')) {
      $href = $node->getAttribute('href');
      // ...
    }
  }
}
else {
  echo "<p>Unable to open document.</p>\n";
}

Of course, you can modify the code to retrieve other attributes. You could also import to SimpleXML.
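
A minimal sketch of both ideas, assuming the markup is already in a string rather than a file. The $html sample and its href value are invented; only class="img" and the title text come from the question. It reads a second attribute (title) and then hands the same document to SimpleXML with simplexml_import_dom().

```php
<?php
// Hypothetical markup; the href value is made up for illustration.
$html = '<p><a href="/movies/626511" class="img" title="This is awesome movie-626511">x</a></p>';

$doc = new DOMDocument();
// loadHTML() parses a string instead of a file; @ hides markup warnings.
@$doc->loadHTML($html);

$xpath = new DOMXPath($doc);
$links = array();
foreach ($xpath->query('//a[@class="img"]') as $node) {
    // getAttribute() returns an empty string when the attribute is missing.
    $links[] = array(
        'href'  => $node->getAttribute('href'),
        'title' => $node->getAttribute('title'),
    );
}

// The same parsed document can be passed on to SimpleXML.
$sx = simplexml_import_dom($doc);
$first = $sx->xpath('//a[@class="img"]');
$firstHref = (string) $first[0]['href'];
```

Whether to stay with DOM or switch to SimpleXML is mostly a matter of taste here; both see the same parsed tree.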
