PREG (regex) refresher needed!

Regular expressions are not my specialty. I am trying to solve a simple problem, but I'm drawing a blank.

If the string is:
“The fox jumped <strong>over the road</strong> and landed on the other side.”

Using PREG (not ereg), how would I extract everything between the “strong” tags, including the strong tags themselves? What I want to end up with is:

<strong>over the road</strong>

I will always know the exact START and END of the string (in this case the strong tags). I just need to grab both of those, and everything in between.

Something like this would work:


Or, if you want to use a capture group to get only the text between the tags:


Note that \w only matches alphanumeric characters (plus the underscore) and \s only matches whitespace. If you want punctuation marks in addition to that, you’ll have to add them to the character class.
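To make the \w and \s point concrete, here is a minimal sketch. The pattern is my own guess at the kind of expression being described (a `[\w\s]+` character class between the tags), and the variable names are purely illustrative.

```php
<?php
$str = "The fox jumped <strong>over the road</strong> and landed on the other side.";

// Whole match, tags included: [\w\s]+ allows only word characters and whitespace.
preg_match('/<strong>[\w\s]+<\/strong>/', $str, $whole);
echo $whole[0], "\n"; // <strong>over the road</strong>

// Same pattern with a capture group: index 1 holds only the inner text.
preg_match('/<strong>([\w\s]+)<\/strong>/', $str, $inner);
echo $inner[1], "\n"; // over the road

// Punctuation is covered by neither \w nor \s, so this one does not match.
$punct = "<strong>over, the road!</strong>";
$plain = preg_match('/<strong>[\w\s]+<\/strong>/', $punct);

// Adding the punctuation you expect to the class makes it match again.
$extended = preg_match('/<strong>[\w\s,!]+<\/strong>/', $punct);

var_dump($plain, $extended); // int(0), int(1)
```

The obvious downside is that you have to keep extending the class for every character that might show up.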

I usually use this AIR app to quickly test regular expressions: (there’s a link to the desktop version on the bottom right)



Everything except <, seeing as that’s where </strong> starts.
That way you don’t have to specify everything you do want to match (like characters with accents, and so on).

That’s probably a better approach.

But…what if there’s a link tag between the strong tags?

I guess you could use a negative lookahead to look for </strong>, or go for the root of all evil, the ANYTHING atom: (.*) (just make sure to make it lazy, though: (.*?)).
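A rough sketch of both ideas, in case it helps. The test string, the `~` delimiter and the `s` modifier (so the dot also matches newlines) are my own choices for illustration, not something from the posts above.

```php
<?php
$str = 'The fox <strong>jumped <a href="#">over</a> the road</strong> and <strong>landed</strong>.';

// Option 1: negative lookahead. Consume one character at a time,
// but only while "</strong>" does not start at the current position.
preg_match('~<strong>((?:(?!</strong>).)*)</strong>~s', $str, $lookahead);

// Option 2: lazy "anything". .*? expands only as far as needed
// to reach the first "</strong>".
preg_match('~<strong>(.*?)</strong>~s', $str, $lazy);

// For comparison, the greedy version runs on to the LAST "</strong>".
preg_match('~<strong>(.*)</strong>~s', $str, $greedy);

echo $lookahead[1], "\n"; // jumped <a href="#">over</a> the road
echo $lazy[1], "\n";      // jumped <a href="#">over</a> the road
echo $greedy[1], "\n";    // jumped <a href="#">over</a> the road</strong> and <strong>landed
```

Both options give the same result here; the greedy match shows why the lazy quantifier matters as soon as there is more than one pair of tags in the string.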

Thanks everyone. I’ll work with a few of these thoughts and come up with something that works for this project.

I’m sorry, but my regex skills are very limited. Still having problems, mainly because I don’t fully understand how preg works.

With the suggestions above, it is returning the boolean response (0 or 1 depending on what I toss in the test string). However, I want the actual “string”.

Using my initial example in this post, how do I get it to extract the string so I can store it in a variable? (Either “<strong>over the road</strong>” or even “over the road” would be fine, if that is easier.)

If someone could provide me with a full example I would really appreciate it! Thanks.

Run the following code to see how preg_match works:

$matches = array();
$str = "The fox jumped <strong>over the road</strong> and landed on the other side.";
preg_match('/<strong>([^<]+)</strong>/', $str, $matches);

Thanks for the example - it certainly gives me a better idea of how it works.

One problem I found was that this code threw a warning on my system when I tested it:
“Warning: preg_match() [function.preg-match]: Unknown modifier ‘t’ in …”.

The fix was to escape the / in </strong>, like this: <\/strong>
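For anyone hitting the same warning: with / as the delimiter, the slash in </strong> ends the pattern early, and the leftover "strong>/" is read as modifiers. "s" happens to be a valid modifier and "t" is not, hence "Unknown modifier ‘t’". Below is a sketch of the corrected call; the second variant with # as the delimiter and the `$alt` variable are my own additions for comparison.

```php
<?php
$matches = array();
$str = "The fox jumped <strong>over the road</strong> and landed on the other side.";

// Escaping the slash keeps it from being read as the closing delimiter.
preg_match('/<strong>([^<]+)<\/strong>/', $str, $matches);

// $matches[0] is the whole match, $matches[1] is the first capture group.
echo $matches[0], "\n"; // <strong>over the road</strong>
echo $matches[1], "\n"; // over the road

// Alternative: pick a delimiter that does not occur in the pattern,
// so nothing needs escaping.
preg_match('#<strong>([^<]+)</strong>#', $str, $alt);
echo $alt[1], "\n"; // over the road
```

Either element can be assigned to a variable, depending on whether you want the tags included.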

Somehow I always manage to forget escaping html tags like that when regexing them :shifty:

Thanks for pointing it out, and glad you’ve got a better idea on how it works :slight_smile:

Actually, what you’re looking for is the “match all non-greedy” construct, dot-star-question mark:

$a = "foo and <strong>bar</strong> and <strong>baz</strong>!";

preg_match_all('~<strong>(.*?)</strong>~', $a, $m);
print_r($m[1]); // prints bar, baz

The ([^<]+) approach won’t work for strings like "<strong>foo <i>bar</i></strong>".
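A quick side-by-side sketch of that difference, using the nested example string. The variable names are just for illustration.

```php
<?php
$nested = "<strong>foo <i>bar</i></strong>";

// [^<]+ stops at the "<" of <i>, and "</strong>" does not follow there,
// so the overall match fails.
$negated = preg_match('~<strong>([^<]+)</strong>~', $nested, $m1);

// .*? keeps expanding past the inner tags until "</strong>" is found.
$lazy = preg_match('~<strong>(.*?)</strong>~', $nested, $m2);

var_dump($negated); // int(0)
var_dump($lazy);    // int(1)
echo $m2[1], "\n";  // foo <i>bar</i>
```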