Simple (I think) regular expression help

Hi Guys,

My regular expression knowledge is pretty much limited to a collection of snippets that I cut & paste. I can do simple stuff but get confused quickly when I try to read and understand more complex regular expressions.

That said, I’m trying to match something with preg_match_all() that I would assume is pretty simple to do. I want to match either a single digit (0-9) OR a blank space.

So for example say my data is:

$content = '<strong>Bedrooms: </strong>2</td>
<strong>Bedrooms: </strong>4</td>
<strong>Bedrooms: </strong> </td>
<strong>Bedrooms: </strong>1</td>';
preg_match_all('/<strong>Bedrooms: <\\/strong>([0-9])<\\/td>/ims', $content, $beds);

Returns:
$beds = array(2, 4, 1);

But I want to match the blank space as well. I need this to return:
$beds = array(2, 4, ' ', 1);

I can’t seem to figure it out. I’ve tried all sorts of different variations but can’t get the regular expression to work. For instance:
([0-9]?\s)
([0-9]|\s)
([0-9|\s]) …

Any help would be greatly appreciated!

I think:

preg_match_all('/<strong>Bedrooms: <\\/strong>([0-9 ]*)<\\/td>/ims', $content, $beds);
The * is needed only if you also want to match:
<strong>Bedrooms: </strong></td>

(without the space before </td>)
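To check this against the sample data from the first post, here is a minimal sketch. The extra fifth row and the variable names $one and $many are my own additions for illustration, and I left off the m and s flags since this pattern has no anchors or dots for them to affect.

```php
<?php
// Sample data from the original post, plus one hypothetical row
// that has no space at all before </td>.
$content = '<strong>Bedrooms: </strong>2</td>
<strong>Bedrooms: </strong>4</td>
<strong>Bedrooms: </strong> </td>
<strong>Bedrooms: </strong>1</td>
<strong>Bedrooms: </strong></td>';

// [0-9 ] matches exactly one character: a digit or a space.
preg_match_all('/<strong>Bedrooms: <\/strong>([0-9 ])<\/td>/i', $content, $one);

// [0-9 ]* matches any run of digits and spaces, including an empty run.
preg_match_all('/<strong>Bedrooms: <\/strong>([0-9 ]*)<\/td>/i', $content, $many);

// Index 0 holds the full matches; index 1 holds the first capture group.
var_export($one[1]);   // '2', '4', ' ', '1'
echo "\n";
var_export($many[1]);  // '2', '4', ' ', '1', ''
```

Note that the captured values live in $beds[1] (here $one[1] and $many[1]), and that they come back as strings rather than integers.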

Muchos gracious! I thought for sure I had tried that already… thanks man!

Just a quick little followup question… what is the difference between (.+?) and (.*)? I use them both frequently but I’m not exactly sure what the difference is.

Regex isn’t my strong suit either, but as far as I know:

    * is zero or more occurrences of the preceding character
    + is 1 or more occurrences of the preceding character
    ? is 0 or 1 occurrence of the preceding character
    . is any character

So .+? would probably mean 0 or 1 occurrences of 1 or more occurrences of any character. To me it seems like exactly the same thing as .* (zero or more occurrences of any character), just written differently.

But I’m not sure.

I use the cheat sheets from this site extensively.

Once you have something to look at, regex isn’t so hard.

A question mark means that whatever preceded it is optional.

With (.+?) the question mark means that an empty match is allowed. This means that the grouping will always be a successful match, even if it successfully captures no contents.
{EDIT}
The question mark actually means that the + is optional. This means that the grouping captures 1 or more characters.
{/EDIT}
With (.+)? the whole group is optional. This means that when the group does not match, the grouping capture fails.
With (.*) it will only be a successful match if it captures 1 (the dot) or more (the *) characters.

Compare (q?)b\1 which successfully matches b as against (q)?b\1 which fails to match b

(q?)b\1 is (successful and empty)b(successful and empty)
whereas
(q)?b\1 is (unsuccessful but optional)b(no backreference)

See the Backreferences to Failed Groups section of the Grouping and Backreferences page.
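The comparison above is easy to try in PHP. This is just a small sketch; the variable names $inside and $outside are made up for the example, and the result for the second pattern assumes PCRE's default behaviour, where a backreference to a group that did not take part in the match fails.

```php
<?php
// (q?) always takes part in the match. Here it captures an empty string,
// so \1 matches an empty string as well.
$inside = preg_match('/(q?)b\1/', 'b');

// (q)? is skipped entirely, so group 1 is unset and the \1
// backreference fails under PCRE's default settings.
$outside = preg_match('/(q)?b\1/', 'b');

var_dump($inside);   // int(1)
var_dump($outside);  // int(0)
```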

Off Topic:

Not true. In that case, an empty match is not allowed.

The question mark, when it follows a quantifier (*, +, {2,6}, etc.), flips the “greediness” of that quantifier (by default, quantifiers are “greedy”, so the ? makes them “ungreedy”). Take for example a string containing “12345”. The regular expression \d+ will match 12345, whereas the regex \d+? is happy to match only 1. The latter settles for the smallest match it can get away with, so if we instead used the string 12345A with \d+A and \d+?A, then both would match the full string.
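Here is that same 12345 example as a quick PHP sketch, in case anyone wants to run it. The variable names are mine, picked only for illustration.

```php
<?php
// Greedy: \d+ grabs as many digits as it can.
preg_match('/\d+/', '12345', $greedy);

// Lazy: \d+? stops as soon as the overall pattern is satisfied.
preg_match('/\d+?/', '12345', $lazy);

// With a trailing A, both versions must consume every digit to reach it.
preg_match('/\d+A/', '12345A', $greedyA);
preg_match('/\d+?A/', '12345A', $lazyA);

echo $greedy[0], "\n";   // 12345
echo $lazy[0], "\n";     // 1
echo $greedyA[0], "\n";  // 12345A
echo $lazyA[0], "\n";    // 12345A
```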

Thanks, the laziness instead of greediness is the best way to understand the question mark, and is covered nicely on the Repetition page.

A good read of the Brackets page also helps to understand how that relates to capture groups, where the laziness aspect affects whether the brackets are mandatory or optional, and whether they are captured or not.

I think I was wrong with (.+?) though. It may be better to say that it captures 1 or more characters, whereas (.*) captures 0 or more characters.
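To make the (.+?) versus (.*) difference concrete, here is a rough sketch using a made-up HTML string (the $html value and the variable names are assumptions for the example, not from the original question). It shows both differences at once: greedy versus lazy, and zero-or-more versus one-or-more.

```php
<?php
// Hypothetical input with two tags on one line.
$html = '<b>one</b><b>two</b>';

// Greedy (.*) runs to the LAST </b> it can find.
preg_match('/<b>(.*)<\/b>/', $html, $star);

// Lazy (.+?) stops at the FIRST </b> it can find.
preg_match('/<b>(.+?)<\/b>/', $html, $lazyPlus);

echo $star[1], "\n";      // one</b><b>two
echo $lazyPlus[1], "\n";  // one

// On an empty tag, (.*) still matches and captures an empty string,
// while (.+?) needs at least one character and so does not match.
$emptyStar = preg_match('/<b>(.*)<\/b>/', '<b></b>', $m);
$emptyPlus = preg_match('/<b>(.+?)<\/b>/', '<b></b>');

var_dump($emptyStar, $m[1], $emptyPlus);  // int(1), string(0) "", int(0)
```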

[ot]Is that comment aimed at me or the OP? I only posted to correct your mistake. (:

Edit: pmw57’s post above originally pointed out that I might be wrong. God forbid! Alas, it was edited and all is good with the world. My original reply is quoted below. :eye:

P.S. I should really post less Off-Topic and more On-Topic replies here. I just keep butting in to threads with random tangents, like this one.[/ot]

Thanks for the correction.