Regex, [A-z] might not mean what you think!

Ok, so for a long time I have been under the impression that this:

^[A-z ]+$

$string = 'Some text here';
$correct = (preg_match('/^[A-z ]+$/m', $string) == 1);

is a shorter, easier-to-read equivalent of this:

^[A-Za-z ]+$

$string = 'Some text here';
$correct = (preg_match('/^[A-Za-z ]+$/m', $string) == 1);

But it turns out that it isn’t the same at all: at least one character that is neither a letter nor a space still sets $correct to true. The one I found was “^”.
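To see the difference concretely, here is a small PHP sketch that runs both patterns from above over a few strings. Apart from 'Some text here', the sample strings are my own illustrations, not taken from the original post.

```php
<?php
// Compare the two character classes from the post on a few sample strings.
// Only 'Some text here' comes from the post; the rest are illustrative.
$patterns = [
    'A-z'    => '/^[A-z ]+$/m',
    'A-Za-z' => '/^[A-Za-z ]+$/m',
];
$samples = ['Some text here', '^', '[B]', 'under_score'];

$results = [];
foreach ($samples as $sample) {
    foreach ($patterns as $name => $pattern) {
        // preg_match returns 1 on a match, 0 on no match, false on error.
        $results[$sample][$name] = preg_match($pattern, $sample) === 1;
    }
}

foreach ($results as $sample => $byPattern) {
    printf("%-15s A-z: %-8s A-Za-z: %s\n", $sample,
        $byPattern['A-z'] ? 'match' : 'no match',
        $byPattern['A-Za-z'] ? 'match' : 'no match');
}
```

Every sample matches the [A-z ] version, but only 'Some text here' matches [A-Za-z ].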

There is probably something official written about this somewhere, but I haven’t managed to find it. I suspect I am not the only one who assumed these two expressions were equivalent, so I thought a thread might help, especially if someone in the community can shed light on what is happening here.

Regards
-RT-



Yep, that is correct. If you visit www.asciitable.com, you will see why.

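Every character whose code falls between uppercase A (65) and lowercase z (122) is included in that range. The uppercase and lowercase letters are not adjacent in the table, so the range also pulls in [ \ ] ^ _ and ` (codes 91 to 96).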
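A quick way to check this is to walk the codes from A to z and keep every character that the wide class accepts but the letters-only class rejects. This is a sketch assuming single-byte input, where ord() and chr() map directly onto ASCII codes.

```php
<?php
// List every character that [A-z] accepts but [A-Za-z] does not.
// Assumes single-byte (ASCII) characters, so ord()/chr() give the code directly.
$extra = [];
for ($code = ord('A'); $code <= ord('z'); $code++) {
    $char = chr($code);
    $inWide   = preg_match('/^[A-z]$/', $char) === 1;
    $inNarrow = preg_match('/^[A-Za-z]$/', $char) === 1;
    if ($inWide && !$inNarrow) {
        $extra[] = $char;
    }
}

echo implode(' ', $extra), "\n"; // prints: [ \ ] ^ _ `
```

If the aim was just a shorter pattern, the case-insensitive modifier gets there without the gap: /^[a-z ]+$/im accepts only ASCII letters and spaces.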

AH! THAT MAKES PERFECT SENSE NOW!

I didn’t realise that it was a range from the ASCII table, so I can write [#-&], which would allow the characters #, $, % and &.
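The same rule applies to any range in a character class: it covers every code between its two ends. This sketch scans the printable ASCII codes (32 to 126, bounds chosen here for illustration) and keeps those matched by [#-&].

```php
<?php
// A range in a character class covers every code point between its ends,
// so [#-&] spans ASCII 35 ('#') through 38 ('&').
$allowed = [];
for ($code = 32; $code <= 126; $code++) { // printable ASCII only
    $char = chr($code);
    if (preg_match('/^[#-&]$/', $char) === 1) {
        $allowed[] = $char;
    }
}

echo implode(', ', $allowed), "\n"; // prints: #, $, %, &
```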

That’s the first hugely useful thing I’ve learnt in ages, thanks cpradio!