Preg_match to confirm letters/numbers/underscores only?

Hey, I’ve had people register with ^, *, and all these other weird characters.

I want to confirm on registration that it’s just letters, numbers, underscores, or dashes.

How can I do this?


Try this:

if (ereg("^[a-zA-Z0-9_\\-]+$", $your_variable_here)) {
    # allowable characters only
} else {
    # some bad characters found
}

This only reports true/false for whether the characters are valid; it won’t replace any characters. I think the regex is okay, as I’ve used it before, apart from the underscore and dash characters.

You may need to escape the underscore as well, using \_ for example; I’ve not tested this.
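
Since the check above only reports validity, here is a minimal sketch of the other option, stripping the disallowed characters with preg_replace. The sample input is invented, and whether silently altering a username is a good idea is a separate question.

```php
<?php
$raw = 'jo^hn*_doe-1'; // invented sample input

// Remove every character that is NOT a letter, digit, underscore, or dash.
// The leading ^ inside the brackets negates the class.
$clean = preg_replace('/[^A-Za-z0-9_-]/', '', $raw);

echo $clean; // john_doe-1
```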

I think there is no longer any reason to use ereg, use preg instead.

In brackets every character is literal IIRC, so you should not try to escape the -, instead make sure it is the last character in the class (as it already is in your example). By adding the \ you are also allowing the backslash as a valid character.
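
A minimal sketch of that advice using preg_match. The function name is_valid_username is made up for illustration. The dash sits last in the class, so it is taken literally and no backslash is needed.

```php
<?php
// Hypothetical helper (name not from the thread): true when $name contains
// only letters, digits, underscores, or dashes.
function is_valid_username(string $name): bool
{
    // The dash is the last character in the class, so it is a literal dash
    // rather than a range operator.
    return preg_match('/^[a-zA-Z0-9_-]+$/', $name) === 1;
}

var_dump(is_valid_username('john_doe-42')); // bool(true)
var_dump(is_valid_username('john^doe*'));   // bool(false)
```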

I wasn’t sure, as I don’t allow the - character through my forms, since it can be used to hack MySQL 8)

Like he said!

POSIX -> PCRE (Perl) Regex

I think this would serve: /[\w-]+/
(\w includes an underscore)

While you are at it, you could also use {6,12} style syntax instead of + to control the maximum length of the string.

Small nit: “\w” includes underscores, IIRC.

wouldn’t this work?

preg_match('#^[A-Za-z0-9_-]{3,20}$#s', $string);

And then for the password

// the # inside the class is escaped because # is also the pattern delimiter
preg_match('#^[A-Za-z0-9?+*_!\#$%&-]{6,20}$#s', $string);


but would \w include any ‘special’ letters like we have here? áéðæþö ?

Heh… I think I got my edit in before you posted… that occurred to me just after submit.

preg_match('#^[A-Za-z0-9_-]{3,20}$#s', $string);

Yes, that would work! However, the s modifier after the closing # is redundant, since it only changes what . matches and there are no periods in the regex. Also, \w happens to be a drop-in replacement for the A-Za-z0-9_ part of that class.

\w is [0-9A-Za-z_] … no funky characters :)

Besides the underscore that was on my post for about 10 seconds, I did make another mistake … failed to put the ^ and $ in so that it only matched if it was the entire string.
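
To illustrate why the ^ and $ matter, a small sketch (the sample string is invented): without anchors the pattern succeeds as soon as any run of allowed characters appears anywhere in the input.

```php
<?php
$input = 'bad^name*'; // invented sample containing disallowed characters

// Unanchored: finds "bad" inside the string, so it reports a match.
var_dump(preg_match('/[\w-]+/', $input));   // int(1)

// Anchored: the whole string must consist of allowed characters.
var_dump(preg_match('/^[\w-]+$/', $input)); // int(0)
```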

OK, preg code version 2 ;)

preg_match('#^[\\w-]{3,20}$#', $string);


// the # inside the class is escaped because # is also the pattern delimiter
preg_match('#^[\\w?+*!\\#$%&-]{6,20}$#', $string);

Wouldn’t that be an ideal preg_match for those?
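
One caveat worth checking, offered as a side note rather than something raised in this thread: in PCRE, $ also matches just before a final newline, so a name followed by "\n" would still pass. The D modifier, or \z in place of $, is one way to close that gap. The sample value is invented.

```php
<?php
$withNewline = "alice\n"; // invented sample: valid name plus a trailing newline

// Plain $ tolerates one trailing newline.
var_dump(preg_match('#^[\w-]{3,20}$#', $withNewline));  // int(1)

// D makes $ match only at the very end of the subject.
var_dump(preg_match('#^[\w-]{3,20}$#D', $withNewline)); // int(0)

// \z always means "very end of the subject".
var_dump(preg_match('#^[\w-]{3,20}\z#', $withNewline)); // int(0)
```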

but about the \w

But are 0-9 and _ then thought of as ‘word’ characters? And then again, áéðíóúýþæö are not thought of as part of a word; that is, ehm, a little bit weird, I think.

Is \w maybe just an ‘alias’ for a-zA-Z0-9_, as you said?

Actually, áéðíóúýþæö and 0-9 are matched by \w.

Hmm, so it is not very ‘internet friendly’ to use \w; probably best to limit it to [a-zA-Z0-9_-], I think…

Why is it not ‘Internet friendly’? I thought it would be friendlier to accept such characters, considering the international nature of the Internet. Or am I misunderstanding you?

Granted, for 0-9, but as for the “special characters” I just tested and no match. Perhaps I’ve done something wrong, so here’s the test:

$source = 'áéðíóúýþæö';
if (preg_match('/\\w+/', $source))
echo ('match!');

No match for me.

Funny, because I’d tested this out to confirm too, with this code snippet:

$string = 'áéðíóúýþæö232';

if ( preg_match( '/^\\w+$/', $string) ) {
    echo 'Match!';
}

and it matches. Maybe it’s a locale issue.

Yes it is a locale issue.

\w\xe6\xc6\xf8\xd8\xe5\xc5\xf6\xd6\xe4\xc4 (the ISO-8859-1 byte values for æÆøØåÅöÖäÄ)

Well, you can’t have usernames here (I think) with Icelandic letters, nor can you use them in emails, and they are bad to have in file/directory names and many other things. Doesn’t it just invite trouble?

I was wondering, thanks for looking it up. That’s a good note to keep in mind for scripts that may travel.

Of course, if you are installing a script on a server with a locale that includes áéðíóúýþæ in \w, there is a good chance that you would want to accept those characters.
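
For scripts that may travel, here is a hedged sketch of two ways to avoid depending on the server locale: spell out the ASCII ranges, or opt in to Unicode explicitly with the u modifier. It assumes PHP 7+ (for the "\u{...}" escapes) and a PCRE build with Unicode property support; the sample name is invented.

```php
<?php
// "\u{...}" escapes keep the sample independent of the file's encoding.
$name = "J\u{00F3}n_\u{00DE}\u{00F3}r"; // "Jón_Þór" encoded as UTF-8

// ASCII only: explicit ranges do not change with the server locale.
var_dump(preg_match('/^[A-Za-z0-9_-]+$/', $name));   // int(0)

// Unicode-aware: u treats pattern and subject as UTF-8;
// \p{L} is any letter, \p{N} is any number.
var_dump(preg_match('/^[\p{L}\p{N}_-]+$/u', $name)); // int(1)
```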

And how is it configured?