How best to match a preg_match() to an HTML5 input pattern

I am trying to validate a password field as follows:

It must contain at least one upper-case letter, at least one lower-case letter, at least one digit and at least one of a limited selection of special characters. It must also be between 8 and 30 characters long.

I want to do this both client side for speed and good user experience (HTML5) and server side for increased security (PHP).

Both the following work, but I am not sure why they are so different or if they can be improved / rationalised / corrected.

Client side HTML5 = pattern="^(?=.*[a-z])(?=.*[A-Z])(?=.*[0-9])(?=.*[!@#$%^&*_]).{8,30}$"

Server side PHP = if(!preg_match('/^(?=.*\d)(?=.*[A-Z])(?=.*[a-z])(?=.*[!@#$%_])[0-9A-Za-z!@#$%_]{8,30}$/', $string)) {

Now I am not adept at regular expressions, but it seems to me that on the server side the last section of the regular expression, [0-9A-Za-z!@#$%_] (just before the length constraint), should be redundant. It is not required for the HTML5 pattern, yet the PHP preg_match() fails without it: there is no error message, but it does not validate correctly.

Also, the server side requires no dot . before the length constraint {8,30}, whereas the HTML5 pattern uses .{8,30}

I just need these two to be, well, … better :smiley:

Cheers guys

"^(?=.*[a-z])(?=.*[A-Z])(?=.*[0-9])(?=.*[!@#$%^&*_]).{8,30}$"

Breakdown
^
start of the string (or of a line, when the multiline flag is set)

(?=.*[a-z])
?= is a lookahead. It matches any characters .* up to and including a lower-case character [a-z].

(?=.*[A-Z])
lookahead matches any character up to and including an upper-case character [A-Z].

(?=.*[0-9])
lookahead matches any character up to and including a digit [0-9].

(?=.*[!@#$%^&*_])
lookahead matches any character up to and including any one of the following characters [!@#$%^&*_]

lookahead (lookaround)

Quoting from https://www.regular-expressions.info/lookaround.html

The difference is that lookaround actually matches characters, but then gives up the match, returning only the result: match or no match. That is why they are called “assertions”. They do not consume characters in the string, but only assert whether a match is possible or not.

So in effect the lookaheads scout ahead first just to check if these matches are in the string.
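To make the "does not consume characters" point concrete, here is a minimal sketch in JavaScript. I am assuming JavaScript only because it is easy to run; the sample strings are made up for illustration.

```javascript
// A lookahead succeeds without consuming anything: the match is empty
// and still sits at index 0, so the rest of the pattern starts again
// from the beginning of the string.
const lookaheadOnly = /^(?=.*[0-9])/.exec("abc9");
console.log(lookaheadOnly[0].length, lookaheadOnly.index); // 0 0

// Without the lookahead wrapper the same sub-pattern consumes "abc9".
const consuming = /^.*[0-9]/.exec("abc9");
console.log(consuming[0]); // abc9

// No digit anywhere: the assertion fails, so exec() returns null.
const noDigit = /^(?=.*[0-9])/.exec("abcd");
console.log(noDigit); // null
```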

Finally

If all of the above lookaheads succeed, then, starting again from the beginning of the string, we go ahead and try to match between 8 and 30 characters of any kind.

.{8,30}
matches any character . 8 to 30 times {8,30}

$
end of the string (or of a line, when the multiline flag is set)
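One way to read the breakdown above is that the single pattern behaves like five independent checks on the same string. The sketch below compares the two in JavaScript; checkSeparately is a hypothetical helper name and the sample strings are made up.

```javascript
const full = /^(?=.*[a-z])(?=.*[A-Z])(?=.*[0-9])(?=.*[!@#$%^&*_]).{8,30}$/;

// Hypothetical helper: the same rules written as separate checks.
function checkSeparately(s) {
  return /[a-z]/.test(s)          // at least one lower-case letter
    && /[A-Z]/.test(s)            // at least one upper-case letter
    && /[0-9]/.test(s)            // at least one digit
    && /[!@#$%^&*_]/.test(s)      // at least one special character
    && /^.{8,30}$/.test(s);       // 8 to 30 characters, no line breaks
}

const samples = [
  "aBc_eFgh9", // satisfies every rule
  "abc_efgh9", // no upper-case letter
  "ABC_EFGH9", // no lower-case letter
  "aBc_eFghi", // no digit
  "aBcdeFgh9", // no special character
  "aB_9",      // too short
];

for (const s of samples) {
  console.log(s, full.test(s), checkSeparately(s));
}
```

Only the first sample should pass, and both columns should agree on every row.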

Note: When I said the lookahead matches any character up to and including, that isn’t strictly true. .* is greedy: it will match the whole text and then backtrack. Here is a good explanation: https://javascript.info/regexp-greedy-and-lazy.
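A small sketch of the difference, again in JavaScript with a made-up sample string. Outside a lookahead the greedy and lazy forms capture different text; inside a lookahead only the yes/no answer survives, so the choice affects the amount of work done and not the result.

```javascript
const text = "a1b2";

// Greedy: .* grabs everything, then backtracks until \d can match.
const greedy = text.match(/.*\d/)[0];
console.log(greedy); // a1b2

// Lazy: .*? grabs as little as possible, so it stops at the first digit.
const lazy = text.match(/.*?\d/)[0];
console.log(lazy); // a1

// Wrapped in a lookahead, both simply answer "is there a digit ahead?".
const greedyAssert = /^(?=.*\d)/.test(text);
const lazyAssert = /^(?=.*?\d)/.test(text);
console.log(greedyAssert, lazyAssert); // true true
```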

I had a go at this myself and came up with the following
^(?=[^a-z]*[a-z])(?=[^A-Z]*[A-Z])(?=\D*\d)(?=.*?[!@#$%^&*_]).{8,30}$

I used negated character sets e.g. [^a-z] and a non-greedy variant of any character multiple times .*?
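As a sanity check that the rewrite does not change which passwords are accepted, here is a sketch that runs both versions over a few made-up samples, including the length boundaries. It is JavaScript, but the patterns only use syntax that PCRE also supports.

```javascript
const greedyVersion =
  /^(?=.*[a-z])(?=.*[A-Z])(?=.*[0-9])(?=.*[!@#$%^&*_]).{8,30}$/;
const negatedVersion =
  /^(?=[^a-z]*[a-z])(?=[^A-Z]*[A-Z])(?=\D*\d)(?=.*?[!@#$%^&*_]).{8,30}$/;

const cases = [
  "aBc_eFgh9",              // valid
  "aB_45678",               // valid, exactly 8 characters
  "aB_4567",                // 7 characters, too short
  "aB_" + "1".repeat(27),   // valid, exactly 30 characters
  "aB_" + "1".repeat(28),   // 31 characters, too long
  "abc_efgh9",              // no upper-case letter
];

for (const s of cases) {
  console.log(s.length, greedyVersion.test(s), negatedVersion.test(s));
}
```

A handful of samples is not a proof of equivalence, but it is a quick way to catch an accidental change in behaviour.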

Just a couple of points to be sure: I am not using JavaScript, and I need syntax for the HTML pattern and PHP preg_match that does the same job. Are you saying this syntax will work for both?

I tested my regular expressions using regex101

You can click on different flavours (PHP, JavaScript, etc.). Note there isn’t any HTML pattern option, so you will have to investigate that for yourself.
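For what it is worth, my understanding is that browsers compile the pattern attribute as a JavaScript regular expression that must match the whole value, roughly as if it were wrapped in ^(?: and )$. The sketch below imitates that; compileHtmlPattern is a hypothetical helper, and I am assuming the "u" flag here for wide runtime support (recent revisions of the HTML spec use the newer "v" flag).

```javascript
// Hypothetical helper that mimics how a browser treats pattern="...".
function compileHtmlPattern(pattern) {
  return new RegExp("^(?:" + pattern + ")$", "u");
}

// Implicit anchoring: the whole value must match, not just part of it.
const plain = /[0-9]{3}/.test("1234");
const asAttribute = compileHtmlPattern("[0-9]{3}").test("1234");
console.log(plain, asAttribute); // true false

// So an explicit ^ and $ inside the attribute value is harmless but redundant.
const body = String.raw`(?=[^a-z]*[a-z])(?=[^A-Z]*[A-Z])(?=\D*\d)(?=.*?[!@#$%^&*_]).{8,30}`;
const withAnchors = compileHtmlPattern("^" + body + "$");
const withoutAnchors = compileHtmlPattern(body);
console.log(withAnchors.test("aBc_eFgh9"), withoutAnchors.test("aBc_eFgh9")); // true true
```

This means selecting the JavaScript flavour on regex101 is a reasonable stand-in for testing the HTML side.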

The negated and non-greedy variants are more performant, requiring fewer steps in the match than the greedy variants.

I would recommend checking out the links I included in my post.

You are correct, the . any character does the trick. The special character sets do appear to be different though. The HTML version looks for ^&* whereas the PHP version doesn’t. The PHP version also uses the shorthand \d for digit instead of [0-9].
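There is one more difference worth seeing in action: the trailing [0-9A-Za-z!@#$%_] in the PHP version is not redundant, because it also limits which characters are allowed at all, while . in the HTML version accepts anything. Without it, {8,30} has nothing left to repeat except the preceding lookahead, which is most likely why that variant never validated. The sketch below runs both original patterns in JavaScript as a stand-in (the syntax used is common to both engines); the sample strings are made up.

```javascript
// The two patterns exactly as posted in the question.
const htmlPattern =
  /^(?=.*[a-z])(?=.*[A-Z])(?=.*[0-9])(?=.*[!@#$%^&*_]).{8,30}$/;
const phpPattern =
  /^(?=.*\d)(?=.*[A-Z])(?=.*[a-z])(?=.*[!@#$%_])[0-9A-Za-z!@#$%_]{8,30}$/;

const inputs = [
  "aBc_eFgh9",  // accepted by both
  "aBc^eFgh9",  // ^ counts as a special character only in the HTML version
  "aBc_eF&h9",  // & is outside the PHP character class
  "aBc_eF gh9", // a space is fine for . but outside the PHP character class
];

for (const s of inputs) {
  console.log(JSON.stringify(s), htmlPattern.test(s), phpPattern.test(s));
}
```

If the two sides are meant to agree, the simplest route is probably to pick one special character set and one final matcher (either . or the explicit class) and use the same pattern text in both places.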

I really appreciate your efforts and time and I will investigate your ideas further, and you have given me an alternative for my server side PHP, thank you. But my original question stated I have two working options (one for PHP and one for HTML), and they appear to be different even though both are meant to be regular expressions.

So anybody out there - is my HTML ok and is it a fact that the syntax is different for the two expressions?

@kerry14 See above, I replied to that question as you were typing.

Edit: I also gave you a link to regex101, where you can at the very least test your HTML version with the PHP option selected to verify that it works in PHP.

Thanks, just seen it. So can anybody else give me an opinion on whether the HTML code is good, or is there a better way? Thanks

I will leave it at this.

Testing with the following string aBc_eFgh9

Your HTML version took 33 steps
^(?=.*[a-z])(?=.*[A-Z])(?=.*[0-9])(?=.*[!@#$%^&*_]).{8,30}$

Mine took 23 steps
^(?=[^a-z]*[a-z])(?=[^A-Z]*[A-Z])(?=\D*\d)(?=.*?[!@#$%^&*_]).{8,30}$

Sorry, I am confused. So you ARE saying yours will work with HTML?

So guys, back to the original question: can anyone supply me with a PHP preg_match AND an HTML5 pattern to match the criteria I listed in the original question? Thanks

What did you try @kerry14?

@rpg_digital
Hi ya
Been a bit busy, and both my original attempts (HTML5 and PHP) are working, but I intend to try your version on both the HTML5 input pattern and the PHP preg_match() within the next couple of days and will get back to you.
Thanks for following up and thanks again for all your help and effort :grinning:
