Help with Regular Expressions

DoubleDee · October 27, 2010, 4:40am

Could someone help me with PHP Regular Expressions? (Or Perl or whatever they technically are called?!)

I’m trying to find a practical way to validate an e-mail address in a registration form, and everyone makes Regular Expressions sound like the way to go?!

However the e-mail is validated, the “rules” should be strict but not so restrictive as to block out legitimate e-mail addresses. Below is my best guess at the rules…

A valid e-mail would likely have:

Letters [a-z][A-Z]
Numbers [0-9]
Underscores, Hyphens, Periods

followed by an At sign [@]

followed by a domain name ending in one of a finite set of top-level domains (e.g. .com, .net, .org)

So how would I do that in PHP using Regular Expressions?? (And is there a better approach??)

Sincerely,

Debbie
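
A minimal sketch of how the rules listed in this post (letters, digits, underscores, hyphens and periods, then "@", then a domain ending in a fixed set of top-level domains) could be written with preg_match. The function name looksLikeSimpleEmail and the default TLD list are assumptions made for illustration only; they are not taken from any library.

```php
<?php
// Hypothetical helper: the name and the default TLD list are illustrative assumptions.
function looksLikeSimpleEmail($email, array $allowedTlds = array('com', 'net', 'org'))
{
    // Escape each TLD so it can be embedded safely in the pattern.
    $quoted = array();
    foreach ($allowedTlds as $tld) {
        $quoted[] = preg_quote($tld, '/');
    }

    // Local part: letters, digits, periods, underscores, hyphens.
    // Then "@", one or more dot-terminated domain labels, and one allowed TLD.
    // i = case-insensitive, D = "$" matches only at the very end of the string.
    $pattern = '/^[a-z0-9._-]+@(?:[a-z0-9-]+\.)+(?:' . implode('|', $quoted) . ')$/iD';

    return preg_match($pattern, $email) === 1;
}

var_dump(looksLikeSimpleEmail('debbie_01@example.com'));  // accepted by these rules
var_dump(looksLikeSimpleEmail('user+tag@example.com'));   // rejected by these rules
```

Note that the second address is a legitimate one: a plus sign is allowed in the local part, so rules this narrow will turn away some real users. The same applies to any address whose top-level domain is not on the list.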

Dan_Grossman · October 27, 2010, 4:44am

Your best bet here is to look up someone else’s regex for this, as e-mail addresses are probably much more complicated than you think. There are a lot of formats that you might never see yourself but that are valid and should be accepted.

http://www.regular-expressions.info/email.html

(?:[a-z0-9!#$%&'*+/=?^_`{|}~-]+(?:\.[a-z0-9!#$%&'*+/=?^_`{|}~-]+)*|"(?:[\x01-\x08\x0b\x0c\x0e-\x1f\x21\x23-\x5b\x5d-\x7f]|\\[\x01-\x09\x0b\x0c\x0e-\x7f])*")@(?:(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+[a-z0-9](?:[a-z0-9-]*[a-z0-9])?|\[(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?|[a-z0-9-]*[a-z0-9]:(?:[\x01-\x08\x0b\x0c\x0e-\x1f\x21-\x5a\x53-\x7f]|\\[\x01-\x09\x0b\x0c\x0e-\x7f])+)\])
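
To try a pattern like this from PHP, it has to be wrapped in delimiters and, for validation, anchored at both ends. The sketch below is one way to do that and is not taken from the linked site: the function name matchesQuotedPattern, the comma delimiter, the ^ and $ anchors and the iD modifiers are all assumptions. The pattern is written with single backslashes, as PCRE expects, and held in a nowdoc (PHP 5.3 or later) so that its quotes and dollar signs need no escaping.

```php
<?php
// Hypothetical wrapper; name, delimiter, anchors and modifiers are assumptions.
function matchesQuotedPattern($email)
{
    // Nowdoc: nothing inside is interpolated or unescaped by PHP.
    $body = <<<'REGEX'
(?:[a-z0-9!#$%&'*+/=?^_`{|}~-]+(?:\.[a-z0-9!#$%&'*+/=?^_`{|}~-]+)*|"(?:[\x01-\x08\x0b\x0c\x0e-\x1f\x21\x23-\x5b\x5d-\x7f]|\\[\x01-\x09\x0b\x0c\x0e-\x7f])*")@(?:(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+[a-z0-9](?:[a-z0-9-]*[a-z0-9])?|\[(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?|[a-z0-9-]*[a-z0-9]:(?:[\x01-\x08\x0b\x0c\x0e-\x1f\x21-\x5a\x53-\x7f]|\\[\x01-\x09\x0b\x0c\x0e-\x7f])+)\])
REGEX;

    // A comma is used as the delimiter because it does not occur in the pattern.
    // i = case-insensitive, D = "$" matches only at the very end of the string.
    $pattern = ',^' . $body . '$,iD';

    return preg_match($pattern, $email) === 1;
}

var_dump(matchesQuotedPattern('john.doe@example.com'));
var_dump(matchesQuotedPattern('john.doe@example'));
```

Most of the usual delimiter characters (/, #, ~, !) appear inside the pattern itself, which is why an unusual delimiter is chosen here rather than escaping each occurrence.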

DoubleDee · October 27, 2010, 5:07am

Dan_Grossman:

Your best bet here is to look up someone else’s regex for this, as e-mail addresses are probably much more complicated than you think. There are a lot of formats that you might never see yourself but that are valid and should be accepted.

http://www.regular-expressions.info/email.html
(?:[a-z0-9!#$%&'*+/=?^_`{|}~-]+(?:\.[a-z0-9!#$%&'*+/=?^_`{|}~-]+)*|"(?:[\x01-\x08\x0b\x0c\x0e-\x1f\x21\x23-\x5b\x5d-\x7f]|\\[\x01-\x09\x0b\x0c\x0e-\x7f])*")@(?:(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+[a-z0-9](?:[a-z0-9-]*[a-z0-9])?|\[(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?|[a-z0-9-]*[a-z0-9]:(?:[\x01-\x08\x0b\x0c\x0e-\x1f\x21-\x5a\x53-\x7f]|\\[\x01-\x09\x0b\x0c\x0e-\x7f])+)\])

Not sure I follow you. That link looks like they are trying to sell me something?!

I want to be able to reasonably program the Regular Expressions myself in PHP and not have to buy anything.

Debbie

rpkamp · October 27, 2010, 6:16am

Why would you think that? Have you read that website? It’s actually one of the best (if not the best) sources on regular expressions out there!

Dan_Grossman · October 27, 2010, 8:24am

What you quoted in my post WAS the regular expression. That was it, right here on the forum, from that website. Nothing to buy.

Cups · October 27, 2010, 1:25pm

Is there a better approach?

If you are on PHP 5.2.0 or later, you could look at using the Filter functions, especially the validate email flag: http://www.php.net/manual/en/filter.filters.validate.php

rguy84 · October 27, 2010, 1:30pm

Unless something has changed, I heard you shouldn’t really use or rely on those.

Cups · October 27, 2010, 1:32pm

Crikey, where did that come from then?

AnthonySterling · October 27, 2010, 1:34pm

There’s some pretty interesting information in the PHP source for the FILTER_VALIDATE_EMAIL flag.


void php_filter_validate_email(PHP_INPUT_FILTER_PARAM_DECL) /* {{{ */
{
    /*
     * The regex below is based on a regex by Michael Rushton.
     * However, it is not identical.  I changed it to only consider routeable
     * addresses as valid.  Michael's regex considers a@b a valid address
     * which conflicts with section 2.3.5 of RFC 5321 which states that:
     *
     *   Only resolvable, fully-qualified domain names (FQDNs) are permitted
     *   when domain names are used in SMTP.  In other words, names that can
     *   be resolved to MX RRs or address (i.e., A or AAAA) RRs (as discussed
     *   in Section 5) are permitted, as are CNAME RRs whose targets can be
     *   resolved, in turn, to MX or address RRs.  Local nicknames or
     *   unqualified names MUST NOT be used.
     *
     * This regex does not handle comments and folding whitespace.  While
     * this is technically valid in an email address, these parts aren't
     * actually part of the address itself.
     *
     * Michael's regex carries this copyright:
     *
     * Copyright © Michael Rushton 2009-10
     * http://squiloople.com/
     * Feel free to use and redistribute this code. But please keep this copyright notice.
     *
     */
    const char regexp[] = "/^(?!(?:(?:\\x22?\\x5C[\\x00-\\x7E]\\x22?)|(?:\\x22?[^\\x5C\\x22]\\x22?)){255,})(?!(?:(?:\\x22?\\x5C[\\x00-\\x7E]\\x22?)|(?:\\x22?[^\\x5C\\x22]\\x22?)){65,}@)(?:(?:[\\x21\\x23-\\x27\\x2A\\x2B\\x2D\\x2F-\\x39\\x3D\\x3F\\x5E-\\x7E]+)|(?:\\x22(?:[\\x01-\\x08\\x0B\\x0C\\x0E-\\x1F\\x21\\x23-\\x5B\\x5D-\\x7F]|(?:\\x5C[\\x00-\\x7F]))*\\x22))(?:\\.(?:(?:[\\x21\\x23-\\x27\\x2A\\x2B\\x2D\\x2F-\\x39\\x3D\\x3F\\x5E-\\x7E]+)|(?:\\x22(?:[\\x01-\\x08\\x0B\\x0C\\x0E-\\x1F\\x21\\x23-\\x5B\\x5D-\\x7F]|(?:\\x5C[\\x00-\\x7F]))*\\x22)))*@(?:(?:(?!.*[^.]{64,})(?:(?:(?:xn--)?[a-z0-9]+(?:-[a-z0-9]+)*\\.){1,126}){1,}(?:(?:[a-z][a-z0-9]*)|(?:(?:xn--)[a-z0-9]+))(?:-[a-z0-9]+)*)|(?:\\[(?:(?:IPv6:(?:(?:[a-f0-9]{1,4}(?::[a-f0-9]{1,4}){7})|(?:(?!(?:.*[a-f0-9][:\\]]){7,})(?:[a-f0-9]{1,4}(?::[a-f0-9]{1,4}){0,5})?::(?:[a-f0-9]{1,4}(?::[a-f0-9]{1,4}){0,5})?)))|(?:(?:IPv6:(?:(?:[a-f0-9]{1,4}(?::[a-f0-9]{1,4}){5}:)|(?:(?!(?:.*[a-f0-9]:){5,})(?:[a-f0-9]{1,4}(?::[a-f0-9]{1,4}){0,3})?::(?:[a-f0-9]{1,4}(?::[a-f0-9]{1,4}){0,3}:)?)))?(?:(?:25[0-5])|(?:2[0-4][0-9])|(?:1[0-9]{2})|(?:[1-9]?[0-9]))(?:\\.(?:(?:25[0-5])|(?:2[0-4][0-9])|(?:1[0-9]{2})|(?:[1-9]?[0-9]))){3}))\\]))$/iD";

    pcre       *re = NULL;
    pcre_extra *pcre_extra = NULL;
    int preg_options = 0;
    int         ovector[150]; /* Needs to be a multiple of 3 */
    int         matches;


    /* The maximum length of an e-mail address is 320 octets, per RFC 2821. */
    if (Z_STRLEN_P(value) > 320) {
        RETURN_VALIDATION_FAILED
    }

    re = pcre_get_compiled_regex((char *)regexp, &pcre_extra, &preg_options TSRMLS_CC);
    if (!re) {
        RETURN_VALIDATION_FAILED
    }
    matches = pcre_exec(re, NULL, Z_STRVAL_P(value), Z_STRLEN_P(value), 0, 0, ovector, 3);

    /* 0 means that the vector is too small to hold all the captured substring offsets */
    if (matches < 0) {
        RETURN_VALIDATION_FAILED
    }

}
/* }}} */

It might be worth knowing its limitations/coverage.
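
One way to get a feel for that coverage is to run a few edge cases through filter_var and see which are accepted. The sample addresses and labels below are made up for illustration; the expectations in the comments follow from the source comments quoted above (only routable domains, no comments, a 320-octet cap), so treat them as a sketch rather than a full test suite.

```php
<?php
// Illustrative samples only; labels and addresses are assumptions.
$samples = array(
    'ordinary address'    => 'user@example.com',
    'domain without dot'  => 'a@b',                        // not a fully-qualified domain
    'comment in address'  => 'user(comment)@example.com',  // comments are not handled
    'over 320 characters' => str_repeat('a', 321) . '@example.com',
);

$results = array();
foreach ($samples as $label => $address) {
    // filter_var returns the address on success and false on failure.
    $results[$label] = filter_var($address, FILTER_VALIDATE_EMAIL) !== false;
    printf("%-20s %s\n", $label, $results[$label] ? 'accepted' : 'rejected');
}
```

Only the first sample should be accepted. The rejected ones are not typos a user is likely to make, which fits the point that the filter is adequate for most forms while still having known gaps.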

Cups · October 27, 2010, 2:37pm

Are there any other known limitations with Filter functions or flags that we should know of then?

rguy84 · October 27, 2010, 2:42pm

In terms of what? Like auto-validates? Anthony can probably speak more on this, but from posts I have seen here, it seems to me that filters are kind of only lip service.

salathe · October 27, 2010, 3:11pm

Sure, but that’s probably worth a whole new topic, to keep this email one from going astray. (:

lampcms_com · October 27, 2010, 3:40pm

There is a pretty good package on PEAR called Validate.
Here is the link to the documentation on email validation:
http://pear.php.net/manual/en/package.validate.validate.email.php

DoubleDee · October 27, 2010, 4:24pm

I just scanned the page last night at like 11:00pm, but when I saw “Get your own copy of RegexBuddy now” combined with some Discover and GoogleAds and a webpage ending in .info I just assumed they were selling something?! :-/

Debbie

DoubleDee · October 27, 2010, 4:30pm

AnthonySterling:

There’s some pretty interesting information in the PHP source for the FILTER_VALIDATE_EMAIL flag.


void php_filter_validate_email(PHP_INPUT_FILTER_PARAM_DECL) /* {{{ */
{
    /*
     * The regex below is based on a regex by Michael Rushton.
     * However, it is not identical.  I changed it to only consider routeable
     * addresses as valid.  Michael's regex considers a@b a valid address
     * which conflicts with section 2.3.5 of RFC 5321 which states that:
     */

It might be worth knowing its limitations/coverage.

I don’t understand what that is that you just posted?!

Is that a Regular Expression?

And what were you getting at with your last comment?

Debbie

DoubleDee · October 27, 2010, 4:36pm

What is PEAR?

What do I need to do to use it?

How does your suggestion compare to using Regular Expression or that Filter thingy mentioned above?

Too many choices!

Debbie

DoubleDee · October 27, 2010, 5:51pm

So that sounds like the easier and more efficient way to go, right?

So, what would I need to use PEAR and that function (?) on my local computer during development?

What would I need on a web hosting account?

lampcms_com:

For a quick and dirty check of an email address I would use PHP’s built-in filter_var like this:
if (!filter_var($email, FILTER_VALIDATE_EMAIL)) {
    // bad email
}

In my opinion this is good enough for just a syntax check of an address, quick, almost no code required.

Well, I am trying to create a registration system for my website, so I want to use e-mail as both a username and as a way to get in touch with people, so the e-mail really needs to be valid.

I don’t want to go crazy checking for every conceivable e-mail combination since most, if not all, customers would be in the U.S. But at the same time, I don’t want bad guys entering things that could blow up my system or cause me grief, if that makes sense?

Debbie

lampcms_com · October 27, 2010, 5:36pm

PEAR is a repository of classes written for PHP. It’s easier if you just read about it on their site:
http://pear.php.net/
It can make your programming much easier if you know about the PEAR classes, since chances are a class already exists for something that you need to do.
Compared to filter_var and a hand-written regular expression, Validate from PEAR is much more powerful and gives you much better validation, since it goes beyond a regular expression: it can also check that the domain exists.

For a quick and dirty check of an email address I would use PHP’s built-in filter_var like this:
if (!filter_var($email, FILTER_VALIDATE_EMAIL)) {
    // bad email
}

In my opinion this is good enough for just a syntax check of an address, quick, almost no code required.
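
For the "check that the domain exists" idea, here is a rough sketch that combines the filter_var syntax check with a DNS lookup through PHP’s built-in checkdnsrr. It is not how PEAR Validate does it; the function name emailDomainResolves and the injectable $dnsLookup parameter are assumptions, added so the lookup can be swapped out when testing without network access.

```php
<?php
// Hypothetical helper; not PEAR Validate's implementation.
// $dnsLookup is any callable taking (host, record type) and returning a bool.
function emailDomainResolves($email, $dnsLookup = 'checkdnsrr')
{
    // Syntax check first, so obviously malformed input never reaches DNS.
    if (filter_var($email, FILTER_VALIDATE_EMAIL) === false) {
        return false;
    }

    // Everything after the last "@" is the domain part.
    $domain = substr($email, strrpos($email, '@') + 1);

    // Prefer an MX record; fall back to an A record, which mail servers may also use.
    return call_user_func($dnsLookup, $domain, 'MX')
        || call_user_func($dnsLookup, $domain, 'A');
}

// Usage with a stand-in lookup that only "knows" one domain:
$fakeLookup = function ($host, $type) {
    return $host === 'example.com' && $type === 'MX';
};
var_dump(emailDomainResolves('user@example.com', $fakeLookup));
var_dump(emailDomainResolves('user@nowhere.invalid', $fakeLookup));
```

A passing DNS check only shows that the domain can receive mail, not that the mailbox exists, so a confirmation e-mail is still the usual way to prove an address is real.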

AnthonySterling · October 27, 2010, 5:42pm

Indeed it is.

Getting at?

I was informing you that the built-in PHP function, whilst appropriate for 99%+ of cases, does have limitations which you should be aware of.

 * This regex does not handle comments and folding whitespace.  While
 * this is technically valid in an email address, these parts aren't
 * actually part of the address itself.

Nobody can make the decision for you, nor should they. You should take the recommendations offered by members, research them, then decide which is the most applicable for the project at hand.

DoubleDee · October 27, 2010, 5:54pm

I just didn’t understand what you meant by…

It might be worth knowing its limitations/coverage.

I was informing you that the built-in PHP function, whilst appropriate for 99%+ of cases, does have limitations which you should be aware of.

Okay.

Nobody can make the decision for you, nor should they. You should take the recommendations offered by members, research them, then decide which is the most applicable for the project at hand.

So based on my last response, what do you think makes sense?

I like the PEAR idea and it seems pretty easy to use. (Regular Expressions intimidate me.)

Debbie
