Help with Regular Expressions

Could someone help me with PHP Regular Expressions? (Or Perl or whatever they technically are called?!)

I’m trying to find a practical way to validate an e-mail address in a registration form, and everyone makes Regular Expressions sound like the way to go?!

However the e-mail is validated, the “rules” should be strict but not so restrictive as to block out legitimate e-mail addresses. Below is my best guess at the rules…

A valid e-mail would likely have:

Letters [a-z][A-Z]
Numbers [0-9]
Underscores, Hyphens, Periods (dots)

followed by an at sign [@]

followed by a domain name ending in one of a finite set of top-level domains (e.g. .com, .net, .org)

So how would I do that in PHP using Regular Expressions?? (And is there a better approach??)

Sincerely,

Debbie
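For illustration, the rules listed above can be turned almost one-to-one into a PHP regular expression. This is a minimal sketch, assuming the TLD whitelist is only the three examples given; the function name `matchesSimpleRules` is hypothetical. It shows how each rule maps to a piece of the pattern, and also why rules like these end up rejecting addresses that are actually legal.

```php
<?php
// Sketch of the rules listed above; NOT a complete e-mail validator.
// Assumption: the TLD whitelist is just the three examples from the post.
function matchesSimpleRules(string $email): bool
{
    $tlds = ['com', 'net', 'org'];

    // Local part: letters, digits, underscores, periods and hyphens.
    // Domain: one or more labels of letters, digits and hyphens, each followed by a period.
    // TLD: one of the whitelisted values. The i flag makes letters case-insensitive;
    // \z anchors at the very end so a trailing newline is not accepted.
    $pattern = '/^[a-z0-9_.-]+@(?:[a-z0-9-]+\.)+(?:' . implode('|', $tlds) . ')\z/i';

    return preg_match($pattern, $email) === 1;
}

var_dump(matchesSimpleRules('debbie_smith@example.com')); // accepted
var_dump(matchesSimpleRules('debbie+news@example.com'));  // rejected: '+' is legal but not in the rules
var_dump(matchesSimpleRules('debbie@example.info'));      // rejected: .info is not whitelisted
```

The last two cases are real, deliverable address shapes, which is the main argument for not hand-rolling the rules.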

Your best bet here is to look up someone else’s regex for this, as e-mail addresses are probably much more complicated than you think. There are a lot of formats you might never see yourself that are nonetheless valid and should be accepted.

http://www.regular-expressions.info/email.html

(?:[a-z0-9!#$%&'*+/=?^_`{|}~-]+(?:\.[a-z0-9!#$%&'*+/=?^_`{|}~-]+)*|"(?:[\x01-\x08\x0b\x0c\x0e-\x1f\x21\x23-\x5b\x5d-\x7f]|\\[\x01-\x09\x0b\x0c\x0e-\x7f])*")@(?:(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+[a-z0-9](?:[a-z0-9-]*[a-z0-9])?|\[(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?|[a-z0-9-]*[a-z0-9]:(?:[\x01-\x08\x0b\x0c\x0e-\x1f\x21-\x5a\x53-\x7f]|\\[\x01-\x09\x0b\x0c\x0e-\x7f])+)\])
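To use a pattern like that one from PHP, it has to be wrapped in delimiters, anchored, and passed to preg_match(). A sketch, assuming the pattern is pasted verbatim: a nowdoc string keeps every backslash exactly as written, and `;` is used as the delimiter because the pattern itself never contains one. The constant `RFC5322_BODY` and the function `matchesRfcPattern` are hypothetical names.

```php
<?php
// The regular-expressions.info pattern, pasted verbatim (nowdoc = no escaping needed).
const RFC5322_BODY = <<<'RE'
(?:[a-z0-9!#$%&'*+/=?^_`{|}~-]+(?:\.[a-z0-9!#$%&'*+/=?^_`{|}~-]+)*|"(?:[\x01-\x08\x0b\x0c\x0e-\x1f\x21\x23-\x5b\x5d-\x7f]|\\[\x01-\x09\x0b\x0c\x0e-\x7f])*")@(?:(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+[a-z0-9](?:[a-z0-9-]*[a-z0-9])?|\[(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?|[a-z0-9-]*[a-z0-9]:(?:[\x01-\x08\x0b\x0c\x0e-\x1f\x21-\x5a\x53-\x7f]|\\[\x01-\x09\x0b\x0c\x0e-\x7f])+)\])
RE;

function matchesRfcPattern(string $email): bool
{
    // \A and \z make the pattern match the whole string, not just a substring;
    // the i flag lets [a-z] accept upper-case letters as well.
    return preg_match(';\A' . RFC5322_BODY . '\z;i', $email) === 1;
}

var_dump(matchesRfcPattern('Debbie.Smith@Example.COM')); // accepted
var_dump(matchesRfcPattern('debbie@'));                  // rejected: no domain
```

Without the anchors, preg_match() would happily accept any string that merely contains an address somewhere inside it.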

Not sure I follow you. That link looks like they are trying to sell me something?!

I want to be able to reasonably program the Regular Expressions myself in PHP and not have to buy anything.

Debbie

Why would you think that? Have you read that website? It’s actually one of the best (if not the best) sources on regular expressions out there! :slight_smile:

What you quoted in my post WAS the regular expression. That was it, right here on the forum, from that website. Nothing to buy.

Is there a better approach?

If you are on PHP 5.2.0 or later, you could look at using the Filter functions, especially the validate e-mail flag (FILTER_VALIDATE_EMAIL): http://www.php.net/manual/en/filter.filters.validate.php
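One detail worth knowing about the filter functions: filter_var() returns the filtered value on success and false on failure, so a strict comparison is safer than relying on truthiness. A minimal sketch, assuming PHP 7.1+ for the type declarations; `cleanEmail` is a hypothetical helper name.

```php
<?php
// Returns the trimmed address if FILTER_VALIDATE_EMAIL accepts it, otherwise null.
function cleanEmail(string $input): ?string
{
    $email = trim($input); // form input often carries stray spaces or a newline
    $result = filter_var($email, FILTER_VALIDATE_EMAIL);
    return $result === false ? null : $result;
}

var_dump(cleanEmail("  debbie@example.com\n")); // the cleaned address
var_dump(cleanEmail('debbie at example dot com')); // NULL
```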

Unless something has changed, I heard you shouldn’t really use or rely on those.

Crikey, where did that come from then?

There’s some pretty interesting information in the PHP source for the FILTER_VALIDATE_EMAIL flag.


void php_filter_validate_email(PHP_INPUT_FILTER_PARAM_DECL) /* {{{ */
{
    /*
     * The regex below is based on a regex by Michael Rushton.
     * However, it is not identical.  I changed it to only consider routeable
     * addresses as valid.  Michael's regex considers a@b a valid address
     * which conflicts with section 2.3.5 of RFC 5321 which states that:
     *
     *   Only resolvable, fully-qualified domain names (FQDNs) are permitted
     *   when domain names are used in SMTP.  In other words, names that can
     *   be resolved to MX RRs or address (i.e., A or AAAA) RRs (as discussed
     *   in Section 5) are permitted, as are CNAME RRs whose targets can be
     *   resolved, in turn, to MX or address RRs.  Local nicknames or
     *   unqualified names MUST NOT be used.
     *
     * This regex does not handle comments and folding whitespace.  While
     * this is technically valid in an email address, these parts aren't
     * actually part of the address itself.
     *
     * Michael's regex carries this copyright:
     *
     * Copyright © Michael Rushton 2009-10
     * http://squiloople.com/
     * Feel free to use and redistribute this code. But please keep this copyright notice.
     *
     */
    const char regexp[] = "/^(?!(?:(?:\\x22?\\x5C[\\x00-\\x7E]\\x22?)|(?:\\x22?[^\\x5C\\x22]\\x22?)){255,})(?!(?:(?:\\x22?\\x5C[\\x00-\\x7E]\\x22?)|(?:\\x22?[^\\x5C\\x22]\\x22?)){65,}@)(?:(?:[\\x21\\x23-\\x27\\x2A\\x2B\\x2D\\x2F-\\x39\\x3D\\x3F\\x5E-\\x7E]+)|(?:\\x22(?:[\\x01-\\x08\\x0B\\x0C\\x0E-\\x1F\\x21\\x23-\\x5B\\x5D-\\x7F]|(?:\\x5C[\\x00-\\x7F]))*\\x22))(?:\\.(?:(?:[\\x21\\x23-\\x27\\x2A\\x2B\\x2D\\x2F-\\x39\\x3D\\x3F\\x5E-\\x7E]+)|(?:\\x22(?:[\\x01-\\x08\\x0B\\x0C\\x0E-\\x1F\\x21\\x23-\\x5B\\x5D-\\x7F]|(?:\\x5C[\\x00-\\x7F]))*\\x22)))*@(?:(?:(?!.*[^.]{64,})(?:(?:(?:xn--)?[a-z0-9]+(?:-[a-z0-9]+)*\\.){1,126}){1,}(?:(?:[a-z][a-z0-9]*)|(?:(?:xn--)[a-z0-9]+))(?:-[a-z0-9]+)*)|(?:\\[(?:(?:IPv6:(?:(?:[a-f0-9]{1,4}(?::[a-f0-9]{1,4}){7})|(?:(?!(?:.*[a-f0-9][:\\]]){7,})(?:[a-f0-9]{1,4}(?::[a-f0-9]{1,4}){0,5})?::(?:[a-f0-9]{1,4}(?::[a-f0-9]{1,4}){0,5})?)))|(?:(?:IPv6:(?:(?:[a-f0-9]{1,4}(?::[a-f0-9]{1,4}){5}:)|(?:(?!(?:.*[a-f0-9]:){5,})(?:[a-f0-9]{1,4}(?::[a-f0-9]{1,4}){0,3})?::(?:[a-f0-9]{1,4}(?::[a-f0-9]{1,4}){0,3}:)?)))?(?:(?:25[0-5])|(?:2[0-4][0-9])|(?:1[0-9]{2})|(?:[1-9]?[0-9]))(?:\\.(?:(?:25[0-5])|(?:2[0-4][0-9])|(?:1[0-9]{2})|(?:[1-9]?[0-9]))){3}))\\]))$/iD";

    pcre       *re = NULL;
    pcre_extra *pcre_extra = NULL;
    int preg_options = 0;
    int         ovector[150]; /* Needs to be a multiple of 3 */
    int         matches;


    /* The maximum length of an e-mail address is 320 octets, per RFC 2821. */
    if (Z_STRLEN_P(value) > 320) {
        RETURN_VALIDATION_FAILED
    }

    re = pcre_get_compiled_regex((char *)regexp, &pcre_extra, &preg_options TSRMLS_CC);
    if (!re) {
        RETURN_VALIDATION_FAILED
    }
    matches = pcre_exec(re, NULL, Z_STRVAL_P(value), Z_STRLEN_P(value), 0, 0, ovector, 3);

    /* 0 means that the vector is too small to hold all the captured substring offsets */
    if (matches < 0) {
        RETURN_VALIDATION_FAILED
    }

}
/* }}} */

It might be worth knowing its limitations/coverage. :slight_smile:
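A quick way to see those limitations from PHP itself is to probe FILTER_VALIDATE_EMAIL with the cases the C source calls out: the 320-octet length check, the requirement for a fully qualified domain, and the lack of support for comments. The sample addresses are hypothetical, and results may differ slightly between PHP versions.

```php
<?php
// Probe FILTER_VALIDATE_EMAIL against cases mentioned in the C source comments.
$samples = [
    'debbie@example.com',                  // ordinary address: accepted
    'debbie@localhost',                    // not a fully qualified domain: rejected
    'debbie(comment)@example.com',         // RFC comments are not handled: rejected
    str_repeat('a', 310) . '@example.com', // over 320 octets: rejected by the length check
];

foreach ($samples as $sample) {
    $ok = filter_var($sample, FILTER_VALIDATE_EMAIL) !== false;
    printf("%-30s %s\n", substr($sample, 0, 30), $ok ? 'valid' : 'invalid');
}
```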

Are there any other known limitations with Filter functions or flags that we should know of then?

In terms of what? Like auto-validation? Anthony can probably speak more on this, but from posts I have seen here, it seems to me like filters are kind of only lip service.

Sure, but that’s probably worth a whole new topic, to keep this e-mail one from going astray. (:

There is a pretty good package on PEAR called Validate.
Here is the link to the documentation on e-mail validation:
http://pear.php.net/manual/en/package.validate.validate.email.php
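PEAR packages are ordinary PHP files loaded from the include_path, so code can check whether Validate is installed and fall back to the built-in filter when it is not. A hedged sketch: the call `Validate::email($email, ['check_domain' => true])` is written from the PEAR manual page linked above and should be confirmed there; `validateEmail` is a hypothetical wrapper name.

```php
<?php
// Prefer PEAR's Validate package when Validate.php is on the include_path,
// otherwise fall back to PHP's built-in filter_var() (PHP 5.2.0+).
function validateEmail(string $email): bool
{
    if (stream_resolve_include_path('Validate.php') !== false) {
        require_once 'Validate.php';
        // Assumption: 'check_domain' makes PEAR also look the domain up in DNS.
        return (bool) Validate::email($email, ['check_domain' => true]);
    }
    // Syntax-only check from the filter extension.
    return filter_var($email, FILTER_VALIDATE_EMAIL) !== false;
}

echo stream_resolve_include_path('Validate.php') !== false
    ? "PEAR Validate found on include_path\n"
    : "PEAR Validate not found; falling back to filter_var()\n";
```

The same check works on a local machine and on a hosting account, since both come down to whether Validate.php is reachable through include_path.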

I just scanned the page last night at like 11:00pm, but when I saw “Get your own copy of RegexBuddy now” combined with some Discover and Google ads, and a web address ending in .info, I just assumed they were selling something?! :-/

Debbie

I don’t understand what that is that you just posted?!

Is that a Regular Expression?

And what were you getting at with your last comment?

Debbie

What is PEAR?

What do I need to do to use it?

How does your suggestion compare to using Regular Expression or that Filter thingy mentioned above?

Too many choices!

Debbie

So that sounds like the easier and more efficient way to go, right?

So, what would I need to use PEAR and that function (?) on my local computer during development?

What would I need on a web hosting account?

For a quick and dirty check of an e-mail address I would use PHP’s built-in filter_var, like this:
if (!filter_var($email, FILTER_VALIDATE_EMAIL)) {
    // bad email
}

In my opinion this is good enough for just a syntax check of an address: quick, with almost no code required.

Well, I am trying to create a registration system for my website, so I want to use e-mail as both a username and as a way to get in touch with people, so the e-mail really needs to be valid.

I don’t want to go crazy checking for every conceivable e-mail combination, since most, if not all, customers would be in the U.S. But at the same time, I don’t want bad guys entering things that could blow up my system or cause me grief, if that makes sense?

Debbie
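Since the address doubles as a username, one option is to validate it and normalize the part that is case-insensitive, so that Debbie@Example.COM and Debbie@example.com cannot register as two accounts. A sketch under stated assumptions: `normalizeLoginEmail` is a hypothetical helper; it lower-cases only the domain, because DNS names are case-insensitive while RFC 5321 lets the local part be case-sensitive. Escaping the value when it is echoed back into a page, as the last line does, is a separate precaution.

```php
<?php
// Validate an address and lower-case only its domain part for use as a login name.
function normalizeLoginEmail(string $input): ?string
{
    $email = trim($input);
    if (filter_var($email, FILTER_VALIDATE_EMAIL) === false) {
        return null; // reject anything the built-in filter does not accept
    }
    $at = strrpos($email, '@'); // the last '@' separates local part and domain
    return substr($email, 0, $at) . '@' . strtolower(substr($email, $at + 1));
}

$login = normalizeLoginEmail(' Debbie@Example.COM ');
// Escape before echoing user-supplied text into HTML.
echo htmlspecialchars((string) $login, ENT_QUOTES, 'UTF-8'), "\n";
```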

PEAR is a repository of classes written for PHP. It’s easier if you just read about it on their site:
http://pear.php.net/
It can make your programming much easier if you know about PEAR classes, since chances are a class already exists for something you need to do.
Compared to filter_var or a regular expression, PEAR’s Validate package is much more powerful and gives you much better validation, since it goes beyond a regular expression: you can also check that the domain exists.
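A comparable domain check can be sketched with PHP’s built-in checkdnsrr(). This is a swapped-in technique, not PEAR’s own code. The function name `emailDomainExists` and its injectable `$resolver` parameter are hypothetical, added so the logic can be tested without network access. By default it asks DNS for an MX record and then an A record, since RFC 5321 allows mail delivery to a domain’s address record when it has no MX.

```php
<?php
// Check that the domain after the last '@' has an MX or A record.
// $resolver defaults to checkdnsrr(); tests can pass a fake instead.
function emailDomainExists(string $email, ?callable $resolver = null): bool
{
    $resolver = $resolver ?? 'checkdnsrr';
    $at = strrpos($email, '@');
    if ($at === false) {
        return false; // no '@', so there is no domain to look up
    }
    $domain = substr($email, $at + 1);
    return $resolver($domain, 'MX') || $resolver($domain, 'A');
}
```

Even a domain that resolves says nothing about whether the mailbox exists; only a confirmation e-mail proves that.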

In my opinion this is good enough for just a syntax check of an address, quick, almost no code required.

Indeed it is.

Getting at? :confused:

I was informing you that the built-in PHP function, whilst appropriate for 99%+ of cases, does have limitations which you should be aware of.

 * This regex does not handle comments and folding whitespace.  While
 * this is technically valid in an email address, these parts aren't
 * actually part of the address itself.

Nobody can make the decision for you, nor should they. You should take the recommendations offered by members, research them, then decide which is the most applicable for the project at hand.

I just didn’t understand what you meant by…

It might be worth knowing its limitations/coverage. :slight_smile:

I was informing you that the built-in PHP function, whilst appropriate for 99%+ of cases, does have limitations which you should be aware of.

Okay.

Nobody can make the decision for you, nor should they. You should take the recommendations offered by members, research them, then decide which is the most applicable for the project at hand.

So based on my last response, what do you think makes sense?

I like the PEAR idea and it seems pretty easy to use. (Regular Expressions intimidate me.) :frowning:

Debbie