  1. #1
    SitePoint Addict
    Join Date
    Aug 2007
    Posts
    328
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)

    Simple Regex Explanation

    Hi,

    I have the following regular expression:

    $emailPattern = '/^[a-zA-Z0-9_.-\@]{8,50}$/';

    Basically I want to allow all alphabetical and numerical characters, as well as underscores, dashes and dots, but I want to REQUIRE the @ symbol, as it's obviously necessary for a valid email.

    My question is, I've done it, but I don't understand how I achieved it.

    If using the pattern above I type in "@gtdhdxt" it works, but if I remove the @ it doesn't, which is fine.

    What I'm curious about is what makes the @ symbol required?
    There is no dash, underscore or even a dot (Which I'd like to make required), and it still validates.

    I'm not sure if I've explained myself correctly, but if anyone can offer me some advice I'd appreciate it.

  2. #2
    SitePoint Wizard bronze trophy
    Join Date
    Jul 2006
    Location
    Augusta, Georgia, United States
    Posts
    4,046
    Mentioned
    16 Post(s)
    Tagged
    3 Thread(s)
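    The @ symbol isn't required. gtdhdxt fails because it is fewer than 8 characters long, not because the @ is missing. The expression is incorrect: a character class only lists the characters that are allowed, so putting @ inside it does not make it mandatory.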
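    To see why, here is a minimal sketch that runs the original pattern against a few strings (the test strings are made up for illustration). It also shows a side effect: inside the class, `.-\@` is read as a range from `.` to `@`, which lets through characters such as `<` and `>`.

```php
<?php
// Original pattern from post #1, unchanged.
$emailPattern = '/^[a-zA-Z0-9_.-\@]{8,50}$/';

// Hypothetical inputs chosen to separate "too short" from "has no @".
$inputs = ['@gtdhdxt', 'gtdhdxt', 'abcdefgh', 'ab<cd>ef'];

$results = [];
foreach ($inputs as $input) {
    $results[$input] = preg_match($emailPattern, $input) === 1;
    echo str_pad($input, 10), $results[$input] ? 'match' : 'no match', PHP_EOL;
}
```

    Only the 7-character string is rejected; the 8-character strings pass whether or not they contain an @.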

  3. #3
    SitePoint Addict
    Join Date
    Aug 2007
    Posts
    328
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    ah, right you are, just checked it.
    Okay, so how do I make a character required?
    I know I could use \@+ (or @+, not sure if I need to escape it), but that would check for at least 1 occurrence of @.

    I want it to allow exactly 1 occurrence of @, no more, no less.

    It wouldn't be \@{1} would it?

    If so, how would I incorporate that into the rest of the pattern?


    $emailPattern = '/^([a-zA-Z0-9_.-]\@{1}){8,50}$/';

    Would that work?
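    A quick way to answer that is to try it. The sketch below (test strings are made up) suggests the quantifier applies to the whole group, so the pattern asks for 8 to 50 repetitions of "one allowed character followed by @".

```php
<?php
// Pattern proposed in this post: the group is "one allowed character, then @",
// and {8,50} repeats that whole group.
$pattern = '/^([a-zA-Z0-9_.-]\@{1}){8,50}$/';

$pairs   = preg_match($pattern, 'a@b@c@d@e@f@g@h@') === 1; // eight char+@ pairs
$address = preg_match($pattern, 'user@example.com') === 1; // an ordinary-looking address

var_dump($pairs, $address);
```

    The string of eight character-plus-@ pairs matches, while the ordinary-looking address does not.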

  4. #4
    SitePoint Wizard bronze trophy
    Join Date
    Jul 2006
    Location
    Augusta, Georgia, United States
    Posts
    4,046
    Mentioned
    16 Post(s)
    Tagged
    3 Thread(s)
    /^[a-zA-Z0-9_.-]+?@[a-zA-Z0-9_.-]+$/
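    As a rough breakdown, the same pattern can be written with the x modifier so each part carries a comment. This is only an annotated sketch; the test addresses are made up. Because neither character class contains @, the lazy +? behaves the same as a plain + here.

```php
<?php
// The pattern from the post above, rewritten with the x modifier so that
// whitespace is ignored and each part can carry a comment.
$pattern = '/
    ^                   # start of the string
    [a-zA-Z0-9_.-]+?    # one or more allowed characters (the local part)
    @                   # exactly one literal @, required at this position
    [a-zA-Z0-9_.-]+     # one or more allowed characters (the domain part)
    $                   # end of the string
/x';

$ok       = preg_match($pattern, 'user@example.com');
$noAt     = preg_match($pattern, 'userexample.com');
$twoAts   = preg_match($pattern, 'user@@example.com');
$noLocal  = preg_match($pattern, '@example.com');

var_dump($ok, $noAt, $twoAts, $noLocal);
```

    The @ is required because it sits outside the brackets, and a second @ is rejected because neither class allows it.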

  5. #5
    SitePoint Addict
    Join Date
    Aug 2007
    Posts
    328
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    Can you explain that pattern, please?

    I don't quite understand why you've repeated part of the pattern or what purpose the placement of the + has.

  6. #6
    SitePoint Enthusiast nrg_alpha's Avatar
    Join Date
    Dec 2008
    Posts
    81
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    Writing an accurate email pattern is more complex than you think. Since I have not yet reached the 10 post mark, I cannot post a URL (after all, I am a spam bot).

    In Google, do a search for 'iamcal + email parser' (without the quotes). The first or second entry is the iamcal site. When you go to that site, you will be greeted with the breakdown of the parser this guy wrote. Scroll down to the bottom of the page for a 'simplified' version of the parser. If this is not good enough, there is a download link at the very bottom which will lead to a page offering different parsers (you will want to click on the 'RFC 3696 Parser' link). This leads to the mother lode of email parsers.

    But truth be told, for my stuff, I don't bother with the nitty-gritty parsers. I prefer a much more relaxed / flexible system that is loose enough to let even some oddball addresses through, yet strict enough that you can't get away with just anything.

    Something along the lines of:
    PHP Code:
    function validate_email($the_email){
        // Sanitize first, then validate; filter_var() returns false when validation fails.
        return (filter_var(filter_var($the_email, FILTER_SANITIZE_EMAIL), FILTER_VALIDATE_EMAIL)) ? true : false;
    }

    This system isn't bulletproof (nor is it meant to be). The point is that it may be in your better interest to either a) use a loose-fisted system like that one, or b) if you do want to go the opposite way and use a tight-fisted system, go with one that is already built to be suitably good (like the one on the iamcal site). Otherwise, your pattern may disallow some legitimate email formats that you are not aware of.
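    For reference, a small sketch of how the two filters behave on their own (the addresses are made up). filter_var() with FILTER_VALIDATE_EMAIL returns the address on success and false on failure, while FILTER_SANITIZE_EMAIL strips characters that are not allowed in an address, such as spaces.

```php
<?php
// Validation alone: returns the filtered value on success, false on failure.
$valid   = filter_var('user@example.com', FILTER_VALIDATE_EMAIL);
$invalid = filter_var('no-at-sign', FILTER_VALIDATE_EMAIL);

// Sanitizing removes disallowed characters (here, the space) before
// validation ever sees the string.
$sanitized = filter_var('us er@example.com', FILTER_SANITIZE_EMAIL);

var_dump($valid, $invalid, $sanitized);
```

    One consequence of sanitizing before validating is that input containing a stray space is accepted as a slightly different address, which fits the "relaxed" approach described above.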

  7. #7
    SitePoint Addict
    Join Date
    Aug 2007
    Posts
    328
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    That page you asked me to search for was a LOT of information to take in.

    A lot of it went right over my head.

    I just need to know how I can specify individual characters that must occur an exact number of times.
    In my case "@", I want it to appear only once, how can I do that?

    Have I correctly written that the following _.- are allowed but not required?

  8. #8
    Theoretical Physics Student bronze trophy Jake Arkinstall's Avatar
    Join Date
    May 2006
    Location
    Lancaster University, UK
    Posts
    7,062
    Mentioned
    2 Post(s)
    Tagged
    0 Thread(s)
    http://www.regular-expressions.info/

    Read it, be patient, work through the examples, and practice writing your own patterns. You will understand it soon enough.
    Jake Arkinstall
    "Sometimes you don't need to reinvent the wheel;
    Sometimes its enough to make that wheel more rounded"-Molona

  9. #9
    SitePoint Addict
    Join Date
    Aug 2007
    Posts
    328
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    I think I've figured it out:

    $emailPattern = '/^[a-zA-Z0-9_.-]{8,50}@{1}$/';

    That works, but is it correct?

    Without a @ it doesn't validate, with 1 @ it does validate and with 2 @ it doesn't validate.
    So it seems to be doing exactly what I want it to, is there anything that I've missed?

  10. #10
    @php.net Salathe's Avatar
    Join Date
    Dec 2004
    Location
    Edinburgh
    Posts
    1,396
    Mentioned
    54 Post(s)
    Tagged
    0 Thread(s)
    That pattern will only match strings with the @ character at the end of the string, nowhere else.

    • username@ will match (the part before the @ must be 8 to 50 characters long)
    • email@domain.com will not match
    • email will not match


    Also, the {1} (quantifier) part of @{1} is redundant. Just @ will behave in exactly the same way.
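    Both points can be checked with a short sketch (the test strings are made up): the pattern from post #9 only accepts a trailing @, and dropping {1} changes nothing.

```php
<?php
$withQuantifier    = '/^[a-zA-Z0-9_.-]{8,50}@{1}$/'; // pattern from post #9
$withoutQuantifier = '/^[a-zA-Z0-9_.-]{8,50}@$/';    // same pattern, {1} dropped

$inputs  = ['username@', 'username@domain.com', 'username'];
$results = [];
foreach ($inputs as $input) {
    $a = preg_match($withQuantifier, $input);
    $b = preg_match($withoutQuantifier, $input);
    $results[$input] = [$a, $b];
    echo "$input: $a / $b", PHP_EOL;
}
```

    Only the string ending in @ matches, and the two patterns agree on every input.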
    Salathe
    Software Developer and PHP Manual Author.

  11. #11
    SitePoint Wizard
    Join Date
    Nov 2005
    Posts
    1,191
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    It might help to write it in English and then look at the docs for the operators:
    starts with one or more of [these] chars | followed by one and one only @ | followed by one or more of [these] characters | ends with period followed by two or three of [these] characters

  12. #12
    SitePoint Addict
    Join Date
    Aug 2007
    Posts
    328
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    Okay, well I'll give it a go:

    starts with an upper or lowercase letter or number ^a-zA-Z0-9
    May also contain -_.
    Must contain at least 1 period .
    Must contain only 1 @
    Must end with upper or lower case letters $a-zA-Z
    All between 8 and 50 characters.

    So:

    '/^[a-zA-Z0-9-_.] | @ | [a-zA-Z0-9-_.] | . | [a-zA-Z]{8,50}$/'

    That doesn't work exactly as expected either, it allows more than 1 instance of @.

    There's the + that specifies 1 or more; you'd think there would be some syntax for just 1 regardless of its position.

    What's wrong with what I've written above?
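    One way to see what that pattern really does is to run it. In the sketch below (test strings are made up), the | characters act as alternation and the spaces around them are matched literally, so the pattern succeeds on any string containing, for example, " @ ".

```php
<?php
// Pattern from the post above, unchanged: | splits it into five alternatives,
// and the spaces around each | are matched literally.
$pattern = '/^[a-zA-Z0-9-_.] | @ | [a-zA-Z0-9-_.] | . | [a-zA-Z]{8,50}$/';

$spaced  = preg_match($pattern, 'hello @ world');    // contains " @ "
$twoAts  = preg_match($pattern, '!! @ @ !!');        // two @ signs, still contains " @ "
$address = preg_match($pattern, 'user@example.com'); // no spaces at all

var_dump($spaced, $twoAts, $address);
```

    The first two strings match because one alternative is enough, while an ordinary address without spaces matches none of them.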

  13. #13
    SitePoint Wizard
    Join Date
    Nov 2005
    Posts
    1,191
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    Sorry, I didn't mean to put you wrong. The | means "or"; I was just using it to separate clauses in English, not code.

    starts with an upper or lowercase letter or number ^a-zA-Z0-9
    May also contain -_.
    Those should be lumped together in []+, which means "one or more characters, each taken from this set". For example, [xyz]+ matches a run of one or more characters where each is x, y or z.

    Must contain at least 1 period .
    That's a bit trickier, but in this case, think of "followed by". There is only one place where a period is mandatory in an email address: just before the final two or three characters.

    Must contain only 1 @
    @ by itself (or any other character, e.g. A or \.) means one and one only at that position; you need to use quantifiers like + to change that. Again, it's about position rather than a count across the whole string.

    Must end with upper or lower case letters $a-zA-Z
    [a-zA-Z]$ is fine, but with domain names, you can be more specific: \.[a-zA-Z]{2,3} - which reads a period followed by two or three alphabetic characters.

    All between 8 and 50 characters.
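    A quantifier such as {8,50} (written without a space) repeats only the element directly before it, so wrapping the whole expression in (...){8,50} would repeat the expression 8 to 50 times rather than limit its length. Check the overall length separately, for example with strlen(), or put a lookahead such as (?=.{8,50}$) right after the ^.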

    Regexes are painful; I hope the above helped.
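    Putting those clauses together might look like the sketch below. The function name looks_like_email is hypothetical, and the overall length is checked with strlen() as an assumption about how to enforce the 8 to 50 rule. Note that the two-or-three-letter ending rejects longer top-level domains such as .info.

```php
<?php
// Hypothetical helper, not from the thread: combines the clauses above.
function looks_like_email(string $candidate): bool
{
    // Overall length is checked separately; a quantifier only repeats
    // the element directly before it.
    $length = strlen($candidate);
    if ($length < 8 || $length > 50) {
        return false;
    }

    // One or more allowed characters, exactly one @, one or more allowed
    // characters, then a period and two or three letters at the end.
    return preg_match('/^[a-zA-Z0-9_.-]+@[a-zA-Z0-9_.-]+\.[a-zA-Z]{2,3}$/', $candidate) === 1;
}

var_dump(looks_like_email('john.doe@example.com'));
var_dump(looks_like_email('john@@example.com'));
var_dump(looks_like_email('johndoe@example'));
var_dump(looks_like_email('a@b.co'));
```

    Only the first address passes: the second has two @ signs, the third has no period before the ending, and the fourth is shorter than 8 characters.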

  14. #14
    SitePoint Addict
    Join Date
    Aug 2007
    Posts
    328
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    Thank you, you were quite helpful.

    Here's something I found after a bit of searching:

    ^[a-zA-Z0-9._-]+@[a-zA-Z0-9_-]+\.[a-zA-Z.]{2,5}$

    I don't fully understand the +

    What I gather is that it means any letter, number, ., _ or - then a @ then any letter, number, _ or -, then a ., then any letter and a period, and those last bunch of characters can only be between 2 and 5 characters in length.

    So if I understand correctly, the plus symbol ONLY means 1 or more and has nothing to do with concatenation or addition?

    And there is no need to separate individual parts of the pattern by any characters.

    What if I wanted the first part before the @ to be only 10 characters long, would it simply be:

    ^[a-zA-Z0-9._-]{10}

    as the first part? (If I only wanted to match EXACTLY 10 characters in length)
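    Both the found pattern and the {10} idea can be tried out directly. This is a sketch with made-up addresses; PHP delimiters have been added around the patterns.

```php
<?php
// Pattern quoted in the post above, with PHP delimiters added.
$found = '/^[a-zA-Z0-9._-]+@[a-zA-Z0-9_-]+\.[a-zA-Z.]{2,5}$/';

$simple    = preg_match($found, 'user.name@example.com'); // ending "com"
$twoPart   = preg_match($found, 'user@example.co.uk');    // ending "co.uk", 5 characters
$subdomain = preg_match($found, 'user@mail.example.com'); // ending would need 11 characters

// {10} means exactly ten of the preceding class, here pinned between ^ and @.
$exactlyTen = '/^[a-zA-Z0-9._-]{10}@/';
$ten    = preg_match($exactlyTen, 'abcdefghij@example.com');
$nine   = preg_match($exactlyTen, 'abcdefghi@example.com');
$eleven = preg_match($exactlyTen, 'abcdefghijk@example.com');

var_dump($simple, $twoPart, $subdomain, $ten, $nine, $eleven);
```

    The found pattern rejects the subdomain address because the class after the @ does not allow periods, which is the kind of legitimate format a hand-written pattern can miss.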

  15. #15
    Theoretical Physics Student bronze trophy Jake Arkinstall's Avatar
    Join Date
    May 2006
    Location
    Lancaster University, UK
    Posts
    7,062
    Mentioned
    2 Post(s)
    Tagged
    0 Thread(s)
    Yes, and between 2 and 10 would be {2,10}.

    You are correct, the + isn't concatenation; it means one or more of the preceding element (a character, a character class or a group).

    For example:
    Code:
    /a+/
    Would match a, aa, aaa, aaaa etc
    Code:
    /abc+/
    Would match abc, abcc, abccc but not abbc.
    Code:
    /[abcd]+/
    would match a, b, c, d, aa, ab, ac, ad, ba, bb, bc, bd, ca, cb, cc, cd, da, db, dc, dd etc., up to theoretically infinite combinations.
    Code:
    /[ab]{3}/
    would match only one of:

    • aaa
    • aab
    • aba
    • abb
    • baa
    • bab
    • bba
    • bbb
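    The quantifier examples above can be checked with a short sketch (the array keys are just labels). One caveat worth seeing: preg_match() succeeds when the pattern is found anywhere in the subject, so without ^ and $ a pattern such as /[ab]{3}/ also succeeds on longer strings.

```php
<?php
// preg_match() returns 1 when the pattern is found anywhere in the subject.
$checks = [
    'abc+ on abccc'     => preg_match('/abc+/', 'abccc'),     // c repeated
    'abc+ on abbc'      => preg_match('/abc+/', 'abbc'),      // + applies to c only
    '[abcd]+ on dab'    => preg_match('/[abcd]+/', 'dab'),
    '[ab]{3} on aba'    => preg_match('/[ab]{3}/', 'aba'),
    '[ab]{3} on ab'     => preg_match('/[ab]{3}/', 'ab'),     // too short
    '[ab]{3} on abab'   => preg_match('/[ab]{3}/', 'abab'),   // finds "aba" inside
    '^[ab]{3}$ on abab' => preg_match('/^[ab]{3}$/', 'abab'), // anchors require the whole string
];

foreach ($checks as $label => $result) {
    echo str_pad($label, 20), $result, PHP_EOL;
}
```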


    ? works just like + but means ONE or ZERO, i.e:
    Code:
    /My name is ([A-Z\s]+).?/
    Means:
    'My name is ' followed by multiple occurrences of (any letter or a space) followed by ONE or ZERO full stops.

    In other words, it matches 'My name is Dave Smith' and 'My name is Dave Smith.' as if they were the same thing.

  16. #16
    SitePoint Enthusiast nrg_alpha's Avatar
    Join Date
    Dec 2008
    Posts
    81
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    Quote Originally Posted by arkinstall View Post
    Code:
    /My name is ([A-Z\s]+).?/
    Means:
    'My name is ' followed by multiple occurrences of (any letter or a space) followed by ONE or ZERO full stops.

    In other words, it matches 'My name is Dave Smith' and 'My name is Dave Smith.' as if they were the same thing.
    This is not correct. If the string is, say, 'My name is Dave Smith', it will only match 'My name is Da'.

    There are inaccuracies in this description. What the pattern is actually saying is:

    match 'My name is ', followed by one or more consecutive uppercase letters or whitespace characters (which could include a tab, carriage return, space or newline, for example), followed by an optional dot wildcard (zero or one time), which by default matches any single character other than a newline.

    I suspect what you intended is:
    Code:
    /My name is ([a-z\s]+)\.?/i
    Note the escaped dot (which now means look for a literal dot) and the i modifier after the closing delimiter (which makes any letters within the pattern case insensitive).
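    The difference between the two patterns is easy to confirm with preg_match() and its third argument, which receives the matched text. A minimal sketch, using the example sentence from the thread:

```php
<?php
$subject = 'My name is Dave Smith';

// Original pattern: uppercase-only class, unescaped dot.
preg_match('/My name is ([A-Z\s]+).?/', $subject, $original);

// Corrected pattern: case-insensitive class, literal optional full stop.
preg_match('/My name is ([a-z\s]+)\.?/i', $subject, $fixed);
preg_match('/My name is ([a-z\s]+)\.?/i', $subject . '.', $fixedWithStop);

var_dump($original[0], $fixed[0], $fixedWithStop[0], $fixedWithStop[1]);
```

    With the original pattern the class stops after the uppercase D and the dot then consumes the a, while the corrected pattern captures the full name with or without the trailing full stop.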

  17. #17
    SitePoint Enthusiast
    Join Date
    Sep 2005
    Posts
    68
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    Try these two classes:

    Email validation based on the RFCs for email and domain syntaxes.
    http://code.google.com/p/php-email-address-validation/

    Email validation via the email server (SMTP):
    http://code.google.com/p/php-smtp-email-validation/
    Fiji Web Design - Enterprise Web Design

  18. #18
    SitePoint Wizard lorenw's Avatar
    Join Date
    Feb 2005
    Location
    was rainy Oregon now sunny Florida
    Posts
    1,094
    Mentioned
    2 Post(s)
    Tagged
    0 Thread(s)
    Not sure if you have seen this but it saves me a lot of grief building regex.

    http://www.addedbytes.com/cheat-sheets/

    Scroll down to the regex cheatsheet.
    What I lack in acuracy I make up for in misteaks

  19. #19
    Theoretical Physics Student bronze trophy Jake Arkinstall's Avatar
    Join Date
    May 2006
    Location
    Lancaster University, UK
    Posts
    7,062
    Mentioned
    2 Post(s)
    Tagged
    0 Thread(s)
    Quote Originally Posted by nrg_alpha View Post
    I suspect what you intended is:
    Code:
    /My name is ([a-z\s]+)\.?/i
    Spot on.

