Thread: REGEX question.

  1. #1
    solidcodes

    Hi guys,

    I want to scrape email addresses from my own website:
    http://czone01.com/

    I'd like to do this with a regex, and I'm working on my own pattern in the meantime.
    Please, no DOM solutions this time; I only need a regex.

    Please assume that the email addresses on the site will change someday.

    Thank you in advance,


    -warren

  2. #2
    Jake Arkinstall

    First of all, pretending that it's for your own site isn't really going to get you anywhere; we all know that scraping is for OTHER sites.

    It might have been more persuasive if there were more than one email address on the entire site...

    I assume that you know how to use the preg functions.

    If you want a simple email pattern:
    PHP Code:
    $Regex = "/[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,4}/i";
    Or if you want a better but more complicated pattern:
    PHP Code:
    $Regex = "/[a-z0-9!#$%&'*+\/=?^_`{|}~-]+(?:\.[a-z0-9!#$%&'*+\/=?^_`{|}~-]+)*@(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+[a-z0-9](?:[a-z0-9-]*[a-z0-9])?/i";
    Source: http://www.regular-expressions.info/email.html
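
For anyone less familiar with the preg functions, here is a minimal sketch of how the simpler pattern could be applied with preg_match_all(). The helper name find_emails() and the sample text are made up for illustration and are not from the thread; the /i flag is added so the uppercase-only character classes also match lowercase input.

```php
<?php
// Hypothetical helper (not from the thread): collect the unique
// email-like strings found in a block of text.
function find_emails(string $text): array
{
    // Simple pattern from the post above, with /i for case-insensitivity.
    $regex = '/[A-Z0-9._%+-]+@[A-Z0-9.-]+\.[A-Z]{2,4}/i';

    // preg_match_all() returns FALSE on error, otherwise the match count.
    if (preg_match_all($regex, $text, $matches) === false) {
        return [];
    }

    // $matches[0] holds the full-pattern matches; drop duplicates and reindex.
    return array_values(array_unique($matches[0]));
}

// Made-up sample text; one address appears twice on purpose.
$sample = 'Contact admin@example.com or sales@example.org, then admin@example.com again.';
print_r(find_emails($sample));
```

The same function should work on HTML fetched from a page, since the pattern only looks at the characters around each @ sign, but it will also pick up anything that merely looks like an address.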
    Jake Arkinstall
    "Sometimes you don't need to reinvent the wheel;
    Sometimes its enough to make that wheel more rounded"-Molona

  3. #3
    Salathe

    Capturing email addresses with Regular Expressions

    We all know that capturing email addresses with regular expressions can be tricky but I think, in this case at least, the following might be useful. If not, you can at least use it as a building block to tweak and make suitable for your particular needs.

    Description
    array capture_email(string $sSubject)
    Searches sSubject for email addresses present within it.

    Function Definition
    PHP Code:
    function capture_email($sSubject)
    {
        static $sRegexFu = '
            /
            
            (?# --
              # This pattern captures
              # only real and proper 
              # email address!
              # -- )
            
            
            \b(?:[\!"#\$%\'\(\)\*\+,\-\&
            \:;@                    \?\^
            _`  \{                \|  \}
            \]    \*             \?   \[
            \!      \#         $%     ])
            *\K  (?:  [a-z0-9]+  @[   {}
            &\' *+                 \/ =?
            ^`a-z~-]+(?:\.[a-z0-9]+)?)\b
            
            
            (?# -- End of Email Capture --)
            
            /x';

        // Return the array of emails (which may be empty!)
        if (preg_match_all($sRegexFu, $sSubject, $aEmails) !== FALSE)
            return array_shift($aEmails);

        return FALSE;
    }

    Returns
    An array of email addresses present within the sSubject string (an empty array if none) or FALSE on error.
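
Because an empty array and FALSE are both "falsy" in PHP, a caller probably wants strict comparisons to tell "no addresses" apart from "error". The sketch below illustrates that; capture_email_like() is a hypothetical stand-in with the same return convention (it uses a much simpler pattern than the one above), and describe_result() is likewise invented for the example.

```php
<?php
// Hypothetical stand-in that follows the same return convention as
// capture_email(): an array of matches (possibly empty) or FALSE on error.
function capture_email_like(string $subject)
{
    $ok = preg_match_all('/[a-z0-9._%+-]+@[a-z0-9.-]+\.[a-z]{2,4}/i', $subject, $m);
    return $ok === false ? false : $m[0];
}

// Hypothetical helper: use === so an empty array is not mistaken for FALSE.
function describe_result($result): string
{
    if ($result === false) {
        return 'error';
    }
    if ($result === []) {
        return 'no addresses';
    }
    return count($result) . ' address(es)';
}

echo describe_result(capture_email_like('nothing to see here')), "\n";
echo describe_result(capture_email_like('write to x@y.org')), "\n";
```

A loose check such as `if (!$result)` would treat both cases the same way, which is why the strict form is used here.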

    Example
    PHP Code:
    print_r(
        capture_email('
            Get in touch at email@domain.com 
            but make sure not to be a silly 
            person by spamming nospam@domain.com!
        ')
    );
    The above example will output:
    Code:
    Array
    (
        [0] => email@domain.com
        [1] => nospam@domain.com
    )
    Off Topic:

    *giggle*
    Salathe
    Software Developer and PHP Manual Author.

