  1. #1
    ********* Callithumpian silver trophy freakysid's Avatar
    Join Date
    Jun 2000
    Location
    Sydney, Australia
    Posts
    3,798

    Regular expression help pls :-)

    Hi, I am processing a string representing a url from user input. As far as possible, I want to validate whether it appears to be in the form of a url. Note that I have already stripped any leading "http://" from the url.

    Some examples of what I want:

    GOOD:
    www.mydomain.com
    my_domain.com.au
    house-of-cyber-giggles.net
    mydomain.com/somefile.php?foo=yin&bar=yang
    www.mydomain.co.uk/somefile.cgi?foo=yin+yang

    BAD:
    w#%^ww.badurl.com
    www.badurl

    Here is what I have:
    PHP Code:
    if ( ! eregi("^[_\\.0-9a-z-]+\\.+([a-z]{2,3})", $txtUrl) ) {
        // not a validly formed url
    }

    The problem is that this validates as I want it to, except that it does not catch illegal characters after the first '.'. E.g.:

    w?ww.bad.com -> rejected
    www.b?ad.com -> not rejected

    One day I will learn regex properly :P In the meantime any help pls?
    Last edited by freakysid; Jun 10, 2001 at 23:09.

  2. #2
    SitePoint Wizard
    Join Date
    Mar 2001
    Posts
    3,537
    Hi,

    Your regex looks for any of the characters in the brackets one or more times at the start of the string, and then it looks for one or more periods (\.+), so the string:

    www.

    matches the first two parts of your regexp: it found one of the characters in the brackets ("w") one or more times, and then it found the period. Then your regexp fails to find the 2-3 characters a-z immediately after the period (because of the question mark), so there is no match.
    Last edited by 7stud; Jun 10, 2001 at 23:48.

  3. #3
    ********* Callithumpian silver trophy freakysid's Avatar
    OK then, I guess what I would like to have is a regex that does this. Starting at the beginning of the string:

    match any of the characters in the set [_0-9a-z-] any number of times,
    followed by a period,
    possibly followed by more characters in the set [_0-9a-z-] followed by another period,
    followed by two or three characters in the set [_0-9a-z-],
    (and what-ever else follows doesn't need to be validated).
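
    The steps above can be sketched with preg_match() (ereg() and eregi() were removed in PHP 7). This is only a sketch under a few assumptions: the name looks_like_url is made up for illustration, the "possibly more characters followed by another period" step is allowed to repeat, the last label uses [a-z]{2,3} as in the first post, and a lookahead is added after it so that www.badurl is not accepted on the strength of "www.bad".

```php
<?php
// Hypothetical helper; the name and the boundary rule after the TLD are assumptions.
function looks_like_url(string $url): bool
{
    // One label: letters, digits, underscore, or hyphen.
    $label = '[_0-9a-z-]+';

    // A label, zero or more ".label" groups, then "." and a 2-3 letter TLD.
    // The lookahead requires the TLD to be followed by the end of the string
    // or by one of / ? # : and nothing after that point is validated.
    $pattern = '~^' . $label . '(?:\.' . $label . ')*\.[a-z]{2,3}(?=\z|[/?#:])~i';

    return preg_match($pattern, $url) === 1;
}

// Try a few of the strings from the first post.
foreach (['www.mydomain.com', 'my_domain.com.au', 'www.b?ad.com', 'www.badurl'] as $candidate) {
    echo $candidate, ' => ', looks_like_url($candidate) ? 'ok' : 'rejected', "\n";
}
```

    Because the pattern is anchored at the start and every character up to the TLD must belong to a label or be a separating period, a stray character anywhere in the host part causes a rejection.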

  4. #4
    Dumb PHP codin' cat
    Join Date
    Aug 2000
    Location
    San Diego, CA
    Posts
    5,460
    freakysid, I think this will work for you. It hasn't been through much testing and I just wrote it this morning, so no guarantee it will work for every instance, but it worked on each of your examples.

    PHP Code:
    /* Returns 1 on a match and false otherwise */
    function check_string($str) {
        return ereg("^[www|[:alnum:]?]+\\.?[[:alnum:]]+\\.[[:alnum:]]{2,3}", $str);
    }

    // Example
    print check_string("mydomain.com/somefile.php?foo=yin&bar=yang");
    Last edited by freddydoesphp; Jun 11, 2001 at 09:50.
    Please don't PM me with questions.
    Use the forums, that is what they are here for.

  5. #5
    SitePoint Addict kunal's Avatar
    Join Date
    Oct 2000
    Posts
    307
    hmm.. I was wondering the same thing.. I've been trying to strip everything out of the url.. everything except for domain-name.com

    any ideas on how to do it?


    thanx,
    kunal
    i dunno...

  6. #6
    Dumb PHP codin' cat
    Kunal, yours is very similar and can be accomplished like this:

    PHP Code:
    function return_servername($str) {
        eregi("^[www|[:alnum:]?]+\\.?([[:alnum:]]+\\.[[:alnum:]]{2,3})", $str, $args);
        return $args[1];
    }

    print return_servername("turbo.mydomain.com/somefile.php?foo=yin&bar=yang");
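
    For anyone on a PHP version without eregi(), a rough equivalent can be sketched without a single large regex. The function name registered_domain_guess is hypothetical, and the sketch assumes any leading "http://" has already been stripped, as in the first post. It simply keeps the last two dot-separated labels of the host, which is a naive rule.

```php
<?php
// Hypothetical helper: returns the last two dot-separated labels of the host,
// or null when the host has fewer than two labels or contains an empty label.
function registered_domain_guess(string $url): ?string
{
    // Keep only the host: everything before the first "/", "?", "#" or ":".
    $host = strtolower(preg_split('~[/?#:]~', $url, 2)[0]);

    $labels = explode('.', $host);
    if (count($labels) < 2 || in_array('', $labels, true)) {
        return null;
    }

    // Naive rule: the last two labels. "www.mydomain.co.uk" therefore gives
    // "co.uk"; handling such suffixes would need a list of known suffixes.
    return implode('.', array_slice($labels, -2));
}

echo registered_domain_guess('turbo.mydomain.com/somefile.php?foo=yin&bar=yang'), "\n";
```

    Unlike the [[:alnum:]] pattern above, this keeps hyphenated names such as domain-name.com intact, since it splits on periods rather than matching a character class.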

  7. #7
    SitePoint Addict kunal's Avatar
    hey,
    thanx.. it worked like a charm

    kunal

  8. #8
    ********* Callithumpian silver trophy freakysid's Avatar
    Thanks very much for your replies. It is way past my bedtime right now. So I will try your solutions in the morning. Thanks again.

  9. #9
    ********* Callithumpian silver trophy freakysid's Avatar
    Here is an update on the url validation caper:

    1) Firstly, I've decided to strip off the request_uri portion of the url as I don't want to validate this (don't know why I didn't do this from the beginning).

    2) Can't remember the exact details, but the regex supplied by freddy didn't pass all tests. It would correctly reject w$w.mydomain.com but would allow www.my$domain.com

    3) I forgot to mention that I wanted legal urls to include those with sub domains or sub-sub domains, etc.
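
    For point 1, here is a minimal sketch of one way the request_uri portion might be split off. The post does not show how $urlArr is built, so splitting on the first "/" and the function name split_host_and_request are both assumptions made for illustration.

```php
<?php
// Hypothetical helper: splits "host/path?query" into [host, rest].
function split_host_and_request(string $url): array
{
    // A limit of 2 keeps any later "/" characters inside the second element.
    $parts = explode('/', $url, 2);

    // Put the leading "/" back on the request part, or use '' when there is none.
    return [$parts[0], isset($parts[1]) ? '/' . $parts[1] : ''];
}

$urlArr = split_host_and_request('www.mydomain.co.uk/somefile.cgi?foo=yin+yang');
echo $urlArr[0], "\n"; // the host part, which is the only part that gets validated
```

    Only $urlArr[0] then needs to go through the regex, so query strings containing characters such as ? or & no longer matter.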

    Here is what I've ended up with. I've split the test into two separate calls to ereg(). Seems to do the job.

    ereg("^[[:alnum:]]+(\.[[:alnum:]]+)+$", $urlArr[0])
    &&
    ereg("[[:alnum:]]+\.[[:alnum:]]{2,3}$", $urlArr[0])

    LOL, now I've been thinking about whether I should allow urls that are IP addresses. I think there is no reason not to in my application. So, I will probably just discard the second call to ereg() in that case
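
    The two checks, plus the IP address idea, might look like this with preg_match() on a current PHP version. It is a sketch only: the function name host_is_valid and the $allowIp switch are assumptions, and instead of dropping the second check entirely it uses filter_var() to recognise dotted IPv4 addresses.

```php
<?php
// Hypothetical helper; $allowIp is an assumed switch for the IP-address idea.
function host_is_valid(string $host, bool $allowIp = false): bool
{
    // Check 1: alphanumeric labels separated by single periods, at least two labels.
    if (preg_match('/^[[:alnum:]]+(\.[[:alnum:]]+)+\z/', $host) !== 1) {
        return false;
    }

    // A dotted IPv4 address is accepted only when the caller asks for it.
    if ($allowIp && filter_var($host, FILTER_VALIDATE_IP, FILTER_FLAG_IPV4) !== false) {
        return true;
    }

    // Check 2: the host must end in a label of two or three alphanumerics.
    return preg_match('/[[:alnum:]]+\.[[:alnum:]]{2,3}\z/', $host) === 1;
}

var_dump(host_is_valid('www.my$domain.com'));
var_dump(host_is_valid('192.168.0.1', true));
```

    Note that [[:alnum:]] does not include hyphens or underscores, so a host such as house-of-cyber-giggles.net is rejected by these patterns as written.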

