  1. #1
    SitePoint Enthusiast
    Join Date
    Nov 2008
    Posts
    29
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)

    Finding out which characters a regular expression allows

    Hello everyone

    I have a regular expression to validate a URL:

    /^(([\w]+:)?\/\/)?(([\d\w]|%[a-fA-f\d]{2,2})+(:([\d\w]|%[a-fA-f\d]{2,2})+)?@)?([\d\w][-\d\w]{0,253}[\d\w]\.)+[\w]{2,4}(:[\d]+)?(\/([-+_~.\d\w]|%[a-fA-f\d]{2,2})*)*(\?(&?([-+_~.\d\w]|%[a-fA-f\d]{2,2})=?)*)?(#([-+_~.\d\w]|%[a-fA-f\d]{2,2})*)?$/

    What I want to know is which characters this regex allows in a URL. I am not very familiar with regular expressions, so any help is appreciated.

    Thanks

  2. #2
    Dan Grossman
    Join Date
    Aug 2000
    Location
    Philadelphia, PA
    Posts
    20,580
    Mentioned
    1 Post(s)
    Tagged
    0 Thread(s)
    This is the official specification of what makes a valid URL:
    http://www.faqs.org/rfcs/rfc1738.html

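    What's allowed depends on which part of the URL you're in.

    In the scheme name, the letters a-z, the digits 0-9, and plus ("+"), hyphen ("-") and dot (".") are allowed. Elsewhere, RFC 1738 also permits the special characters $-_.+!*'(), unencoded, along with the reserved characters ;/?:@=& where they serve their reserved purpose. A percent sign ("%") appears only as the start of a percent-encoded character, such as %20 for a space.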
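As a small illustration of the two points above (the rules differ per URL part, and "%" introduces an encoded character), here is a minimal Python sketch using the standard library's `urllib.parse`. The example URL and the string being encoded are made up for illustration.

```python
from urllib.parse import quote, unquote, urlsplit

# Split a URL into its parts; each part has its own set of allowed characters.
parts = urlsplit("http://user@example.com:8080/a%20b?q=1#top")
print(parts.scheme)    # http
print(parts.netloc)    # user@example.com:8080
print(parts.path)      # /a%20b
print(parts.query)     # q=1
print(parts.fragment)  # top

# Percent-encode characters that may not appear literally.
# safe="" means nothing except unreserved characters is left as-is.
encoded = quote("a b&c", safe="")
print(encoded)           # a%20b%26c
print(unquote(encoded))  # a b&c
```

Note that `quote` follows the newer RFC 3986 rules for which characters count as unreserved, so its output is a guide rather than a literal reading of RFC 1738.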

  3. #3
    logic_earth
    Join Date
    Oct 2005
    Location
    CA
    Posts
    9,013
    Mentioned
    8 Post(s)
    Tagged
    0 Thread(s)
    Tada!
    PHP Code:
    // Character sets: unreserved plus sub-delimiters, then path characters
    $xunressub     = '\w\-.~\!$&\'()*+,;=';
    $xpchar        = $xunressub . ':@%';

    // The hyphen goes last in the class so it is a literal, not a range
    $xscheme       = '([a-zA-Z][a-zA-Z\d+.-]*)';

    $xuserinfo     = '(([' . $xunressub . '%]*)' .
                     '(:([' . $xunressub . ':%]*))?)';

    $xipv4         = '(\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3})';
    $xipv6         = '(\[([a-fA-F\d.:]+)\])';
    $xhost_name    = '([a-zA-Z\d.%-]+)';

    $xhost         = '(' . $xhost_name . '|' . $xipv4 . '|' . $xipv6 . ')';
    $xport         = '(\d*)';
    $xauthority    = '((' . $xuserinfo . '@)?' . $xhost .
                     '?(:' . $xport . ')?)';

    $xslash_seg    = '(/[' . $xpchar . ']*)';
    $xpath_authabs = '((//' . $xauthority . ')((/[' . $xpchar . ']*)*))';
    $xpath_rel     = '([' . $xpchar . ']+' . $xslash_seg . '*)';
    $xpath_abs     = '(/(' . $xpath_rel . ')?)';
    $xapath        = '(' . $xpath_authabs . '|' . $xpath_abs .
                     '|' . $xpath_rel . ')';

    $xqueryfrag    = '([' . $xpchar . '/?' . ']*)';

    // Add regex delimiters before passing this to preg_match()
    $xurl          = '^(' . $xscheme . ':)?' .  $xapath . '?' .
                     '(\?' . $xqueryfrag . ')?(#' . $xqueryfrag . ')?$';
    Logic without the fatal effects.
    All code snippets are licensed under WTFPL.
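One thing to watch when adding or removing characters in classes like the scheme class above: a hyphen between two characters forms a range. The following Python sketch (Python's `re` treats this the same way as PCRE) compares a class containing `+-.` with one where the hyphen comes last. The names `loose` and `strict` and the sample strings are made up for illustration.

```python
import re

# '+-.' inside a class is a range from '+' (0x2B) to '.' (0x2E),
# which also covers ',' (0x2C).
loose = re.compile(r"[a-zA-Z][a-zA-Z\d+-.]*")

# With the hyphen last it is a literal, so ',' is no longer accepted.
strict = re.compile(r"[a-zA-Z][a-zA-Z\d+.-]*")

print(bool(loose.fullmatch("svn,ssh")))   # True
print(bool(strict.fullmatch("svn,ssh")))  # False
print(bool(strict.fullmatch("svn+ssh")))  # True
```

Putting the hyphen first or last in a class, or escaping it, avoids the accidental range.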


  4. #4
    SitePoint Enthusiast
    Join Date
    Nov 2008
    Posts
    29
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    Thanks for that.

    But when I checked the form, I could see that + and % are not allowed. I really need to understand the regular expression itself so that I can add or remove characters later.
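One way to see what the pattern from post #1 accepts is to split it by URL component and probe it one character at a time. The sketch below is Python rather than PHP, and it assumes the pattern has a ":" after the scheme, before the password and before the port. The helper `accepts` and the example.com URLs are made up for illustration.

```python
import re
import string

# Pattern from post #1, split by URL component. The three ':' characters
# (scheme, password, port) are an assumption about the intended pattern.
URL_RE = re.compile(
    r"^(([\w]+:)?\/\/)?"                             # scheme://
    r"(([\d\w]|%[a-fA-f\d]{2,2})+"                   # user
    r"(:([\d\w]|%[a-fA-f\d]{2,2})+)?@)?"             # :password@
    r"([\d\w][-\d\w]{0,253}[\d\w]\.)+[\w]{2,4}"      # host labels, then TLD
    r"(:[\d]+)?"                                     # :port
    r"(\/([-+_~.\d\w]|%[a-fA-f\d]{2,2})*)*"          # /path/segments
    r"(\?(&?([-+_~.\d\w]|%[a-fA-f\d]{2,2})=?)*)?"    # ?query
    r"(#([-+_~.\d\w]|%[a-fA-f\d]{2,2})*)?$"          # #fragment
)

def accepts(url):
    """Return True if the whole URL matches the pattern."""
    return URL_RE.match(url) is not None

# Probe every ASCII punctuation character inside the fragment, the last
# component, so a character cannot be mistaken for the start of another part.
fragment_ok = [c for c in string.punctuation
               if accepts("http://example.com/p#a" + c + "b")]
print("".join(fragment_ok))  # +-._~

print(accepts("http://example.com/a+b"))    # True: '+' is in the path class
print(accepts("http://example.com/a%20b"))  # True: '%' plus two hex digits
print(accepts("http://example.com/a%b"))    # False: a bare '%' is rejected
print(accepts("http://example.com/a%GG"))   # True: 'A-f' is wider than 'A-F'
```

Under these assumptions, the path, query and fragment accept letters, digits, and `- + _ ~ .`, plus `%` when it is followed by two characters from `[a-fA-f\d]`. The last line shows a likely typo in the pattern: `A-f` covers every character from `A` to `f`, so `%GG` passes even though it is not valid hex, and `a-fA-F` was probably intended. If the form still rejects `+` and `%`, the rejection may be coming from somewhere other than this pattern.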

  5. #5
    SitePoint Wizard
    Join Date
    Jul 2008
    Posts
    5,757
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)

