  1. #1
    SitePoint Addict cranjled's Avatar
    Join Date
    Apr 2004
    Location
    ny
    Posts
    382

    Help with regex please...

    I have to look over a string and remove anything that's not within the following ranges:

    a-z
    A-Z
    0-9
    _ (underscore)

    So any punctuation, spaces, or other characters that are not acceptable in a filename would be removed from the string. Underscores would be permitted, however.

    I know it only takes a few minutes with this regex stuff once you get the hang of it...but until then...your genius is greatly appreciated!



    Cranjled

  2. #2
    Who turned the lights out !! Mandes's Avatar
    Join Date
    May 2005
    Location
    S.W. France
    Posts
    2,496
    ^[a-zA-Z0-9_]*$
    A Little Knowledge Is A Very Dangerous Thing.......
    That Makes Me A Lethal Weapon !!!!!!!!

    Contract PHP Programming

  3. #3
    SitePoint Zealot santanu's Avatar
    Join Date
    Oct 2003
    Location
    india
    Posts
    138
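    // eregi() was removed in PHP 7; preg_match() is the replacement.
    // The class already lists both cases, so no case-insensitive flag is needed.
    return preg_match('/^[a-zA-Z0-9_]+$/', $string) === 1;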

  4. #4
    SitePoint Wizard stereofrog's Avatar
    Join Date
    Apr 2004
    Location
    germany
    Posts
    4,324
    Quote Originally Posted by cranjled
    I have to look over a string and remove anything that's not within the following ranges:

    a-z
    A-Z
    0-9
    _ (underscore)
    Try this
    PHP Code:
    preg_replace("~\W+~", '', $a);

  5. #5
    SitePoint Addict cranjled's Avatar
    Join Date
    Apr 2004
    Location
    ny
    Posts
    382
    Thanks, you guys! Sheesh...twenty characters can really get you bangin' your head... I really appreciate the responses.

    Question: Is one of these methods preferred over another? And why?

    Thanks again!
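
    On the question of which method to prefer: the replies above do two different jobs. The anchored pattern only tests whether a string is already clean, while preg_replace() rewrites it. A minimal sketch of the difference; the function names is_clean() and strip_unsafe() are made up for illustration and are not from any reply:

```php
<?php
// Hypothetical helpers for illustration; the names are not from the thread.

// Validation: true only if every character is a letter, digit or underscore.
// \z is used instead of $ so that a trailing newline is not accepted.
function is_clean(string $s): bool
{
    return preg_match('/^[a-zA-Z0-9_]+\z/', $s) === 1;
}

// Sanitization: delete every run of characters outside that set.
function strip_unsafe(string $s): string
{
    // preg_replace() returns null on error; fall back to an empty string.
    return preg_replace('/[^a-zA-Z0-9_]+/', '', $s) ?? '';
}

var_dump(is_clean('my file (1).txt'));       // bool(false)
echo strip_unsafe('my file (1).txt'), "\n";  // myfile1txt
```

    Since the original request was to remove characters rather than reject the string, the replace form is probably the closer fit.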

  6. #6
    SitePoint Addict cranjled's Avatar
    Join Date
    Apr 2004
    Location
    ny
    Posts
    382
    stereofrog, that was exactly the trick I needed. It keeps letters, numbers and underscores and strips out all the rest. Most excellent!

    Thanks to all of you!

    Cranjled

  7. #7
    SitePoint Addict cranjled's Avatar
    Join Date
    Apr 2004
    Location
    ny
    Posts
    382
    Embarrassingly, I can't figure out how to add the dot character to the allowed characters...and furthermore, I'd do a sad smiley here, but I don't even know the right character combo for that...

    The example that stereofrog posted worked nicely for the purpose, but now I need to add the dot character.

    Can someone amend the following so that it allows the dot character too? (Just that one character, though!)

    Code:
     preg_replace("~\W+~", '', $a);
    Any help is greatly appreciated!

    Thanks,

    Cranjled

  8. #8
    . shoooo... silver trophy logic_earth's Avatar
    Join Date
    Oct 2005
    Location
    CA
    Posts
    9,013
    PHP Code:
    preg_replace('~[^\w\.]+~', '', $a);
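
    The trick here is the negated character class: [^\w\.] matches anything that is neither a word character nor a dot, and those runs are replaced with nothing. A small sketch with a made-up input string:

```php
<?php
$a = 'my photo, v2.final!.jpg';   // made-up sample input

// Inside [...] a dot is literal, so \. and . behave the same here.
$escaped   = preg_replace('~[^\w\.]+~', '', $a);
$unescaped = preg_replace('~[^\w.]+~', '', $a);

echo $escaped, "\n";                 // myphotov2.final.jpg
var_dump($escaped === $unescaped);   // bool(true)
```

    Note that this still lets through names such as ".." or ".htaccess", so a filename check may need more than this one pattern.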
    Logic without the fatal effects.
    All code snippets are licensed under WTFPL.


  9. #9
    SitePoint Addict cranjled's Avatar
    Join Date
    Apr 2004
    Location
    ny
    Posts
    382
    Boy, do I feel goofy ... !

    Thanks so much!



    Cranjled

  10. #10
    SitePoint Addict cranjled's Avatar
    Join Date
    Apr 2004
    Location
    ny
    Posts
    382
    This regex works nicely for the intended purposes. With a bit of tweaking it could maybe be used in something else that I'm trying.

    The following code removes many characters that are not needed:

    Code:
     preg_replace("~\W+~", '', $a);
    So, I decided to try it for image verification purposes. What I'm doing is reading an image file via file_get_contents(), and stripping out unneeded characters. The regex above does most of the work, but does leave the string with many unwanted characters (such as ÿ, Ø, à, ÿ, þ, etc...). I guess these /are/ technically word characters, but since they are unneeded, can the pattern be amended to remove any non-English word characters?
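
    One way to do this, offered as a sketch rather than as the answer given in the thread: what \w matches can depend on the locale, so spelling out the ASCII ranges avoids the question entirely. The sample string below is made up:

```php
<?php
// Made-up sample containing high bytes like those found in binary image data.
$a = "abc\xFF\xD8\xE0_123\xFE";

// Explicit ASCII ranges: bytes such as 0xFF (ÿ in Latin-1) are removed
// whatever the current locale is.
$clean = preg_replace('~[^A-Za-z0-9_]+~', '', $a);

echo $clean, "\n"; // abc_123
```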

    Essentially, the goal is to read the file in and determine if it is a "real" gif, jpg or png image. I have performed extension and mimetype checks, but these are easily spoofed. To add an additional layer of assuredness, I would like to verify the filetypes by actually reading them into a string and determining from there. Can this also be spoofed? (And if so, what IS the best method to assure a valid image where only an image should be?)

    Thanks so much for following this thread!

    - Cranjled
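
    A hedged sketch of the "read the bytes and decide" idea described above, assuming the check is done on the leading signature bytes of the file. The helper name sniff_image_type() and its return values are hypothetical; the GIF, JPEG and PNG signatures are the standard ones:

```php
<?php
// Hypothetical helper; the name and return values are illustrative only.
// Returns 'gif', 'jpg' or 'png' based on the leading signature bytes, else null.
function sniff_image_type(string $bytes): ?string
{
    $signatures = [
        'gif' => ["GIF87a", "GIF89a"],
        'jpg' => ["\xFF\xD8\xFF"],
        'png' => ["\x89PNG\r\n\x1A\n"],
    ];
    foreach ($signatures as $type => $magics) {
        foreach ($magics as $magic) {
            // Compare only as many bytes as the signature is long.
            if (strncmp($bytes, $magic, strlen($magic)) === 0) {
                return $type;
            }
        }
    }
    return null;
}

// Usage with an assumed upload path (only the first 8 bytes are needed):
// $type = sniff_image_type(file_get_contents('/path/to/upload', false, null, 0, 8));

var_dump(sniff_image_type("GIF89a\x01\x00"));   // string(3) "gif"
var_dump(sniff_image_type("not an image"));     // NULL
```

    A matching signature only shows that the file starts like an image. Other data can still follow the header, so this can be spoofed too and is best treated as one more layer rather than proof. PHP's built-in getimagesize() reads the image header for you and may be a simpler starting point.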

  11. #11
    . shoooo... silver trophy logic_earth's Avatar
    Join Date
    Oct 2005
    Location
    CA
    Posts
    9,013
    Logic without the fatal effects.
    All code snippets are licensed under WTFPL.


  12. #12
    SitePoint Addict cranjled's Avatar
    Join Date
    Apr 2004
    Location
    ny
    Posts
    382


    Now, that's just what the doctor ordered! Thanks much!

