  1. #1
    WorldNews (SitePoint Wizard)
    Join Date: Nov 2007, Posts: 1,033

    PHP function to tell whether a string is made up entirely of acceptable letters and numbers

    Hello,

    Is there a PHP function, or some code you can recommend, that takes a string and returns true
    if the string is made up only of valid letters and numbers, and false otherwise?

    Keep in mind that the string may contain German umlauts and other special characters, such as those in
    Begrüßung, and it may also contain similar French, Italian, and Spanish characters.

    So basically, if the string contains any non-word characters such as:
    , ; / ? " ' : @#$%^&*()-_=+!`~\|<>.,\

    we want it to return false; otherwise it should return true. Blank spaces are OK.

    Regards,

    Anoox search engine volunteer

    www.anoox.com

  2. #2
    ServerStorm (Foozle Reducer)
    Join Date: Feb 2005, Location: Burlington, Canada, Posts: 2,699
    Hi WorldNews,

    This function, which uses preg_match($pattern, $string), seems to do what you need.

    PHP Code:
    $string = 'This has a SOB!';
    echo isStringValid($string);

    function isStringValid($string){
       // One alternative per banned character; [\.]|[\,] was missing its "|" before,
       // which meant a lone "." was never caught.
       $pattern = "/[\*\]|[,]|[;]|[\/]|[\?]|[\"]|[\']|[:]|[@]|[#]|[$]|[%]|[\^]|[&]|[\*]|[\(]|[\)]|[\-]|[_]|[=]|[\+]|[!]|[\`]|[~]|[\\\]|[\!]|[\|]|[<]|[>]|[\.]|[\,]/";
       $has_banned_char = preg_match($pattern, $string);
       if($has_banned_char == 0){
           return 0;
       } else {
           return 1;
       }
    }
    /* Outputs 1 for having a banned char */
    $string = 'This has a SOB';
    echo isStringValid($string);
    /* Outputs 0 for not having a banned char */

    The only character I am not sure about is '\'. It blew up my script when I tried this out; it is currently written as [\\\], but I am not sure that is right.

    Hope this helps.

    Steve 

  3. #3
    ScallioXTX (Utopia, Inc.)
    Join Date: Aug 2008, Location: The Netherlands, Posts: 9,097
    You can approach this in two ways: either (1) check that all characters in the input are allowed and raise an error if they're not, or (2) check whether the input contains characters that are not allowed and raise an error if it does.

    I usually opt for option 1 because it's easier to control, and in general there are far fewer characters you want than characters you don't want.

    So it's usually easiest to grab a character table (ASCII, plus Latin-1 for the accented letters), find the characters you want to allow, and filter using those ranges.

    PHP Code:
    function checkInput($input) {
        // Ranges are single-byte (ISO-8859-1) values, checked one byte at a time.
        $allowed = array(
            array(32, 32),   // space
            array(48, 57),   // 0-9
            array(65, 90),   // A-Z
            array(97, 122),  // a-z
            array(192, 214), // À-Ö
            array(216, 246), // Ø-ö
            array(248, 255), // ø-ÿ
        );

        foreach (str_split($input) as $char) {
            $ord = ord($char);
            foreach ($allowed as $range) {
                list($begin, $end) = $range;
                if ($ord >= $begin && $ord <= $end) {
                    continue 2; // this byte is allowed; check the next one
                }
            }
            return false; // byte fell outside every allowed range
        }

        return true;
    }
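    A note on the ranges above: they are single-byte ISO-8859-1 values, so they only line up with accented letters when the input is Latin-1. If the input is UTF-8 (an assumption about the setup), one way to keep the same whitelist idea is to compare Unicode code points instead of bytes, since U+0000 to U+00FF coincide with ISO-8859-1. This sketch assumes PHP 7.4+ with the mbstring extension (for mb_str_split() and mb_ord()); the name checkInputUtf8() is hypothetical.

```php
<?php
// Hypothetical helper: same whitelist ranges as checkInput(), but compared
// against Unicode code points rather than raw bytes.
// Assumes UTF-8 input and PHP 7.4+ with mbstring (mb_str_split, mb_ord).
function checkInputUtf8(string $input): bool
{
    // U+0000..U+00FF match ISO-8859-1, so the byte ranges carry over unchanged.
    // 215 (×) and 247 (÷) are deliberately left out, as in the original table.
    $allowed = [
        [32, 32],   // space
        [48, 57],   // 0-9
        [65, 90],   // A-Z
        [97, 122],  // a-z
        [192, 214], // À-Ö
        [216, 246], // Ø-ö
        [248, 255], // ø-ÿ
    ];

    foreach (mb_str_split($input, 1, 'UTF-8') as $char) {
        $code = mb_ord($char, 'UTF-8'); // code point, e.g. 252 for 'ü'
        foreach ($allowed as [$begin, $end]) {
            if ($code >= $begin && $code <= $end) {
                continue 2; // allowed; move on to the next character
            }
        }
        return false; // outside every allowed range
    }
    return true;
}

var_dump(checkInputUtf8('Begrüßung'));      // bool(true)
var_dump(checkInputUtf8('Begrüßung@work')); // bool(false)
```

    The table still stops at U+00FF, so a letter such as the French œ (U+0153) would be rejected; it would need another range, or a Unicode property class such as \p{L}.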

    Rémon - Hosting Advisor

    SitePoint forums will switch to Discourse soon! Make sure you're ready for it!

    Minimal Bookmarks Tree
    My Google Chrome extension: browsing bookmarks made easy

  4. #4
    ScallioXTX (Utopia, Inc.)
    @ServerStorm:

    You don't need to put those characters in character classes, and the | characters are superfluous for this purpose too.

    This would also work:
    PHP Code:
    "/\*,;\/\?\"\':@#\$%\^&\(\)\-_=+!\`~\\\\!\|<>\./" 


    Oh, and for a backslash you need four backslashes in the expression, \\\\ (I forgot why)
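    The four backslashes come from two layers of escaping: the PHP string literal turns \\\\ into \\, and PCRE then reads \\ as one literal backslash. A small sketch of that, plus preg_quote(), which escapes regex metacharacters (and the delimiter passed as its second argument) so that only the PHP layer is left to think about. The $banned string is simply the character list from the original question.

```php
<?php
// '\\\\' in PHP source is a 4-character literal that PHP turns into the
// 2-character string \\ ; PCRE then reads \\ as "one literal backslash".
$pattern = '/\\\\/';
echo strlen('\\\\'), "\n"; // 2: what PCRE actually receives

var_dump(preg_match($pattern, 'C:\\temp'));  // int(1): subject holds one backslash
var_dump(preg_match($pattern, 'no slash'));  // int(0)

// preg_quote() handles the regex-level escaping, including "\" and "-",
// so the characters can be dropped straight into a character class.
$banned = ',;/?"\':@#$%^&*()-_=+!`~\\|<>.';
$class  = '/[' . preg_quote($banned, '/') . ']/';

var_dump(preg_match($class, 'back\\slash')); // int(1)
```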

  5. #5
    ServerStorm (Foozle Reducer)
    Thanks ScallioXTX,

    I am just learning how to use regex, so I have been challenging myself to try to help, but alas I am not an expert in this area yet. I originally tried something similar to what you recommend, but preg_match() threw an error. After learning that the \ character causes problems, I realize it was most likely the culprit, and if I had fixed it, it should have worked.

    With that said, I like your recommended 'accepted values' approach, as I suspect it is more efficient than the regex.

    Thanks for taking the time to teach a little.

    Regards,
    Steve

  6. #6
    WorldNews (SitePoint Wizard)
    Hi,

    I tried your checkInput() idea, but it does not work for German, French, etc. letters that are not English letters,
    for example:

    Begrüßung

    Your code returns false for this word, as if it had bad characters, but of course those are
    all valid German characters.

    Any suggestions to get around this shortcoming?
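    A likely cause, assuming both the page and the PHP file are UTF-8: str_split() splits by byte, not by character, and in UTF-8 ü and ß are two bytes each. This sketch prints those bytes; 188 and 159 fall outside every range in checkInput(), which is why it returns false for Begrüßung.

```php
<?php
// Assumes this file is saved as UTF-8, so 'ü' and 'ß' are two bytes each.
$word = 'Begrüßung';

// strlen() counts bytes: 11 here, for 9 visible characters.
echo strlen($word), "\n"; // 11

// The byte values that str_split() + ord() actually see for 'ü' and 'ß'.
$bytes = array_map('ord', str_split('üß'));
echo implode(' ', $bytes), "\n"; // 195 188 195 159
```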



  7. #7
    ServerStorm (Foozle Reducer)
    Hi,

    Using ScallioXTX's regex pattern, the following string did not work (it was reported as valid), but using my original regex it does:
    PHP Code:
    $string = 'Begrüßung@work';
    echo isStringValid($string);

    function isStringValid($string){
       $pattern = "/\*,;\/\?\"\':@#\$%\^&\(\)\-_=+!\`~\\\\!\|<>\./";
       $has_banned_char = preg_match($pattern, $string);
       if($has_banned_char == 0){
           return 1; // is valid
       } else {
           return 0; // is not valid
       }
    }

    This did not work, but this did:
    PHP Code:
    $string = 'Begrüßung@work';
    echo isStringValid($string);

    function isStringValid($string){
       $pattern = "/[\*\]|[,]|[;]|[\/]|[\?]|[\"]|[\']|[:]|[@]|[#]|[$]|[%]|[\^]|[&]|[\*]|[\(]|[\)]|[\-]|[_]|[=]|[\+]|[!]|[\`]|[~]|[\\\\]|[\!]|[\|]|[<]|[>]|[\.]|[\,]/";
       $has_banned_char = preg_match($pattern, $string);
       if($has_banned_char == 0){
           return 1; // is valid
       } else {
           return 0; // is not valid
       }
    }

    Using the string:
    PHP Code:
    $string = 'Deutsch übersetzen scheint zu funktionieren in der deutschen ok, wenn auch nicht sicher, besser versuchen http://translate.google.ca/?hl=en&tab=wT';
    /* Outputs 0 (is not valid) */
    Once I changed the string by removing the web address and the commas, it showed as valid:
    PHP Code:
    <?php
    $string = 'Deutsch übersetzen scheint zu funktionieren in der deutschen ok wenn auch nicht sicher besser versuchen';
    I am not sure why ScallioXTX's expression does not work but mine does... it seems to be a lucky happenstance, because I didn't know any better. My expression just says: match ONE of the characters in the brackets. If preg_match() finds a match it returns 1 (it stops at the first match), so 0 means no match was found and the string is valid.
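    To make the return value concrete: preg_match() stops at the first match and returns 1 (found), 0 (not found) or false (error); counting every match is what preg_match_all() does. A quick sketch with an arbitrary two-character class:

```php
<?php
// preg_match() stops at the first match; preg_match_all() counts them all.
$pattern = '/[,@]/';
$subject = 'a,b,c@d';

var_dump(preg_match($pattern, $subject));     // int(1)
var_dump(preg_match_all($pattern, $subject)); // int(3)
var_dump(preg_match($pattern, 'abc'));        // int(0)
```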

    Hope this works for you.

    Steve
    Last edited by ScallioXTX; May 29, 2012 at 11:43.

  8. #8
    ScallioXTX (Utopia, Inc.)
    Right, I forgot that all my characters have to be stuffed into a character class...

    Okay, here we go:

    PHP Code:
    function isStringValid($string) {
       $pattern = "~[*,;/\?\"\':@#\$%\^&\(\)_=+!\]`\~\\\\!\|<>\.-]~";
       return 0 === preg_match($pattern, $string);
    }

    $string = 'Begrüßung@work';
    var_dump(isStringValid($string)); // false

    $string = '\\';
    var_dump(isStringValid($string)); // false

    $string = 'abcdefghijklmnopqrstuvwxyz ABCDEFGHIJKLMNOPQRSTUVWXYZ 01234567890 Begrüßung';
    var_dump(isStringValid($string)); // true

  9. #9
    ServerStorm (Foozle Reducer)
    This is easier to read and involves less processing. Nice, you got it working!

    This
    PHP Code:
    return 0 === preg_match($pattern, $string);
    is better, too.
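    One reason the strict comparison is safer: preg_match() returns false on an error, and in PHP false == 0 is true, so a "== 0" check would treat a failed match as "no banned characters". This sketch forces such an error using the /u modifier and a subject that is not valid UTF-8 (the thread's pattern has no /u, so this is only one way to trigger a failure).

```php
<?php
// With /u, a subject that is not valid UTF-8 makes preg_match() fail
// and return false instead of 0 or 1.
$result = preg_match('/[@#]/u', "Begr\xFC\xDFung"); // Latin-1 bytes, invalid UTF-8

var_dump($result);                                   // bool(false)
var_dump($result == 0);                              // bool(true): loose check says "valid"
var_dump($result === 0);                             // bool(false): strict check does not
var_dump(preg_last_error() === PREG_BAD_UTF8_ERROR); // bool(true)
```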

    Steve

  10. #10
    felgall (Programming Since 1978)
    Join Date: Sep 2005, Location: Sydney, NSW, Australia, Posts: 16,868
    Wouldn't it just be simpler to use ^[\W\s]+$ to accept letters, numbers and whitespace and just reject everything else? The characters you want to accept will always be far fewer than the thousands of characters you don't want.
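    For comparison, the whitelist this seems to be aiming at is usually written with a lowercase \w (word characters) rather than \W (non-word characters). A sketch assuming UTF-8 input: \p{L} and \p{N} are PCRE's Unicode letter and number properties, which cover ü, ß, é and so on, and \A...\z anchor the whole string (a bare $ would also let a trailing newline through). The helper name isWordsOnly() is hypothetical.

```php
<?php
// Whitelist: the whole string must be letters, numbers and whitespace.
// \p{L} = any Unicode letter, \p{N} = any Unicode number; /u = UTF-8 mode.
function isWordsOnly(string $s): bool // hypothetical helper
{
    return preg_match('/\A[\p{L}\p{N}\s]+\z/u', $s) === 1;
}

var_dump(isWordsOnly('Begrüßung 2012')); // bool(true)
var_dump(isWordsOnly('Begrüßung@work')); // bool(false)
var_dump(isWordsOnly('snake_case'));     // bool(false): "_" is neither \p{L} nor \p{N}

// \w always includes the underscore, so the \w form lets it through.
var_dump(preg_match('/\A[\w\s]+\z/u', 'snake_case')); // int(1)
```

    Since the original question lists "_" as unwanted, the spelled-out \p{L}\p{N} class is the stricter choice. Note that \s also accepts tabs and newlines; a literal space is tighter if only blanks should pass.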
    Stephen J Chapman


  11. #11
    ServerStorm (Foozle Reducer)
    Quote Originally Posted by felgall View Post
    Wouldn't it just be simpler to use ^[\W\s]+$ to accept letters, numbers and whitespace and just reject everything else. The characters you want to accept will always be far smaller than the thousands of characters you don't want.
    Hi felgall,
    Does '/^[\W\s]+$/' include all letters, including those found in German and/or French? If so, this is even better.

    Regards,
    Steve

  12. #12
    WorldNews (SitePoint Wizard)
    Hi,

    First, what is ^[\W\s]+$ ?

    Second, I am waiting for the answer to your question about whether this will handle German umlaut characters and similar non-English French characters.



  13. #13
    StarLion (Keeper of the SFL)
    Join Date: Feb 2006, Location: Atlanta, GA, USA, Posts: 3,748
    fel: The trick is that he specified he would be accepting multiple foreign languages, which precludes using a specific locale to identify word characters. I thought the same...

    Incidentally, fel's pattern will only match if the ENTIRE string consists of non-word characters. You want a one-and-done check, so the 'corrected' pattern would simply be ~\W~, and you'd take the inverse of the result to determine validity (a "valid" string would NOT match).
    \W means "any non-word character". Word characters are defined by your locale settings.
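    A sketch of this "valid means no match" idea. One catch: the space is itself a non-word character, so a bare \W would reject 'zwei Wörter'. The version below negates an explicit class of letters, numbers and a space instead, and uses /u so the letter test is Unicode-aware rather than locale-dependent (assumes UTF-8 input; hasOnlyAllowedChars() is a hypothetical name).

```php
<?php
// Valid = the string contains NO character outside letters, numbers, space.
// [^\p{L}\p{N} ] is used instead of \W, because \W also matches the space.
function hasOnlyAllowedChars(string $s): bool // hypothetical helper
{
    return preg_match('~[^\p{L}\p{N} ]~u', $s) === 0;
}

var_dump(preg_match('~\W~u', 'zwei Wörter'));   // int(1): the space counts as \W
var_dump(hasOnlyAllowedChars('zwei Wörter'));    // bool(true)
var_dump(hasOnlyAllowedChars('Begrüßung@work')); // bool(false)
```

    Because preg_match() returns false on invalid UTF-8, the strict === 0 check also reports such input as invalid.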
    Never grow up. The instant you do, you lose all ability to imagine great things, for fear of reality crashing in.

  14. #14
    WorldNews (SitePoint Wizard)
    Hi,

    Just wanted to let you all know that, after some back and forth, I chose this code:

    function isStringValid($string)
    {
        $pattern = "~[*,;/\?\"\':@#\$%\^&\(\)_=+!\]`\~\\\\!\|<>\.-]~";
        return 0 === preg_match($pattern, $string);
    }

    This offers the best compromise: it allows German, French, etc. words while rejecting the non-word
    characters we listed.
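    One trade-off worth knowing with this blacklist: it only rejects the characters it lists. A quick check, reusing the chosen function unchanged, shows some characters that are not in the class (a bracket, braces, a tab, € and §) and therefore still come back as valid.

```php
<?php
// The function chosen in the thread, pattern copied unchanged.
function isStringValid($string)
{
    $pattern = "~[*,;/\?\"\':@#\$%\^&\(\)_=+!\]`\~\\\\!\|<>\.-]~";
    return 0 === preg_match($pattern, $string);
}

// None of these characters appear in the class, so each string passes.
foreach (['a[b', 'a{b}', "tab\there", '100€', '§ 5'] as $s) {
    var_dump(isStringValid($s)); // bool(true) each time
}
```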

    Cheers


  15. #15
    ServerStorm (Foozle Reducer)
    Glad you found the one that worked best for you. Felgall's way was nice but a little harder for a novice to understand, so I'm glad you went with ScallioXTX's, as it was my preference too!

    Regards,
    Steve

