  1. #1
    Cups

    Localisation poser : how to update my regex?

    Here we go...

    I have a few stalwart regexes I have relied on, similar to:
    PHP Code:
    $str = "122 bc"; // letters, numbers or spaces only
    var_dump( preg_match('#^[a-z0-9 ]{3,20}$#i', $str) );
    // returns 1 as all is good
    So now I am handling funny French characters like this, and the regex has stopped doing its job.
    PHP Code:
    $str = "12é bc"; // letters, numbers or spaces only
    var_dump( preg_match('#^[a-z0-9 ]{3,20}$#i', $str) );
    // returns 0 - all is not good
    The encoding is UTF-8 and the languages will be western European, so how do I do the equivalent of that regex, i.e. how do I let through chars like é?

    I read this info that kyber kindly pointed me to, but the little pea in my head has stopped rolling round now: I cannot clearly see how to check for a range of permitted chars in multi-byte character sets.

    Starter for 10 for anyone?

  2. #2
    Salathe
    Look at using the \p{xx} escape sequences (manual). E.g. in your case, /^[\pL\d ]{3,20}$/D

    Bear in mind that the above will allow any "letter character", including non–western-european characters!
    Salathe
    Software Developer and PHP Manual Author.
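
One possible way to act on the caveat above is to swap the general letter property for a script property. This is only a sketch: it assumes that "western European" can be approximated by the Latin script, that the source file and the input are both UTF-8, and it adds the u modifier so the pattern is applied to characters rather than bytes. The `$pattern` variable name is purely illustrative.

```php
<?php
// Assumption: "western European" is approximated by the Latin script.
// \p{Latin} matches any letter from the Latin script (a-z, é, ç, ...).
// u = treat pattern and subject as UTF-8, D = $ matches only at the very end.
$pattern = '/^[\p{Latin}0-9 ]{3,20}$/Du';

var_dump(preg_match($pattern, '12é bc')); // int(1): Latin letters, digits, space
var_dump(preg_match($pattern, '12я bc')); // int(0): the Cyrillic letter is rejected
```

Note that the Latin script still covers more than French, for example Vietnamese or Turkish letters, so a stricter whitelist may be needed if that matters.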

  3. #3
    Cups
    $french = '123 Rué Oçà ?';
    var_dump( preg_match('/^[\pL\d ]{3,20}$/Du', $french ) );

    Thanks salathe, that works as expected now, I am so happy I asked.

    So: add the u switch for Unicode and, instead of checking for a-z (which I only used out of laziness, because I can never remember the base character classes and the escape char sometimes disappears when I post on here), use the fairly recent \p Unicode property escape sequences.

    Thanks again.
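
For anyone landing here later, the pieces from this thread could be wrapped up roughly as follows. It is a sketch rather than a definitive solution: the function name `isAllowedText` is made up for illustration, the pattern is the one posted above, and it assumes PHP 7 or later and a UTF-8 source file.

```php
<?php
// Hypothetical helper; the pattern is the one from this thread.
function isAllowedText(string $text): bool
{
    // u: pattern and subject are treated as UTF-8, so {3,20} counts characters.
    // D: $ matches only at the very end, not before a trailing newline.
    // preg_match() returns false on invalid UTF-8, hence the strict === 1.
    return preg_match('/^[\pL\d ]{3,20}$/Du', $text) === 1;
}

var_dump(isAllowedText('12é bc'));    // bool(true)
var_dump(isAllowedText("12é bc\n"));  // bool(false): D rejects the trailing newline
var_dump(isAllowedText("12\xE9 bc")); // bool(false): a lone Latin-1 byte is not valid UTF-8
```

The strict comparison matters because with the u modifier a badly encoded subject makes preg_match() return false instead of 0, and both would otherwise be treated the same by a loose check.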

