  1. #1
    MarketJunction (SitePoint Member)
    Join Date: Mar 2005 | Posts: 0

    Help please. Darn foreign chars.

    Hi,

    I have a string with foreign characters in it. It looks like this:

    ------
    Ask many of his listeners what they think about his sermons and they’ll quickly respond with only words of acclamation. Follow that questions with a request for what the sermon was about and you’re met with only blank stares.
    ------

    I have tried everything under the sun to remove the foreign characters, but no luck. The latest thing I tried was this function, and it does nothing.

    PHP Code:
    function unaccent($text) {
        static $search, $replace;
        if (!$search) {
            $search = $replace = array();
            // Get the HTML entities table into an array
            $trans = get_html_translation_table(HTML_ENTITIES);
            // Go through the entity mappings one by one
            foreach ($trans as $literal => $entity) {
                // Make sure we don't process any other characters such as fractions, quotes etc.
                if (ord($literal) >= 192) {
                    // Get the accented form of the letter
                    $search[] = $literal;
                    // Delete it (use $entity[1] instead to get e.g. 'E' from '&Eacute;')
                    $replace[] = "";
                }
            }
        }
        return str_replace($search, $replace, $text);
    }

    I don't know what to do now.

    Any help appreciated.

    Thanks
    Need gambling content? [Gambling Writer]
    Add Your Gambling Sites >> Gambling Website Directory
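    The ’ in the sample text is most likely not a foreign character at all. It is what the three UTF-8 bytes of a curly apostrophe (U+2019, bytes E2 80 99) look like when they are displayed as Windows-1252, a pattern typical of text pasted from Microsoft Word. Stripping accented letters, which is what unaccent() is built for, is a different job from mapping these typographic characters to ASCII. A minimal sketch of the latter, assuming the text is valid UTF-8 and PHP 7 or later (for the \u{...} escapes and type declarations); the function name smart_quotes_to_ascii() and the exact character list are illustrative assumptions:

```php
<?php
// Hypothetical helper: replace typographic characters that Word's
// AutoCorrect typically inserts with plain ASCII.
// Assumes $text is valid UTF-8 and PHP 7+.
function smart_quotes_to_ascii(string $text): string
{
    $map = [
        "\u{2018}" => "'",   // left single quotation mark
        "\u{2019}" => "'",   // right single quotation mark (curly apostrophe)
        "\u{201C}" => '"',   // left double quotation mark
        "\u{201D}" => '"',   // right double quotation mark
        "\u{2013}" => '-',   // en dash
        "\u{2014}" => '-',   // em dash
        "\u{2026}" => '...', // horizontal ellipsis
    ];
    // With an array argument, strtr() tries the longest keys first and
    // never re-scans text it has already replaced.
    return strtr($text, $map);
}

echo smart_quotes_to_ascii("they\u{2019}ll quickly respond"), "\n"; // they'll quickly respond
```

    If the text has already been double-encoded (the stored bytes themselves spell out ’), this map will not match, and the encoding has to be fixed first.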

  2. #2
    sid egg (Massimiliano Bruno Giordano)
    Join Date: Aug 2004 | Location: Canada | Posts: 1,280
    Echo ord("the character") for each character you want to replace.
    Then check whether ord() of each character in your string equals that number;
    if it does, replace it with a predefined replacement.


    It's a bit of a hack, I'm sure there's a better way.
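    The ord() approach above can be sketched as follows. ord() returns the value of a single byte, so in UTF-8 text one character such as ’ shows up as several numbers; those are the numbers to compare against. non_ascii_bytes() is a hypothetical helper name, and PHP 7 or later is assumed:

```php
<?php
// Hypothetical helper: map byte offset => ord() value for every byte
// outside the 7-bit ASCII range (0-127).
function non_ascii_bytes(string $text): array
{
    $found = [];
    $len = strlen($text); // strlen() counts bytes, not characters
    for ($i = 0; $i < $len; $i++) {
        $code = ord($text[$i]);
        if ($code > 127) {
            $found[$i] = $code;
        }
    }
    return $found;
}

$sample = "they\u{2019}ll";
print_r(non_ascii_bytes($sample));

// Once the numbers are known, rebuild the byte sequence with chr()
// and swap it for a predefined replacement.
$curly_apostrophe = chr(226) . chr(128) . chr(153);
echo str_replace($curly_apostrophe, "'", $sample), "\n"; // they'll
```

    For the apostrophe in the sample, print_r() lists offsets 4, 5 and 6 with the values 226, 128 and 153.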
    GamesLib.com - the slickest, most complete and
    easily navigable Flash games site on the web.

  3. #3
    MarketJunction (SitePoint Member)
    Join Date: Mar 2005 | Posts: 0
    What number are you referring to?

    The string is about 4,000 characters long. I tried looping over it character by character and keeping only characters that match a-z, 0-9 or a few punctuation marks, but that failed.
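    A byte-by-byte whitelist loop along those lines can be sketched as follows, assuming the goal is to keep only printable ASCII (codes 32 to 126) plus tab, newline and carriage return; keep_printable_ascii() is a hypothetical name. Because it tests bytes rather than characters, every byte of a multi-byte character is dropped, so nothing is left half-removed:

```php
<?php
// Hypothetical helper: keep only printable ASCII (32-126) plus tab (9),
// line feed (10) and carriage return (13); every other byte is dropped.
function keep_printable_ascii(string $text): string
{
    $out = '';
    $len = strlen($text); // byte length
    for ($i = 0; $i < $len; $i++) {
        $code = ord($text[$i]);
        if (($code >= 32 && $code <= 126) || $code === 9 || $code === 10 || $code === 13) {
            $out .= $text[$i];
        }
    }
    return $out;
}

echo keep_printable_ascii("and they\u{2019}ll respond"), "\n"; // and theyll respond
```

    This deletes characters rather than converting them, so "they’ll" becomes "theyll".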

  4. #4
    aamonkey (SitePoint Guru)
    Join Date: Sep 2004 | Location: Kansas | Posts: 953
    Here's something to try:
    PHP Code:
    $str = '~`!@#$%^&*()_+}|{":\?><,./\';\][=-and they’ll quickly respond with only words of acclamation. Follow that questions with a request for what the sermon was about and you’re';

    // Characters to keep in addition to word characters and whitespace
    $allow_chars = '~`!@#$%^&*\(\)_\-\+=|\}\{\[\]"\'\:;\?\/><\,\.\\\\';

    // Delete every run of characters that is not in the allowed set
    $new_str = preg_replace("/[^\w\s" . $allow_chars . "]+/", "", $str);

    echo $new_str;
    Just a side note: I was having a somewhat similar problem a while back with people copying and pasting from Microsoft Word into a text input in a form on my site, which in turn got stored in the database. My validation did not check for these special Unicode characters. Once they were in the (MySQL) database, I tried everything I could to convert them to display properly, but for some reason, once they were stored in the table's text column, the characters did not match up to any extended ASCII table, and even a str_replace() with the offending characters COPIED AND PASTED into it did not work. The only solution I could find (and what I should have done in the first place) was to remove or convert these characters BEFORE they reached the database.

    Of course, if you're not storing this text in a DB, ignore the last paragraph.

  5. #5
    MarketJunction (SitePoint Member)
    Join Date: Mar 2005 | Posts: 0
    Yeah, I am storing it in MySQL.

    So you are saying that I should run a check on my post form and convert first? I was not aware it mattered either way.

    In your example would [=-and they blah blah

    be the same as:

    [=-$string

  6. #6
    aamonkey (SitePoint Guru)
    Join Date: Sep 2004 | Location: Kansas | Posts: 953
    Quote Originally Posted by MarketJunction
    So you are saying that I should run a check on my post form and convert first? I was not aware it mattered either way.
    I wasn't aware of that either, and I still have not found any documentation on it. All I know is that I was able to replace the characters before they were entered into the database, but not after.

    Quote Originally Posted by MarketJunction
    In your example would [=-and they blah blah

    be the same as:

    [=-$string
    You can ignore that whole $str=......
    I just put all the special characters in there to illustrate that they would be left alone while the non-ASCII characters would be deleted. Just add or delete any special characters in the $allow_chars variable that you do or don't want to accept. Note that most Perl regular expression special characters are escaped with a backslash.
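    Rather than escaping the allowed characters by hand, PHP's preg_quote() can prepare them for the character class; its second argument names the delimiter so that / is escaped as well. A sketch of the same idea as the example above, where strip_disallowed() is a hypothetical name and PHP 7 or later is assumed. Without the /u modifier the pattern works on bytes and, in the default C locale, \w and \s match only ASCII, so every byte of ’ is removed:

```php
<?php
// Hypothetical helper: delete everything except word characters,
// whitespace and the punctuation listed in $allowed.
function strip_disallowed(string $text, string $allowed): string
{
    // preg_quote() escapes regex metacharacters; passing '/' also
    // escapes the pattern delimiter.
    $pattern = '/[^\w\s' . preg_quote($allowed, '/') . ']+/';
    return preg_replace($pattern, '', $text);
}

// Same punctuation as $allow_chars above, written without manual escaping
// (only the quote and the backslash need escaping in a single-quoted string).
$allowed = '~`!@#$%^&*()_-+=|}{[]"\':;?/><,.\\';
echo strip_disallowed("[=-and they\u{2019}ll respond", $allowed), "\n"; // [=-and theyll respond
```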

  7. #7
    MarketJunction (SitePoint Member)
    Join Date: Mar 2005 | Posts: 0
    OK, thanks. I think I will try altering the text before it is stored and scrap everything else I have. Too many hours wasted.

    I was able to get the Word characters out, but the foreign characters won't budge.

    Oh well.

  8. #8
    SitePoint Wizard
    Join Date: Jan 2004 | Location: 3rd rock from the sun | Posts: 1,005
    @aamonkey

    Ditto on the lost sanity and time. I came to the same conclusion as you.

    There's a nice function, strictify(), in the user notes of the manual page at www.php.net/chr.

    It deals with most of Word's bad characters, like curly quotes, but there are lots more, really...

