If you haven't already read them, check the WACT Unicode notes. There are some good extra Unicode functions in DokuWiki, and I had a skim through this project when I was messing around with UTF-8.
With regard to the functions listed:
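mysql_real_escape_string will work OK, as long as you have set the DB connection encoding to UTF-8. Set it with mysql_set_charset rather than a SET NAMES query, so that the escaping function itself knows which encoding the connection uses.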
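trim is OK too, as long as you don't pass in Unicode characters to remove (i.e. it's OK with whitespace and newlines). You can write your own mb_*trim replacements (but these will be slower). Note that PHP 8.4 and later ship native mb_trim, mb_ltrim and mb_rtrim, so the definitions here are only needed on older versions: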
PHP Code:
/**
 * Unicode aware replacement for ltrim.
 *
 * Trimming can corrupt a Unicode string by removing single bytes from a
 * multi-byte sequence. Used in a default manner, ltrim is UTF-8 safe, but
 * with the optional charlist variable specified it can corrupt strings.
 *
 * @see ltrim
 * @param string $str string to trim
 * @param string $charlist list of characters to trim
 * @return string trimmed string
 */
function mb_ltrim($str, $charlist = '')
{
    if (strlen($charlist) == 0) {
        // No charlist: the default whitespace set is single-byte, so ltrim is safe.
        return ltrim($str);
    } else {
        // Escape the characters for use inside a character class, then match
        // whole UTF-8 characters (not bytes) thanks to the /u modifier.
        $charlist = preg_quote($charlist, '#');
        return preg_replace('#^[' . $charlist . ']+#u', '', $str);
    }
}
/**
 * Unicode aware replacement for rtrim.
 *
 * @see rtrim
 * @param string $str string to trim
 * @param string $charlist list of characters to trim
 * @return string trimmed string
 */
function mb_rtrim($str, $charlist = '')
{
    if (strlen($charlist) == 0) {
        return rtrim($str);
    } else {
        // Same approach as mb_ltrim, anchored at the end of the string.
        $charlist = preg_quote($charlist, '#');
        return preg_replace('#[' . $charlist . ']+$#u', '', $str);
    }
}
/**
 * Unicode aware replacement for trim.
 *
 * @see trim
 * @param string $str string to trim
 * @param string $charlist list of characters to trim
 * @return string trimmed string
 */
function mb_trim($str, $charlist = '')
{
    if (strlen($charlist) == 0) {
        return trim($str);
    } else {
        return mb_ltrim(mb_rtrim($str, $charlist), $charlist);
    }
}
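To see why the charlist case matters, here is a small sketch comparing the built-in trim with a regex-based trim on the same string. The helper name utf8_trim_sketch is made up for this example (it is not one of the functions above, and is named differently so it can't clash with anything already defined); it just uses the same /u regex technique in a single pass.

```php
<?php
// Hypothetical helper for illustration only: trims the given characters from
// both ends using a /u regex, the same technique as the replacements above.
function utf8_trim_sketch($str, $charlist)
{
    $class = preg_quote($charlist, '#');
    return preg_replace('#^[' . $class . ']+|[' . $class . ']+$#u', '', $str);
}

$str = "\xC3\xA0bc\xC3\xA9"; // "àbcé" in UTF-8: à = C3 A0, é = C3 A9

// Byte-wise trim: the charlist "é" is treated as the two bytes C3 and A9,
// so the leading C3 of "à" is stripped as well, leaving a stray A0 byte.
$broken = trim($str, "\xC3\xA9");

// Character-wise trim: only the whole "é" at the end is removed.
$clean = utf8_trim_sketch($str, "\xC3\xA9");

echo bin2hex($broken), "\n"; // a06263 (no longer valid UTF-8)
echo bin2hex($clean), "\n";  // c3a06263 ("àbc")
```

The first result starts with a lone continuation byte, which is exactly the kind of corruption the docblock warns about.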
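wordwrap and nl2br should be OK, as spaces and line breaks are unique within UTF-8 (their bytes never occur inside a multi-byte character). The one thing to watch is wordwrap with its cut parameter set to true: it counts bytes, so it can split a multi-byte character, and the width is measured in bytes rather than characters either way.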
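A quick way to check this for yourself is sketched below. The helper is_valid_utf8_sketch is a made-up name for this example; it relies on the fact that PCRE validates the subject when the /u modifier is used. The sample strings are arbitrary.

```php
<?php
// Hypothetical helper: true when $str is valid UTF-8. With the /u modifier
// preg_match returns false (not 1) if the subject is not valid UTF-8.
function is_valid_utf8_sketch($str)
{
    return preg_match('//u', $str) === 1;
}

// "café naïve", a newline, then "résumé"
$text = "caf\xC3\xA9 na\xC3\xAFve\nr\xC3\xA9sum\xC3\xA9";

// nl2br only inserts "<br />" before the newline byte.
$withBreaks = nl2br($text);

// Default wordwrap only breaks at spaces, so characters stay intact.
$wrapped = wordwrap($text, 5, "\n");

// With cut = true a long word is split every 3 bytes, which lands in the
// middle of the 2-byte "é" characters.
$long = str_repeat("\xC3\xA9", 5);
$cut = wordwrap($long, 3, "\n", true);

var_dump(is_valid_utf8_sketch($withBreaks)); // bool(true)
var_dump(is_valid_utf8_sketch($wrapped));    // bool(true)
var_dump(is_valid_utf8_sketch($cut));        // bool(false)
```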
For strstr you can use the mbstring replacement, mb_strstr.
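A minimal usage sketch, assuming the mbstring extension is loaded (the code checks for it and skips the call otherwise). The sample strings are made up. As far as I can tell, for valid UTF-8 input the plain strstr gives the same result, since a valid needle can only match on a character boundary, so the mb_ version mainly matters for other multi-byte encodings.

```php
<?php
$haystack = "\xE6\x97\xA5\xE6\x9C\xAC\xE8\xAA\x9E text"; // "日本語 text"
$needle   = "\xE8\xAA\x9E";                               // "語"

// Byte-oriented search.
$plain = strstr($haystack, $needle);

// Encoding-aware search; null here just means mbstring is not available.
$multi = function_exists('mb_strstr')
    ? mb_strstr($haystack, $needle, false, 'UTF-8')
    : null;

var_dump($plain === "\xE8\xAA\x9E text"); // bool(true)
var_dump($multi === null || $multi === $plain);
```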