<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	>
<channel>
	<title>Comments on: Hot PHP UTF-8 tips</title>
	<atom:link href="http://www.sitepoint.com/blogs/2006/08/10/hot-php-utf-8-tips/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.sitepoint.com/blogs/2006/08/10/hot-php-utf-8-tips/</link>
	<description></description>
	<pubDate>Fri, 25 Jul 2008 00:49:16 +0000</pubDate>
	<generator>http://wordpress.org/?v=2.5</generator>
		<item>
		<title>By: erkekjetter</title>
		<link>http://www.sitepoint.com/blogs/2006/08/10/hot-php-utf-8-tips/#comment-733195</link>
		<dc:creator>erkekjetter</dc:creator>
		<pubDate>Mon, 26 May 2008 14:04:30 +0000</pubDate>
		<guid isPermaLink="false">http://www.sitepoint.com/blogs/?p=1676#comment-733195</guid>
		<description>You can find an extended Unicode upper/lower case mapping table at
http://publib.boulder.ibm.com/infocenter/systems/index.jsp?topic=/nls/rbagsuppertolowermaptable.htm
Might be useful for someone; it certainly was for me.</description>
		<content:encoded><![CDATA[<p>You can find an extended Unicode upper/lower case mapping table at<br />
<a href="http://publib.boulder.ibm.com/infocenter/systems/index.jsp?topic=/nls/rbagsuppertolowermaptable.htm" rel="nofollow">http://publib.boulder.ibm.com/infocenter/systems/index.jsp?topic=/nls/rbagsuppertolowermaptable.htm</a><br />
Might be useful for someone; it certainly was for me.</p>]]></content:encoded>
	</item>
	<item>
		<title>By: bietchetlien</title>
		<link>http://www.sitepoint.com/blogs/2006/08/10/hot-php-utf-8-tips/#comment-214514</link>
		<dc:creator>bietchetlien</dc:creator>
		<pubDate>Thu, 29 Mar 2007 18:14:06 +0000</pubDate>
		<guid isPermaLink="false">http://www.sitepoint.com/blogs/?p=1676#comment-214514</guid>
		<description>How do I explode a Unicode string? Thanks</description>
		<content:encoded><![CDATA[<p>How do I explode a Unicode string? Thanks</p>]]></content:encoded>
	</item>
	<item>
		<title>By: monul</title>
		<link>http://www.sitepoint.com/blogs/2006/08/10/hot-php-utf-8-tips/#comment-67475</link>
		<dc:creator>monul</dc:creator>
		<pubDate>Fri, 13 Oct 2006 08:15:44 +0000</pubDate>
		<guid isPermaLink="false">http://www.sitepoint.com/blogs/?p=1676#comment-67475</guid>
		<description>hehe! hacking encodings - the eternal PHP theme!</description>
		<content:encoded><![CDATA[<p>hehe! hacking encodings - the eternal PHP theme!</p>]]></content:encoded>
	</item>
	<item>
		<title>By: links for 2006-08-11 &#187; D.C Life</title>
		<link>http://www.sitepoint.com/blogs/2006/08/10/hot-php-utf-8-tips/#comment-45131</link>
		<dc:creator>links for 2006-08-11 &#187; D.C Life</dc:creator>
		<pubDate>Sat, 12 Aug 2006 13:49:18 +0000</pubDate>
		<guid isPermaLink="false">http://www.sitepoint.com/blogs/?p=1676#comment-45131</guid>
		<description>[...] SitePoint Blogs » Hot PHP UTF-8 tips (tags: read php) [...]</description>
		<content:encoded><![CDATA[<p>[&#8230;] SitePoint Blogs » Hot PHP UTF-8 tips (tags: read php) [&#8230;]</p>]]></content:encoded>
	</item>
	<item>
		<title>By: Reverse Email Lookup &#187; Reverse Email Lookup - Hot PHP UTF-8 tips</title>
		<link>http://www.sitepoint.com/blogs/2006/08/10/hot-php-utf-8-tips/#comment-45090</link>
		<dc:creator>Reverse Email Lookup &#187; Reverse Email Lookup - Hot PHP UTF-8 tips</dc:creator>
		<pubDate>Sat, 12 Aug 2006 06:15:50 +0000</pubDate>
		<guid isPermaLink="false">http://www.sitepoint.com/blogs/?p=1676#comment-45090</guid>
		<description>[...] Hot PHP UTF-8 tips. SitePoint,&#160;Australia&#160;- Aug 10, 2006 &#8230; noise about UTF-8, got an email from Marek &#8230; if they match in the lookup array &#8230; of sequences, representing characters, and utf8_from_unicode() does the reverse); &#8230; [...]</description>
		<content:encoded><![CDATA[<p>[&#8230;] Hot PHP UTF-8 tips. SitePoint,&nbsp;Australia&nbsp;- Aug 10, 2006 &#8230; noise about UTF-8, got an email from Marek &#8230; if they match in the lookup array &#8230; of sequences, representing characters, and utf8_from_unicode() does the reverse); &#8230; [&#8230;]</p>]]></content:encoded>
	</item>
	<item>
		<title>By: MarekG</title>
		<link>http://www.sitepoint.com/blogs/2006/08/10/hot-php-utf-8-tips/#comment-45050</link>
		<dc:creator>MarekG</dc:creator>
		<pubDate>Fri, 11 Aug 2006 22:16:13 +0000</pubDate>
		<guid isPermaLink="false">http://www.sitepoint.com/blogs/?p=1676#comment-45050</guid>
		<description>It is not clear to me why MediaWiki uses this code for converting case:

&lt;code&gt;function uc ( $str, $first = false )&lt;/code&gt; // in file LanguageUtf8.php, mediawiki-1.7.1.tar.gz
...
&lt;code&gt;return preg_replace( "/$x([a-z]&#124;[\\xc0-\\xff][\\x80-\\xbf]*)/e", 
	      "strtr( \"\$1\" , \$wikiUpperChars )", 	      $str );
&lt;/code&gt;

See their lookup table; it also has "a-z=&#62;A-Z" arrays:
http://svn.wikimedia.org/viewvc/mediawiki/trunk/phase3/includes/Utf8Case.php

This has to be slow. 

Why should this not be enough:
&lt;blockquote&gt;&lt;code&gt;if (!$first) return strtr ($str, $wikiUpperChars); // ?&lt;/code&gt;&lt;/blockquote&gt;</description>
		<content:encoded><![CDATA[<p>It is not clear to me why MediaWiki uses this code for converting case:</p>
<p><code>function uc ( $str, $first = false )</code> // in file LanguageUtf8.php, mediawiki-1.7.1.tar.gz<br />
&#8230;<br />
<code>return preg_replace( "/$x([a-z]|[\\xc0-\\xff][\\x80-\\xbf]*)/e", 
	      "strtr( \"\$1\" , \$wikiUpperChars )", 	      $str );
</code></p>
<p>See their lookup table; it also has &#8220;a-z=&gt;A-Z&#8221; arrays:<br />
<a href="http://svn.wikimedia.org/viewvc/mediawiki/trunk/phase3/includes/Utf8Case.php" rel="nofollow">http://svn.wikimedia.org/viewvc/mediawiki/trunk/phase3/includes/Utf8Case.php</a></p>
<p>This has to be slow. </p>
<p>Why should this not be enough:</p>
<blockquote><code>if (!$first) return strtr ($str, $wikiUpperChars); // ?</code></blockquote>]]></content:encoded>
	</item>
	<item>
		<title>By: HarryF</title>
		<link>http://www.sitepoint.com/blogs/2006/08/10/hot-php-utf-8-tips/#comment-45029</link>
		<dc:creator>HarryF</dc:creator>
		<pubDate>Fri, 11 Aug 2006 19:07:18 +0000</pubDate>
		<guid isPermaLink="false">http://www.sitepoint.com/blogs/?p=1676#comment-45029</guid>
		<description>&lt;blockquote&gt;
I wonder why they didn't implement native UTF-8 support in PHP5. It's so '90s…
&lt;/blockquote&gt;

It's not a problem you can solve easily, and it's a lot of work. The tipping point was IBM open-sourcing ICU (http://en.wikipedia.org/wiki/International_Components_for_Unicode), which saves that work.</description>
		<content:encoded><![CDATA[<blockquote><p>
I wonder why they didn't implement native UTF-8 support in PHP5. It's so '90s…
</p></blockquote>
<p>It&#8217;s not a problem you can solve easily, and it&#8217;s a lot of work. The tipping point was IBM open-sourcing ICU (<a href="http://en.wikipedia.org/wiki/International_Components_for_Unicode" rel="nofollow">http://en.wikipedia.org/wiki/International_Components_for_Unicode</a>), which saves that work.</p>]]></content:encoded>
	</item>
	<item>
		<title>By: HarryF</title>
		<link>http://www.sitepoint.com/blogs/2006/08/10/hot-php-utf-8-tips/#comment-45027</link>
		<dc:creator>HarryF</dc:creator>
		<pubDate>Fri, 11 Aug 2006 19:05:38 +0000</pubDate>
		<guid isPermaLink="false">http://www.sitepoint.com/blogs/?p=1676#comment-45027</guid>
		<description>&lt;blockquote&gt;
It’s almost too much to take in. So again, I praise the day when full i18n support is implemented into PHP and I can find a gutsy host that will upgrade quickly.
&lt;/blockquote&gt;

I think that's a very forgivable perspective, but at the same time it's worth battling on. PHP6 is going to make the problem easier to manage, but I don't think it's going to make the problem magically vanish. What I also worry about is whether it may be a mistake to make all strings Unicode with the flick of a php.ini setting: there are security issues such as phishing-type attacks (Unicode characters which look almost like normal ASCII characters). I have yet to clarify exactly what PHP6 is going to look like, though, so that may be FUD.

The real issue is that character encoding is a leaky abstraction; it's very hard to hide it behind APIs.

If there are two key points to getting it in PHP, I'd say the first is to consider PHP's problem - http://www.phpwact.org/php/i18n/charsets#php_s_problem_with_character_encoding - and the second is to look closely at the table here: http://en.wikipedia.org/wiki/UTF-8#Description - and examine the 0s and 1s it's describing. Eventually it will fall into place.</description>
		<content:encoded><![CDATA[<blockquote><p>
It’s almost too much to take in. So again, I praise the day when full i18n support is implemented into PHP and I can find a gutsy host that will upgrade quickly.
</p></blockquote>
<p>I think that&#8217;s a very forgivable perspective, but at the same time it&#8217;s worth battling on. PHP6 is going to make the problem easier to manage, but I don&#8217;t think it&#8217;s going to make the problem magically vanish. What I also worry about is whether it may be a mistake to make all strings Unicode with the flick of a php.ini setting: there are security issues such as phishing-type attacks (Unicode characters which look almost like normal ASCII characters). I have yet to clarify exactly what PHP6 is going to look like, though, so that may be FUD.</p>
<p>The real issue is that character encoding is a leaky abstraction; it&#8217;s very hard to hide it behind APIs.</p>
<p>If there are two key points to getting it in PHP, I&#8217;d say the first is to consider PHP&#8217;s problem - <a href="http://www.phpwact.org/php/i18n/charsets#php_s_problem_with_character_encoding" rel="nofollow">http://www.phpwact.org/php/i18n/charsets#php_s_problem_with_character_encoding</a> - and the second is to look closely at the table here: <a href="http://en.wikipedia.org/wiki/UTF-8#Description" rel="nofollow">http://en.wikipedia.org/wiki/UTF-8#Description</a> - and examine the 0s and 1s it&#8217;s describing. Eventually it will fall into place.</p>]]></content:encoded>
	</item>
	<item>
		<title>By: defenderz_</title>
		<link>http://www.sitepoint.com/blogs/2006/08/10/hot-php-utf-8-tips/#comment-44935</link>
		<dc:creator>defenderz_</dc:creator>
		<pubDate>Fri, 11 Aug 2006 09:25:38 +0000</pubDate>
		<guid isPermaLink="false">http://www.sitepoint.com/blogs/?p=1676#comment-44935</guid>
		<description>I wonder why they didn't implement native UTF-8 support in PHP5. It's so '90s...</description>
		<content:encoded><![CDATA[<p>I wonder why they didn't implement native UTF-8 support in PHP5. It's so '90s&#8230;</p>]]></content:encoded>
	</item>
</channel>
</rss>
