<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	>
<channel>
	<title>Comments on: Scripters UTF-8 Survival Guide (slides)</title>
	<atom:link href="http://www.sitepoint.com/blogs/2006/08/09/scripters-utf-8-survival-guide-slides/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.sitepoint.com/blogs/2006/08/09/scripters-utf-8-survival-guide-slides/</link>
	<description></description>
	<pubDate>Sat, 05 Jul 2008 04:51:28 +0000</pubDate>
	<generator>http://wordpress.org/?v=2.5</generator>
		<item>
		<title>By: daniel</title>
		<link>http://www.sitepoint.com/blogs/2006/08/09/scripters-utf-8-survival-guide-slides/#comment-45000</link>
		<dc:creator>daniel</dc:creator>
		<pubDate>Fri, 11 Aug 2006 16:44:02 +0000</pubDate>
		<guid isPermaLink="false">http://www.sitepoint.com/blogs/?p=1672#comment-45000</guid>
		<description>Putting this in your .htaccess file should fix any UTF-8 errors with funny characters and ensure UTF-8 displays properly:

php_value output_buffering		on
php_value output_handler		mb_output_handler
php_value mbstring.http_output		UTF-8</description>
		<content:encoded><![CDATA[<p>Putting this in your .htaccess file should fix any UTF-8 errors with funny characters and ensure UTF-8 displays properly:</p>
<p>php_value output_buffering		on<br />
php_value output_handler		mb_output_handler<br />
php_value mbstring.http_output		UTF-8</p>]]></content:encoded>
	</item>
	<item>
		<title>By: SitePoint Blogs &#187; Hot PHP UTF-8 tips</title>
		<link>http://www.sitepoint.com/blogs/2006/08/09/scripters-utf-8-survival-guide-slides/#comment-44559</link>
		<dc:creator>SitePoint Blogs &#187; Hot PHP UTF-8 tips</dc:creator>
		<pubDate>Thu, 10 Aug 2006 14:39:03 +0000</pubDate>
		<guid isPermaLink="false">http://www.sitepoint.com/blogs/?p=1672#comment-44559</guid>
		<description>[...] As a result of all the noise about UTF-8, got an email from Marek Gayer with some very smart tips on handling UTF-8. What follows is a discussion illustrating what happens when you get obsessed with performance and optimizations (be warned&#8212;may be boring, depending on your perspective). [...]</description>
		<content:encoded><![CDATA[<p>[&#8230;] As a result of all the noise about UTF-8, got an email from Marek Gayer with some very smart tips on handling UTF-8. What follows is a discussion illustrating what happens when you get obsessed with performance and optimizations (be warned&#8212;may be boring, depending on your perspective). [&#8230;]</p>]]></content:encoded>
	</item>
	<item>
		<title>By: HarryF</title>
		<link>http://www.sitepoint.com/blogs/2006/08/09/scripters-utf-8-survival-guide-slides/#comment-44023</link>
		<dc:creator>HarryF</dc:creator>
		<pubDate>Wed, 09 Aug 2006 14:29:01 +0000</pubDate>
		<guid isPermaLink="false">http://www.sitepoint.com/blogs/?p=1672#comment-44023</guid>
		<description>OK - I'm blind ;)</description>
		<content:encoded><![CDATA[<p>OK - I&#8217;m blind ;)</p>]]></content:encoded>
	</item>
	<item>
		<title>By: Sorccu</title>
		<link>http://www.sitepoint.com/blogs/2006/08/09/scripters-utf-8-survival-guide-slides/#comment-44020</link>
		<dc:creator>Sorccu</dc:creator>
		<pubDate>Wed, 09 Aug 2006 13:44:43 +0000</pubDate>
		<guid isPermaLink="false">http://www.sitepoint.com/blogs/?p=1672#comment-44020</guid>
		<description>&lt;blockquote&gt;
Alright! That badly needs documenting in fact although now you mention it ..
&lt;/blockquote&gt;

http://www.php.net/manual/en/function.iconv.php

&lt;blockquote&gt;
If you append the string //TRANSLIT to out_charset transliteration is activated. This means that when a character can't be represented in the target charset, it can be approximated through one or several similarly looking characters. If you append the string //IGNORE, characters that cannot be represented in the target charset are silently discarded. Otherwise, str is cut from the first illegal character.
&lt;/blockquote&gt;</description>
		<content:encoded><![CDATA[<blockquote><p>
Alright! That badly needs documenting in fact although now you mention it ..
</p></blockquote>
<p><a href="http://www.php.net/manual/en/function.iconv.php" rel="nofollow">http://www.php.net/manual/en/function.iconv.php</a></p>
<blockquote><p>
If you append the string //TRANSLIT to out_charset transliteration is activated. This means that when a character can&#8217;t be represented in the target charset, it can be approximated through one or several similarly looking characters. If you append the string //IGNORE, characters that cannot be represented in the target charset are silently discarded. Otherwise, str is cut from the first illegal character.
</p></blockquote>]]></content:encoded>
	</item>
	<item>
		<title>By: HarryF</title>
		<link>http://www.sitepoint.com/blogs/2006/08/09/scripters-utf-8-survival-guide-slides/#comment-44006</link>
		<dc:creator>HarryF</dc:creator>
		<pubDate>Wed, 09 Aug 2006 12:57:00 +0000</pubDate>
		<guid isPermaLink="false">http://www.sitepoint.com/blogs/?p=1672#comment-44006</guid>
		<description>&lt;blockquote&gt;
you can clean it with iconv the following way:

$t = iconv("UTF-8","UTF-8//IGNORE",$t);
&lt;/blockquote&gt;

Alright! That badly needs documenting. In fact, now you mention it, it's documented here: http://www.gnu.org/software/libiconv/documentation/libiconv/iconv_open.3.html (i.e. $ man iconv_open ). Interesting - need to try that //TRANSLIT flag ...</description>
		<content:encoded><![CDATA[<blockquote><p>
you can clean it with iconv the following way:</p>
<p>$t = iconv("UTF-8","UTF-8//IGNORE",$t);
</p></blockquote>
<p>Alright! That badly needs documenting. In fact, now you mention it, it&#8217;s documented here: <a href="http://www.gnu.org/software/libiconv/documentation/libiconv/iconv_open.3.html" rel="nofollow">http://www.gnu.org/software/libiconv/documentation/libiconv/iconv_open.3.html</a> (i.e. $ man iconv_open ). Interesting - need to try that //TRANSLIT flag &#8230;</p>]]></content:encoded>
	</item>
	<item>
		<title>By: chregu</title>
		<link>http://www.sitepoint.com/blogs/2006/08/09/scripters-utf-8-survival-guide-slides/#comment-43992</link>
		<dc:creator>chregu</dc:creator>
		<pubDate>Wed, 09 Aug 2006 11:49:29 +0000</pubDate>
		<guid isPermaLink="false">http://www.sitepoint.com/blogs/?p=1672#comment-43992</guid>
		<description>you can clean it with iconv the following way:

 $t = iconv("UTF-8","UTF-8//IGNORE",$t);

From  http://blog.bitflux.ch/archive/2005/01/24/how-to-get-rid-of-invalid-utf-8-characters.html

:)</description>
		<content:encoded><![CDATA[<p>you can clean it with iconv the following way:</p>
<p> $t = iconv("UTF-8","UTF-8//IGNORE",$t);</p>
<p>From  <a href="http://blog.bitflux.ch/archive/2005/01/24/how-to-get-rid-of-invalid-utf-8-characters.html" rel="nofollow">http://blog.bitflux.ch/archive/2005/01/24/how-to-get-rid-of-invalid-utf-8-characters.html</a></p>
<p>:)</p>]]></content:encoded>
	</item>
	<item>
		<title>By: HarryF</title>
		<link>http://www.sitepoint.com/blogs/2006/08/09/scripters-utf-8-survival-guide-slides/#comment-43925</link>
		<dc:creator>HarryF</dc:creator>
		<pubDate>Wed, 09 Aug 2006 08:48:53 +0000</pubDate>
		<guid isPermaLink="false">http://www.sitepoint.com/blogs/?p=1672#comment-43925</guid>
		<description>Think &lt;a href="http://weblog.patrice.ch/" rel="nofollow"&gt;Patrice's&lt;/a&gt; tip on UTF-8 validation needs repeating - nice "hack" I hadn't thought of.

If you want to make sure incoming UTF-8 is valid UTF-8, use &lt;a href="http://en.wikipedia.org/wiki/Iconv" rel="nofollow"&gt;iconv&lt;/a&gt; to convert it from UTF-8 to UTF-8. You can also potentially use iconv to clean the input.

PHP's iconv extension raises an error notice if the input is invalid and returns only the portion of the input up to the first invalid (non-UTF-8) byte it finds. Sadly there doesn't seem to be a way to put it into "cleaning" mode, so it can only be used for validation. An example:

&lt;code&gt;
if ( $input !== @iconv("UTF-8", "UTF-8", $input) ) {
    die("Bad utf-8\n");
}
&lt;/code&gt;

Meanwhile, the command line interface to iconv lets you enable "cleaning" with the -c flag - iconv then silently drops any bad bytes it finds. E.g.

&lt;code&gt;
$ iconv -c -f UTF-8 -t UTF-8 some_utf-8_encoded_file.txt
&lt;/code&gt;</description>
		<content:encoded><![CDATA[<p>Think <a href="http://weblog.patrice.ch/" rel="nofollow">Patrice&#8217;s</a> tip on UTF-8 validation needs repeating - nice &#8220;hack&#8221; I hadn&#8217;t thought of.</p>
<p>If you want to make sure incoming UTF-8 is valid UTF-8, use <a href="http://en.wikipedia.org/wiki/Iconv" rel="nofollow">iconv</a> to convert it from UTF-8 to UTF-8. You can also potentially use iconv to clean the input.</p>
<p>PHP&#8217;s iconv extension raises an error notice if the input is invalid and returns only the portion of the input up to the first invalid (non-UTF-8) byte it finds. Sadly there doesn&#8217;t seem to be a way to put it into &#8220;cleaning&#8221; mode, so it can only be used for validation. An example:</p>
<code>
if ( $input !== @iconv("UTF-8", "UTF-8", $input) ) {
    die("Bad utf-8\n");
}
</code>
<p>Meanwhile, the command line interface to iconv lets you enable &#8220;cleaning&#8221; with the -c flag - iconv then silently drops any bad bytes it finds. E.g.</p>
<code>
$ iconv -c -f UTF-8 -t UTF-8 some_utf-8_encoded_file.txt
</code>]]></content:encoded>
	</item>
	<item>
		<title>By: Patrice</title>
		<link>http://www.sitepoint.com/blogs/2006/08/09/scripters-utf-8-survival-guide-slides/#comment-43907</link>
		<dc:creator>Patrice</dc:creator>
		<pubDate>Wed, 09 Aug 2006 07:27:24 +0000</pubDate>
		<guid isPermaLink="false">http://www.sitepoint.com/blogs/?p=1672#comment-43907</guid>
		<description>Thank &lt;strong&gt;you&lt;/strong&gt; Harry for doing the presentation. Was really superb!</description>
		<content:encoded><![CDATA[<p>Thank <strong>you</strong> Harry for doing the presentation. Was really superb!</p>]]></content:encoded>
	</item>
</channel>
</rss>
