<?xml version="1.0" encoding="UTF-8"?><rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	>
<channel>
	<title>Comments on: Character Encodings and Input</title>
	<atom:link href="http://www.sitepoint.com/blogs/2005/04/19/character-encodings-and-input/feed/" rel="self" type="application/rss+xml" />
	<link>http://www.sitepoint.com/blogs/2005/04/19/character-encodings-and-input/</link>
	<description>News, opinion, and fresh thinking for web developers and designers. The official podcast of sitepoint.com.</description>
	<pubDate>Tue, 02 Dec 2008 06:56:59 +0000</pubDate>
	<generator>http://wordpress.org/?v=2.5</generator>
		<item>
		<title>By: kanga</title>
		<link>http://www.sitepoint.com/blogs/2005/04/19/character-encodings-and-input/#comment-661795</link>
		<dc:creator>kanga</dc:creator>
		<pubDate>Wed, 26 Mar 2008 20:04:49 +0000</pubDate>
		<guid isPermaLink="false">864366962#comment-661795</guid>
		<description>One technique that works for just about everything, regardless of encoding, is the following: 
[1] Copy the line with the error from the source file
[2] Paste it into Notepad
[3] Copy it back from Notepad
[4] Paste it into the source file and re-upload

Notepad will strip the errors.  I learned this on an emergency alert project I was working on: http&lt;strong&gt;:&lt;/strong&gt;//www.kangalert.com</description>
		<content:encoded><![CDATA[<p>One technique that works for just about everything, regardless of encoding, is the following:<br />
[1] Copy the line with the error from the source file<br />
[2] Paste it into Notepad<br />
[3] Copy it back from Notepad<br />
[4] Paste it into the source file and re-upload</p>
<p>Notepad will strip the errors.  I learned this on an emergency alert project I was working on: http<strong>:</strong>//www.kangalert.com</p>]]></content:encoded>
	</item>
	<item>
		<title>By: Célio Santana</title>
		<link>http://www.sitepoint.com/blogs/2005/04/19/character-encodings-and-input/#comment-10675</link>
		<dc:creator>Célio Santana</dc:creator>
		<pubDate>Thu, 10 Nov 2005 13:37:16 +0000</pubDate>
		<guid isPermaLink="false">864366962#comment-10675</guid>
		<description>I have a lot of problems with encodings. I started using UTF-8 and the problems continue; there's no effective way to get data from my SQL server and
show it to my users. I'm Brazilian and we use a lot of á, é, ç, and these characters
aren't handled correctly by PHP. I tried mbstring and that doesn't work
either. 

What should I do?</description>
		<content:encoded><![CDATA[<p>I have a lot of problems with encodings. I started using UTF-8 and the problems continue; there&#8217;s no effective way to get data from my SQL server and<br />
show it to my users. I&#8217;m Brazilian and we use a lot of á, é, ç, and these characters<br />
aren&#8217;t handled correctly by PHP. I tried mbstring and that doesn&#8217;t work<br />
either. </p>
<p>What should I do?</p>]]></content:encoded>
	</item>
	<item>
		<title>By: Anonymous</title>
		<link>http://www.sitepoint.com/blogs/2005/04/19/character-encodings-and-input/#comment-10052</link>
		<dc:creator>Anonymous</dc:creator>
		<pubDate>Wed, 19 Oct 2005 12:17:22 +0000</pubDate>
		<guid isPermaLink="false">864366962#comment-10052</guid>
		<description>&lt;em&gt;&lt;code&gt;&lt;blockquote&gt;&lt;/blockquote&gt;&lt;/code&gt;&lt;/em&gt;</description>
		<content:encoded><![CDATA[<p><em><code><blockquote></blockquote></code></em></p>]]></content:encoded>
	</item>
	<item>
		<title>By: Mike</title>
		<link>http://www.sitepoint.com/blogs/2005/04/19/character-encodings-and-input/#comment-2071</link>
		<dc:creator>Mike</dc:creator>
		<pubDate>Wed, 31 Dec 1969 19:00:00 +0000</pubDate>
		<guid isPermaLink="false">864366962#comment-2071</guid>
		<description>&lt;p&gt;Huh? Never heard of iconv?&lt;br /&gt;
&lt;/p&gt;

</description>
		<content:encoded><![CDATA[<p>Huh? Never heard of iconv?</p>]]></content:encoded>
	</item>
	<item>
		<title>By: kaklz</title>
		<link>http://www.sitepoint.com/blogs/2005/04/19/character-encodings-and-input/#comment-2072</link>
		<dc:creator>kaklz</dc:creator>
		<pubDate>Wed, 31 Dec 1969 19:00:00 +0000</pubDate>
		<guid isPermaLink="false">864366962#comment-2072</guid>
		<description>&lt;p&gt;The answer is plain and simple: use UTF-8 and forget about the encodings. &lt;br /&gt;
There are countries where you have to use more than one character set, sometimes three or more. &lt;br /&gt;
One of them is my country, Latvia, where you have to work with Latin, Cyrillic and Baltic character sets. &lt;br /&gt;
Since I started working with UTF-8, I haven't had to care about character sets anymore. So if you are using any character set other than Latin, I would suggest you move to UTF-8.&lt;/p&gt;

</description>
		<content:encoded><![CDATA[<p>The answer is plain and simple: use UTF-8 and forget about the encodings. <br />
There are countries where you have to use more than one character set, sometimes three or more. <br />
One of them is my country, Latvia, where you have to work with Latin, Cyrillic and Baltic character sets. <br />
Since I started working with UTF-8, I haven&#8217;t had to care about character sets anymore. So if you are using any character set other than Latin, I would suggest you move to UTF-8.</p>]]></content:encoded>
	</item>
	<item>
		<title>By: BerislavLopac</title>
		<link>http://www.sitepoint.com/blogs/2005/04/19/character-encodings-and-input/#comment-2073</link>
		<dc:creator>BerislavLopac</dc:creator>
		<pubDate>Wed, 31 Dec 1969 19:00:00 +0000</pubDate>
		<guid isPermaLink="false">864366962#comment-2073</guid>
		<description>&lt;p&gt;I agree with kaklz above. Browsers use the encoding specified in the header to display HTML output properly and to convert form entries, and databases store whatever comes from PHP -- there is no need to convert strings internally. Even if you need to replace multibyte strings with other values, sprintf works like a charm.&lt;/p&gt;

</description>
		<content:encoded><![CDATA[<p>I agree with kaklz above. Browsers use the encoding specified in the header to display HTML output properly and to convert form entries, and databases store whatever comes from PHP &#8212; there is no need to convert strings internally. Even if you need to replace multibyte strings with other values, sprintf works like a charm.</p>]]></content:encoded>
	</item>
	<item>
		<title>By: HarryF</title>
		<link>http://www.sitepoint.com/blogs/2005/04/19/character-encodings-and-input/#comment-2074</link>
		<dc:creator>HarryF</dc:creator>
		<pubDate>Wed, 31 Dec 1969 19:00:00 +0000</pubDate>
		<guid isPermaLink="false">864366962#comment-2074</guid>
		<description>&lt;p&gt;FYI, the WACT team is slowly putting together some resources on i18n:&lt;/p&gt;

&lt;p&gt;http://wact.sourceforge.net/docs/doku.php?id=php:i18n&lt;/p&gt;

&lt;p&gt;Also something on charsets, with the emphasis on UTF-8:&lt;br /&gt;
http://wact.sourceforge.net/docs/doku.php?id=php:i18n:charsets&lt;/p&gt;

&lt;p&gt;It is not complete yet and needs some heavy revision; it is very much in "note form" right now, but it is getting there.&lt;/p&gt;

</description>
		<content:encoded><![CDATA[<p>FYI, the WACT team is slowly putting together some resources on i18n:</p>
<p><a href="http://wact.sourceforge.net/docs/doku.php?id=php:i18n" rel="nofollow">http://wact.sourceforge.net/docs/doku.php?id=php:i18n</a></p>
<p>Also something on charsets, with the emphasis on UTF-8:<br />
<a href="http://wact.sourceforge.net/docs/doku.php?id=php:i18n:charsets" rel="nofollow">http://wact.sourceforge.net/docs/doku.php?id=php:i18n:charsets</a></p>
<p>It is not complete yet and needs some heavy revision; it is very much in &#8220;note form&#8221; right now, but it is getting there.</p>]]></content:encoded>
	</item>
	<item>
		<title>By: Glen</title>
		<link>http://www.sitepoint.com/blogs/2005/04/19/character-encodings-and-input/#comment-2075</link>
		<dc:creator>Glen</dc:creator>
		<pubDate>Wed, 31 Dec 1969 19:00:00 +0000</pubDate>
		<guid isPermaLink="false">864366962#comment-2075</guid>
		<description>&lt;p&gt;I have found that most encoding problems can easily be bypassed by using UTF-8 encoding for all script-generated HTML.  That is, in your &amp;lt;head&amp;gt;, include the following:&lt;/p&gt;

&lt;p&gt;&amp;lt;meta http-equiv="content-type" content="text/html; charset=utf-8" /&amp;gt;&lt;/p&gt;

&lt;p&gt;That way, the client will render UTF-8 correctly, and as a bonus will send back UTF-8 data from forms.&lt;/p&gt;

</description>
		<content:encoded><![CDATA[<p>I have found that most encoding problems can easily be bypassed by using UTF-8 encoding for all script-generated HTML.  That is, in your &lt;head&gt;, include the following:</p>
<p>&lt;meta http-equiv="content-type" content="text/html; charset=utf-8" /&gt;</p>
<p>That way, the client will render UTF-8 correctly, and as a bonus will send back UTF-8 data from forms.</p>]]></content:encoded>
	</item>
	<item>
		<title>By: mmj</title>
		<link>http://www.sitepoint.com/blogs/2005/04/19/character-encodings-and-input/#comment-2076</link>
		<dc:creator>mmj</dc:creator>
		<pubDate>Wed, 31 Dec 1969 19:00:00 +0000</pubDate>
		<guid isPermaLink="false">864366962#comment-2076</guid>
		<description>&lt;p&gt;&lt;blockquote&gt;&lt;p&gt;That way, the client will render UTF-8 correctly, and as a bonus will send back UTF-8 data from forms.&lt;/p&gt;&lt;/blockquote&gt;&lt;br /&gt;
Unfortunately, for many reasons, ranging from browser bugs to browsers blatantly ignoring the spec to users changing the character encoding setting in their browser, you cannot rely on submitted data to be in any particular character encoding.  If you are expecting UTF-8 encoding and you receive something else, that something else may just break your CMS, or at least ensure that it will not validate as HTML.&lt;/p&gt;

&lt;p&gt;Thus you will arrive at the very problem described in this blog post.&lt;/p&gt;

&lt;p&gt;This article contains more information about why you can't rely on input to be in any character encoding.&lt;br /&gt;
http://ppewww.ph.gla.ac.uk/~flavell/charset/form-i18n.html&lt;/p&gt;
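
A minimal sketch of the defensive check this implies, written in Python rather than PHP for brevity: test whether submitted bytes actually decode as UTF-8 before trusting them. The function name `is_valid_utf8` is hypothetical and does not come from the blog post; in PHP, `mb_check_encoding` plays a comparable role when mbstring is available.

```python
def is_valid_utf8(raw: bytes) -> bool:
    """Return True only if raw decodes cleanly as UTF-8."""
    try:
        raw.decode("utf-8")  # strict decoding raises on any malformed sequence
    except UnicodeDecodeError:
        return False
    return True


# "é" is two bytes in UTF-8 but the single byte 0xE9 in ISO-8859-1,
# and a lone 0xE9 is not a complete UTF-8 sequence.
print(is_valid_utf8("é".encode("utf-8")))       # True
print(is_valid_utf8("é".encode("iso-8859-1")))  # False
```

Input that fails the check can then be rejected or converted, instead of being stored as-is.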

</description>
		<content:encoded><![CDATA[<p>
<blockquote>
<p>That way, the client will render UTF-8 correctly, and as a bonus will send back UTF-8 data from forms.</p>
</blockquote>
</p><p>
Unfortunately, for many reasons, ranging from browser bugs to browsers blatantly ignoring the spec to users changing the character encoding setting in their browser, you cannot rely on submitted data to be in any particular character encoding.  If you are expecting UTF-8 encoding and you receive something else, that something else may just break your CMS, or at least ensure that it will not validate as HTML.</p>
<p>Thus you will arrive at the very problem described in this blog post.</p>
<p>This article contains more information about why you can&#8217;t rely on input to be in any character encoding.<br />
<a href="http://ppewww.ph.gla.ac.uk/~flavell/charset/form-i18n.html" rel="nofollow">http://ppewww.ph.gla.ac.uk/~flavell/charset/form-i18n.html</a></p>]]></content:encoded>
	</item>
	<item>
		<title>By: mmj</title>
		<link>http://www.sitepoint.com/blogs/2005/04/19/character-encodings-and-input/#comment-2077</link>
		<dc:creator>mmj</dc:creator>
		<pubDate>Wed, 31 Dec 1969 19:00:00 +0000</pubDate>
		<guid isPermaLink="false">864366962#comment-2077</guid>
		<description>&lt;p&gt;&lt;blockquote&gt;&lt;p&gt;Huh? Never heard of iconv?&lt;/p&gt;&lt;/blockquote&gt;&lt;br /&gt;
Hi Mike,&lt;br /&gt;
I lumped iconv in with other third-party libraries when I said "another third-party library".  There is a PHP extension, however, which interfaces with it.  The extension is an alternative to mbstring.  Like mbstring, it is disabled by default in PHP and must be explicitly enabled.  Unfortunately, not all PHP users are able to take advantage of these extensions, as they might not be able to compile or configure PHP themselves.  However, if you can, then it would be a good solution.&lt;/p&gt;
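
For illustration only, the kind of conversion iconv performs can be sketched in Python (used here instead of the PHP iconv extension). The helper name `to_utf8` is hypothetical, and the sketch assumes the source encoding is already known, which the discussion above shows is not always the case.

```python
def to_utf8(raw: bytes, source_encoding: str) -> bytes:
    """Re-encode raw from source_encoding to UTF-8."""
    # Decoding may raise UnicodeDecodeError if raw is not valid in source_encoding.
    text = raw.decode(source_encoding)
    return text.encode("utf-8")


latin1_bytes = "ação".encode("iso-8859-1")
print(to_utf8(latin1_bytes, "iso-8859-1") == "ação".encode("utf-8"))  # True
```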

</description>
		<content:encoded><![CDATA[<p>
<blockquote>
<p>Huh? Never heard of iconv?</p>
</blockquote>
</p><p>
Hi Mike,<br />
I lumped iconv in with other third-party libraries when I said &#8220;another third-party library&#8221;.  There is a PHP extension, however, which interfaces with it.  The extension is an alternative to mbstring.  Like mbstring, it is disabled by default in PHP and must be explicitly enabled.  Unfortunately, not all PHP users are able to take advantage of these extensions, as they might not be able to compile or configure PHP themselves.  However, if you can, then it would be a good solution.</p>]]></content:encoded>
	</item>
</channel>
</rss>
