August 9th, 2006 at 5:27 pm
Thank you Harry for doing the presentation. Was really superb!
August 9th, 2006 at 6:48 pm
Think Patrice’s tip on UTF-8 validation needs repeating - nice “hack” I hadn’t thought of.
If you want to make sure incoming UTF-8 is valid UTF-8, use iconv to convert it from UTF-8 to UTF-8. You can also potentially use iconv to clean the input.
PHP’s iconv extension raises an error notice if the input is invalid, and returns only the portion of the input up to the first invalid (non UTF-8) byte it finds. Sadly there doesn’t seem to be a way to put it into “cleaning” mode, so it can only be used for validation. An example:
if ( $input != @iconv("UTF-8", "UTF-8", $input) ) { die("Bad utf-8\n"); }
Meanwhile, the command line interface to iconv allows you to enable “cleaning” - iconv silently drops any bad bytes it finds. E.g.
$ iconv -c -f UTF-8 -t UTF-8 some_utf-8_encoded_file.txt
August 9th, 2006 at 9:49 pm
you can clean it with iconv the following way:
$t = iconv("UTF-8", "UTF-8//IGNORE", $t);
From http://blog.bitflux.ch/archive/2005/01/24/how-to-get-rid-of-invalid-utf-8-characters.html
:)
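The validation check and the //IGNORE cleaning tip can be wrapped into two small helpers. This is only a sketch: the function names is_valid_utf8 and clean_utf8 are made up for illustration, and the type declarations assume PHP 7.1 or later. The behaviour of //IGNORE has varied between PHP versions and iconv implementations (some builds still report failure on bad input), so the cleaning helper treats a false return as "could not clean" rather than assuming the bad bytes were dropped.

```php
<?php
// Hypothetical helpers; the names are illustrative and not from the thread.

/**
 * Returns true when $input survives a UTF-8 to UTF-8 round trip unchanged.
 */
function is_valid_utf8(string $input): bool
{
    // @ suppresses the notice iconv raises on an illegal byte sequence.
    // On bad input iconv returns false or a truncated string, so the
    // strict comparison fails in both cases.
    return $input === @iconv('UTF-8', 'UTF-8', $input);
}

/**
 * Tries to drop invalid bytes using //IGNORE.
 * Returns null when iconv reports failure or the result is still invalid.
 */
function clean_utf8(string $input): ?string
{
    $cleaned = @iconv('UTF-8', 'UTF-8//IGNORE', $input);
    if ($cleaned === false || !is_valid_utf8($cleaned)) {
        return null;
    }
    return $cleaned;
}
```

For example, is_valid_utf8("abc\xC3\x28") returns false, because \xC3 opens a two-byte sequence and \x28 is not a continuation byte. Already valid input passes through clean_utf8 unchanged.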
August 9th, 2006 at 10:57 pm
Alright! That badly needs documenting. In fact, now you mention it, it is documented here: http://www.gnu.org/software/libiconv/documentation/libiconv/iconv_open.3.html (i.e. $ man iconv_open ). Interesting - I need to try that //TRANSLIT flag …
August 9th, 2006 at 11:44 pm
http://www.php.net/manual/en/function.iconv.php
August 10th, 2006 at 12:29 am
OK - I’m blind ;)
August 11th, 2006 at 12:39 am
[...] As a result of all the noise about UTF-8, got an email from Marek Gayer with some very smart tips on handling UTF-8. What follows is a discussion illustrating what happens when you get obsessed with performance and optimizations (be warned—may be boring, depending on your perspective). [...]
August 12th, 2006 at 2:44 am
Putting this in your .htaccess file should fix UTF-8 errors with funny characters and get UTF-8 displaying properly:
php_value output_buffering on
php_value output_handler mb_output_handler
php_value mbstring.http_output UTF-8