  1. #1
    SitePoint Wizard co.ador's Avatar
    Join Date
    Apr 2009
    Posts
    1,054
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)

    Character encoding issues

    Hi guys, I have a rating system that won't let users rate items whose names contain any of the following characters:

    é, ', ñ, ú, ó

  2. #2
    Utopia, Inc. silver trophy
    ScallioXTX's Avatar
    Join Date
    Aug 2008
    Location
    The Netherlands
    Posts
    9,067
    Mentioned
    153 Post(s)
    Tagged
    2 Thread(s)
    What character encoding (UTF-8, UTF-16, ISO-8859-1, etc.) do you use for your website and database?
    Rémon - Hosting Advisor

    SitePoint forums will switch to Discourse soon! Make sure you're ready for it!

    Minimal Bookmarks Tree
    My Google Chrome extension: browsing bookmarks made easy

  3. #3
    SitePoint Wizard bronze trophy Kailash Badu's Avatar
    Join Date
    Nov 2005
    Posts
    2,560
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    I think this is a case of conflicting character encodings: ISO-8859-1 being processed or displayed as UTF-8, or vice versa. Compare the character encoding of the page on which a user rates items with that of the page on which the submitted data is displayed.

  4. #4
    SitePoint Wizard co.ador's Avatar
    Thank you guys for your replies.

    I am using ISO-8859-1 for the website, but for the database I don't know where to find or change the character encoding in phpMyAdmin. I think the database is using UTF-8.

    When I go to create a database in phpMyAdmin, this appears:


    MySQL
    Server: localhost (localhost via TCP/IP)
    Server version: 5.1.33-community-log
    Protocol version: 10
    User: root@localhost
    MySQL charset: UTF-8 Unicode (utf8)
    The question is: where can I change the character encoding of the database?

  5. #5
    SitePoint Wizard bronze trophy Kailash Badu's Avatar
    I think you will save yourself a lot of trouble if you change the encoding of your web pages to UTF-8 instead.

  6. #6
    SitePoint Wizard co.ador's Avatar
    The problem is that when I change the meta from "charset=ISO-8859-1" to "charset=utf-8", the characters above don't appear.

    For instance, instead of "Pesuñas" it shows "Pesuas", without the "ñ".


    <meta http-equiv="Content-Type" content="text/html; charset=utf-8">

    The charset currently set for the entire website is:

    <meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-1">
    <html>
    <head>
    <link type="text/css" href="../stylesheets/shoeswebpageprueba.css" rel="stylesheet" media="all" />
    <title>NYhungry</title>

    <script type="text/javascript" src="scripts/prototype.js"></script>
    <script type="text/javascript" src="scripts/rating.js"></script>
    <script type="text/javascript" src="http://ajax.googleapis.com/ajax/libs/jquery/1.3/jquery.min.js"></script>


    <script>
    $(function()
    {
    var default_image = $('td.largethumb img').attr('src');
    $('table.smallthumbs a').mouseover(function() { $('td.largethumb img').attr('src', $('img', this).attr('src')); });
    });
    </script>


    </head>
    Notice that the <meta> is outside the html tags.

    Thank you...

    How can I change the meta from "charset=ISO-8859-1" to "charset=utf-8" and still have those characters display instead of disappearing?

  7. #7
    SitePoint Wizard co.ador's Avatar
    Can anybody offer some help, please?

  8. #8
    Utopia, Inc. silver trophy
    ScallioXTX's Avatar
    You need to use HTML entities. In PHP, use the function htmlentities().

    What it does is convert, for example, é to &eacute;

    é cannot be displayed by UTF-8. &eacute; however can be displayed (and renders like é).

  9. #9
    SitePoint Wizard co.ador's Avatar
    It is very strange that ISO-8859-1 does display it.

  10. #10
    #titanic {float:none} silver trophy
    molona's Avatar
    Join Date
    Feb 2005
    Location
    from Madrid to Heaven
    Posts
    8,220
    Mentioned
    237 Post(s)
    Tagged
    1 Thread(s)
    Quote Originally Posted by co.ador
    The problem is that when I change the meta from "charset=ISO-8859-1" to "charset=utf-8", the characters above don't appear.

    For instance, instead of "Pesuñas" it shows "Pesuas", without the "ñ".
    You mean "pezuñas", don't you? "Pesuñas" is a word that I don't recognise, although South Americans may swap the "z" for an "s" because they don't use that sound.

    Quote Originally Posted by co.ador
    Notice that the <meta> is outside the html tags.

    Thank you...
    It is not going to change anything, but is there any particular reason that it is outside the HTML tags? Because it shouldn't be.

  11. #11
    SitePoint Wizard bronze trophy
    Join Date
    Jul 2008
    Posts
    5,757
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    utf-8 webpages can display é without needing to use entities. Your text is probably not encoded in utf-8 if you're having a problem. Simply telling the browser a character set doesn't change any of your text.

    Save the file as utf-8 in your text editor
    PHP Code:
    <?php
    header('Content-type: text/html; charset=utf-8');
    echo 'é';
    ?>
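
A minimal sketch of why changing only the declared charset makes "ñ" vanish. It assumes the mbstring extension is available; the byte strings are illustrative stand-ins for what an ISO-8859-1 or UTF-8 editor would write to disk.

```php
<?php
// "Pezuñas" as an ISO-8859-1 editor would save it: ñ is the single byte 0xF1.
$latin1 = "Pezu\xF1as";

// The same word as a UTF-8 editor would save it: ñ is the two bytes 0xC3 0xB1.
$utf8 = "Pezu\xC3\xB1as";

// Declaring charset=utf-8 does not touch the bytes. A lone 0xF1 is not a
// valid UTF-8 sequence, so a browser reading it as UTF-8 cannot show ñ.
var_dump(mb_check_encoding($latin1, 'UTF-8')); // bool(false)
var_dump(mb_check_encoding($utf8, 'UTF-8'));   // bool(true)

// Converting the bytes themselves is what actually fixes the text.
$converted = mb_convert_encoding($latin1, 'UTF-8', 'ISO-8859-1');
var_dump($converted === $utf8);                // bool(true)
```

Re-saving a file as UTF-8 in an editor performs the same conversion on the file's contents.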

  12. #12
    SitePoint Wizard co.ador's Avatar
    It does echo the character, which means that UTF-8 can display these characters. But the issue is that I wrote the characters directly into the database. I want to know how I can change its character encoding.

    I forgot to save the file as UTF-8 in the text editor; I just saved it as PHP.

  13. #13
    SitePoint Wizard co.ador's Avatar
    When you say text editor, do you mean Notepad, WordPad or Dreamweaver?

    Because I don't see the extension of utf-8 in any of them.

  14. #14
    SitePoint Enthusiast
    Join Date
    Nov 2006
    Posts
    50
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    If you want to support lots of unusual characters like these, you should read up on UTF-8, because all sorts of funny things can happen:

    http://www.phpwact.org/php/i18n/utf-8
    http://dev.splitbrain.org/view/darcs...i/inc/utf8.php
    http://sourceforge.net/projects/phputf8

    If you want to be safe, you need to:
    * have your db in UTF-8 (and *connect* to it in utf-8)
    * send a UTF-8 encoding header
    * use UTF-8-safe string functions
    * make sure your editor is saving in utf-8
    * ... and only then modify the utf-8 charset in the meta-tag

  15. #15
    SitePoint Zealot boen_robot's Avatar
    Join Date
    Apr 2008
    Location
    europe://Bulgaria/Plovdiv
    Posts
    116
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    Some pointers on the above advice (which I fully second):
    * Have your DB in UTF-8. - You already appear to have that, but if you want to make sure, check the "collation" column of your DB when you're viewing the list of your DBs. It should be "utf8_general_ci".
    * Connect to the DB in UTF-8 - varies depending on your DB API of choice. With the MySQL extension, that would be the mysql_set_charset() function.
    * Send a UTF-8 encoding HTTP header - crmalibu already showed you how to do that - simply add
    Code PHP:
    header('Content-type: text/html; charset=utf-8');
    before you output anything
    * Use UTF-8 safe string functions - those are the Multibyte String functions and the iconv functions. They require special extensions though. The standard ones would work, but may give misleading results when the string contains exotic characters. AFAIK, PHP6 will bring unicode (and therefore UTF-8) support out of the box, so that even the standard functions would work.
    * Make sure your editor is saving files (PHP or anything else) in UTF-8. In Notepad for instance, when you click "Save As...", there should be a menu called "Encoding", right below the one where you select "All Files". Make sure the value in there is "UTF-8", and if not, change it to that, and save the file with whatever its extension is (.php, etc.).
    * Add the Content-Type meta equivalent
    Code XML:
    <meta http-equiv="Content-Type" content="text/html; charset=utf-8" />
    and do add it in your head, will you?

    Note: "extension" is different from "encoding". You won't find "the extension of utf-8" in any editor. Look for "the encoding of utf-8".
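
The header and meta steps above can be sketched together as one small page. This is only an illustration: `render_page()` is a hypothetical helper, and the title and item name are placeholders taken from this thread. The file itself is assumed to be saved as UTF-8.

```php
<?php
// Hypothetical helper: builds a small page with the charset declared in <head>.
function render_page(string $title, string $body): string
{
    // Pass the charset explicitly so escaping treats the input as UTF-8.
    $safeTitle = htmlspecialchars($title, ENT_QUOTES, 'UTF-8');
    $safeBody  = htmlspecialchars($body, ENT_QUOTES, 'UTF-8');

    return "<!DOCTYPE html>\n<html>\n<head>\n"
         . "<meta http-equiv=\"Content-Type\" content=\"text/html; charset=utf-8\" />\n"
         . "<title>{$safeTitle}</title>\n</head>\n"
         . "<body><p>{$safeBody}</p></body>\n</html>\n";
}

// The HTTP header has to go out before any output is sent.
if (!headers_sent()) {
    header('Content-Type: text/html; charset=utf-8');
}

$page = render_page('NYhungry', "Pezu\xC3\xB1as"); // the UTF-8 bytes of "Pezuñas"
echo $page;
```

The HTTP header and the meta tag say the same thing on purpose: browsers generally give the HTTP header priority, and the meta tag covers the case where the page is opened without one.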
    XML_XSLT2Processor - perform XSLT 2.0 transformations in PHP.
    (my library, all feedback welcome)

  16. #16
    SitePoint Wizard co.ador's Avatar
    I have already put the DB in "utf8_general_ci".


    Now the second step would be to connect to the DB in UTF-8 using the mysql_set_charset() function, as boen_robot and robt have pointed out.

    The setup below is what I use for the database connection.

    constant.php file:
    PHP Code:
    <?php

    // Database Constants
    defined('DB_SERVER') ? null : define("DB_SERVER", "localhost");
    defined('DB_USER')   ? null : define("DB_USER", "root");
    defined('DB_PASS')   ? null : define("DB_PASS", "superwork");
    defined('DB_NAME')   ? null : define("DB_NAME", "shoes");

    ?>

    connection.php file:

    PHP Code:
    <?php require("../includes/constant.php");

    $connection = mysql_connect(DB_SERVER, DB_USER, DB_PASS);
    if (!$connection) {
        die("Database connection failed: " . mysql_error());
    }
    $db_select = mysql_select_db(DB_NAME, $connection);
    if (!$db_select) {
        die("Database selection failed: " . mysql_error());
    }
    ?>
    Where would be the best spot to place the mysql_set_charset() function within the two files above?

    Thank you guys for outlining the steps...

  17. #17
    SitePoint Member FastLionDesign's Avatar
    Join Date
    Nov 2007
    Location
    Philadelphia, PA USA
    Posts
    21
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    co.ador:

    If you use Windows, then try Notepad++ to write your code. You can read about Notepad++ and download it at:

    http://download.cnet.com/Notepad/300...html?tag=mncol

    Or go to the Notepad++ homepage:

    http://notepad-plus.sourceforge.net/uk/site.htm

    In Notepad++ you can save your pages as "utf-8" or "utf-8 without BOM." As I understand it, "utf-8 without BOM" is the preferred way to save your pages. Click "Format" in Notepad++ to see the many options.

    George
    Fast Lion Design
    www.fastliondesign.com

  18. #18
    SitePoint Zealot boen_robot's Avatar
    @co.ador
    Put the mysql_set_charset() call right above "$db_select". To make it as foolproof as what you currently have, you could use it like:
    Code PHP:
    // MySQL's name for this character set is 'utf8' (no hyphen); 'utf-8' is rejected.
    if (!mysql_set_charset('utf8', $connection)) {
        die('Setting database encoding failed: ' . mysql_error());
    }
    @FastLionDesign
    Oh please don't go there! You're only raising the question "what is BOM?". Saying that Notepad++ can save files as UTF-8 was enough info.

    BTW, I have to say, I really dislike it when I see people mentioning something without giving any reference to what they're saying (cases in mind - BOM and the mysql_set_charset() function from this topic). If people knew those references themselves, they wouldn't be asking the question to begin with.

  19. #19
    SitePoint Wizard co.ador's Avatar
    Pezuñas: I misspelled that...

    Thank you guys for your advice and input.
    1- Step one: have the database in UTF-8 -done-

    2- Step two: connect to the database in UTF-8, just like you suggested, boen_robot -done-, but I'm not sure if I have put the mysql_set_charset function in the right place

    PHP Code:
    <?php require("../includes/constant.php");

    $connection = mysql_connect(DB_SERVER, DB_USER, DB_PASS);
    if (!$connection) {
        die("Database connection failed: " . mysql_error());
    }
    // Note: MySQL's name for this charset is 'utf8'; 'utf-8' is not a valid
    // charset name, so this call fails and the die() below fires.
    if (!mysql_set_charset('utf-8', $connection)) {
        die('Setting database encoding failed: ' . mysql_error());
    }
    $db_select = mysql_select_db(DB_NAME, $connection);
    if (!$db_select) {
        die("Database selection failed: " . mysql_error());
    }
    ?>
    3- Step three: send a UTF-8 encoding HTTP header, just like crmalibu and boen_robot suggested -done-

    4- Step four: use UTF-8 safe string functions. I don't know where to use them in order to obtain the desired results.

    The problem is the set of strange characters that stop the rating system from rating item names containing any of them. How should I use those functions? Most of the strange characters were written directly into the database. The page repeats the same set of fields for each record, showing different shoe names, prices and details, but the rating system won't rate items whose name field contains one or more of those characters.
    So far we have reached step three; steps 4, 5 and 6 are still missing.

    Since step three, the page no longer displays the rest of the content after I placed the mysql_set_charset function. It just displays the header.
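
One likely reason for a page that stops right after the header: MySQL spells the character set name `utf8` (or `utf8mb4`), not `utf-8`, so a call using the hyphenated name fails and the `die()` after it ends the script. Below is a sketch of the same connection steps using the mysqli extension, since the old `mysql_*` functions were removed in PHP 7. `connect_utf8()` is a hypothetical helper and its arguments are placeholders. On recent PHP versions mysqli may throw an exception instead of returning false, so treat the error handling as an assumption.

```php
<?php
// Hypothetical helper; host, user, password and database name are placeholders.
function connect_utf8(string $host, string $user, string $pass, string $db): mysqli
{
    $link = mysqli_connect($host, $user, $pass, $db);
    if ($link === false) {
        die('Database connection failed: ' . mysqli_connect_error());
    }

    // MySQL's charset name has no hyphen: 'utf8' (or 'utf8mb4'), not 'utf-8'.
    if (!mysqli_set_charset($link, 'utf8')) {
        die('Setting database encoding failed: ' . mysqli_error($link));
    }

    return $link;
}
```

Usage would look like `$connection = connect_utf8(DB_SERVER, DB_USER, DB_PASS, DB_NAME);`, with the constants from constant.php.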


    I have downloaded Notepad++, which is the following step, but it doesn't have the menu called "Encoding" below "All Files". I have downloaded the version

    Notepad++ v5.3.1 (UNICODE)

    Maybe there is another version of Notepad++ which has the Encoding menu. On the official Notepad++ website there are several versions of the program. Does anybody have a suggestion on which one to download?

  20. #20
    SitePoint Zealot boen_robot's Avatar
    When I said "Notepad", I meant "Notepad", not "Notepad++". And by "Notepad", I mean the one you have built into Windows. You know, the one in "Start > (All) Programs > Accessories > Notepad". I haven't worked with Notepad++, so I can't tell you what's the case with it. FastLionDesign, care to further explain (Note: I told you you're only going to raise more questions...)?

    As for UTF-8 safe string functions... you use them in place of the standard ones, assuming you use any to begin with. If you use strlen() for something, replace it with iconv_strlen() or mb_strlen(), etc.

    It is also worth noting that if a function/class/API has an encoding/charset option, you should explicitly specify UTF-8 to it. Case in mind - htmlspecialchars().
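
A short sketch of the difference between the byte-oriented standard functions and their multibyte counterparts. It assumes the mbstring extension is loaded; the sample string is the item name from earlier in the thread, written out as UTF-8 bytes.

```php
<?php
$name = "Pezu\xC3\xB1as"; // "Pezuñas" encoded as UTF-8 (ñ takes two bytes)

echo strlen($name), "\n";             // 8: counts bytes
echo mb_strlen($name, 'UTF-8'), "\n"; // 7: counts characters

// substr() can cut a multibyte character in half; mb_substr() cannot.
$bytesCut = substr($name, 0, 5);             // "Pezu" plus a dangling 0xC3 byte
$charsCut = mb_substr($name, 0, 5, 'UTF-8'); // "Pezuñ"

var_dump(mb_check_encoding($bytesCut, 'UTF-8')); // bool(false)
var_dump(mb_check_encoding($charsCut, 'UTF-8')); // bool(true)
```

Passing the encoding explicitly, as here, avoids depending on whatever default encoding the server happens to be configured with.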

  21. #21
    SitePoint Enthusiast fvsch's Avatar
    Join Date
    Apr 2009
    Location
    Lyon, France
    Posts
    64
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    Hello,

    Possible caveats with switching to UTF-8:

    1. If you don't know what an encoding is, or what the principle behind those strange beasts (UTF-8, ISO-8859-1, Shift_JIS, ...) is, then you're likely to make mistakes. You may want to read this introduction to character sets and encodings. Remember (or learn...) that you never write characters in a file, you just write numbers (zeros and ones, you know). What numbers are written when you ask an editor to save a file depends on the encoding you chose. "A" can be 1, 28, 15657, 4564867, whatever (well, actually those values are bogus, I'm just talking about the principle), depending on the encoding you chose. Now, to read a file or any type of text-based data, the software reading the data (a code editor, a web browser, a SQL server...) has to know what encoding to use to convert the numbers it gets into actual characters. Bear that in mind. And do read more articles on the W3C's internationalization activity's website about this; it is a complex topic.

    2. If your data is already written with one encoding, it's HARD to switch to a different encoding. Not so hard when you have ten static HTML pages to convert (then it may be pretty easy, but not always). But if you have data already written in a database, it really is a pain. If your data is written as ISO-8859-1 (and, from my experience with MySQL, phpMyAdmin telling you your database is UTF-8 is no guarantee that the data IS UTF-8 and IS MARKED AS UTF-8), then you may as well stick to ISO-8859-1 if it fits your needs. The characters you mention are all in the ISO-8859-1 charset, so that should be no problem.

    3. Programs, libraries and functions you use. These, if old or badly coded, may be unable to handle UTF-8. Or they may be unable to handle anything other than plain ASCII (roman letters without diacritics or so-called “special characters”). Or they may be able to handle most encodings fine, but you have to use an option or tell it what encoding you're using if you want the tool to work. Solutions to such problems: read the docs, ditch the tool if it's definitely bad, or work around the problem in some way (might be tricky).
    (See boen_robot's message just before this one.)
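
To make the "characters are just numbers" point concrete, here is a small sketch that prints the bytes used for the same character under three encodings. It assumes mbstring is available and PHP 7 or later (for the `\u{...}` escape); `bytes_in()` is a hypothetical helper name.

```php
<?php
// Hypothetical helper: hex dump of one UTF-8 character re-encoded as $encoding.
function bytes_in(string $char, string $encoding): string
{
    return bin2hex(mb_convert_encoding($char, $encoding, 'UTF-8'));
}

$chars = ['A', "\u{00E9}", "\u{00F1}"]; // A, é, ñ as UTF-8 string literals

foreach ($chars as $char) {
    foreach (['ISO-8859-1', 'UTF-8', 'UTF-16BE'] as $encoding) {
        printf("%s in %-10s = %s\n", $char, $encoding, bytes_in($char, $encoding));
    }
}
```

"A" comes out as the same single byte in ISO-8859-1 and UTF-8, which is why plain English text survives an encoding mix-up while é and ñ do not.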

  22. #22
    SitePoint Wizard co.ador's Avatar
    I haven't added the meta to my head content yet because nothing displays if I do it at this moment.

    About Notepad: I can see the Encoding menu now in the simple Notepad that comes with Windows, lol... My question is: should I open all the web pages used in the website in Notepad and re-save them as UTF-8? I did it and nothing visibly changed; I know that the encoding underneath what is seen has changed. Just let me know, guys, if something is supposed to be affected after changing the encoding in Notepad...


    Hey Florent, this topic seems to be complicated and extensive. I am here to learn with all of you guys. I just want to keep things as simple as possible, because all these topics (PHP, MySQL, encoding, CSS, HTML, etc.) are definitely not simple at all. I consider we are here to have hard fun, lol....

    I have skipped step four (using the string functions) for now, since we have been focusing on Notepad encoding. I am using Dreamweaver as an editor, but I don't see any encoding menu to save in UTF-8 as the simple Notepad has. I am sure Dreamweaver can do it, but I haven't noticed where. Anyway, will I have to go through the process of saving each web page in the website as UTF-8 in Notepad?

    Florent, about the editor writing numbers rather than letters: what about characters written directly into the database? That is what I did. I typed the characters into the database fields, and the PHP code then prints them by index. I have heard that writing the actual characters in the database complicates things, but then how else would I display all those names from the database fields in a single table that uses a while loop (or any other loop) to repeat and display every database field?


    boen_robot, is there a web page that lists which UTF-8 safe string function substitutes for each standard string function, so it is easier to know what to replace?


    Florent, I don't know what encoding I am using at the moment. How can I determine that? By the database or by the PHP files?

    When I use ISO-8859-1 in the meta tag it's OK, and when I take it out it's still the same, but when I use utf-8 it won't display anything.

    Guys, on this web site http://www.w3.org/International/ques...at-is-encoding
    they say "If you use anything other than the most basic characters needed for English, people may not be able to read your text unless you say what character encoding you used"

    But then what happens when you wrote the text in the database? How can you tell the text editor what character encoding you used?
    I asked a similar question above.

    Thank you guys...

  23. #23
    SitePoint Zealot boen_robot's Avatar
    I'm not that familiar with encodings myself. That's why I just specify UTF-8 to everything, and forget about it. Still, I get the basic idea, so the answer to your question really is - it depends.

    When you send data into the DB, PHP encodes the characters with its client encoding, and sends them to the DB. The DB decodes those numbers based on the DB encoding, and performs the operation with the resulting characters in mind.

    When you receive data from the DB, it's the other way around - the DB encodes the characters with its encoding, and gives them to PHP, which decodes them with its client encoding.

    Same goes for any I/O point in a web app, and there are quite a few of those unfortunately, hence all the points where you need to specify UTF-8. There is "browser-server" (user preference - HTTP headers), "script-interpreter" (file encoding - capable functions), "PHP-DB" (PHP client encoding - DB collation), and probably a few more I can't think of right now.

    I'm not aware of a table mapping standard functions to UTF-8 aware functions... but the basic idea is that you replace "*" with "iconv_*", or if there isn't such a function - with "mb_*".

    In Dreamweaver, you can save files as UTF-8 from "Edit > Preferences...", then from the left "New Document", and from the right, select "Unicode (UTF-8)" for the option "Default encoding". To convert an existing file, you must open it, then go to "Modify > Page Properties...", then from the left "Title/Encoding", and change the encoding to "Unicode (UTF-8)". Reload to perform the change, and that's all.

    It's a good idea to convert all of your existing files to UTF-8, but unless they are PHP files or files that contain text to be displayed, there isn't a good reason you should bother, as it won't make a difference. Case in mind - CSS, .htaccess and JavaScript files.

  24. #24
    if ($zee == "Guru") { $zee--;}
    Join Date
    Nov 2005
    Location
    Karachi - Pakistan
    Posts
    1,134
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    Hi

    I was stuck on a similar problem, but I solved my issue by using utf8_decode().

    All my pages are utf-8
    mysql database/table/columns are utf8_general_ci

    Regards
    Zeeshan
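
For context, `utf8_decode()` converts a UTF-8 string to ISO-8859-1, and it has been deprecated since PHP 8.2. A sketch of the same conversion with mbstring (assumed to be available) follows; the sample string is illustrative.

```php
<?php
$utf8 = "Pezu\xC3\xB1as"; // "Pezuñas" as UTF-8 bytes

// Same direction as utf8_decode(): UTF-8 in, ISO-8859-1 out.
$latin1 = mb_convert_encoding($utf8, 'ISO-8859-1', 'UTF-8');

echo bin2hex($latin1), "\n"; // 50657a75f16173 (ñ is now the single byte f1)
```

This kind of conversion only makes sense when the output really is meant to be ISO-8859-1, and any character that encoding lacks cannot survive it.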

  25. #25
    SitePoint Wizard co.ador's Avatar
    The rating system is still not rating the items which contain encoded characters. The only step I am missing now is to replace the normal string functions with the UTF-8 ones, which is step four from boen_robot's suggestions.

    I hope this issue can be fixed.

    The string functions below are the ones used by the rating system most of the time. What would be the UTF-8 substitutes for the normal ones below?

    PHP Code:
    if ($varItem != null && strlen(trim($varItem)) != 0)
    {
        // Check if Magic Quotes is ON
        if (!get_magic_quotes_gpc())
        {
            $varItem = addslashes($varItem);
        }
    }
    I had to put back the other encoding:

    HTML Code:
    <meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-1">
    If I put utf-8, then the rating won't refresh. I had a similar problem before, caused by a / at the end of the rating PHP file; this time it comes back if I don't use the meta tag with ISO-8859-1 encoding. The database is still in UTF-8. There is obviously a mismatch here between the encoding of the web pages and the database encoding, but I can't change the header encoding because it will break the rating system and the characters from the database won't display in the browser. A very complicated case; help, please.
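
A sketch of how the emptiness check from the rating script might look with a multibyte function, assuming mbstring is loaded. `is_blank_item()` is a hypothetical helper name. For this particular check the plain `strlen(trim(...))` version gives the same answer, because a string is empty in bytes exactly when it is empty in characters, so this test alone is unlikely to be what blocks the accented names.

```php
<?php
// Hypothetical helper mirroring the check in the rating script.
function is_blank_item(?string $varItem): bool
{
    // trim() only strips ASCII whitespace bytes, which never occur inside a
    // multibyte UTF-8 character, so it is safe on UTF-8 input.
    return $varItem === null || mb_strlen(trim($varItem), 'UTF-8') === 0;
}

var_dump(is_blank_item("  Pezu\xC3\xB1as ")); // bool(false)
var_dump(is_blank_item("   "));               // bool(true)
var_dump(is_blank_item(null));                // bool(true)
```

For the `addslashes()` step, escaping that is aware of the connection character set (for example `mysql_real_escape_string()` in the extension used in this thread, or prepared statements in newer APIs) is generally a safer choice than a byte-level function.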

