  1. #1
    SitePoint Evangelist
    Join Date
    Apr 2003
    Location
    lisboa
    Posts
    423
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)

    charset=ISO-8859-1 doesnt have the euro symbol

    I wrote this blog using Java and MySQL. When I enter the € (euro symbol), all I get when I retrieve the data from the database is a question mark.
    Does anyone have any idea how to solve this?
    As I said, I'm using charset=ISO-8859-1 (latin1 in MySQL).

    thanks in advance

  2. #2
    Grüße aus'm Pott gold trophysilver trophybronze trophy
    Pullo's Avatar
    Join Date
    Jun 2007
    Location
    Germany
    Posts
    5,347
    Mentioned
    179 Post(s)
    Tagged
    9 Thread(s)
    Hi there,

    ISO/IEC 8859-1 is missing some characters for French and Finnish text, as well as the euro sign.
    Could you not simply specify another charset on your pages, such as UTF-8 or ISO-8859-15?
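
    To make the difference concrete, here is a minimal Java sketch that encodes the euro sign with several charsets and prints the resulting bytes. The class name EuroBytes and the helper hex are made up for illustration; only standard JDK calls are used. ISO-8859-1 has no slot for the euro, so Java substitutes the byte 3F, which is the question mark.

```java
import java.nio.charset.Charset;

// Hypothetical helper class, for illustration only.
final class EuroBytes {
    /** Encodes text with the named charset and returns the bytes as upper-case hex. */
    static String hex(String text, String charsetName) {
        StringBuilder sb = new StringBuilder();
        // getBytes replaces unmappable characters with the charset's replacement byte
        for (byte b : text.getBytes(Charset.forName(charsetName))) {
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String euro = "\u20AC"; // euro sign, escaped so the source file encoding does not matter
        String[] charsets = {"ISO-8859-1", "ISO-8859-15", "windows-1252", "UTF-8"};
        for (String cs : charsets) {
            System.out.println(cs + " -> " + hex(euro, cs));
        }
    }
}
```

    The euro comes out as 3F in ISO-8859-1, A4 in ISO-8859-15, 80 in windows-1252 and E282AC in UTF-8.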

  3. #3
    SitePoint Evangelist
    Join Date
    Apr 2003
    Location
    lisboa
    Posts
    423
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    I tried UTF-8 but got some strange characters, so I'm going to try the other one.
    brb
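
    One common cause of "strange characters" after switching to UTF-8 (a guess, not confirmed for this app) is that the bytes are written as UTF-8 but read back as ISO-8859-1 somewhere along the chain. A small sketch of that effect, with the made-up names MojibakeDemo and roundTrip:

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, for illustration only.
final class MojibakeDemo {
    /** Encodes with one charset and decodes with another, as a mismatched chain would. */
    static String roundTrip(String text, Charset writer, Charset reader) {
        return new String(text.getBytes(writer), reader);
    }

    public static void main(String[] args) {
        String original = "pre\u00E7o"; // Portuguese word with a c-cedilla (U+00E7)
        String garbled = roundTrip(original, StandardCharsets.UTF_8, StandardCharsets.ISO_8859_1);
        String intact = roundTrip(original, StandardCharsets.UTF_8, StandardCharsets.UTF_8);
        // The two UTF-8 bytes of U+00E7 are read as two separate Latin-1 characters
        System.out.println(original.length() + " chars became " + garbled.length());
        System.out.println("same charset on both sides keeps the text: " + intact.equals(original));
    }
}
```

    When both sides use the same charset the text survives; the damage only appears when writer and reader disagree.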

  4. #4
    SitePoint Evangelist
    Join Date
    Apr 2003
    Location
    lisboa
    Posts
    423
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    I tried
    <meta http-equiv = "Content-Type" content = "text/html; charset = iso-8859-15">
    but no luck; I still get the question mark instead...

  5. #5
    SitePoint Evangelist
    Join Date
    Apr 2003
    Location
    lisboa
    Posts
    423
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    I tried again with UTF-8, without success.

  6. #6
    Grüße aus'm Pott gold trophysilver trophybronze trophy
    Pullo's Avatar
    Join Date
    Jun 2007
    Location
    Germany
    Posts
    5,347
    Mentioned
    179 Post(s)
    Tagged
    9 Thread(s)
    Can you check the character set of the database table in which your content is stored?
    What is it?

  7. #7
    SitePoint Evangelist
    Join Date
    Apr 2003
    Location
    lisboa
    Posts
    423
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    It's latin1.

  8. #8
    Grüße aus'm Pott gold trophysilver trophybronze trophy
    Pullo's Avatar
    Join Date
    Jun 2007
    Location
    Germany
    Posts
    5,347
    Mentioned
    179 Post(s)
    Tagged
    9 Thread(s)
    Hi,

    So, to summarize:
    One or more fields in your database (which is a latin1_whatever) show the Euro symbol fine in PHPMyAdmin.
    However, when you try to output these fields in a webpage, the Euro sign shows up as the question-mark.
    You are using the following meta tag on the page: <meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
    Is that correct?

    Could you provide the code you are using to read the data from the database and to output it on the page.

  9. #9
    SitePoint Evangelist
    Join Date
    Apr 2003
    Location
    lisboa
    Posts
    423
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    >> One or more fields in your database (which is a latin1_whatever) show the Euro symbol fine in PHPMyAdmin.
    No, when I open MySQL Query Browser, I already see a question mark.

    >> when you try to output these fields in a webpage, the Euro sign shows up as the question-mark.
    Yes.

    >> You are using the following meta tag on the page: <meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />
    >> Is that correct?
    No, I'm currently using the
    <meta http-equiv = "Content-Type" content = "text/html; charset = iso-8859-1"> meta tag, but I have also tested with both charset = iso-8859-15 and UTF-8.

    >> Could you provide the code you are using to read the data from the database and to output it on the page.

    I think it would also be relevant to post the code that inserts into the DB.
    The servlet that writes to the DB:
    Code Java:
    package blog;
     
    import java.io.*;
    import javax.servlet.*;
    import javax.servlet.http.*;
    import java.sql.*;
    import bd.*;
     
     
    public class Escrever extends HttpServlet {
        private JdbcAccess access;
        private int linhas;
     
     
        public void init() throws ServletException {
            access = new JdbcAccess("avulsas");
        }
     
     
        public void doGet(HttpServletRequest request, HttpServletResponse response) throws IOException, ServletException {
            String titulo = request.getParameter("titulo");
            String texto = request.getParameter("texto");
            long data = java.lang.System.currentTimeMillis() / 1000;
     
     
            String sql = "INSERT INTO posts (data, titulo, texto) VALUES (" + data + ", '" + titulo + "',  '" + texto + "')";
     
     
            try {
                linhas = access.executaUpdate(sql);
            }
            catch (SQLException msg) {}
     
     
            response.setContentType("text/html");
     
     
            if (linhas ==1) {
                PrintWriter out = response.getWriter();
                out.println("<html>");
                out.println("<head>");
                out.println("<title>Escrever</title>");
                out.println("<meta HTTP-EQUIV=\"REFRESH\" content=\"0; url=http://rsacramento.no-ip.org/Blog\">");
                out.println("</head>");
                out.println("<body>");
                out.println("</body>");
                out.println("</html>");
            }
        }
     
     
        public void doPost(HttpServletRequest request, HttpServletResponse response) throws IOException, ServletException {
            doGet(request, response);
        }
    }

    The servlet I use to read from the DB is next:
    Code Java:
    package blog;
     
    import java.io.*;
    import javax.servlet.*;
    import javax.servlet.http.*;
    import java.sql.*;
    import bd.*;
    import util.*;
     
     
    public class Avulsas extends HttpServlet {
        private JdbcAccess access;
     
     
        public void init() throws ServletException {
            access = new JdbcAccess("avulsas");
        }
     
     
        public void doGet(HttpServletRequest request, HttpServletResponse response) throws IOException, ServletException {
            String sql = "SELECT * FROM posts order by data desc LIMIT 5";
            String apagina = "";
     
     
            try {
                ResultSet rs = access.executaQuery(sql);
                access.fecha(access.getConnection());
     
     
                apagina = AvulsasUtil.formata(rs);
                request.setAttribute("apagina", apagina);
                getServletContext().getRequestDispatcher("/jsp/avulsas.jsp").forward(request, response);
            }
            catch (SQLException msg) {
    //          String erro = "At the moment it is not possible to communicate with the database.<br /> Please try again later.";
                Object erro = msg.toString();
                request.setAttribute("erro", erro);
                getServletContext().getRequestDispatcher("/jsp/erro.jsp").forward(request, response);
            }
        }
     
     
        public void doPost(HttpServletRequest request, HttpServletResponse response) throws IOException, ServletException {
            doGet(request, response);
        }
    }
    and also:
    Code Java:
    package util;
     
    import java.sql.*;
    import java.text.*;
     
     
    public class AvulsasUtil {
        public static synchronized String formata(ResultSet rs) throws SQLException {
            StringBuilder pagina = new StringBuilder();
            String dataTemporaria = "";
     
     
            while (rs.next()) {
                long mili = rs.getLong(2);
                mili = mili * 1000;
                java.util.Date data = new java.util.Date(mili);
                String padraoExtenso = "EEEEEE, d 'de' MMMMMM 'de' yyyy";
                String padraoHora = "HH:mm";
                SimpleDateFormat  sdfExtenso = new SimpleDateFormat(padraoExtenso);
                SimpleDateFormat  sdfHora = new SimpleDateFormat(padraoHora);
                String dataPorExtenso = sdfExtenso.format(data);
                String dataHoraria = sdfHora.format(data);
                String titulo = rs.getString(3);
                String texto = rs.getString(4);
                if (!dataTemporaria.equals(dataPorExtenso)) {
                    dataTemporaria = dataPorExtenso;
                    pagina.append("<h1>" + dataTemporaria + "</h1>\n");
                    pagina.append("<h2>" + titulo + "</h2>\n");
                    pagina.append("<p><span class = \"horas\">" +
                            dataHoraria +
                            " </span>" +
                            texto +
                            "</p>\n");
                }
                else {
                    pagina.append("<h2>" + titulo + "</h2>\n");
                    pagina.append("<p><span class = \"horas\">" +
                            dataHoraria +
                            " </span>" +
                            texto +
                            "</p>\n");
                }
            }
     
     
            return pagina.toString();
        }
    }

    hope it helps
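
    One thing visible in the insert servlet is that the SQL is built by string concatenation and no request encoding is set. A hedged sketch of the insert using a PreparedStatement follows. It assumes that JdbcAccess.getConnection() returns a plain java.sql.Connection (not confirmed by the code above); the class PostWriter is made up. In the servlet itself, calling request.setCharacterEncoding("UTF-8") before the first getParameter call, and stating the charset in response.setContentType, are the usual companions to this. MySQL documents its latin1 as cp1252, which does have a euro slot, so the column charset alone may not be the culprit.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;

// Hypothetical helper, assuming a java.sql.Connection is available from JdbcAccess.
final class PostWriter {
    // Placeholders let the driver handle quoting and character conversion
    static final String INSERT_SQL =
            "INSERT INTO posts (data, titulo, texto) VALUES (?, ?, ?)";

    /** Inserts one post and returns the number of affected rows. */
    static int insert(Connection conn, long data, String titulo, String texto)
            throws SQLException {
        try (PreparedStatement ps = conn.prepareStatement(INSERT_SQL)) {
            ps.setLong(1, data);
            ps.setString(2, titulo);
            ps.setString(3, texto);
            return ps.executeUpdate();
        }
    }
}
```

    With placeholders, a title containing an apostrophe no longer breaks the statement, and the text reaches the driver as a Java String instead of being spliced into the SQL.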

  10. #10
    Grüße aus'm Pott gold trophysilver trophybronze trophy
    Pullo's Avatar
    Join Date
    Jun 2007
    Location
    Germany
    Posts
    5,347
    Mentioned
    179 Post(s)
    Tagged
    9 Thread(s)
    Hi,

    If you can't see the Euro symbol in PHPMyAdmin, that is not very promising.

    I know it sounds obvious, but did you try setting the correct encoding in your browser?
    Which browser are you using?
    Which encoding do you have?
    Does this problem occur in all browsers?

    Regarding your code, I'm afraid my Java isn't wonderful.
    Had it been PHP and had the Euro sign been displaying properly in PHPMyAdmin, I would have had a bunch of suggestions.

    As it is, if changing your browser's charset encoding doesn't help, it might be the case that the Euro symbol isn't being stored correctly in the first place.

  11. #11
    SitePoint Evangelist
    Join Date
    Apr 2003
    Location
    lisboa
    Posts
    423
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    I'm testing with the latest Opera, a very recent Chrome and IE8, and it's the same everywhere.
    I read this article:
    http://www.oracle.com/technetwork/ar...et-142283.html,
    but it was no help to me either.
    >> it might be the case that the Euro symbol isn't being stored correctly in the first place
    Yep.

    thanks anyway

  12. #12
    SitePoint Evangelist
    Join Date
    Apr 2003
    Location
    lisboa
    Posts
    423
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    I guess it must have something to do with the server's charset... (I'm using Tomcat)

  13. #13
    SitePoint Zealot Michel Merlin's Avatar
    Join Date
    Mar 2005
    Location
    Versailles (France)
    Posts
    169
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)

    Encode in local charsets (ISO-8859-1, Shift-JIS, etc) and use FINANCIAL Euro symbol

    Recommendation:

    1. First, you need the same charset everywhere in your information-handling chain, seamlessly from forms and email to web pages to the DB, including all the corresponding back-and-forth interfaces.
    2. For this you need to select a charset that will actually work in the real world. If you deal with the public in (North or South) America or Western Europe, the only combination that is currently efficient (hence affordable and reliable), while waiting for UTF-8 to become ready, is ISO-8859-1 + EUR (the ISO 4217 3-letter FINANCIAL symbol). In Japan, it is JIS or Shift-JIS + EUR, and so on for the rest of the world.

    Explanation:

    1. In real life in France, I often receive French email messages from big companies that have been entirely stripped of their French accents (no matter which mailer they use), making them ugly and difficult to read, yet still readable. This is apparently because these companies, being usually English-speaking, still encode in UTF-8, unaware (since in English UTF-8 brings no difference, drawback or benefit over ASCII) that UTF-8 is the cause of their problems with non-ASCII chars. Conversely, the email messages I receive in Western languages (FR, EN, DE) from most other companies or individuals are encoded in ISO-8859-1 and are free of charset problems. This situation (temporary, I hope) exists, IMO, because the compatibility problems between UTF-8 and traditional fixed-length charsets have been underestimated by the official bodies in charge of enforcing UTF-8. As a result, real-world UTF-8 problems with non-ASCII characters:
      1. are nonexistent in English, where all characters are ASCII, so encoding in UTF-8 is actually encoding in ASCII;
      2. are few in Western European languages, where few chars are non-ASCII, so encoding in UTF-8 makes documents inelegant but not unreadable;
      3. are total in Japan, where most chars are non-ASCII, so UTF-8 not only increases the document size but, in the real world, causes most characters to be replaced with mojibake, so that UTF-8 is widely rejected by regular people (Note: I still need, and would appreciate, more recent, direct, helpful, precise and reliable checks and facts in English about charsets in Japan, from able persons, if possible Japanese or living in Japan; same for mainland China, Hong Kong and Taiwan).

    2. If you send some text (through email or a form) to someone in the public, you have no control over what they will do with that text (editing, replying, forwarding), and particularly over what programs or charset(s) will be used down the workflow. Many of your correspondents will, knowingly or not, use their local charset, so if you have encoded in another one (namely UTF-8, if you are NOT writing in English), they will run into a lot of big problems with no apparent solution, hence their going back to traditional charsets or removing accents.
    3. In the real world, UTF-8's goal (efficiently representing all the 0.1-1 million Unicode characters in the world) has only been successfully achieved in complete, closed, pure-UTF-8 environments built with careful, intelligent thinking and sufficient resources, such as Wikipedia; others generally tend to go back to "traditional" local fixed-length charsets (ISO-8859-1 for Western European languages, JIS for email and Shift-JIS for web pages in Japanese, etc).
    4. ISO-8859-15's main goal (and effect) is to introduce the Euro typographical symbol "€", but it does so by putting it in place of the general currency typographical symbol "¤". So, in the real world, if you send an ISO-8859-15-encoded "€", somewhere down the workflow it will inevitably get replaced with an ISO-8859-1-encoded "¤", creating a harmful confusion and making ISO-8859-15 unsafe, hence actually unusable. Conversely, the Euro financial symbol "EUR" is recognized, understood, read, written, conveyed and transcribed immediately, without ambiguity or error, by any person, machine or program world-wide, from financial traders to shoe shiners, from Bhutan to Manhattan. So, after (inter alia) my various posts and emails, many sites like amazon (.fr, .de, etc) or wikipedia (all) have now switched, in their use or recommendations, from "€" to "EUR".
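
    The slot shared by the two symbols in point 4 can be checked with a few lines of Java. This is only a sketch; the class CurrencySlot and its decode helper are made up for illustration. The same byte, A4 in hex, is read as the currency sign under ISO-8859-1 and as the euro sign under ISO-8859-15.

```java
import java.nio.charset.Charset;

// Hypothetical demo class, for illustration only.
final class CurrencySlot {
    /** Decodes a single byte value with the named charset. */
    static String decode(int byteValue, String charsetName) {
        return new String(new byte[] {(byte) byteValue}, Charset.forName(charsetName));
    }

    public static void main(String[] args) {
        String asLatin1 = decode(0xA4, "ISO-8859-1");   // currency sign, U+00A4
        String asLatin9 = decode(0xA4, "ISO-8859-15");  // euro sign, U+20AC
        System.out.println(String.format("U+%04X", (int) asLatin1.charAt(0)));
        System.out.println(String.format("U+%04X", (int) asLatin9.charAt(0)));
    }
}
```

    Nothing in the byte itself says which of the two was meant; only the declared charset does.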

    Details: For Long URLs, Accentuated Chars, encode as Quoted-Printable, Western European (ISO), use EUR for Euro symbol of Sun 19 Nov 2006.

    Versailles, Thu 13 Dec 2012 22:00:00 +0100

  14. #14
    SitePoint Wizard bronze trophy Jeff Mott's Avatar
    Join Date
    Jul 2009
    Posts
    1,154
    Mentioned
    14 Post(s)
    Tagged
    0 Thread(s)
    Hi, Michel. Thanks for your very detailed replies! I hope you don't mind a follow-up question. Perhaps it's due to me being in an English-speaking bubble, but my understanding was that UTF-8 is universally understood by now. Many large websites use it, I presume successfully. In Western Europe or eastern countries, is there still software being used that doesn't support UTF-8?
    "First make it work. Then make it better."

  15. #15
    SitePoint Zealot Michel Merlin's Avatar
    Join Date
    Mar 2005
    Location
    Versailles (France)
    Posts
    169
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)

    Big sites tend to UTF-8 for EN pages, local charsets for forms

    Quote Originally Posted by Jeff Mott View Post
    (Thu 13 Dec 2012 22:15 GMT)
    ...my understanding was that UTF-8 is universally understood by now. Many large websites use it, I presume successfully.
    In the 2 sites you link ( http://www.google.fr and http://www.yahoo.co.jp ) and in the very page we are posting on right now (charset=ISO-8859-1 doesnt have the euro symbol), let's check the charset they state in their HTTP Headers (using HTTP Web-Sniffer 1.0.44) and in their HTML source (as a reminder, whatever we may think about it, the HTTP Header has priority over the HTML source):

    1. http://www.google.fr (and http://www.google.co.jp BTW) actually uses ISO-8859-1 (states "Content-Type: text/html; charset=ISO-8859-1" in its HTTP Header, and nothing in its HTML source)
    2. http://www.yahoo.co.jp (as well as www.yahoo.fr, that redirects to http://fr.yahoo.com, or as http://www.yahoo.com ) actually uses UTF-8 (states "Content-Type: text/html; charset=utf-8" in its HTTP Header, and <meta http-equiv="content-type" content="text/html; charset=utf-8"> in its HTML source)
    3. this SitePoint page actually uses ISO-8859-1 (states "Content-Type: text/html; charset=ISO-8859-1" in its HTTP Header, and <meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-1"> in its HTML source), and nevertheless displays correctly the Euro "€" and Currency "¤" typographical symbols (and some).
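
    The rule used in these checks (HTTP header first, HTML source second) can be sketched in Java as follows. The class ContentTypeCharset, its method names and the fallback argument are all made up for illustration, and the regular expression is a simplification, not a full Content-Type parser.

```java
import java.util.Locale;
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, for illustration only.
final class ContentTypeCharset {
    // Tolerates spaces around '=' and an optional opening quote
    private static final Pattern CHARSET =
            Pattern.compile("charset\\s*=\\s*\"?([^\\s;\"]+)", Pattern.CASE_INSENSITIVE);

    /** Extracts the charset name from a Content-Type value, upper-cased, if present. */
    static Optional<String> fromHeader(String contentType) {
        if (contentType == null) {
            return Optional.empty();
        }
        Matcher m = CHARSET.matcher(contentType);
        return m.find() ? Optional.of(m.group(1).toUpperCase(Locale.ROOT)) : Optional.empty();
    }

    /** The HTTP header wins; the meta tag is consulted only when the header has no charset. */
    static String effective(String httpHeader, String metaContent, String fallback) {
        return fromHeader(httpHeader)
                .orElseGet(() -> fromHeader(metaContent).orElse(fallback));
    }
}
```

    For example, effective("text/html; charset=utf-8", "text/html; charset=iso-8859-15", "windows-1252") yields UTF-8, because the header is consulted before the meta tag.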

    Notice however that between my checks of 2008 and 2011, some sites have converted from UTF-8 to local charsets, some the other way; a typical case is SONY, where the global and US sites have switched from ISO-8859-1 to UTF-8 (which will do them no harm at all, since for English UTF-8 is actually ASCII), while local sites have remained in, or converted to, local charsets, especially in their form pages: see SONY and Sony USA (from ISO-8859-1 to UTF-8), Sony Global (ISO-8859-1), Sony JP (Shift_JIS), Sony FR (UTF-8) > Contact (UTF-8) > Form (Windows-1252).
    Quote Originally Posted by Jeff Mott View Post
    In Western Europe or eastern countries, is there still software being used that doesn't support UTF-8?
    Sure, everything exists in Nature, yet the remaining programs that don't support it at all must be rare. However, many sites support UTF-8 only incompletely or wrongly. A notable case is Microsoft, which despite its vast resources never corrected Outlook Express's big UTF-8 flaw in editing HTML source, and took years to tame UTF-8 correctly everywhere else.

    Versailles, Fri 14 Dec 2012 12:15:10 +0100

  16. #16
    SitePoint Evangelist
    Join Date
    Apr 2003
    Location
    lisboa
    Posts
    423
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    Quote Originally Posted by Michel Merlin View Post

    1. this SitePoint page actually uses ISO-8859-1 (states "Content-Type: text/html; charset=ISO-8859-1" in its HTTP Header, and <meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-1"> in its HTML source), and nevertheless displays correctly the Euro "€" and Currency "¤" typographical symbols (and some).
    that's intriguing: how do they do it that i cant have it?

  17. #17
    SitePoint Zealot Michel Merlin's Avatar
    Join Date
    Mar 2005
    Location
    Versailles (France)
    Posts
    169
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    Quote Originally Posted by rfl View Post
    (15:37 GMT)
    i cant have it?
    I guess what you mean is you don't have the Euro and Currency signs properly displayed. It can't be a FONT problem since the fonts used (Arial, Verdana) are very common. So, have you checked your browser is set to detect the charset in the web page, as told in my "Details" link in last line of 21:00?

    Versailles, Fri 14 Dec 2012 18:07:25 +0100

  18. #18
    SitePoint Evangelist
    Join Date
    Apr 2003
    Location
    lisboa
    Posts
    423
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    Yes, in Opera, Chrome and IE8.

  19. #19
    SitePoint Wizard bronze trophy Jeff Mott's Avatar
    Join Date
    Jul 2009
    Posts
    1,154
    Mentioned
    14 Post(s)
    Tagged
    0 Thread(s)
    Quote Originally Posted by rfl View Post
    that's intriguing: how do they do it that i cant have it?
    I suspect the euro and currency symbols are special cases. The browser probably error corrects for the euro symbol, because even though it isn't in the iso-8859-1 set, it is in the windows-1252 set. And the currency symbol is actually in iso-8859-1, so that one is legal.
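
    Another way a page declared as ISO-8859-1 can still show a euro sign is by sending it as a numeric character reference. Whether this forum does that is not confirmed; the sketch below only shows the technique. The class NcrEscaper is made up, and it does not escape &, < or >, which a real page would also need.

```java
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;

// Hypothetical helper, for illustration only.
final class NcrEscaper {
    /** Replaces every character the charset cannot encode with a decimal reference. */
    static String escapeUnmappable(String text, String charsetName) {
        CharsetEncoder encoder = Charset.forName(charsetName).newEncoder();
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < text.length(); ) {
            int cp = text.codePointAt(i);
            String ch = new String(Character.toChars(cp));
            if (encoder.canEncode(ch)) {
                out.append(ch);                           // representable: keep as-is
            } else {
                out.append("&#").append(cp).append(';');  // e.g. U+20AC becomes &#8364;
            }
            i += Character.charCount(cp);
        }
        return out.toString();
    }
}
```

    With this, "10 €" becomes "10 &#8364;" for ISO-8859-1, which is pure ASCII on the wire and is rendered as the euro sign by the browser.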

    If yours is coming through as a question mark, then I suspect it's not the browser but your server that is sending it that way. You'll need to follow Michel's advice to have "the same charset everywhere in your information handling chain." If either your application or your database is Latin1, then either one of those could be replacing illegal characters with the question mark. You'll need to pick a charset that can support all characters (in the English-speaking world, UTF-8 is by far the most popular choice), and make sure everything is using that charset.
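
    To find out where in the chain a character gets replaced, it can help to make the encoding step fail loudly instead of silently writing a question mark. A minimal sketch, assuming a made-up class StrictEncoding; it uses the JDK's CharsetEncoder with the REPORT action.

```java
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CodingErrorAction;

// Hypothetical helper, for illustration only.
final class StrictEncoding {
    /** Returns true only if every character of text can be encoded in the named charset. */
    static boolean fits(String text, String charsetName) {
        try {
            Charset.forName(charsetName).newEncoder()
                    .onMalformedInput(CodingErrorAction.REPORT)
                    .onUnmappableCharacter(CodingErrorAction.REPORT)
                    .encode(CharBuffer.wrap(text)); // throws instead of substituting '?'
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }
}
```

    Calling fits(texto, "ISO-8859-1") just before the insert would show whether the euro sign was still intact at that point or was already lost earlier in the request.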
    "First make it work. Then make it better."

  20. #20
    SitePoint Evangelist
    Join Date
    Apr 2003
    Location
    lisboa
    Posts
    423
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    I'm still working out how to change Tomcat's charset.
    A bit off topic: I've noticed that in my app, if I have a " character or a - character, there it goes again: I get a question mark. But if I edit it, I mean retype it, I get it right!
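
    A possible explanation (a guess, not verified against this app) is that the failing quote and dash are typographic ones, such as U+201C or U+2013 pasted from a word processor, which exist in windows-1252 but not in ISO-8859-1, while the retyped ones are plain ASCII. Dumping the code points of the submitted text would settle it. The class CodePointDump is made up for illustration.

```java
// Hypothetical diagnostic helper, for illustration only.
final class CodePointDump {
    /** Lists the code points of text as space-separated U+XXXX values. */
    static String dump(String text) {
        StringBuilder sb = new StringBuilder();
        text.codePoints().forEach(cp -> {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            sb.append(String.format("U+%04X", cp));
        });
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(dump("\"-"));            // typed on the keyboard: plain ASCII
        System.out.println(dump("\u201C\u2013"));   // curly quote and en dash
    }
}
```

    Anything above U+00FF in the dump cannot be stored in ISO-8859-1 and would come back as a question mark.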

  21. #21
    SitePoint Zealot Michel Merlin's Avatar
    Join Date
    Mar 2005
    Location
    Versailles (France)
    Posts
    169
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)

    Big site often in UTF-8+"EUR" with ISO-8859-1, Shift_JIS, Big5 for visitor input

    Here's an attempt at investigating what charsets are used by big sites world-wide, and particularly, on what scale is UTF-8 spread, expanding, or shrinking. It was done in 2008 and refreshed since (albeit the changes are smaller than expected).

    The charset a site is using is stated, 1st priority by its HTTP Header (known using HTTP Web-Sniffer 1.0.44), 2nd priority by its HTML source ("View > Source" in ie6, F12 in ie9, Ctrl+U in Chrome 23, etc).

    Test Summary

    1. Tested Thu 17 Jul 2008 (using HTTP Web-Sniffer 1.0.37). Big sites' most used charsets are apparently UTF-8, ISO-8859-1 (West Europe, Americas, Oceania, most Africa), Windows-1252, ISO-8859-15, Chinese Traditional Big5 (TW, HK, MO), Chinese Simplified GB2312 (CN), Japanese Shift_JIS, Korean euc-kr.
    2. Revised Mon 11 May - Sun 17 May 2009, found only 4 changes: SONY and Sony USA ISO-8859-1 > UTF-8, BMW Korea euc-kr > UTF-8, Microsoft FR Vista ISO-8859-1 > UTF-8.
    3. Mon 04 Jan 2010 19:13:16 +0100, found (coming from the nice BENNY & PEGGY - ON THE SUNNY SIDE video) a good example of a carefully designed Japanese site nicely displaying intertwined Japanese and Western characters by correct use of UTF-8: SOM ASSOCIATE ARCHITECTS.
    4. In Jul 2011 when checking again, that SOM site had converted to "charset=Shift_JIS" and added a Contact page (also in "Shift_JIS"); the few others checked were unchanged; so I then aborted checking.
    5. On Sat 15 Dec 2012 I rechecked, a little faster than in 2008 (only Web-Sniffer, little source check). A number have switched to UTF-8 (SONY EN, Toshiba FR, Toyota JP, Renault JP), a few back or both ways (NISSAN, BMW).

    Result Details

    The list below shows which old charsets were replaced with new ones between 2008 and 2012.

    1. Toshiba world-wide (UTF-8), Toshiba USA (ISO-8859-1), Toshiba Japan Top Page (UTF-8), Toshiba JP (Shift_JIS), Toshiba FR (ISO-8859-1 UTF-8) > Contacts (ISO-8859-1) > true email (no forms), replaced in 2012 with Contacts (UTF-8 but no more real contact)
    2. CLEVO US (UTF-8), CLEVO TW (Big5), contacts are by true email
    3. Hitachi (UTF-8), Hitachi Global (UTF-8), Hitachi JP (Shift_JIS), Hitachi FR (UTF-8) > Électronique de consommation (UTF-8) > Nous contacter (UTF-8) > formulaire du service client (ISO-8859-1) in 2012 replaced with UTF-8 contact-less page with email addresses
    4. Toyota (UTF-8), Toyota CN (GB2312), Toyota JP (Shift_JIS UTF-8), Toyota FR (UTF-8) > Contacts (2012 now .tmex, still UTF-8) > Posez votre question (UTF-8, 404-ed in 2012) > Form (still ISO-8859-1 in 2012), replaced with Formulaire (still UTF-8)
    5. NISSAN COMPUTER CORP (UTF-8 ISO-8859-1), NMC (Nissan Motor Corp) (ISO-8859-1 UTF-8), NMC JP (Shift_JIS ISO-8859-1), Nissan FR (UTF-8) > Contact > Contactez Nissan (UTF-8) = actually a KB, with NO contact entry, replaced in 2012 with email addresses
    6. Renault (UTF-8), Renault BR (UTF-8), Renault JP (Shift_JIS UTF-8), Renault FR (UTF-8) > Contact (UTF-8) > Service Relation Client Renault or here (form, UTF-8, but after 1 hour of tests and phone calls, it appears this form actually refuses any character outside ISO-8859-1)
    7. BMW (UTF-8 ISO-8859-1), BMW FR FR FR (UTF-8 ISO-8859-1) > various forms that, while hidden behind frames, appear coded in ISO-8859-15 UTF-8, BMW JP JP JA (UTF-8, but no forms found; in 2012, ISO-8859-1 with form in UTF-8), BMW CN (UTF-8), BMW HK (Big5 UTF-8), BMW HK Store (Big5), BMW Korea (euc-kr > UTF-8 > ISO-8859-1)
    8. Microsoft sites world-wide are apparently all in UTF-8, including forms in NON-ASCII languages, e.g.: Microsoft > MS US (UTF-8), Contact Us, View Customer Service Solution Centers, bottom right: Contacts, E-mail Customer Service > Customer Service Contact Us (form, UTF-8); idem Microsoft FRANCE (UTF-8), ..., Contactez Nous : Plus d'information (form, UTF-8, but replies in ISO-8859-1 with other chars corrupted); Microsoft Japan (UTF-8), bottom left: お問い合わせ先 > Microsoft Customer Service & Support, ウェブ/メールでのお問い合わせ > Microsoft Japan Customer Directory Web Mail (Shift_JIS UTF-8), Contact US > Contact Us マイクロソフトへのご意見・ご要望 (form, UTF-8); MSDN and TechNet (all languages) are apparently entirely in UTF-8: MSDN JP, MSDN CN, MSDN KR, MSDN TW; MU, WU, OU, Xbox, MSKB (e.g. KB953979) as well; Microsoft FR Vista (ISO-8859-1) has been rewritten and relocated in Microsoft FR Vista (UTF-8)
    9. However, except in the US (where UTF-8 is no different from ASCII), some parts of sub-sites are still in fixed-length charsets, e.g. Microsoft JP Vista (Shift_JIS UTF-8), Microsoft FR mice and kbds (ISO-8859-1) relocated in 2012 to Microsoft FR Claviers, souris, webcams et autres (UTF-8).
    10. Wikipedia, Wikipedia JP, Wikipedia EN, Wikipedia FR, are apparently all totally in UTF-8, including the proprietary yet rich end-user editor available for each page - and in the Editing Wikipedia:Sandbox (form+display, UTF-8, OK), where any difficult char string is correctly rendered, whether entered in visible characters, in NCRs or in entities
    11. amazon (some pages windows-1252, most ISO-8859-1, even the pages tagged "UTF8"), amazon FR and amazon DE (View = unknown, because: HTTP headers = 8859-1 + 8859-15, source = 8859-1), amazon UK (ISO-8859-1), amazon JP (Shift_JIS). In 2012, "€" has been replaced everywhere with "EUR" (apparently after my emails to amazon, see §4 in Encode in local charsets ... and use FINANCIAL Euro symbol), and "windows-1252" has disappeared; no other changes, e.g. many HTTP headers still ISO-8859-15, even while still often carrying "ie=UTF8" in their URLs.
    12. SitePoint (UTF-8), Article (relocated, still UTF-8) or Blog (relocated, still UTF-8) > Forums (relocated, still ISO-8859-1) > Reply Form (newer example, still ISO-8859-1): SitePoint, while promoting UTF-8 and aptly applying it on regular pages, turns to ISO-8859-1 as soon as visitors' input is significant
    13. The Autistic Cuckoo (ISO-8859-1) > Autistic Cuckoo (ISO-8859-1) > Hjälp Australien (ISO-8859-1) (In 2012 this site doesn't open, yet still responds "ISO-8859-1". Swedish is 100% covered by ISO-8859-1, example Dagen är nära, i.e. Lascia Ch'io Pianga in Swedish, and its lyrics)
    14. Accessites (UTF-8) > Site (ISO-8859-1) > Contact (form, UTF-8)
    15. sk89q > UTF-8 web form and UTF-8 webmail reply (form+mail, UTF-8, OK), not found in 2012.

    Result Summary

    Windows-1252 seems to be disappearing, "€" is slowly being replaced with "EUR", ISO-8859-15 remains rare, and big sites tend to switch to UTF-8. But the sites (big or small) that have to deal with non-English speakers, or that lack big resources, tend to revert, for their inputs (forms, forums), to the main local fixed-length charsets, still very efficient and reliable, mainly the 3 biggest: ISO-8859-1 (the historical web standard), Shift_JIS (main Japanese), Big5 (main Chinese).

    Note on UTF-8

    UTF-8, which like all charsets is ASCII-based, is fine when it adds nothing to ASCII (as in English, which uses only ASCII characters). It also works fine when the character flow is one-way (web sites with no feedback or interaction). But when there is a significant amount of input from the other end (forms, forums or other interactive websites, email), then after 2 or 3 steps of traveling or editing, with interactions with sites, programs, DBs or users using other charsets, UTF-8 too often causes major problems with non-ASCII characters. When ASCII characters remain the majority (Europe), UTF-8 emails, while poor and ugly, are still readable. But when ASCII characters are a minority or absent (e.g. Japan), UTF-8, in addition to becoming useless (since it then requires several bytes for each character, defeating the very purpose of UTF-8), soon makes emails totally unreadable, causing massive rejection among the population, who revert to fixed-length encodings such as ISO-8859-1, Shift_JIS or others.

    Notice however that, while apparently difficult to implement and use in long real-life workflows (many loud people say it's easy... yet fall short of helping those who struggle), UTF-8 already fulfills its high promises in some cases (Wikipedia), so we will probably have it working as expected in the not too distant future, after the current period in which ISO-8859-1 (the historical web standard, and the 255-char charset nearest to ASCII) or whatever national-scale fixed-length charsets (Shift_JIS, Big5) remain more reliable for email, forms, forums and other public interactions.
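
    The number of bytes UTF-8 spends per character is easy to check directly. A minimal sketch, with a made-up class Utf8Width; the sample characters are an ASCII letter, an accented Latin letter, the euro sign and one kanji.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper, for illustration only.
final class Utf8Width {
    /** Returns how many bytes the text occupies once encoded as UTF-8. */
    static int utf8Length(String text) {
        return text.getBytes(StandardCharsets.UTF_8).length;
    }

    public static void main(String[] args) {
        System.out.println(utf8Length("A"));       // ASCII letter
        System.out.println(utf8Length("\u00E9"));  // e with acute accent
        System.out.println(utf8Length("\u20AC"));  // euro sign
        System.out.println(utf8Length("\u65E5"));  // kanji U+65E5
    }
}
```

    These come out as 1, 2, 3 and 3 bytes respectively, so ASCII text costs nothing extra while the kanji takes three bytes.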

    Versailles, Sat 15 Dec 2012 17:28:00 +0100
    Let's make sure of the facts before getting in the cause -- Fontenelle

