#1 |
|
SitePoint Zealot
![]() ![]() Join Date: Mar 2005
Location: Paris (France)
Posts: 124
|
The Definitive Guide to Web Character Encoding
Notice: This is a discussion thread for comments about the SitePoint article, The Definitive Guide to Web Character Encoding.
__________
Thanks for this very good article. However, like many US-English speakers, the author overlooks that UTF-8 actually misses its goal and *in fact* doesn't work in practice. When you write in UTF-8, your text goes through in perfect form as long as you don't use any character outside the first 128 ASCII characters, or as long as the recipient reads it in its original document. But as soon as they use it elsewhere, e.g. by replying or forwarding, each accented European character will corrupt the 2 or 3 characters around it, making the document unusable.
Sure, this will eventually get fixed, but for now, if you want to write European languages properly, you write in ISO 8859-1. Its only real-world shortcoming is the Euro *typographical* symbol, which you can appropriately replace with the Euro *financial* symbol, EUR, which is more widely and officially standardized, and is read and understood by any person or program in any country in the world, from Thailand to the USA to Germany.
I gave more details in the newsgroup MS Public Outlook Express General, e.g. in the message « For Long URLs, Accentuated Chars, encode as Quoted-Printable, Western European (ISO), use "EUR" for Euro symbol » posted Sun 19 Nov 2006 18:56:45 GMT.
Versailles, Wed 10 Jan 2007 10:57:55 +0100, edited 11:06:50 |
|
|
|
|
|
#2 |
|
SitePoint Wizard
![]() ![]() ![]() ![]() ![]() ![]() ![]() Join Date: Nov 2004
Location: Åsnorrbodarna
Posts: 11,583
|
I don't use Internet Explorer (I use Linux) and the links you posted don't work for me (Opera).
Without reading the articles, it seems to me that any problems exist because Microsoft doesn't handle UTF-8 correctly in its applications. That's not a fault within the encoding, it's a fault in Microsoft's software. Without waxing philosophical, how long are we going to let a big corporation with a poor QA department hold back development? |
|
|
|
|
|
#3 |
|
SitePoint Zealot
![]() ![]() Join Date: Mar 2005
Location: Paris (France)
Posts: 124
|
Unfortunately you are right about MS (unduly) not applying in its products what it (duly) requires from others. But the fact is that, in the real world (where a majority use IE and OE) and at the user level, messages sent in UTF-8 don't work properly, and American users (who are generally more careful about writing properly) tend to write European texts in UTF-8, apparently unaware that in fact UTF-8 mishandles accented characters, i.e. misses its main goal.
Versailles, Wed 10 Jan 2007 16:15:25 +0100 |
|
|
|
|
|
#4 |
|
SitePoint Zealot
![]() ![]() Join Date: Mar 2005
Location: Paris (France)
Posts: 124
|
To open the NEWS links above:
Clicking them should open them in any properly configured browser+newsreader (IE+OE, FF+TB+NGs, etc). In case this fails, try the following:
Last edited by Michel Merlin; Jan 11, 2007 at 07:38. |
|
|
|
|
|
#5 |
|
CSS Advisor
![]() Join Date: Jan 2003
Location: Hampshire UK
Posts: 26,615
|
Thanks for the article Tommy - very interesting and informative |
|
|
|
|
|
#6 |
|
Robert Wellock
![]() ![]() ![]() ![]() ![]() ![]() Join Date: Apr 2002
Location: A Maze of Twisty Little Passages
Posts: 1,740
|
Unfortunately I have a firewall stopping me from viewing the newsgroup, so I cannot see what was written, though I suspect the said Microsoft applications "lack functionality". I would guess Mr T's article was focusing more on web pages.
As for 'Sandwich Table', I don't know. You probably overcomplicated certain parts of the article, or didn't put enough emphasis on the difference between HTML and XHTML (you mentioned briefly that the charset meta declaration is not recognized by XML processors), though not on how external CSS and the like will get treated by the two different processors, i.e. @charset "utf-8"; I think you should also have mentioned that if you only declare the encoding via the META element, it should be the first thing that appears after the opening HEAD tag. Other than that, it more or less covered most things in a roundabout way. |
|
|
|
|
|
#7 |
|
SitePoint Wizard
![]() ![]() ![]() ![]() ![]() ![]() ![]() Join Date: Nov 2004
Location: Åsnorrbodarna
Posts: 11,583
|
There aren't too many differences between HTML and XHTML when it comes to character encoding. I mentioned that in real XHTML you should use the XML declaration, while in pretend-XHTML you could use a META element like the HTML it really is. In either case, a true HTTP header sent by the web server will override it.
You are correct about other external files, though. I did realise (too late) that I forgot to include information about encoding for CSS and JavaScript files. Presumably, people would use the same editor settings, though, so it should work out. The META element doesn't have to be the first thing after the <head> tag, but there should be no characters outside the US-ASCII range preceding it. |
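The placement rule for the META element can be sketched in Python (a language chosen here for illustration; it is not used in the article or the thread). The helper below finds the first `charset=` declaration in the raw bytes of a page and checks that only US-ASCII bytes precede it. The function name `meta_charset_is_safely_placed` and the loose regular expression are hypothetical; real browsers use a more elaborate prescan.

```python
import re

# Hypothetical, deliberately loose pattern for a charset declaration.
CHARSET_RE = re.compile(rb"""charset\s*=\s*["']?([A-Za-z0-9._-]+)""", re.IGNORECASE)

def meta_charset_is_safely_placed(raw_page: bytes) -> bool:
    """Return True if a charset declaration exists and only ASCII bytes precede it."""
    match = CHARSET_RE.search(raw_page)
    if match is None:
        return False
    return raw_page[:match.start()].isascii()

meta = '<meta http-equiv="Content-Type" content="text/html; charset=utf-8">'

# META first, non-ASCII title afterwards.
good = ("<head>" + meta + "<title>Éric</title></head>").encode("utf-8")
# Non-ASCII title before the META element.
bad = ("<head><title>Éric</title>" + meta + "</head>").encode("utf-8")

print(meta_charset_is_safely_placed(good))  # True
print(meta_charset_is_safely_placed(bad))   # False
```

The check works on bytes rather than decoded text because, at that point, a parser does not yet know which encoding to decode with.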
|
|
|
|
|
#8 |
|
SitePoint Enthusiast
![]() Join Date: Jul 2006
Posts: 38
|
Thanks for the article Tommy. I enjoy reading your articles because they're informative, and your use of words is extremely precise.
Anyways, I had a question regarding the display of numbers in a different language. I would like to display Arabic numbers on a page, which seems to work in IE7, but not in Firefox. The test code I used is:
HTML Code:
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<html>
<head>
<title>Check</title>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
</head>
<body>
<p><span>من</span><span> 123</span></p>
<p lang="ar"><span> 123</span></p>
</body>
</html>
Is the issue here the support of fonts in the different browsers, or their support for the encoding? |
|
|
|
|
|
#9 |
|
SitePoint Wizard
![]() ![]() ![]() ![]() ![]() ![]() ![]() Join Date: Nov 2004
Location: Åsnorrbodarna
Posts: 11,583
|
Using htmlentities() may be necessary if you use a limited encoding, such as the 256-character ISO 8859 series. With UTF-8 you never have to use such awkward functions. htmlentities() bloats the file size by converting every character that has a named entity into an entity reference.
UTF-8 most certainly handles the Euro character, U+20AC. It will be encoded with three octets: E2 82 AC (226, 130, 172). This character is not available in ISO 8859-1, but it is in ISO 8859-15, where it has code point 0xA4. PHP doesn't have native UTF-8 support (yet), but you can still use it if you are aware of the caveats. For instance, strlen() may report the wrong length, since it counts octets rather than characters. |
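These byte values can be checked with a short Python sketch (Python is used only for illustration; the thread itself is about PHP). The helper `can_encode` is a hypothetical name introduced here. The last lines show the gap between character count and octet count that a byte-oriented length function runs into.

```python
EURO = "\u20ac"  # EURO SIGN, U+20AC

utf8_octets = EURO.encode("utf-8")
latin9_octets = EURO.encode("iso-8859-15")

print(utf8_octets.hex(" "))   # e2 82 ac
print(list(utf8_octets))      # [226, 130, 172]
print(latin9_octets.hex())    # a4

def can_encode(text: str, encoding: str) -> bool:
    """Hypothetical helper: True if every character of text exists in encoding."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False

print(can_encode(EURO, "iso-8859-1"))   # False: Latin-1 has no Euro sign

# Characters versus octets: a byte-counting length sees 8, not 5.
price = "1€+1£"
char_count = len(price)
octet_count = len(price.encode("utf-8"))
print(char_count, octet_count)          # 5 8
```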
|
|
|
|
|
#10 |
|
⠵
![]() ![]() ![]() ![]() ![]() ![]() Join Date: Dec 2004
Location: Sweden
Posts: 2,423
|
htmlentities() doesn't work well with UTF-8, or so I heard. If you need escaping then use htmlspecialchars() instead.
|
|
|
|
|
|
#11 |
|
SitePoint Wizard
![]() ![]() ![]() ![]() ![]() ![]() ![]() Join Date: Nov 2004
Location: Åsnorrbodarna
Posts: 11,583
|
If you use UTF-8, there is no reason whatsoever to use htmlentities(). Any valid ISO 10646 character can be natively represented in UTF-8.
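A rough Python illustration of the size argument, under stated assumptions: `html.escape` stands in for the role of PHP's htmlspecialchars() (an analogy chosen for illustration, not an equivalent), and the ASCII-only variant uses Python's `xmlcharrefreplace` error handler to produce numeric character references. The sample string is adapted from the test line used later in the thread.

```python
import html

text = "À CURAÇAO, Éric n'a donné à Françoise <1€>"

# Escape only the markup-significant characters (<, >, &, quotes).
escaped = html.escape(text, quote=True)

# UTF-8 page: accented letters and the Euro sign are stored natively.
utf8_page = escaped.encode("utf-8")

# ASCII-only page: every non-ASCII character becomes a numeric character reference.
ncr_page = escaped.encode("ascii", errors="xmlcharrefreplace")

print(len(utf8_page), len(ncr_page))
print(ncr_page.decode("ascii"))

# Both forms decode back to the same text once references are resolved.
assert html.unescape(utf8_page.decode("utf-8")) == text
assert html.unescape(ncr_page.decode("ascii")) == text
```

The ASCII-only form survives almost any transport, at the cost of several octets per accented character; the UTF-8 form stays compact but depends on the encoding being declared correctly.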
|
|
|
|
|
|
#12 |
|
SitePoint Wizard
![]() ![]() ![]() ![]() ![]() ![]() ![]() Join Date: Nov 2004
Location: Åsnorrbodarna
Posts: 11,583
|
@Michel: UTF-8 is not a failure. Microsoft Outlook Express is, if it cannot handle such a common encoding. Your argument is like saying that CSS is a failure because IE doesn't support it properly.
@Sean: Since PHP4 doesn't have native UTF-8 support, I very much doubt that it will output a BOM without your saying so. A visible BOM may indicate that your web server is declaring the encoding as, e.g., ISO 8859-1. |
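What a "visible BOM" looks like can be reproduced in a few lines of Python (used here purely as an illustration; the file contents are made up). The UTF-8 byte order mark is the three octets EF BB BF; read as ISO 8859-1 they turn into three visible characters at the top of the page.

```python
import codecs

# A made-up file: UTF-8 BOM followed by UTF-8 text.
page = codecs.BOM_UTF8 + "donné".encode("utf-8")

# Declared (or assumed) as ISO 8859-1: the BOM shows up as three characters.
as_latin1 = page.decode("iso-8859-1")
print(as_latin1)     # ï»¿donnÃ©

# Decoded as UTF-8 with BOM handling: the mark is consumed silently.
as_utf8_sig = page.decode("utf-8-sig")
print(as_utf8_sig)   # donné
```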
|
|
|
|
|
#13 | |||
|
SitePoint Zealot
![]() ![]() Join Date: Mar 2005
Location: Paris (France)
Posts: 124
|
OK, OE bad. Then, which one is good (for UTF-8)?
Quote:
For instance UTF-8 also fails in First Page 2006: Code:
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01
Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
<html><head><title>Latin 9 (iso-8859-15) « À CURAÇAO, Éric
n'a donné à Françoise Spaßmann que 1?+1£+$1±5% »</title>
<META http-equiv=Content-Type content="text/html; charset=iso-8859-15">
<STYLE>BODY {BACKGROUND: white; FONT: 10pt arial;COLOR: black} </STYLE>
</head>
<body>
<DIV>Latin 9 (iso-8859-15) « À CURAÇAO, Éric n'a donné
à Françoise Spaßmann que 1€+1£+$1±5% »</DIV>
</body></html>
Quote:
Code:
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01
Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
<html><head><title>Unicode (UTF-8) « À CURAÇAO, Éric n'a
donné à Françoise Spaßmann que 1?+1£+$1±5% »</title>
<META http-equiv=Content-Type content="text/html; charset=utf-8">
<STYLE>BODY {BACKGROUND: white;FONT: 10pt arial;COLOR: black}</STYLE>
</head>
<body>
<DIV>Unicode (UTF-8) « À CURAÇAO, Éric n'a donné
à Françoise Spaßmann que 1€+1£+$1±5% »</DIV>
</body></html>
Quote:
So, please show me (with tests) which mail/news client I could choose to get UTF-8 properly handled (I am quite open, and sure some exist). TIA,
Versailles, Fri 19 Jan 2007 10:35:20 +0100 |
|||
|
|
|
|
|
#14 | ||
|
SitePoint Wizard
![]() ![]() ![]() ![]() ![]() ![]() ![]() Join Date: Nov 2004
Location: Åsnorrbodarna
Posts: 11,583
|
Quote:
Quote:
I use Opera's built-in email client and it has no problems with UTF-8.
Copying text from one application and pasting it into another can cause all sorts of issues, depending on the operating system and any intermediate clipboard applications. That is not because of any problems with any one character encoding, but because software vendors cannot agree to use a single encoding (or even repertoire) that works for all needs and languages.
The problems you are describing are like comparing languages. If I copy a passage in French from the web and email it to my brother, he won't understand it. That doesn't mean there's anything wrong with French (or my brother) or that all web pages should be written in Swedish.
BTW, that editing software you linked to doesn't seem to be worth its price. Software that claims to support XHTML but doesn't handle UTF-8 is laughable. |
||
|
|
|
|
|
#15 | |
|
SitePoint Zealot
![]() ![]() Join Date: May 2006
Location: Amsterdam
Posts: 183
|
Theory
After catching up on this thread I did a bit of reading and found that, as far as I understand it and IMHO, Unicode is a great concept: “Unicode provides a unique number for every character, no matter what the platform, no matter what the program, no matter what the language” - What is Unicode.
Practice
Yet, as explained in the article and in this thread, theory and practice sometimes don’t meet app to app. It seems to me that a typical Internet cross-app situation for a user is that they view a web page, find some text they find interesting, copy and paste it into an e-mail and send it to a friend or colleague.
Implementation
I wondered why OE 6 had problems with UTF-8, so I tested it out. I found this page that describes how to set OE 5 for UTF-8. The same procedure works for OE 6. I then sent a few e-mails with the character tests mentioned in this thread and added a few characters to see what might happen to them as well. Maybe I did something wrong (I did stay in the edit window rather than play with the source window, assuming that's what most people would do) … the e-mail test came back without an issue (see attached screenshot oe6_utf8_test.jpg).
I then created two web pages with two different encodings: iso-8859-1 and utf-8. I then viewed the pages in FF 1.5, copied text from each page, pasted it into an e-mail and sent it to myself. Both copied sets of characters came back without issue (see attached screenshot copied_webpg_test.jpg).
I also took a look at Outlook 2003 and found this on Unicode:
Quote:
I also took a look at MySQL, which I usually use as the db for a website. While MySQL supports a wide range of character sets, as far as I can tell it doesn’t support iso-8859-15, which I had hoped to use. However, it does support utf-8.
Summary
It seems to me that Unicode is definitely a forward-thinking concept and that it would be best to stick with it. The issues in practice will be with the desired audience and whether or not they have editors/readers that can utilize Unicode properly. It does seem that OE 6 handled my Unicode test properly, but maybe I tested it wrong.
Also, it seems that an audience that uses characters outside the default for their native language, assuming that the OS will default to the native language’s character encoding, needs to be a bit more educated, because their editor/reader may use a default character encoding other than Unicode or Latin1 or whatever that might be. I imagine the best way to handle this is with a message on the website or in an e-mail indicating which character encoding is used, so that the user can make a change if necessary.
P.S. Maybe this will help with some of the other issues that have come up … http://www.newconnexion.net/article/01-02/agreements.html |
|
|
|
|
|
|
#16 |
|
Mongols of the world, unite!
![]() ![]() ![]() Join Date: Oct 2005
Location: Brasília, Brazil
Posts: 287
|
Because sometimes even UTF-8 can give us a headache
I'm using both MySQL and phpMyAdmin for a website in Japanese, Korean and Portuguese. I tried to load a text file (saved from Windows Notepad as UTF-8) into MySQL and, although I can read the data using phpMyAdmin (and I can read the file directly in any browser), no browser seems to be able to display the output in a proper HTML page, showing me a bunch of question marks instead. I've already checked, and both <?xml version="1.0" encoding="utf-8"?> and <meta http-equiv="Content-Type" content="text/html; charset=UTF-8" /> are where they should be. I even tried to use different collations with MySQL, but the result is always the same: readable with phpMyAdmin, confusing pretty much everywhere else. How come? I've read both the replies and the article, and I can't seem to find a reason why it shouldn't work. |
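The thread never establishes the cause of these question marks. One possibility, offered only as an assumption, is that some step between the database and the page converts the text to a single-byte character set such as Latin-1, which has no Japanese or Korean characters. The Python sketch below (illustrative only, with a made-up sample string) shows how such a lossy conversion turns exactly the unsupported characters into "?" while leaving Portuguese intact.

```python
text = "日本語 한국어 Português"

# Lossy step (assumed, not confirmed by the thread): forcing the text into
# ISO 8859-1 replaces every character the charset lacks with "?".
through_latin1 = text.encode("iso-8859-1", errors="replace").decode("iso-8859-1")
print(through_latin1)   # ??? ??? Português

# Lossless path: UTF-8 at every step keeps all three languages.
through_utf8 = text.encode("utf-8").decode("utf-8")
print(through_utf8 == text)   # True
```

If this is what is happening, the page declarations are not at fault; the damage is done before the bytes ever reach the browser.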
|
|
|
|
|
#17 |
|
SitePoint Zealot
![]() ![]() Join Date: Mar 2005
Location: Paris (France)
Posts: 124
|
Sorry, I didn't enter the right code for UTF-8 in First Page 2006. However, this doesn't change the substance; redoing it:
In FP2006, open a new document, and edit the Source view as follows:
Code:
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0
Transitional//EN">
<HTML><HEAD>
<META http-equiv=Content-Type
content="text/html; charset=utf-8">
<STYLE>BODY {BACKGROUND: white; COLOR: black;
FONT: 10pt arial}</STYLE>
</HEAD>
<BODY bgColor=white>
</BODY></HTML>
« À CURAÇAO, Éric n'a donné à Françoise Spaßmann que 1€+1£+$1±5% »
When you come back to Source view, the content has been duly shown as codes, but "utf-8" has been changed to "us-ascii":
Code:
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0
Transitional//EN">
<html>
<head>
<title></title>
<meta content="Evrsoft First Page" name="GENERATOR">
<meta http-equiv="Content-Type"
content="text/html; charset=us-ascii">
<style type="text/css">
BODY {
BACKGROUND: white; FONT: 10pt arial; COLOR: black
}
</style>
</head>
<body bgcolor="white">
<div>
« À CURAÇAO, Éric n'a
donné à Françoise Spaßmann
que 1€+1£+$1±5% »
</div>
</body>
</html>
« À CURAÇAO, Éric n'a donné à Françoise Spaßmann que 1€+1£+$1±5% »
PS. I have just seen AutisticCuckoo's input; I will reply to it separately below (agree-disagree = 50-50).
Versailles, Fri 19 Jan 2007 12:07:05 +0100 |
|
|
|
|
|
#18 | |
|
SitePoint Wizard
![]() ![]() ![]() ![]() ![]() ![]() ![]() Join Date: Nov 2004
Location: Åsnorrbodarna
Posts: 11,583
|
Quote:
Note that you cannot expect the editor to pay attention to your META element. The encoding usually needs to be set within the editor itself. |
|
|
|
|
|
|
#19 |
|
SitePoint Zealot
![]() ![]() Join Date: Mar 2005
Location: Paris (France)
Posts: 124
|
Please report tests showing an email client properly editing European HTML source
Posted by AutisticCuckoo on Fri 19 Jan 2007 10:04 GMT:
"Because UTF-8 has very simple rules for how to encode any character in the Unicode/ISO 10646 repertoire. It cannot fail."
Wow! Are simple rules now enough to prevent failures? In addition, UTF-8 is clearly more complicated than US-ASCII, ISO-8859-1x or some others (for a start, with ~1M chars in the UCS or 64K in the BMP, there can't be a really usable single character table in UTF-8), and this IMO is the main reason why MS failed to implement it correctly in OE, while MS didn't fail to implement the others (see my test reports above).
"The article mainly concerns character encoding for websites."
I agree. However, users typically edit (or reply to) 5% of the email messages they read, but 0.05% of the web pages; hence, while the editing problems look smaller on web pages, they are no different in fact: both kinds of document are edited using basically the same tools; the differences are mainly in the front-end interface, not on the inside, where the charset issues happen. Moreover, many people (me included) write and edit web pages and email messages using the same tools. Hence bringing email here was on topic IMO.
"I don't think it is fair to suggest that everyone use limited encodings and bloat their pages with NCRs just because some email clients are buggy. ISO 8859-15 may work well for you, but what about authors who want to publish in Chinese?"
You seem to assume people would write a document in Chinese and encode it in Latin 9 ISO. Assuming others are idiots is most often unrelated to reality, and never a powerful argument. For my part, I didn't assume anyone was stupid; I recommended iso-8859-1 by default (which, in addition to most Europeans, suits many Chinese, who aren't opposed to reading and writing English in many cases), and refrained from stating the obvious (that in the other cases one would install and select the appropriate language and charset). And outside such cases, using a European charset for another European language should add few NCRs, since ASCII already covers the most frequent characters in any of these languages.
"I use Opera's built-in email client and it has no problems with UTF-8."
As I recalled, the problems UTF-8 causes arise when you are at the same time 1. using special characters (like accented European chars) and 2. editing the HTML source; so you don't see them when writing English, or when writing in plain text, or when editing only in WYSIWYG. Please describe your case and, if relevant, provide some tests.
"Copying text from one application and pasting it into another can cause all sorts of issues, depending on the operating system and any intermediate clipboard applications."
Windows did have problems with copy-pasting, but I have seen no instance of them in the last decade. If you have one, please report precisely on it (OS, app, what you did, what you got).
"That is not because of any problems with any one character encoding, but because software vendors cannot agree to use a single encoding (or even repertoire) that works for all needs and languages."
The problems I reported happen with UTF-8 and not with other current charsets; so UTF-8 may be at least a part of the cause. And I think it actually is, because the amount of effort and care put into its implementation does not match its level of ambition and resulting complexity. More generally, there is no reason to imply that software vendors should have to do all the work and take all the care (for proper implementation), and that standard makers should not have to do any (to make standards clear and coherent, and thus easy to implement, apply and use).
"BTW, that editing software you linked to doesn't seem to be worth its price. Software that claims to support XHTML but doesn't handle UTF-8 is laughable."
First Page 2006 is free (unlimited evaluation for 30 days, which in fact doesn't end, by the will of the maker, until it has fixed it, as you can see in their forums), so I do think it's "worth its price". Please feel free to laugh at FP2006, and at MSOE (also free, and the biggest that "claims to support XHTML but doesn't handle UTF-8").
Posted by AutisticCuckoo on Fri 19 Jan 2007 11:46 GMT:
"If this happens although you have set the editor to use UTF-8 as the encoding, I think you should ask for your money back. Note that you cannot expect the editor to pay attention to your META element. The encoding usually needs to be set within the editor itself."
This (the editor changing the charset in META to a more standard one when it doesn't handle the one you have entered there) regularly happened in OE, at least until late 2006, so I will ask for my $0 back. I hope your "editor" does edit (and only after paying attention to) your META tags and attributes.
The (charset) encoding need NOT be set within the editor itself; what is necessary is coherence, which implies that the information has a single source (or a hierarchy). If software makers were coherent (dreaming is not forbidden), the encoding in META tags would be either absent, or visible but uneditable, or editable but with immediate replication in its main position (normally in the message HTTP Headers). No surprise that MS has long been incoherent on this (you could edit it in META, but this didn't change the charset, and the edit would get reverted soon after); fortunately this was corrected at the end of 2006: now you can change the charset of a message being composed, either in the Message window (Format > Encoding) or in the Source, with immediate effect in the other. First Page 2006 does the same (though imperfectly: its list of charsets is too short and unverified, with some very common ones missing or wrong; see FP2006 > Format > Document Properties > Body > Document Encoding).
I agree that First Page 2006 (which I installed just this week, owing to the good opinion I had kept from using its earlier version, 1st Page 2000) is far from finished. I saw on their forum several opinions the same as mine (FP2006 is far from equaling 1st Page 2000) and none to the contrary.
Now, after replying on details, let's get back to the essential point: in the real world, where 80% of the market has OE as its HTML editor, I recommend using the "Western European (ISO)" encoding ("charset=iso-8859-1"), plus the financial Euro symbol "EUR", as long as UTF-8 is not properly handled, at least according to the only tests I can see so far, which are mine, while I am still waiting for yours (and others').
Versailles, Sat 20 Jan 2007 01:28:25 +0100, edited (blank lines) 02:06:45
Last edited by Michel Merlin; Jan 19, 2007 at 18:06. |
|
|
|
|
|
#20 | |||
|
SitePoint Addict
![]() ![]() ![]() Join Date: Nov 2006
Location: San Diego, CA
Posts: 389
|
I'm not quite sure what you were expecting FP 2006 to do since it doesn't support utf-8. Just look at the list of encodings it does support. Utf-8 isn't there.
Quote:
It's important to realize you didn't actually copy any utf-8 code into OE in the above test for which you provided pictures. This code is using character entities generated for use with FP's "us-ascii" encoding. Was OE displaying the utf-8 character encoding meta tag when you first switched it into source mode, or did you change the meta tag yourself and never bothered to let OE know? I'm not sure this would affect the results you got in OE, but all these things combined seem to cast doubt on your repeated claim for having provided "accurate and careful reports and tests". Quote:
Quote:
Last edited by CaryD; Jan 27, 2007 at 20:05. Reason: Because so much more was revealed after the post I originally responded to. |
|||
|
|
|
|
|
#21 |
|
SitePoint Wizard
![]() ![]() ![]() ![]() ![]() ![]() ![]() Join Date: Nov 2004
Location: Åsnorrbodarna
Posts: 11,583
|
What kind of 'test' are you requiring? There are many millions of web pages encoded with UTF-8 and they obviously work. The fact that your buggy software doesn't allow you to copy text from them is not a failure of the encoding.
If you're trying to make the world a better place, don't do so by spreading FUD about UTF-8; do it by persuading people to stop using buggy software like OE and FP2006. |
|
|
|
|
|
#22 |
|
SitePoint Zealot
![]() ![]() Join Date: Mar 2005
Location: Paris (France)
Posts: 124
|
Please post successful test of source-editing UTF-8 European HTML
I made precise propositions (namely that, in the US as well, people should encode in 8859-1 + "EUR" instead of UTF-8, because 80% of their readers see European chars incorrectly if they are UTF-8-encoded; this was implicitly for Western texts, which, even if American, often include accented European characters), which I backed with test reports. You made repeated authoritative assertions with no backing at all. Please now try to find time to back them with test reports about what was at stake (European HTML source-editing in OE, i.e. for 80% of users, being badly crippled in UTF-8 while quite correct in any other usual Western encoding), or at least about your following statement:
Fri 19 Jan 10h04 GMT: "I use Opera's built-in email client and it has no problems with UTF-8."
Let me recall that this was in reply to a point about editing the HTML source of UTF-8-encoded European characters (there is no point in recalling that a program has no problem in cases where rendering is not an issue, e.g. with no European characters, or no UTF-8, or no source-editing). So it would be useful if you could post a test in Opera like the very short and easy test sample I provided:
« À CURAÇAO, Éric n'a donné à Françoise Spaßmann que 1€+1£+$1±5% »
To make sure everyone sees it properly whatever their browser settings, here are 2 images showing, in the OE Source and Edit panes, how the line looks after 0, 1 and 2 round trips between those 2 panes. For this little test in OE, the line was copied from the HTML source as FP2006 writes it, i.e. with HTML entities:
Code:
<DIV>« À CURAÇAO, Éric n'a donné à Françoise Spaßmann que 1€+1£+$1±5% »</DIV>
so each image shows 4 lines:
While I have not tested them myself so far, I hope Opera, TB (which I plan to test on this after my next full backup), and a few more, handle UTF-8 correctly (in source-editing European HTML), but I also have reasons to think that there is a significant probability that they also fail in UTF-8 while being correct in other encodings. (Of course, if someone finds me in error, please help and show me precisely where and how to correct it.)
Attached: UTF-8_European_OE_Source.PNG (27,690 Bytes), UTF-8_European_OE_Edit.PNG (16,260 Bytes)
Versailles, Sun 21 Jan 2007 17:39:10 +0100, edited (images narrowed) 21:26:00
Last edited by Michel Merlin; Jan 21, 2007 at 13:26. |
|
|
|
|
|
#23 |
|
SitePoint Wizard
![]() ![]() ![]() ![]() ![]() ![]() ![]() Join Date: Nov 2004
Location: Åsnorrbodarna
Posts: 11,583
|
I wrote an article about character encoding on the web, but you seem determined to take it completely off track. From your statements, I guess you do not really understand what a character encoding is, since you believe that it can be buggy or erroneous. Obviously I failed to explain it properly.
A key point in my article was that your editor and other components in your publishing chain may limit your choice of encodings. If you use software that doesn't support UTF-8, then you cannot choose to use UTF-8. That should be obvious. I use an editor (Vim) that supports UTF-8, so I can use that encoding if I like. The fact that you cannot edit that markup in your software is not a concern of mine.
It's 2007 now. Using an editor that doesn't support UTF-8 and complaining about others using that encoding is like using a horse and cart on the freeway and complaining about people with cars driving faster than you.
You don't have to keep posting screen shots. I believe you when you say you're using an editor that doesn't support UTF-8. There are other editors (even freeware) that do support UTF-8, so I suggest that you switch to one of those, or stop complaining about others using a perfectly good standard encoding which even the W3C recommends.
Your comment about 80% of Americans not being able to see European characters if you use UTF-8 baffles me. The combined market share of IE, Firefox, Safari and Opera is probably close to 98%, and all of those browsers support UTF-8 very well. |
|
|
|
|
|
#24 | ||
|
SitePoint Zealot
![]() ![]() Join Date: Mar 2005
Location: Paris (France)
Posts: 124
|
Please back your stances with tests or at least with the necessary precisions.
Quote:
And please stop misrepresenting what I said. My main point (my secondary question above was in reply to you) was that when people (particularly US people, but not only) encode in UTF-8 instead of ISO-8859-1, they get no benefit on ASCII chars, while they get a big drawback (yet most often unseen by them) on HTML source-editing of accented European chars (not to mention the less frequent special chars): if attempting to source-edit their text, 80% of their readers will (since they are using OE) get 2 chars botched around each special char. Please reply, with tests, to that main point, not to your imagined ones.
It's useless to replace facts and tests with endlessly added assertions, with no grounds, no substance, no backing and no effect (yet full of a ridiculous condescending tone), such as:
Quote:
And remember, insisting on adding unfriendliness, or even impoliteness or hostility, is counter-productive. Being simple and friendly would bring you back a more pleasant (and productive) life (you can do it; I appreciated you e.g. in your sticky topics XHTML vs HTML FAQ, Frequently Asked Questions about HTML).
As Amadou Hampâté Bâ said: « A calabash, filled, can't receive fresh water » (IOW: when you know all, you learn nothing more).
Versailles, Mon 22 Jan 2007 11:02:45 +0100 |
||
|
|
|
|
|
#25 | ||||||
|
SitePoint Wizard
![]() ![]() ![]() ![]() ![]() ![]() ![]() Join Date: Nov 2004
Location: Åsnorrbodarna
Posts: 11,583
|
Quote:
Quote:
As I have said, I do not doubt that the software you use is buggy. But that is a problem with your software, not with the UTF-8 encoding. And since there are freely available alternatives that work, I don't see the problem. If you want the software vendors to fix the bugs so that you can continue to use the software you're used to, you should complain to them instead of publicly claiming that there is something inherently wrong with UTF-8.
Contrary to what you seem to believe, I'm not trying to make you look bad. I do try to understand what you are saying, but it isn't always easy. That's why I ask if you mean what I think you mean.
Quote:
Quote:
I'm saying that there are perfectly good editors that can handle UTF-8. I can't 'prove' that with a screen shot, because you'd have no guarantee that I hadn't doctored the image. Instead, I recommend you to download and try one of those free programs, and see for yourself. Quote:
Quote:
Your problems with OE and FP2006 may (I can't say for sure since I don't use either product) occur because the software doesn't support UTF-8. They may also occur because you are using a font that does not contain glyphs for all the characters used. They may occur because the declared encoding does not match the encoding that was actually used. Or it may be something completely different. All I can say is that the problems do not occur because of an 'error' in UTF-8 that causes it to fail to encode an ISO 10646 character. If you understand what UTF-8 is and how it works, you'll understand that it cannot fail.
I'd say that a person who thinks s/he knows everything has a great deal to learn. The more you learn, the more you realise you don't know. |
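The "declared encoding does not match the encoding actually used" case can be reproduced with the test line from this thread. The Python sketch below is only an illustration: it assumes the receiving software reads UTF-8 bytes as Windows-1252, which is one plausible mismatch and not a confirmed description of what OE or FP2006 do. Each accented character then appears as two characters, and the Euro sign as three, although the bytes themselves are intact.

```python
sample = "À CURAÇAO, Éric n'a donné à Françoise Spaßmann que 1€+1£+$1±5%"

# The author saves the text as UTF-8.
utf8_bytes = sample.encode("utf-8")

# Assumed mismatch: the reader interprets those bytes as Windows-1252.
misread = utf8_bytes.decode("cp1252", errors="replace")
print(misread)

# The bytes were never damaged: reversing the wrong step restores the text.
repaired = misread.encode("cp1252").decode("utf-8")
print(repaired == sample)   # True
```

Because the round trip restores the original, the fault in this scenario lies in the labelling of the bytes, not in how UTF-8 encoded them.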
||||||
|
|
|
All times are GMT -7. The time now is 05:06.













