  1. #1
    SitePoint Guru cyjetsu's Avatar
    Join Date
    May 2008
    Posts
    814

    character encoding choice in files

    I know about charset declarations in pages. I am asking about how text editors actually save files. I just noticed that my CSS editor is saving as "ANSI". I have not been declaring any charset in my CSS. Should I declare UTF-8 and choose UTF-8 when saving CSS files? The third choice in EditPad Plus for saving files is "Unicode".

    I am using TopStyle Pro to edit and save HTML files, and I am declaring UTF-8 in them. But I just noticed that TopStyle has no option for choosing an encoding when saving, and I have no idea what encoding it uses for my HTML files. So I am guessing that if it is not saving in UTF-8 while I am declaring UTF-8, that might be a problem. Is there a way to detect what encoding a saved file uses and to convert it to UTF-8?

  2. #2
    SitePoint Author silver trophybronze trophy

    Join Date
    Nov 2004
    Location
    Ankh-Morpork
    Posts
    12,158
    It is imperative that the encoding you declare for your page is exactly the same as the encoding you use to save your file. Browsers don't magically 'transform' anything.

    It's like declaring the natural language: if you write in French, you should declare the language as French, not English or anything else.

    If your editor saves as 'ANSI' (which I assume means ISO 8859-1), then you must declare the encoding as ISO 8859-1. That means in the Content-Type HTTP header sent by your server, as well as any meta element within the document.

    If you declare the encoding as UTF-8 and save the file as ISO 8859-1, there will be major problems (unless you only use ASCII characters). For an XML document (including proper XHTML), this will cause well-formedness errors that force the parser to abort.
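
To make the mismatch concrete, here is a minimal Python sketch. It only mimics what a strict parser does with the bytes; the helper name `decode_as_declared` and the sample word are made up for illustration and are not taken from any browser or server.

```python
def decode_as_declared(saved_bytes: bytes, declared: str):
    """Decode bytes strictly, using the declared encoding.

    Returns (ok, text); text is None when the bytes are not valid
    in the declared encoding.
    """
    try:
        return True, saved_bytes.decode(declared)
    except UnicodeDecodeError:
        return False, None


# A file whose content was saved as ISO 8859-1.
saved = "café".encode("iso-8859-1")

# Declaration matches the saved encoding: the text comes back intact.
print(decode_as_declared(saved, "iso-8859-1"))  # (True, 'café')

# Declared as UTF-8: the lone 0xE9 byte is not valid UTF-8.
print(decode_as_declared(saved, "utf-8"))       # (False, None)

# A lenient decoder substitutes U+FFFD instead of aborting.
lenient = saved.decode("utf-8", errors="replace")

# ASCII-only content is byte-for-byte identical in both encodings.
ascii_only = "cafe".encode("iso-8859-1")
print(decode_as_declared(ascii_only, "utf-8"))  # (True, 'cafe')
```

The strict failure corresponds to the XML case described above, while the lenient decode is roughly what an HTML page with a wrong declaration ends up showing.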
    Birnam wood is come to Dunsinane

  3. #3
    SitePoint Guru cyjetsu's Avatar
    Join Date
    May 2008
    Posts
    814
    So ANSI is ISO, I didn't know that. I also didn't know about the header sent by my server. I only declared in the meta tags of each document before. I shall save files and declare correctly now.

    But I still have the problem of not knowing what encoding some text/HTML editors save files in.

    Is there a way to detect what encoding a saved file uses and to convert it to UTF-8?

  4. #4
    SitePoint Author silver trophybronze trophy

    Join Date
    Nov 2004
    Location
    Ankh-Morpork
    Posts
    12,158
    Quote Originally Posted by cyjetsu View Post
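    So ANSI is ISO, I didn't know that.
    I don't know for sure either, so don't trust me on that part. On Windows, 'ANSI' usually refers to the system's default code page, which is Windows-1252 on Western systems rather than ISO 8859-1.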

    Quote Originally Posted by cyjetsu View Post
    I also didn't know about the header sent by my server. I only declared in the meta tags of each document before. I shall save files and declare correctly now.
    I wrote an article about character encoding a while back, which may be of interest to you.

    Quote Originally Posted by cyjetsu View Post
    But I still have the problem of not knowing what encoding some text/HTML editors save files in.

    Is there a way to detect what encoding a saved file uses and to convert it to UTF-8?
    That is a problem. As long as you only use ASCII characters, there's no way to detect the encoding (nor is there any real need). It can also be very difficult to detect if something uses ISO 8859-1 or Windows-1252, since they are quite similar (but not identical).

    For Windows editors (in Western countries) you can usually assume either Windows-1252 or ISO 8859-1.
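
As a rough sketch of the guessing described above, the following Python code tries ASCII, then strict UTF-8, then falls back to Windows-1252. The function names `guess_encoding` and `to_utf8` are hypothetical, and the fallback is an assumption for Western Windows editors, not real detection: an ISO 8859-1 file will also be reported as `cp1252`, because the two only differ in the byte range 0x80 to 0x9F.

```python
import codecs


def guess_encoding(data: bytes) -> str:
    """Guess an encoding name for raw file bytes (heuristic only)."""
    try:
        data.decode("ascii")
        return "ascii"  # pure ASCII: the question is moot
    except UnicodeDecodeError:
        pass
    try:
        data.decode("utf-8")
        # Valid UTF-8; note whether a BOM is present.
        return "utf-8-sig" if data.startswith(codecs.BOM_UTF8) else "utf-8"
    except UnicodeDecodeError:
        pass
    try:
        # Assumption: a Western Windows editor, so try Windows-1252.
        data.decode("cp1252")
        return "cp1252"
    except UnicodeDecodeError:
        # A few bytes are undefined in Python's cp1252 codec;
        # ISO 8859-1 accepts every byte value.
        return "iso-8859-1"


def to_utf8(data: bytes) -> bytes:
    """Re-encode file bytes as UTF-8 (without a BOM), using the guess."""
    return data.decode(guess_encoding(data)).encode("utf-8")


print(guess_encoding(b"plain text"))              # ascii
print(guess_encoding("café".encode("utf-8")))     # utf-8
print(guess_encoding("café".encode("cp1252")))    # cp1252
```

To convert a file, read it with `open(path, "rb")`, pass the bytes through `to_utf8`, and write the result back in binary mode. Keep a backup, since a wrong guess silently produces wrong characters.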

  5. #5
    SitePoint Guru cyjetsu's Avatar
    Join Date
    May 2008
    Posts
    814
    Thanks. I will just copy and paste my HTML into Notepad, as at least Notepad can save as UTF-8, or I will probably find a better editor that can also save in UTF-8. It is pretty stupid that a lot of editors don't have different saving options and won't even tell you what charset you are saving in.

  6. #6
    SitePoint Author silver trophybronze trophy

    Join Date
    Nov 2004
    Location
    Ankh-Morpork
    Posts
    12,158
    Yes, Notepad can save as UTF-8. Unfortunately it always saves UTF-8 with a BOM, which is neither necessary nor desirable.
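
If a file has already been saved that way, the BOM is just the three bytes EF BB BF at the very start, so it is easy to check for and remove. A minimal Python sketch, with hypothetical helper names `has_utf8_bom` and `strip_utf8_bom`:

```python
import codecs


def has_utf8_bom(data: bytes) -> bool:
    """True if the bytes start with the UTF-8 byte order mark (EF BB BF)."""
    return data.startswith(codecs.BOM_UTF8)


def strip_utf8_bom(data: bytes) -> bytes:
    """Return the bytes without a leading UTF-8 BOM, if there is one."""
    if has_utf8_bom(data):
        return data[len(codecs.BOM_UTF8):]
    return data


# Python's "utf-8-sig" codec writes a BOM, like the Notepad output described above.
with_bom = "é".encode("utf-8-sig")

print(has_utf8_bom(with_bom))                    # True
print(strip_utf8_bom(with_bom) == "é".encode("utf-8"))  # True

# Decoding with plain "utf-8" keeps the BOM as an invisible U+FEFF character.
print(with_bom.decode("utf-8") == "\ufeffé")     # True
```

When reading such files in Python, decoding with `"utf-8-sig"` instead of `"utf-8"` drops the BOM automatically and also works for files that have none.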

