  1. #1
    SitePoint Guru brent5392's Avatar
    Join Date
    Dec 2005
    Location
    Australia
    Posts
    636
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)

    Encoding Issue - Extra character

    I just read about UTF-8, and I thought it would be a wise decision to convert my site to it now, seeing as I am only working on the base framework for it. I'm using Notepad++.

    I just changed the encoding of my files from ANSI to UTF-8. Everything looks exactly the same. Then I uploaded the files to my server using WinSCP 4.04 (what a version...). When I view the page in my browser (Opera 9.23), an extra character "" is added for every file that is now UTF-8. This character appears as a tiny one-pixel dot about 1/3 of the way from the top. Each page requires a few includes, so there are a few of these dots.

    If I convert the files back to ANSI and upload them again, the dots are gone. If I change one file, one dot goes. This character appears nowhere in the original document, even when I choose "show all characters" in the View menu.

    So far, I have only found one solution: rewriting the file from scratch using UTF-8, with no copying and pasting. I also noticed that I get an internal server error if my .htaccess file is UTF-8 and not ANSI.

    I'd really appreciate any help, as I don't really want to rewrite thousands of lines of code...

    Thanks,
    Brent.
    PHP | MySQL | (X)HTML | CSS

  2. #2
    Theoretical Physics Student bronze trophy Jake Arkinstall's Avatar
    Join Date
    May 2006
    Location
    Lancaster University, UK
    Posts
    7,062
    Mentioned
    2 Post(s)
    Tagged
    0 Thread(s)
    Have you tried putting the following at the top of the main include page?
    PHP Code:
    <?php
    header("Content-Type: text/html; charset=utf-8");
    ?>
    Jake Arkinstall
    "Sometimes you don't need to reinvent the wheel;
    Sometimes its enough to make that wheel more rounded"-Molona

  3. #3
    Theoretical Physics Student bronze trophy Jake Arkinstall's Avatar
    And if that doesn't work, you can simply copy and paste the existing code into a brand new file, which has already been set to UTF-8.

  4. #4
    SitePoint Guru brent5392's Avatar
    I added it to the top of the file that is included by each and every accessible file. Still no difference.

    http://www.socialnetwork.thedrainpeople.com/

    I tried doing that before, but I seemed to have the same problem. I shall try again now.

  5. #5
    SitePoint Wizard stereofrog's Avatar
    Join Date
    Apr 2004
    Location
    germany
    Posts
    4,324
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    By default, NP++ adds a BOM (http://en.wikipedia.org/wiki/Byte_Order_Mark) to Unicode files. Try using the "UTF-8 without BOM" option (in the Format menu).
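
For files that already carry the mark, here is a minimal sketch of what it looks like at the byte level, assuming the file contents have already been read into a string. The UTF-8 BOM is the three bytes EF BB BF at the very start of the file. The function name `stripUtf8Bom` is hypothetical; it is not part of PHP or Notepad++.

```php
<?php
// Hypothetical helper: remove a leading UTF-8 BOM (bytes EF BB BF) if present.
function stripUtf8Bom(string $contents): string
{
    $bom = "\xEF\xBB\xBF";
    if (strncmp($contents, $bom, 3) === 0) {
        // Drop exactly the three BOM bytes and keep everything else.
        return substr($contents, 3);
    }
    // No BOM at the start: return the input unchanged.
    return $contents;
}

// Example input: a file body that starts with a BOM.
$withBom = "\xEF\xBB\xBF<?php echo 'hi';";
$clean   = stripUtf8Bom($withBom);
```

To apply this to a file on disk, one option is `file_put_contents($path, stripUtf8Bom(file_get_contents($path)))`, where `$path` is whatever file you want to clean. Back up the files first.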

  6. #6
    Theoretical Physics Student bronze trophy Jake Arkinstall's Avatar
    Well spotted, stereofrog. I hadn't even seen that option.

    Yup, that'll fix it.

  7. #7
    SitePoint Guru brent5392's Avatar
    Hmm... It seems it is greyed out on every file that was not written using UTF-8 to start with... Maybe if I manually copy the contents of each file over to a file that is preset to UTF-8 it will work?

  8. #8
    SitePoint Guru brent5392's Avatar
    My mistake, it is only greyed out on files which are set to UTF-8. If they are set to ANSI, the option is available...

  9. #9
    SitePoint Guru brent5392's Avatar
    My Version: 4.2.2
    Latest Version: 4.3

    I'm going to see if updating helps.

  10. #10
    Theoretical Physics Student bronze trophy Jake Arkinstall's Avatar
    Version of what?

  11. #11
    SitePoint Guru brent5392's Avatar
    Notepad++

    Upgrading made no difference...

    There is a bug: Notepad++ reads UTF-8 files without the BOM as ANSI. When you change any file to UTF-8, it greys out the "UTF-8 without BOM" option.

    When you create a new file set to UTF-8 without BOM, paste the code in and save it, it works fine and fixes the problem. But as soon as you open it again, it wants to save it as ANSI...
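
One possible explanation (an assumption, since the thread does not confirm it): a file that contains only 7-bit ASCII characters is byte-for-byte identical in ANSI and in UTF-8 without BOM, so an editor has nothing to tell them apart by. The sketch below classifies raw bytes along those lines. The function name `guessEncoding` and its return labels are hypothetical, and this is not how Notepad++ itself is known to detect encodings.

```php
<?php
// Hypothetical helper: classify raw file bytes without relying on a BOM alone.
function guessEncoding(string $bytes): string
{
    // A leading EF BB BF is the UTF-8 byte order mark.
    if (strncmp($bytes, "\xEF\xBB\xBF", 3) === 0) {
        return 'UTF-8 with BOM';
    }
    // No byte >= 0x80: the content is the same in ASCII, ANSI code pages and UTF-8.
    if (preg_match('/[\x80-\xFF]/', $bytes) !== 1) {
        return 'ASCII-compatible (ambiguous)';
    }
    // With the /u modifier, preg_match() fails on subjects that are not valid UTF-8.
    return preg_match('//u', $bytes) === 1
        ? 'UTF-8 without BOM'
        : 'not UTF-8 (possibly ANSI)';
}

// Usage: plain PHP source with no accented characters is ambiguous.
$label = guessEncoding("<?php echo 'hello';");
```

Under this assumption, typical PHP source with no non-ASCII characters falls into the ambiguous case, which would match the editor reporting such files as ANSI when they are reopened.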

    Maybe time to sadly look for a new editor...

  12. #12
    Theoretical Physics Student bronze trophy Jake Arkinstall's Avatar
    I'm using v4.2.2 and it's great, no problems. However, I only use ANSI.

    So why do you want to use UTF-8? It's great for displaying content on HTML pages, but what's the difference for PHP code?

  13. #13
    . shoooo... silver trophy logic_earth's Avatar
    Join Date
    Oct 2005
    Location
    CA
    Posts
    9,013
    Mentioned
    8 Post(s)
    Tagged
    0 Thread(s)
    EditPlus <3 Puts all others to shame! (shameful plug)

    But hmmm, an editor that doesn't work with UTF-8 without the BOM? You definitely need a new editor.

    Really, though, if you are using ANSI or ASCII, there is no real problem switching to UTF-8 and back again. But as you found out, the one little problem that is not part of ANSI is the BOM.

    PHP really hates the BOM, because the opening <?php tag needs to be the very first thing in the file. Anything before it, including the BOM, is sent as output, and you get a bunch of "Headers already sent" error messages.
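
A small sketch of why that happens, assuming the file's raw source is available as a string: anything that sits before the first `<?php` tag is treated as ordinary output, and output sent before a `header()` call is what triggers the warning (unless output buffering is enabled). The function name `outputBeforeOpenTag` is hypothetical.

```php
<?php
// Hypothetical helper: return whatever precedes the first "<?php" tag.
// PHP sends those bytes to the browser as ordinary output.
function outputBeforeOpenTag(string $source): string
{
    $pos = strpos($source, '<?php');
    // If there is no opening tag at all, the whole file is plain output.
    return $pos === false ? $source : substr($source, 0, $pos);
}

// A file saved with a BOM, followed by a header() call.
$file    = "\xEF\xBB\xBF<?php header('Content-Type: text/html; charset=utf-8');";
$leading = outputBeforeOpenTag($file);
// bin2hex($leading) is "efbbbf": three bytes of output ahead of header().
```

The same three bytes are what show up as the stray dot on the page when the file is included partway through the output.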

    As for a new editor, I do recommend EditPlus, since it is the one I use extensively. However, it is not free ($30).

    Or, if you are feeling brave, vim, which I just started to learn and use (err, try to learn and use).
    Logic without the fatal effects.
    All code snippets are licensed under WTFPL.


  14. #14
    SitePoint Guru brent5392's Avatar
    I plan to eventually launch my site globally... one day. If I create it all in UTF-8 now, there will be fewer problems later. It was just a matter of making it easier. If my PHP files and my template files are all UTF-8, there would (hopefully) be less to worry about.

  15. #15
    SitePoint Guru brent5392's Avatar
    Filed bug report

