Weird-looking characters

How do I set up a character set that displays every character exactly as it should be displayed? For example, my server is set to UTF-8 and the pages' meta tags are set to UTF-8 too, but in a word like Leganés the é comes out as a black diamond with a question mark in it.

That sounds like the editor you used didn’t save the file in the UTF-8 character set.

Try loading the file in your editor and check which character set it uses. If you then change it to save in UTF-8, also check that the editor does not add a BOM (Byte Order Mark) at the start of the file. (A BOM can cause problems in a CSS file, for example.)
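If you'd rather check or fix this with a script than in the editor, here is a minimal sketch in Python (my choice for illustration; the function names and the sample content are hypothetical). It removes the three BOM bytes EF BB BF from the start of a file if they are present and leaves every other byte untouched:

```python
import codecs

def strip_utf8_bom(data: bytes) -> bytes:
    """Return the bytes without a leading UTF-8 BOM (EF BB BF), if there is one."""
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):]
    return data

def resave_without_bom(path: str) -> None:
    """Rewrite the file at `path` in place, minus any leading UTF-8 BOM."""
    with open(path, "rb") as f:
        cleaned = strip_utf8_bom(f.read())
    with open(path, "wb") as f:
        f.write(cleaned)

# Hypothetical CSS file contents that start with a BOM.
css = codecs.BOM_UTF8 + "body { color: red; }".encode("utf-8")
print(strip_utf8_bom(css))  # b'body { color: red; }'
```

Working on raw bytes means the rest of the file is never re-encoded. If you read files as text in Python, the "utf-8-sig" codec also drops a leading BOM, and plain "utf-8" never writes one.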

Thanks Erik. The editing program (FrontPage 2003) is already set to what you see here, but how do I 'also check that the editor does not add a BOM (Byte Order Mark) in the file start'?

If the editor doesn't have an option to leave out the BOM (the identifying bytes) in UTF-8 encoding, you can open the saved file in a hex editor and check that the first three bytes are not EF BB BF but, for example, 3C 21 44 for a document that starts with <!DOCTYPE.
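The same check can be done without a hex editor. A rough Python sketch (the helper names are hypothetical) that shows the first three bytes the way a hex editor would and tests a file for EF BB BF:

```python
import codecs

def first_bytes_hex(data: bytes, count: int = 3) -> str:
    """Format the first `count` bytes as uppercase hex, as a hex editor shows them."""
    return data[:count].hex().upper()

def file_starts_with_bom(path: str) -> bool:
    """True if the file at `path` begins with the UTF-8 BOM, EF BB BF."""
    with open(path, "rb") as f:
        return f.read(3) == codecs.BOM_UTF8

page = "<!DOCTYPE html>".encode("utf-8")
print(first_bytes_hex(page))                    # 3C2144
print(first_bytes_hex(codecs.BOM_UTF8 + page))  # EFBBBF
```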

That tells you whether it adds a BOM. For example, the old Windows Notepad does add one when you save in UTF-8. (In that case choose ANSI, which on Western systems is close to ISO 8859-1 for the first 256 characters.)

I've got a little confused by that. I don't use Notepad, and I upload via FileZilla.

In FileZilla check the transfer settings:

In the Site Manager, open the "Charset" tab and normally choose "Autodetect"; in this case, for testing, you can choose "Force UTF-8".

Edit: come to think of it, your browser (or whatever you view the page in) might also be set to use another character set, such as "Western", by default.

Strange: when I set php.ini to UTF-8, the pound (£) sign doesn't display properly, but when I set it to ISO-8859-1, it does. I thought UTF-8 was supposed to display everything exactly as it should be?

That proves the page is encoded in ISO 8859-1, the standard Western character set. I'd say it is your editor that saves in that character set; FTP and the server don't change a document's character set, they only serve it as if it were UTF-8.
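To see why the £ sign breaks, here is a small Python illustration (the helper name `show_as` is hypothetical). It assumes the editor saved the text as ISO 8859-1 and the page is then read as UTF-8, which is what the symptoms suggest:

```python
def show_as(data: bytes, encoding: str) -> str:
    """Decode bytes roughly the way a browser does when told the page uses `encoding`."""
    # errors="replace" substitutes U+FFFD for invalid bytes; browsers usually draw
    # that character as a black diamond with a question mark in it.
    return data.decode(encoding, errors="replace")

saved = "£ Leganés".encode("latin-1")  # what an ISO 8859-1 editor writes: b'\xa3 Legan\xe9s'
print(show_as(saved, "utf-8"))         # the £ and the é each become U+FFFD
print(show_as(saved, "latin-1"))       # £ Leganés
```

The bytes on disk are the same in both cases; only the character set used to read them differs.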

My best advice, if you continue with FrontPage, is to set it to ISO 8859-1 (Western). I suspect the code base setting you referred to only affects the meta tag naming the character set, not how the file is saved.

I don't think you really need UTF-8; ISO 8859-1 is just as widely accepted for website text files such as HTML, CSS and JavaScript (and emails). IMHO it could even be better, as I have seen many browsers with the Western option set as the default character set for web pages.

And yet the W3C says to always use UTF-8.

That applies to writing and saving the pages. Then, of course, serve them with the encoding they are actually saved in.
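One way to keep the declared encoding and the real bytes in step is to produce both from the same setting. A hypothetical Python sketch (the `html_page` helper is made up for illustration, and `body_html` is assumed to be HTML that is already escaped):

```python
from __future__ import annotations

def html_page(body_html: str, encoding: str = "utf-8") -> tuple[dict[str, str], bytes]:
    """Hypothetical helper: build headers and body that agree on one encoding."""
    document = (
        "<!DOCTYPE html>\n"
        f'<html><head><meta charset="{encoding}"></head>'
        f"<body>{body_html}</body></html>"
    )
    # The HTTP header, the meta tag and the actual bytes all use the same name.
    headers = {"Content-Type": f"text/html; charset={encoding}"}
    return headers, document.encode(encoding)

headers, body = html_page("Price: £5", "iso-8859-1")
print(headers["Content-Type"])  # text/html; charset=iso-8859-1
print(b"\xa3" in body)          # True: the £ is the single ISO 8859-1 byte A3
```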

Come to think of it, about the "Site Settings" screenshot you posted: check your editor's settings again. Perhaps the option "Ignore the keyboard when deciding the encoding of new pages" needs to be checked for UTF-8 to be applied.

Okay, I've now had a chance to dig deeper into this. When I put a £ sign into a PHP file as a test, it comes out fine, but if that file includes another PHP file, a test £ sign inside the included file doesn't display correctly.

I assume you checked that both the including file and the included file have the same encoding. Did you put a test sign in both files, and only the included one displayed it differently?

That seems odd.

The included PHP file just has text, nothing else, no code at all…

All text files use characters for text, and a character such as the £ sign can sit at different places in the "alphabet" of different character sets. If a file is decoded with a different alphabet from the one it was encoded in, the bytes that represent a sign or letter are interpreted wrongly. The common English characters usually occupy the same positions at the beginning of each character set; it's the international letters and other symbols, like the £ sign, that are encoded with different bytes.
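You can watch this happen in a few lines of Python (`misread` is a hypothetical helper): the plain English letters survive the mix-up, while the £ sign and accented letters do not.

```python
def misread(text: str, saved_as: str, read_as: str) -> str:
    """Save `text` in one character set, then read the bytes back with another."""
    return text.encode(saved_as).decode(read_as, errors="replace")

print("£".encode("latin-1"))                   # b'\xa3'      one byte in ISO 8859-1
print("£".encode("utf-8"))                     # b'\xc2\xa3'  two bytes in UTF-8
print(misread("English", "latin-1", "utf-8"))  # English  (same bytes in both sets)
print(misread("£", "utf-8", "latin-1"))        # Â£       (UTF-8 bytes read as Latin-1)
print(misread("Leganés", "utf-8", "latin-1"))  # LeganÃ©s
```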

Compare it with a keyboard layout: different languages have different positions for the special keys.

I suggest you check that all the tested files are edited and saved with the same editor and the same character setting. If the display is still faulty with UTF-8, then I don't know; perhaps try resaving them in ISO 8859-1 and check the result using the same character set in the FTP and server settings.
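If you decide to convert files rather than re-save them by hand, the conversion is just decode-then-encode. A minimal Python sketch, assuming you know which encoding each file is currently in (the helper names are hypothetical):

```python
def transcode(data: bytes, source: str, target: str) -> bytes:
    """Re-encode text bytes from the `source` character set to the `target` one."""
    # Strict errors: characters the target cannot hold (e.g. the euro sign in
    # ISO 8859-1) raise UnicodeEncodeError instead of being silently dropped.
    return data.decode(source).encode(target)

def convert_file(path: str, source: str, target: str) -> None:
    """Convert the file at `path` in place; it is only rewritten if conversion succeeds."""
    with open(path, "rb") as f:
        converted = transcode(f.read(), source, target)
    with open(path, "wb") as f:
        f.write(converted)

print(transcode("Leganés £5".encode("utf-8"), "utf-8", "latin-1"))  # b'Legan\xe9s \xa35'
```

Converting before opening the file for writing means a failed conversion leaves the original file as it was.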

It seems to have something to do with the charset of how the PHP include is included; I'm still searching for the right way.
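A quick way to find the odd file out is to scan the folder. A rough Python sketch (hypothetical names; it assumes the files are meant to be UTF-8 without BOM) that reports any file that is not valid UTF-8 or that starts with a BOM, since a BOM in an included file is sent as part of the page output:

```python
from __future__ import annotations

import codecs
from pathlib import Path

def encoding_problem(data: bytes) -> str | None:
    """Describe why `data` is not clean UTF-8 without BOM, or return None if it is."""
    if data.startswith(codecs.BOM_UTF8):
        return "starts with a UTF-8 BOM"
    try:
        data.decode("utf-8")
    except UnicodeDecodeError as exc:
        return f"not valid UTF-8 (bad byte at offset {exc.start})"
    return None

def check_files(folder: str, pattern: str = "*.php") -> dict[str, str]:
    """Map each matching file under `folder` that has a problem to its description."""
    problems = {}
    for path in sorted(Path(folder).rglob(pattern)):
        problem = encoding_problem(path.read_bytes())
        if problem:
            problems[str(path)] = problem
    return problems
```

A file containing only plain English letters always passes, so put a test £ sign in each file before scanning.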

I've found (the hard way; experience can be a harsh but good teacher) that consistency is the key. I prefer UTF-8 without BOM:

  • the text needs to be created in a text editor with the settings set to the charset.
  • files need to be saved in the charset
  • files need to be uploaded in the charset
  • any user input should be checked and if need be converted to the charset
  • the database needs to be set to the charset.
  • the HTML has to declare the charset
  • the browser needs to be set to the charset.
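For the user-input point, one common approach is to try UTF-8 first and only fall back to a legacy charset if that fails. A minimal Python sketch; the Windows-1252 fallback is an assumption (a guess about what older Western browsers and editors send), not something the list above prescribes, and `to_utf8_text` is a hypothetical name:

```python
def to_utf8_text(raw: bytes, fallback: str = "cp1252") -> str:
    """Decode submitted bytes as UTF-8, falling back to a legacy charset if needed."""
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        # Assumption: invalid UTF-8 came from a Western legacy encoding.
        return raw.decode(fallback, errors="replace")

# Either way, store and output the result as UTF-8.
stored = to_utf8_text(b"Legan\xe9s").encode("utf-8")
```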

Yes, it’s a lot, but once you get in the habit it comes more naturally.

Yep, they all seem to be done correctly. Could it be the (PHP) include part where it's going wrong?

I experienced a similar problem to Dez's, but it never occurred to me to change the "charset" settings. Instead, I replaced all the unwanted symbols with HTML character codes that begin with an ampersand (&) and end with a semicolon (;). For example:

// Type &deg; if you want a degree symbol.
// Type &eacute; for "e" with an acute accent.
// Type &amp; to reproduce an ampersand.
// Codes are also available for the &euro; and &pound; symbols.
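Python's standard `html` module can do this conversion for you. A small sketch (`to_ascii_html` is a hypothetical helper): it escapes the HTML specials first, then swaps every non-ASCII character for a numeric reference such as &#163;, which browsers treat the same as &pound;. The order matters; doing it the other way round would escape the & of each new reference.

```python
import html

# Named references like the ones listed above turn back into the characters.
print(html.unescape("&deg; &eacute; &amp; &euro; &pound;"))  # ° é & € £

def to_ascii_html(text: str) -> str:
    """Escape HTML specials, then replace each non-ASCII character with &#NNN;."""
    return html.escape(text, quote=False).encode("ascii", "xmlcharrefreplace").decode("ascii")

print(to_ascii_html("Leganés costs £5 & up"))  # Legan&#233;s costs &#163;5 &amp; up
```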

If you're using an editor such as Dreamweaver to edit your HTML page, it may include a complete list of these characters along with their codes. Some editors let you insert a code by simply starting to type it; my own software doesn't do any of the typing for me, but it does provide a reference I can copy from into my HTML file.

The codes work in all browsers, so you don’t have to worry about compatibility issues, although if you find it easier to alter the “charset” parameters…

BestWeb is right. You can change individual characters by looking up a character code table. Two of my faves are here:

http://symbolcodes.tlt.psu.edu/web/codehtml.html

So the pound symbol is an ampersand followed by "pound;", a simple one to remember:
&pound;

But it is more of a "band-aid" solution to the problem. The correct way is consistent character encoding throughout.
That said, some characters, such as the ampersand and the less-than and greater-than signs, should always be encoded because they have special meanings in HTML. For ordinary characters such as accented letters, though, there should be no need.
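For contrast with the entity approach, on a page that really is UTF-8 throughout you only need to escape the special characters. A short Python sketch using the standard `html.escape`, which leaves the £ and the é alone (the sample text is made up):

```python
import html

sample = "Fish & chips <b>£5</b> in Leganés"
escaped = html.escape(sample)
print(escaped)  # Fish &amp; chips &lt;b&gt;£5&lt;/b&gt; in Leganés
```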