Encoding Web Pages for UTF-8

How do I encode (??) my web pages so they support UTF-8?



First result.

<meta http-equiv="Content-Type" content="text/html; charset=UTF-8" />

As Ryan said, you can specify an encoding in the meta tag, but it is really your server that decides the page encoding: a charset sent in the HTTP Content-Type header takes precedence over the meta tag. So you need to find out what encoding is being sent to the browser and change it if that’s not what you want. You can check this with the dev tools supplied with each browser. The W3C HTML Validator will also indicate the server encoding.

Ryan who?

you can specify an encoding in the meta tag, but it is really your server that decides the page encoding, so you need to find out what encoding is being sent to the browser and change it if that’s not what you want. You can find out what encoding is being sent to the browser by the various dev tools supplied with each browser.

Would Firebug tell me that?

Can you be a little more specific how this works and what I need to check?

(I’m considering switching my web pages and database to UTF-8, but am starting to see that this is much more involved than I originally thought!!) :eek:



Only one other person has replied to this thread, called RyanReece. :shifty:

Would Firebug tell me that?

I don’t think so. But in Firefox, navigate to your site and go to View > Character encoding, and the server encoding of the current page should have a tick beside it.

Or just run the page through the validator and it will indicate the encoding too. It may be that your server is already set to UTF-8 anyhow, meaning you won’t need to do anything. But if you do need to, the page I linked to shows several ways to change the encoding, including putting a line of PHP at the top of your pages or just adding a line to a .htaccess file.
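If you would rather check from a script than from the browser menu, here is a minimal Python sketch. It only reads the charset out of a Content-Type header value. The function names and the example URL in the comment are made up for illustration, and the fetch helper assumes the page is reachable over plain HTTP(S).

```python
from email.message import Message
from typing import Optional
from urllib.request import urlopen


def declared_charset(content_type: str) -> Optional[str]:
    """Return the charset parameter of a Content-Type value (lowercased), or None."""
    msg = Message()
    msg["Content-Type"] = content_type
    return msg.get_content_charset()


def fetch_declared_charset(url: str) -> Optional[str]:
    """Request the URL and report the charset the server declares in its HTTP header."""
    with urlopen(url) as response:
        return response.headers.get_content_charset()


print(declared_charset("text/html; charset=UTF-8"))  # utf-8
print(declared_charset("text/html"))                 # None (server sent no charset)

# Hypothetical usage against a live site:
# print(fetch_declared_charset("http://www.example.com/"))
```

If this reports None, the server is not declaring a charset at all, and the browser falls back on the meta tag or its own guess.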

Off Topic:

Debbie put Ryan on ignore, which is why she missed the answer to the question.


Debbie who? :shifty:

Who is RyanReece?

This isn’t the first thread I’ve given the answer to. Because she is ignoring me, she doesn’t see the answer, and since no one else felt the need to respond in her thread after me, the thread (in her eyes) died without “anyone” responding.

Off Topic:

Kind of makes you wonder why he insists on continuing to post in my threads… :rolleyes:


All of my php pages start off like this…

<?php
	// Initialize a session.
	session_start();

	// Set current Script Name.
	$_SESSION['returnToPage'] = $_SERVER['SCRIPT_NAME'];
?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
	"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
	<!-- HTML Metadata -->
	<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />

Is it that simple, or do I need to do more?


Yes. You need to read the discussion and links above. :slight_smile:

If by some chance you take me off that list, you should be able to see my helpful advice/answers :).

How you save the document matters too. Save a document as Windows-1252 and then have the server say “hey, this is UTF-8” == happy happy fun times.

They MUST agree with each other. The text editor that saves the document MUST save in the same encoding the server’s HTTP header states. The meta tag is just a cute extra.

EXACTLY. If you save it from your editor as one type, then serve it from the server as another, bad things tend to happen with any characters outside the 7-bit ASCII set.
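To see those “bad things” for yourself, here is a small Python sketch that encodes a string one way and decodes it the other. The sample word is just an illustration. Python’s codec name for Windows-1252 is "cp1252".

```python
text = "café"

saved_as_cp1252 = text.encode("cp1252")  # b'caf\xe9'     (é is one byte)
saved_as_utf8 = text.encode("utf-8")     # b'caf\xc3\xa9' (é is two bytes)

# Saved as Windows-1252 but served as UTF-8: the lone 0xE9 byte is not
# valid UTF-8, so the browser shows a replacement character (U+FFFD).
cp1252_read_as_utf8 = saved_as_cp1252.decode("utf-8", errors="replace")

# Saved as UTF-8 but served as Windows-1252: each byte of the two-byte
# sequence is shown as its own character.
utf8_read_as_cp1252 = saved_as_utf8.decode("cp1252")

print(cp1252_read_as_utf8)            # caf followed by U+FFFD
print(utf8_read_as_cp1252)            # cafÃ©
print(saved_as_utf8.decode("utf-8"))  # café (both sides agree)
```

Note that the "caf" part survives in every case, because those three characters are plain ASCII.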

Though honestly, that’s part of why for English-language websites I only use 7-bit ASCII, and if fancy characters are needed, I use the named entities. Then it doesn’t matter what character encoding the server is trying to send; true ASCII (characters 0…127) is the same in nearly every character encoding. The only legitimate reasons to need more than that are the stupid ‘styled quotes’ or foreign languages.
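As a rough illustration of the ASCII-plus-entities approach, this Python sketch replaces every non-ASCII character with its named entity where the HTML 4 table in the standard library has one, and with a numeric reference otherwise. The function name is made up, and the sketch leaves ASCII characters (including &, < and >) untouched.

```python
from html.entities import codepoint2name


def to_ascii_entities(text: str) -> str:
    """Replace non-ASCII characters with HTML entities so the result is pure ASCII."""
    out = []
    for ch in text:
        cp = ord(ch)
        if cp < 128:
            out.append(ch)                        # plain ASCII passes through
        elif cp in codepoint2name:
            out.append(f"&{codepoint2name[cp]};")  # named entity, e.g. &eacute;
        else:
            out.append(f"&#{cp};")                # numeric fallback, e.g. &#259;
    return "".join(out)


print(to_ascii_entities("café “hi” ă"))  # caf&eacute; &ldquo;hi&rdquo; &#259;
```

The output can be saved and served under nearly any encoding, since it contains nothing above character 127.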

An easy way to make sure the server is sending UTF-8 is with .htaccess. Ideally the server software should be configured directly for it, but you can’t guarantee all users will want (or even understand) it, so servers as a rule continue to default to ISO-8859-1 globally and let users declare it themselves.

<FilesMatch "\.(htm|html|css|js|php)$">
   AddDefaultCharset UTF-8
   DefaultLanguage en-US
</FilesMatch>

If you’re on an Apache or Apache-compatible server, throw that in your .htaccess and you’re good to go… assuming you saved the file from your editor as UTF-8 as well. You can also declare it from PHP by outputting the proper ‘header’ before you start echoing out markup.

  header('Content-Type: text/html; charset=utf-8');

Notice the value is identical to what should be in your meta http-equiv=“Content-Type”… because that’s EXACTLY what http-equiv means: the HTTP equivalent. Both it and content-language are there so that, should you be accessing the file directly or should the HTTP headers be missing, the user agent can still make sense of things.

… and you want to save UTF-8 without the ‘byte order mark’ (BOM). A number of browsers (guess which ones) screw up if the BOM is present. The BOM is a three-byte signature (EF BB BF) at the very start of the file that marks it as UTF-8. In Notepad2 it’s under file -> encoding; normal UTF-8 is labelled just “UTF-8”, while the BOM-prefixed version is called “with signature”. Most editors let you set the encoding you are saving as in a similar manner.
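If you are not sure whether a file already has a BOM, a few lines of Python can check for it and strip it. This is only a sketch, and the function names are made up for illustration. `codecs.BOM_UTF8` and the "utf-8-sig" codec are part of the standard library.

```python
import codecs


def has_utf8_bom(data: bytes) -> bool:
    """True if the byte string starts with the UTF-8 signature EF BB BF."""
    return data.startswith(codecs.BOM_UTF8)


def strip_utf8_bom(data: bytes) -> bytes:
    """Return the bytes without a leading UTF-8 BOM, if there was one."""
    if has_utf8_bom(data):
        return data[len(codecs.BOM_UTF8):]
    return data


with_signature = "hi".encode("utf-8-sig")  # what "UTF-8 with signature" produces
print(with_signature)                      # b'\xef\xbb\xbfhi'
print(strip_utf8_bom(with_signature))      # b'hi'
```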

She dropped the BOM on me… baby… She dropped the BOM on me…

The difference between ASCII encoding and UTF-8 encoding is like the way a shipping company that handles small, same-size, anonymous little boxes decides how to deliver packages to its customers.

One package per customer: that’s ASCII. That is, if your customer is ‘a’, it gets one box, one encoded package (a small numeric value that fits in a byte).

Now, the bigger customers, like ‘ă’, can’t fit their bigger stuff in just one little package, so they need two or more little boxes, two or more bytes (bigger numeric values that take more bytes to store).
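Counting the boxes is easy to do in Python. This sketch prints how many bytes each character takes in UTF-8. The euro sign is an extra example of a three-box customer, and the helper name is made up.

```python
def utf8_boxes(ch: str) -> bytes:
    """Return the UTF-8 bytes (the 'little boxes') used to store one character."""
    return ch.encode("utf-8")


for ch in ["a", "ă", "€"]:
    boxes = utf8_boxes(ch)
    print(ch, len(boxes), boxes.hex(" "))
# a 1 61
# ă 2 c4 83
# € 3 e2 82 ac
```

Asking for `"ă".encode("ascii")` instead raises a UnicodeEncodeError, because ASCII has no box for that customer at all.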

The problem the shipping company has to sort out is how to tell these little packages apart, so that ‘a’ gets just one little package and ‘ă’ gets more than one. More importantly, where do the packages for one customer start and stop, since all the packages look the same and customers have different numbers of them?

That problem is solved by the encoding, which sorts out the packages.
It’s the same way we separate luggage at the airport: every traveler knows how many suitcases are his. However many he checks in at the start of the journey must equal how many he gets back at the end.

It’s a sort of putting stops in the flow of boxes. Each encoding has its own way of putting those stops in the byte flow, and each encoding can handle specific luggage sizes: one suitcase for one, three suitcases for another.
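For UTF-8 specifically, the “stops” are built into the bytes themselves: every continuation byte has the bit pattern 10xxxxxx, and any other byte starts a new character. Here is a minimal Python sketch of splitting a byte stream along those stops. It assumes the input is already valid UTF-8, and the function name is made up.

```python
from typing import List


def split_utf8_characters(data: bytes) -> List[bytes]:
    """Group the bytes of a valid UTF-8 stream into one chunk per character."""
    chunks: List[bytes] = []
    for byte in data:
        is_continuation = (byte & 0b1100_0000) == 0b1000_0000
        if is_continuation and chunks:
            chunks[-1] += bytes([byte])   # belongs to the current character
        else:
            chunks.append(bytes([byte]))  # a lead byte starts a new character
    return chunks


print(split_utf8_characters("aă€".encode("utf-8")))
# [b'a', b'\xc4\x83', b'\xe2\x82\xac']
```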

Another parallel: cars. Let’s say that ASCII only handles minis while UTF-8 can also handle up to lorries.

Another parallel: if the server and the browser were in a water gun fight, they’d talk in squirts. A little squirt for ‘a’, a longer squirt for ‘ă’. When the water gun is empty, that’s the end of the file.

Now, what poes and Jason are trying to say is that when it comes to files you create, you’re actually the head of that shipping company, and you have to make all the decisions. You get to decide what the encoding is, hands on, by specifying this option for the files you create. I repeat, for the files you create.

How do you do that? The same way you take control and specify the formats for the content in a word processor: using the options the editor of your choice gives you. You just have to know where to go to set the font as Arial instead of Times New Roman, meaning specifying UTF-8 instead of ASCII.

For example:

If you use Notepad++, you have the Encoding entry in the menu bar. If you use other editors and you can’t find the encoding option, I’m sure we can help you.
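If you have a pile of existing files and don’t want to re-save each one by hand, the same decision can be made in a script. This is a minimal Python sketch that assumes the files are currently Windows-1252. The function name, the sample file name and the default source encoding are all assumptions for illustration, so check what your files really are first.

```python
import tempfile
from pathlib import Path


def convert_to_utf8(path: Path, source_encoding: str = "cp1252") -> None:
    """Rewrite a file in place as UTF-8 (no BOM), decoding it from source_encoding."""
    text = path.read_bytes().decode(source_encoding)
    path.write_bytes(text.encode("utf-8"))


# Demonstration on a throwaway file saved as Windows-1252.
with tempfile.TemporaryDirectory() as tmp:
    page = Path(tmp) / "page.html"
    page.write_bytes("<p>café</p>".encode("cp1252"))
    convert_to_utf8(page)
    converted = page.read_bytes()

print(converted)  # b'<p>caf\xc3\xa9</p>'
```

Run it only once per file: converting a file that is already UTF-8 would garble every non-ASCII character in it.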

I thought you needed a little more insight into what encoding is about:

  • files are streams of small, anonymous, same-size boxes (bytes)
  • interpreting a stream of bytes means deciding how many boxes (bytes) belong to a character, how many bytes it takes to hold the numeric value (encoding) of a character, and where those boxes start and end.

Finally, the part about being kind and letting the browser know that the file you’re sending it, a stream of bytes, is encoded as UTF-8: that is covered by the meta declaration. But if you didn’t actually take care on your part to create the file with the UTF-8 encoding knowingly set, that remains just a false declaration.
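One rough way to catch a false declaration before the browser does is to check whether the file’s bytes decode as UTF-8 at all. This Python sketch does only that. The function name is made up, and a pure-ASCII file passes too, since ASCII is a subset of UTF-8.

```python
def is_valid_utf8(data: bytes) -> bool:
    """True if the bytes can be decoded as UTF-8 without errors."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True


print(is_valid_utf8("ă".encode("utf-8")))     # True
print(is_valid_utf8("café".encode("cp1252")))  # False: saved as Windows-1252
print(is_valid_utf8(b"plain ASCII"))          # True
```

To check a real file, pass it the result of reading the file in binary mode, for example `Path("page.php").read_bytes()` with a file name of your own.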

So, to answer your question: no, it’s not that simple, and yes, you need to do more.


I’m not sure why he is on your ignore list to begin with, but he is a very knowledgeable guy who provides useful information. I’m sure you’d benefit from taking him off your block/ignore list.

Creative way to put it!


Due to that thread.