The character encoding declared on your page tells a user agent how to interpret the bytes it receives. So if the bytes for that 'ä' are the UTF-8 encoding of an 'ä' (0xC3 0xA4), there's no problem.
As Thomas said in the post above yours, there's another factor that comes into play here: the 'accept-charset' attribute of the form. That should be set to UTF-8 as well, but you can't rely on it, because browsers may still send whatever they feel like. So to guarantee well-formed markup, you would need to check the posted data yourself and verify that every byte sequence in it is valid UTF-8!
Let's say I'm posting something through that form, and my browser sends the information encoded as ISO 8859-1. This means an 'ä' is encoded as the single byte 0xE4, which is not valid UTF-8 by itself (in UTF-8, 0xE4 starts a three-byte sequence and must be followed by two continuation bytes), and you're SOL.
You can either reject such invalid data, or you can try to guess the proper encoding and convert it into UTF-8 before displaying it. Either way, it requires some work.
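To make the two options concrete, here's a rough Python sketch. The function name `to_utf8_text`, the `strict` flag and the choice of ISO 8859-1 as the fallback guess are all my own assumptions for illustration, not something from your setup. It tries to decode the posted bytes as UTF-8 and then either rejects them or falls back to the guessed encoding.

```python
def to_utf8_text(raw: bytes, fallback: str = "iso-8859-1", strict: bool = False) -> str:
    """Decode posted form data, validating that it is well-formed UTF-8.

    `fallback` is only a guess at what the browser really sent
    (an assumption for this sketch); `strict=True` rejects instead.
    """
    try:
        # decode() raises UnicodeDecodeError on any invalid UTF-8 sequence
        return raw.decode("utf-8")
    except UnicodeDecodeError as exc:
        if strict:
            # Option 1: reject the invalid data outright
            raise ValueError("posted data is not valid UTF-8") from exc
        # Option 2: guess the encoding and convert
        return raw.decode(fallback)


# 'ä' sent as UTF-8 (0xC3 0xA4) passes straight through
print(to_utf8_text(b"\xc3\xa4"))
# 'ä' sent as ISO 8859-1 (0xE4) fails validation and is converted via the guess
print(to_utf8_text(b"\xe4"))
```

Keep in mind that the fallback really is a guess: every byte sequence is "valid" ISO 8859-1, so the second branch never fails, even when the guess is wrong.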
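A common problem is that with accept-charset="iso-8859-1", some browsers permit data encoded as Windows-1252 to be sent. Windows-1252 puts printable characters (curly quotes, the euro sign and so on) in the byte range 0x80-0x9F, where ISO 8859-1 has C1 control characters. Those bytes are technically valid ISO 8859-1, but the control characters they map to are not allowed in HTML.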
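One way to cope with this, sketched in Python: if data that claims to be ISO 8859-1 contains any byte in the 0x80-0x9F range, treat it as Windows-1252 instead. The helper names `has_c1_bytes` and `decode_latin1_form_data` are made up for this example, and the "assume Windows-1252" rule is a heuristic, not something any spec guarantees.

```python
def has_c1_bytes(raw: bytes) -> bool:
    """True if any byte falls in 0x80-0x9F (C1 controls in ISO 8859-1)."""
    return any(0x80 <= b <= 0x9F for b in raw)


def decode_latin1_form_data(raw: bytes) -> str:
    """Decode data declared as ISO 8859-1, tolerating Windows-1252 input.

    Heuristic (an assumption of this sketch): real form text almost never
    contains C1 control characters, so such bytes are read as Windows-1252.
    """
    if has_c1_bytes(raw):
        # A few bytes (e.g. 0x81) are undefined in Windows-1252;
        # errors="replace" turns those into U+FFFD instead of raising.
        return raw.decode("cp1252", errors="replace")
    return raw.decode("iso-8859-1")


# 0x93 and 0x94 are curly quotes in Windows-1252
print(decode_latin1_form_data(b"\x93quoted\x94"))
# 0xE4 is 'ä' in both encodings
print(decode_latin1_form_data(b"\xe4"))
```

Once the text is decoded this way, you can re-encode it as UTF-8 before putting it on the page.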









