Allowing all characters but HTML display

Hey,

I’ve been seeing a trend in upside down text and I’m trying to allow it in my script but it displays with a bunch of question marks.

This is the text


ʇnoqɐ ƃuıʞuıɥʇ ʇnoɥʇıʍ ʎɐp ɐ oƃ ʇ,uɐɔ noʎ ƃuıɥʇǝɯos uo dn ǝʌıƃ ɹǝʌǝu

It seems to display properly on the forums, which is what I’m trying to achieve.

I’m wrapping all user input with htmlentities()

What encoding are you stating for the page? (This page states charset=ISO-8859-1.) The characters are standard ASCII AFAIK.

UTF-8…

I tried ISO-8859-1 but same results.

Too little info to help.
Do you use a database? Have you set up its encoding properly?

No, it’s not from a database. I’m copying and pasting the text I provided above into my HTML document. I thought it was an htmlentities() issue, but it looks like it’s a browser issue.

I’m not sure how they accomplish that here.

How do these question marks look? Did you try another browser?
What is in the page source? If the question marks are there too, then it’s an htmlentities issue.
Using htmlentities is completely unnecessary if UTF-8 is in use.

The reason why I’m using htmlentities is to help stop xss attacks.
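For what it’s worth, escaping for XSS and keeping the upside-down characters are compatible, as long as the escaping function is told the input is UTF-8. Below is a minimal sketch, assuming the input string is valid UTF-8. The helper name escape_for_html is made up for illustration and is not from this thread.

```php
<?php
// Hypothetical helper (not from the thread): escape user input for HTML output.
// Assumes $input is a valid UTF-8 string.
function escape_for_html($input)
{
    // htmlspecialchars() only rewrites & < > " and ', which is what matters
    // for XSS; every other character passes through unchanged.
    // The explicit 'UTF-8' argument matters on PHP versions before 5.4,
    // where the default charset was ISO-8859-1.
    return htmlspecialchars($input, ENT_QUOTES, 'UTF-8');
}

$text = "ʇnoqɐ <b>ƃuıʞuıɥʇ</b>";
echo escape_for_html($text), "\n";
// prints: ʇnoqɐ &lt;b&gt;ƃuıʞuıɥʇ&lt;/b&gt;
```

htmlentities() accepts the same three arguments if you would rather convert every character that has a named entity.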

This is the output:


?noq? ?ui?ui?? ?no??i? ??p ? o? ?,u?? no? ?ui????os uo dn ??i? ????u 

I tried, and it looks the same in Firefox and IE 8.

Source:


<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">

<head>
	<meta http-equiv="content-type" content="text/html; charset=UTF-8" />
	<meta name="author" content="" />

	<title>Untitled 1</title>
</head>

<body>

&#647;noq&#592; &#387;u&#305;&#670;u&#305;&#613;&#647; &#647;no&#613;&#647;&#305;&#653; &#654;&#592;p &#592; o&#387; &#647;,u&#592;&#596; no&#654; &#387;u&#305;&#613;&#647;&#477;&#623;os uo dn &#477;&#652;&#305;&#387; &#633;&#477;&#652;&#477;u

</body>
</html>

You’ll need to make sure the file you are copying from is UTF-8 too. Just use a reasonable IDE and you can probably face-dance with the encoding.
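If you want to check the bytes rather than trust the IDE setting, something like the following might help. It is only a sketch: it assumes the mbstring extension is loaded, and the function name and the file name page.html are placeholders, not anything from this thread.

```php
<?php
// Hypothetical check (not from the thread): is this byte string valid UTF-8?
// Requires the mbstring extension.
function is_valid_utf8($bytes)
{
    return mb_check_encoding($bytes, 'UTF-8');
}

// Two sample byte strings:
$utf8   = "\xC9\x90"; // U+0250 (the upside-down "a") encoded as UTF-8
$latin1 = "\xE9";     // "é" as saved by an ISO-8859-1 editor; not valid UTF-8

var_dump(is_valid_utf8($utf8));   // bool(true)
var_dump(is_valid_utf8($latin1)); // bool(false)

// For a real file, something along these lines (placeholder file name):
// var_dump(is_valid_utf8(file_get_contents('page.html')));
```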

Try NetBeans if you don’t have an IDE, as long as you don’t mind Java on your machine.

I keep forgetting about this issue of the source file, and only just re-discovered it myself after a lost couple of hours last week.

I’m not sure I’m understanding you correctly, Cups.

I’m currently using phpDesigner as my IDE.

If the source of that ASCII came from an ISO-8859-1 charset, shouldn’t it fix itself if I change my charset to ISO-8859-1 as well?

Sorry, if iso-8859-1 is what you want and your IDE allows you to set the coding for your file, then you have probably eliminated the source file as being the cause of your woes.

That was all I meant. I thought I’d flag it because even though I knew about this, it slipped my mind recently - as I usually head straight for the database when odd characters appear on-screen.

No matter which encoding you have decided to use, you have to look at it as a “cradle to grave” exercise: at every place those chars are touched or displayed, double-check the char encoding settings.
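To make the “cradle to grave” idea concrete, here is a rough sketch that keeps UTF-8 at each step: validating the input, escaping it, and declaring the charset in both the markup and the HTTP header. It assumes this PHP file itself is saved as UTF-8 and that mbstring is available. The function name render_page and the fallback text are invented for the example.

```php
<?php
// Hypothetical sketch: build a page whose declared charset matches its bytes.
// Assumes this source file is saved as UTF-8 and mbstring is loaded.
function render_page($userText)
{
    // Reject input that is not valid UTF-8 rather than guessing what it was.
    if (!mb_check_encoding($userText, 'UTF-8')) {
        $userText = '[invalid input]';
    }

    // Escape with the same charset the page declares.
    $safe = htmlspecialchars($userText, ENT_QUOTES, 'UTF-8');

    return "<!DOCTYPE html>\n"
         . "<html><head><meta charset=\"UTF-8\"><title>Demo</title></head>\n"
         . "<body><p>" . $safe . "</p></body></html>\n";
}

// Declare the same charset in the HTTP header as in the markup.
if (!headers_sent()) {
    header('Content-Type: text/html; charset=UTF-8');
}
echo render_page("ʇnoqɐ ƃuıʞuıɥʇ");
```

If a database were involved, its connection and column encodings would be one more place to check.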

Kore Nordmann has one of the best resources for PHP char encoding and i18n issues.

PHP Charset FAQ and Charsets vs Encoding (http://kore-nordmann.de/blog/0082_charset_versus_encoding.html)