DOCTYPE and cyrillic

Question, everyone, is it “ru” or is it “cp”? I used to have a page that listed all the two-letter language combos but I can’t find it.

It is ru Русский (Russian). As for abbreviations like ‘a11y’ I am not as cool as the p03$ this is embarrassing…

Off Topic:

I am going to go ballistic over the sl-ooow speed of my net connection today.

After some puzzling and several changes to the code, there seems to be a conflict between the server options and the code in the head of the page (<meta http-equiv="content-type" content="text/html; charset=utf-8" />).

First, the options I can select on the server side:

Web-server encoding:
windows-1251
koi8-r
x-mac-cyrillic
disabled

Files encoding on physical disk:
windows-1251
koi8-r
x-mac-cyrillic
disabled

When I select koi8-r and include


<meta http-equiv="Content-Type" content="text/html; charset=koi-8">

the page shows some Cyrillic, but not the words that are in the XHTML file.

Changing to utf-8 does not help either.

Selecting windows-1251 on the server and putting windows-1251 in the meta http-equiv tag does not work either. At the moment ‘disabled’ is selected and I have removed the meta tag.

I can PM the page (I do not want the site mentioned online at the moment, because Google will index it).

Thanks for the reply.
Mathilde

go ahead :slight_smile:

you have

<?xml version="1.0" encoding="UTF-8"?>

at the top. drop it. you don’t need it anyway. also no whitespace before

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">

<img src="…"> should be <img src="…" />, you use xhtml.

the lines displayed match the lines written in xhtml document: 9.

i see all the info:
rostov na dony
rejim raboty
rejim raboty v suboty
posetite nash ofer prodaj po adresu
Prospect Stachky

are you sure you can read cyrillic? :slight_smile:

oh, and definitely lose the <center> !

My sites are still on transitional because that’s the MS default in Visual Studio, and I hate making repeated trips through my website just checking to make sure I’ve always remembered to swap it out. Why MS fell in love with it though, I’ve no idea.

:slight_smile: Interesting, I’ve heard it in Russian and French, and it was… not easy to follow even though it was reading English text which is my first language. Oh… and the price?? Amen! Can’t believe how expensive those things are.

I hadn’t heard that before. I thought 8 was the first to “play nicely” with the XML tag. Cool. Well, maybe I’ll throw it into my template when I ditch support for 6… someday.

I know where you can find almost all of them, but you’re going to kick yourself. :wink:

Go to the top of this page. Click Reference on the navbar. Change to the HTML tab, and in the second row, middle column, under “HTML Extras”, is a link to the ISO 2-letter language codes. I think I found a page that had half a dozen more, and the LOC is promoting a new three-letter standard (ISO 639-2), but for 99.99% of production websites the Sitepoint reference has everything you need, covering ~120-130 countries.

Hi noonnope,

OK, maybe I was not clear. On the page as it is now you can read the Cyrillic as intended. When I use the meta tag, you cannot.

At this moment I see:


<meta http-equiv="content-type" content="text/html; charset=windows-1251/>

<center> is used for the moment; in the final index page this will be done in CSS.

I hope the changes are right now.

mathilde

ok, i guess i understand now.

you expect to write your page in koi8-r and display the same when you choose to send it as windows-1251.

i’m afraid that’s not possible. you have to send the page with the same encoding you use when writing it, because the same index in the code pages of two different char sets may give different characters altogether. for example, the byte 0xCC is Ì in the western windows-1252 char set, but the same index holds л in the russian char set koi8-r and М in the windows-1251 cyrillic char set.

you can’t put rubles in your account and then wire them as dollars.

you send them as rubles also, or you make a conversion first, from rubles to dollars, if you want to send dollars :slight_smile:
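to make the code page point concrete, here is a minimal sketch in Python (my choice for illustration, nobody in this thread is using it). it takes the single byte 0xCC as the example index and decodes it with three of Python's built-in codecs, then shows what happens to a word written as koi8-r when it is read as windows-1251. the sample word is just an assumption for the demo.

```python
import unicodedata

# One byte, three code pages: the byte never changes, only the
# character that the chosen charset maps it to.
raw = bytes([0xCC])

as_western = raw.decode("cp1252")   # windows-1252
as_koi8r = raw.decode("koi8_r")     # koi8-r
as_win1251 = raw.decode("cp1251")   # windows-1251

for label, char in [("windows-1252", as_western),
                    ("koi8-r", as_koi8r),
                    ("windows-1251", as_win1251)]:
    # Print code point and Unicode name so the output is ASCII-safe.
    print(f"{label}: U+{ord(char):04X} {unicodedata.name(char)}")

# A word saved as koi8-r but read as windows-1251: same bytes,
# same length, but different letters.
word = "работы"
garbled = word.encode("koi8_r").decode("cp1251")
```

the garbled result is still made of cyrillic letters, just the wrong ones, which would match a page that shows "some cyrillic but not the words in the file".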

so, if you write the page in koi8-r, you need to send it as koi8-r encoded. if you want to send it as windows-1251, then you need to remake/retype/convert your page using this encoding.
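the "convert first" step can be sketched like this in Python. this is only an illustration under my own assumptions: `convert_charset` is a hypothetical helper name and the page fragment is made up. an editor's "convert to..." command does the same job.

```python
def convert_charset(data: bytes, source: str, target: str) -> bytes:
    """Decode bytes that were written in `source`, re-encode them in `target`."""
    return data.decode(source).encode(target)

# Hypothetical page fragment as it would sit on disk when saved as koi8-r.
page_koi8r = "<p>Режим работы</p>".encode("koi8_r")

# The same text, converted so it can be sent as windows-1251 or as utf-8.
page_win1251 = convert_charset(page_koi8r, "koi8_r", "cp1251")
page_utf8 = convert_charset(page_koi8r, "koi8_r", "utf-8")
```

after a conversion like this the charset in the meta tag and the server setting have to be changed to the new encoding too.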

this whole mess is the reason utf-8 exists :wink:

I am totally lost at the moment. But where did I get lost?

With a blank document in Notepad++, I start by writing the XHTML code:


<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html lang="ru" xml:lang="ru" xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta http-equiv="content-type" content="text/html; charset=koi8-r" />
</head>
<body>
</body>
</html>

In this example I used charset=koi8-r! The reason is that I selected the koi8-r option in my server control panel.

Web-server encoding:
windows-1251
koi8-r
x-mac-cyrillic
disabled

I am not sure what to do with this second one. Do I have to select an option here as well, or leave it disabled?

Files encoding on physical disk:
windows-1251
koi8-r
x-mac-cyrillic
disabled

Now when I read the different replies, I have to save the file as a utf-8 w/o BOM.

In 3DFTP I upload the file to the server and change the extension to .html

Only it is not really working for me :frowning: Not even when I do the same with charset=windows-1252 or charset=utf-8 [and change the web server encoding as mentioned above]

Now when I read the different replies, I have to save the file as a utf-8 w/o BOM.

IF you save as UTF-8, don’t let your text editor add a BOM. Sometimes you can even see it as an option in your editor: but you never want it on a web page.
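If you want to check whether an editor slipped a BOM in, here is a minimal Python sketch (Python is just my pick for the illustration). The BOM is the three bytes EF BB BF at the very start of the file. `strip_utf8_bom` is a hypothetical helper name, and the sample text is made up.

```python
import codecs

def strip_utf8_bom(data: bytes) -> bytes:
    """Return the bytes without a leading UTF-8 BOM, if there is one."""
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):]
    return data

# "utf-8-sig" writes the BOM on encode, which mimics an editor saving
# "UTF-8" with BOM; plain "utf-8" mimics "UTF-8 w/o BOM".
with_bom = "<p>Привет</p>".encode("utf-8-sig")
without_bom = "<p>Привет</p>".encode("utf-8")

print(with_bom[:3] == codecs.BOM_UTF8)            # True: BOM present
print(strip_utf8_bom(with_bom) == without_bom)    # True: BOM removed
```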

The most important thing: everyone says the same charset, whichever one you choose. You save the document in Notepad++ in the same charset you write in the meta tag which is the same charset as what the server is putting out.

Any time someone says a different charset than the rest, you’ll get ?'s or problems.
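The "everyone says the same charset" rule can be sketched as a small Python check. This is only an illustration under my own assumptions: `meta_charset` and `all_agree` are hypothetical helpers, the regex is a rough match rather than a real HTML parser, and the charset names are compared through Python's codec registry, which is not necessarily how a browser resolves them.

```python
import codecs
import re

def meta_charset(html: str):
    """Rough extraction of the charset named in a meta tag, or None."""
    match = re.search(r'charset\s*=\s*["\']?([\w-]+)', html, re.IGNORECASE)
    return match.group(1) if match else None

def all_agree(file_bytes: bytes, saved_as: str, server_charset: str) -> bool:
    """True only if the file encoding, the meta tag and the server match."""
    declared = meta_charset(file_bytes.decode(saved_as, errors="replace"))
    if declared is None:
        return False
    try:
        # codecs.lookup() normalizes aliases, e.g. windows-1251 and cp1251.
        names = {codecs.lookup(name).name
                 for name in (saved_as, declared, server_charset)}
    except LookupError:
        return False  # at least one name is not a known charset
    return len(names) == 1

page = ('<meta http-equiv="content-type" '
        'content="text/html; charset=koi8-r" />').encode("koi8_r")

print(all_agree(page, "koi8-r", "koi8-r"))        # True
print(all_agree(page, "koi8-r", "windows-1251"))  # False
```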

Stommepoes: My point is that I can select koi8-r, windows-1251, x-mac-cyrillic or just disable for webserver-encoding.

In Notepad++ I can only select Coding > ANSI, UTF-8 w/o BOM, or UTF-8.

What is the best thing to do? I understand that the same coding/charset has to be used everywhere for the page to show correctly, but I do not see how to do that here.

Thanks for the reply.
Mathilde

Since you seem to be having issues with the server you may find it easier to use .htaccess if it is Apache: http://www.w3.org/International/questions/qa-htaccess-charset

Useful: http://www.w3.org/TR/i18n-html-tech-char/

Else I’d probably forget the server and try to force the encoding via the document, even though it probably wouldn’t work unless the browser overrode it.

Albeit if you cannot type text directly in Russian via Notepad++, you are probably going to have to resort to UTF, or else find another editor. If you use ANSI you cannot directly type in Russian and expect the correct results.
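A quick way to see why ANSI is a problem is this minimal Python sketch. It assumes "ANSI" means windows-1252, which is what it usually is on a Western European Windows install (on a Russian-locale Windows it would normally be windows-1251, and the outcome would differ). `can_save_as` is a hypothetical helper and the sample text is made up.

```python
def can_save_as(text: str, charset: str) -> bool:
    """True if every character of `text` exists in `charset`."""
    try:
        text.encode(charset)
        return True
    except UnicodeEncodeError:
        return False

text = "Ростов-на-Дону"

for charset in ("cp1252", "cp1251", "koi8_r", "utf-8"):
    print(charset, can_save_as(text, charset))
```

Only the first one fails: windows-1252 has no Cyrillic letters at all, while the two Cyrillic code pages and UTF-8 can all hold the text.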

I assume the browser you are using is also correctly configured with the appropriate language sets and encoding, etc.

Although I believe SitePoint doesn’t generally like people using PM to solve publicly posted topics, if you are still having issues I am sure they won’t mind if you send me a PM with the URL. :slight_smile:

If your only fear was just Google, you could have typed something like: www. mywebsite .com /index.htm and it wouldn’t have auto-formatted your address. There are other ways too, but that’s the easiest.

Found my old bookmark:
http://www.iana.org/assignments/language-subtag-registry
that’s what I was looking for. Kinda overkill tho : )

Stommepoes: My point is that I can select koi8-r, windows-1251, x-mac-cyrillic or just disable for webserver-encoding.

Something is seriously outdated on a server that cannot offer utf-8 : ( I mean, I hope you’re not using a Windows server running IIS!?

and you can do the same in notepad++

not true. in notepad++ v5.7, you have the following menu path:
Encoding -> Character set -> Cyrillic

where you have listed seven charsets to choose from to match your server encoding.

once again:
if you write your pages in notepad++ with koi8-r encoding, you HAVE to send them with koi8-r encoding and you HAVE to use koi8-r in your meta. koi8-r all over. you cannot write your pages in koi8-r and send them as windows-1251, just as you can’t send rubles as dollars.

and you can do the same in notepad++

though it’s a cheap server service if the server can’t be set to utf-8. I’d switch hosts.

I would agree. Check with your web host, there is almost certainly utf-8 available, though they may have to do it on their end. If they truly can’t do utf-8 the issue isn’t even the charset (though certainly that’s a big one), the real issue is what kind of junky servers are so old they can’t do utf-8?? That’d be time to find someone else to host your website.

Again thanks to all for the information and help. Will be busy with it during the weekend.

Here are links fitting the subject as well.

http://www.marsandmc.nl/internet/a001-unicode.html

I contacted the hosting company about the server, and utf-8 is possible; I only have to select ‘disabled’ in the options mentioned earlier and use charset=utf-8.

Will download latest version of notepad++.

Mathilde

let us know how it goes :slight_smile:

You could post the HTML instead. This is quite an interesting subject, but everybody is in the dark about what you and noonnope are talking about.

if you have things to add/ask you’re welcome, we are open to any new views/questions.

but the OP made it clear she cannot disclose the link to the site in the open yet :slight_smile: and frankly, it’s not that relevant.