DOCTYPE and Cyrillic

Hence the request for some page code.

The page code example is in post #29 :slight_smile:

A reply from the help desk of the provider:


<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html lang="ru" xml:lang="ru" xmlns="http://www.w3.org/1999/xhtml">
<head>
<title>&#1085;&#1091;, &#1056;&#1086;&#1089;&#1089;&#1080;&#1103;</title>
</head>
<body>
<h1>&#1084;/h1>
<img src="logo.jpg" alt="logo" height="224" width="288" />
&#1056;&#1086;&#1089;&#1090;&#1086;&#1074;-&#1085;&#1072;-&#1044;&#1086;&#1085;&#1091;, &#1056;&#1086;&#1089;&#1089;&#1080;&#1103;
</body>
</html>

And yes, this is a working page! The only thing missing is the charset meta tag. It seems the problem lies in saving the file as UTF-8 (not sure about that).
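A likely reason the sample works even without a charset meta tag is that every Cyrillic letter in the body is written as a numeric character reference (&#1056; and so on), so the file itself is pure ASCII and reads the same in any ASCII-compatible encoding. This Python sketch, built only on the standard codecs and a hypothetical helper named to_char_refs, converts text to that form and back:

```python
import html


def to_char_refs(text: str) -> str:
    """Hypothetical helper: replace each non-ASCII character with &#NNNN;."""
    # The "xmlcharrefreplace" error handler emits decimal character references.
    return text.encode("ascii", "xmlcharrefreplace").decode("ascii")


city = "Ростов-на-Дону, Россия"
refs = to_char_refs(city)
print(refs)

# The ASCII-only markup decodes back to the original Cyrillic text.
print(html.unescape(refs) == city)  # True
```

The printed string should match the body text of the help desk sample character for character.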

Maybe someone can explain from scratch how to write a page in UTF-8 with Notepad++ 5.8?

There are two ways I can do it:
First: Settings > Preferences > New Document/Default Directory > Encoding: select UTF-8 without BOM, default language XML.

Second: in the Encoding menu I can select ‘Encode in UTF-8 without BOM’, and there is also a ‘Cyrillic’ option. When I click ‘Cyrillic’, I no longer see UTF-8 (only KOI8-R, etc.).

I write the page code and text, then save it as ‘name.html’ and upload it to the server via FTP.

I think this is how it should work.

Not sure anymore :slight_smile:

Mathilde

You should use Default language: Normal Text. That option is for the syntax colouring feature and helpers specific to a language, and the file extension already covers that: you don't need to specify a default language, because Notepad++ detects the language from the extension.

Setting the encoding this way ensures new documents are always UTF-8, so you won't have to go to the Encoding menu every time to change it.

Normal Text also covers XML, which is plain text too.

Syntax colouring follows the extension of the new file: if you save something.xml, it will be as if you had Default language: XML.

So there is no need to set anything other than Normal Text as the default language; the file extension will determine the syntax colouring and any other features specific to that language.
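"UTF-8 without BOM" just means the file starts directly with the UTF-8 bytes of your markup, with no EF BB BF prefix in front. As an illustration outside Notepad++, this Python sketch (save_page is a hypothetical helper) writes the same markup both ways and checks the first three bytes:

```python
import tempfile
from pathlib import Path

BOM = b"\xef\xbb\xbf"  # the three-byte UTF-8 byte order mark


def save_page(path: Path, markup: str, with_bom: bool = False) -> bytes:
    """Hypothetical helper: write markup as UTF-8 (optionally with a BOM), return the bytes on disk."""
    # Python's "utf-8-sig" codec writes a BOM first; plain "utf-8" does not.
    encoding = "utf-8-sig" if with_bom else "utf-8"
    # newline="\n" stops Python from translating line endings on Windows.
    with open(path, "w", encoding=encoding, newline="\n") as f:
        f.write(markup)
    return path.read_bytes()


with tempfile.TemporaryDirectory() as d:
    page = "<h1>Россия</h1>\n"
    plain = save_page(Path(d) / "name.html", page)
    marked = save_page(Path(d) / "name-bom.html", page, with_bom=True)

print(plain.startswith(BOM), marked.startswith(BOM))  # False True
```

Apart from the three-byte prefix, both files contain exactly the same bytes.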

That's because UTF-8 is not a strictly Cyrillic charset; you are opening a strictly Cyrillic option in the menu, right :wink:
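The menu split makes more sense once you see that the same word becomes different bytes in each encoding. This short Python sketch, using only the standard codecs, compares UTF-8 with KOI8-R and shows what happens when bytes are read with the wrong one:

```python
word = "Россия"

utf8 = word.encode("utf-8")   # 2 bytes per Cyrillic letter
koi8 = word.encode("koi8_r")  # 1 byte per Cyrillic letter
print(len(utf8), len(koi8))   # 12 6

# Reading UTF-8 bytes with a Cyrillic-only decoder gives mojibake, not an error,
# which is why a wrong charset shows up as garbled letters on the page.
garbled = utf8.decode("koi8_r")
print(garbled == word)  # False
```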

I was asked if the ftp program could somehow screw the charset up. So far as I know, it can’t. But someone could tell me I’m wrong.

It possibly could if the FTP client were forced to deal only with certain encodings like UTF-8… When I was looking at the Iceberg and reading up on my Cisco gear, I read something about that, but I believe it's a rarity.

UTF-8 is a “stream of bytes”, correct me if I'm wrong. FTP is a “file manager” for the web: it transfers bytes, not encodings or whatever.

So, would a file manager change your AVI file into an MP3 file when copying it?

The only way I can think of is if the byte transfer somehow gets mixed up, but the protocol stack takes care of that too.

Hence, hardware malfunctions or viruses are the only things that could screw up your file transfer.
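You can check the "stream of bytes" argument locally. This Python sketch uses a plain local file copy as a stand-in for a binary transfer (an assumption: it never talks to an FTP server) and compares SHA-256 hashes of the original and the copy:

```python
import hashlib
import shutil
import tempfile
from pathlib import Path


def sha256_of(path: Path) -> str:
    """Hash the raw bytes of a file."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


with tempfile.TemporaryDirectory() as d:
    src = Path(d) / "name.html"
    dst = Path(d) / "copy.html"
    src.write_bytes("<h1>Ростов-на-Дону</h1>\n".encode("utf-8"))
    # A byte-for-byte copy: no decoding or re-encoding happens.
    shutil.copyfile(src, dst)
    same = sha256_of(src) == sha256_of(dst)

print(same)  # True
```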

Off Topic:

Obviously you have to set the configuration in the first place; it wouldn't alter it on a whim. The FTP spec says that if the server supports UTF-8, it should be used by default, but that usually concerns pathnames/filenames rather than transmitted data: http://www.ietf.org/rfc/rfc2640.txt. Not all clients follow the specification anyway. I think the probability of this being the “issue” is negligible. :slight_smile:
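Since RFC 2640 is about pathnames, the place an encoding mismatch would show up is a non-ASCII file name, not the file contents. This Python sketch illustrates that; the idea that an older client sends cp1251 bytes is an assumption made for illustration:

```python
name = "Ростов.html"

utf8_name = name.encode("utf-8")     # what an RFC 2640 client sends for the pathname
cp1251_name = name.encode("cp1251")  # assumed: what a legacy client might send instead
print(len(utf8_name), len(cp1251_name))  # 17 11

# A server that expects UTF-8 pathnames cannot decode the cp1251 bytes.
try:
    cp1251_name.decode("utf-8")
    decodable = True
except UnicodeDecodeError:
    decodable = False
print(decodable)  # False
```

Plain ASCII names like name.html are identical in both encodings, so they avoid the question entirely.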

What exactly are the HTTP headers saying on the file at the moment?
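One way to answer this question without browser tools is a small Python sketch using the standard urllib and email modules. The helper names are hypothetical, the URL in the usage comment is a placeholder, and some servers may reject HEAD requests:

```python
import urllib.request
from email.message import Message
from typing import Optional


def charset_from_content_type(value: str) -> Optional[str]:
    """Hypothetical helper: return the lowercased charset of a Content-Type value, or None."""
    msg = Message()
    msg["Content-Type"] = value
    return msg.get_content_charset()


def fetch_content_type(url: str) -> str:
    """Hypothetical helper: send a HEAD request and return the server's Content-Type header."""
    req = urllib.request.Request(url, method="HEAD")
    with urllib.request.urlopen(req, timeout=10) as resp:
        return resp.headers.get("Content-Type", "")


print(charset_from_content_type("text/html; charset=UTF-8"))  # utf-8
print(charset_from_content_type("text/html"))                 # None

# Usage against a real page (placeholder URL):
# print(charset_from_content_type(fetch_content_type("http://example.com/name.html")))
```

If the header has no charset, the browser falls back to the BOM or the meta tag in the page.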

I noticed an error in the HTML sample above. Not that it matters much, but <h1>м/h1> should be: <h1>м</h1>

This is why I could do with the actual URL: the code is rather insignificant so long as it contains the meta tag or a higher-level instruction, i.e. a correct HTTP Content-Type header.

I am at a disadvantage as I don’t know how the characters are supposed to render/display though…
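To see why the HTTP header counts for more than the markup, here is a simplified Python sketch of the order in which HTML5 browsers choose an encoding: a BOM wins, then the charset in the HTTP header, then a <meta> found near the start of the file. The regex, the 1024-byte window and the windows-1252 fallback are simplifying assumptions; the real algorithm in the HTML spec has more steps, and the actual fallback depends on the browser's locale.

```python
import re
from typing import Optional

# Simplified: finds charset=... inside a <meta> tag (the real prescan is stricter).
META_CHARSET = re.compile(rb"<meta[^>]+charset\s*=\s*[\"']?([\w.:-]+)", re.IGNORECASE)


def effective_charset(body: bytes, header_charset: Optional[str],
                      fallback: str = "windows-1252") -> str:
    """Hypothetical helper: BOM first, then HTTP header, then <meta>, then fallback."""
    if body.startswith(b"\xef\xbb\xbf"):
        return "utf-8"
    if header_charset:
        return header_charset.lower()
    # Browsers only prescan the beginning of the document for <meta>.
    match = META_CHARSET.search(body[:1024])
    if match:
        return match.group(1).decode("ascii").lower()
    return fallback


print(effective_charset(b'<meta charset="koi8-r">', "UTF-8"))  # utf-8 (header wins)
```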

I was asked if the ftp program could somehow screw the charset up. So far as I know, it can’t. But someone could tell me I’m wrong.

Stomme poes: I understood this after reading, over the last couple of days, about everything that can cause problems with UTF-8 and Cyrillic.

To play it safe, I switched to WinSCP, an FTP tool that is simple to set up to work with Notepad++. Now all the settings work well and the problem seems to be solved.

I noticed an error in the HTML sample above - not like it matters too much but: <h1>м/h1> should be; <h1>м</h1>

xhtmlcoder: Totally right, I accidentally removed part of the closing h1 tag when deleting text :slight_smile:

We will start working on pages now, and I will add the address later.

Again, thanks a lot for the help, information and tips.

Mathilde

Most FTP programs have two transfer modes, one for text (ASCII) and one for binary. To be safe, use binary mode: it guarantees a bit-for-bit transfer, and bit-for-bit necessarily implies byte-for-byte. Text mode, by contrast, may convert line endings between systems.
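If you ever script the upload, Python's standard ftplib makes the binary choice explicit: storbinary switches the connection to TYPE I (binary) before sending the file. This is a sketch with a hypothetical helper name; the host and credentials in the usage comment are placeholders:

```python
from ftplib import FTP
from pathlib import Path


def upload_binary(ftp: FTP, local_path: Path, remote_name: str) -> None:
    """Hypothetical helper: upload in binary mode so the server receives the exact bytes."""
    # Open in "rb" so Python does not decode or translate anything.
    with open(local_path, "rb") as f:
        ftp.storbinary(f"STOR {remote_name}", f)


# Usage (placeholder host and credentials):
# with FTP("ftp.example.com") as ftp:
#     ftp.login("user", "password")
#     upload_binary(ftp, Path("name.html"), "name.html")
```

ftplib's storlines is the text-mode counterpart, and it is the one that may rewrite line endings.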

Hello. If you can read the Cyrillic letters here with no garbled special symbols
http://lumix.ru/promo/dmc-g2/
then this

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />

will be quite enough. I made this page in a UTF-8-only text editor (without cp1251), and it works well :)

I wouldn't recommend KOI8-R, because it became outdated almost 15 years ago.
If you expect only visitors from Russia, use windows-1251; in all other cases, use UTF-8.

PhpRu: you didn’t tell screen readers and other software which language to start reading your page with.

The page also has the title first, before the charset declaration (ah! browsers will go back and start reloading the page after they hit that tag, if it doesn't conflict with the server, so put it first!), and then a second tag repeating the title for some reason… why?
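Both points above (declare the language, and put the charset declaration before the title) can be seen in a page skeleton. This Python sketch builds one with a hypothetical page() helper; it reuses the XHTML 1.0 Strict doctype from the help desk sample and is only an illustration, not a template anyone in the thread posted:

```python
import html


def page(title: str, body: str, lang: str = "ru") -> str:
    """Hypothetical helper: XHTML page with lang attributes and the charset meta before <title>."""
    return (
        '<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" '
        '"http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">\n'
        f'<html xmlns="http://www.w3.org/1999/xhtml" lang="{lang}" xml:lang="{lang}">\n'
        "<head>\n"
        # The charset declaration comes first inside <head>.
        '<meta http-equiv="Content-Type" content="text/html; charset=utf-8" />\n'
        f"<title>{html.escape(title)}</title>\n"
        "</head>\n"
        # body is inserted as-is, so it must already be valid markup.
        f"<body>\n{body}\n</body>\n"
        "</html>\n"
    )


markup = page("Ростов-на-Дону, Россия", "<h1>Ростов-на-Дону</h1>")
data = markup.encode("utf-8")  # save these bytes as name.html
```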

It is, however, an excellent example of why a site would want UTF-8… see all the non-Cyrillic words and letters there? UTF-8 doesn't care what you mix into the content. UTF-8 here was a better choice than something like KOI8-R etc.
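A quick way to test whether a charset can hold a mixed-language page is to try encoding the text. This Python sketch uses a made-up sample string (not taken from the Lumix page) and a hypothetical can_encode helper:

```python
def can_encode(text: str, encoding: str) -> bool:
    """Hypothetical helper: True if every character of text exists in the encoding."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


mixed = "Lumix G2: Россия, Deutschland, 日本"
for enc in ("utf-8", "windows-1251", "koi8_r"):
    print(enc, can_encode(mixed, enc))
# utf-8 True
# windows-1251 False
# koi8_r False
```

Both Cyrillic-only charsets handle the Russian part fine. Only the characters from other scripts make them fail.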

PhpRu: you didn’t tell screen readers and other software which language to start reading your page with.

Yep, because it’s a promo page, for browsers only :)
Anyway, as you’ve said, UTF-8 supports any mixture of symbols, and modern browsers should recognize them automatically. If you simply copy and paste Cyrillic symbols from any UTF-8 document into your HTML page without any doctype or headers, it should work correctly :)
I use the doctype specifically for IE, because of its quirks :)

On condition that your “HTML page without any doctype or headers” is also a UTF-8 document. Otherwise… it will “work correctly” only if its encoding supports the symbols :wink:
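You can check that condition before uploading by decoding the saved file as strict UTF-8. This Python sketch uses a hypothetical is_utf8 helper. A pure-ASCII file also passes, because ASCII is a subset of UTF-8:

```python
def is_utf8(data: bytes) -> bool:
    """Hypothetical helper: True if the bytes decode as strict UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


text = "<h1>Россия</h1>"
print(is_utf8(text.encode("utf-8")))   # True
print(is_utf8(text.encode("cp1251")))  # False: cp1251 Cyrillic bytes are not valid UTF-8
```

To check a real file, pass it `open("name.html", "rb").read()`.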

Wow, thanks guys, now I know how to make Russian pages o_O