HTML Character Encodings
Character encoding determines how the bytes of an HTML document are mapped to readable characters. Declaring the encoding in your HTML tells the browser which mapping to use, so your content displays correctly on all devices.
In this tutorial, you will learn how HTML character encodings work, why they matter, and how to use them.
Learning Outcomes
After reading this tutorial on HTML Character Encodings, you will be able to:
- Understand the role of character encoding in ensuring proper text display.
- Identify and compare common encodings like ASCII, ANSI, ISO-8859-1, and UTF-8.
- Implement the correct charset in your HTML using the <meta charset> tag.
- Choose the right encoding for your website to support multiple languages and symbols.
HTML Charset Attribute
Use the charset attribute in a <meta> tag to set the document’s character encoding. This tells the browser how to read your text.
<meta charset="UTF-8">
This short form was introduced with HTML5. UTF-8 can encode every Unicode character and symbol, so it works well for sites in multiple languages.
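As a rough illustration of what reading this declaration involves, the following Python sketch scans an HTML string for a <meta charset> tag. The names CharsetFinder and declared_charset are hypothetical, chosen for this example. Real browsers follow a more involved detection process (they also consider HTTP headers and byte order marks), so treat this as a simplified model only.

```python
from html.parser import HTMLParser


class CharsetFinder(HTMLParser):
    """Records the first charset declared in a <meta charset="..."> tag."""

    def __init__(self):
        super().__init__()
        self.charset = None

    def handle_starttag(self, tag, attrs):
        # attrs is a list of (name, value) pairs; attribute names arrive lowercased.
        if tag == "meta" and self.charset is None:
            for name, value in attrs:
                if name == "charset" and value:
                    self.charset = value.strip().lower()


def declared_charset(html_text):
    """Return the declared charset in lowercase, or None if there is none."""
    finder = CharsetFinder()
    finder.feed(html_text)
    return finder.charset


page = '<!DOCTYPE html><html><head><meta charset="UTF-8"><title>Demo</title></head></html>'
print(declared_charset(page))  # utf-8
```

The sketch lowercases the result because charset names are not case-sensitive in HTML.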
The ASCII Character Set
ASCII (American Standard Code for Information Interchange) was the first widely used character encoding system. It contains 128 characters, including:
- English letters (A-Z, a-z)
- Digits (0-9)
- Special characters like !, @, #, $, %
Example:
<meta charset="ASCII">
Notes:
- ASCII is limited to English characters and a few special symbols.
- It is not suitable for non-English languages or special symbols used globally.
- ASCII forms the basis for many later character sets.
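To make the 128-character limit concrete, here is a minimal Python sketch. The helper names ascii_codes and is_ascii_only are hypothetical; the example relies only on Python's built-in "ascii" codec.

```python
def ascii_codes(text):
    """Return the ASCII code of each character; raises UnicodeEncodeError for non-ASCII text."""
    return list(text.encode("ascii"))


def is_ascii_only(text):
    """Report whether every character fits in ASCII's 128-character range."""
    try:
        text.encode("ascii")
    except UnicodeEncodeError:
        return False
    return True


print(ascii_codes("A!"))        # [65, 33]
print(is_ascii_only("resume"))  # True
print(is_ascii_only("résumé"))  # False
```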
The ANSI Character Set
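ANSI is the informal name for Windows-1252, a character encoding used primarily on Windows operating systems. It extends ASCII with the values 128 to 255: positions 128 to 159 hold printable symbols such as the euro sign and curly quotes, and positions 160 to 255 hold accented letters and symbols for Western European languages.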
Example in HTML code:
<meta charset="Windows-1252">
Notes:
- ANSI includes characters for Western European languages, such as accented letters (e.g., é, ç).
- Its first 128 characters are identical to ASCII; the upper half adds the characters needed for Western European languages.
- While more versatile than ASCII, it still lacks support for languages outside the Western European region.
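The notes above can be checked with a short Python sketch. Python calls this encoding "cp1252"; the helper name cp1252_bytes is hypothetical.

```python
def cp1252_bytes(text):
    """Encode text as Windows-1252, returning None if any character is unsupported."""
    try:
        return text.encode("cp1252")
    except UnicodeEncodeError:
        return None


print(cp1252_bytes("é"))  # b'\xe9'
print(cp1252_bytes("ç"))  # b'\xe7'
print(cp1252_bytes("€"))  # b'\x80'
print(cp1252_bytes("Ж"))  # None
```

Each supported character takes exactly one byte, while the Cyrillic letter has no Windows-1252 representation at all.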
The ISO-8859-1 Character Set
ISO-8859-1, also known as Latin-1, was the usual default character set in the HTML4 era. It defines 256 code positions (not all of them printable), covering Western European languages, including accented characters, punctuation, and symbols.
Example in HTML code:
<meta charset="ISO-8859-1">
Notes:
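- ISO-8859-1 is very similar to ANSI: the two are identical except for the values 128 to 159, where ISO-8859-1 has control codes and ANSI has printable characters. Browsers treat a page labeled ISO-8859-1 as Windows-1252.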
- It does not support characters outside the Latin alphabet (e.g., Chinese, Arabic, or Cyrillic).
- This encoding was widely used before UTF-8 became the default.
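One way to see exactly where ISO-8859-1 and Windows-1252 part ways is to decode every possible byte value with both. This is a sketch using Python's "latin-1" and "cp1252" codecs; differing_byte_values is a hypothetical helper name.

```python
def differing_byte_values():
    """List byte values that Windows-1252 and ISO-8859-1 interpret differently."""
    differing = []
    for value in range(256):
        raw = bytes([value])
        latin1_char = raw.decode("latin-1")  # defined for all 256 byte values
        try:
            cp1252_char = raw.decode("cp1252")
        except UnicodeDecodeError:
            cp1252_char = None  # a few positions are unassigned in Python's cp1252 codec
        if cp1252_char != latin1_char:
            differing.append(value)
    return differing


values = differing_byte_values()
print(values[0], values[-1], len(values))  # 128 159 32
print(bytes([0x80]).decode("cp1252"))      # €
```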
The UTF-8 Character Set
UTF-8 is an encoding of the Unicode character set and is the most widely used character encoding on the web today. It can represent every Unicode character, covering virtually every written language as well as a wide array of symbols and emojis.
Example in HTML code:
<meta charset="UTF-8">
Notes:
- UTF-8 is backward compatible with ASCII, meaning every ASCII character is stored as the same single byte in UTF-8.
- It is flexible and supports characters from multiple languages, making it ideal for international websites.
- It uses a variable-length encoding scheme (1 to 4 bytes), allowing efficient storage for characters.
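The variable-length scheme can be observed directly. The following Python sketch encodes one character from each length class; utf8_bytes is a hypothetical helper name, and the four sample characters are arbitrary choices.

```python
def utf8_bytes(char):
    """Return the UTF-8 byte sequence for a single character."""
    return char.encode("utf-8")


for char in ["A", "é", "€", "😀"]:
    encoded = utf8_bytes(char)
    # Show the character, its byte count, and the bytes in hexadecimal.
    print(char, len(encoded), encoded.hex(" "))

# A 1 41
# é 2 c3 a9
# € 3 e2 82 ac
# 😀 4 f0 9f 98 80
```

The ASCII letter keeps its one-byte form, which is why plain English text is the same size in ASCII and UTF-8.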
Differences Between Character Sets
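To understand the differences more clearly, refer to the table below. It compares which character each number represents in ASCII, ANSI, ISO-8859-1, and UTF-8. For UTF-8, the number is the Unicode code point of the character, not the bytes stored in the file.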
Number | ASCII | ANSI | ISO-8859-1 | UTF-8 | Description |
---|---|---|---|---|---|
32 | space | space | space | space | Space character |
33 | ! | ! | ! | ! | Exclamation mark |
34 | " | " | " | " | Quotation mark |
36 | $ | $ | $ | $ | Dollar sign |
37 | % | % | % | % | Percent sign |
42 | * | * | * | * | Asterisk |
48 | 0 | 0 | 0 | 0 | Digit zero |
49 | 1 | 1 | 1 | 1 | Digit one |
65 | A | A | A | A | Uppercase A |
97 | a | a | a | a | Lowercase a |
128 | Not defined | € | Control code | Control code | Euro sign (ANSI only) |
129 | Not defined | Not used | Control code | Control code | Unassigned in ANSI |
160 | Not defined | non-breaking space | non-breaking space | non-breaking space | Non-breaking space |
195 | Not defined | Ã | Ã | Ã | Latin capital A with tilde |
220 | Not defined | Ü | Ü | Ü | Latin capital U with diaeresis |
226 | Not defined | â | â | â | Latin small a with circumflex |
255 | Not defined | ÿ | ÿ | ÿ | Latin small y with diaeresis |
128-159 | Not defined | Printable symbols such as €, curly quotes, and dashes (five positions unused) | Control codes | Control codes | The only range where ANSI and ISO-8859-1 differ |
224-255 | Not defined | Same characters as ISO-8859-1 | Mostly accented lowercase letters (à to ÿ) | Same characters as ISO-8859-1, each stored as two bytes | Range for accented letters used in Western European languages |
256+ | Not available | Not available | Not available | Supports characters from various non-Latin scripts (Chinese, Arabic, Cyrillic, etc.) | UTF-8 can handle characters from almost every language in the world |
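A character's number is not always the same as the bytes stored in a file. The Python sketch below shows, for a given character number, the bytes each encoding would actually write. The ENCODINGS list and the encoded_forms helper are hypothetical names for this example; "cp1252" and "latin-1" are Python's names for Windows-1252 and ISO-8859-1.

```python
ENCODINGS = ["ascii", "cp1252", "latin-1", "utf-8"]


def encoded_forms(code_point):
    """Map each encoding name to the hex bytes for one character, or None if unsupported."""
    char = chr(code_point)
    forms = {}
    for name in ENCODINGS:
        try:
            forms[name] = char.encode(name).hex()
        except UnicodeEncodeError:
            forms[name] = None
    return forms


print(encoded_forms(65))      # 'A': 41 in every encoding
print(encoded_forms(233))     # 'é': e9 in cp1252 and latin-1, c3a9 in utf-8
print(encoded_forms(0x20AC))  # '€': 80 in cp1252, e282ac in utf-8
```

Note that é is the same single byte in Windows-1252 and ISO-8859-1 but two bytes in UTF-8, so a file saved in one of the legacy encodings is not valid UTF-8 once it contains accented letters.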
Control characters (codes 0–31 and 127) are non-printable. Early systems used them to control hardware functions such as moving the cursor, ringing a bell, or starting and stopping data transmission. Devices and software still use some of these codes, such as tab and line feed, for communication and formatting.
Why Is Character Encoding Important?
Character encoding ensures browsers render your content correctly.
- Text and symbols display as intended.
- Foreign characters appear correctly instead of � or gibberish.
- Consistent encoding prevents display errors across browsers and devices.
The right encoding makes your site work for all users and all browsers.
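The gibberish and � symbols mentioned above come from decoding bytes with a different encoding than the one used to write them. The following Python sketch reproduces both failures with the sample word "café"; the variable names are illustrative only.

```python
original = "café"
utf8_data = original.encode("utf-8")

# UTF-8 bytes read as Windows-1252: each byte of the two-byte é becomes its own character.
wrong = utf8_data.decode("cp1252")
print(wrong)  # cafÃ©

# ISO-8859-1 bytes read as UTF-8: the lone byte for é is invalid, so it is replaced.
latin1_data = original.encode("latin-1")
replaced = latin1_data.decode("utf-8", errors="replace")
print(replaced)  # the é is lost and U+FFFD (�) appears in its place

# Matching encodings on both sides round-trips the text unchanged.
right = utf8_data.decode("utf-8")
print(right == original)  # True
```

This is the same mismatch that occurs when a page's declared charset differs from the encoding the file was saved in.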
FAQs on HTML Character Encodings
What character encoding does HTML use?
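The HTML standard requires documents to be encoded as UTF-8, which supports characters and symbols from virtually every language. Browsers do not always assume UTF-8 when no encoding is declared, so declare it explicitly.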
Is charset UTF-8 necessary?
Yes, UTF-8 is highly recommended for modern websites because it supports all characters in the Unicode standard and ensures proper display across languages and platforms. It has broad compatibility and works for most web applications.
How to define character set in HTML?
You can define the character set by adding the following meta tag in the <head> section of your HTML document:
<meta charset="UTF-8">
How to specify encoding in HTML?
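To specify a different encoding, replace "UTF-8" with the desired encoding in the <meta> tag. The declaration must match the encoding the file is actually saved in, and a non-UTF-8 encoding is only appropriate for legacy pages. For example, to use ISO-8859-1, use: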
<meta charset="ISO-8859-1">