Special Entities of HTML

When you type text into any reasonably modern word processing program, even though your keyboard key shows that ubiquitous ASCII double quote symbol, you see nice “curly” opening and closing punctuation marks when you hit it.

These special quotes can’t be found on your keyboard. But word processing programs understand that when you put something in quotes, you want nice left and right quotes, and they replace the characters you typed with the correct ones. The same goes for apostrophes. Have you ever seen an ASCII apostrophe like the one on your keyboard in a book or brochure? Of course not. What we usually see in printed material is a closing single quote. In fact, there exists a vast array of characters that aren’t represented on a standard keyboard, though these characters show up on web pages and in printed material.

Now, that’s all well and good for people using word processors. But for those of us typing text into an HTML document, there’s no system to automatically replace the characters from our keyboards with their typographically correct equivalents. If the character encoding your web site declares doesn’t match the encoding the file was saved in, pasting these characters directly into an HTML document may produce a bunch of gibberish on the rendered page. Also, including characters that HTML itself uses, like < and >, will wreak havoc in your page, because the browser reads them as the beginning or end of a tag.
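Both problems can be reproduced in a few lines. The following is only a minimal sketch in Python (not something the article itself uses); the sample string is made up, and Windows-1252 is assumed as the mismatched encoding.

```python
import html

# A curly-quoted phrase that also contains characters HTML treats as markup.
text = "“Tom & Jerry” <b>rule</b>"

# html.escape rewrites &, < and > as entities, so a browser displays them
# instead of parsing them as markup. quote=False leaves ASCII quotes alone.
escaped = html.escape(text, quote=False)
print(escaped)

# Encoding mismatch: a right single quote saved as UTF-8 bytes...
raw_bytes = "’".encode("utf-8")
# ...but decoded as Windows-1252 (an assumed, commonly seen mismatch).
garbled = raw_bytes.decode("cp1252")
print(garbled)
```

The first line printed shows the markup characters replaced by &amp;, &lt; and &gt;, while the curly quotes pass through untouched. The second shows the three-character gibberish that a single curly apostrophe turns into when the encodings disagree.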

For these reasons, a series of special codes or entities has been created—we type these into our HTML documents to produce correct punctuation marks and just about any special character that we could need. The examples in the table below are just a sample of the many HTML character codes that exist.

The code in the left-hand column is known as an entity name or keyword. For instance, to produce a copyright symbol in your document, enter &copy; directly into your HTML; you’ll see a © in the rendered page. Each of these entities also has a numerical equivalent; the numerical equivalent of &copy; is &#169;, which produces the same symbol.
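To check that a name and its number really do refer to the same character, here is a small sketch using Python’s standard html module (an illustration only; the variable names are arbitrary).

```python
import html
from html.entities import name2codepoint

# The entity name "copy" maps to a Unicode code point.
code_point = name2codepoint["copy"]

named = html.unescape("&copy;")        # named reference
numeric = html.unescape("&#169;")      # decimal numeric reference
hexadecimal = html.unescape("&#xA9;")  # hexadecimal numeric reference

print(code_point, named, numeric, hexadecimal)
```

All three references decode to the same copyright symbol, and the code point is the number used in the numeric form.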

Sample list of HTML character entity references

Entity Character Description
&lt; < Less than
&gt; > Greater than
&amp; & Ampersand
&lsquo; ‘ Left single quote
&rsquo; ’ Right single quote
&ldquo; “ Left double quote
&rdquo; ” Right double quote
&laquo; « Left angle quote
&raquo; » Right angle quote
&reg; ® Registered trademark
&trade; ™ Trademark
&copy; © Copyright
&cent; ¢ Cent
&pound; £ Pound
&euro; € Euro
&yen; ¥ Yen
&frac14; ¼ One quarter
&frac12; ½ One half
&frac34; ¾ Three quarters
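If you would rather not type these codes by hand, the substitution can be automated. The sketch below is one possible approach in Python, not a tool the article recommends; to_entities is a hypothetical helper name, and it relies on the standard library’s codepoint2name table, which covers the HTML 4 entity names and falls back to numeric references for everything else.

```python
from html.entities import codepoint2name


def to_entities(text: str) -> str:
    """Replace HTML's reserved characters and all non-ASCII characters
    with entity references (hypothetical helper, for illustration)."""
    out = []
    for ch in text:
        cp = ord(ch)
        if ch in "<>&" or cp > 127:
            name = codepoint2name.get(cp)
            # Prefer a named entity; fall back to a decimal numeric reference.
            out.append(f"&{name};" if name else f"&#{cp};")
        else:
            out.append(ch)
    return "".join(out)


converted = to_entities("“Fish & chips” cost £3½")
print(converted)
```

Characters with no name in that table, such as ★, come out as a numeric reference instead. Note that this sketch expects plain text as input: run over existing HTML, it would also escape the tags.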

For a more complete list of codes and their alternative entity numbers, check out W3Schools’ HTML Entities page.

We’ve all had adventures using special characters online. What pitfalls have you fallen prey to? And what advice can you give to help others avoid them?

  • Martin

    Nice Entity is also a nice resource for these: http://nice-entity.com/

    • http://www.eastdevonit.co.uk Dan Web Designer

      Great link, thanks!

  • Anonymous

    You should always be using UTF8 encoding these days. This will allow your website to be localised into languages other than English. It is not necessary to use special entities when you use UTF8 encoding.

    • http://www.itmitica.com/en IT Mitică

      If you’re referring to specific language chars, like Romanian ăîâşţĂÎÂŞŢ, then yes, it’s not necessary to use special entities with utf-8.

      I only need to install the keyboard for the language and learn the differences between what’s written on the keys and what’s actually being typed when switching among keyboard layouts.

      But for things like copyright, euro and especially for others like ampersand, I have to use special entities.

      • Anonymous

        Actually it’s perfectly fine to type (or copy and paste) copyright, euro and other symbols directly with a UTF8 encoding. If your document is HTML (not XHTML), it is also fine to use ampersands – as long as they are followed immediately by a space.

        You can also use all sorts of other UTF8 symbols if you desire. ☻ ☺ ★ ✿ ☮ ← ↑ → ↓ © ™ € ±x² ≤ ½  ► ♪ ♫ ◆ ♂ ♀

        So save yourself the time. Don’t worry about character entities (other than ). Just make sure your page is UTF8. ☺

        • http://www.itmitica.com/en IT Mitică

          Thanks for the advice.

          But I’ll stick with what I know from practice :)

          PS Those aren’t UTF-8 symbols. UTF-8 is multibyte character encoding for the actual Unicode character set.

          • http://blog.avangelistdesign.com Andy Parker

            The character encoding is a little moot, you shouldn’t be so naive to assume that UTF-8 will mean you can have it in any language either, that is a clear sign of somebody who has never had to build an app that is multilingual – the charset is the tip of the iceberg.

            Interestingly, your statement regarding font faces using the same encoding order highlights why it simply isn’t worth wasting time trying to create ‘pretty’ quotation marks or other grammatical elements.

            It’ll be interesting to see how new font services like font-deck and typekit work with these characters and whether they are more embellished.

          • http://www.itmitica.com/en IT Mitică

            “Are you talking to me?”

            I’m a little confused whether you’re directing the “naive” and “never multilingual” part towards me :)

            Again, Andy, from the top, as it seems you too are confusing the charset with the encoding used for it:

            – Unicode is the actual set of characters
            – In any charset, a char is represented by a number
            – A number is represented by a byte or by a number of bytes
            – Conclusion: a string of chars =  a byte stream
            – There is more than one way to interpret a byte stream
            – UTF-8 is just an algorithm used to decode the byte stream that’s being received
            – Hence the need to specify UTF-8
            – It all boils down to the font face you’re using: if it’s Unicode compatible, if it has the chars you’re aiming for etcetera
            – Otherwise, the results can be unpredictable

            Cheers

          • http://blog.avangelistdesign.com Andy Parker

            I wasn’t directing anything at you no, to the group.

            You will always need to use entities for symbols of course, you’re absolutely right in what you’re saying. I got caught up on the usage of quotation marks and started thinking of other grammatical entities that are let down by font rendering.

            With regard to the statements on UTF-8, I was highlighting that it isn’t a magic pill that suddenly means you have a site that can be multilingual, as there is far more that goes into it, and also that sometimes there is good reason to use region-specific ISO declarations.

            Not disagreeing with you, IT Mitică.

  • Joe

    Here’s another great resource for entity look up.

    http://leftlogic.com/projects/entity-lookup/

  • http://www.facebook.com/xzyfer Michael Mifsud

    Love your work guys. However I think you’d be doing the web development community a massive favour by not linking to w3school. Please consult http://w3fools.com/ for many reasons why we, as a community, should refrain from promoting w3school. Great work otherwise :)

  • seema

     nice….!

  • http://newevolutiondesigns.com Tom, NewEvolution

    Useful.

  • http://www.mactonweb.com web design california

    This is a great post. This is very interesting and helpful too. This will be very useful. And a nice blog. Thanks for this wonderful post.

  • John G. Moore

    I am converting openoffice.org text to html using Firefox. Firefox flags super-scripted characters, left and right apostrophes, and the dash. Can you recommend a converter that substitutes the correct code for these characters? Thank you.
    JGM

    • Anonymous

      find/replace?

      It’s what I’ve always used.
