  1. #1
    SitePoint Evangelist
    Join Date
    Mar 2011
    Location
    Bellingham, WA
    Posts
    450
    Mentioned
    1 Post(s)
    Tagged
    0 Thread(s)

    Why would I ever store &Eacute; in my database?

    Hello!

    Why would anyone want to store the HTML entity version of É (that is, &Eacute;) in their database these days? If I understand things correctly, as long as my data is UTF-8 encoded both in my database and in how I retrieve it, and I specify UTF-8 as the charset in my meta tag:

    Code:
    <meta charset="UTF-8" />
    then the É will be both correctly saved and then rendered back in the browser. It seems to me, then, that the HTML entity just takes up extra space in my database.
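To put a rough number on the "extra space" point, here is a minimal sketch in Python (chosen only for illustration; the thread itself does not use it). It compares the byte length of the raw character with that of its named entity, assuming both are stored as UTF-8:

```python
# Compare the storage cost of a raw UTF-8 character with its HTML entity.
raw = "É"
entity = "&Eacute;"

raw_bytes = len(raw.encode("utf-8"))        # É is U+00C9, two bytes in UTF-8
entity_bytes = len(entity.encode("utf-8"))  # eight ASCII characters, one byte each

print(raw_bytes, entity_bytes)
```

Under those assumptions the entity costs four times as many bytes as the character it stands for.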

    Is there any downside to not encoding the characters?

    Thank you,

    Eric

  2. #2
    SitePoint Guru bronze trophy
    Join Date
    Dec 2003
    Location
    Poland
    Posts
    930
    Mentioned
    7 Post(s)
    Tagged
    0 Thread(s)
    Storing entities in the database made sense back in the days when the choice of character sets was limited and Unicode could not be used. If the web site's encoding was, for example, ISO-8859-1, then the browser sent any characters outside that character set as entities.
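The browser behaviour described here can be approximated in Python, as a sketch only. The helper name `encode_like_legacy_form` is made up for this example, and the standard `xmlcharrefreplace` error handler is used as a stand-in for what a browser does when a form on an ISO-8859-1 page contains characters outside that charset:

```python
# Emulate a form submitted from an ISO-8859-1 page: characters the charset
# cannot represent are replaced by numeric character references.
def encode_like_legacy_form(text: str) -> bytes:
    return text.encode("iso-8859-1", errors="xmlcharrefreplace")

# Ł and ź are not in ISO-8859-1; ó and é are.
submitted = encode_like_legacy_form("Łódź, café")
print(submitted)
```

The characters that fit the charset arrive as single bytes, while the others arrive as entities, which is how entities ended up in databases in the first place.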

    Now that you can use Unicode, I don't see any advantage to storing entities. Storing them may create unnecessary problems for yourself: one day you may want to use the data for some purpose other than sending it to a web browser, for example to put it in a plain-text email or in a PHP-generated Word, PDF, or spreadsheet document. Then you would need to convert the entities back to their original characters. With a Unicode character set you can use the data straight away.
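The conversion step mentioned above might look like the following sketch, written in Python rather than PHP purely for illustration. The sample string is invented; `html.unescape` is the standard-library function that turns entities back into characters:

```python
import html

# Text that was stored with entities has to be decoded before it can be
# used anywhere other than an HTML page (email body, PDF, spreadsheet...).
stored_with_entities = "Caf&eacute; &Eacute;toile"
plain_text = html.unescape(stored_with_entities)

print(plain_text)
```

If the database held the real characters, this extra pass would not be needed at all.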

  3. #3
    SitePoint Evangelist
    Join Date
    Mar 2011
    Location
    Bellingham, WA
    Posts
    450
    Mentioned
    1 Post(s)
    Tagged
    0 Thread(s)
    Thank you for putting my mind to ease and for your explanation.

    -Eric

  4. #4
    SitePoint Mentor silver trophy bronze trophy
    Mikl
    Join Date
    Dec 2011
    Location
    Edinburgh, Scotland
    Posts
    1,565
    Mentioned
    63 Post(s)
    Tagged
    0 Thread(s)
    Keep in mind too that the purpose of an HTML entity - or any HTML, for that matter - is to render your text, that is, to show a visual representation of it.

    That's not the concern of a database. The database's role is to store data as efficiently as possible - and to make it easy to update and to retrieve the data. It's perfectly possible to store, say, an accented letter in 8 or 16 bits. Provided you have an agreed coding system, you can easily translate between the stored characters and the HTML entity at the time you want to display it.

    In fact, this is true even without Unicode. It's true that Unicode lets you store a much greater range of characters, but, if that's not a requirement, you can happily use an 8-bit code.
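As a sketch of "store compactly, translate at display time", the following Python example keeps the text in an 8-bit code (ISO-8859-1 is assumed here as the agreed coding system) and produces named entities only when rendering. The function name `to_entities` is hypothetical; `codepoint2name` is the standard-library table of HTML 4 entity names:

```python
from html.entities import codepoint2name

def to_entities(text: str) -> str:
    """Replace each non-ASCII character that has an HTML 4 named entity."""
    out = []
    for ch in text:
        name = codepoint2name.get(ord(ch))
        # Leave plain ASCII alone, including markup characters such as & and <.
        out.append(f"&{name};" if name and ord(ch) > 127 else ch)
    return "".join(out)

stored = "École".encode("iso-8859-1")   # one byte per character in an 8-bit code
shown = to_entities(stored.decode("iso-8859-1"))
print(len(stored), shown)
```

The stored form stays at five bytes for five characters, and the entity exists only in the output sent to the browser.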

    Mike

  5. #5
    SitePoint Evangelist
    Join Date
    Mar 2011
    Location
    Bellingham, WA
    Posts
    450
    Mentioned
    1 Post(s)
    Tagged
    0 Thread(s)
    Thank you for the additional input. If at some point I'll need something more exotic such as Chinese, Unicode will still suffice, correct?

  6. #6
    SitePoint Guru bronze trophy
    Join Date
    Dec 2003
    Location
    Poland
    Posts
    930
    Mentioned
    7 Post(s)
    Tagged
    0 Thread(s)
    Quote: Originally posted by Mikl
    It's perfectly possible to store, say, an accented letter in 8 or 16 bits. Provided you have an agreed coding system, you can easily translate between the stored characters and the HTML entity at the time you want to display it.
    While this is true, it's important to bear in mind that encoding characters into entities (or anything else) means losing the database's support for the character set in use. This depends on the database, but MySQL has extensive support for many character sets, so if you use the proper character set (or Unicode) the database can easily do things like sorting alphabetically, changing character case, converting to other character sets, searching case-insensitively, or even performing relaxed searches where looking for the letter E also finds accented versions of that letter. Additionally, MySQL's text string functions will not work properly on entity-encoded strings. You may never need those features, but it's good to keep this in mind.
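The effect can be illustrated outside the database with a small Python sketch. This is not MySQL: the hypothetical `fold` helper only approximates an accent-insensitive, case-insensitive collation by stripping combining marks, and the word lists are invented. It shows sorting, relaxed searching, and case conversion going wrong once the text is entity-encoded:

```python
import unicodedata

def fold(text: str) -> str:
    """Drop accents and case, a rough stand-in for an accent-insensitive collation."""
    decomposed = unicodedata.normalize("NFD", text)
    return "".join(c for c in decomposed if not unicodedata.combining(c)).casefold()

real = ["Zebra", "École", "Apple"]
encoded = ["Zebra", "&Eacute;cole", "Apple"]

# Alphabetical sorting: the entity sorts by its leading "&", not by "E".
sorted_real = sorted(real, key=fold)
sorted_encoded = sorted(encoded, key=fold)

# Relaxed search for words starting with E, accented or not.
found_real = [w for w in real if fold(w).startswith("e")]
found_encoded = [w for w in encoded if fold(w).startswith("e")]

# Case conversion mangles the entity name itself.
broken_upper = "&eacute;cole".upper()

print(sorted_real, sorted_encoded, found_real, found_encoded, broken_upper)
```

With real characters the word sorts between Apple and Zebra and is found by the search; with the entity it sorts first, is not found, and uppercasing produces a string that is no longer a valid entity.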

    Quote: Originally posted by kreut
    Thank you for the additional input. If at some point I'll need something more exotic such as Chinese, Unicode will still suffice, correct?
    Yes, I don't think Chinese is very exotic for Unicode.
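A quick way to convince yourself, sketched in Python with an arbitrary two-character sample: Chinese text round-trips through UTF-8 like any other text, it just uses three bytes per character for these code points.

```python
# Encode two Chinese characters as UTF-8 and decode them again.
text = "中文"
encoded = text.encode("utf-8")
round_tripped = encoded.decode("utf-8")

print(len(text), len(encoded), round_tripped == text)
```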

  7. #7
    SitePoint Evangelist
    Join Date
    Mar 2011
    Location
    Bellingham, WA
    Posts
    450
    Mentioned
    1 Post(s)
    Tagged
    0 Thread(s)
    Heh heh.

    Enjoy both of your weekends!

