SitePoint Sponsor

User Tag List

Results 1 to 2 of 2
  1. #1
    Programming Team silver trophybronze trophy
    Mittineague's Avatar
    Join Date
    Jul 2005
    Location
    West Springfield, Massachusetts
    Posts
    17,191
    Mentioned
    191 Post(s)
    Tagged
    2 Thread(s)

    changing charset and collation in a populated MySQL database

    I've never had to do it so I don't know. But this thread characters appearing in database values made me wonder, is there a "safe" or "minimal work involved" way to change the character encoding and collation from the MySQL defaults to UTF-8 without making a total mess of things?

  2. #2
    SitePoint Guru bronze trophy
    Join Date
    Dec 2003
    Location
    Poland
    Posts
    930
    Mentioned
    7 Post(s)
    Tagged
    0 Thread(s)
    Quote Originally Posted by Mittineague View Post
    I've never had to do it so I don't know. But this thread characters appearing in database values made me wonder, is there a "safe" or "minimal work involved" way to change the character encoding and collation from the MySQL defaults to UTF-8 without making a total mess of things?
    I think you should specify what "MySQL defaults" you are referring to because my MySQL installation has utf8 as default - but I suppose you mean latin1 .

    The simplest answer is - just change the collation of your text columns to utf8 using ALTER TABLE and all data will be converted without any loss of information (because utf8 can represent all latin characters). The only nuisance can be natural primary key columns that you are using as foreign keys - if you have cases like that then you'll have to disable foreign key checks before changing their collation because the collation of columns joined with a FK must be identical and normally MySQL will not let you change one first and then the other.

    That's all there is to it as far as converting the data is concerned.

    Another thing is the character set used by your application. It is advisable that your application explicitly sets the character set of database connection instead of relying on server defaults, which is done using SET NAMES, $mysqli->set_charset(), and the like. This is the character set in which your application will receive and send data regardless of the character set of the database. So these are two separate things and actually you can change the character set of your database tables and your application will be unaffected by that change as long as both character sets can represent the characters that you are using.

    If I were in your shoes I would split the task into 2 steps:

    1. Adjust your application to use utf8 and to send/receive data in utf8 - do things like SET NAMES('utf8'), or set_charset('utf8') on your db driver, etc. And all other things like sending web pages in utf8, processing data in utf8, etc. The db can remain in latin1 while your application will work in utf8 - all will be well as long as you stay within the range of latin1 characters.

    2. Change all text columns to a utf8 collation. Once you do this you will be able to use the full set of unicode characters.


Bookmarks

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •