[SIZE=“2”]I was doing some research on what the best way is to store UTF-8 text in MySQL database.
Some say that the best way to store UTF-8 text is in a binary field such as
BLOB instead of TEXT with collation set to UTF-8.
So is binary the perfect way to go or absolutely not?[/SIZE]
This might have been reasonable for very old mysql versions (4.0 and earlier) which had very limited character set support. When your host decided that the character set is latin1 then you had to live with it and using BLOB was a way to ensure UTF-8 strings didn’t get corrupted. Now those days are gone, set the proper charset on your column and use TEXT.
MySQL these days supports setting character encoding type on a per column, per table, per database, and even a default system wide. Most specific wins of course. If you are using a version of MySQL too old to allow this I’d strongly recommend upgrading.
BLOB is for binary data mostly. I can see using it to handle serialized data, but that shouldn’t come up often. I don’t even know if MySQL can attempt to do a text search on a blob field, but even if it can I imagine its far less efficient and far more error prone.
Don’t forget CHAR and VARCHAR. They have their place. And I believe there’s LONGTEXT in case you get the urge to store War & Peace on a single row
Sure it can, why not? A blob field is like a text field with no collation so it’s even simpler. This will work:
SELECT * FROM table WHERE blob_col LIKE '%beauty%'
The search will be binary of course - no collation rules, no case insensitivity. You can even do this:
SELECT * FROM table WHERE blob_col LIKE CONCAT('%', 0x0D4CD1092D, '%')
This will work, however you have to first properly escape wildcard characters (% and _) in the binary data before injecting into LIKE.
But this will work out of the box:
SELECT * FROM table WHERE blob_col = 0x0D4CD1092D
Of course, I’m not saying blob should be used for text. But if someone really needs it, mysql will search blobs without any problem if you handle it well.
If you are using a version of MySQL too old to allow this I’d strongly recommend upgrading.
Haha
I was always working with collation set instead of BLOB for TEXT.
But while working on a full UTF-8 project I started to doubt because even with good old books and searching the net, the information seem cluttered about the subject.
Thanks to you all it is pretty clear that binary is NOT the way to go for UTF-8 TEXT!