Here is more clear:
It's MySQL trying to be confusing.
The difference lies in comparisions. If you did
CODE: SELECT ALL
SELECT username FROM users WHERE username < 'Cheese';
It might return different results based on the encoding. For example, utf8_unicode_ci changes the german "double-s" character (ß) into "ss" for comparison. But the bin variation uses the strings "as they are" with their binary value.
So a better example would be these two queries:
CODE: SELECT ALL
SELECT username FROM users WHERE username < 'ß';
SELECT username FROM users WHERE username < 'ss';
On the utf8_unicode_ci, they should give the same result. They should not give the same result in utf8_bin -- assuming that your colletion of usernames is large enough.
I think this does not means that utf8_unicode_ci is less accurate (well it is but it finds all matches that have to) when comparing. Then I read this:
The utf8_bin collation compares strings based purely on their Unicode codepoint values. If all of the codepoints have the same values, then the strings are equal. However, this falls apart when you have strings with different composition for combining marks (composed vs. decomposed) or characters that are canonically equivalent but don't have the same codepoint value. In some cases, using utf8_bin will result in strings not matching when you expect them to. Theoretically, utf8_bin is the fastest because no Unicode normalisation is applied to the strings, but it may not be what you want.
If I undestand it right utf8_unicode_ci will find even more matches than utf8_bin, with utf8_bin two usernames that look the same, will be different for the bin variation so they can coexist.
I'm gonna do some tests with utf8_unicode_ci, to see if it works correctly, and how slow it is.