Thread: case sensitive unicode table?

  1. #1
    SitePoint Wizard
    Join Date
    Apr 2002
    Posts
    2,252
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)

    case sensitive unicode table?

    phpMyAdmin - 2.8.2.4, and MySQL - 5.0.51a

    In phpMyAdmin, when setting up a new table, the collation menu lists a number of options, including utf8_unicode_ci, but there's no case-sensitive version to choose. How can I get a case-sensitive Unicode collation for a table?

    thanks.

  2. #2
    longneck, SitePoint Award Recipient
    Join Date
    Feb 2004
    Location
    Tampa, FL (US)
    Posts
    9,854
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    Unicode doesn't have case sensitivity. You'll have to choose a utf8 collation other than the utf8_unicode ones if you want case sensitivity.
    Check out our new Industry News forum!
    Keep up-to-date with the latest SP news in the Community Crier

    I edit the SitePoint Podcast

  3. #3
    SitePoint Wizard
    Join Date
    Apr 2002
    Posts
    2,252
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    right, i see, thanks.

  4. #4
    SitePoint Wizard
    Join Date
    Apr 2002
    Posts
    2,252
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    A lot of the character sets have just ci and bin options. Does bin amount to the same thing as case sensitive, by any chance? There's a utf8_bin. I know bin is short for binary, so I guess it compares the data in a raw kind of way, which would give case sensitivity, right?

  5. #5
    SitePoint Wizard
    Join Date
    Apr 2002
    Posts
    2,252
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    Yup, I just tested that and utf8_bin does give case sensitivity. Any reason I shouldn't use that? Thanks.
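
    A quick way to repeat that kind of test without creating a table is to compare two literals under each collation. This is only a sketch: it assumes a MySQL server with the utf8 character set available (as in the 5.0 release mentioned above), and the column aliases are made up for illustration.

    ```sql
    -- The _utf8 introducer makes the literals utf8 regardless of the
    -- connection character set, so the COLLATE clauses are valid.
    SELECT
      _utf8'abc' COLLATE utf8_bin        = _utf8'ABC' AS bin_equal,
      _utf8'abc' COLLATE utf8_unicode_ci = _utf8'ABC' AS unicode_ci_equal;
    -- bin_equal is 0 (the strings differ byte for byte),
    -- unicode_ci_equal is 1 (case is ignored).
    ```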

  6. #6
    longneck, SitePoint Award Recipient
    Join Date
    Feb 2004
    Location
    Tampa, FL (US)
    Posts
    9,854
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    Unicode is designed to support all characters from all languages. Different languages might use the same character, and those languages might have different rules for case sensitivity. Therefore, there is no (and can be no) such thing as utf8_unicode_cs.
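
    One way to look at language-specific case rules in MySQL itself is to compare the same pair of letters under a general collation and a language-specific one. The sketch below assumes the server offers utf8_turkish_ci alongside utf8_unicode_ci, and it uses hex literals so the result does not depend on the client's encoding. The aliases are hypothetical, and no results are shown because they depend on the collation tables the server ships.

    ```sql
    -- 0x49 is 'I'; 0xC4B1 is the UTF-8 encoding of U+0131 (dotless i).
    -- Turkish treats dotted and dotless i as separate letters, so the
    -- two collations may disagree on whether this pair is equal.
    SELECT
      _utf8 0x49 COLLATE utf8_unicode_ci = _utf8 0xC4B1 AS unicode_ci_equal,
      _utf8 0x49 COLLATE utf8_turkish_ci = _utf8 0xC4B1 AS turkish_ci_equal;
    ```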

  7. #7
    SitePoint Wizard
    Join Date
    Apr 2002
    Posts
    2,252
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    I'd have thought it would be harder to have a case-insensitive collation than a case-sensitive one. Sensitive is just literal: no conversion, as is. I don't know, I'm sure you're right but it makes no sense to me. utf8_bin seems to be giving me what I want though: a unique column containing both aaa and AAA, for example, is fine with that collation. Thanks.
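
    For anyone wanting to reproduce the unique-column behaviour, a minimal sketch might look like this. The table, column, and index names (demo_bin, name, uniq_name) are hypothetical; only the utf8 character set and the utf8_bin collation come from the thread.

    ```sql
    -- Table whose columns default to utf8 with the binary collation.
    CREATE TABLE demo_bin (
      name VARCHAR(50) NOT NULL,
      UNIQUE KEY uniq_name (name)
    ) DEFAULT CHARSET=utf8 COLLATE=utf8_bin;

    INSERT INTO demo_bin (name) VALUES ('aaa');
    INSERT INTO demo_bin (name) VALUES ('AAA');  -- accepted: differs byte for byte from 'aaa'
    INSERT INTO demo_bin (name) VALUES ('aaa');  -- rejected: duplicate entry for the unique key
    ```

    The collation can also be set on a single column instead of the whole table, which leaves other text columns case insensitive.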

  8. #8
    SitePoint Wizard
    Join Date
    Apr 2002
    Posts
    2,252
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    Sorry, I've got to bring this up again because it's niggling me. I'd like to scrub my above "I don't know, I'm sure you're right but...". I'm not at all sure. In fact, I reckon you're wrong!

    Surely "different languages might use the same character, and those languages might have different rules for case sensitivity" should be:

    "different languages might use the same character, and those languages might have different rules for case insensitivity"?

    Or am I getting mixed up on the meaning of sensitive and insensitive? (No, I didn't think so, but I wanted to be absolutely sure, so I tested it: in a utf8_unicode_ci, case-insensitive, table with a primary key column, I put 'abc' in, then attempted 'ABC' and got a duplicate key error. So it's using English-language rules to know that abc, compared case-insensitively, is the same as ABC.)
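
    That test can be sketched roughly as follows. The table and column names (demo_ci, code) are hypothetical; the collation is the one named above.

    ```sql
    -- Primary key on a column that uses the case-insensitive collation.
    CREATE TABLE demo_ci (
      code VARCHAR(20) NOT NULL,
      PRIMARY KEY (code)
    ) DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci;

    INSERT INTO demo_ci (code) VALUES ('abc');
    INSERT INTO demo_ci (code) VALUES ('ABC');  -- fails with a duplicate-key error (1062): equal to 'abc' under this collation
    ```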

    The rules you're talking about are like the ones that map a to A in English, for example. You only need those for a case-insensitive collation. I'm looking for a case-sensitive one, where no such mappings or rules are required. All the Unicode collations are ci, case insensitive (apart from the bin one): they map between upper and lowercase to make the same value out of abc and ABC, and that mapping must use the rules. As I say, the bin version does give case-sensitive comparison for Unicode text, presumably because it's assessing it as raw data, not using any language rules.

    Surely the rules for different languages are irrelevant for case-sensitive collations? What language rules do you need to work out that abc is the same as abc but is not the same as anything else? No language rules are needed for a case-sensitive collation. Right?
