SitePoint Sponsor

User Tag List

Results 1 to 16 of 16
  1. #1
    SitePoint Wizard donboe's Avatar
    Join Date
    Jun 2010
    Location
    Netherlands
    Posts
    1,963
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)

    Right structure for multiple language DB

    I made a start setting up a DB structure for multiple language websites, but I have the feeling that I miss something and I can't put my finger on what it is. This are the tables I have so far:
    Code MySQL:
    CREATE TABLE IF NOT EXISTS `languages` (
      `language_id` TINYINT(2) NOT NULL AUTO_INCREMENT,
      `language` VARCHAR(12) DEFAULT NULL,
      `country_flag` CHAR(8) DEFAULT NULL,
      `isActive` TINYINT(1) NOT NULL DEFAULT '0',
      PRIMARY KEY  (`language_id`),
      UNIQUE KEY `language` (`language`)
    ) ENGINE=INNODB  DEFAULT CHARSET=utf8;
     
    CREATE TABLE IF NOT EXISTS `pages` (
      `page_id` TINYINT(2) NOT NULL AUTO_INCREMENT,
      `page_address` VARCHAR(128) DEFAULT NULL,
      PRIMARY KEY  (`page_id`)
    ) ENGINE=INNODB DEFAULT CHARSET=utf8;
     
    CREATE TABLE IF NOT EXISTS `content` (
      `content_id` TINYINT(2) NOT NULL AUTO_INCREMENT,
      `page_id` TINYINT(2) NOT NULL DEFAULT '0',
      `title` VARCHAR(40) DEFAULT NULL,
      `heading` VARCHAR(255) DEFAULT NULL,
      `content` TEXT,
      PRIMARY KEY  (`content_id`),
      KEY `page_id` (`page_id`)
    ) ENGINE=INNODB DEFAULT CHARSET=utf8 ;
    Those tables are self explaining I guess. table languages holds the different languages with the requested number of languages set to active. table pages are the actual pages in the website: about_us.html etc. table content holds the actual content for the different pages. And at this point I have a block! I think I need at least 1 if not more relational tables. Or should I add another field language_id to the content table? Can anyone tell me If I miss something here or that I'm on the right track
    “Good artists copy, great artists steal” (Pablo Picasso - 1934)

  2. #2
    SitePoint Guru
    Join Date
    Sep 2008
    Posts
    977
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    Is the language relative to the page or the content?

    I would suggest that it relates to the content and so language should be a 1:1 relationship between each content record in the db. that means a 'title' and 'content' written in French, will show in French whereas a 'title' and 'content' written in Greek, will show in Greek. 1:1

    in my CMS, I have these table structures

    Code:
    languages:
    | language_abbr | language_full | flag |
    | ENG   (pk)    | English       | eng.gif |
    | GRE   (pk)    | Greek         |  gre.gif |  
    file_sequencing
    | file_id | file_name_home_language |  sequence_number |
    
    *sequence_number determines the order they display in the nav menu
    ** home language is the home language of the business/user not the web viewer
    
    file_data
    | id | file_id | language_abbr | title | heading | content |

    I hope that gives some assistance

  3. #3
    SitePoint Wizard
    Join Date
    Dec 2003
    Location
    USA
    Posts
    2,582
    Mentioned
    29 Post(s)
    Tagged
    0 Thread(s)
    You may want to consider something more standard like PO language files. I personally don't know much about them, though it's on my to do list.

    If you use something more standard, you can find more support (whereas if you make something proprietary, it's on you).

  4. #4
    SitePoint Wizard bronze trophy
    Join Date
    Jul 2006
    Location
    Augusta, Georgia, United States
    Posts
    4,048
    Mentioned
    16 Post(s)
    Tagged
    3 Thread(s)
    content_translations
    - content_id ( fk content(content_id) )
    - language_id ( fk languages(language_id) )
    - title
    - heading
    - content
    - pk(content_id,language_id)

    You may also want to add language_id to the content table to determine the contents origin language.
    The only code I hate more than my own is everyone else's.

  5. #5
    SQL Consultant gold trophysilver trophybronze trophy
    r937's Avatar
    Join Date
    Jul 2002
    Location
    Toronto, Canada
    Posts
    39,016
    Mentioned
    53 Post(s)
    Tagged
    2 Thread(s)
    bazz, good explanation

    by the way, it's one-to-one only as seen from the content side -- from the language side, it's many-to-one, as there can (hopefully) be more than one content for each language

    i would use the international 2-char language code for the language PK, but the 3-char one is okay too, i guess

    but please, not a auto_increment id, okay oddz?
    r937.com | rudy.ca | Buy my SitePoint book: Simply SQL
    "giving out my real stuffs"

  6. #6
    SitePoint Wizard donboe's Avatar
    Join Date
    Jun 2010
    Location
    Netherlands
    Posts
    1,963
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    Good explanation indeed IBazz. This helps me a lot indeed. One question though! Is the a particular reason why you use the language_abbr as PK instead of a number?

    @Rudy: can you explain what you mean with this:
    i would use the international 2-char language code for the language PK, but the 3-char one is okay too, i guess
    The auto_increment was indeed the thing that made me confused I guess
    “Good artists copy, great artists steal” (Pablo Picasso - 1934)

  7. #7
    SQL Consultant gold trophysilver trophybronze trophy
    r937's Avatar
    Join Date
    Jul 2002
    Location
    Toronto, Canada
    Posts
    39,016
    Mentioned
    53 Post(s)
    Tagged
    2 Thread(s)
    Quote Originally Posted by donboe View Post
    @Rudy: can you explain what you mean with this:
    this --> ISO 3166-1 alpha-2

    the reason you would use it instead of a number is because it's so much better than a number!!!!!
    r937.com | rudy.ca | Buy my SitePoint book: Simply SQL
    "giving out my real stuffs"

  8. #8
    SitePoint Guru
    Join Date
    Sep 2008
    Posts
    977
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    Is the a particular reason why you use the language_abbr as PK instead of a number?
    in simple terms it's because its unique to a language. (OK, so there is a variety of English such as US English and International english but, fundamentally, its all English). and if there is a natural PK in your table content, there is little benefit in using a surrogate key.

    However, on a technical point, I am not sure if (in other circumstances), a huge PK value would be as efficient in terms of bytes, as an auto_increment number. Personally, I value readability quite highly when moving from table to table as part of the development process.

    bazz

  9. #9
    Non-Member
    Join Date
    Apr 2004
    Location
    Miami, FL, USA
    Posts
    449
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    ISO 3166 are country codes, not language codes....
    ISO 639 are language codes

    BUT

    Having built multiple multi-lingual databases, I would suggest you use BOTH... and establish a LOCALE table instead of a LANGUAGE table.

    Locale is the combination of language and country, e.g. en_US, or en_GB. It gives you more ability to localize your application not only on language, but currency, number formatting, metric-vs-kings measurements, etc.

    I typically structure my ML DB as such:

    Code:
    -- Fill this with all ISO-639 languages
    CREATE TABLE language (
        id CHAR(2) NOT NULL PRIMARY KEY, 
        name VARCHAR(255) NOT NULL
    ) TYPE=InnoDB;
    
    -- Fill this with all ISO-3166 Countries
    CREATE TABLE country (
        id CHAR(2) NOT NULL PRIMARY KEY,
        name VARCHAR(255) NOT NULL
    ) TYPE=InnoDB;
    
    CREATE TABLE locale (
        language_id CHAR(2) NOT NULL REFERENCES language(id),
        country_id CHAR(2) NULL REFERENCES country(id),
        name VARCHAR(255),
        active TINYINT NULL
        PRIMARY KEY (language_id, country_id)
    ) TYPE=InnoDB;
    You can either prefill your entire locale table and use an active/inactive flag (as shown) or only fill it with locales you support.

    Country should be nullable so you can create a default locale with no country (e.g. "en" or "es"). Typically you should have your application use a particular country's locale act as the default. This can be achieved by adding a "default" flag to your country table and making "country_id" not null... then any non-country-specific locale or unsupported country's locale would fall back to the default country's locale.

  10. #10
    Non-Member
    Join Date
    Apr 2004
    Location
    Miami, FL, USA
    Posts
    449
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    Extending my above example and applying to your DB:

    Languages: http://www.loc.gov/standards/iso639-2/php/code_list.php
    Countries: http://en.wikipedia.org/wiki/ISO_3166-1_alpha-2
    Locales: http://www.roseindia.net/tutorials/I...les-list.shtml

    Code:
    -- Fill this with all ISO-639-1 languages
    CREATE TABLE `languages` (
        `language_id` CHAR(2) NOT NULL PRIMARY KEY, 
        `name` VARCHAR(255) NOT NULL
    ) ENGINE=INNODB DEFAULT CHARSET=utf8;
    
    -- Fill this with all ISO-3166 Countries
    CREATE TABLE `countries` (
        `country_id` CHAR(2) NOT NULL PRIMARY KEY,
        `name` VARCHAR(255) NOT NULL,
        `default` TINYINT NOT NULL DEFAULT 0
    ) ENGINE=INNODB DEFAULT CHARSET=utf8;
    
    -- Fill this table with supported locales
    CREATE TABLE `locales` (
        `locale_id` VARCHAR(5) NOT NULL PRIMARY KEY, -- This would be like 'en_US', etc.
        `language_id` CHAR(2) NOT NULL REFERENCES `languages`(`language_id`),
        `country_id` CHAR(2) NULL REFERENCES `countries`(`country_id`),
        `name` VARCHAR(255),
        `active` TINYINT NOT NULL DEFAULT 1
        PRIMARY KEY (`language_id`, `country_id`)
    ) ENGINE=INNODB DEFAULT CHARSET=utf8;
    
    CREATE TABLE IF NOT EXISTS `pages` (
      `page_id` INT UNSIGNED NOT NULL AUTO_INCREMENT,
      `page_address` VARCHAR(128) DEFAULT NULL,
      PRIMARY KEY  (`page_id`)
    ) ENGINE=INNODB DEFAULT CHARSET=utf8;
     
    CREATE TABLE IF NOT EXISTS `page_contents` (
      `page_id` INT UNSIGNED NOT NULL REFERENCES pages(page_id),
      `locale_id` VARCHAR UNSIGNED NOT NULL REFERENCES locales(locale_id),
      `title` VARCHAR(40) DEFAULT NULL,
      `heading` VARCHAR(255) DEFAULT NULL,
      `content` TEXT,
      PRIMARY KEY  (`page_id`, `locale_id`)
    ) ENGINE=INNODB DEFAULT CHARSET=utf8 ;

  11. #11
    SitePoint Wizard bronze trophy
    Join Date
    Jul 2006
    Location
    Augusta, Georgia, United States
    Posts
    4,048
    Mentioned
    16 Post(s)
    Tagged
    3 Thread(s)
    Quote Originally Posted by r937
    bazz, good explanation

    by the way, it's one-to-one only as seen from the content side -- from the language side, it's many-to-one, as there can (hopefully) be more than one content for each language

    i would use the international 2-char language code for the language PK, but the 3-char one is okay too, i guess

    but please, not a auto_increment id, okay oddz?
    Yeah, i would agree with that. On the application side of things two or three alphabetic characters isn't going to be an issue passing via the URL if its even needed.

    transio, that new design doesn't support translations. How would you determine the available translations for an article that originated in English?
    The only code I hate more than my own is everyone else's.

  12. #12
    Non-Member
    Join Date
    Apr 2004
    Location
    Miami, FL, USA
    Posts
    449
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    oddz,

    That design most certainly DOES support translations. Is the originating language important?

    the page_contents table contains the contents of all pages in multiple languages. The page_id and locale_id combination define which page and what language the contents are in.

    Cheers.

  13. #13
    SitePoint Guru
    Join Date
    Sep 2008
    Posts
    977
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    Well, I can't profess on the issue but, I wonder if locales is a bit 'PC gone too far'. Surely its about communication? so if the different dialects don't cause a breakage in communication then the does the adage 'if it ain't broke, don't fix it', apply?

    Maybe it's useful for multi-national entities but the cost of translation is quite significant for smaller businesses. I digress ops:

  14. #14
    Non-Member
    Join Date
    Apr 2004
    Location
    Miami, FL, USA
    Posts
    449
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    Locale implies more than just language, IBazz... For example, en_US, en_GB, en_CA, en_ZA, en_AU, en_NZ - all imply "English", and can fall back on the default "English" dialect you choose for your site... but which ones use mm/dd/yyyy vs. dd/mm/yyyy? Which use meters and kilos vs. feet and pounds? These are important "locale" settings that have nothing to do with language. If your mission is to localize your website, why restrict your DB to language when it's not much more work to account for full locale, and allow future localization of other non-linguistic information?

    Don't think that's important? If you print your date for an upcoming event as 04/05/2011 - half your audience may believe it's April 5th, the other half May 4th... think that's not an important miscommunication? Think again.

  15. #15
    SQL Consultant gold trophysilver trophybronze trophy
    r937's Avatar
    Join Date
    Jul 2002
    Location
    Toronto, Canada
    Posts
    39,016
    Mentioned
    53 Post(s)
    Tagged
    2 Thread(s)
    Quote Originally Posted by transio View Post
    If you print your date for an upcoming event as 04/05/2011 ...
    patient: doctor, it hurts when i do this

    doctor: well, don't do that then

    no confusion anywhere if you print 2011-04-05

    but your point about locales and other attributes besides language is very well made
    r937.com | rudy.ca | Buy my SitePoint book: Simply SQL
    "giving out my real stuffs"

  16. #16
    Non-Member
    Join Date
    Apr 2004
    Location
    Miami, FL, USA
    Posts
    449
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    Quote Originally Posted by r937 View Post
    no confusion anywhere if you print 2011-04-05
    Heh... "equally confusing EVERYONE" doesn't equate to "no confusion anywhere". I've used dates in YYYY-MM-DD format in websites before... the specific case I'm thinking of was a form with a jQuery datepicker that filled the field with a Y-m-d date. I never saw so much negative user feedback in my life!!! People called us at least every other day confused about the dates!!! Don't assume people are smart... that's a sure way to develop poor usability apps.


Bookmarks

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •