  1. #1
    SitePoint Enthusiast amit1101's Avatar
    Join Date
    May 2003
    Location
    London
    Posts
    31
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)

    Multilingual Site with CMS?!

    Hi All,

    I need some advice. I've been given the task of creating a multilingual site with a content management system. The following languages need to be supported:

    English, Spanish, Standard Arabic, Romanian, Ecuadorian (possibly the same as Spanish, yet to be confirmed), German, French and Italian.

    I was thinking of saving one language (English) in a database table and then using a translation service to change the language on the fly. Having read through the forums, though, this doesn't sound like a good option: the languages are so different that a literal translation is not always correct.

    Another option would be to have a separate table for each language and allow each section to be altered independently, but this causes redundancy of data.

    How would I begin creating such a site?

    The CMS is going to be fairly straightforward, with limited functionality: uploading an image or two and possibly making text bold, underlining it, or adding links. The CMS interface can be in English, but it should allow the user to write in all of the languages above.

    Is it the case that each language then becomes its own mini site?

    How would I allow a user to type Arabic into the CMS? This language is written and read right to left!

    Would MySQL store all of these languages without any problems?

    How much time should I allow for building this, given that my level of PHP is average?

    I'd be extremely grateful for some advice as to where to begin and whether I'm on the right track. This is the first multilingual website I have ever had to make.

    Thanks

    A

  2. #2
    SitePoint Member
    Join Date
    Nov 2006
    Posts
    2
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)

    Try Joomla

    Lately I have been working with Joomla. There are already quite a few translations for it. There is even a module that tells you which content has not been translated yet, and when an article is not translated it displays the article in the main language.

  3. #3
    SitePoint Enthusiast amit1101's Avatar
    Join Date
    May 2003
    Location
    London
    Posts
    31
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    Quote Originally Posted by lumard
    Lately I have been working with Joomla. There are already quite a few translations for it. There is even a module that tells you which content has not been translated yet, and when an article is not translated it displays the article in the main language.
    Thanks for the reply. I have had a look at Joomla, and I will try to use some of it as a starting point, but it still does not answer the questions I've posted above. I'm looking for a bit more strategy and direction rather than an out-of-the-box solution.

  4. #4
    Worship the Krome kromey's Avatar
    Join Date
    Sep 2006
    Location
    Fairbanks, AK
    Posts
    1,621
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    Internationalizing a site with more-or-less static content is (so far as I18N can be) trivial: you just have to translate every string that's displayed and store each language in its own separate file or database table. When your content is dynamic (as all CMSs naturally are), it becomes obscenely difficult.

    Basically each entry in your CMS (let's just call them articles) would either need to be translated into each language, or else you'd essentially have to fork your CMS into effectively separate sites divided by language. The latter is the most common approach, since it becomes a huge burden to translate every single article into each language, and that's on top of making sure that every change to the site itself is reflected in each language file/table.

    I'm heading in that direction myself, so I'll share with you the (still under development) strategy I've adopted (note that I differentiate between text of the site itself, such as headers and footers and navigation, and the text of user-submitted articles):

    The pages will, instead of outputting hard-coded strings, echo the values of variables. These will be pulled from a database which will have one table per language, each table holding every string, identified by some mnemonic identifier (e.g. the copyright statement in the footer might be identified as "copyright"). This makes the code somewhat easier to maintain than using identifiers like 176. The language of the identifiers is irrelevant (I'd recommend using whatever language you code in), as they'll never be seen by anyone who isn't looking at the source code.
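
    A minimal sketch of that string lookup, written in Python with an in-memory SQLite database so it runs anywhere; the same queries port directly to PHP and MySQL. Everything named here (`ui_strings`, `msg_key`, `string_text`, `t`, `DEFAULT_LANG`) is hypothetical. Two deliberate departures from the plan above: it uses a single table with a `lang` column rather than one table per language, and it assumes a fallback to a default language when a string is missing.

```python
import sqlite3

DEFAULT_LANG = "en"  # assumption: the site has one default language


def open_demo_db():
    """Create an in-memory database holding a few sample interface strings."""
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE ui_strings ("
        " lang TEXT NOT NULL,"
        " msg_key TEXT NOT NULL,"
        " string_text TEXT NOT NULL,"
        " PRIMARY KEY (lang, msg_key))"
    )
    db.executemany(
        "INSERT INTO ui_strings (lang, msg_key, string_text) VALUES (?, ?, ?)",
        [
            ("en", "copyright", "All rights reserved."),
            ("en", "nav_home", "Home"),
            ("fr", "copyright", "Tous droits réservés."),
        ],
    )
    return db


def t(db, lang, msg_key):
    """Return the string for msg_key in lang, else in DEFAULT_LANG, else the key."""
    for candidate in (lang, DEFAULT_LANG):
        row = db.execute(
            "SELECT string_text FROM ui_strings WHERE lang = ? AND msg_key = ?",
            (candidate, msg_key),
        ).fetchone()
        if row is not None:
            return row[0]
    # Returning the key itself makes untranslated strings easy to spot on the page.
    return msg_key
```

    A template would then call something like `t(db, "fr", "copyright")` wherever it used to hold a hard-coded string.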

    The tricky part is user-submitted content. My plan is to allow users to translate their own content into whatever language(s) they can/they want to. Then users browsing the site will see an article in their preferred language, a default language, or the article's original language (chosen in that order). I'm toying with the idea of allowing users to create an ordered list of preferred languages (e.g. mine might look something like "English, French, German, Spanish"), defaulting to the article's original language if it doesn't exist in the list of preferred languages (e.g. I'd see a Russian article in Russian if it were translated into only Japanese and Klingon, none of which are in my preferred list).
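
    The selection order described above can be sketched as a small function (Python here for brevity; it is plain list and set logic, so it translates to PHP line for line). The function and parameter names are hypothetical, and it assumes the article always exists in its original language.

```python
def pick_language(available, preferred, site_default, original):
    """Choose which language version of an article to show.

    available    -- set of language codes the article exists in
    preferred    -- the reader's language codes, most wanted first
    site_default -- the site's default language code
    original     -- the language the article was written in
    """
    # 1. The first of the reader's preferred languages that has a translation.
    for lang in preferred:
        if lang in available:
            return lang
    # 2. Otherwise the site's default language, if the article exists in it.
    if site_default in available:
        return site_default
    # 3. Otherwise the language the article was originally written in.
    return original
```

    With a Russian article translated only into Japanese and Klingon, `pick_language({"ru", "ja", "tlh"}, ["en", "fr", "de", "es"], "en", "ru")` falls through both earlier steps and returns `"ru"`.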

    I haven't decided yet if I'm going to distinguish between e.g. American English and British English, as speakers of either can generally read articles written in the other; I'm in fact leaning towards not making that differentiation, although that might bite me in the butt later.

    I don't want to limit my users to only languages that I've translated the site into (i.e. I'll let users create Russian translations of the articles even if the site itself only has English and French versions), and neither do I want to force them to translate their articles.

    Hope this helps you at least somewhat. Be aware that you're tackling a huge project here, and you're in for more work than you likely realize.

  5. #5
    SitePoint Enthusiast amit1101's Avatar
    Join Date
    May 2003
    Location
    London
    Posts
    31
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    Hi kromey,

    Thanks for the comments, that was more what I was looking for! With regard to the translations, will you merely be converting one language into another on the fly? What I'm trying to get at is this: if you were to translate a bit of text from English into any other language, what method are you going to adopt to ensure that the translation is correct in the context of that language? For example, "what is your name" in Spanish could be translated as "¿Cómo se llama usted?" in a formal setting or "¿Cómo te llamas?" in an informal setting, whereas with Babel Fish or similar the translation I get is "cuál es su nombre".

    So how are you going to know which one to use? Hence the idea of having the administrator enter each language independently, but I totally understand the effort this will take to maintain.

    My biggest concern is the Arabic. I don't know much about the language and I foresee that testing would be an absolute nightmare! Would MySQL be able to hold Arabic characters?

    How long do you think a project like this would take to implement? I will definitely be using PHP and MySQL to code this.

    You are absolutely correct with regard to the scale of this project, and it may be the case that I am biting off more than I can chew, but hopefully I'll learn a lot from taking it on!

  6. #6
    Worship the Krome kromey's Avatar
    Join Date
    Sep 2006
    Location
    Fairbanks, AK
    Posts
    1,621
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    I shall answer your questions completely out of order, just to make things more interesting.

    Yes, MySQL can hold Arabic characters, provided that you set the character encoding correctly.

    The translations will all be done by hand by multiple translators. I know English and French, so I'll be doing both of those; I've already lined up translators for Spanish, Russian, and Japanese as well. Basically I'll be handing them a list of English strings and telling them to give me a translated list back. As far as formal vs informal goes, I figure I'll use formal language for most if not all strings (I can see a few instances where a more familiar tone could be used, but the bulk at least should be formal). This might become troublesome with e.g. Japanese, which has several different levels of formality; I'll have to discuss that with the translator. Using on-the-fly translators is just not an option: in addition to being generally slow, there isn't a single one that can produce truly accurate translations, nor even reliably grammatical ones.

    I have no idea how long such a thing would take to implement, as I've never done it before and have yet to write even a single line of code for my own project. The actual code would not take too much more time than it would for a non-I18N project - the only real difference would be referencing values from a database as opposed to hard-coding every string. The real time-consuming part comes from the actual translations, and will vary wildly depending on number of strings to translate and skill and number of translators and number of languages.

  7. #7
    SitePoint Wizard silver trophybronze trophy Cups's Avatar
    Join Date
    Oct 2006
    Location
    France, deep rural.
    Posts
    6,869
    Mentioned
    17 Post(s)
    Tagged
    1 Thread(s)
    There have been any number of discussions on this in the PHP Application Design forum.

    http://www.sitepoint.com/forums/search.php?

    Use the search above, constrain it to the PAD forum and search for i18n; I turned up a few threads.

    Hope this helps. For the GUI text, error messages and so on I too went for strings in a database, but in hindsight giving each translator their own lang.ini file could have been easier....

    Then use parse_ini_file() depending on the language.
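
    A rough sketch of the one-file-per-language idea, using Python's `configparser` as a stand-in for PHP's `parse_ini_file()`. The file contents and key names are made up for illustration and are held in a dict here so the example runs without touching the disk. One difference to be aware of: `configparser` insists on a section header, so the sample files carry a `[strings]` section that a PHP INI file would not need.

```python
import configparser

# Hypothetical contents of two per-language files, e.g. lang/en.ini and lang/es.ini.
LANG_FILES = {
    "en": "[strings]\ncopyright = All rights reserved.\nnav_home = Home\n",
    "es": "[strings]\ncopyright = Todos los derechos reservados.\nnav_home = Inicio\n",
}


def load_strings(lang):
    """Parse the INI text for one language into a plain dict of key -> string."""
    # interpolation=None stops '%' characters in translations being treated specially.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(LANG_FILES[lang])
    return dict(parser["strings"])


strings = load_strings("es")
```

    The page code would then look strings up with `strings["nav_home"]` and so on, choosing which file to load from the visitor's language.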

    This doesn't help your article translation idea much, I know...

  8. #8
    Worship the Krome kromey's Avatar
    Join Date
    Sep 2006
    Location
    Fairbanks, AK
    Posts
    1,621
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    My plan is to in fact give each translator their own .ini file which they will be able to continually access even after their language has gone live. However, the application itself will not use these .ini files, but will instead pull from the database; the .ini files will be periodically (once a week?) loaded into the database via a cron job. That way I get the convenience of giving a single file to the translators but the speed of a database. The downside to this approach, of course, is the delay between updates being made to the language file and the site reflecting those changes.
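
    A minimal sketch of what that periodic load might look like, in Python with SQLite standing in for PHP and MySQL. The table layout, the `[strings]` section name and the function name are all assumptions for illustration, not a finished design; a real job would read each translator's file from disk and be started by cron.

```python
import configparser
import sqlite3


def sync_language(db, lang, ini_text):
    """Load one translator's INI text into ui_strings, replacing changed rows."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(ini_text)
    rows = [(lang, key, value) for key, value in parser["strings"].items()]
    with db:  # one transaction per file: commit on success, roll back on error
        db.executemany(
            "INSERT OR REPLACE INTO ui_strings (lang, msg_key, string_text)"
            " VALUES (?, ?, ?)",
            rows,
        )
    return len(rows)


# Demo database; the primary key is what lets INSERT OR REPLACE overwrite old rows.
db = sqlite3.connect(":memory:")
db.execute(
    "CREATE TABLE ui_strings (lang TEXT, msg_key TEXT, string_text TEXT,"
    " PRIMARY KEY (lang, msg_key))"
)
```

    Note that this only adds and updates strings; keys a translator deletes from the file would stay in the table until removed some other way.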

  9. #9
    SitePoint Wizard silver trophybronze trophy Cups's Avatar
    Join Date
    Oct 2006
    Location
    France, deep rural.
    Posts
    6,869
    Mentioned
    17 Post(s)
    Tagged
    1 Thread(s)
    Quote Originally Posted by kromey
    My plan is to in fact give each translator their own .ini file which they will be able to continually access even after their language has gone live. However, the application itself will not use these .ini files, but will instead pull from the database; the .ini files will be periodically (once a week?) loaded into the database via a cron job. That way I get the convenience of giving a single file to the translators but the speed of a database. The downside to this approach, of course, is the delay between updates being made to the language file and the site reflecting those changes.
    Nice idea.

    Presumably you're doing that partly for speed gains?

    If that is the reason, have you looked at APC (the Alternative PHP Cache) yet?

    www.php.net/apc

  10. #10
    SitePoint Wizard silver trophy kyberfabrikken's Avatar
    Join Date
    Jun 2004
    Location
    Copenhagen, Denmark
    Posts
    6,157
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    Quote Originally Posted by amit1101
    Would MySQL be able to hold Arabic characters?
    Use UTF-8 for everything: the database charset, the database collation, the database connection (this is normally abstracted away in a database-connection layer such as PDO) and the HTML output (this one is important). UTF-8 can encode every Unicode character, so it covers Arabic and every other language on your list, and it is supported by virtually all platforms.
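
    To make "UTF-8 end to end" concrete, here is a small sketch in Python, with SQLite standing in for MySQL (on MySQL you would additionally set the table and connection charset as described above). It stores an Arabic string, reads it back, and emits a page that declares UTF-8 and marks right-to-left languages with the standard HTML `dir` attribute. The function name, table and language list are illustrative assumptions.

```python
import html
import sqlite3

RTL_LANGS = {"ar", "he", "fa", "ur"}  # right-to-left languages (partial list)


def render_page(lang, title, body):
    """Build a minimal HTML page that declares its language, direction and charset."""
    direction = "rtl" if lang in RTL_LANGS else "ltr"
    return (
        "<!DOCTYPE html>\n"
        f'<html lang="{lang}" dir="{direction}">\n'
        '<head><meta charset="utf-8">'
        f"<title>{html.escape(title)}</title></head>\n"
        f"<body><p>{html.escape(body)}</p></body>\n"
        "</html>\n"
    )


db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE articles (lang TEXT, body TEXT)")
arabic = "مرحبا بالعالم"
db.execute("INSERT INTO articles VALUES (?, ?)", ("ar", arabic))

# Read the text back and encode the finished page as UTF-8 bytes for the response.
stored = db.execute("SELECT body FROM articles").fetchone()[0]
page_bytes = render_page("ar", "مرحبا", stored).encode("utf-8")
```

    The browser handles the right-to-left layout once `dir="rtl"` is set, so the CMS form for Arabic content mostly needs the same attribute on its text fields.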

    Quote Originally Posted by kromey
    Basically each entry in your CMS (let's just call them articles) would either need to be translated into each language, or else you'd essentially have to fork your CMS into effectively separate sites divided by language. The latter is the most common approach, since it becomes a huge burden to translate every single article into each language, and that's on top of making sure that every change to the site itself is reflected in each language file/table.
    For some sites, I have been using a variation of the first case for translating dynamic content. Basically, each article always exists in the default language, but may also exist in additional translations. If a translation exists for the user's preferred language, that is used; otherwise the default language is. This makes it possible to have a single site structure, but multiple language versions.
    This strategy may work better with some applications than others. I have successfully used it for an e-commerce site, where there was a single catalog of commodities, but visitors from several countries. I haven't tried translating anything with user-submitted content, and I reckon this strategy wouldn't work well for that.
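
    One way this could look in SQL, sketched in Python with SQLite so it runs as-is; the same join should work on MySQL. The schema (`products`, `product_texts`) and the sample data are invented for illustration and are not the actual tables from the site mentioned above. Language-independent data lives in one table, translated text in another, and the query falls back to the default language when no translation exists.

```python
import sqlite3

DEFAULT_LANG = "en"  # every product is assumed to have text in this language

db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE products (id INTEGER PRIMARY KEY, price REAL);
CREATE TABLE product_texts (
    product_id INTEGER REFERENCES products(id),
    lang TEXT,
    name TEXT,
    PRIMARY KEY (product_id, lang)
);
INSERT INTO products VALUES (1, 9.5), (2, 20.0);
INSERT INTO product_texts VALUES
    (1, 'en', 'Teapot'), (1, 'de', 'Teekanne'),
    (2, 'en', 'Kettle');
""")


def product_names(lang):
    """Return (id, name) pairs, using the default-language name where lang is missing."""
    return db.execute(
        """
        SELECT p.id, COALESCE(tr.name, df.name)
        FROM products AS p
        JOIN product_texts AS df
          ON df.product_id = p.id AND df.lang = ?
        LEFT JOIN product_texts AS tr
          ON tr.product_id = p.id AND tr.lang = ?
        ORDER BY p.id
        """,
        (DEFAULT_LANG, lang),
    ).fetchall()
```

    Here `product_names("de")` gives the German name for the first product and the English name for the second, which has no German text.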

  11. #11
    Worship the Krome kromey's Avatar
    Join Date
    Sep 2006
    Location
    Fairbanks, AK
    Posts
    1,621
    Mentioned
    0 Post(s)
    Tagged
    0 Thread(s)
    Quote Originally Posted by Cups
    Nice idea.

    Presumably you're doing that partly for speed gains?

    If that is the reason, have you looked at APC (the Alternative PHP Cache) yet?

    www.php.net/apc
    Aye, I'm doing that almost entirely for speed gains (and prettier code, because SQL queries generally look cleaner than parsing through a big ol' flatfile).

    I've looked at APC (which is to say, I've read through the web site), but until I have a good development server set up, playing with it is just not a real option. (My dev server right now is my WinXP laptop running Apache 2/PHP 5.2; my production server is RHEL running Apache 2/PHP 4.2. Yes, I agree, this needs to change, and by "this" I mean everything except the Apache 2 and Linux parts!)

  12. #12
    SitePoint Wizard silver trophybronze trophy Cups's Avatar
    Join Date
    Oct 2006
    Location
    France, deep rural.
    Posts
    6,869
    Mentioned
    17 Post(s)
    Tagged
    1 Thread(s)
    Off Topic:

    It is pretty mind-boggling, I have to admit. A few minutes after turning it on you can see which include files are being cached, and after a few hours you can see how much less often your HDD is being hit.

