Implementing Multi-Language Support

Jacek Barecki

Setting up a multilingual site may be a good way to attract new customers to your business or gain more participants in your project. Translating a simple site with a few static pages probably won’t be complicated, but more complex PHP web applications may require a lot of work when launching multiple language support. In this article I’ll present the different types of content that need to be taken into consideration when internationalizing a site. Read on to learn how to handle translating them into different languages.

Multi-language Static Content

First of all, your site probably contains text content hard-coded in your project files that needs to be translated into other languages. Text strings in template files or notification messages handled in a PHP script are examples of such content. If you didn’t plan on going international when writing these parts of the code, you will have to walk through each line and handle translating the text strings stored there.

But how should the actual translations be done? The common way of handling multiple language content is to use Gettext, a software solution created to handle translations in applications written in different programming languages. It is also available in PHP as a separate extension. Using Gettext allows you to separate the app translations from the source code. As a result, the person responsible for preparing the translations doesn’t have to dig into the code and can work independently of the web developer. When the translations are ready, they are put in a separate file which is read by the PHP script. The application matches the translations with the source text strings stored in the code, and finally the end user sees the site displayed in the right language.

What you need to do as a developer is to transform plain text strings in a PHP application into strings that can be handled by Gettext. You have to wrap the text in the gettext() function, which is commonly accessed using the alias _() (the function name is an underscore). By doing this, you specify that a certain text string needs to be translated and can be handled by Gettext.

If you want to learn how to manage Gettext translations in a PHP application, there is a complete tutorial explaining the subject in detail, written by Abdullah Abouzekry: Localizing PHP Applications “The Right Way”. I encourage you to read it to see some practical examples of the logic I’ve described above.

Also remember that if you’re using a framework as a base for your PHP application, it probably has a component responsible for managing the translations. Review your framework’s documentation to check if there are any tools that simplify handling multiple language support.
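
As a rough illustration of the wrapping step described above, here is a minimal sketch. The "messages" domain name, the locale directory and the sample string are assumptions made for the example, not something prescribed by Gettext or by this article. The fallback definition of _() is there only so the snippet also runs where the gettext extension is not loaded.

```php
<?php
// Fallback so the sketch still runs when the gettext extension is not loaded:
// in that case _() simply returns the source string unchanged.
if (!function_exists('_')) {
    function _($message)
    {
        return $message;
    }
}

/**
 * Select the locale and the translation catalog for the current request.
 * The "messages" domain and the directory passed in are assumed names.
 */
function initTranslations(string $locale, string $localeDir, string $domain = 'messages'): void
{
    // The locale has to be installed on the server for this call to succeed.
    setlocale(LC_ALL, $locale);

    if (function_exists('bindtextdomain')) {
        // Compiled catalogs are expected under
        // <localeDir>/<language>/LC_MESSAGES/<domain>.mo
        bindtextdomain($domain, $localeDir);
        bind_textdomain_codeset($domain, 'UTF-8');
        textdomain($domain);
    }
}

initTranslations('fr_FR.UTF-8', __DIR__ . '/locale');

// Before: echo 'Welcome to our site';
// After: the string is marked for translation and looked up at runtime.
$greeting = _('Welcome to our site');
echo $greeting, PHP_EOL;
```

When no catalog contains a translation for the string, _() returns the source text unchanged, so wrapped strings stay readable even before any translation file exists.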

Database content

In more complex applications, a large part of the site’s content may be stored in a database. If the site supports only one language, you just need to save one version of a resource, which is fetched when needed. But implementing a multiple language site requires you to change the way the data is stored in the database. Language-specific resources now have to be identified by a language code and have to be fetched in the language version set by the visitor. For you as a developer, this means that you will probably have to change your database structure to be able to handle the translations.

To learn the best practices for creating a database structure for a multiple language site, you can look at the solutions used by different frameworks. Some frameworks offer behaviors (modules that provide additional functionality to models) responsible for handling the translations of a specified model. After you attach a translation behavior to a model, the framework will write and read different language versions of the data using a database structure that is capable of handling multiple language support.

For example, the CakePHP framework offers the Translate behavior that may be attached to your models. To do so, you need to specify which model fields are language-dependent. Let’s imagine you have an articles table which has two such fields: title and text.

To store the translations, you need to move these language-dependent fields into a separate translations table. Each row in the translations table is identified by the language version, the model and the field name referring to the base table. The articles table itself then keeps only the columns that are not language-dependent.

The article title and text have been moved to a separate table that holds the translations, where a single translation of each of these fields is stored in a separate row.
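
To make the row-per-translation layout more concrete, here is a small sketch that models both tables as plain PHP arrays. The column names (locale, model, foreign_key, field, content), the sample rows and the fallback to a default language are assumptions made for illustration; check your framework’s documentation for the exact schema it expects.

```php
<?php
// Base table: only language-independent columns remain (sample data).
$articles = [
    ['id' => 1, 'created' => '2013-05-01'],
];

// Translations table: one row per language, model, record and field.
$translations = [
    ['locale' => 'en', 'model' => 'Article', 'foreign_key' => 1, 'field' => 'title', 'content' => 'Hello'],
    ['locale' => 'en', 'model' => 'Article', 'foreign_key' => 1, 'field' => 'text',  'content' => 'Hello, world.'],
    ['locale' => 'fr', 'model' => 'Article', 'foreign_key' => 1, 'field' => 'title', 'content' => 'Bonjour'],
    ['locale' => 'fr', 'model' => 'Article', 'foreign_key' => 1, 'field' => 'text',  'content' => 'Bonjour, le monde.'],
];

/**
 * Look up one translated field of one record.
 * Falls back to $fallbackLocale (an assumption of this sketch) and returns
 * null when neither language version exists.
 */
function translatedField(
    array $translations,
    string $model,
    int $id,
    string $field,
    string $locale,
    string $fallbackLocale = 'en'
): ?string {
    $fallback = null;
    foreach ($translations as $row) {
        if ($row['model'] !== $model || $row['foreign_key'] !== $id || $row['field'] !== $field) {
            continue; // row belongs to another record or field
        }
        if ($row['locale'] === $locale) {
            return $row['content'];
        }
        if ($row['locale'] === $fallbackLocale) {
            $fallback = $row['content'];
        }
    }
    return $fallback;
}

echo translatedField($translations, 'Article', 1, 'title', 'fr'), PHP_EOL;
```

In a real application the lookup would be a join or a second query against the database; the array version only shows how the language, model and field columns identify a single translated value.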

Most importantly, the CakePHP framework handles the database reads and writes of the translated data automatically. After attaching the Translate behavior, you can call your model methods just as before and the framework links the main table with the translations table itself. You don’t have to worry about making joins in your select statements or inserting data into multiple tables when saving a new row. Just browse the documentation to see some code examples of handling translations this way.

Other frameworks often offer solutions similar to the one described above. If you’re using the Symfony2 framework with the Doctrine ORM, just check the Doctrine translatable behavior to see an example. If your site doesn’t run on any framework, you can refer to the solutions described above when implementing your own way of handling the translations in a database.

User submitted content

If your site allows the visitors to write comments or reviews connected to a post or a product, you will have to handle the translations of such content as well.

First of all, you need to save the language version of the content entered by the user when they submit a form. You will probably just have to assume that it matches the language version of the site set by the user. You can also employ an external API (e.g. the Google Translate API) to detect the language of a specific text.

Then, you need to translate the given text into all the language versions that are supported by your site. Instead of doing it manually, you can use an external API that will provide you with machine translations of the content submitted by the visitors of your site. The ProgrammableWeb site lists over 60 translation APIs available over the Web, so you will probably find a solution that suits your needs easily. If you’re looking for a specific tutorial on how to implement a translation API in a PHP application, I encourage you to read my articles on the Google Translate API:
Using Google Translate API with PHP, which explains the basics of integrating a PHP script with the API,
Auto-translating User Submitted Content Using Google Translate API, which contains a complete code example on how to handle translating user submitted content on a PHP website.
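
The two steps above (recording the source language, then producing the other language versions) could be sketched as follows. The function name, the array layout and the $translate callable are hypothetical; the callable stands in for whatever translation API client you choose, and a fake one is used here so the snippet runs without a network call.

```php
<?php
/**
 * Build one stored version of a user submission per supported language.
 *
 * $translate is a placeholder for a translation API client; it receives
 * (text, sourceLanguage, targetLanguage) and returns the translated string.
 */
function translateSubmission(string $text, string $sourceLang, array $supportedLangs, callable $translate): array
{
    $versions = [];
    foreach ($supportedLangs as $lang) {
        $isOriginal = ($lang === $sourceLang);
        $versions[$lang] = [
            'text' => $isOriginal ? $text : $translate($text, $sourceLang, $lang),
            // Flag machine translations so a template can label them as such.
            'machine_translated' => !$isOriginal,
        ];
    }
    return $versions;
}

// Stand-in translator, used only to keep the sketch self-contained.
$fakeTranslate = function (string $text, string $from, string $to): string {
    return sprintf('[%s to %s] %s', $from, $to, $text);
};

// The source language is assumed to equal the site language chosen by the user.
$versions = translateSubmission('Great product!', 'en', ['en', 'fr', 'pl'], $fakeTranslate);
print_r($versions);
```

Keeping the machine_translated flag next to each version makes it possible to show visitors which texts are automatic translations and to offer the original alongside them.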

Of course, you may choose not to translate the user submitted content and display it in the original language versions. But in my opinion, displaying machine translations is better than displaying content in a language the visitor may not understand, even if the translations fetched from the APIs aren’t perfect.

Resources

As your site probably consists of more than just text content, you will also have to handle the translations of the various resources that are shared on your webpages. Images, videos, attachments and PDF files all have to be handled as well. To simplify displaying the proper version of a file on a webpage, you can store all the files in a directory structure that reflects the languages available on the site. The English version of a file will be stored under the en directory, the French version under fr and so on. Then you can write a simple helper method that fetches the proper file based on the language version currently set by the user. If there is a default version of a specific resource, you can extend the method to fetch the default file when the language-specific version hasn’t been found.
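
A helper of the kind described above might look like the following sketch. The function name, the parameter order and the choice of en as the default language are assumptions made for the example, and it only handles files stored directly inside each language directory.

```php
<?php
/**
 * Return the path of a resource for the given language, falling back to the
 * default language directory, or null when neither file exists.
 *
 * Expected layout: <baseDir>/en/<file>, <baseDir>/fr/<file>, and so on.
 */
function localizedResource(string $baseDir, string $file, string $lang, string $defaultLang = 'en'): ?string
{
    foreach (array_unique([$lang, $defaultLang]) as $candidateLang) {
        // basename() keeps a crafted language code or file name from
        // pointing outside $baseDir; it also limits the helper to files
        // that sit directly in the language directory.
        $path = $baseDir . '/' . basename($candidateLang) . '/' . basename($file);
        if (is_file($path)) {
            return $path;
        }
    }
    return null;
}

// Example call; the "files" directory is a hypothetical location.
$path = localizedResource(__DIR__ . '/files', 'brochure.pdf', 'fr');
```

Returning null instead of a guessed path lets the calling template decide what to do when a resource is missing in every language, for example hiding the download link.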

Other types of content

Handling the translations of the types of content described above is often not sufficient to end up with a site that supports different language versions completely. Remember that a PHP website often includes other sources of content. JavaScript code that modifies the DOM tree or displays notification messages often contains text strings that have to be translated. Also don’t forget to set the proper language version when using external APIs or widgets that are attached to your webpages (e.g. social plugins). Browsing through your site before implementing multiple language support may reveal more types of content that have to be taken into consideration when preparing translations.

Summary

As you can see, launching a site that offers multiple language support requires you to handle translating different types of content in different ways. If you plan to translate your site, I encourage you to make a checklist of the types of translations that need to be done. It may serve as a good starting point for assessing the amount of work involved and for implementing the translations.

If you have any questions or comments regarding the article, feel free to post them below. You can also contact me through Google Plus.


  • Taylor Ren

    Quite a long checklist.

    User content dynamic translation could be arguable.

  • Nilesh Shukla

    Lets talk about “Database Content” for a second. Suppose I am writing an article for 70 countries, if only 5 of them are in different language, this will still create 140 rows (70 for title + 70 for content). Also what if those 5 articles with different languages were created on a separate date? Since the date is stored in core articles table, it is shared among all the articles versions.

  • Roy

    Good ideas except for recommending to translate user-submitted content. As you’ve said machine translation isn’t perfect and sometimes they can alter the meaning of text somebody wrote which can annoy users at best and at worst could lead to making it appear that the user made fallacious or offensive claims when they did not. It is best to leave user-submitted content untranslated.

  • http://sli.su/en/ Богдан Рихаль

    How about SLI? It simple method for creating multilingual site
    http://sli.su/en/
    https://github.com/ganjar/sli