Localizing PHP Applications “The Right Way”, Part 2


Welcome back to this series of articles, which teaches you how to localize your PHP applications using gettext and its PHP extension. In Part 1 you took your first steps towards this by installing gettext and Poedit, creating a translation file, and writing a Hello World script. In this part you’ll learn about each of the functions used in the script, and dive deeper into the gettext library and its usage.

The “Hello World” Script

To review, Part 1 showed you the following script as TestI18N/test-locale.php:

<?php
// I18N support information here
$language = "en_US";
putenv("LANG=" . $language); 
setlocale(LC_ALL, $language);

// Set the text domain as "messages" to 
// use Locale/en_US/LC_MESSAGES/messages.mo
$domain = "messages";
bindtextdomain($domain, "Locale"); 
bind_textdomain_codeset($domain, "UTF-8");

// Use the messages domain
textdomain($domain);

echo _("HELLO_WORLD");

Calling putenv() and setting the LANG environment variable instructs gettext which locale it will be using for this session. en_US is the identifier for English as used in the United States. The first part of the locale is a two-letter lowercase abbreviation for the language according to the ISO 639-1 specification, and the second part is a two-letter uppercase country code according to the ISO 3166-1 alpha-2 specification. setlocale() specifies the locale used in the application and affects how PHP sorts strings, understands date and time formatting, and formats numeric values.
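setlocale() returns false when the requested locale is not installed on the system, which is easy to miss because the script above ignores the return value. The sketch below wraps the two calls in a hypothetical helper named applyLocale() (my own name, not part of gettext or of the Part 1 script) so the failure becomes visible.

```php
<?php
// Hypothetical helper (not part of the original script): sets LANG and the
// PHP locale, and reports whether the system accepted the locale.
function applyLocale(string $language): bool
{
    putenv("LANG=" . $language);

    // setlocale() returns the name of the locale it selected,
    // or false if the requested locale is not available.
    $selected = setlocale(LC_ALL, $language);

    return $selected !== false;
}

if (!applyLocale("en_US")) {
    // Assumption: logging is enough for this sketch; a real application
    // might fall back to a default locale instead.
    error_log("Locale en_US is not available on this system");
}
```

On many Linux systems the installed locale is named en_US.UTF-8 rather than en_US, so a false result here is often a naming issue rather than a missing language pack.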

gettext calls the catalog file used to store the translation messages (the MO file) a domain. The bindtextdomain() function tells gettext where to find the domain to use; the first parameter is the catalog name without the .mo extension, and the second parameter is the path to the parent directory in which the en_US/LC_MESSAGES subpath resides (which in turn is where the translation file resides). If you’re wondering where the subpath en_US/LC_MESSAGES comes from, gettext constructs it from the value of the LANG variable you set with putenv() and the locale category LC_MESSAGES. You can call bindtextdomain() several times to bind as many domains as you want, in case you’ve split your translations across multiple files.
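To make the path construction concrete, here is a minimal sketch of how the pieces fit together. The function catalogPath() is a hypothetical helper written for this illustration; gettext does the equivalent work internally and may also try variants of the locale name, which this sketch does not attempt to reproduce.

```php
<?php
// Hypothetical helper: builds the path where gettext is expected to look
// for a catalog, given the directory passed to bindtextdomain(), the locale,
// and the domain name.
function catalogPath(string $directory, string $locale, string $domain): string
{
    return $directory . "/" . $locale . "/LC_MESSAGES/" . $domain . ".mo";
}

$path = catalogPath("Locale", "en_US", "messages");
echo $path, PHP_EOL; // prints Locale/en_US/LC_MESSAGES/messages.mo

// A quick sanity check before relying on translations from this catalog.
if (!is_file($path)) {
    echo "No compiled catalog found at that path.", PHP_EOL;
}
```

A check like this can be handy while debugging, since gettext itself gives no error when a catalog is missing.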

Calling bind_textdomain_codeset() is very important because not doing so can lead to unexpected characters in your output when using non-ASCII letters. Since the catalog messages are encoded in UTF-8, that is what the example code sets as the codeset. I always recommend using UTF-8 as it is the most widely supported Unicode encoding. Don’t use other, lesser-known encodings unless you know exactly what you are doing; you will encounter serious problems, especially on the web.

The call to textdomain() tells gettext which domain to use for any subsequent calls to gettext(), its shorthand alias _(), or its plural form lookup function ngettext(). I’ll talk about dealing with plural forms in the next installment, but for now you should know that all three of these functions look up messages in the current domain specified with textdomain().

Lastly, the script calls _(), which looks up the msgid HELLO_WORLD in the messages.mo file and returns the msgstr associated with it, the text Hello World!

Missing Translation Strings

Now that you have a basic understanding of how this simple script looks up replacements for translations, try changing the domain.

<?php
$language = "en_US";
putenv("LANG=" . $language); 
setlocale(LC_ALL, $language);

$domain = "foo";
bindtextdomain($domain, "Locale"); 
bind_textdomain_codeset($domain, "UTF-8");
// ...

gettext will try to look up the catalog Locale/en_US/LC_MESSAGES/foo.mo, which shouldn’t exist.

When you view the script’s output, you’ll see the msgid HELLO_WORLD rather than the text Hello World! This is because gettext can’t perform a translation without a valid catalog. The same thing happens when the given msgid doesn’t exist in any catalog registered with gettext: it is smart enough to fall back to the original string you supplied.
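Because gettext signals a miss only by returning the msgid unchanged, detecting a missing translation comes down to a string comparison. The sketch below shows that idea with a hypothetical helper, lookupWithStatus(). The $lookup parameter and the $fakeCatalog closure are assumptions added so the example runs without a compiled catalog; in the real script you would let it default to gettext().

```php
<?php
// Hypothetical helper: looks up $msgid and reports whether a translation
// was found. $lookup defaults to gettext(); it is a parameter only so the
// sketch can be tried without the extension or a compiled catalog.
function lookupWithStatus(string $msgid, ?callable $lookup = null): array
{
    $lookup = $lookup ?? 'gettext';
    $text = $lookup($msgid);

    // gettext returns the msgid itself when no translation is found, so an
    // unchanged string is the only signal that the lookup missed.
    return ['text' => $text, 'translated' => $text !== $msgid];
}

// Stand-in for a catalog that only knows HELLO_WORLD.
$fakeCatalog = function (string $id): string {
    return $id === "HELLO_WORLD" ? "Hello World!" : $id;
};

var_dump(lookupWithStatus("HELLO_WORLD", $fakeCatalog)['translated']); // bool(true)
var_dump(lookupWithStatus("MISSING_ID", $fakeCatalog)['translated']);  // bool(false)
```

Note that this comparison reports a miss whenever the translation happens to be identical to the msgid, so treat it as a debugging aid rather than a reliable test.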

Targeting Multiple Locales

In a real-world application, you will typically use the strings of your default language as the IDs throughout your code. This makes the code a bit clearer and makes the fallback more user friendly when a translation is missing. For example, if your application targets English and French, you can use the English text as the ID strings and then create French catalogs to replace the English.

In the same TestI18N/Locale directory, create a new directory named fr_FR containing another LC_MESSAGES directory, and use the procedures outlined in Part 1 to create a new catalog for French. When you’re finished, you should have the following hierarchy:

[Figure: the Locale directory containing the en_US and fr_FR directories]

When you specify the catalog settings in Poedit, remember to set French as the language and France as the country.

[Figure: Poedit settings window for French]

My French messages.po will look like this when opened in a text editor:

msgid ""
msgstr ""
"Project-Id-Version: TestProject\n"
"POT-Creation-Date: \n"
"PO-Revision-Date: \n"
"Last-Translator: FIRSTNAME LASTNAME <email@example.com>\n"
"Language-Team: MyTeam <team@example.com>\n"
"MIME-Version: 1.0\n"
"Content-Type: text/plain; charset=utf-8\n"
"Content-Transfer-Encoding: 8bit\n"
"X-Poedit-Language: French\n"
"X-Poedit-Country: FRANCE\n"
"X-Poedit-SourceCharset: utf-8\n"

#Test token 1
msgid "HELLO_WORLD"
msgstr "Bonjour tout le monde!"

#Test token 2
msgid "TEST_TRANSLATION"
msgstr "Test de traduction..."

Most of the header lines of the file are self-explanatory, so I’ll skip right to the actual translation lines, which start with the first non-empty msgid after the headers. Notice that there are two strings for each phrase to be translated: the msgid, which is the ID string in your code that gettext will look up, and the msgstr, which is the translated message that gettext will substitute for the ID. The first definition instructs gettext to use Bonjour tout le monde! whenever it sees HELLO_WORLD. The second instructs gettext to use Test de traduction... for TEST_TRANSLATION.
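To illustrate the msgid/msgstr pairing, the following sketch reads such pairs out of PO text into a PHP array. The function parsePo() is a hypothetical helper written for this article; it ignores plural forms and message contexts, and it is not how gettext itself works, since gettext reads the compiled MO file.

```php
<?php
// Minimal sketch: reads msgid/msgstr pairs from PO text. It ignores plural
// forms and contexts, so it is no replacement for real gettext tooling.
function parsePo(string $po): array
{
    $entries = [];
    $field = null; // 'msgid' or 'msgstr': where continuation lines are appended

    foreach (preg_split('/\R/', $po) as $line) {
        $line = trim($line);
        if (preg_match('/^(msgid|msgstr)\s+"(.*)"$/', $line, $m)) {
            if ($m[1] === 'msgid') {
                $entries[] = ['msgid' => '', 'msgstr' => '']; // start a new entry
            }
            // Ignore a stray msgstr that appears before any msgid.
            $field = $entries ? $m[1] : null;
            if ($field !== null) {
                $entries[count($entries) - 1][$field] .= stripcslashes($m[2]);
            }
        } elseif ($field !== null && preg_match('/^"(.*)"$/', $line, $m)) {
            // A quoted line on its own continues the previous field.
            $entries[count($entries) - 1][$field] .= stripcslashes($m[1]);
        } else {
            $field = null; // blank lines and comments end the current field
        }
    }

    $messages = [];
    foreach ($entries as $entry) {
        if ($entry['msgid'] !== '') { // the empty msgid is the header entry
            $messages[$entry['msgid']] = $entry['msgstr'];
        }
    }
    return $messages;
}

$po = <<<'PO'
msgid ""
msgstr ""
"Content-Type: text/plain; charset=utf-8\n"

#Test token 1
msgid "HELLO_WORLD"
msgstr "Bonjour tout le monde!"

#Test token 2
msgid "TEST_TRANSLATION"
msgstr "Test de traduction..."
PO;

print_r(parsePo($po));
```

The result maps each msgid to its msgstr, which is conceptually the lookup table gettext consults when you call _().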

Open the catalog file again in Poedit and click the Save Catalog entry in the icon bar to save and compile it. Then modify the PHP script to use fr_FR instead of en_US. When you run it, you’ll see the output in your browser is now French!

Summary

In this part you learned what each function call does in the Hello World script introduced in Part 1. In terms of its API, gettext isn’t really a large library. There are only a handful of functions, most of which you will only use once in your entire application. The most frequently used will be gettext(), or its shorthand alias _(), and its plural form equivalent ngettext(). You also learned how to target multiple locales (en_US and fr_FR in our example), and how gettext falls back to the msgid when it’s missing a translation.

In the next part you’ll see how to start doing real-world localization by organizing the directories, switching between languages, choosing a fallback language, and overriding the currently selected messages domain.

Image via sgame / Shutterstock

Abdullah Abouzekry

Abdullah Abouzekry is an experienced web-developer with over 7 years developing PHP/MySQL applications ranging from simple web sites to extensive web-based business applications. Although his main experience is with PHP/MySQL and related web technologies, he has developed and localized many desktop applications in C#, Python/Qt, Java, and C++. When not writing code, Abdullah likes to read, listen to oriental music, and have fun with his little family.
