Localizing PHP Applications “The Right Way”, Part 2
Welcome back to this series of articles, which teaches you how to localize your PHP applications using gettext and its PHP extension. In Part 1 you took your first steps towards this by installing gettext and Poedit, creating a translation file, and writing a Hello World script. In this part you’ll learn about each of the functions used in the script, and dive deeper into the gettext library and its usage.
The “Hello World” Script
To review, Part 1 showed you the following script:
<?php
// I18N support information here
$language = "en_US";
putenv("LANG=" . $language);
setlocale(LC_ALL, $language);

// Set the text domain as "messages" to
// use Locale/en_US/LC_MESSAGES/messages.mo
$domain = "messages";
bindtextdomain($domain, "Locale");
bind_textdomain_codeset($domain, "UTF-8");

// Use the messages domain
textdomain($domain);

echo _("HELLO_WORLD");
Calling putenv() to set the LANG environment variable tells gettext which locale to use for this session.
en_US is the identifier for English as used in the United States. The first part of the locale is a two-letter lowercase abbreviation for the language according to the ISO 639-1 specification, and the second part is a two-letter uppercase country code according to the ISO 3166-1 alpha-2 specification.
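To make the shape of a locale identifier concrete, here is a minimal sketch that splits one into its two parts. The function name parseLocaleId() is my own invention for illustration and is not part of gettext or PHP. It only checks the shape described above; it does not verify that the codes are actually registered, and it does not handle longer names that carry a codeset suffix.

```php
<?php
/**
 * Hypothetical helper (not part of gettext): splits a locale identifier such
 * as "en_US" into its language and country parts.
 * Returns null when the identifier does not have the language_COUNTRY shape.
 */
function parseLocaleId(string $locale): ?array
{
    // Two lowercase letters (ISO 639-1), an underscore,
    // then two uppercase letters (ISO 3166-1 alpha-2).
    if (!preg_match('/^([a-z]{2})_([A-Z]{2})$/', $locale, $m)) {
        return null;
    }

    return ['language' => $m[1], 'country' => $m[2]];
}

$parts = parseLocaleId('fr_FR'); // ['language' => 'fr', 'country' => 'FR']
$bad   = parseLocaleId('FR_fr'); // null, because the case is reversed
```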
setlocale() specifies the locale used in the application and affects how PHP sorts strings, understands date and time formatting, and formats numeric values.
gettext calls the catalog file used to store the translation messages (the MO file) a domain. The bindtextdomain() function tells gettext where to find a domain; the first parameter is the catalog name without the .mo extension, and the second parameter is the path to the parent directory in which the en_US/LC_MESSAGES subpath resides (which in turn is where the translation file resides). If you’re wondering where the subpath en_US/LC_MESSAGES comes from, it is constructed by gettext from the value of the LANG variable you set with putenv() and the locale category LC_MESSAGES. You can call bindtextdomain() several times to bind as many domains as you want, in the event you’ve split your translations across multiple files.
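The path layout described above can be sketched as a small helper that builds the location where a compiled catalog is expected to live. catalogPath() is a hypothetical name of my own, not a gettext function, and the real lookup is done inside the gettext library, which may also consider variants of the locale name. A check like this is still handy for spotting a misplaced .mo file before you start debugging translations.

```php
<?php
/**
 * Hypothetical helper: builds the expected path of a compiled catalog,
 * following the <dir>/<locale>/LC_MESSAGES/<domain>.mo layout.
 */
function catalogPath(string $baseDir, string $locale, string $domain): string
{
    // Drop any trailing slash so the result never contains "//".
    return rtrim($baseDir, '/') . '/' . $locale . '/LC_MESSAGES/' . $domain . '.mo';
}

$path = catalogPath('Locale', 'en_US', 'messages');
// $path is now "Locale/en_US/LC_MESSAGES/messages.mo"

if (!is_file($path)) {
    // Assumption: logging is enough for this sketch; a real application
    // may want to handle a missing catalog differently.
    error_log("Catalog not found: $path");
}
```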
Calling bind_textdomain_codeset() is very important because omitting it can lead to unexpected characters in your output when your messages contain non-ASCII letters. Since the catalog messages are encoded in UTF-8, that is what the example code sets as the codeset. I always recommend using UTF-8 as it is the most widely supported Unicode encoding. Don’t use other, lesser-known encodings unless you know exactly what you are doing; otherwise you are likely to run into serious problems, especially on the web.
textdomain() tells gettext which domain to use for any subsequent calls to gettext(), its shorthand alias _(), or its plural form lookup function ngettext(). I’ll talk about dealing with plural forms in the next installment, but for now you should know that all three of these functions look up messages in the current domain specified with textdomain().
Lastly, the script calls _(), which looks up the msgid HELLO_WORLD in the messages.mo file and returns the msgstr associated with it, the text Hello World!
Missing Translation Strings
Now that you have a basic understanding of how this simple script looks up replacements for translations, try changing the domain.
<?php
$language = "en_US";
putenv("LANG=" . $language);
setlocale(LC_ALL, $language);

$domain = "foo";
bindtextdomain($domain, "Locale");
bind_textdomain_codeset($domain, "UTF-8");
// ...
gettext will try to look up the catalog Locale/en_US/LC_MESSAGES/foo.mo, which shouldn’t exist.
When you view the script’s output you’ll see HELLO_WORLD instead of Hello World! gettext can’t perform a translation because there isn’t a valid catalog. Another scenario is that the given msgid doesn’t exist in any of the catalogs registered with gettext. In both cases gettext is smart enough to fall back to the original string you supplied.
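Because a failed lookup simply hands back the msgid, you can detect it by comparing the result with what you passed in. The following is a minimal sketch of that idea, not something gettext provides: translateOrReport() and the $missing list are names I made up, and the translator is passed in as a callable so the example runs even without the gettext extension. Keep in mind that the comparison is only meaningful when your msgids are tokens like HELLO_WORLD; if a translation is legitimately identical to its msgid, it would be reported as missing too.

```php
<?php
/**
 * Hypothetical helper: translates $msgid with the given callable and records
 * the msgid in $missing when the result is identical to the input, which is
 * what a failed gettext lookup looks like.
 */
function translateOrReport(string $msgid, callable $translate, array &$missing): string
{
    $text = $translate($msgid);

    if ($text === $msgid) {
        $missing[] = $msgid;
    }

    return $text;
}

// Stand-in translator for this sketch; it knows a single message and
// returns everything else unchanged, mimicking the fallback behavior.
$stub = function (string $msgid): string {
    return $msgid === 'HELLO_WORLD' ? 'Hello World!' : $msgid;
};

$missing = [];
$hello   = translateOrReport('HELLO_WORLD', $stub, $missing); // "Hello World!"
$bye     = translateOrReport('GOODBYE', $stub, $missing);     // "GOODBYE"
// $missing now holds ['GOODBYE']
```

With the gettext extension loaded and a domain selected, you could pass the string 'gettext' as the callable instead of the stub.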
Targeting Multiple Locales
In a real-world application, you will typically use your target language’s strings as the IDs throughout your code. This makes the code a bit clearer and the fallback of a translation failure more user friendly. For example, if your application uses English and French as the target languages, you can use English as the ID strings and then create French catalogs to replace the English.
In the same TestI18N/Locale directory, create a new directory named fr_FR containing another LC_MESSAGES directory, and use the procedures outlined in Part 1 to create a new catalog for French. When you’re finished, the Locale directory should contain both en_US/LC_MESSAGES and fr_FR/LC_MESSAGES, each holding its own messages catalog.
When you specify the catalog settings in Poedit, remember to set French as the language and France as the country.
messages.po will look like this when opened in a text editor:
msgid ""
msgstr ""
"Project-Id-Version: TestProject\n"
"POT-Creation-Date: \n"
"PO-Revision-Date: \n"
"Last-Translator: FIRSTNAME LASTNAME <firstname.lastname@example.org>\n"
"Language-Team: MyTeam <email@example.com>\n"
"MIME-Version: 1.0\n"
"Content-Type: text/plain; charset=utf-8\n"
"Content-Transfer-Encoding: 8bit\n"
"X-Poedit-Language: French\n"
"X-Poedit-Country: FRANCE\n"
"X-Poedit-SourceCharset: utf-8\n"

#Test token 1
msgid "HELLO_WORLD"
msgstr "Bonjour tout le monde!"

#Test token 2
msgid "TEST_TRANSLATION"
msgstr "Test de traduction..."
Most of the header lines of the file are self-explanatory, so I’ll skip right to the actual translation lines, which start with the first msgid after the headers. Notice that there are two strings for each phrase to be translated: the msgid, which is the ID string in your code that gettext will look up, and the msgstr, which is the translated message that gettext will substitute for the ID. The first definition instructs gettext to use Bonjour tout le monde! whenever it sees HELLO_WORLD. The second instructs gettext to use Test de traduction… for TEST_TRANSLATION.
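To see the msgid/msgstr pairing in action, here is a rough sketch that pulls the pairs out of PO text into a PHP array. This is not how gettext reads catalogs (it uses the compiled MO file), and parseSimplePoEntries() is a hypothetical helper of my own. It deliberately handles only single-line entries like the ones in this catalog and ignores multi-line strings, plural forms, and escape sequences.

```php
<?php
/**
 * Hypothetical helper: extracts single-line msgid/msgstr pairs from PO text.
 * The header entry (the one with an empty msgid) is skipped.
 */
function parseSimplePoEntries(string $poText): array
{
    $entries   = [];
    $currentId = null;

    foreach (preg_split('/\R/', $poText) as $line) {
        $line = trim($line);

        if (preg_match('/^msgid\s+"(.*)"$/', $line, $m)) {
            // Remember the ID until its msgstr shows up.
            $currentId = $m[1];
        } elseif ($currentId !== null && preg_match('/^msgstr\s+"(.*)"$/', $line, $m)) {
            if ($currentId !== '') {
                $entries[$currentId] = $m[1];
            }
            $currentId = null;
        }
        // Comments, blank lines, and header continuation lines are ignored.
    }

    return $entries;
}

$po = <<<'PO'
#Test token 1
msgid "HELLO_WORLD"
msgstr "Bonjour tout le monde!"

#Test token 2
msgid "TEST_TRANSLATION"
msgstr "Test de traduction..."
PO;

$entries = parseSimplePoEntries($po);
// $entries['HELLO_WORLD'] is "Bonjour tout le monde!"
```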
Open the catalog file again in Poedit and click the Save Catalog entry in the icon bar to save and compile it. Then modify the PHP script to use fr_FR instead of en_US. When you run it, you’ll see the output in your browser is now French!
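If the output stubbornly stays untranslated after switching to fr_FR, one thing worth checking is the return value of setlocale(), which is false when the operating system doesn’t have the requested locale available. Below is a minimal sketch of such a check. activateLocale() is a hypothetical wrapper of my own, and the candidate names, including the fr_FR.UTF-8 spelling, are assumptions, since the exact names available differ from system to system.

```php
<?php
/**
 * Hypothetical helper: tries each candidate locale name in order.
 * Returns the name the system accepted, or false if none could be set.
 */
function activateLocale(array $candidates)
{
    // setlocale() accepts an array of names and tries them in order.
    return setlocale(LC_ALL, $candidates);
}

$active = activateLocale(['fr_FR.UTF-8', 'fr_FR']);

if ($active === false) {
    // Assumption: logging is sufficient here; on many systems gettext will
    // keep returning the msgids until the locale is installed.
    error_log('No French locale is available on this system.');
}
```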
In this part you learned what each function call does in the Hello World script introduced in Part 1. In terms of its API, gettext isn’t really a large library. There are only a handful of functions, most of which you will only use once in your entire application. The most frequently used will be gettext(), or its shorthand alias _(), and its plural form equivalent ngettext(). You also learned how to target multiple locales (fr_FR in our example), and how gettext falls back to the msgid when it’s missing a translation.
In the next part you’ll see how to start doing real-world localization by organizing the directories, switching between languages, choosing a fallback language, and overriding the currently selected message domain.