Welcome back to this series of articles that teach you how to localize your PHP applications using gettext and its PHP extension. In Part 1 you took your first steps towards this by installing gettext and Poedit, creating a translation file, and writing a Hello World script. In this part you’ll learn about each of the functions used in that script, and dive deeper into the gettext library and its usage.
The “Hello World” Script
To review, Part 1 showed you the following script as TestI18N/test-locale.php:
<?php
// I18N support information here
$language = "en_US";
putenv("LANG=" . $language);
setlocale(LC_ALL, $language);
// Set the text domain as "messages" to
// use Locale/en_US/LC_MESSAGES/messages.mo
$domain = "messages";
bindtextdomain($domain, "Locale");
bind_textdomain_codeset($domain, "UTF-8");
// Use the messages domain
textdomain($domain);
echo _("HELLO_WORLD");
Calling putenv() and setting the LANG environment variable tells gettext which locale it will use for this session. en_US is the identifier for English as used in the United States. The first part of the locale is a two-letter lowercase abbreviation for the language according to the ISO 639-1 specification, and the second part is a two-letter uppercase country code according to the ISO 3166-1 alpha-2 specification. setlocale() specifies the locale used in the application and affects how PHP sorts strings, formats dates and times, and formats numeric values.
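Locale names are platform-specific: many Linux systems install US English as en_US.UTF-8 or en_US.utf8 rather than plain en_US, and setlocale() returns false when none of the names you pass is installed. The sketch below uses a hypothetical activateLocale() helper, and its candidate spellings are assumptions about a typical Linux server. It passes an array of names, which setlocale() tries in order, and checks the result:

```php
<?php
// Hypothetical helper: set LANG, then try a few common spellings of the
// locale name. The candidate list is an assumption; adjust it to the
// locales actually installed on your server (see `locale -a` on Linux).
function activateLocale(string $language)
{
    putenv('LANG=' . $language);

    // With an array, setlocale() tries each name in order and returns the
    // first one the system accepts, or false if none is installed.
    return setlocale(LC_ALL, [$language . '.UTF-8', $language . '.utf8', $language]);
}

$active = activateLocale('en_US');
if ($active === false) {
    echo "No en_US locale is installed on this system.\n";
} else {
    echo "Active locale: {$active}\n";
}
```

If the check fails, install the locale (or adjust the candidate list) before debugging your catalogs; on glibc-based systems gettext does not translate anything while the C locale remains active.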
gettext groups translation messages into domains, and the catalog file that stores a domain’s messages (the MO file) is named after that domain. The bindtextdomain() function tells gettext where to find the domain to use; the first parameter is the domain name, which is the catalog file name without the .mo extension, and the second parameter is the path to the parent directory in which the en_US/LC_MESSAGES subpath resides (which in turn is where the translation file resides). If you’re wondering where the subpath en_US/LC_MESSAGES comes from, gettext constructs it from the locale name (here, the value of the LANG variable you set with putenv()) and the locale category LC_MESSAGES. You can call bindtextdomain() several times to bind as many domains as you want, in case you’ve split your translations across multiple files.
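To make the lookup rule concrete, here is a hedged sketch that builds the paths gettext would check for a given directory, locale, and domain. catalogCandidates() and findCatalog() are hypothetical helpers, and the fallback is simplified: real gettext implementations also try other variants (such as the locale with and without a codeset suffix), whereas this sketch only tries the full locale and then the bare language code (fr_FR, then fr).

```php
<?php
// Hypothetical helpers that mimic, in simplified form, where gettext looks
// for a catalog: <directory>/<locale>/LC_MESSAGES/<domain>.mo
function catalogCandidates(string $directory, string $locale, string $domain): array
{
    $locales = [$locale];
    // Simplified fallback: also try the bare language code (fr_FR -> fr).
    $underscore = strpos($locale, '_');
    if ($underscore !== false) {
        $locales[] = substr($locale, 0, $underscore);
    }

    $paths = [];
    foreach ($locales as $name) {
        $paths[] = "{$directory}/{$name}/LC_MESSAGES/{$domain}.mo";
    }
    return $paths;
}

// Returns the first candidate that exists, or null when there is no catalog
// (in which case gettext returns the untranslated msgid).
function findCatalog(string $directory, string $locale, string $domain): ?string
{
    foreach (catalogCandidates($directory, $locale, $domain) as $path) {
        if (is_file($path)) {
            return $path;
        }
    }
    return null;
}

$found = findCatalog('Locale', 'en_US', 'messages');
echo $found ?? 'No catalog found', "\n";
```

Run from the TestI18N directory, this should report Locale/en_US/LC_MESSAGES/messages.mo for the catalog created in Part 1.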
Calling bind_textdomain_codeset() is very important, because omitting it can lead to unexpected characters in your output when messages contain non-ASCII letters. Since the catalog messages are encoded in UTF-8, that is the codeset the example code sets. I always recommend UTF-8, as it is the most widely supported Unicode encoding. Don’t use other, less widely known encodings unless you know exactly what you are doing; otherwise you will run into serious problems, especially on the web.
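On glibc-based systems, if no codeset is bound, gettext converts messages to the character set of the current locale, which for a locale such as plain fr_FR is typically ISO-8859-1; those bytes then show up as garbage on a UTF-8 page. A quick development-time check can catch such output early. The sketch below uses a hypothetical isValidUtf8() helper built on PCRE’s /u modifier:

```php
<?php
// Hypothetical development-time check: true only if $text is well-formed
// UTF-8. With the /u modifier, preg_match() returns false (rather than 0 or
// 1) for malformed UTF-8, so only the PCRE extension is required.
function isValidUtf8(string $text): bool
{
    return preg_match('//u', $text) === 1;
}

var_dump(isValidUtf8('Bonjour tout le monde!')); // bool(true): ASCII is valid UTF-8
var_dump(isValidUtf8("Caf\u{00E9}"));            // bool(true): é as UTF-8 bytes C3 A9
var_dump(isValidUtf8("Caf\xE9"));                // bool(false): é as a lone ISO-8859-1 byte
```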
The call to textdomain() tells gettext which domain to use for any subsequent calls to gettext(), its shorthand alias _(), or its plural-form counterpart ngettext(). I’ll talk about dealing with plural forms in a later installment, but for now you should know that all three of these functions look up messages in the current domain specified with textdomain().
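The sketch below shows the three lookup functions side by side. It assumes PHP’s gettext extension is loaded (it only prints a notice otherwise) and that no catalog has been bound, so every call returns its untranslated input; with the messages catalog from Part 1 bound and the locale set, gettext() and _() would both return Hello World! instead. Passing null to textdomain() reports the current domain without changing it.

```php
<?php
// Requires PHP's gettext extension; without it the sketch only prints a notice.
if (!extension_loaded('gettext')) {
    echo "The gettext extension is not loaded.\n";
} else {
    textdomain('messages');          // make "messages" the current domain
    echo textdomain(null), "\n";     // null reports the current domain: messages

    // gettext() and _() perform the same lookup in the current domain.
    echo gettext('HELLO_WORLD'), "\n";
    echo _('HELLO_WORLD'), "\n";

    // ngettext() takes a singular form, a plural form, and a count; with no
    // translation available it returns the first form for 1, the second otherwise.
    $count = 3;
    printf(ngettext('%d file', '%d files', $count) . "\n", $count);
}
```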
Lastly, the script calls _(), which looks up the msgid HELLO_WORLD in the messages.mo file and returns the msgstr associated with it: the text Hello World!
Missing Translation Strings
Now that you have a basic understanding of how this simple script looks up translations, try changing the domain:
<?php
$language = "en_US";
putenv("LANG=" . $language);
setlocale(LC_ALL, $language);
$domain = "foo";
bindtextdomain($domain, "Locale");
bind_textdomain_codeset($domain, "UTF-8");
// ...
gettext will now try to look up the catalog Locale/en_US/LC_MESSAGES/foo.mo, which doesn’t exist.
When you view the script’s output, you’ll see HELLO_WORLD instead of Hello World! Because there is no valid catalog, gettext can’t perform a translation, and it is smart enough to fall back to the original string you supplied. The same fallback happens in another scenario: the catalog exists, but the requested msgid isn’t in any catalog registered with gettext.
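The fallback rule can be modeled with a plain PHP array standing in for a compiled catalog. This is a toy model for illustration (lookUp() is a hypothetical helper), not how gettext reads MO files, but it follows the same rule: a missing catalog or a missing entry returns the msgid unchanged. An empty msgstr is treated as missing too, since msgfmt leaves untranslated entries out of the MO file.

```php
<?php
// Toy model of gettext's fallback rule (lookUp() is hypothetical): a
// catalog is an array of msgid => msgstr, and a missing catalog, a missing
// entry, or an empty msgstr yields the msgid unchanged.
function lookUp(?array $catalog, string $msgid): string
{
    if ($catalog === null || !isset($catalog[$msgid]) || $catalog[$msgid] === '') {
        return $msgid;
    }
    return $catalog[$msgid];
}

$messages = ['HELLO_WORLD' => 'Hello World!'];

echo lookUp($messages, 'HELLO_WORLD'), "\n";      // Hello World!
echo lookUp(null, 'HELLO_WORLD'), "\n";           // HELLO_WORLD (no catalog, like foo.mo)
echo lookUp($messages, 'TEST_TRANSLATION'), "\n"; // TEST_TRANSLATION (no such entry)
```

With English strings as msgids, the same fallback would print readable English rather than a token such as HELLO_WORLD.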
Targeting Multiple Locales
In a real-world application, you will typically use your primary language’s strings as the IDs throughout your code. This makes the code easier to read and makes the fallback on a missing translation more user-friendly. For example, if your application targets English and French, you can use English strings as the IDs and then create French catalogs that replace them.
In the same TestI18N/Locale directory, create a new directory named fr_FR containing another LC_MESSAGES directory, and use the procedures outlined in Part 1 to create a new catalog for French. When you’re finished, you should have the following hierarchy:
TestI18N/
    test-locale.php
    Locale/
        en_US/
            LC_MESSAGES/
                messages.mo
                messages.po
        fr_FR/
            LC_MESSAGES/
                messages.mo
                messages.po
When you specify the catalog settings in Poedit, remember to set French as the language and France as the country.
My French messages.po will look like this when opened in a text editor:
msgid ""
msgstr ""
"Project-Id-Version: TestProject\n"
"POT-Creation-Date: \n"
"PO-Revision-Date: \n"
"Last-Translator: FIRSTNAME LASTNAME <email@example.com>\n"
"Language-Team: MyTeam <team@example.com>\n"
"MIME-Version: 1.0\n"
"Content-Type: text/plain; charset=utf-8\n"
"Content-Transfer-Encoding: 8bit\n"
"X-Poedit-Language: French\n"
"X-Poedit-Country: FRANCE\n"
"X-Poedit-SourceCharset: utf-8\n"

# Test token 1
msgid "HELLO_WORLD"
msgstr "Bonjour tout le monde!"

# Test token 2
msgid "TEST_TRANSLATION"
msgstr "Test de traduction..."
Most of the header lines are self-explanatory, so I’ll skip right to the translation entries, which start with the first msgid after the header. Notice that each phrase to be translated has two strings: the msgid, which is the ID string in your code that gettext looks up, and the msgstr, which is the translated message gettext substitutes for that ID. The first entry instructs gettext to use Bonjour tout le monde! whenever it sees HELLO_WORLD. The second instructs gettext to use Test de traduction... for TEST_TRANSLATION.
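Open the catalog file again in Poedit and click Save Catalog in the icon bar to save and compile it. Then modify the PHP script to use fr_FR instead of en_US. When you run it, you’ll see that the output in your browser is now in French! (This assumes the fr_FR locale is installed on your server; on many systems, setlocale() fails otherwise and gettext keeps returning the untranslated strings.)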
Summary
In this part you learned what each function call in the Hello World script from Part 1 does. In terms of its API, gettext isn’t a large library: there are only a handful of functions, and you will call most of them only once in your entire application. The ones you will use most frequently are gettext(), its shorthand alias _(), and its plural-form equivalent ngettext(). You also learned how to target multiple locales (en_US and fr_FR in our example), and how gettext falls back to the msgid when a translation is missing.
In the next part you’ll see how to start doing real-world localization by organizing the directories, switching between languages, choosing a fallback language, and overriding the currently selected messages domain.
Abdullah Abouzekry is an experienced web developer with over 7 years developing PHP/MySQL applications ranging from simple web sites to extensive web-based business applications. Although his main experience is with PHP/MySQL and related web technologies, he has developed and localized many desktop applications in C#, Python/Qt, Java, and C++. When not writing code, Abdullah likes to read, listen to oriental music, and have fun with his little family.