In Part 4 you learned how to use gettext for one of the most complex aspects of localization a developer can face: plural forms. In this, the final part of the five-part series, I’ll teach you how to automate part of the localization process by extracting msgids and generating a PO template file (.pot) from your application’s PHP code. Let’s dive right in!
Extracting Strings from Source
You’ve seen how powerful gettext is, and how easy it is to incorporate localization into your applications. But what about ongoing maintenance? As your application matures, text strings are sure to be added, updated, and removed. Extracting strings for use as msgids and organizing them by hand is a daunting task, even with just a small codebase. Here’s where xgettext can help.
xgettext is a command-line tool that is part of the gettext library which you downloaded and installed in Part 1… a very useful tool indeed! Its purpose is to simplify the extraction of strings from your source code and generate a domain template, thereby saving you time and hassle. xgettext is not PHP specific; you can use it to extract strings from code written in over 15 popular programming languages, including C, C++, C#, Java, Perl, PHP, and Python to name just a few.
Before you begin, make sure your Test18N directory is up to date with the structure created in the previous parts of this series.
Open test_locale.php and replace its contents with the following code:
<?php

require_once "locale.php";

echo _("Hello World!") . "<br>";
echo _("Testing Translation...") . "<br>";
echo _("Please login first:") . "<br>";
echo _("Click on the link below") . "<br>";
echo _("Shutdown system") . "<br>";

echo '<a href="test_page_1.php">' . _("Go To Page 1") . "</a>";
Then, create a new file named test_page_1.php with the following contents:
<?php

require_once "locale.php";

echo _("Errors occurred") . "<br>";
echo _("Please fix this") . "<br>";
echo _("Click on the link below") . "<br>";

echo dgettext("errors", "Error getting content") . "<br>";
echo dgettext("errors", "Error saving data") . "<br>";

echo '<a href="test_locale.php">' . _("Back To Home") . '</a> | <a href="test_page_2.php">' . _("Go To Page 2") . "</a>";
And finally, create a new file named test_page_2.php with the following contents:
<?php

require_once "locale.php";

echo _("If you want to read more") . "<br>";
echo _("Please login first:") . "<br>";

echo sprintf(ngettext("%d file", "%d files", 1), 1) . "<br>";
echo sprintf(ngettext("%d file", "%d files", 2), 2) . "<br>";
echo sprintf(ngettext("%d file", "%d files", 5), 5) . "<br>";

echo '<a href="test_locale.php">' . _("Back To Home") . '</a> | <a href="test_page_1.php">' . _("Go To Page 1") . "</a>";
Now you should have three files to emulate a slightly larger application. As in a real-world app, some messages are repeated in more than one file. If you were to extract the strings by hand, you’d have to sort them and remove any duplicates when creating your translation file.
Now for the magic of automation. Open a terminal window, go to the Test18N directory, and run the following command:
abouzekry@sandbox:~/htdocs/Test18N$ xgettext --from-code=UTF-8 -o messages.pot *.php
This instructs xgettext to extract messages from all the PHP files in the current directory. xgettext assumes any file is ASCII by default, and its output may contain unexpected results if the source strings contain any non-ASCII characters. To be on the safe side I’ve overridden its assumption with UTF-8 using the --from-code option. The -o option instructs xgettext to write its output to a file named messages.pot, the base file you’ll be using shortly for all your translations.
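To see how xgettext deals with the repeated messages mentioned earlier, here is a minimal sketch you can run in a scratch directory. The file names page_a.php and page_b.php are hypothetical stand-ins, not part of the Test18N application; the xgettext invocation is the same one used above.

```shell
# Work in a scratch directory so the real Test18N files are left untouched.
workdir="$(mktemp -d)"
cd "$workdir" || exit 1

# Two tiny stand-in pages (hypothetical names) that share one message.
cat > page_a.php <<'EOF'
<?php
echo _("Please login first:") . "<br>";
echo _("Hello World!") . "<br>";
EOF

cat > page_b.php <<'EOF'
<?php
echo _("Please login first:") . "<br>";
EOF

# Same invocation as in the article.
xgettext --from-code=UTF-8 -o messages.pot *.php

# Count how many entries the shared message produced in the template.
grep -c '^msgid "Please login first:"' messages.pot

# Show the entry together with its source-reference comment.
grep -B2 '^msgid "Please login first:"' messages.pot
```

The count should be 1: xgettext merges identical msgids into a single entry and records each source location in the "#:" reference comment, so there is no sorting or de-duplicating left to do by hand.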
Before going any further, it’s worth noting that xgettext has some limitations. Most notably, it writes all the strings to a single file (test_page_1.php, for example, uses dgettext() to look up some of the translations in the errors domain, but the strings have all been put into messages.pot). You can either use only a single domain, or you can split messages.pot afterwards into the appropriate files. I really like having specialized translation domains to keep everything organized, so this is the approach I will encourage here.
Copy the generated template file to errors.pot, and edit errors.pot to remove all of the messages except those relating to the errors domain. You should keep only the following messages:
#: test_page_1.php:9
msgid "Error getting content"
msgstr ""

#: test_page_1.php:10
msgid "Error saving data"
msgstr ""
Then edit messages.pot to remove the error-related messages from that file.
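If you would rather not split the template by hand, one possible alternative is to run xgettext once per domain and restrict which functions it scans for. The sketch below is not the approach described above; it relies on xgettext’s -k (--keyword) option and uses a hypothetical scratch file, test_page.php. It also assumes that dgettext() is only ever called with the errors domain, because xgettext looks at the function name, not at the domain argument.

```shell
# Work in a scratch directory with one small stand-in page (hypothetical name).
workdir="$(mktemp -d)"
cd "$workdir" || exit 1

cat > test_page.php <<'EOF'
<?php
echo _("Please fix this") . "<br>";
echo sprintf(ngettext("%d file", "%d files", 2), 2) . "<br>";
echo dgettext("errors", "Error saving data") . "<br>";
EOF

# A bare -k switches off xgettext's built-in keyword list;
# every following -k adds back one function name to scan for.

# Default-domain calls only: _(), gettext() and ngettext().
xgettext --from-code=UTF-8 -k -k_ -kgettext -kngettext:1,2 -o messages.pot *.php

# dgettext() calls only; ":2" says the message is the second argument.
xgettext --from-code=UTF-8 -k -kdgettext:2 -o errors.pot *.php

# List what ended up in each template.
grep '^msgid' messages.pot
grep '^msgid' errors.pot
```

With more than one extra domain this shortcut stops working on its own, since every dgettext() string lands in the same file regardless of its domain, and you would be back to splitting that file manually.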
Using the Templates
Now you have two templates to start with, one for the messages domain (messages.pot) and another for the errors domain (errors.pot). Start Poedit, choose New catalog from POT file, and open the template for the messages domain. Fill in the necessary parameters as outlined in Part 2. I will be creating a French language catalog using the UTF-8 encoding. It’s also important to specify the appropriate plural forms expression. For French, it is “nplurals=2; plural=n>1;”. After you click OK, you’ll be asked to save the new PO file created from the template. Save it as the corresponding .po file in the Locale/fr_FR/LC_MESSAGES directory.
Poedit then opens the PO file and displays the original strings and their translations. You can add your translations directly in Poedit, or send the file off to your translator to work on while you focus on your application’s PHP code.
As a side note, dealing with plural forms in Poedit’s interface is easy. When you click on the singular form in the translation list, you’ll see tabs at the bottom for each form in which you can input appropriate translations.
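To make the French plural expression concrete, here is a small sketch that mirrors “nplurals=2; plural=n>1;” using shell arithmetic. The function name fr_plural_index is hypothetical and exists only for illustration; gettext evaluates the expression from the catalog header itself.

```shell
# Return the index of the plural form selected for a count in French,
# mirroring the header expression "nplurals=2; plural=n>1;".
# fr_plural_index is a hypothetical helper, not part of gettext.
fr_plural_index() {
  echo $(( $1 > 1 ))
}

# Try the counts used in test_page_2.php, plus zero.
for n in 0 1 2 5; do
  echo "n=$n -> msgstr[$(fr_plural_index "$n")]"
done
```

The loop prints msgstr[0] for 0 and 1, and msgstr[1] for 2 and 5. In other words, the first tab in Poedit holds the form used for zero or one item, and the second tab holds the form used for everything else.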
Once you’re finished providing the translations for each msgid, choose Save or click the Save Catalog entry in the icon bar to save and generate the necessary MO file. Then do the same procedure for errors.pot, saving it to Locale/fr_FR/LC_MESSAGES/errors.po. You’ll need to repeat the process for each template for each language you have.
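If you prefer the command line to Poedit, the same template-to-catalog steps can be sketched with two other tools that ship with gettext, msginit and msgfmt. This is an alternative workflow, not the one described above, and it runs in a scratch directory with a hypothetical one-line page so that it does not touch your Test18N files.

```shell
# Work in a scratch directory and build a small template to start from.
workdir="$(mktemp -d)"
cd "$workdir" || exit 1

cat > test_page.php <<'EOF'
<?php
echo _("Hello World!") . "<br>";
EOF
xgettext --from-code=UTF-8 -o messages.pot *.php

# Same directory layout as used throughout the series.
mkdir -p Locale/fr_FR/LC_MESSAGES

# Create a French PO file from the template.
# --no-translator keeps msginit from prompting for an e-mail address.
msginit --no-translator --locale=fr_FR.UTF-8 \
        --input=messages.pot \
        --output-file=Locale/fr_FR/LC_MESSAGES/messages.po

# ... fill in the msgstr translations in messages.po here ...

# Compile the PO file into the binary MO file that gettext reads at runtime.
msgfmt -o Locale/fr_FR/LC_MESSAGES/messages.mo \
       Locale/fr_FR/LC_MESSAGES/messages.po
```

It is worth opening the generated PO file afterwards to check its header, in particular the charset and Plural-Forms lines, before sending it to a translator.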
When at least the French locale’s MO files are in place, test the test_locale.php script to make sure everything is working.
In this last part of the series, you learned how to extract translation strings automatically from your PHP source files using the xgettext tool, generating a PO template file. The template can then be used for generating any target domain catalogs you need, thus leaving the cumbersome process of message extraction to the computer.
Throughout the five parts you’ve learned how localization can be just a matter of writing separate translation files for a target locale and then referencing them using gettext(), its shorthand alias _(), and its plural counterpart ngettext(). You’ve also seen how taking advantage of gettext’s fallback behavior can lead to more readable code and translation catalogs, and how translations can be neatly organized into their own domains (messages.po for general messages, errors.po for error strings, etc.).
I’ve enjoyed writing this series and want to thank you for taking the time to learn how to localize your PHP applications “the right way” with gettext. gettext really is a wonderful open-source tool that helps make your life easier by allowing you to concentrate on your code.
Abdullah Abouzekry is an experienced web-developer with over 7 years developing PHP/MySQL applications ranging from simple web sites to extensive web-based business applications. Although his main experience is with PHP/MySQL and related web technologies, he has developed and localized many desktop applications in C#, Python/Qt, Java, and C++. When not writing code, Abdullah likes to read, listen to oriental music, and have fun with his little family.