Localizing PHP Applications “The Right Way”, Part 5

This entry is part 5 of 5 in the series Localizing PHP Applications "The Right Way"

In Part 4 you learned how to use gettext for one of the most complex aspects of localization a developer can face: plural forms. In this, the final part of the five-part series, I’ll teach you how to automate part of the localization process by extracting msgids and generating a PO template file (.pot) from your application’s PHP code. Let’s dive right in!

Extracting Strings from Source

You’ve seen how powerful gettext is, and how easy it was to incorporate localization into your applications. But what about ongoing maintenance? As your application matures, text strings are sure to be added, updated, and removed. Extracting strings for use as msgids and organizing them by hand is a daunting task, even with just a small codebase. Here’s where xgettext can help.

xgettext is a command-line tool that ships with the gettext package you downloaded and installed in Part 1. Its purpose is to extract strings from your source code and generate a domain template, saving you time and hassle. xgettext is not PHP specific; you can use it to extract strings from code written in over 15 popular programming languages, including C, C++, C#, Java, Perl, PHP, and Python, to name just a few.

Before you begin, make sure your Test18N directory is up to date with the following structure created in the previous parts of this series.

[Image: recap of the Test18N directory structure]

Open test_locale.php and replace its contents with the following code:

<?php
require_once "locale.php";

echo _("Hello World!") . "<br>";
echo _("Testing Translation...") . "<br>";
echo _("Please login first:") . "<br>";
echo _("Click on the link below") . "<br>";
echo _("Shutdown system") . "<br>";

echo '<a href="test_page_1.php">' . _("Go To Page 1") . "</a>";

Then, create a new file named test_page_1.php with the following contents:

<?php
require_once "locale.php";

echo _("Errors occurred") . "<br>";
echo _("Please fix this") . "<br>";
echo _("Click on the link below") . "<br>";

echo dgettext("errors", "Error getting content") . "<br>";
echo dgettext("errors", "Error saving data") . "<br>";

echo '<a href="test_locale.php">' . _("Back To Home") . '</a>&nbsp;|&nbsp;<a href="test_page_2.php">' . _("Go To Page 2") . "</a>";

And finally, create a new file named test_page_2.php with the following contents:

<?php
require_once "locale.php";

echo _("If you want to read more") . "<br>";
echo _("Please login first:") . "<br>";

echo sprintf(ngettext("%d file", "%d files", 1), 1) . "<br>";
echo sprintf(ngettext("%d file", "%d files", 2), 2) . "<br>";
echo sprintf(ngettext("%d file", "%d files", 5), 5) . "<br>";

echo '<a href="test_locale.php">' . _("Back To Home") . '</a>&nbsp;|&nbsp;<a href="test_page_1.php">' . _("Go To Page 1") . "</a>";

You should now have three files that emulate a slightly larger application. As in a real-world app, some messages are repeated in more than one file. If you were to extract the strings by hand, you’d have to sort them and remove any duplicates when creating your translation file.
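To get a feel for that manual work, here is a minimal sketch that pulls strings out of _() calls with a regular expression and drops the duplicates. The collect_msgids() function and its pattern are my own illustration, not part of gettext, and the pattern only handles simple double-quoted literals without escaped quotes.

```php
<?php
// Hypothetical helper (not part of gettext): collect the unique strings
// passed to _("...") in the given chunks of PHP source.
function collect_msgids(array $sources): array
{
    $found = [];
    foreach ($sources as $code) {
        // Capture the text between the quotes of each _("...") call.
        preg_match_all('/\b_\(\s*"([^"]*)"\s*\)/', $code, $matches);
        $found = array_merge($found, $matches[1]);
    }
    // Drop duplicates and reindex, keeping first-seen order.
    return array_values(array_unique($found));
}

$sources = [
    'echo _("Please login first:") . "<br>"; echo _("Click on the link below") . "<br>";',
    'echo _("Click on the link below") . "<br>";',
];
echo implode("\n", collect_msgids($sources)), "\n";
```

Even this toy version ignores dgettext(), ngettext(), and source references, which is a good part of why a dedicated tool is worth using.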

Now for the magic of automation. Open a terminal window, go to the Test18N directory, and run the following command:

abouzekry@sandbox:~/htdocs/Test18N$ xgettext --from-code=UTF-8 -o messages.pot *.php

This instructs xgettext to extract messages from all the PHP files in the current directory. xgettext assumes any file is ASCII by default, and its output may contain unexpected results if the source strings contain any non-ASCII characters. To be on the safe side I’ve overridden its assumption with UTF-8 using the --from-code option. The -o option instructs xgettext to write its output to a file named messages.pot, the base file you’ll be using shortly for all your translations.

Before going any further, it’s worth noting that xgettext has some limitations. Most notably, it writes all the strings to a single file. For example, test_page_1.php uses dgettext() to look up some of its translations in the errors domain, but those strings have still been put into messages.pot. You can either use only a single domain, or split messages.pot afterwards into the appropriate files. I really like having specialized translation domains to keep everything organized, so this is the approach I will encourage here.

Copy the file messages.pot as errors.pot and edit errors.pot to remove all of the messages except those relating to the errors domain. You should keep only the following messages:

#: test_page_1.php:9
msgid "Error getting content"
msgstr ""

#: test_page_1.php:10
msgid "Error saving data"
msgstr ""

Then, edit messages.pot to remove the error-related messages from that file.
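If you would rather not do this split by hand every time, the copy-and-edit steps can be sketched in a few lines of PHP. This is only an illustration under some assumptions: split_pot() is a hypothetical helper of my own, the list of error msgids is supplied manually, entries are separated by blank lines, and every msgid fits on a single line.

```php
<?php
// Hypothetical helper: split POT text into an "errors" template and a
// "messages" template, based on a hand-supplied list of error msgids.
function split_pot(string $pot, array $errorMsgids): array
{
    // Entries in a POT file are separated by blank lines.
    $entries = preg_split('/\R{2,}/', trim($pot));
    // The first entry is the POT header; both templates need a copy.
    $header = array_shift($entries);
    $errors = [$header];
    $messages = [$header];
    foreach ($entries as $entry) {
        $isError = preg_match('/^msgid "(.*)"$/m', $entry, $m)
            && in_array($m[1], $errorMsgids, true);
        if ($isError) {
            $errors[] = $entry;
        } else {
            $messages[] = $entry;
        }
    }
    return [
        'errors' => implode("\n\n", $errors) . "\n",
        'messages' => implode("\n\n", $messages) . "\n",
    ];
}

// Shortened sample template; a real header has more lines.
$pot = <<<'POT'
msgid ""
msgstr ""

#: test_page_1.php:9
msgid "Error getting content"
msgstr ""

#: test_locale.php:4
msgid "Hello World!"
msgstr ""
POT;

$parts = split_pot($pot, ["Error getting content", "Error saving data"]);
echo $parts['errors'];
```

In practice you would read messages.pot with file_get_contents() and write the two results back with file_put_contents().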

Using the Templates

Now you have two templates to start with, one for the messages domain (messages.pot) and another for the errors domain (errors.pot). Start Poedit, choose File > New catalog from POT file, and open messages.pot.

[Image: Poedit File menu]

Fill in the necessary parameters as outlined in Part 2. I will be creating a French language catalog using the UTF-8 encoding. It’s also important to specify the appropriate plural forms expression. For French, it is “nplurals=2; plural=n>1;”. After you click OK, you’ll be asked to save the new PO file created from the template. Save it as the corresponding Locale/fr_FR/LC_MESSAGES/messages.po file.

[Image: Poedit settings window]
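To make the French plural expression concrete, here is a small sketch of how “nplurals=2; plural=n>1;” maps a count to one of two forms. It only mimics the rule for illustration: gettext evaluates the expression itself, french_plural_index() is a name I made up, and the French strings are my own assumed translations of "%d file" and "%d files".

```php
<?php
// Illustration only: mimic how "nplurals=2; plural=n>1;" picks a form.
// Index 0 is used for 0 and 1, index 1 for everything greater than 1.
function french_plural_index(int $n): int
{
    return $n > 1 ? 1 : 0;
}

// Assumed translations, in the order Poedit's plural tabs would hold them.
$forms = ["%d fichier", "%d fichiers"];

foreach ([0, 1, 2, 5] as $n) {
    echo sprintf($forms[french_plural_index($n)], $n), "\n";
}
```

Note that zero takes the singular form under this rule, which differs from the English rule (plural=n != 1) covered in Part 4.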

Poedit then opens the PO file and displays the original strings and their translations. You can add your translations directly in Poedit, or send the file off to your translator to work on while you focus on your application’s PHP code.

As a side note, dealing with plural forms in Poedit’s interface is easy. When you click on the singular form in the translation list, you’ll see tabs at the bottom for each form in which you can input appropriate translations.

[Image: Poedit plural form tabs]

Once you’re finished providing the translations for each msgid, choose File > Save or click the Save Catalog entry in the icon bar to save and generate the necessary MO file. Then do the same procedure for errors.pot, saving it to Locale/fr_FR/LC_MESSAGES/errors.po. You’ll need to repeat the process for each template for each language you have.

When at least the French locale’s MO files are in place, run the test_locale.php script to make sure everything is working.
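If the translations do not show up, a quick first check is whether the MO files are where gettext expects them. The following is a minimal sketch, assuming the Locale/fr_FR/LC_MESSAGES layout used above; missing_mo_files() is a hypothetical helper, not a gettext function.

```php
<?php
// Hypothetical helper: return the paths of MO files that are missing
// for the given locale and list of domains.
function missing_mo_files(string $baseDir, string $locale, array $domains): array
{
    $missing = [];
    foreach ($domains as $domain) {
        $path = "$baseDir/Locale/$locale/LC_MESSAGES/$domain.mo";
        if (!is_file($path)) {
            $missing[] = $path;
        }
    }
    return $missing;
}

// Assumes this script sits in the application's root directory.
$missing = missing_mo_files(__DIR__, "fr_FR", ["messages", "errors"]);
echo $missing
    ? "Missing: " . implode(", ", $missing) . "\n"
    : "All MO files found\n";
```

This only confirms that the files exist; it does not verify that the locale itself is installed on the server.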

Summary

In this last part of the series, you learned how to extract translation strings automatically from your PHP source files using the xgettext tool, generating a PO template file. The template can then be used to generate any target domain catalogs you need, leaving the cumbersome process of message extraction to the computer.

Throughout the five parts you’ve learned how localization can be just a matter of writing separate translation files for a target locale and then referencing them with gettext(), its shorthand alias _(), and its plural counterpart ngettext(). You’ve also seen how taking advantage of gettext’s fallback behavior can lead to more readable code and translation catalogs, and how translations can be neatly organized into their own domains (messages.po for general messages, errors.po for error strings, etc.).

I’ve enjoyed writing this series and want to thank you for taking the time to learn how to localize your PHP applications “the right way” with gettext. gettext really is a wonderful open-source tool that helps make your life easier by allowing you to concentrate on your code.

Image via sgame / Shutterstock

  • http://artur.ejsmont.org/blog/ Ejsmont

    Hi there, thanks for sharing.

    If I were to recommend anything, it would be to use higher-level libraries like Zend Framework translate, etc.

    From my experience it is quite nice to have language-agnostic strings like BTN_ACTIVATE_ACCOUNT, so you can easily search for them, and a change in the English version does not require updating all the other languages.

    Cheers

  • Tanoor

    This is simply the best tutorial I’ve ever found on gettext.
    Good work !
    Poedit offers the possibility to extract the gettext strings.
    You did not mention it.

    Anyway, thanks for your work.

  • Jose

    Simple and clear.
    Thanks, it was really helpful