Localizing PHP Applications “The Right Way”, Part 4

This entry is part 4 of 5 in the series Localizing PHP Applications "The Right Way"

In Part 3 you learned some of the more important aspects of localizing a real-world application, such as using a default fallback locale and separating messages into multiple domain files depending on their usage. In this part I’ll show you what is arguably the most powerful feature of gettext: handling plural forms. This feature enables you to localize your application correctly and professionally.

Plural Forms

The early programmers who first wrote messages for display to the user soon ran into a problem. When using variable numbers as part of a message, the correct form of the counted noun should be used. For example, it’s proper to say “1 file was deleted,” but “30 file was deleted” is not. The word “file” should appear in its plural form since 30 is a plural value in English.

An if statement is often a workaround for languages with simple grammatical cases, such as English:

<?php
if ($i == 1) {
    echo $i . " file deleted.";
}
else {
    echo $i . " files deleted.";
}

If all languages followed the same grammatical numbers as English then such code would probably be sufficient. Handling plural forms might be a nuisance, but not much thought would have been given to the problem. But many languages have a richer set of grammatical numbers. In Polish, for example, plural forms differ for various numbers:

1 plik (file)
2-4 pliki
5-21 plików
22-24 pliki 
25-31 plików
...

Another example is Arabic which has at least 6 rules to define the plural forms of nouns. Obviously a hard-coded if statement is not a sustainable solution in an application that targets multiple locales!
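To make the maintenance problem concrete, here is a minimal sketch of what the hard-coded approach looks like for Polish alone. The function name deletedFilesMessagePl() is hypothetical, and the Polish strings are the translations used later in this article. Every message containing a counted noun would need its own copy of this logic, for every language.

```php
<?php
// Hypothetical helper: hard-codes the Polish plural rules for ONE message.
function deletedFilesMessagePl(int $n): string
{
    if ($n == 1) {
        return $n . " plik został usunięty.";
    }

    $lastDigit = $n % 10;   // remainder after dividing by 10
    $lastTwo   = $n % 100;  // remainder after dividing by 100

    // Counts ending in 2-4 take "pliki", except those ending in 12-14.
    if ($lastDigit >= 2 && $lastDigit <= 4 && ($lastTwo < 10 || $lastTwo >= 20)) {
        return $n . " pliki zostały usunięte.";
    }

    return $n . " plików zostało usuniętych.";
}

foreach ([1, 3, 5, 12, 22] as $n) {
    echo deletedFilesMessagePl($n), "\n";
}
```

Repeating a block like this for each message and each locale is exactly the kind of duplication that gettext's plural handling is meant to remove.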

Luckily gettext’s approach to the problem is much cleaner. First, you provide a rule that specifies plural forms in the header of the domain and then you provide translation messages for each of the possible plural forms. gettext then uses the rule to determine which is the correct translation to display.

Plural Forms in Action

To be able to use plural forms inside your domain, you need to write a rule for dealing with plurals for the catalog’s target language; this rule will be included in the header of the PO file telling gettext how to handle plural forms when looking up translations.

Create a new directory for the Polish locale (pl_PL), then create the LC_MESSAGES directory and the messages.po file inside it as outlined in the previous parts of this series, so that your Hello World script is Polish-enabled.

Open messages.po using a text editor and append the following line to its header:

"Plural-Forms: nplurals=3; plural=n==1 ? 0 : n%10>=2 && n%10<=4 && (n%100<10 || n%100>=20) ? 1 : 2;\n"

and append the following entry after the existing messages:

msgid "%d file deleted."
msgid_plural "%d files deleted."
msgstr[0] "%d plik został usunięty."
msgstr[1] "%d pliki zostały usunięte."
msgstr[2] "%d plików zostało usuniętych."

Save the file, open it in Poedit, compile and close it. Then, go to test-locale.php and append the following lines:

<?php
// ngettext() only selects the message; printf() substitutes the count for %d.
echo "<br>";
printf(ngettext("%d file deleted.", "%d files deleted.", 1), 1);
echo "<br>";
printf(ngettext("%d file deleted.", "%d files deleted.", 2), 2);
echo "<br>";
printf(ngettext("%d file deleted.", "%d files deleted.", 5), 5);

Now when you run the script with the parameter lang=pl_PL you should see the last three lines displayed in Polish:

1 plik został usunięty.
2 pliki zostały usunięte.
5 plików zostało usuniętych.
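Keep in mind that ngettext() returns the selected message with the %d placeholder still in it; filling in the number is a separate step, usually done with printf() or sprintf(). The sketch below wraps both steps in a helper. The name deletedFilesMessage() is hypothetical, and the stand-in ngettext() at the top is an assumption added only so the example runs where the gettext extension is not loaded; it mimics the behavior when no translation is found.

```php
<?php
// Stand-in so the sketch runs without the gettext extension. It mimics
// the untranslated fallback: singular for a count of 1, plural otherwise.
if (!function_exists('ngettext')) {
    function ngettext(string $singular, string $plural, int $count): string
    {
        return $count == 1 ? $singular : $plural;
    }
}

// Hypothetical helper: pick the right form, then substitute the count.
function deletedFilesMessage(int $count): string
{
    $format = ngettext("%d file deleted.", "%d files deleted.", $count);
    return sprintf($format, $count);
}

echo deletedFilesMessage(5), "\n";
```

With the Polish catalog loaded, the same helper would return the Polish form selected by the Plural-Forms rule without any change to the calling code.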

The Plural Forms Rule

Let’s take a closer look at the plural forms rule added to the header section:

"Plural-Forms: nplurals=3; plural=n==1 ? 0 : n%10>=2 && n%10<=4 && (n%100<10 || n%100>=20) ? 1 : 2;\n"

The rule starts with the label “Plural-Forms:”, which tells gettext that the rule that follows should be used to decide which form to use for a given number n. “nplurals=3” tells gettext that there are three plural forms for each noun in this locale. A semicolon then separates this from the rest of the rule.

The next part of the rule starts with “plural=” and is an expression describing the selection criteria for the plural forms. It uses typical C-language syntax with a few exceptions: no negative numbers are allowed, numbers must be decimal, and the only variable allowed is n. Spaces are allowed in the expression, but backslash-escaped newlines are not.

The provided expression is evaluated whenever one of the functions ngettext(), dngettext(), or dcngettext() is called. The numeric value passed to these functions is substituted for every use of the variable n. The resulting value must be greater than or equal to zero and smaller than the value of nplurals. For example, if nplurals states there are three plural forms, the plural expression must only yield 0, 1, or 2; it shouldn’t yield a value of 5.

The final integer calculated by the plurals expression specifies which plural form to choose for a particular n value, so if the returned integer from the expression is 0 when substituting n=1 for instance, then the plural form “0” will be used, if it equals 1, the plural form “1” will be used, and so forth.

To state the plural forms rule in words: if n is equal to 1, form “0” (plik) is used. Otherwise, if the remainder of n divided by 10 is in the inclusive range 2-4 and the remainder of n divided by 100 is less than 10 or greater than or equal to 20, form “1” (pliki) is used. In all other cases form “2” (plików) is used.
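One way to check your reading of the rule is to port the expression to PHP and run it over a few counts. This is only an illustrative sketch: polishPluralIndex() is a hypothetical name, gettext evaluates the expression itself and never calls PHP code like this, and the extra parentheses around the nested ternary are there because PHP 8 requires them.

```php
<?php
// Hypothetical PHP port of the Polish "plural=" expression.
// Returns the index of the msgstr[] entry that would be selected.
function polishPluralIndex(int $n): int
{
    return $n == 1
        ? 0
        : (($n % 10 >= 2 && $n % 10 <= 4 && ($n % 100 < 10 || $n % 100 >= 20)) ? 1 : 2);
}

// Index 0, 1, 2 correspond to msgstr[0], msgstr[1], msgstr[2].
$forms = ["plik", "pliki", "plików"];

foreach ([1, 2, 5, 12, 22, 25, 101, 112] as $n) {
    $index = polishPluralIndex($n);
    printf("%d -> form %d (%s)\n", $n, $index, $forms[$index]);
}
```

Note that 12 and 112 map to form 2 even though they end in 2; that is what the n%100 part of the rule is for. The result is always 0, 1, or 2, which stays below nplurals=3 as required.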

You can find plural forms rules for various language families, and more about how to translate the plural forms in your domains, by reading the GNU gettext documentation. Specifically, the Plural Forms and Translating Plural Forms sections will be the most helpful.

Summary

In this part of the localization series you saw one of the most powerful features of gettext. Programmers who write their own inline if statements each time they need to present a pluralized message will soon find the approach cumbersome and fragile. gettext abstracts the logic so you can keep your code clean. All you need to do is determine the correct expression for calculating the plural form index for each language you translate, then provide the msgid and msgid_plural pair and a set of indexed msgstr entries to choose from. gettext takes care of the rest.

The next (and final) part of the series will show you how to automate the process of extracting msgid strings from your PHP code into a template file from which you can generate the catalogs for your individual locales.

Image via sgame / Shutterstock

  • Florian

    No way…
    You would never write <?php if ($i == 1) {echo $i . " file deleted.";}else{echo $i . " files deleted.";}
    You would use <?php echo $i . ' file' . ($i == 1 ? '' : 's') . ' deleted';

    • Philipp

      @Florian Actually using ternary operators is considered bad practice.
      http://en.wikipedia.org/wiki/%3F:#.3F:_in_style_guidelines

      • John

        Actually, it’s believing wikipedia on software topics which is bad practice. Of course they’d use ternary select there; that’s what it’s for.

  • http://kojidesign.co.uk Pawel

    Hi,
    I just want to thank you for great article and correct couple things:)

    For example this part:

    msgid “%d file deleted.”
    msgid_plural “%d files deleted.”
    msgstr[0] “%d plik został usunięty.”
    msgstr[1] “%d pliki zostały usunięte.”
    msgstr[2] “%d plików zostały usunięte.” <- here should be msgstr[2] "%d plików zostało usuniętych." – it's Polish grammar:)

    Whole rest is ok:)
    Great job

    • http://zaemis.blogspot.com Timothy Boronczyk

      I’ve corrected the appearances in the article. Dziękuję!