Localization Demystified: Php-Intl for Everyone

Younes Rafie

Most applications perform locale-aware operations such as working with text, dates, and time zones. The PHP Intl extension provides a good API for accessing the functions of the widely known ICU library.

Installation

The extension has been bundled with PHP since version 5.3, although it is not necessarily enabled in every build. You can check for it by running the following command:

php -m | grep 'intl'
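
The same check can also be done from inside a script, which can be handy in a bootstrap file. This is a minimal sketch using PHP’s built-in extension_loaded() function; the $hasIntl variable name is only for illustration.

```php
<?php
// Check for the extension at runtime instead of from the shell.
$hasIntl = extension_loaded('intl');

echo $hasIntl
    ? "intl is available\n"
    : "intl is missing; MessageFormatter will not be defined\n";
```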

If the extension is not present, you can install it manually by following the installation guide. If you’re using Ubuntu, you can directly run the following commands.

sudo apt-get update
sudo apt-get install php5-intl

If you’re using PHP 7 on your machine, you need to add the ppa:ondrej/php PPA, update your system, and install the Intl extension.

# Add PPA
sudo add-apt-repository ppa:ondrej/php
# Update repository index
sudo apt-get update
# install extension
sudo apt-get install php7.0-intl

Message Formatting

Most modern applications are built with localization in mind. Sometimes the message is a plain string with variable placeholders; other times it’s a complex pluralized string.

Simple Messages

We’re going to start with a simple message containing a placeholder. Placeholders are patterns enclosed in curly braces. Here is an example:

var_dump(
    MessageFormatter::formatMessage(
        "en_US",
        "I have {0, number, integer} apples.",
        [ 3 ]
    )
);
// output

string(16) "I have 3 apples."

The arguments passed to the MessageFormatter::formatMessage method are:

  • The message locale.
  • String message.
  • Placeholder data.

The {0, number, integer} placeholder will inject the first item of the data array as a number formatted as an integer (see the table below for the list of options). We can also use named arguments for placeholders. The example below will output the same result.

var_dump(
    MessageFormatter::formatMessage(
        "en_US",
        "I have {number_apples, number, integer} apples.",
        [ 'number_apples' => 3 ]
    )
);
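
Named placeholders become more useful once a message has several values. The sketch below also uses the object-oriented form, building the formatter once and calling format() on it. The number_pears key and the variable names are made up for illustration.

```php
<?php
// Build the formatter once, then reuse it with different data.
$formatter = new MessageFormatter(
    "en_US",
    "I have {number_apples, number, integer} apples and {number_pears, number, integer} pears."
);

$message = $formatter->format([
    'number_apples' => 3,
    'number_pears'  => 2,
]);

// format() returns false on failure, so check before using the result.
if ($message === false) {
    echo $formatter->getErrorMessage(), "\n";
} else {
    echo $message, "\n"; // I have 3 apples and 2 pears.
}
```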

Different languages use different numeral systems, such as Arabic or Indian numerals.

The previous example is targeting the en_US locale. Let’s change it to ar to see the difference.

var_dump(
    MessageFormatter::formatMessage(
        "ar",
        "I have {number_apples, number, integer} apples.",
        [ 'number_apples' => 3 ]
    )
);
string(17) "I have ٣ apples."

We can also change it to the Bengali locale (bn).

var_dump(
    MessageFormatter::formatMessage(
        "bn",
        "I have {number_apples, number, integer} apples.",
        [ 'number_apples' => 3 ]
    )
);
string(18) "I have ৩ apples."

So far, we’ve only worked with numbers. Let’s take a look at other types that we can use.

$time = time();
var_dump( MessageFormatter::formatMessage(
    "en_US",
    "Today is {0, date, full} - {0, time}",
    array( $time )
) );
string(47) "Today is Wednesday, April 6, 2016 - 11:21:47 PM"
var_dump( MessageFormatter::formatMessage(
    "en_US",
    "duration: {0, duration}",
    array( $time )
) );
string(23) "duration: 405,551:27:58"

We can also spell out the passed numbers.

var_dump( MessageFormatter::formatMessage(
    "en_US",
    "I have {0, spellout} apples",
    array( 34 )
) );
string(25) "I have thirty-four apples"

It also works on different locales. Here is an example using the Arabic language.

var_dump( MessageFormatter::formatMessage(
    "ar",
    "لدي {0, spellout} تفاحة",
    array( 34 )
) );
string(44) "لدي أربعة و ثلاثون تفاحة"

argType    argStyle
number     integer, currency, percent
date       short, medium, long, full
time       short, medium, long, full
spellout   (none predefined)
ordinal    (none predefined)
duration   (none predefined)
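
To make the table more concrete, here is a small sketch that tries a few argType and argStyle combinations in the en_US locale. The sample values and variable names are arbitrary. Date and time styles are left out because their output depends on the current time and time zone.

```php
<?php
// Each call formats one value with a different argType/argStyle pair.
$percent  = MessageFormatter::formatMessage("en_US", "{0, number, percent}", [0.25]);
$currency = MessageFormatter::formatMessage("en_US", "{0, number, currency}", [1234.5]);
$ordinal  = MessageFormatter::formatMessage("en_US", "{0, ordinal}", [3]);

echo $percent, "\n";  // 25%
echo $currency, "\n"; // $1,234.50
echo $ordinal, "\n";  // 3rd
```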

Pluralization

An important part of localizing our application is managing plural messages so that the UI reads naturally. The apples example above is a good candidate. Here’s what the messages should look like in this case.

  • (number_apples = 0): I have no apples.
  • (number_apples = 1): I have one apple.
  • (number_apples > 1): I have X apples.

var_dump( MessageFormatter::formatMessage(
    "en_US",
    'I have {number_apples, plural, =0{no apples} =1{one apple} other{# apples}}',
    array('number_apples' => 10)
) );
// number_apples = 0
string(16) "I have no apples"

// number_apples = 1
string(16) "I have one apple"

// number_apples = 10
string(16) "I have 10 apples"

The syntax is really straightforward, and most pluralization packages adopt this syntax. Check the documentation for more details.

{data, plural, offsetValue =value{message}... other{message}}

  • data: the key or index of the value in the data array.
  • plural: the argType.
  • offsetValue: an optional offset, written as offset:value. It subtracts the offset from the value before the plural form is selected; explicit =value cases still compare against the original value.
  • =value{message}: a value to test for equality, followed by the message between curly braces. We can repeat this part multiple times (=0{no apples} =1{one apple} =2{two apples}).
  • other{message}: the default case, like default in a switch statement. The # character may be used to inject the data value.
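
Besides explicit =value cases, ICU plural patterns also accept plural category keywords such as one; which categories exist depends on the locale. The following is a minimal sketch of mixing both forms for en_US, with $pattern and $results as illustrative names.

```php
<?php
// "one" is a plural category keyword; "other" remains the required fallback.
$pattern = 'I have {number_apples, plural, =0{no apples} one{# apple} other{# apples}}';

$results = [];
foreach ([0, 1, 10] as $count) {
    $results[$count] = MessageFormatter::formatMessage(
        "en_US",
        $pattern,
        ['number_apples' => $count]
    );
    echo $results[$count], "\n";
}
// I have no apples
// I have 1 apple
// I have 10 apples
```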

Choices

In some cases, we need to print a different message for every range. The example below does this.

var_dump( MessageFormatter::formatMessage(
    "en_US",
    'The value of {0,number} is {0, choice,
                                        0 # between 0 and 19 |
                                        20 # between 20 and 39 |
                                        40 # between 40 and 59 |
                                        60 # between 60 and 79 |
                                        80 # between 80 and 100 |
                                        100 < more than 100 }',
    array(60)
) );
string(38) "The value of 60 is between 60 and 79 "

The argType in this case is set to choice, and this is the syntax format:

{value, choice, choiceStyle}

The official definition from the ICU documentation is:

choiceStyle = number separator message ('|' number separator message)*

number = normal_number | ['-']  ∞ (U+221E, infinity)
normal_number = double value (unlocalized ASCII string)

separator = less_than | less_than_or_equal
less_than = '<'
less_than_or_equal = '#' |  ≤ (U+2264)

Note: ICU developers discourage the use of the choice type.
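
Since the choice type is discouraged, one possible alternative is to work out the range in PHP and let a select argument pick the message. This is only a sketch of that idea, not an approach taken from the ICU documentation. The rangeKey() helper and the from0 ... from80 keywords are made up for this example, and intdiv() requires PHP 7.

```php
<?php
// Hypothetical helper: maps a number to a range keyword used by the pattern.
function rangeKey(int $value): string
{
    if ($value > 100) {
        return 'over';
    }
    // Clamp so that negative values use the first bucket
    // and 100 falls into the 80-100 bucket.
    $lower = min(intdiv(max($value, 0), 20) * 20, 80);
    return 'from' . $lower;
}

$pattern = 'The value of {value, number} is {range, select, '
    . 'from0 {between 0 and 19} '
    . 'from20 {between 20 and 39} '
    . 'from40 {between 40 and 59} '
    . 'from60 {between 60 and 79} '
    . 'from80 {between 80 and 100} '
    . 'other {more than 100}}';

$message = MessageFormatter::formatMessage(
    "en_US",
    $pattern,
    ['value' => 60, 'range' => rangeKey(60)]
);
echo $message, "\n"; // The value of 60 is between 60 and 79
```

The range logic now lives in ordinary PHP code, and translators only see a list of plain messages.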

Select

Sometimes we need to pick one message from a fixed set of options, much like a select UI component. Profile pages, for example, use this to adapt UI messages to the user’s gender. Here’s an example:

var_dump( MessageFormatter::formatMessage(
    "en_US",
    "{gender, select, ".
      "female {She has some apples} ".
      "male {He has some apples.}".
      "other {It has some apples.}".
    "}",
    array('gender' => 'female')
) );
string(19) "She has some apples"

The pattern is defined as follows:

{value, select, selectStyle}

// selectStyle
selectValue {message} (selectValue {message})*

The message argument may contain other patterns like choice and plural. The next part will explain a complex example where we combine multiple patterns. Check the ICU documentation for more details.
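
The other branch acts as the fallback when the passed value matches none of the listed keywords. A small sketch of that behavior follows; the 'unknown' value and the wording of the other message are chosen only for this illustration.

```php
<?php
$pattern = "{gender, select, "
    . "female {She has some apples.} "
    . "male {He has some apples.} "
    . "other {They have some apples.}}";

// 'unknown' is not listed in the pattern, so the "other" branch is used.
$results = [];
foreach (['female', 'male', 'unknown'] as $gender) {
    $results[$gender] = MessageFormatter::formatMessage("en_US", $pattern, ['gender' => $gender]);
    echo $results[$gender], "\n";
}
// She has some apples.
// He has some apples.
// They have some apples.
```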

Complex Cases

So far, we’ve seen some simple examples like pluralization and select. Some cases are more complex than others, and the ICU documentation has a very good example illustrating this. We’ll build it up part by part to make it simpler to grasp.

var_dump( MessageFormatter::formatMessage(
    "en_US",
    "{gender_of_host, select, ".
      "female {She has a party} ".
      "male {He has some apples.}".
      "other {He has some apples.}".
    "}",
    array('gender_of_host' => 'female', "num_guests" => 5, 'host' => "Hanae", 'guest' => 'Younes' )
) );

This is the same select pattern we used before. In the next step, instead of using a simple message for the female case, we customize it depending on the num_guests value using pluralization.

var_dump( MessageFormatter::formatMessage(
    "en_US",
    "{gender_of_host, select, ".
      "female {".
        "{num_guests, plural, offset:1 ".
          "=0 {{host} does not have a party.}".
          "=1 {{host} invites {guest} to her party.}".
          "=2 {{host} invites {guest} and one other person to her party.}".
          "other {{host} invites {guest} and # other people to her party.}}}".
      "male {He has some apples.}".
      "other {He has some apples.}}",
    array('gender_of_host' => 'female', "num_guests" => 5, 'host' => "Hanae", 'guest' => 'Younes' )
) );

Notice that we’re using offset:1 to subtract one guest (the one already named in the message) from the num_guests value.

string(53) "Hanae invites Younes and 4 other people to her party."
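
To isolate how the offset behaves, here is a reduced sketch with a shorter, made-up pattern. The explicit =0 and =1 cases are compared against the original num_guests value, while # prints the value with the offset already subtracted.

```php
<?php
$pattern = '{num_guests, plural, offset:1 '
    . '=0 {Nobody is invited.} '
    . '=1 {{guest} is invited.} '
    . 'other {{guest} and # more are invited.}}';

$results = [];
foreach ([0, 1, 5] as $numGuests) {
    $results[$numGuests] = MessageFormatter::formatMessage(
        "en_US",
        $pattern,
        ['num_guests' => $numGuests, 'guest' => 'Younes']
    );
    echo $results[$numGuests], "\n";
}
// Nobody is invited.
// Younes is invited.
// Younes and 4 more are invited.
```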

Here’s the full snippet of this example.

var_dump( MessageFormatter::formatMessage(
    "en_US",
    "{gender_of_host, select, ".
      "female {".
        "{num_guests, plural, offset:1 ".
          "=0 {{host} does not have a party.}".
          "=1 {{host} invites {guest} to her party.}".
          "=2 {{host} invites {guest} and one other person to her party.}".
          "other {{host} invites {guest} and # other people to her party.}}}".
      "male {".
        "{num_guests, plural, offset:1 ".
          "=0 {{host} does not have a party.}".
          "=1 {{host} invites {guest} to his party.}".
          "=2 {{host} invites {guest} and one other person to his party.}".
          "other {{host} invites {guest} and # other people to his party.}}}".
      "other {".
        "{num_guests, plural, offset:1 ".
          "=0 {{host} does not have a party.}".
          "=1 {{host} invites {guest} to their party.}".
          "=2 {{host} invites {guest} and one other person to their party.}".
          "other {{host} invites {guest} and # other people to their party.}}}}",
    array('gender_of_host' => 'female', "num_guests" => 5, 'host' => "Hanae", 'guest' => 'Younes' )
) );

Change the number of guests to test all message types.

// num_guests = 2
string(55) "Hanae invites Younes and one other person to her party."

// num_guests = 1
string(34) "Hanae invites Younes to her party."

// num_guests = 0
string(28) "Hanae does not have a party."

Message Parsing

There’s not much to say about parsing messages; we use the pattern we used for formatting to extract data from an output message.

$messageFormatter = new MessageFormatter("en_US", 'I have {0, number} apples');
var_dump( $messageFormatter->parse("I have 10 apples") );
array(1) {
  [0]=>
  int(10)
}

Check the documentation for more details about message parsing.
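
Parsing can fail when the input does not match the pattern, in which case parse() returns false. The sketch below handles both outcomes; the non-matching input string is made up for illustration.

```php
<?php
$formatter = new MessageFormatter("en_US", 'I have {0, number} apples');

// A matching string yields the extracted values.
$parsed = $formatter->parse("I have 10 apples");
var_dump($parsed[0]); // int(10)

// A string that does not match the pattern makes parse() return false.
$failed = $formatter->parse("You own some pears");
if ($failed === false) {
    echo $formatter->getErrorMessage(), "\n";
}
```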

Conclusion

In this introductory post, we learned about localizing our messages using the PHP Intl extension. The next part will cover formatting numbers and dates, and how to work with calendars. If you have any questions about what we’ve covered so far, you can post them in the comments below!