Localization Demystified: Php-Intl for Everyone

Most applications perform locale-aware operations such as working with text, dates and time zones. The PHP Intl extension provides a good API for accessing the functions of the widely known ICU library.

Installation

The extension is bundled with PHP 5.3 and above, but it may not be enabled on your system. You can check for it by running the following command:

php -m | grep 'intl'
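The same check can be made from inside PHP, which helps when the command line and the web server load different configuration files. Here is a minimal sketch using extension_loaded() and the INTL_ICU_VERSION constant that the extension defines; the variable name is only illustrative.

```php
<?php
// True when the intl extension is loaded in the current PHP process.
$intlLoaded = extension_loaded('intl');

if ($intlLoaded) {
    // INTL_ICU_VERSION is defined by the intl extension itself.
    echo 'intl is available, ICU version ', INTL_ICU_VERSION, PHP_EOL;
} else {
    echo 'intl is not loaded', PHP_EOL;
}
```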

If the extension is not present, you can install it manually by following the installation guide. If you’re using Ubuntu, you can directly run the following commands.

sudo apt-get update
sudo apt-get install php5-intl

If you’re using PHP 7 on your machine, you need to add the ppa:ondrej/php PPA, update your system and install the Intl extension.

# Add PPA
sudo add-apt-repository ppa:ondrej/php
# Update repository index
sudo apt-get update
# install extension
sudo apt-get install php7.0-intl

Message Formatting

Most modern applications are built with localization in mind. Sometimes the message is a plain string with variable placeholders; other times it’s a complex pluralized string.

Simple Messages

We’re going to start with a simple message containing a placeholder. Placeholders are patterns enclosed in curly braces. Here is an example:

var_dump(
    MessageFormatter::formatMessage(
        "en_US",
        "I have {0, number, integer} apples.",
        [ 3 ]
    )
);
// output

string(16) "I have 3 apples."

The arguments passed to the MessageFormatter::formatMessage method are:

  • The message locale.
  • String message.
  • Placeholder data.

The {0, number, integer} placeholder will inject the first item of the data array as a number, formatted as an integer (see the table below for the list of options). We can also use named arguments for placeholders. The example below will output the same result.

var_dump(
    MessageFormatter::formatMessage(
        "en_US",
        "I have {number_apples, number, integer} apples.",
        [ 'number_apples' => 3 ]
    )
);

Different languages use different numeral systems, such as Arabic or Indian numerals.

Arabic numerals

The previous example is targeting the en_US locale. Let’s change it to ar to see the difference.

var_dump(
    MessageFormatter::formatMessage(
        "ar",
        "I have {number_apples, number, integer} apples.",
        [ 'number_apples' => 3 ]
    )
);
string(17) "I have ٣ apples."

We can also change it to the Bengali locale (bn).

var_dump(
    MessageFormatter::formatMessage(
        "bn",
        "I have {number_apples, number, integer} apples.",
        [ 'number_apples' => 3 ]
    )
);
string(18) "I have ৩ apples."

So far, we’ve only worked with numbers. Let’s take a look at other types that we can use.

$time = time();
var_dump( MessageFormatter::formatMessage(
    "en_US",
    "Today is {0, date, full} - {0, time}",
    array( $time )
) );
// output
string(47) "Today is Wednesday, April 6, 2016 - 11:21:47 PM"

var_dump( MessageFormatter::formatMessage(
    "en_US",
    "duration: {0, duration}",
    array( $time )
) );
// output
string(23) "duration: 405,551:27:58"

We can also spell out the passed numbers.

var_dump( MessageFormatter::formatMessage(
    "en_US",
    "I have {0, spellout} apples",
    array( 34 )
) );
string(25) "I have thirty-four apples"

It also works on different locales. Here is an example using the Arabic language.

var_dump( MessageFormatter::formatMessage(
    "ar",
    "لدي {0, spellout} تفاحة",
    array( 34 )
) );
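string(44) "لدي أربعة و ثلاثون تفاحة"

argType    | argStyle
number     | integer, currency, percent
date       | short, medium, long, full
time       | short, medium, long, full
spellout   | (none; an optional ICU rule set name)
ordinal    | (none; an optional ICU rule set name)
duration   | (none; an optional ICU rule set name)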
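To see a few of the remaining table entries in action, here is a short sketch using the currency and percent styles and the ordinal type. The messages and variable names are made up for illustration; the outputs in the comments assume the en_US locale.

```php
<?php
// number + currency style: uses the default currency of the locale (USD for en_US).
$currency = MessageFormatter::formatMessage('en_US', 'Total: {0, number, currency}', [1234.5]);

// number + percent style: 0.25 is rendered as a percentage.
$percent = MessageFormatter::formatMessage('en_US', 'Progress: {0, number, percent}', [0.25]);

// ordinal type: no style needed.
$ordinal = MessageFormatter::formatMessage('en_US', 'You finished {0, ordinal}', [3]);

echo $currency, PHP_EOL; // Total: $1,234.50
echo $percent, PHP_EOL;  // Progress: 25%
echo $ordinal, PHP_EOL;  // You finished 3rd
```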

Pluralization

An important part of localizing our application is managing plural messages to make our UI as intuitive as possible. The apples example above will do the job. Here’s how the messages should look in this case.

  • (number_apples = 0): I have no apples.
  • (number_apples = 1): I have one apple.
  • (number_apples > 1): I have X apples.

var_dump( MessageFormatter::formatMessage(
    "en_US",
    'I have {number_apples, plural, =0{no apples} =1{one apple} other{# apples}}',
    array('number_apples' => 10)
) );
// output

// number_apples = 0
string(16) "I have no apples"

// number_apples = 1
string(16) "I have one apple"

// number_apples = 10
string(16) "I have 10 apples"

The syntax is really straightforward, and most pluralization packages adopt this syntax. Check the documentation for more details.

{data, plural, offsetValue =value{message}... other{message}}
  • data: the index or name of the value in the data array.
  • plural: argType.
  • offsetValue: optional, written as offset:value. The offset is subtracted from the value before it is injected through the # character; the explicit =value cases are still compared against the original value.
  • =value{message}: value to test for equality, and the message between curly braces. We can repeat this part multiple times (=0{no apples} =1{one apple} =2{two apples}).
  • other{message}: The default case, like in a switch - case statement. The # character may be used to inject the data value.
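The three cases listed earlier can be checked in one go with a small loop. The sketch below reuses the pattern from the example; the second pattern is an addition that relies on the plural category keywords (one, other) which ICU also accepts in place of explicit =value cases. Variable names are illustrative.

```php
<?php
$pattern = 'I have {number_apples, plural, =0{no apples} =1{one apple} other{# apples}}';

// Format the same pattern for each count we care about.
$results = [];
foreach ([0, 1, 10] as $count) {
    $results[$count] = MessageFormatter::formatMessage(
        'en_US',
        $pattern,
        ['number_apples' => $count]
    );
}
print_r($results);
// [0] => I have no apples, [1] => I have one apple, [10] => I have 10 apples

// Plural category keywords let the locale rules decide which branch applies.
$byCategory = 'I have {n, plural, one{# apple} other{# apples}}';
$single = MessageFormatter::formatMessage('en_US', $byCategory, ['n' => 1]); // I have 1 apple
$couple = MessageFormatter::formatMessage('en_US', $byCategory, ['n' => 2]); // I have 2 apples
```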

Choices

In some cases, we need to print a different message for every range. The example below does this.

var_dump( MessageFormatter::formatMessage(
    "en_US",
    'The value of {0,number} is {0, choice,
                                        0 # between 0 and 19 |
                                        20 # between 20 and 39 |
                                        40 # between 40 and 59 |
                                        60 # between 60 and 79 |
                                        80 # between 80 and 100 |
                                        100 < more than 100 }',
    array(60)
) );
string(38) "The value of 60 is between 60 and 79 "

The argType in this case is set to choice, and this is the syntax format:

{value, choice, choiceStyle}

The official definition from the ICU documentation is:

choiceStyle = number separator message ('|' number separator message)*

number = normal_number | ['-']  ∞ (U+221E, infinity)
normal_number = double value (unlocalized ASCII string)

separator = less_than | less_than_or_equal
less_than = '<'
less_than_or_equal = '#' |  ≤ (U+2264)

Note: ICU developers discourage the use of the choice type.
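Since choice is discouraged, one possible alternative is to pick the range in PHP and let the select type (described in the next section) choose the message. This is only a sketch: the rangeKeyword() helper and the keyword names (r0, r20, and so on) are hypothetical and not part of the Intl API.

```php
<?php
// Hypothetical helper: map a value to a keyword naming its range.
function rangeKeyword(float $value): string
{
    if ($value > 100) return 'over';
    if ($value >= 80) return 'r80'; // 80 to 100 inclusive
    if ($value >= 60) return 'r60';
    if ($value >= 40) return 'r40';
    if ($value >= 20) return 'r20';
    return 'r0';                    // everything below 20
}

// 'over' is not listed in the pattern, so it falls through to "other".
$pattern = 'The value of {value, number} is {range, select, '
    . 'r0 {between 0 and 19} '
    . 'r20 {between 20 and 39} '
    . 'r40 {between 40 and 59} '
    . 'r60 {between 60 and 79} '
    . 'r80 {between 80 and 100} '
    . 'other {more than 100}}';

$value = 60;
$message = MessageFormatter::formatMessage(
    'en_US',
    $pattern,
    ['value' => $value, 'range' => rangeKeyword($value)]
);
echo $message, PHP_EOL; // The value of 60 is between 60 and 79
```

Keeping the range logic in PHP means translators only see plain keywords and messages, not numeric boundaries.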

Select

Sometimes we need something like the select option UI component. Profile pages use this to update the UI messages according to the user’s gender, etc. Here’s an example:

var_dump( MessageFormatter::formatMessage(
    "en_US",
    "{gender, select, ".
      "female {She has some apples} ".
      "male {He has some apples.}".
      "other {It has some apples.}".
    "}",
    array('gender' => 'female')
) );
string(19) "She has some apples"

The pattern is defined as follows:

{value, select, selectStyle}

// selectStyle
selectValue {message} (selectValue {message})*

The message argument may contain other patterns like choice and plural. The next part will explain a complex example where we combine multiple patterns. Check the ICU documentation for more details.
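One detail worth seeing in code is that any value without a matching selectValue falls back to the other branch. A minimal sketch, with made-up messages and variable names:

```php
<?php
$pattern = '{gender, select, '
    . 'female {She has some apples.} '
    . 'male {He has some apples.} '
    . 'other {They have some apples.}}';

$selected = [];
foreach (['female', 'male', 'unknown'] as $gender) {
    // 'unknown' is not listed in the pattern, so "other" is used.
    $selected[$gender] = MessageFormatter::formatMessage('en_US', $pattern, ['gender' => $gender]);
}
print_r($selected);
```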

Complex Cases

So far, we’ve seen some simple examples like pluralization, select, etc. Some cases are more complex than others. The ICU documentation has a very good example illustrating this. We’ll build it up part by part to make it simpler to grasp.

var_dump( MessageFormatter::formatMessage(
    "en_US",
    "{gender_of_host, select, ".
      "female {She has a party} ".
      "male {He has some apples.}".
      "other {He has some apples.}".
    "}",
    array('gender_of_host' => 'female', "num_guests" => 5, 'host' => "Hanae", 'guest' => 'Younes' )
) );

This is the same kind of select we used before. Next, instead of using a simple message for the female case, we customize it depending on the num_guests value using pluralization.

var_dump( MessageFormatter::formatMessage(
    "en_US",
    "{gender_of_host, select, ".
      "female {".
        "{num_guests, plural, offset:1 ".
          "=0 {{host} does not have a party.}".
          "=1 {{host} invites {guest} to her party.}".
          "=2 {{host} invites {guest} and one other person to her party.}".
          "other {{host} invites {guest} and # other people to her party.}}}".
      "male {He has some apples.}".
      "other {He has some apples.}}",
    array('gender_of_host' => 'female', "num_guests" => 5, 'host' => "Hanae", 'guest' => 'Younes' )
) );

Notice that we’re using offset:1 to subtract one guest (the one we mention by name) from the num_guests value before it is injected through the # character.

string(53) "Hanae invites Younes and 4 other people to her party."

Here’s the full snippet of this example.

var_dump( MessageFormatter::formatMessage(
    "en_US",
    "{gender_of_host, select, ".
      "female {".
        "{num_guests, plural, offset:1 ".
          "=0 {{host} does not have a party.}".
          "=1 {{host} invites {guest} to her party.}".
          "=2 {{host} invites {guest} and one other person to her party.}".
          "other {{host} invites {guest} and # other people to her party.}}}".
      "male {".
        "{num_guests, plural, offset:1 ".
          "=0 {{host} does not have a party.}".
          "=1 {{host} invites {guest} to his party.}".
          "=2 {{host} invites {guest} and one other person to his party.}".
          "other {{host} invites {guest} and # other people to his party.}}}".
      "other {".
        "{num_guests, plural, offset:1 ".
          "=0 {{host} does not have a party.}".
          "=1 {{host} invites {guest} to their party.}".
          "=2 {{host} invites {guest} and one other person to their party.}".
          "other {{host} invites {guest} and # other people to their party.}}}}",
    array('gender_of_host' => 'female', "num_guests" => 5, 'host' => "Hanae", 'guest' => 'Younes' )
) );

Change the number of guests to test all message types.

// num_guests = 2
string(55) "Hanae invites Younes and one other person to her party."

// num_guests = 1
string(34) "Hanae invites Younes to her party."

// num_guests = 0
string(28) "Hanae does not have a party."
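The offset behaviour used above can be isolated in a smaller sketch. The explicit =0 and =1 cases are compared against the original value, while the # character and the plural keywords (one, other) work on the value minus the offset. The pattern and the argument names n and guest are illustrative only.

```php
<?php
$pattern = '{n, plural, offset:1 '
    . '=0 {Nobody is coming.} '
    . '=1 {{guest} is coming.} '
    . 'one {{guest} and # other person are coming.} '
    . 'other {{guest} and # other people are coming.}}';

$lines = [];
foreach ([0, 1, 2, 5] as $n) {
    $lines[$n] = MessageFormatter::formatMessage('en_US', $pattern, ['n' => $n, 'guest' => 'Younes']);
}
print_r($lines);
// [0] => Nobody is coming.
// [1] => Younes is coming.
// [2] => Younes and 1 other person are coming.
// [5] => Younes and 4 other people are coming.
```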

Message Parsing

There’s not much to say about parsing messages; we use the same kind of pattern we used for formatting to extract data from an output message.

$messageFormatter = new MessageFormatter("en_US", 'I have {0, number}');
var_dump( $messageFormatter->parse("I have 10 apples") );
// output
array(1) {
  [0]=>
  int(10)
}

Check the documentation for more details about message parsing.
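Parsing can fail when the input does not match the pattern, in which case parse() returns false. Here is a hedged sketch of handling both outcomes; the pattern and input strings are made up for illustration.

```php
<?php
$formatter = new MessageFormatter('en_US', 'I have {0, number, integer} apples');

// Matching input: the grouping separator is understood for the en_US locale.
$parsed = $formatter->parse('I have 1,234 apples');
var_dump($parsed);

// Non-matching input: parse() returns false and the formatter keeps the error.
$failed = $formatter->parse('No fruit here');
if ($failed === false) {
    echo 'Parse error: ', $formatter->getErrorMessage(), PHP_EOL;
}
```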

Conclusion

In this introductory post, we learned about localizing our messages using the PHP Intl extension. The next part will cover formatting numbers and dates, and how to work with calendars. If you have any questions about what we’ve covered so far, you can post them in the comments below!

Frequently Asked Questions (FAQs) on PHP Localization and Intl

What is the role of the PHP Intl extension in localization?

The PHP Intl extension, also known as the Internationalization extension, is a wrapper for the ICU library (International Components for Unicode), enabling PHP programmers to perform locale-sensitive operations. These operations include formatting and parsing of dates, times, numbers, and currencies, message formatting, and even complex operations like text segmentation or transliteration. It’s a powerful tool that helps developers build applications that can support multiple languages and cultural conventions, making them accessible to a global audience.

How do I install and enable the PHP Intl extension?

The Intl extension has been bundled with PHP since version 5.3.0, but it is not always enabled. When compiling PHP yourself, pass the --enable-intl option to configure; on most Linux distributions, install the packaged extension instead (for example, php7.0-intl on Ubuntu). In both cases the ICU library must be available on the system. On older PHP versions, you can install the PECL package with “pecl install intl” and add “extension=intl.so” to your php.ini file. Remember to restart your web server after these changes for them to take effect.

How does PHP localization handle date and time formatting?

PHP localization uses the IntlDateFormatter class to handle date and time formatting. This class allows you to format dates and times according to the locale’s rules. For example, you can use the format method to format a date or time, and the parse method to convert a string back into a timestamp. This ensures that dates and times are displayed in a way that is familiar to the user, regardless of their location.
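A short round trip illustrates the format and parse methods mentioned above. This sketch pins the time zone to UTC and uses timestamp 0 so the result does not depend on when or where it runs; the variable names are illustrative.

```php
<?php
// Full date, no time, rendered in UTC for the en_US locale.
$dateFormatter = new IntlDateFormatter(
    'en_US',
    IntlDateFormatter::FULL,
    IntlDateFormatter::NONE,
    'UTC'
);

$formatted = $dateFormatter->format(0);          // Thursday, January 1, 1970
$timestamp = $dateFormatter->parse($formatted);  // back to 0

echo $formatted, PHP_EOL;
var_dump($timestamp);
```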

How can I use PHP Intl for number formatting?

The NumberFormatter class in the PHP Intl extension is used for number formatting. It provides methods for formatting numbers, currencies, and percentages according to locale rules. You can use the format method to format a number, and the parse method to convert a string back into a number. This helps to ensure that numbers are displayed in a format that is familiar to the user, regardless of their location.
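As a rough illustration, the sketch below formats and parses a number with German conventions and shows the percent and currency styles for en_US. Note that the class is named NumberFormatter; the variable names and sample values are illustrative.

```php
<?php
// Decimal formatting with German separators.
$german = new NumberFormatter('de_DE', NumberFormatter::DECIMAL);
$formatted = $german->format(1234567.891);   // 1.234.567,891
$parsed = $german->parse('1.234.567,891');   // float close to 1234567.891

// Percent and currency styles for en_US.
$percent = (new NumberFormatter('en_US', NumberFormatter::PERCENT))->format(0.25); // 25%
$price = (new NumberFormatter('en_US', NumberFormatter::CURRENCY))
    ->formatCurrency(1234.5, 'USD');                                               // $1,234.50

echo $formatted, PHP_EOL, $percent, PHP_EOL, $price, PHP_EOL;
```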

What is message translation in PHP localization?

Message translation is the part of localization that provides your application’s strings in multiple languages. In PHP this is commonly done with the gettext extension: you use the _() or gettext() function to mark strings for translation, and then use tools like Poedit to create translation files. These files can then be used to display the application in different languages. The Intl extension complements this by formatting the translated patterns with the MessageFormatter class.

How can I use PHP Intl for text segmentation?

Text segmentation is the process of breaking down a text into its constituent parts, such as sentences, words, or individual characters. The PHP Intl extension provides the IntlBreakIterator class for this purpose. You can use the createWordInstance method to create a word iterator, and then use the setText method to set the text to be segmented. The iterator can then be used to iterate over the words in the text.
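Here is a minimal sketch of word segmentation. The word iterator also reports spaces and punctuation as separate parts, so this example filters them out with a regular expression; that filter is one possible approach for illustration, not something the extension requires.

```php
<?php
$text = 'Hello, world!';

$iterator = IntlBreakIterator::createWordInstance('en_US');
$iterator->setText($text);

// Every segment between two boundaries, including spaces and punctuation.
$parts = iterator_to_array($iterator->getPartsIterator(), false);

// Keep only the parts that contain a letter or a digit.
$words = array_values(array_filter($parts, function ($part) {
    return preg_match('/[\p{L}\p{N}]/u', $part) === 1;
}));

print_r($words); // Hello, world
```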

What is transliteration in PHP Intl?

Transliteration is the process of converting text from one script to another. For example, you might want to convert a text in Cyrillic script to Latin script. The PHP Intl extension provides the Transliterator class for this purpose. You can use the create method to create a transliterator with a specific transliteration rule, and then use the transliterate method to perform the transliteration.
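For instance, the Cyrillic to Latin conversion mentioned above might look like the sketch below. It uses the ICU transform ID Cyrillic-Latin; the sample word and variable names are illustrative.

```php
<?php
// Create a transliterator from an ICU transform ID.
$toLatin = Transliterator::create('Cyrillic-Latin');

$latin = $toLatin->transliterate('Привет');
echo $latin, PHP_EOL; // Privet
```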

How can I handle plurals in PHP localization?

Handling plurals in PHP localization can be done using the MessageFormatter class in the PHP Intl extension. This class allows you to format messages with complex plural rules. You can use the formatMessage method to format a message with plural rules, passing in the locale, the message pattern, and the arguments.

How can I handle collation in PHP localization?

Collation is the process of sorting and comparing strings. The PHP Intl extension provides the Collator class for this purpose. You can use the create method to create a collator for a specific locale, and then use the compare method to compare two strings according to the collation rules of that locale.
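The effect of the locale is easiest to see with a letter that sorts differently across languages. In this sketch, ä sorts near a in German but after z in Swedish; the word list and variable names are illustrative.

```php
<?php
$german = Collator::create('de_DE');
$swedish = Collator::create('sv_SE');

// compare() returns a negative, zero or positive integer.
$inGerman = $german->compare('ä', 'z');   // negative: ä comes before z
$inSwedish = $swedish->compare('ä', 'z'); // positive: ä comes after z

// sort() reorders the array in place using the collator's rules.
$words = ['Zebra', 'Äpfel', 'Apfel'];
$german->sort($words);
print_r($words); // Apfel, Äpfel, Zebra
```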

How can I handle locales in PHP localization?

Handling locales in PHP localization can be done using the Locale class in the PHP Intl extension. This class provides methods for getting information about a locale, such as its language, script, region, and variants. You can also use the getDefault method to get the default locale, and the setDefault method to set the default locale.
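A brief sketch of the Locale methods described above, using zh_Hant_TW as a sample identifier because it carries a language, a script and a region. Variable names are illustrative.

```php
<?php
$locale = 'zh_Hant_TW';

$language = Locale::getPrimaryLanguage($locale); // zh
$script = Locale::getScript($locale);            // Hant
$region = Locale::getRegion($locale);            // TW

// Human-readable language name of fr_FR, displayed in English.
$name = Locale::getDisplayLanguage('fr_FR', 'en'); // French

// Change and read back the default locale used by Intl classes.
Locale::setDefault('en_US');
$default = Locale::getDefault(); // en_US

echo "$language $script $region $name $default", PHP_EOL;
```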

Younes Rafie

Younes is a freelance web developer, technical writer and a blogger from Morocco. He's worked with JAVA, J2EE, JavaScript, etc., but his language of choice is PHP. You can learn more about him on his website.
