Internationalisation

I implemented gettext in my app to handle internationalisation, translation and localisation.

However, I’m finding gettext to be somewhat problematic when applied to a distributable system.

A little background info
We use gettext at work on a site that’s available in about 10 languages. It works fine:

  • translators like to work with poEdit
  • we have full control over the hosting environment, so there’s no trouble with missing libraries or locales
  • gettext is incredibly fast, as the strings are read from memory and are cached across requests

I decided to use gettext in my private project too, but now that other people are starting to use it, I’m facing a couple of problems:

  • locales are often not installed on the servers
  • gettext is often not installed on the server
  • there is no shell access, or exec() is disabled, so it’s difficult or impossible to use xgettext et al.
  • how to distribute .po and .mo files for plugins and widgets

I know I could use a simple lookup array, but I don’t want to go through 400+ files changing all the texts to array keys. Consider the following:


Error::raise(trans('The requested page does not exist.'));

The trans() function is a simple wrapper for gettext. Now, I could do something like this (for a Dutch translation):


$strings = array();
$strings['The requested page does not exist.'] = 'De opgevraagde pagina bestaat niet';

echo trans('The requested page does not exist.');
// output: De opgevraagde pagina bestaat niet

Is this bad form? It works, but I don’t really know if having an array with such keys is a good idea. Some strings consist of several lines of text.
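For reference, here is a minimal sketch of what an array-backed lookup could look like. The function name trans_array() and the fall-back to the source text are assumptions made for illustration; the real trans() in the app wraps gettext.

```php
<?php
// Hypothetical array-backed variant of trans(); not the app's real wrapper.
function trans_array(string $text, array $strings): string
{
    // Assumption: return the source text when no translation is found,
    // which mirrors what gettext does for unknown message ids.
    return $strings[$text] ?? $text;
}

$strings = [
    'The requested page does not exist.' => 'De opgevraagde pagina bestaat niet',
];

echo trans_array('The requested page does not exist.', $strings), "\n";
echo trans_array('This text has no translation yet.', $strings), "\n";
```

With this shape the existing trans('…') calls keep their English source text as the key, so none of the 400+ files would need to change.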

I could do this:


$strings = array();
$strings['page_not_exist'] = 'De opgevraagde pagina bestaat niet';

echo trans('page_not_exist');

Something like that would work too, but… I don’t want to have to go through 400+ files changing trans() calls. The system contains about 1700 strings so far, so it’s not really a viable option.

I’ve implemented the php-gettext (https://launchpad.net/php-gettext/) library as a fall-back for users who don’t have gettext installed or are missing the required locales on their servers, but I have no idea how it performs with 1700+ strings.
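The fall-back decision itself can be sketched roughly as follows. This is only an illustration: make_translator() and its $useNative parameter are hypothetical names, and a plain array stands in for whatever fall-back reader is used, so no php-gettext calls appear here.

```php
<?php
// Hypothetical factory: pick the native extension when available,
// otherwise use a pure-PHP lookup (a plain array stands in for the fall-back).
function make_translator(array $fallbackStrings, ?bool $useNative = null): callable
{
    // Assumption: detect the extension unless the caller forces a choice.
    if ($useNative === null) {
        $useNative = extension_loaded('gettext');
    }

    if ($useNative) {
        // Native path: gettext() returns the msgid itself when no translation exists.
        return function (string $text): string {
            return gettext($text);
        };
    }

    // Fall-back path: array lookup, returning the source text on a miss.
    return function (string $text) use ($fallbackStrings): string {
        return $fallbackStrings[$text] ?? $text;
    };
}

// Force the fall-back path so the example behaves the same on every server.
$translate = make_translator(
    ['The requested page does not exist.' => 'De opgevraagde pagina bestaat niet'],
    false
);
echo $translate('The requested page does not exist.'), "\n";
```

Keeping the choice behind a single callable means the rest of the code never needs to know which backend is active.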

I think it’s acceptable to require users who want to develop and translate plugins or widgets to have exec() enabled and GNU gettext installed on their machine (I think they’ll mostly be doing that on development machines anyway).

If a developer has created a plugin, should the related .mo/.po files be distributed with the plugin, and moved into the correct folder for gettext when the plugin is installed? Who should be responsible if someone else wants to provide a translation for that plugin?

So… my question
I’m interested in how other people handle translations/ gettext and distributing related files.

This one seems to work fine:

"|trans\('(.*)',\s*[\"'](.*)[\"']\)|uU"

(Does not take into account nested parentheses, though.)
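As a rough illustration of how the pattern might be used for extraction, here is a sketch with preg_match_all(). It assumes the pattern is written with single backslashes inside a double-quoted PHP string, and the $source snippet is made up for the example.

```php
<?php
// The pattern from above: '|' is the delimiter, 'u' enables UTF-8 mode,
// 'U' makes the quantifiers ungreedy by default.
$pattern = "|trans\('(.*)',\s*[\"'](.*)[\"']\)|uU";

// Hypothetical source code to scan for two-argument trans() calls.
$source = <<<'PHP'
echo trans('core', 'Page not found');
echo trans('blog', "No posts yet");
PHP;

preg_match_all($pattern, $source, $matches, PREG_SET_ORDER);

// Collect [module, string] pairs from the two capture groups.
$found = [];
foreach ($matches as $m) {
    $found[] = [$m[1], $m[2]];
}

foreach ($found as [$module, $string]) {
    echo $module, ': ', $string, "\n";
}
```

Note that this only picks up the two-argument form with a single-quoted first argument, and the closing quote of the second argument is not required to match the opening one.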

Something like that might work. I think I need to do some benchmarking first to see how much slower the app is with php-gettext.

I think that the first step of your workflow is probably the hardest ;)
That would be an awesome regex!

locales are often not installed on the servers
Yes, this is probably the main problem with gettext as an extension. But php-gettext is a bit slow:
http://mel.melaxis.com/devblog/2006/04/10/benchmarking-php-localization-is-gettext-fast-enough/

String arrays are IMHO fine; for larger string counts you can split the array into modules:

trans($module, $string);
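A minimal sketch of such a module-based lookup, assuming each module's strings can be loaded on demand. The ModuleCatalog class and the loader callable are hypothetical; in practice the loader might include a PHP file or unserialize a cached array.

```php
<?php
// Hypothetical per-module catalog: each module's array is loaded once, on first use.
final class ModuleCatalog
{
    /** @var callable */
    private $loader;
    /** @var array<string, array<string, string>> */
    private $loaded = [];

    public function __construct(callable $loader)
    {
        $this->loader = $loader;
    }

    public function trans(string $module, string $string): string
    {
        if (!array_key_exists($module, $this->loaded)) {
            // Assumption: the loader returns an array of source => translation.
            $this->loaded[$module] = ($this->loader)($module);
        }
        // Fall back to the source string when the module has no translation for it.
        return $this->loaded[$module][$string] ?? $string;
    }
}

// Made-up data standing in for per-module translation files.
$loads = 0;
$catalog = new ModuleCatalog(function (string $module) use (&$loads): array {
    $loads++;
    $data = [
        'core' => ['The requested page does not exist.' => 'De opgevraagde pagina bestaat niet'],
    ];
    return $data[$module] ?? [];
});

echo $catalog->trans('core', 'The requested page does not exist.'), "\n";
```

Only the modules actually used during a request are loaded, which keeps the per-request cost lower than loading all strings at once.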

Workflow could look like:

  1. Extract all translatable strings from the source files (those modified since the last extraction …).
  2. Write the strings to a (CSV?) file and check that the items are correct.
  3. Import the CSV file into a database table with a hash as the unique key; already translated and unchanged strings are skipped.
  4. Copy all untranslated strings from the table to a file for translation (INI for OmegaT, CSV for MemoQ etc.).
  5. Translated files go into a language-specific table with the hash as a foreign key.
  6. The final step generates the PHP array and serializes it.
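Steps 3, 4 and 6 could be sketched like this, with plain arrays standing in for the database tables. The function names and the choice of md5() as the hash are assumptions for illustration only.

```php
<?php
// Step 3 (sketch): key each extracted string by a hash; existing rows are left untouched.
function merge_extracted(array $table, array $extracted): array
{
    foreach ($extracted as $string) {
        $hash = md5($string); // assumption: md5 as the unique key
        if (!isset($table[$hash])) {
            $table[$hash] = ['source' => $string, 'translation' => null];
        }
    }
    return $table;
}

// Step 4 (sketch): select the rows that still need a translation.
function untranslated(array $table): array
{
    return array_filter($table, function (array $row): bool {
        return $row['translation'] === null;
    });
}

// Step 6 (sketch): build the source => translation array and serialize it.
function build_catalog(array $table): string
{
    $catalog = [];
    foreach ($table as $row) {
        if ($row['translation'] !== null) {
            $catalog[$row['source']] = $row['translation'];
        }
    }
    return serialize($catalog);
}

$table = merge_extracted([], ['Hello', 'Goodbye']);
$table[md5('Hello')]['translation'] = 'Hallo';        // a translator's result comes back
$table = merge_extracted($table, ['Hello', 'Welcome']); // a later extraction run

$pending = untranslated($table);
$serialized = build_catalog($table);
```

Because the rows are keyed by hash, re-running the extraction does not overwrite the existing translation of 'Hello'; only 'Goodbye' and 'Welcome' remain pending.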