Speeding up gettext with memcache on Google App Engine, PHP
Google App Engine recently released experimental support for PHP, but it is still missing a feature central to building multi-language websites: internationalization support. PHP has an internationalization extension called gettext that lets developers translate content conveniently and keeps content separate from layout. Below is an example:
# original text
msgid "gettext example"
# text after translation
msgstr "gettext 示例"
// Tell gettext to look for the Simplified Chinese translation
putenv('LC_ALL=zh_CN');
setlocale(LC_ALL, "zh_CN");

// Tell gettext to look for catalogs named "messages.mo" (compiled from messages.po)
$domain = "messages";
bindtextdomain($domain, "/");

// Use the domain specified above
textdomain($domain);

// Outputs "gettext 示例" instead of the original text
echo _("gettext example");
Translation is achieved by wrapping the text to be translated in a call to gettext(), commonly aliased as _(). This approach scales much better than using string IDs and is the preferred translation method for large projects like WordPress.
Unfortunately, the native gettext extension is not included in the GAE PHP runtime. To enable dynamic translation, one has to resort to a pure-PHP implementation of the gettext library. This article showed that the PHP implementation is around 1.5x-2x slower than the extension.
One of the reasons is that PHP is stateless between requests: variables do not persist unless some serialization mechanism is used. On every request, the PHP implementation has to re-parse the .mo files needed for translation. The gettext extension, on the other hand, not only caches the files between requests but also runs natively (the library is compiled into machine code, which makes it much faster).
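To make the per-request cost concrete, here is a minimal sketch of the kind of work a pure-PHP implementation repeats on every request when it reads a .mo file. It follows the GNU MO layout (a magic number, a string count, and two tables of length/offset pairs) but ignores plural forms and contexts. The function name parse_mo_string() is hypothetical and is not taken from any particular library.

```php
<?php
// Minimal sketch of a .mo reader; plural forms and contexts are not handled.
// parse_mo_string() is a hypothetical helper name used only for illustration.
function parse_mo_string(string $data): array
{
    if (strlen($data) < 28) {
        throw new InvalidArgumentException('Data is too short to be a .mo file');
    }

    // The first four bytes hold the magic number 0x950412de. Their order
    // tells us whether the file is little-endian ('V') or big-endian ('N').
    $magic = substr($data, 0, 4);
    if ($magic === "\xde\x12\x04\x95") {
        $u32 = 'V';
    } elseif ($magic === "\x95\x04\x12\xde") {
        $u32 = 'N';
    } else {
        throw new InvalidArgumentException('Not a .mo file');
    }

    // Header fields after the magic: revision, number of strings, and the
    // offsets of the original-string table and the translation table.
    $header = unpack(
        "{$u32}rev/{$u32}count/{$u32}origTable/{$u32}transTable",
        substr($data, 4, 16)
    );

    $table = [];
    for ($i = 0; $i < $header['count']; $i++) {
        // Each table entry is a (length, offset) pair of 32-bit integers.
        $orig  = unpack("{$u32}len/{$u32}off", substr($data, $header['origTable'] + 8 * $i, 8));
        $trans = unpack("{$u32}len/{$u32}off", substr($data, $header['transTable'] + 8 * $i, 8));

        $msgid = substr($data, $orig['off'], $orig['len']);
        $table[$msgid] = substr($data, $trans['off'], $trans['len']);
    }

    // Maps msgid => msgstr; the empty msgid carries the catalog metadata.
    return $table;
}
```

With a catalog of thousands of strings, this loop runs in full on every request unless the resulting array is kept somewhere that outlives the request.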
I wanted to deploy a multilingual application on GAE and get as close to native translation performance as possible in the absence of the native library. To achieve that, I used Google's memcache service to store the data parsed from .mo files between requests, reducing the time needed for translation.
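The idea can be sketched roughly as follows. This is an illustration under stated assumptions, not the library's actual code: cached_translations() and ArrayCache are hypothetical names, the parser is passed in as a callable, and the cache object only needs the two-argument get()/set() calls that PHP's Memcache and Memcached classes both provide. The file's modification time is part of the key so that an updated catalog is parsed again.

```php
<?php
// Hypothetical in-memory stand-in exposing the get()/set() subset shared by
// PHP's Memcache and Memcached classes, so the sketch also runs locally.
class ArrayCache
{
    private $items = [];

    public function get($key)
    {
        // Like Memcache::get(), return false on a miss.
        return array_key_exists($key, $this->items) ? $this->items[$key] : false;
    }

    public function set($key, $value)
    {
        $this->items[$key] = $value;
        return true;
    }
}

// Return the parsed translation table for $moPath, parsing the file only
// when the cache has no entry for its current version.
// $parse is any callable that turns raw .mo bytes into an array.
function cached_translations(string $moPath, $cache, callable $parse): array
{
    // Assumption: keying on path + mtime is enough to detect catalog updates.
    $key = 'mo:' . md5($moPath) . ':' . filemtime($moPath);

    $table = $cache->get($key);
    if (is_array($table)) {
        return $table; // cache hit: no parsing needed
    }

    $table = $parse(file_get_contents($moPath));
    $cache->set($key, $table); // arrays are serialized by the cache client
    return $table;
}

// On App Engine one would pass `new Memcache()` as $cache;
// locally, `new ArrayCache()` keeps the sketch runnable without a server.
```

Memcache services usually cap the size of a single item, so a very large catalog may need to be split across several keys.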
I ran a speed test with ApacheBench: 5000 requests at a concurrency of 5. The language file contained 10000 machine-generated strings. Below are the results on my development environment (nginx 1.4.4 + php-fpm).
The memcache-enabled version is still slower than the native library, but there is already a significant improvement of nearly 100% over the original PHP implementation.
The library can be found on GitHub here.