Python gettext i18n — Deep Dive
Python’s gettext module mirrors GNU gettext closely, which means decades of tooling, documentation, and translator workflows transfer directly. But production use demands understanding plural formulas, context strings, thread-safe switching, and framework integration.
How the Catalog Lookup Works
When you call _("Hello"), Python:
- Uses the catalog parsed from the .mo file for the active language (loaded once, then held in memory)
- Looks up the msgid string in that catalog
- Returns the matching msgstr, or the original msgid if not found
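The .mo format stores sorted string tables plus an optional hash table, but Python's GNUTranslations does not consult them at lookup time: it reads every entry into a dict when the file is parsed. Lookups are therefore O(1) on average. For a catalog with 10,000 entries, lookup time is essentially the same as for 100 entries.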
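The lookup path can be exercised without any files on disk. The sketch below hand-assembles a minimal .mo image and feeds it to GNUTranslations. Note that build_mo is a hypothetical helper written only for this illustration (it writes no hash table and is not a substitute for msgfmt), and the Hello/Bonjour pair is made-up sample data.

```python
import gettext
import io
import struct


def build_mo(messages):
    """Serialize {msgid: msgstr} into minimal little-endian .mo bytes."""
    # The empty msgid holds the catalog header, which declares the charset.
    entries = {"": "Content-Type: text/plain; charset=UTF-8\n", **messages}
    ids = b""
    strs = b""
    id_spans = []   # (length, offset within ids) per msgid
    str_spans = []  # (length, offset within strs) per msgstr
    for msgid in sorted(entries):
        raw_id = msgid.encode("utf-8")
        raw_str = entries[msgid].encode("utf-8")
        id_spans.append((len(raw_id), len(ids)))
        str_spans.append((len(raw_str), len(strs)))
        ids += raw_id + b"\0"
        strs += raw_str + b"\0"
    count = len(entries)
    id_table = 7 * 4                  # file header is seven 32-bit words
    str_table = id_table + 8 * count  # each table row is (length, offset)
    ids_start = str_table + 8 * count
    strs_start = ids_start + len(ids)
    # Magic, version 0, entry count, both table offsets, no hash table.
    out = struct.pack("<7I", 0x950412DE, 0, count, id_table, str_table, 0, 0)
    for length, offset in id_spans:
        out += struct.pack("<2I", length, ids_start + offset)
    for length, offset in str_spans:
        out += struct.pack("<2I", length, strs_start + offset)
    return out + ids + strs


catalog = gettext.GNUTranslations(io.BytesIO(build_mo({"Hello": "Bonjour"})))
print(catalog.gettext("Hello"))    # the translated entry
print(catalog.gettext("Goodbye"))  # no entry, so the msgid comes back unchanged
```

Because the whole file is parsed up front, the cost that grows with catalog size is paid at load time, not on each call.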
Plural Forms: Beyond English
English has two plural forms: singular and plural. Many languages have more. Polish has three (one, few, many). Arabic has six.
The .po header declares the formula:
"Plural-Forms: nplurals=3; plural=(n==1 ? 0 : n%10>=2 && n%10<=4 && (n%100<12 || n%100>14) ? 1 : 2);\n"
In Python:
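import gettext
# Load the Polish catalog ("myapp" and "locales" are placeholder names).
lang = gettext.translation("myapp", localedir="locales", languages=["pl"])
ngettext = lang.ngettext
n = 3
# For n=1: "Usunięto 1 plik"
# For n=3: "Usunięto 3 pliki"
# For n=7: "Usunięto 7 plików"
msg = ngettext("Deleted %(count)d file",
               "Deleted %(count)d files", n) % {"count": n}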
The .po file for Polish has three msgstr entries per translatable plural:
msgid "Deleted %(count)d file"
msgid_plural "Deleted %(count)d files"
msgstr[0] "Usunięto %(count)d plik"
msgstr[1] "Usunięto %(count)d pliki"
msgstr[2] "Usunięto %(count)d plików"
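To see which msgstr index the formula picks, the C expression from the Plural-Forms header can be transcribed into Python by hand. This is only a sketch for reasoning about the rule: polish_plural_index is a hypothetical name, and at runtime gettext evaluates the header expression itself.

```python
def polish_plural_index(n):
    """Python transcription of the Polish Plural-Forms expression."""
    if n == 1:
        return 0
    if 2 <= n % 10 <= 4 and (n % 100 < 12 or n % 100 > 14):
        return 1
    return 2


FORMS = [
    "Usunięto %(count)d plik",
    "Usunięto %(count)d pliki",
    "Usunięto %(count)d plików",
]

for n in (1, 3, 7, 12, 22):
    print(FORMS[polish_plural_index(n)] % {"count": n})
```

The interesting cases are 12 and 22: both end in 2, but 12 falls in the 12 to 14 exclusion and takes "plików", while 22 takes "pliki".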
Context Disambiguation with pgettext
The word “Post” could mean a blog post or the verb “to post.” Context strings disambiguate:
from gettext import pgettext
label = pgettext("blog", "Post") # noun: blog post
action = pgettext("action", "Post") # verb: submit/post
In the .po file:
msgctxt "blog"
msgid "Post"
msgstr "Article"
msgctxt "action"
msgid "Post"
msgstr "Publier"
pgettext was added in Python 3.8. For older versions, the common workaround is encoding context in the msgid itself.
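One form of that workaround relies on how compiled catalogs store context: the msgctxt and msgid are joined with an EOT character ("\x04") into a single key. The sketch below assumes that layout. pgettext_compat and DictTranslations are hypothetical names, and the dict-backed catalog only stands in for a real .mo file.

```python
import gettext

CONTEXT_SEPARATOR = "\x04"  # EOT character joining msgctxt and msgid in a .mo key


def pgettext_compat(translations, context, message):
    """Context-aware lookup for interpreters older than 3.8 (hypothetical helper)."""
    key = f"{context}{CONTEXT_SEPARATOR}{message}"
    translated = translations.gettext(key)
    # On a miss gettext() echoes the key back; return the bare message instead.
    return message if translated == key else translated


class DictTranslations(gettext.NullTranslations):
    """Stand-in catalog backed by a plain dict, for demonstration only."""

    def __init__(self, entries):
        super().__init__()
        self._entries = entries

    def gettext(self, message):
        return self._entries.get(message, message)


french = DictTranslations({"blog\x04Post": "Article", "action\x04Post": "Publier"})
print(pgettext_compat(french, "blog", "Post"))
print(pgettext_compat(french, "action", "Post"))
print(pgettext_compat(french, "mail", "Post"))  # unknown context: falls back to "Post"
```

Checking for the echoed key matters: without it, a missing translation would show the user the context prefix and separator.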
Lazy Translation
In Django and other frameworks, you often define translatable strings at module level — before the user’s language is known:
from django import forms
from django.utils.translation import gettext_lazy as _
class LoginForm(forms.Form):
    username = forms.CharField(label=_("Username"))
Implementing lazy translation yourself:
class LazyString:
    def __init__(self, func, text):
        self._func = func
        self._text = text
    def __str__(self):
        return self._func(self._text)
    def __repr__(self):
        return f"LazyString({self._text!r})"
def lazy_gettext(text):
    return LazyString(_, text)
The translation resolves only when __str__ is called, by which time the active language is set.
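A small demonstration of that deferral, under simplifying assumptions: the catalog is a plain dict, and the state dict is a stand-in for whatever holds the active language in a real application. Both are invented for this sketch.

```python
CATALOGS = {"fr": {"Username": "Nom d'utilisateur"}}
state = {"language": "en"}  # stand-in for the per-request active language


def translate(text):
    return CATALOGS.get(state["language"], {}).get(text, text)


class LazyString:
    def __init__(self, func, text):
        self._func = func
        self._text = text

    def __str__(self):
        return self._func(self._text)


label = LazyString(translate, "Username")  # created while the language is still "en"
state["language"] = "fr"                   # language chosen later, e.g. per request
print(str(label))
print(f"{label}:")  # f-string formatting also goes through __str__
```

A wrapper this small is not a str, so `"Label: " + label` raises TypeError. Framework implementations add more special methods to cover such cases.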
Thread Safety and Per-Request Languages
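In a web app, different users may have different languages simultaneously. Global install() is not thread-safe for this: it binds a single _ into builtins for the whole process, so the most recently installed language applies to every request.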
Pattern: Per-request translation object
import threading
from gettext import translation
_active = threading.local()
def activate(language):
    t = translation("myapp", localedir="locales",
                    languages=[language], fallback=True)
    _active.trans = t
def gettext(message):
    t = getattr(_active, "trans", None)
    if t is None:
        return message
    return t.gettext(message)
_ = gettext
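The isolation this pattern provides can be checked with two threads that each activate a different language before either one translates. The sketch swaps the .mo-backed objects for plain dicts so it runs without catalog files; the sample strings and the handle_request name are assumptions.

```python
import threading

CATALOGS = {"fr": {"Welcome": "Bienvenue"}, "de": {"Welcome": "Willkommen"}}
_active = threading.local()


def activate(language):
    _active.catalog = CATALOGS.get(language, {})


def translate(message):
    return getattr(_active, "catalog", {}).get(message, message)


results = {}
barrier = threading.Barrier(2)


def handle_request(language):
    activate(language)
    barrier.wait()  # both threads have activated before either one translates
    results[language] = translate("Welcome")


workers = [threading.Thread(target=handle_request, args=(lang,)) for lang in ("fr", "de")]
for worker in workers:
    worker.start()
for worker in workers:
    worker.join()
print(results)
print(translate("Welcome"))  # the main thread never activated a language
```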
In ASGI frameworks (FastAPI, Starlette), many requests share one thread on the event loop, so threading.local would leak one request's language into another. Use contextvars instead:
import contextvars
from gettext import translation
_current_trans = contextvars.ContextVar("current_trans", default=None)
def activate(language):
    t = translation("myapp", localedir="locales",
                    languages=[language], fallback=True)
    _current_trans.set(t)
def _(message):
    t = _current_trans.get()
    return t.gettext(message) if t else message
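To illustrate why a ContextVar holds up under asyncio, the sketch below runs two simulated requests concurrently on one thread. Each task gets its own copy of the context, so a set() in one task is invisible to the other. Dict catalogs replace the real translation objects, and handle_request is a hypothetical stand-in for a framework's request handler.

```python
import asyncio
import contextvars

CATALOGS = {"fr": {"Welcome": "Bienvenue"}, "de": {"Welcome": "Willkommen"}}
_current_catalog = contextvars.ContextVar("current_catalog", default=None)


def activate(language):
    _current_catalog.set(CATALOGS.get(language, {}))


def translate(message):
    catalog = _current_catalog.get()
    return catalog.get(message, message) if catalog else message


async def handle_request(language):
    activate(language)
    await asyncio.sleep(0)  # yield so the other request runs in between
    return translate("Welcome")


async def main():
    # gather() wraps each coroutine in a task with its own context copy.
    return await asyncio.gather(handle_request("fr"), handle_request("de"))


results = asyncio.run(main())
print(results)
```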
Extraction with Babel
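While xgettext works, Babel’s pybabel handles Python-specific patterns better. pybabel names catalogs "messages" by default, so pass -D to match the "myapp" domain used in the code above:
pybabel extract -F babel.cfg -o messages.pot .
pybabel init -D myapp -l fr -d locales -i messages.pot
pybabel compile -D myapp -d locales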
Babel also handles:
- Jinja2 templates
- Date/number formatting
- Territory-specific data (currencies, measurement units)
Integration Patterns
Flask
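from flask import Flask, request
from flask_babel import Babel, _
app = Flask(__name__)
def get_locale():
    return request.accept_languages.best_match(["en", "fr", "de"])
# Flask-Babel 3.0+ takes the selector as an argument;
# older releases used the @babel.localeselector decorator.
babel = Babel(app, locale_selector=get_locale)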
Django
Django wraps gettext internally. You use gettext_lazy for model fields and gettext in views:
from django.shortcuts import render
from django.utils.translation import gettext as _
def dashboard(request):
    return render(request, "dash.html", {"title": _("Dashboard")})
CLI Applications (Click)
import click
import gettext
t = gettext.translation("mycli", fallback=True)
_ = t.gettext
@click.command()
def main():
    click.echo(_("Processing complete"))
Common Production Pitfalls
1. String concatenation breaks translation
# BAD: Translator sees two fragments with no context
msg = _("Hello") + " " + _("World")
# GOOD: One complete sentence
msg = _("Hello World")
2. Variable substitution inside _()
# BAD: The lookup key changes with every name and never matches a catalog entry
msg = _(f"Hello {name}")
# GOOD: Placeholder stays constant
msg = _("Hello {name}").format(name=name)
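The difference is easy to see with a dict standing in for the catalog. The French entry and the translate function are invented for this sketch.

```python
CATALOG = {"Hello {name}": "Bonjour {name}"}  # hypothetical French entry


def translate(message):
    return CATALOG.get(message, message)


name = "Ada"
missed = translate(f"Hello {name}")                    # looks up "Hello Ada": no such msgid
matched = translate("Hello {name}").format(name=name)  # looks up the constant msgid
print(missed)
print(matched)
```

The f-string is formatted before the lookup happens, so the catalog is asked for a string that no translator ever saw.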
3. Forgetting to recompile .mo files
Changed .po files do nothing until you run msgfmt or pybabel compile. CI pipelines should include this step.
4. Missing LC_MESSAGES directory level
The path must be {localedir}/{lang}/LC_MESSAGES/{domain}.mo. If the LC_MESSAGES level is missing, translation() raises FileNotFoundError, or with fallback=True silently returns the source strings.
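gettext.find() is a quick way to check the layout, since it returns the path it would load or None. The sketch below uses empty placeholder files in a temporary directory (find() only checks that the file exists, it does not parse it); the "myapp" domain and "fr" language are sample values.

```python
import gettext
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as localedir:
    # Wrong layout: the LC_MESSAGES level is missing.
    wrong = Path(localedir, "fr", "myapp.mo")
    wrong.parent.mkdir(parents=True)
    wrong.touch()
    found_wrong = gettext.find("myapp", localedir=localedir, languages=["fr"])

    # Without fallback=True the missing catalog is reported loudly.
    try:
        gettext.translation("myapp", localedir=localedir, languages=["fr"])
        raised = False
    except FileNotFoundError:
        raised = True

    # Expected layout: {localedir}/{lang}/LC_MESSAGES/{domain}.mo
    right = Path(localedir, "fr", "LC_MESSAGES", "myapp.mo")
    right.parent.mkdir(parents=True)
    right.touch()
    found_right = gettext.find("myapp", localedir=localedir, languages=["fr"])

print(found_wrong)   # None: only the LC_MESSAGES path is checked
print(raised)
print(found_right)
```

A startup check along these lines turns a silent fallback into an explicit failure.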
Testing Translations
import gettext
def test_french_greeting():
    fr = gettext.translation("myapp", localedir="locales",
                             languages=["fr"])
    _ = fr.gettext
    assert _("Welcome") == "Bienvenue"
def test_fallback_returns_original():
    fake = gettext.translation("myapp", localedir="locales",
                               languages=["xx"], fallback=True)
    _ = fake.gettext
    assert _("Welcome") == "Welcome"
For coverage, keep a test that loads every .mo file and verifies at least one string per file resolves differently from the source — this catches stale or empty catalogs.
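One way such a coverage test might look, as a sketch: stale_catalogs is a hypothetical helper, and the sample msgid list is an assumption you would replace with strings known to exist in your catalogs.

```python
import gettext
from pathlib import Path


def stale_catalogs(localedir, domain, sample_msgids):
    """Return language codes whose catalog translates none of the sample msgids."""
    stale = []
    for mo_path in sorted(Path(localedir).glob(f"*/LC_MESSAGES/{domain}.mo")):
        with mo_path.open("rb") as fp:
            catalog = gettext.GNUTranslations(fp)
        if all(catalog.gettext(msgid) == msgid for msgid in sample_msgids):
            stale.append(mo_path.parts[-3])  # the {lang} directory name
    return stale


def test_no_stale_catalogs():
    assert stale_catalogs("locales", "myapp", ["Welcome"]) == []
```

A catalog for the source language itself would be flagged by this check, so exclude it if you ship one.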
Performance Considerations
- .mo loading is I/O-bound; cache the GNUTranslations object per language
- For high-traffic web apps, pre-load all supported languages at startup
- gettext() calls themselves are pure dictionary lookups, so their overhead is negligible
- Avoid re-parsing .mo files on every request
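A minimal sketch of the pre-loading idea, assuming the "myapp" domain and "locales" directory used earlier and a made-up list of supported languages. fallback=True keeps it runnable even when a catalog is missing.

```python
import gettext

SUPPORTED_LANGUAGES = ("en", "fr", "de")  # assumed list for illustration


def load_catalogs(domain, localedir, languages):
    """Parse each language's .mo once and keep the objects for the process lifetime."""
    return {
        language: gettext.translation(
            domain, localedir=localedir, languages=[language], fallback=True
        )
        for language in languages
    }


CATALOGS = load_catalogs("myapp", "locales", SUPPORTED_LANGUAGES)


def translate(language, message):
    catalog = CATALOGS.get(language)
    return catalog.gettext(message) if catalog is not None else message


print(translate("fr", "Welcome"))
```

Request handlers then only do a dict lookup for the language and another for the message, with no file access on the hot path.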
The one thing to remember: Production gettext means handling plurals correctly, keeping translations thread-safe with contextvars, and never concatenating translatable fragments — one complete sentence per _() call.