Python gettext i18n — Deep Dive

Python’s gettext module mirrors GNU gettext closely, which means decades of tooling, documentation, and translator workflows transfer directly. But production use demands understanding plural formulas, context strings, thread-safe switching, and framework integration.

How the Catalog Lookup Works

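Loading and lookup happen at different times:

  1. translation() (or install()) finds the .mo file for the requested language, and GNUTranslations decodes every entry into an in-memory dict
  2. Each _("Hello") call hashes the msgid string and looks it up in that dict
  3. It returns the matching msgstr, or the original msgid if not found

Inside a .mo file, msgids sit in a sorted table (plus an optional hash table) so that GNU libintl can search the file in place. Python ignores that index and uses its own dict instead. Either way, lookups are average O(1): for a catalog with 10,000 entries, lookup time is essentially the same as for 100 entries.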
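The sketch below shows the load-once, look-up-many split without touching the disk. It packs a two-entry catalog into the .mo layout (magic number, msgid table, msgstr table, NUL-terminated strings, no hash table) and hands it to GNUTranslations as an in-memory file. build_mo is a hypothetical helper written only for this illustration. Real catalogs come from msgfmt or pybabel compile.

```python
import io
import struct
from gettext import GNUTranslations


def build_mo(messages):
    """Hypothetical helper: pack {msgid: msgstr} into a little-endian .mo file."""
    keys = sorted(messages)  # msgfmt sorts msgids; Python's parser does not need it
    ids = [k.encode("utf-8") for k in keys]
    strs = [messages[k].encode("utf-8") for k in keys]
    n = len(keys)
    orig_table = 28                   # the header is seven 32-bit words
    trans_table = orig_table + 8 * n  # each table entry is a (length, offset) pair
    data_start = trans_table + 8 * n  # string bytes follow both tables
    entries, blob = [], b""
    for s in ids + strs:              # msgid entries first, then msgstr entries
        entries.append(struct.pack("<2I", len(s), data_start + len(blob)))
        blob += s + b"\0"             # each string is NUL-terminated
    # magic, revision, count, table offsets, hash table size/offset (0 = none)
    header = struct.pack("<7I", 0x950412DE, 0, n, orig_table, trans_table, 0, 0)
    return header + b"".join(entries) + blob


mo_bytes = build_mo({
    "": "Content-Type: text/plain; charset=UTF-8\n",  # the catalog metadata entry
    "Hello": "Bonjour",
})

# The whole file is read and decoded here, once; later calls only query memory.
fr = GNUTranslations(io.BytesIO(mo_bytes))

print(fr.gettext("Hello"))    # Bonjour
print(fr.gettext("Goodbye"))  # Goodbye (no entry, so the msgid comes back)
```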

Plural Forms: Beyond English

English has two plural forms: singular and plural. Many languages have more. Polish has three (one, few, many). Arabic has six.

The .po header declares the formula:

"Plural-Forms: nplurals=3; plural=(n==1 ? 0 : n%10>=2 && n%10<=4 && (n%100<12 || n%100>14) ? 1 : 2);\n"
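The expression is written in C syntax. GNUTranslations compiles it into a Python function when it reads the header. The hand transliteration below makes it easier to check which msgstr index each count selects. polish_plural is an illustrative name and is not part of gettext.

```python
def polish_plural(n: int) -> int:
    """Hand transliteration of the Polish Plural-Forms expression above."""
    if n == 1:
        return 0  # msgstr[0]: exactly one ("plik")
    if 2 <= n % 10 <= 4 and (n % 100 < 12 or n % 100 > 14):
        return 1  # msgstr[1]: last digit 2-4, unless the last two digits are 12-14 ("pliki")
    return 2      # msgstr[2]: everything else, including 0 and 5-21 ("plików")


for n in (1, 2, 5, 12, 22, 25, 112):
    print(n, polish_plural(n))
```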

In Python:

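import gettext

# Polish catalog for the "myapp" domain: locales/pl/LC_MESSAGES/myapp.mo
lang = gettext.translation("myapp", localedir="locales", languages=["pl"])
ngettext = lang.ngettext

# For n=1: "Usunięto 1 plik"
# For n=3: "Usunięto 3 pliki"
# For n=7: "Usunięto 7 plików"
n = 7
msg = ngettext("Deleted %(count)d file",
               "Deleted %(count)d files", n) % {"count": n}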

The .po file for Polish has three msgstr entries per translatable plural:

msgid "Deleted %(count)d file"
msgid_plural "Deleted %(count)d files"
msgstr[0] "Usunięto %(count)d plik"
msgstr[1] "Usunięto %(count)d pliki"
msgstr[2] "Usunięto %(count)d plików"
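The following end-to-end sketch needs no files on disk. It packs the Polish header and the plural entry above into an in-memory .mo and calls ngettext for several counts. In a compiled catalog, a plural entry's msgid and msgid_plural are joined by a NUL byte, and so are its msgstr[0..2]. build_mo is a hypothetical stand-in for msgfmt.

```python
import io
import struct
from gettext import GNUTranslations


def build_mo(messages):
    """Hypothetical helper: pack {msgid: msgstr} into a little-endian .mo file."""
    keys = sorted(messages)
    ids = [k.encode("utf-8") for k in keys]
    strs = [messages[k].encode("utf-8") for k in keys]
    start = 28 + 16 * len(keys)  # header + msgid table + msgstr table
    entries, blob = [], b""
    for s in ids + strs:
        entries.append(struct.pack("<2I", len(s), start + len(blob)))
        blob += s + b"\0"
    header = struct.pack("<7I", 0x950412DE, 0, len(keys), 28, 28 + 8 * len(keys), 0, 0)
    return header + b"".join(entries) + blob


PL_HEADER = (
    "Content-Type: text/plain; charset=UTF-8\n"
    "Plural-Forms: nplurals=3; plural=(n==1 ? 0 : n%10>=2 && n%10<=4 && "
    "(n%100<12 || n%100>14) ? 1 : 2);\n"
)

pl = GNUTranslations(io.BytesIO(build_mo({
    "": PL_HEADER,
    # msgid and msgid_plural joined by NUL; msgstr[0], [1], [2] joined by NUL
    "Deleted %(count)d file\0Deleted %(count)d files":
        "Usunięto %(count)d plik\0Usunięto %(count)d pliki\0Usunięto %(count)d plików",
})))

for n in (1, 3, 7):
    print(pl.ngettext("Deleted %(count)d file", "Deleted %(count)d files", n) % {"count": n})
```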

Context Disambiguation with pgettext

The word “Post” could mean a blog post or the verb “to post.” Context strings disambiguate:

from gettext import pgettext

label = pgettext("blog", "Post")       # noun: blog post
action = pgettext("action", "Post")    # verb: submit/post

In the French .po file:

msgctxt "blog"
msgid "Post"
msgstr "Article"

msgctxt "action"
msgid "Post"
msgstr "Publier"

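pgettext (and its plural counterpart npgettext) was added in Python 3.8. For older versions, the common workaround is to encode the context in the msgid itself. Another option is to query the key that msgfmt actually writes for a msgctxt entry: the context, a \x04 byte, then the msgid.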
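The sketch below runs both lookups against an in-memory catalog that holds the two French entries above. build_mo is a hypothetical stand-in for msgfmt. pgettext_compat is an illustrative fallback for interpreters older than 3.8, not a standard-library function.

```python
import io
import struct
from gettext import GNUTranslations

CONTEXT_SEP = "\x04"  # EOT byte that msgfmt places between msgctxt and msgid


def build_mo(messages):
    """Hypothetical helper: pack {msgid: msgstr} into a little-endian .mo file."""
    keys = sorted(messages)
    ids = [k.encode("utf-8") for k in keys]
    strs = [messages[k].encode("utf-8") for k in keys]
    start = 28 + 16 * len(keys)
    entries, blob = [], b""
    for s in ids + strs:
        entries.append(struct.pack("<2I", len(s), start + len(blob)))
        blob += s + b"\0"
    header = struct.pack("<7I", 0x950412DE, 0, len(keys), 28, 28 + 8 * len(keys), 0, 0)
    return header + b"".join(entries) + blob


fr = GNUTranslations(io.BytesIO(build_mo({
    "": "Content-Type: text/plain; charset=UTF-8\n",
    "blog" + CONTEXT_SEP + "Post": "Article",
    "action" + CONTEXT_SEP + "Post": "Publier",
})))


def pgettext_compat(trans, context, message):
    """Illustrative pre-3.8 fallback: query the combined key directly."""
    key = context + CONTEXT_SEP + message
    result = trans.gettext(key)
    # gettext() echoes the key when it is missing; return the bare message instead
    return message if result == key else result


print(fr.pgettext("blog", "Post"))            # Python 3.8+ API
print(pgettext_compat(fr, "action", "Post"))  # manual combined-key lookup
print(pgettext_compat(fr, "menu", "Post"))    # unknown context falls back to "Post"
```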

Lazy Translation

In Django and other frameworks, you often define translatable strings at module level — before the user’s language is known:

from django import forms
from django.utils.translation import gettext_lazy as _

class LoginForm(forms.Form):
    username = forms.CharField(label=_("Username"))

Implementing lazy translation yourself:

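import gettext

_ = gettext.gettext  # or any gettext-style function, e.g. a per-request one

class LazyString:
    """Holds a msgid and translates it only when rendered."""
    def __init__(self, func, text):
        self._func = func
        self._text = text

    def __str__(self):
        # The catalog lookup runs here, at render time
        return self._func(self._text)

    def __repr__(self):
        return f"LazyString({self._text!r})"

def lazy_gettext(text):
    return LazyString(_, text)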

The translation resolves only when __str__ is called, by which time the active language is set.
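You can watch the deferral with the sketch below, which re-declares LazyString from above so it runs on its own. The label is created before any language is active, and a catalog is activated afterwards through a contextvar. DictTranslations is a hypothetical NullTranslations subclass backed by a plain dict, standing in for a loaded .mo file.

```python
import contextvars
import gettext


class DictTranslations(gettext.NullTranslations):
    """Hypothetical stand-in for a loaded catalog, backed by a dict."""

    def __init__(self, catalog):
        super().__init__()
        self._messages = catalog

    def gettext(self, message):
        return self._messages.get(message, message)


_current = contextvars.ContextVar("current_trans", default=gettext.NullTranslations())


def _(message):
    # Resolved against whichever translation is active at call time
    return _current.get().gettext(message)


class LazyString:
    def __init__(self, func, text):
        self._func = func
        self._text = text

    def __str__(self):
        return self._func(self._text)

    def __repr__(self):
        return f"LazyString({self._text!r})"


label = LazyString(_, "Username")  # "import time": no language chosen yet
print(label)                       # NullTranslations is active, so: Username

_current.set(DictTranslations({"Username": "Nom d'utilisateur"}))
print(f"{label}")                  # formatting calls __str__, so the lookup happens now
```

This minimal LazyString implements only __str__ and __repr__, so "Hello " + label raises TypeError. Django's gettext_lazy returns a lazy proxy object that also forwards str methods.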

Thread Safety and Per-Request Languages

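In a web app, different users may need different languages at the same time. install() binds a single _ into builtins for the whole process, so switching the language for one request switches it for every concurrent request.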

Pattern: Per-request translation object

import threading
from gettext import translation

_active = threading.local()

def activate(language):
    t = translation("myapp", localedir="locales",
                    languages=[language], fallback=True)
    _active.trans = t

def gettext(message):
    t = getattr(_active, "trans", None)
    if t is None:
        return message
    return t.gettext(message)

_ = gettext

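In ASGI frameworks (FastAPI, Starlette), many requests run as interleaved coroutines on one thread, so a threading.local value set by one request is visible to the others. Use contextvars instead, because asyncio runs each task in its own copy of the context: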

import contextvars
from gettext import translation

_current_trans = contextvars.ContextVar("current_trans", default=None)

def activate(language):
    t = translation("myapp", localedir="locales",
                    languages=[language], fallback=True)
    _current_trans.set(t)

def _(message):
    t = _current_trans.get()
    return t.gettext(message) if t else message

Extraction with Babel

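xgettext can scan Python sources, but Babel’s pybabel can also extract strings from templates through a mapping file (babel.cfg), and it covers the whole init/update/compile cycle:

pybabel extract -F babel.cfg -o messages.pot .
pybabel init -D myapp -l fr -d locales -i messages.pot
pybabel update -D myapp -d locales -i messages.pot
pybabel compile -D myapp -d locales

Run init once per new language. Run update after every extraction to merge new msgids into the existing .po files. -D myapp matches the domain used by the translation() calls in this article; without it, Babel uses the domain messages.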

Babel also handles:

  • Jinja2 templates
  • Date/number formatting
  • Territory-specific data (currencies, measurement units)

Integration Patterns

Flask

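from flask import Flask, request
from flask_babel import Babel, _

def get_locale():
    # Pick the best supported language from the Accept-Language header
    return request.accept_languages.best_match(["en", "fr", "de"])

app = Flask(__name__)
# Flask-Babel 3.x takes the selector as an argument;
# the @babel.localeselector decorator was removed in 3.0
babel = Babel(app, locale_selector=get_locale)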

Django

Django wraps gettext internally. You use gettext_lazy for model fields and gettext in views:

from django.shortcuts import render
from django.utils.translation import gettext as _

def dashboard(request):
    return render(request, "dash.html", {"title": _("Dashboard")})

CLI Applications (Click)

import click
import gettext

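# Without localedir, translation() searches the default system locale directory;
# fallback=True returns the source strings when no catalog is found
t = gettext.translation("mycli", fallback=True)
_ = t.gettext

@click.command()
def main():
    click.echo(_("Processing complete"))

if __name__ == "__main__":
    main()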

Common Production Pitfalls

1. String concatenation breaks translation

# BAD: Translator sees two fragments with no context
msg = _("Hello") + " " + _("World")

# GOOD: One complete sentence
msg = _("Hello World")

2. Variable substitution inside _()

# BAD: Every unique name creates a new msgid
msg = _(f"Hello {name}")

# GOOD: Placeholder stays constant
msg = _("Hello {name}").format(name=name)
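A one-entry catalog is enough to show the difference. DictTranslations is a hypothetical NullTranslations subclass backed by a dict, standing in for a French .mo file.

```python
import gettext


class DictTranslations(gettext.NullTranslations):
    """Hypothetical stand-in for a loaded French catalog."""

    def __init__(self, catalog):
        super().__init__()
        self._messages = catalog

    def gettext(self, message):
        return self._messages.get(message, message)


_ = DictTranslations({"Hello {name}": "Bonjour {name}"}).gettext
name = "Ada"

bad = _(f"Hello {name}")                    # looks up "Hello Ada": no such msgid
good = _("Hello {name}").format(name=name)  # looks up the template, then fills it
print(bad)   # Hello Ada
print(good)  # Bonjour Ada
```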

3. Forgetting to recompile .mo files

Changed .po files do nothing until you run msgfmt or pybabel compile. CI pipelines should include this step.

4. Missing LC_MESSAGES directory level

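The path must be {localedir}/{lang}/LC_MESSAGES/{domain}.mo. If the LC_MESSAGES level is missing, install() and translation(..., fallback=True) quietly return the source strings. Without fallback=True, translation() raises FileNotFoundError instead.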
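gettext.find() applies the same search rules as translation() and returns the .mo path it would load, or None. That makes it a quick diagnostic for layout mistakes. The sketch creates empty placeholder files in a temporary directory. Their contents do not matter, because find() only checks that the file exists.

```python
import gettext
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    # Correct layout: {localedir}/{lang}/LC_MESSAGES/{domain}.mo
    good = root / "good" / "fr" / "LC_MESSAGES" / "myapp.mo"
    # Common mistake: the LC_MESSAGES level is missing
    bad = root / "bad" / "fr" / "myapp.mo"
    for path in (good, bad):
        path.parent.mkdir(parents=True)
        path.write_bytes(b"")  # placeholder only: find() checks existence, not content

    found_good = gettext.find("myapp", root / "good", ["fr"])
    found_bad = gettext.find("myapp", root / "bad", ["fr"])
    expected = str(good)

print(found_good == expected)  # True: this is the file translation() would open
print(found_bad)               # None: translation() would fall back or raise
```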

Testing Translations

import gettext

def test_french_greeting():
    fr = gettext.translation("myapp", localedir="locales",
                             languages=["fr"])
    _ = fr.gettext
    assert _("Welcome") == "Bienvenue"

def test_fallback_returns_original():
    fake = gettext.translation("myapp", localedir="locales",
                               languages=["xx"], fallback=True)
    _ = fake.gettext
    assert _("Welcome") == "Welcome"

For coverage, keep a test that loads every .mo file and verifies at least one string per file resolves differently from the source — this catches stale or empty catalogs.
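The sketch below implements such a coverage test. It assumes the locales/{lang}/LC_MESSAGES/{domain}.mo layout used throughout. stale_catalogs is a hypothetical helper, and the probe msgid must be one that every shipped language translates. Parsing each file with GNUTranslations directly, instead of translation(..., fallback=True), means a corrupt file fails loudly rather than degrading to source strings.

```python
import gettext
from pathlib import Path


def stale_catalogs(localedir, domain, probe):
    """Return the languages whose compiled catalog leaves `probe` untranslated."""
    stale = []
    for mo_path in sorted(Path(localedir).glob(f"*/LC_MESSAGES/{domain}.mo")):
        with open(mo_path, "rb") as fp:
            catalog = gettext.GNUTranslations(fp)  # parse directly, no fallback
        if catalog.gettext(probe) == probe:
            stale.append(mo_path.parent.parent.name)  # the {lang} directory
    return stale


def test_no_stale_catalogs():
    # "myapp" and "Welcome" are placeholders for your own domain and probe msgid
    assert stale_catalogs("locales", "myapp", "Welcome") == []
```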

Performance Considerations

  • .mo loading is I/O-bound. gettext.translation() caches parsed catalogs by file path, but each call still searches the filesystem, so keep your own GNUTranslations object per language
  • For high-traffic web apps, pre-load all supported languages at startup
  • gettext() calls themselves are plain dictionary lookups, so their overhead is negligible
  • Never re-parse .mo files on every request
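Below is a minimal sketch of the per-language cache. It assumes the "myapp" domain and the locales directory used earlier; SUPPORTED, get_translation, and preload are hypothetical names. gettext.translation() already caches parsed catalogs by .mo path, but every call still repeats the directory search and returns a shallow copy. Memoizing per language skips that work too.

```python
import gettext
from functools import lru_cache

SUPPORTED = ("en", "fr", "de")  # hypothetical list of shipped languages


@lru_cache(maxsize=None)
def get_translation(language):
    """Load each language's catalog once per process and reuse it."""
    return gettext.translation("myapp", localedir="locales",
                               languages=[language], fallback=True)


def preload():
    # Call at startup, so the first request in each language skips the file I/O
    for language in SUPPORTED:
        get_translation(language)
```

The per-request activate() functions can then call get_translation(language) instead of translation().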

The one thing to remember: Production gettext means handling plurals correctly, keeping translations thread-safe with contextvars, and never concatenating translatable fragments — one complete sentence per _() call.
