Python String Interning — Core Concepts

What String Interning Means

String interning is a memory optimization where Python stores only one copy of each distinct string value. When multiple variables hold identical strings, they all reference the same object in memory rather than separate copies.

This has two benefits:

  1. Memory savings — One copy instead of many reduces heap usage.
  2. Faster comparisons — Identity checks (is) are O(1) pointer comparisons, much faster than character-by-character equality checks (==) which are O(n).

What Python Interns Automatically

CPython (the standard Python implementation) automatically interns strings in specific situations:

  • String literals that look like identifiers — Strings containing only ASCII letters, digits, and underscores. So "hello_world" gets interned; "hello world" (with a space) usually does not.
  • Compile-time constants — String literals known at compile time. If two functions both have "status" as a literal, they share the same object.
  • Dictionary keys — When a string is used as a dictionary key, CPython interns it to speed up future lookups.
  • Attribute names and variable names — Names used internally by the interpreter for attribute access, module lookups, and name resolution.

When It Doesn’t Happen

Strings created dynamically at runtime are generally not interned:

a = "hello"
b = "hel" + "lo"   # Compiler folds this → interned
c = "hel"
d = c + "lo"       # Runtime concatenation → NOT interned

print(a is b)  # True — both are compile-time constants
print(a is d)  # False — d was built at runtime

This distinction matters when you see code using is to compare strings — it works sometimes by accident (when strings happen to be interned) but fails unpredictably. Always use == for string comparison.

Manual Interning with sys.intern()

You can force Python to intern a string using sys.intern():

import sys

name = sys.intern(some_dynamic_string)

After interning, any future call to sys.intern() with an equal string returns the same object. This is useful when your program creates millions of strings from a small vocabulary — for example, parsing log files where status codes like "ERROR", "WARNING", and "INFO" appear millions of times.

Real Performance Impact

The benefit is most visible in dictionary-heavy code. When dictionary keys are interned strings, Python can often resolve lookups with a single pointer comparison instead of a full string comparison.

In a benchmark with a dictionary containing 10,000 keys queried 1 million times:

  • Without interning: ~380 ms (string equality comparisons on hash collisions)
  • With interned keys: ~290 ms (pointer comparisons on hash collisions)

The improvement scales with string length — longer strings benefit more because character-by-character comparison is proportionally more expensive.

Common Misconception

Many developers believe Python interns all short strings or all string literals. Neither is strictly true. The interning rules are implementation-specific (CPython vs PyPy vs others), undocumented as a language guarantee, and have changed between Python versions. The only portable way to ensure interning is to call sys.intern() explicitly.

Relying on automatic interning behavior in application logic is fragile. Use it as a performance optimization you consciously apply, not something you assume Python does for you.

The one thing to remember: String interning saves memory and speeds up comparisons by ensuring identical strings share one object, but it only happens automatically for identifier-like strings — for everything else, use sys.intern() deliberately.

pythonperformancememoryoptimization

See Also