Python String Interning Internals — Core Concepts
String interning is an optimization where Python stores only one copy of each unique string value. When two variables hold the same interned string, they point to the exact same object in memory. This saves memory and enables O(1) identity comparison instead of O(n) character-by-character equality checks.
For general string handling, see Python String Manipulation.
How Interning Works
When a string is interned, Python stores it in an internal table (a dictionary keyed by string value). Interning a string means looking it up in this table first: if an equal string is already there, Python returns a reference to the existing object instead of keeping a second copy; otherwise the new string is added and becomes the canonical copy.
This means two interned strings with the same content will have the same id() — they are literally the same object.
What CPython Interns Automatically
CPython (the standard Python implementation) automatically interns several types of strings:
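- String literals that look like valid identifiers (ASCII letters, digits, and underscores only)
- Variable, function, and attribute names stored in compiled code
- Keys of the dictionaries that hold module, class, and instance attributes, because those keys are names (ordinary dictionary keys created at runtime are not interned automatically)
- Module names; CPython also shares cached singletons such as the empty string and single-character Latin-1 strings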
Strings that contain spaces or punctuation, and strings computed at runtime, are typically not interned automatically.
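A quick way to see the difference between an identifier-like literal and the same text computed at runtime is sketched below. The string "status_ok" is an arbitrary example, and the identity result reflects typical CPython behavior rather than a guarantee.

```python
# A literal made only of ASCII letters, digits, and underscores is a typical
# candidate for automatic interning in CPython.
literal = "status_ok"

# The same text assembled at runtime is a new object; CPython does not
# intern it automatically.
built = "_".join(["status", "ok"])

print(literal == built)  # True: same characters
print(literal is built)  # typically False in CPython, but not guaranteed
```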
Explicit Interning with sys.intern()
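You can force any string to be interned by passing it to sys.intern(), which returns the interned copy. Keep and use that return value, since it is the shared object.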
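For example, two equal strings assembled at runtime become one shared object once each passes through sys.intern(). A minimal sketch, with illustrative variable names:

```python
import sys

# Two equal strings built at runtime start out as distinct objects.
first = "".join(["user", "_", "id"])
second = "".join(["user", "_", "id"])

# sys.intern() returns the canonical copy; rebind the names to the return value.
first = sys.intern(first)
second = sys.intern(second)

print(first is second)  # True: both names now refer to the same object
```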
When you have a large dataset with many repeated string values, say millions of log entries whose “level” field is always “INFO”, “WARN”, or “ERROR”, storing the result of sys.intern() for each value leaves only three level strings alive in memory instead of millions of duplicates.
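The log-level case might look like the sketch below. The log format and the parse_levels helper are hypothetical; the key point is that the program stores the value sys.intern() returns, so every entry with the same level points at one object.

```python
import sys


def parse_levels(lines, intern=True):
    """Return the level field (second whitespace-separated token) of each line.

    Assumes a hypothetical "timestamp level message" log format.
    """
    levels = []
    for line in lines:
        level = line.split(maxsplit=2)[1]  # split() creates a new str per field
        levels.append(sys.intern(level) if intern else level)
    return levels


# Synthetic log lines: 30,000 entries cycling through three levels.
lines = [f"t{i} {('INFO', 'WARN', 'ERROR')[i % 3]} event {i}" for i in range(30_000)]

levels = parse_levels(lines)
distinct_objects = len({id(level) for level in levels})
print(len(levels), distinct_objects)  # 30000 3
```

Because every list entry stays alive, counting distinct id() values counts distinct objects: with interning, 30,000 entries share three.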
Identity vs Equality
This is the most common source of confusion:
- == checks whether two strings have the same value (always reliable)
- is checks whether two strings are the same object (it works as a value test only when both strings are known to be interned)
Because interning behavior is an implementation detail that can change between Python versions, you should never use is to compare string values. Always use ==.
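A value comparison gives the same answer no matter where a string came from. In this minimal sketch, the is_error helper is hypothetical and the candidate strings come from a literal, a runtime join, and decoded bytes.

```python
def is_error(level: str) -> bool:
    """Compare by value, so the result does not depend on interning."""
    return level == "ERROR"


# Equal strings from different sources: a literal, a runtime join, decoded bytes.
candidates = ["ERROR", "".join(["ER", "ROR"]), b"ERROR".decode("ascii")]

print([is_error(c) for c in candidates])  # [True, True, True]
```

Writing level is "ERROR" instead would make the result depend on whether each string happens to be interned, and CPython 3.8 and later emit a SyntaxWarning for is comparisons against a literal.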
When String Interning Helps
| Scenario | Benefit |
|---|---|
| Many repeated short strings (keys, labels) | Memory savings |
| High-frequency dictionary lookups | Faster key comparison |
| Large datasets with categorical columns | Significant memory reduction |
| Symbol tables in parsers/compilers | Both speed and memory |
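To gauge the memory savings for a categorical column, one option is to measure with the standard-library tracemalloc module, as in this sketch. The labels and column size are made up for illustration, and exact byte counts vary by Python version and platform, so no numbers are shown.

```python
import sys
import tracemalloc


def build_column(n: int, intern: bool) -> list:
    """Build n category labels at runtime, optionally interning each one."""
    column = []
    for i in range(n):
        label = f"category_{i % 3}"  # a fresh str object on every iteration
        column.append(sys.intern(label) if intern else label)
    return column


def traced_bytes(intern: bool, n: int = 100_000) -> int:
    """Return memory still allocated (per tracemalloc) while the column is alive."""
    tracemalloc.start()
    column = build_column(n, intern)
    current, _peak = tracemalloc.get_traced_memory()
    tracemalloc.stop()
    return current


plain = traced_bytes(intern=False)
interned = traced_bytes(intern=True)
print(f"plain: {plain:,} bytes, interned: {interned:,} bytes")
```

Without interning, the column holds 100,000 separate label objects; with interning, it holds 100,000 references to three objects, so the traced total is mostly the list itself.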
When It Doesn’t Help
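- Unique strings (user input, UUIDs): nothing to share
- Very long strings: they are rarely exact duplicates, so the cost of hashing them and keeping intern-table entries may not pay off
- Short-lived strings: the savings last only while you hold references to the interned copies, so strings that are created and discarded quickly gain little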
Common Misconception
“All string literals are interned.” CPython automatically interns literals that look like identifiers; other literals, such as "hello world" (with a space), may or may not be interned depending on the Python version and context. CPython’s interning rules are implementation details, not language guarantees.
One Thing to Remember
String interning lets Python share one object for identical string values — use sys.intern() when you have millions of repeated strings, but always use == for comparison because interning behavior is an implementation detail.