Python pprint — Deep Dive
How the formatting algorithm works
PrettyPrinter uses a recursive approach with a width budget:
- Try to format the entire object on one line using repr()
- If it fits within the remaining width, use it
- If not, expand: format each element on its own line, indented, and recurse
The decision to expand is per-object, not global. A list of short strings might stay on one line while a list of dicts expands — even within the same parent structure.
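The per-object decision can be seen in a small sketch. The sample data and the width of 50 are illustrative assumptions, chosen so that the outer dict and the list of dicts overflow while the short list of strings still fits:

```python
from pprint import pformat

# Illustrative data: one short list and one list of wider dicts.
data = {
    "tags": ["a", "b", "c"],
    "servers": [
        {"host": "alpha.example.com", "port": 8080, "debug": False},
        {"host": "beta.example.com", "port": 8081, "debug": True},
    ],
}

# A narrow width forces the larger containers to expand.
text = pformat(data, width=50)
print(text)
```

In the printed result the "tags" list stays on a single line, while each dict under "servers" is broken across several lines.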
The key internal methods are:
- _format(object, stream, indent, allowance, context, level): the recursive core
- _repr(object, context, level): generates the one-line repr
- _pprint_dict, _pprint_list, etc.: type-specific formatters registered via _dispatch
Custom object formatting
By default, pprint calls repr() on objects it doesn’t have a specific formatter for. You can improve this in two ways:
Option 1: Define __repr__ on your class
class Config:
    def __init__(self, host, port, debug):
        self.host = host
        self.port = port
        self.debug = debug

    def __repr__(self):
        return (
            f"Config(host={self.host!r}, "
            f"port={self.port!r}, "
            f"debug={self.debug!r})"
        )
pprint will use this repr when the object appears inside containers.
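As a quick check, the sketch below repeats the Config class from above so that it runs on its own, then formats an instance inside a list. The host and port values are made up for illustration:

```python
from pprint import pformat


class Config:
    def __init__(self, host, port, debug):
        self.host = host
        self.port = port
        self.debug = debug

    def __repr__(self):
        return (
            f"Config(host={self.host!r}, "
            f"port={self.port!r}, "
            f"debug={self.debug!r})"
        )


# The list fits within the default width of 80, so pformat
# returns the same text as repr() would.
text = pformat([Config("localhost", 8080, True)])
print(text)
```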
Option 2: Register a custom formatter
For deeper control, subclass PrettyPrinter and register a dispatch handler:
from pprint import PrettyPrinter

# Assumes the Config class from Option 1 (with its own __repr__) is defined.
class CustomPP(PrettyPrinter):
    # Copy the table so the handler is not registered on PrettyPrinter itself.
    _dispatch = PrettyPrinter._dispatch.copy()

    def _pprint_config(self, object, stream, indent, allowance, context, level):
        cls_name = object.__class__.__name__
        stream.write(f"{cls_name}(\n")
        next_indent = indent + self._indent_per_level
        items = list(vars(object).items())
        for i, (key, val) in enumerate(items):
            last = i == len(items) - 1
            stream.write(" " * next_indent + f"{key}=")
            # In CPython, _format passes level + 1 to handlers, so children reuse level.
            self._format(val, stream, next_indent + len(key) + 1,
                         allowance if last else 1, context, level)
            if not last:
                stream.write(",\n")
        stream.write("\n" + " " * indent + ")")

    # Keyed by the class's own __repr__ function.
    _dispatch[Config.__repr__] = _pprint_config
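Note: _dispatch and _format are private, undocumented internals that may change between Python versions. The dispatch table keys off the identity of the class's __repr__ function, which is fragile: a class that does not define its own __repr__ would register object.__repr__ and capture formatting for unrelated types. The handler also runs only when the one-line repr does not fit the width budget. For production use, consider wrapping objects before passing them to pprint.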
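One way to read "wrapping objects before passing to pprint" is to convert them into plain dicts first, which relies only on the public API. The sketch below is an assumption about what such a wrapper could look like: to_printable and UNWRAP_TYPES are hypothetical names, and the Config class here is a minimal stand-in.

```python
from pprint import pformat


class Config:
    """Minimal stand-in for the Config class used in this section."""

    def __init__(self, host, port, debug):
        self.host = host
        self.port = port
        self.debug = debug


# Hypothetical: the types we choose to expand into dicts.
UNWRAP_TYPES = (Config,)


def to_printable(obj):
    """Hypothetical helper: recursively replace selected objects with dicts."""
    if isinstance(obj, UNWRAP_TYPES):
        fields = {key: to_printable(val) for key, val in vars(obj).items()}
        return {"__class__": type(obj).__name__, **fields}
    if isinstance(obj, dict):
        return {key: to_printable(val) for key, val in obj.items()}
    if isinstance(obj, list):
        return [to_printable(val) for val in obj]
    if isinstance(obj, tuple):
        return tuple(to_printable(val) for val in obj)
    return obj


state = {"primary": Config("localhost", 8080, True), "retries": [1, 2, 3]}
printable = to_printable(state)
print(pformat(printable, width=60))
```

Because the result is made of ordinary dicts and lists, pprint's built-in formatters handle the line breaking and no private hooks are needed.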
Integration with logging
pprint pairs well with Python’s logging module for structured debug output:
import logging
from pprint import pformat

logger = logging.getLogger(__name__)

def process_response(response):
    logger.debug("API response:\n%s", pformat(response, width=100, depth=4))
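Using pformat (not pprint) avoids writing to stdout and lets the logging framework handle the output. Note that %s-style lazy formatting only defers the string interpolation: the pformat call is an ordinary argument expression, so it still runs on every call, even when DEBUG is disabled.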
For high-volume logging, guard the formatting:
if logger.isEnabledFor(logging.DEBUG):
    logger.debug("State:\n%s", pformat(large_state))
This avoids the pformat cost entirely when debug logging is disabled.
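An alternative to the explicit guard is a small wrapper whose __str__ calls pformat, so the formatting happens only if the logging framework actually renders the message. This is a sketch, not part of pprint or logging: LazyPformat, CountingRepr, and the logger name are hypothetical names used for illustration.

```python
import logging
from pprint import pformat


class LazyPformat:
    """Hypothetical helper: runs pformat only when the message is rendered."""

    def __init__(self, obj, **pformat_kwargs):
        self.obj = obj
        self.pformat_kwargs = pformat_kwargs

    def __str__(self):
        return pformat(self.obj, **self.pformat_kwargs)


class CountingRepr:
    """Demo object that counts how often its repr is computed."""

    calls = 0

    def __repr__(self):
        CountingRepr.calls += 1
        return "CountingRepr()"


logger = logging.getLogger("pprint_lazy_demo")
logger.setLevel(logging.INFO)  # DEBUG is disabled for this logger

# The record is dropped before %s is applied, so __str__ never runs.
logger.debug("State:\n%s", LazyPformat(CountingRepr(), width=100))
calls_while_disabled = CountingRepr.calls

# Rendering the wrapper directly triggers the deferred pformat call.
rendered = str(LazyPformat({"b": [1, 2], "a": 1}))
```

The trade-off is that the object is formatted at render time, so a mutable object may have changed between the logging call and the moment a handler emits the record.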
Circular reference handling
pprint detects circular references using an identity-based context that tracks the id() of each container currently being formatted:
from pprint import pprint
a = [1, 2]
a.append(a) # circular!
pprint(a)
# [1, 2, <Recursion on list with id=...>]
This is one of pprint’s advantages over json.dumps, which raises ValueError on circular structures.
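The contrast can be sketched side by side. The example also uses pprint.isrecursive, which reports whether an object would need a recursion marker; the variable names are illustrative:

```python
import json
from pprint import isrecursive, pformat

a = [1, 2]
a.append(a)  # the list now contains itself

# pformat replaces the inner reference with a recursion marker.
text = pformat(a)

# json.dumps refuses the same structure (check_circular is on by default).
try:
    json.dumps(a)
    json_error = None
except ValueError as exc:
    json_error = str(exc)

print(text)
print("isrecursive:", isrecursive(a))
print("json error:", json_error)
```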
Performance characteristics
pprint is not optimized for speed; it is a debugging tool. Rough, order-of-magnitude figures (actual timings depend on hardware, Python version, and data shape):
| Data size | pformat time |
|---|---|
| 100-element flat list | ~50μs |
| 1,000-element flat dict | ~500μs |
| 10,000-element nested | ~15ms |
| 100,000-element nested | ~200ms |
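Since such figures vary by machine, it is worth measuring on your own data. The following is a minimal timing sketch using timeit; the sample structures and the time_pformat helper are assumptions made for illustration, not the setup behind the table above:

```python
import timeit
from pprint import pformat

# Illustrative inputs: a small flat list and a moderately nested dict.
flat_list = list(range(100))
nested = {
    f"key{i}": {"values": list(range(10)), "name": f"item{i}"}
    for i in range(1_000)
}


def time_pformat(obj, repeat=5):
    """Return the best single-call time in seconds over a few runs."""
    return min(timeit.repeat(lambda: pformat(obj), number=1, repeat=repeat))


flat_seconds = time_pformat(flat_list)
nested_seconds = time_pformat(nested)
print(f"flat list:   {flat_seconds * 1e6:.0f} microseconds")
print(f"nested dict: {nested_seconds * 1e3:.1f} milliseconds")
```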
For large data, use depth to limit recursion and width to control expansion. If you’re formatting megabytes of data, you’re probably using the wrong tool — consider streaming JSON or custom formatters.
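A short sketch of the depth parameter on a made-up nested dict: levels beyond the limit are replaced with an ellipsis placeholder, so pprint never descends into them.

```python
from pprint import pformat

nested = {"level1": {"level2": {"level3": {"level4": "deep"}}}}

full = pformat(nested)              # every level is rendered
shallow = pformat(nested, depth=2)  # anything below two levels is elided

print(full)
print(shallow)
# → {'level1': {'level2': {...}}}
```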
pprint in the REPL and IPython
The standard Python REPL uses repr() by default. You can make pprint the default display:
import builtins
import sys
from pprint import pprint

# Override displayhook for the REPL
def pprint_displayhook(value):
    if value is not None:
        # Clear _ first, as the default hook does, then store the new value.
        builtins._ = None
        pprint(value)
        builtins._ = value

sys.displayhook = pprint_displayhook
IPython has this built in: pretty printing is on by default, and the %pprint magic command toggles it on and off.
Production recipes
Diff-friendly configuration dumps
from pprint import pformat

def dump_config(config, path="config_dump.txt"):
    """Write config in a format that diffs well."""
    formatted = pformat(config, width=60, sort_dicts=True)
    with open(path, "w") as f:
        f.write(formatted + "\n")
Sorting keys and using a consistent width means git diffs show only actual changes, not reformatting noise.
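To illustrate, the sketch below formats two made-up configs that differ in key order and in one value, then compares them with difflib. The config_lines helper and the width of 40 are assumptions chosen so that each key lands on its own line:

```python
import difflib
from pprint import pformat

old = {"host": "localhost", "port": 8080, "debug": False}
new = {"debug": True, "port": 8080, "host": "localhost"}  # reordered, one change


def config_lines(config):
    """Format with sorted keys and a fixed width, one entry per line."""
    return pformat(config, width=40, sort_dicts=True).splitlines()


diff = list(
    difflib.unified_diff(
        config_lines(old), config_lines(new),
        fromfile="old", tofile="new", lineterm="",
    )
)

# Keep only added/removed lines, skipping the ---/+++ file headers.
changed = [
    line for line in diff
    if line.startswith(("+", "-")) and not line.startswith(("+++", "---"))
]
print("\n".join(diff))
```

Only the debug entry shows up as changed; the different insertion order of the keys produces no diff lines.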
Truncated debug output
from pprint import pformat

def debug_preview(obj, max_chars=500):
    """Format object for debug, truncating if too large."""
    full = pformat(obj, width=80, depth=3)
    if len(full) > max_chars:
        return full[:max_chars] + "\n... [truncated]"
    return full
Test assertion messages
from pprint import pformat

def assert_dict_equal(actual, expected):
    if actual != expected:
        msg = (
            f"Dicts differ:\n"
            f"ACTUAL:\n{pformat(actual, width=60)}\n\n"
            f"EXPECTED:\n{pformat(expected, width=60)}"
        )
        raise AssertionError(msg)
pprint vs. modern alternatives
| Tool | Best for |
|---|---|
| pprint | Quick debugging, stdlib-only environments |
| rich.pretty | Color-highlighted, theme-aware terminal output |
| icecream | Inline debug printing with variable names |
| devtools.debug | pydantic-aware, colored output |
| json.dumps | JSON-valid output for APIs |
rich.pretty.install() replaces the REPL's default display with syntax-highlighted, type-aware output. For interactive development it is usually the more readable choice, but pprint remains valuable in environments where third-party packages aren’t available.
One thing to remember
pprint is the stdlib debugging formatter: it works on any object that has a repr and survives circular references. Use pformat for string output, guard expensive calls behind log-level checks, and consider rich.pretty as the modern alternative when you can afford a dependency.