Python textwrap — Deep Dive
Under the hood
TextWrapper is the engine behind the module-level wrap(), fill(), and shorten() functions; dedent() and indent() are standalone helpers that do not use it. When you call textwrap.fill(text, width=60), it creates a temporary TextWrapper(width=60) and calls its .fill() method. Understanding the class internals lets you customize behavior well beyond the defaults.
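A minimal sketch of that equivalence, using a made-up sample string: the module-level call and an explicit instance with the same width should produce identical output.

```python
import textwrap

text = "The quick brown fox jumps over the lazy dog. " * 3

# Module-level call: builds a throwaway TextWrapper behind the scenes.
via_function = textwrap.fill(text, width=60)

# Explicit instance: same settings, but the object can be kept and reused.
wrapper = textwrap.TextWrapper(width=60)
via_instance = wrapper.fill(text)

print(via_function == via_instance)
```

Keeping the instance around also lets you adjust attributes such as width or subsequent_indent between calls.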
The wrapping algorithm works in stages:
- Munge whitespace: tabs are expanded and each remaining whitespace character becomes a single space (controlled by expand_tabs and replace_whitespace)
- Split into chunks: a regex splits the text on whitespace and, by default, after hyphens in compound words
- Greedy line-filling: chunks accumulate on a line until adding the next would exceed width
- Handle overflow: a word longer than width is either broken or left to overflow, depending on break_long_words
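The overflow stage is the easiest one to observe from outside. The sketch below wraps a sentence containing a long URL (an invented example) at a narrow width, once with the default break_long_words=True and once with it disabled.

```python
import textwrap

text = "see https://example.com/a/very/long/path/segment for details"

# Default: the long word is cut so that every line fits the width.
broken = textwrap.wrap(text, width=20)

# Disabled: the long word gets a line of its own and overflows.
kept = textwrap.wrap(text, width=20, break_long_words=False)

print(broken)
print(kept)
```

Disabling the option is usually the safer choice for URLs and file paths, at the cost of an occasional overlong line.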
The split regex is stored in TextWrapper.wordsep_re (and wordsep_simple_re when break_on_hyphens=False). You can override it in a subclass for custom splitting behavior.
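You rarely need to touch the regexes directly, because break_on_hyphens already switches between them. A small sketch with an invented sample phrase shows the difference in where lines may break:

```python
import textwrap

text = "a state-of-the-art method"

# Default splitting: a line may end after a hyphen inside a compound word.
hyphen_split = textwrap.wrap(text, width=10)

# Whitespace-only splitting: the compound word stays in one piece.
# break_long_words=False stops it from being cut at the width instead.
unbroken = textwrap.wrap(
    text, width=10, break_on_hyphens=False, break_long_words=False
)

print(hyphen_split)
print(unbroken)
```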
Subclassing TextWrapper
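The deepest customization points are the private methods _split_chunks and _wrap_chunks. They are undocumented, so overrides may need adjusting between Python versions, and overriding the public wrap() is often enough. Here’s a subclass that wraps each paragraph independently: blank-line paragraph breaks are preserved, while single newlines inside a paragraph are treated as ordinary spaces: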
import textwrap

class ParagraphWrapper(textwrap.TextWrapper):
    """Wraps each paragraph independently, preserving blank-line breaks."""

    def wrap(self, text):
        paragraphs = text.split("\n\n")
        result = []
        for para in paragraphs:
            # Collapse internal newlines within a paragraph
            cleaned = " ".join(para.split())
            if cleaned:
                result.extend(super().wrap(cleaned))
                result.append("")  # blank line between paragraphs
        if result and result[-1] == "":
            result.pop()
        return result
Another common need is a wrapper that ignores ANSI escape codes when measuring width but preserves them in the output. The sketch below goes only part of the way: it makes long-word handling ANSI-aware, while ordinary line filling still counts the escape sequences toward the width.
import re
import textwrap

ANSI_RE = re.compile(r"\x1b\[[0-9;]*m")

class AnsiWrapper(textwrap.TextWrapper):
    """Partial sketch: only long-word handling is ANSI-aware.

    Normal line filling still measures chunks with len(). A complete
    version would reimplement _wrap_chunks with per-chunk visible lengths.
    """

    def _handle_long_word(self, reversed_chunks, cur_line, cur_len, width):
        # Called when the next chunk's raw length exceeds the width
        chunk = reversed_chunks[-1]
        visible_len = len(ANSI_RE.sub("", chunk))
        if visible_len <= width - cur_len:
            # Only the escape codes made it look too long: keep it whole
            cur_line.append(reversed_chunks.pop())
        elif visible_len <= width and cur_line:
            # It fits on a line of its own: leave it for the next line
            return
        else:
            # Really too long; note this may cut through an escape sequence
            super()._handle_long_word(reversed_chunks, cur_line, cur_len, width)
Unicode width issues
textwrap measures width by len(), which counts Unicode code points. This breaks for:
- CJK characters — each occupies 2 terminal columns
- Combining characters — zero-width diacritics
- Emoji — often 2 columns wide
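A short check makes the mismatch concrete. The sample strings are arbitrary; the point is that len() and the number of terminal columns disagree in both directions.

```python
import textwrap
import unicodedata

cjk = "日本語のテキスト"   # 8 code points, each a wide character
accented = "e\u0301"       # 'e' plus a combining acute accent

print(len(cjk), [unicodedata.east_asian_width(ch) for ch in cjk])
print(len(accented), unicodedata.combining(accented[1]))

# textwrap sees 24 characters and cuts after 20 of them,
# which is roughly 40 columns on screen.
lines = textwrap.wrap(cjk * 3, width=20)
print([len(line) for line in lines])
```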
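A more robust approach measures display width with unicodedata or the third-party wcwidth library. The following is a simplified sketch using unicodedata; it ignores max_lines and break_long_words, and it cannot split a run of CJK text that contains no spaces, because such a run arrives as a single chunk:
import textwrap
import unicodedata

def display_width(text):
    """Approximate terminal columns; wcwidth is more thorough."""
    width = 0
    for ch in text:
        if unicodedata.combining(ch):
            continue  # zero-width combining mark
        width += 2 if unicodedata.east_asian_width(ch) in ("F", "W") else 1
    return width

class CJKWrapper(textwrap.TextWrapper):
    """Greedy line filling measured in display columns instead of len()."""

    def _wrap_chunks(self, chunks):
        lines = []
        indent = self.initial_indent
        cur_line, cur_width = [], display_width(indent)
        for chunk in chunks:
            chunk_width = display_width(chunk)
            if cur_line and cur_width + chunk_width > self.width:
                # Line is full: flush it and start the next one
                lines.append(indent + "".join(cur_line).rstrip())
                indent = self.subsequent_indent
                cur_line, cur_width = [], display_width(indent)
            if cur_line or chunk.strip():  # skip whitespace at line start
                cur_line.append(chunk)
                cur_width += chunk_width
        if cur_line:
            lines.append(indent + "".join(cur_line).rstrip())
        return lines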
dedent internals and edge cases
textwrap.dedent finds the longest common leading whitespace across all non-empty lines and removes it. Gotchas:
- Tabs vs. spaces: dedent treats them as distinct characters. A line indented with a tab won’t share a common prefix with one indented by 4 spaces.
- Blank lines: lines that are empty or contain only whitespace are ignored when computing the common prefix, and whitespace-only lines are normalized to a bare newline in the output.
- The backslash-newline trick: writing """\ (a backslash right after the opening triple-quote) avoids an empty first line, which would otherwise leave a leading newline in the result:
# Good
msg = textwrap.dedent("""\
    line one
    line two
    """)
# Bad: first line is empty; dedent still works, but the result starts with a newline
msg = textwrap.dedent("""
    line one
    line two
    """)
Performance considerations
textwrap is not designed for high-throughput text processing. The splitting regexes are compiled once, when the TextWrapper class is defined, so the extra cost of the module-level functions is building a new TextWrapper instance on every call. Rough figures, which will vary by machine and Python version:
- Module-level fill(): ~50μs for a 500-character paragraph
- Reused TextWrapper.fill(): ~30μs for the same text
- For 100,000 paragraphs: ~3 seconds with reuse vs. ~5 seconds without
If you’re formatting millions of strings, reuse the TextWrapper instance. For truly hot paths (like log formatting in a high-throughput service), consider a simpler custom implementation that skips the regex-based splitting.
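Timings like these are worth measuring on your own machine. A minimal timeit sketch follows; the paragraph text, width, and iteration count are arbitrary choices, and no particular result is implied.

```python
import textwrap
import timeit

paragraph = ("lorem ipsum dolor sit amet " * 20).strip()  # about 540 characters
wrapper = textwrap.TextWrapper(width=72)

n = 2000
per_call = timeit.timeit(lambda: textwrap.fill(paragraph, width=72), number=n) / n
reused = timeit.timeit(lambda: wrapper.fill(paragraph), number=n) / n

print(f"module-level fill(): {per_call * 1e6:.1f} µs per call")
print(f"reused TextWrapper:  {reused * 1e6:.1f} µs per call")
```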
Production recipes
CLI help formatter
import textwrap

def format_help(options):
    """Format CLI options with aligned descriptions."""
    indent = " " * 24
    wrapper = textwrap.TextWrapper(width=78, subsequent_indent=indent)
    lines = []
    for flag, desc in options:
        # Two-space margin + flag padded to 20 + two-space gap = 24 columns
        prefix = f"  {flag:<20}  "
        if len(prefix) > len(indent):
            # Flag too long for its column: description starts on the next line
            lines.append(prefix.rstrip())
            lines.append(wrapper.fill(indent + desc))
        else:
            lines.append(wrapper.fill(prefix + desc))
    return "\n".join(lines)
Email-style quoting
import textwrap

def quote_reply(text, width=72):
    # Wrap two columns short so the "> " prefix still fits within width
    wrapped = textwrap.fill(text, width=width - 2)
    return textwrap.indent(wrapped, "> ")
Log message formatting
import textwrap

def format_log_detail(message, max_width=120):
    """Wrap long log details with hanging indent."""
    label = "  DETAIL: "
    wrapper = textwrap.TextWrapper(
        width=max_width,
        initial_indent=label,
        subsequent_indent=" " * len(label),  # align under the message text
        max_lines=5,
        placeholder=" [...]",
    )
    return wrapper.fill(message)
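The max_lines and placeholder options do the truncation in that recipe. A smaller sketch, with an arbitrary width and repeated filler text, shows the effect in isolation:

```python
import textwrap

wrapper = textwrap.TextWrapper(width=30, max_lines=2, placeholder=" [...]")
lines = wrapper.wrap("word " * 40)

# Output stops after two lines; the last one ends with the placeholder
# and still fits within the width.
for line in lines:
    print(line)
```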
Docstring cleanup
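import inspect
import textwrap

def clean_docstring(func):
    """Extract and clean a function's docstring."""
    # inspect.getdoc() already strips the common indentation, so dedent()
    # here is only a safety net and normally changes nothing.
    doc = inspect.getdoc(func) or ""
    return textwrap.dedent(doc).strip()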
textwrap vs. alternatives
| Need | Tool |
|---|---|
| Basic line wrapping | textwrap |
| Rich terminal formatting | rich library |
| ANSI-aware wrapping | rich or custom subclass |
| CJK-aware wrapping | Custom with wcwidth |
| Markdown rendering | mdformat, rich.markdown |
| PDF/HTML text flow | reportlab, browser engine |
textwrap owns the “plain text, known width” niche. The moment you need color, markup, or complex Unicode, layer something on top.
One thing to remember
textwrap is deceptively simple on the surface but highly customizable through subclassing. Reuse TextWrapper instances for performance, be aware of its character-counting limitations with CJK and ANSI codes, and reach for dedent liberally to keep multi-line strings clean in indented code.