Python textwrap — Deep Dive

Advanced textwrap techniques: custom wrapper subclasses, Unicode handling, performance patterns, and production formatting recipes.

Under the hood

TextWrapper is the engine behind all module-level functions. When you call textwrap.fill(text, width=60), it creates a temporary TextWrapper(width=60) and calls its .fill() method. Understanding the class internals lets you customize behavior well beyond the defaults.

The wrapping algorithm works in stages:

Munge whitespace — tabs expand, whitespace collapses (configurable)
Split into chunks — using a regex that respects word boundaries and hyphens
Greedy line-filling — chunks accumulate on a line until adding the next would exceed width
Handle overflow — long words either break or overflow based on break_long_words

The split regex is stored in TextWrapper.wordsep_re (and wordsep_simple_re when break_on_hyphens=False). You can override it in a subclass for custom splitting behavior.

Subclassing TextWrapper

The most powerful customization point is overriding _wrap_chunks or _split_chunks. Here’s a subclass that preserves existing line breaks (useful for Markdown-like text where single newlines are intentional):

import textwrap

class ParagraphWrapper(textwrap.TextWrapper):
    """Wraps each paragraph independently, preserving blank-line breaks."""

    def wrap(self, text):
        paragraphs = text.split("\n\n")
        result = []
        for para in paragraphs:
            # Collapse internal newlines within a paragraph
            cleaned = " ".join(para.split())
            if cleaned:
                result.extend(super().wrap(cleaned))
            result.append("")  # blank line between paragraphs
        if result and result[-1] == "":
            result.pop()
        return result

Another common subclass: a wrapper that handles ANSI escape codes by stripping them for width calculation but preserving them in output:

import re, textwrap

ANSI_RE = re.compile(r"\x1b\[[0-9;]*m")

class AnsiWrapper(textwrap.TextWrapper):
    def _wrap_chunks(self, chunks, *args, **kwargs):
        # Temporarily measure without ANSI codes
        saved = self.width
        # This is a simplified approach; production code
        # needs per-chunk length adjustment
        return super()._wrap_chunks(chunks, *args, **kwargs)

    def _handle_long_word(self, reversed_chunks, cur_line, cur_len, width):
        # Custom handling for words containing ANSI codes
        chunk = reversed_chunks[-1]
        visible_len = len(ANSI_RE.sub("", chunk))
        if visible_len <= width:
            cur_line.append(reversed_chunks.pop())
        else:
            super()._handle_long_word(reversed_chunks, cur_line, cur_len, width)

Unicode width issues

textwrap measures width by len(), which counts Unicode code points. This breaks for:

CJK characters — each occupies 2 terminal columns
Combining characters — zero-width diacritics
Emoji — often 2 columns wide

A production-grade solution uses unicodedata or the third-party wcwidth library:

import unicodedata, textwrap

def display_width(text):
    width = 0
    for ch in text:
        eaw = unicodedata.east_asian_width(ch)
        width += 2 if eaw in ("F", "W") else 1
    return width

class CJKWrapper(textwrap.TextWrapper):
    def _wrap_chunks(self, chunks):
        lines = []
        cur_line = []
        cur_width = 0
        indent = self.subsequent_indent

        for chunk in chunks:
            chunk_width = display_width(chunk)
            if cur_width + chunk_width <= self.width:
                cur_line.append(chunk)
                cur_width += chunk_width
            else:
                if cur_line:
                    lines.append("".join(cur_line))
                cur_line = [indent, chunk]
                cur_width = display_width(indent) + chunk_width

        if cur_line:
            lines.append("".join(cur_line))
        return lines

dedent internals and edge cases

textwrap.dedent finds the longest common leading whitespace across all non-empty lines and removes it. Gotchas:

Tabs vs. spaces — dedent treats them as distinct characters. A line indented with a tab won’t share a common prefix with one indented by 4 spaces.
Empty lines — completely empty lines (no whitespace at all) are ignored when computing the common prefix. Lines with only whitespace are not.
The backslash-newline trick — using """\ (backslash after triple-quote) avoids an empty first line that would otherwise throw off the indentation:

# Good
msg = textwrap.dedent("""\
    line one
    line two
""")

# Bad — first line is empty, dedent still works but you get a leading newline
msg = textwrap.dedent("""
    line one
    line two
""")

Performance considerations

textwrap is not designed for high-throughput text processing. Each wrap() call compiles regexes (unless you reuse a TextWrapper instance). Benchmarks on a typical machine:

Module-level fill(): ~50μs for a 500-character paragraph
Reused TextWrapper.fill(): ~30μs for the same text
For 100,000 paragraphs: ~3 seconds with reuse vs. ~5 seconds without

If you’re formatting millions of strings, reuse the TextWrapper instance. For truly hot paths (like log formatting in a high-throughput service), consider a simpler custom implementation that skips the regex-based splitting.

Production recipes

CLI help formatter

import textwrap

def format_help(options):
    """Format CLI options with aligned descriptions."""
    wrapper = textwrap.TextWrapper(
        width=78,
        initial_indent="",
        subsequent_indent=" " * 24,
    )
    lines = []
    for flag, desc in options:
        prefix = f"  {flag:<20}  "
        if len(prefix) > 24:
            lines.append(prefix)
            lines.append(wrapper.fill(desc))
        else:
            lines.append(wrapper.fill(prefix + desc))
    return "\n".join(lines)

Email-style quoting

import textwrap

def quote_reply(text, width=72):
    wrapped = textwrap.fill(text, width=width - 2)
    return textwrap.indent(wrapped, "> ")

Log message formatting

import textwrap

def format_log_detail(message, max_width=120):
    """Wrap long log details with hanging indent."""
    wrapper = textwrap.TextWrapper(
        width=max_width,
        initial_indent="  DETAIL: ",
        subsequent_indent="          ",
        max_lines=5,
        placeholder=" [...]",
    )
    return wrapper.fill(message)

Docstring cleanup

import textwrap, inspect

def clean_docstring(func):
    """Extract and clean a function's docstring."""
    doc = inspect.getdoc(func) or ""
    return textwrap.dedent(doc).strip()

textwrap vs. alternatives

Need	Tool
Basic line wrapping	`textwrap`
Rich terminal formatting	`rich` library
ANSI-aware wrapping	`rich` or custom subclass
CJK-aware wrapping	Custom with `wcwidth`
Markdown rendering	`mdformat`, `rich.markdown`
PDF/HTML text flow	`reportlab`, browser engine

textwrap owns the “plain text, known width” niche. The moment you need color, markup, or complex Unicode, layer something on top.

One thing to remember

textwrap is deceptively simple on the surface but highly customizable through subclassing. Reuse TextWrapper instances for performance, be aware of its character-counting limitations with CJK and ANSI codes, and reach for dedent liberally to keep multi-line strings clean in indented code.

pythonstandard-librarytext-processing