Python dis Module and Bytecode — Deep Dive

Master CPython bytecode analysis with the dis module — instruction formats, specializing interpreter internals, and practical optimization through disassembly.

Bytecode instruction format

Since CPython 3.6, every instruction is exactly 2 bytes (a “word”): one byte for the opcode and one byte for the argument. An argument larger than 255 is built up across multiple words using EXTENDED_ARG prefixes. Python 3.11 kept this word format and added inline CACHE entries, which are extra 2-byte words that follow some instructions and hold data for the specializing interpreter.

import dis
import sys

def example():
    x = 100_000  # The oparg is an index into co_consts, not the value; EXTENDED_ARG appears only when an index or jump distance exceeds 255

# Access raw bytecode
code = example.__code__
print(f"Bytecode bytes: {list(code.co_code)}")
print(f"Python version: {sys.version_info[:2]}")
print()
dis.dis(example)
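
To make the word format concrete, here is a minimal sketch that decodes co_code by hand and folds EXTENDED_ARG prefixes into the following instruction. The helper name decode_words and the generated 300-assignment module are illustrative assumptions, not part of the dis API. On 3.11+ the inline CACHE words show up in this raw view as ordinary words.

```python
import dis

def decode_words(code):
    """Yield (offset, opname, arg) for each 2-byte word in code.co_code.

    EXTENDED_ARG prefixes are folded into the instruction that follows them.
    """
    raw = code.co_code
    extended = 0
    for offset in range(0, len(raw), 2):
        opcode, low_byte = raw[offset], raw[offset + 1]
        arg = extended | low_byte
        if opcode == dis.EXTENDED_ARG:
            # Shift the accumulated value up to make room for the next byte
            extended = arg << 8
            continue
        extended = 0
        yield offset, dis.opname[opcode], arg

# Hypothetical module with 300 distinct names and constants, so that
# some name and constant indexes no longer fit in one byte
source = "\n".join(f"name_{i} = {i}.5" for i in range(300))
wide_code = compile(source, "<demo>", "exec")

largest_arg = max(arg for _, _, arg in decode_words(wide_code))
print(f"largest decoded argument: {largest_arg}")
```

In everyday use dis.get_instructions() does this decoding for you; the sketch only shows what happens underneath.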

The dis.Bytecode class

For programmatic analysis, dis.Bytecode provides an iterator of dis.Instruction named tuples:

import dis

def calculate(a, b, op):
    if op == "+":
        return a + b
    elif op == "*":
        return a * b
    return None

bc = dis.Bytecode(calculate)
for instr in bc:
    print(f"offset={instr.offset:3d}  "
          f"opname={instr.opname:25s}  "
          f"arg={instr.arg!s:5s}  "
          f"argval={instr.argval!r}")

Each Instruction has: opcode, opname, arg, argval, argrepr, offset, starts_line, is_jump_target. Newer versions add more fields, such as positions in 3.11. The argval field resolves the raw argument into its semantic value: a variable name, a constant, or a jump target offset.
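
As a small illustration of argval, the sketch below collects the global names a function loads. The helper referenced_globals and the sample function are hypothetical names chosen for this example.

```python
import dis

def referenced_globals(func):
    """Return the sorted global (or builtin) names that func loads."""
    return sorted({
        instr.argval
        for instr in dis.get_instructions(func)
        if instr.opname == "LOAD_GLOBAL"
    })

def mean(values):
    return sum(values) / len(values)

names = referenced_globals(mean)
print(names)
```

Because argval is already resolved to the name, there is no need to index into co_names by hand.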

Jump analysis and control flow graphs

You can build a basic control flow graph from bytecode:

import dis
from collections import defaultdict

def build_cfg(func):
    """Build a simple control flow graph from bytecode."""
    instructions = list(dis.Bytecode(func))
    blocks = defaultdict(list)
    jump_opcodes = set(dis.hasjabs) | set(dis.hasjrel)

    # Find block leaders: the entry point, jump targets, and
    # the instruction that follows each jump
    leaders = {0}
    for i, instr in enumerate(instructions):
        if instr.opcode in jump_opcodes:
            leaders.add(instr.argval)
            # Use the next listed instruction, not offset + 2:
            # 3.11+ can place hidden CACHE entries after a jump
            if i + 1 < len(instructions):
                leaders.add(instructions[i + 1].offset)

    # Group instructions into basic blocks
    sorted_leaders = sorted(leaders)
    leader_to_block = {offset: i for i, offset in enumerate(sorted_leaders)}

    for instr in instructions:
        # Find which block this instruction belongs to
        block_id = max(
            l for l in sorted_leaders if l <= instr.offset
        )
        blocks[leader_to_block[block_id]].append(instr)

    return dict(blocks)

def show_cfg(func):
    cfg = build_cfg(func)
    for block_id, instrs in sorted(cfg.items()):
        print(f"\n--- Block {block_id} ---")
        for instr in instrs:
            print(f"  {instr.offset:3d}: {instr.opname} {instr.argrepr}")
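
Blocks are only half of a control flow graph; the other half is the edges. A minimal sketch of collecting the explicit jump edges follows, assuming (as above) that argval holds the target offset for jump instructions. The helper jump_edges and the sample function are illustrative names, and fall-through edges are deliberately left out.

```python
import dis

def jump_edges(func):
    """Return (source_offset, target_offset) pairs for every jump in func."""
    jump_opcodes = set(dis.hasjabs) | set(dis.hasjrel)
    return [
        (instr.offset, instr.argval)
        for instr in dis.get_instructions(func)
        if instr.opcode in jump_opcodes
    ]

def sum_odds(n):
    total = 0
    for i in range(n):
        if i % 2:
            total += i
    return total

edges = jump_edges(sum_odds)
for source, target in edges:
    direction = "backward" if target <= source else "forward"
    print(f"{source:3d} -> {target:3d}  ({direction})")
```

Backward edges mark loops, which is usually the first thing worth spotting in a disassembly.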

The specializing adaptive interpreter (Python 3.11+)

Python 3.11 introduced a specializing adaptive interpreter that rewrites bytecode at runtime. Generic instructions are replaced with type-specific versions after a few executions:

LOAD_ATTR → LOAD_ATTR_INSTANCE_VALUE (for regular attribute access)
BINARY_OP → BINARY_OP_ADD_INT (for integer addition)
CALL → CALL_PY_EXACT_ARGS (for Python functions with matching signatures)

You can see these specialized instructions with dis.dis() using the adaptive=True parameter (Python 3.11+):

import dis

def tight_loop():
    total = 0
    for i in range(1000):
        total += i
    return total

# Run the function a few times so the hot instructions can specialize
for _ in range(10):
    tight_loop()

# Show specialized bytecode
dis.dis(tight_loop, adaptive=True)

The specialization happens in place: the bytecode array is modified so the next execution takes the fast path directly. If a type guard fails, that execution falls back to the generic behavior, and after repeated misses the instruction reverts to its adaptive form, where it may specialize again.
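
One way to check which instructions were rewritten is to compare the generic and adaptive views of the same function. This is a sketch under the assumption that the interpreter is 3.11 or newer and that specialization is enabled in the build; the helper specialized_opnames is a hypothetical name, and the exact opnames it reports vary by version.

```python
import dis
import sys

def specialized_opnames(func):
    """Return opnames that appear only in the adaptive view of func."""
    if sys.version_info < (3, 11):
        return set()  # the adaptive parameter does not exist before 3.11
    generic = {i.opname for i in dis.get_instructions(func)}
    adaptive = {i.opname for i in dis.get_instructions(func, adaptive=True)}
    return adaptive - generic

def add_ints(n):
    total = 0
    for i in range(n):
        total += i
    return total

# Warm the function up so hot instructions have a chance to specialize
for _ in range(20):
    add_ints(1000)

new_names = specialized_opnames(add_ints)
print(sorted(new_names))
```

An empty result does not mean something is wrong: some builds and versions specialize less, or not at all.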

Bytecode optimization patterns

Pattern 1: Constant folding

CPython folds constant expressions at compile time. Before 3.7 this was done by the peephole optimizer; in 3.7 the work moved to the AST optimizer:

import dis

def constants():
    x = 2 * 3 * 7  # Folded to 42 at compile time
    y = "hello" + " " + "world"  # Folded to "hello world"
    z = (1, 2, 3) + (4, 5)  # Folded to (1, 2, 3, 4, 5)

dis.dis(constants)
# You'll see the folded value 42 loaded directly, not LOAD_CONST 2; LOAD_CONST 3;
# BINARY_OP (named BINARY_MULTIPLY before 3.11)
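
Folding can also be checked programmatically instead of by reading the listing. The sketch below uses a hypothetical function seconds_per_day and inspects both co_consts and the instruction stream.

```python
import dis

def seconds_per_day():
    return 24 * 60 * 60

consts = seconds_per_day.__code__.co_consts
opnames = [i.opname for i in dis.get_instructions(seconds_per_day)]

# The product is stored as a single constant and no multiply remains
print(86400 in consts)
print(any(name.startswith("BINARY_") for name in opnames))
```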

Pattern 2: Comparing variable access speeds

import dis

x_global = 42

def access_global():
    return x_global  # LOAD_GLOBAL

def access_local():
    x_local = 42
    return x_local  # LOAD_FAST

def access_closure():
    x = 42
    def inner():
        return x  # LOAD_DEREF (closure variable)
    return inner

The bytecode shows three different instructions for three access patterns. LOAD_FAST (locals) indexes into a C array in the frame. LOAD_GLOBAL looks up the module's globals dictionary and falls back to builtins (3.11+ specializes it with a cached index guarded by dictionary version checks). LOAD_DEREF reads the value through a cell object.
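
To confirm which load instruction each access pattern compiles to, you can filter the instruction stream by name. The functions and the helper load_opnames below are illustrative; newer versions may show variants such as LOAD_FAST_BORROW, so the sketch matches on the prefix.

```python
import dis

module_level = 42

def use_global():
    return module_level

def use_local(value):
    return value

def make_closure():
    captured = 42
    def inner():
        return captured
    return inner

def load_opnames(func):
    """Return the LOAD_* opnames in func, in bytecode order."""
    return [i.opname for i in dis.get_instructions(func)
            if i.opname.startswith("LOAD_")]

for func in (use_global, use_local, make_closure()):
    print(f"{func.__name__:12s} {load_opnames(func)}")
```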

Pattern 3: Understanding comprehension overhead

import dis

# Before 3.12 the comprehension compiles to a hidden function;
# 3.12+ inlines it into the enclosing function (PEP 709)
def with_comp():
    return [x for x in range(10)]

dis.dis(with_comp)
# Up to 3.11: shows MAKE_FUNCTION + a call for the comprehension's inner function
# 3.12+: the loop appears inline, with no MAKE_FUNCTION
# Either way the loop body uses LIST_APPEND, which avoids the
# attribute lookup and call overhead of list.append()
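
Whether a comprehension gets its own hidden function depends on the interpreter version, so it is worth checking rather than assuming. A minimal sketch, with builds_inner_function and squares as hypothetical names:

```python
import dis
import sys

def builds_inner_function(func):
    """Return True if func creates a nested function object at runtime."""
    return any(i.opname == "MAKE_FUNCTION" for i in dis.get_instructions(func))

def squares():
    return [n * n for n in range(5)]

uses_hidden_function = builds_inner_function(squares)
print(f"Python {sys.version_info[:2]}: hidden function = {uses_hidden_function}")
```

On 3.12 and later this reports False for a comprehension inside a function, because the loop is compiled inline.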

Analyzing exception handling bytecode

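Python 3.11 changed exception handling from a block-based model to a table-based model. Exception tables replace the old SETUP_FINALLY / POP_BLOCK instructions (SETUP_EXCEPT had already been removed in 3.8), so the non-raising path no longer pays for block setup: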

import dis

def with_exception():
    try:
        risky_operation()
    except ValueError as e:
        handle(e)
    finally:
        cleanup()

dis.dis(with_exception)
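# In 3.11+, the output shows PUSH_EXC_INFO and ends with an
# "ExceptionTable:" section listing protected ranges and their handlers.
# The raw encoded table is stored on the code object (3.11+):
print(getattr(with_exception.__code__, "co_exceptiontable", None))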
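
To work with the disassembly as text, for example to check whether an exception table was emitted, dis.dis() accepts a file argument. The sketch below captures the listing in a string; the function parse_or_zero is a hypothetical example.

```python
import dis
import io
import sys

def parse_or_zero(text):
    try:
        return int(text)
    except ValueError:
        return 0

# Capture the listing instead of printing it
buffer = io.StringIO()
dis.dis(parse_or_zero, file=buffer)
listing = buffer.getvalue()

has_table = "ExceptionTable" in listing
print(f"Python {sys.version_info[:2]}: exception table shown = {has_table}")
```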

Instruction frequency analysis

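For performance-sensitive code, count which instructions dominate the compiled function. This is a static count of the bytecode, not a profile of what actually executes: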

import dis
from collections import Counter

def instruction_profile(func):
    counter = Counter()
    for instr in dis.Bytecode(func):
        if instr.opname != "CACHE":  # Skip cache entries (3.11+)
            counter[instr.opname] += 1

    print(f"\nInstruction profile for {func.__name__}:")
    for opname, count in counter.most_common(10):
        bar = "█" * count
        print(f"  {opname:30s} {count:3d}  {bar}")
    return counter
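
dis.Bytecode only walks the code object it is given, so nested functions, lambdas, and (before 3.12) comprehensions are not counted. A sketch of a recursive variant follows; the helper recursive_profile is a hypothetical name, and it relies on nested code objects being stored in co_consts.

```python
import dis
import types
from collections import Counter

def recursive_profile(code):
    """Count opnames in a code object and in every code object nested in it."""
    counter = Counter(i.opname for i in dis.get_instructions(code))
    for const in code.co_consts:
        if isinstance(const, types.CodeType):
            counter += recursive_profile(const)
    return counter

def outer():
    def inner(x):
        return x + 1
    return inner

profile = recursive_profile(outer.__code__)
for opname, count in profile.most_common(5):
    print(f"{opname:25s} {count}")
```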

Cross-version bytecode comparison

A practical pattern for understanding how Python evolves:

import dis
import sys

def swap(a, b):
    a, b = b, a
    return a, b

print(f"Python {sys.version_info[:2]}")
dis.dis(swap)
# Python 3.7: ROT_TWO instruction
# Python 3.11+: ROT_TWO was replaced by SWAP; for simple local stores
# like this one the compiler can drop the swap and reorder the stores
# Same result, different implementation

The co_linetable (Python 3.10+)

Python 3.10 replaced co_lnotab with co_linetable, a more compact encoding of the line number table, and added the co_lines() method for reading it (PEP 626). Python 3.11 added end-line and column information, exposed through co_positions() (PEP 657):

import dis
import sys

def multiline():
    result = (
        some_func(a, b)
        + other_func(c)
    )

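if sys.version_info >= (3, 11):
    code = multiline.__code__
    # co_positions() yields one 4-tuple per 2-byte code unit, so the
    # byte offset is the index times 2; any field may be None
    for index, (start_line, end_line, col, end_col) in enumerate(code.co_positions()):
        print(f"  offset {index * 2}: "
              f"lines {start_line}-{end_line}, "
              f"cols {col}-{end_col}")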

This precise position information powers the improved error messages in Python 3.11+ where the interpreter underlines the exact expression that caused an error.
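
The position data can be mapped back to source text, which is roughly what the traceback machinery does when it underlines an expression. The sketch below is a simplified illustration: it compiles a hypothetical two-line snippet, handles only single-line spans, and treats columns as character indexes, which holds for ASCII source.

```python
import sys

source = "value = (alpha\n         + beta)\n"
code = compile(source, "<demo>", "exec")
lines = source.splitlines()

spans = set()
if sys.version_info >= (3, 11):
    for start_line, end_line, col, end_col in code.co_positions():
        if None in (start_line, end_line, col, end_col):
            continue  # position information can be missing
        if start_line != end_line or not 1 <= start_line <= len(lines):
            continue  # keep the sketch to single-line spans
        text = lines[start_line - 1][col:end_col]
        if text:
            spans.add(text)

print(sorted(spans))
```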

Practical debugging with dis

When behavior is confusing, bytecode tells the truth:

import dis

# Why does this work?
def mysterious():
    x = [1, 2, 3]
    x += [4]     # Calls __iadd__ (mutates in place for lists)
    x = x + [5]  # Calls __add__ (creates new list)

dis.dis(mysterious)
# 3.11+: both lines compile to BINARY_OP, shown with "+=" for the
# in-place form and "+" for the plain form
# Before 3.11: INPLACE_ADD for += and BINARY_ADD for +
# For lists the two paths call __iadd__ and __add__ respectively
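
The difference between the two paths is observable at runtime through object identity. A minimal sketch with illustrative variable names:

```python
original = [1, 2, 3]
alias = original

alias += [4]                       # list.__iadd__ mutates the list in place
same_after_iadd = alias is original

alias = alias + [5]                # list.__add__ builds a new list
same_after_add = alias is original

print(same_after_iadd, same_after_add)
print(original, alias)
```

After the in-place add both names still refer to one list, which is why the change is visible through original as well.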

The one thing to remember: The dis module combined with dis.Bytecode for programmatic access reveals exactly what CPython does with your code — from stack operations and jump targets to the specializing optimizations that make Python 3.11+ significantly faster than earlier versions.
