Bytecode Manipulation — Core Concepts

Understand Python bytecode — from the dis module and code objects to practical techniques for inspecting and modifying compiled instructions.

What Is Python Bytecode?

Python bytecode is the intermediate representation that CPython’s compiler produces from your source code. It is a sequence of instructions for CPython’s stack-based virtual machine. Each instruction consists of an opcode (operation code) and optionally an argument.

The compilation pipeline: source code → AST → bytecode → execution by the interpreter loop.
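The pipeline above can be walked through by hand with the standard library. This is a minimal sketch: the source string, the `<example>` filename, and the `namespace` dict are illustrative choices, not part of any real program.

```python
import ast
import types

source = "result = 6 * 7"

# Step 1: source code -> AST
tree = ast.parse(source, filename="<example>", mode="exec")

# Step 2: AST -> code object (the bytecode lives in code.co_code)
code = compile(tree, "<example>", "exec")

# Step 3: the interpreter loop executes the bytecode
namespace = {}
exec(code, namespace)

print(type(tree).__name__)               # Module
print(isinstance(code, types.CodeType))  # True
print(namespace["result"])               # 42
```

Calling `compile()` directly on the source string performs steps 1 and 2 in one go; splitting them here only makes the intermediate AST visible.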

Inspecting Bytecode with `dis`

The dis module disassembles bytecode into human-readable form:

import dis

def greet(name):
    return f"Hello, {name}!"

dis.dis(greet)

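Output (from Python 3.10 or earlier; Python 3.11+ prepends a RESUME instruction, and later releases change some opcodes):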

  2           0 LOAD_CONST               1 ('Hello, ')
              2 LOAD_FAST                0 (name)
              4 FORMAT_VALUE             0
              6 LOAD_CONST               2 ('!')
              8 BUILD_STRING             3
             10 RETURN_VALUE

Each line shows: the source line number (printed only on the first instruction of each source line), the byte offset, the opcode name, the raw argument, and (in parentheses) the resolved argument value.
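When the disassembly needs to be processed rather than read, `dis.get_instructions()` yields the same information as structured objects. The sketch below reuses the `greet` function from the example; the exact opcode names printed depend on the Python version, so no output is shown.

```python
import dis

def greet(name):
    return f"Hello, {name}!"

# Each Instruction carries the same fields that dis.dis() prints
instructions = list(dis.get_instructions(greet))
for instr in instructions:
    print(instr.offset, instr.opname, instr.arg, instr.argrepr)

# Collect just the opcode names for further analysis
opnames = [instr.opname for instr in instructions]
```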

Code Objects

Every function written in Python has a code object, accessible via func.__code__ (built-in functions implemented in C do not). Code objects are immutable and contain everything needed to execute the function:

code = greet.__code__

code.co_code        # raw bytecode bytes
code.co_consts      # constant values used by the function
code.co_varnames    # local variable names
code.co_names       # names used (globals, attributes)
code.co_stacksize   # maximum stack depth needed
code.co_filename    # source file
code.co_firstlineno # first line number

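The co_code attribute contains the raw bytes. Since Python 3.6, bytecode is stored as wordcode: a sequence of 2-byte units, each holding a one-byte opcode and a one-byte argument. Arguments larger than 255 are built up with EXTENDED_ARG prefix instructions, and Python 3.11+ additionally places inline CACHE entries after some instructions, so a single logical instruction can occupy more than 2 bytes. Versions before 3.6 used variable-length instructions.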
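To see the 2-byte layout concretely, the raw bytes can be split into (opcode, argument) pairs. This is only a sketch for Python 3.6+: it does not combine EXTENDED_ARG prefixes, and on Python 3.11+ some of the pairs it prints are CACHE entries rather than real instructions.

```python
import dis

def greet(name):
    return f"Hello, {name}!"

code = greet.__code__
raw = code.co_code

print(type(raw).__name__)  # bytes
print(len(raw) % 2)        # 0: wordcode is a whole number of 2-byte units
print(code.co_varnames)    # ('name',)

# Split the raw bytes into (opcode, argument) pairs
pairs = [(raw[i], raw[i + 1]) for i in range(0, len(raw), 2)]
for opcode, arg in pairs:
    # dis.opname maps an opcode number back to its name
    print(dis.opname[opcode], arg)
```

In practice `dis.get_instructions()` is the safer way to iterate, since it handles extended arguments and cache entries for you.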

The Stack Machine Model

CPython’s bytecode runs on a stack machine. There are no registers — all values are pushed onto and popped from a stack:

# Source: x + y * 2
# Bytecode equivalent (Python 3.10 and earlier; 3.11+ emits BINARY_OP
# instead of BINARY_MULTIPLY / BINARY_ADD):
LOAD_FAST    x       # stack: [x]
LOAD_FAST    y       # stack: [x, y]
LOAD_CONST   2       # stack: [x, y, 2]
BINARY_MULTIPLY      # stack: [x, y*2]
BINARY_ADD           # stack: [x + y*2]

Each operation pops its inputs from the stack and pushes its result back: LOAD_* instructions push a value, while binary arithmetic instructions pop the top two items and push a single result.
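The stack discipline is easy to imitate in a few lines. The following is a toy interpreter, not CPython's actual evaluation loop: the `run` function and its `(opname, argument)` program format are made up for illustration and support only the four instructions used in the `x + y * 2` example.

```python
import operator

def run(program, local_vars):
    """Evaluate a list of (opname, argument) pairs on a toy stack machine."""
    stack = []
    binary_ops = {"BINARY_MULTIPLY": operator.mul, "BINARY_ADD": operator.add}
    for opname, arg in program:
        if opname == "LOAD_FAST":
            stack.append(local_vars[arg])   # push a local variable
        elif opname == "LOAD_CONST":
            stack.append(arg)               # push a constant
        elif opname in binary_ops:
            right = stack.pop()             # top of stack is the right operand
            left = stack.pop()
            stack.append(binary_ops[opname](left, right))
        else:
            raise ValueError(f"unsupported instruction: {opname}")
    return stack.pop()

program = [
    ("LOAD_FAST", "x"),
    ("LOAD_FAST", "y"),
    ("LOAD_CONST", 2),
    ("BINARY_MULTIPLY", None),
    ("BINARY_ADD", None),
]

print(run(program, {"x": 3, "y": 4}))  # 11
```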

Common Opcodes

Opcode	What It Does
`LOAD_FAST`	Push a local variable onto the stack
`STORE_FAST`	Pop the stack into a local variable
`LOAD_CONST`	Push a constant value
`LOAD_GLOBAL`	Push a global variable
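`CALL_FUNCTION`	Call a function (Python 3.10 and earlier)
`CALL`	Call a callable (Python 3.11+)
`RETURN_VALUE`	Return the top of stack to the caller
`POP_JUMP_IF_FALSE`	Pop the top of stack and jump if it is false
`BINARY_OP`	Arithmetic/bitwise operation (Python 3.11+)

Note: opcodes change between Python versions. Python 3.11 reorganized many opcodes (for example, folding BINARY_ADD and BINARY_MULTIPLY into BINARY_OP), and later releases have continued to add, rename, and remove them.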
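Because the opcode set varies, it can help to ask the running interpreter which names it knows. The sketch below checks a handful of names against `dis.opmap`; the `candidates` list is an arbitrary selection for illustration.

```python
import dis
import sys

candidates = ["CALL_FUNCTION", "CALL", "BINARY_ADD", "BINARY_OP", "RETURN_VALUE"]

# dis.opmap maps opcode names to numbers for the running interpreter
available = {name: name in dis.opmap for name in candidates}

print(sys.version_info[:2])
for name, present in available.items():
    print(f"{name:15} {'yes' if present else 'no'}")
```

Tools that read or rewrite bytecode typically branch on checks like this, or on `sys.version_info`, rather than assuming one fixed opcode table.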

Modifying Bytecode

Since code objects are immutable, you modify bytecode by creating a new code object with altered attributes using code.replace() (Python 3.8+):

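def add_bonus(x):
    return x + 1000

# Get the original code object
original = add_bonus.__code__

# Build a new constants tuple, doubling every int constant
# (bool is a subclass of int, so exclude it explicitly)
new_consts = tuple(
    c * 2 if isinstance(c, int) and not isinstance(c, bool) else c
    for c in original.co_consts
)

# Swap in a copy of the code object that uses the new constants
add_bonus.__code__ = original.replace(co_consts=new_consts)

print(add_bonus(1))  # 2001, because the constant 1000 became 2000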

For bytecode-level changes, you would modify co_code bytes directly — but this requires understanding the exact byte layout and keeping related attributes (constants, names, stack size) consistent.
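Assigning to `__code__` changes the function for every caller. A gentler variant, sketched below, builds a separate function object around the modified code with `types.FunctionType` and leaves the original alone. The names `with_offset` and `with_doubled_ints` are hypothetical, and the constant 1000 is chosen simply because it is certain to be stored in `co_consts`.

```python
import types

def with_offset(x):
    return x + 1000

def with_doubled_ints(func):
    """Return a copy of func whose int constants are doubled."""
    code = func.__code__
    new_consts = tuple(
        c * 2 if isinstance(c, int) and not isinstance(c, bool) else c
        for c in code.co_consts
    )
    new_code = code.replace(co_consts=new_consts)  # Python 3.8+
    # Build a new function object; the original keeps its own code
    return types.FunctionType(
        new_code,
        func.__globals__,
        func.__name__,
        func.__defaults__,
        func.__closure__,
    )

doubled = with_doubled_ints(with_offset)
print(with_offset(1))  # 1001, original untouched
print(doubled(1))      # 2001
```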

Practical Uses

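Profiling and coverage: Tools like coverage.py track which lines execute through CPython's tracing hooks (sys.settrace, and sys.monitoring on Python 3.12+), which map the executing bytecode back to source lines using the line table stored in each code object.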

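Debugging: sys.settrace installs a trace function that the interpreter calls on events such as 'call', 'line', and 'return'. Setting frame.f_trace_opcodes = True (Python 3.7+) additionally requests an 'opcode' event before each bytecode instruction.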
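A small tracer shows what a debugger sees through this hook. This is a sketch only: `demo`, `tracer`, and the `events` list are illustrative names, and line numbers are recorded relative to the `def` line so the result does not depend on where the snippet sits in a file.

```python
import sys

def demo():
    a = 1
    b = a + 1
    return b

events = []

def tracer(frame, event, arg):
    # Record only events that originate in demo()
    if frame.f_code is demo.__code__:
        relative_line = frame.f_lineno - demo.__code__.co_firstlineno
        events.append((event, relative_line))
    return tracer  # keep tracing inside this frame

sys.settrace(tracer)
try:
    result = demo()
finally:
    sys.settrace(None)  # always remove the hook again

print(result)
print(events)
```

The recorded list starts with a 'call' event, contains one 'line' event per executed line of `demo`, and ends with a 'return' event.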

Optimization: Libraries like numba analyze bytecode to understand what a function does before JIT-compiling it.

Testing: Some mocking libraries modify bytecode to redirect function calls.

Common Misconception

Developers often think bytecode is like machine code (compiled C or assembly). Python bytecode is much higher-level — it still references variable names, function objects, and Python-level operations. It is an intermediate format designed for CPython’s interpreter, not for a CPU. Different Python implementations (PyPy, Jython) may not use this bytecode format at all.
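One way to see how high-level bytecode stays is to look at the names a code object carries around. In this sketch (the `shout` function is made up for illustration), the compiled function still refers to `len` and `upper` by name and resolves them at run time, which machine code would not do.

```python
def shout(text):
    return len(text.upper())

code = shout.__code__

print(code.co_varnames)  # ('text',)
print(code.co_names)     # includes 'len' and 'upper'
```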

One thing to remember: Python bytecode is a sequence of stack-machine instructions stored in code objects — you can inspect it with dis and modify it by creating new code objects with code.replace(), but the bytecode format changes between Python versions.
