Code Objects Internals — Deep Dive

Dissect Python code objects at the byte level — bytecode layout, co_ attributes, the constants pool, and how CPython compiles scopes.

The Code Object in CPython’s Architecture

In CPython, the code object is represented by PyCodeObject (defined in Include/cpython/code.h). It is one of the most important internal types: every separately compiled scope (module, function body, class body, lambda, generator expression) gets one. List, dict, and set comprehensions also had their own code objects until Python 3.12 began inlining them. From Python code the object is immutable after creation, which makes it safe to cache in .pyc files and to share across threads.

Complete Attribute Reference

Here are the most commonly used co_ attributes available in Python 3.12+:

def example(a, b=10, *args, key=None, **kwargs):
    x = a + b
    name = "test"
    return x

co = example.__code__

Argument metadata:

co_argcount — Positional argument count (2: a and b)
co_posonlyargcount — Positional-only arguments (0 here)
co_kwonlyargcount — Keyword-only arguments (1: key)
co_flags — Bitmask indicating properties like CO_VARARGS (0x04), CO_VARKEYWORDS (0x08), CO_GENERATOR (0x20), CO_COROUTINE (0x80)

Variable tables:

co_varnames — Local variables, arguments first (positional, then keyword-only, then *args and **kwargs): ('a', 'b', 'key', 'args', 'kwargs', 'x', 'name')
co_cellvars — Variables captured by inner functions (closures)
co_freevars — Variables received from an enclosing scope
co_names — Global/attribute names referenced

Bytecode and constants:

co_code — The raw bytecode bytes
co_consts — Immutable constant pool: (None, 'test') on CPython 3.12. The default value 10 is not stored here; it lives in example.__defaults__
co_stacksize — Maximum evaluation stack depth

Source mapping:

co_filename — Source file path
co_name — Scope name (e.g., 'example')
co_qualname — Qualified name including nesting (Python 3.11+)
co_firstlineno — First line number
co_lnotab — Legacy line number table (deprecated since Python 3.12)
co_linetable — New compact line table (Python 3.10+)
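
As a quick check of the attributes listed above, the following sketch prints the argument metadata for the same example function and decodes co_flags with the CO_* constants from the inspect module. The helper name flag_names is hypothetical and exists only for this illustration.

```python
import inspect

def example(a, b=10, *args, key=None, **kwargs):
    x = a + b
    name = "test"
    return x

co = example.__code__

# Documented flag constants from the inspect module.
KNOWN_FLAGS = [
    "CO_OPTIMIZED", "CO_NEWLOCALS", "CO_VARARGS", "CO_VARKEYWORDS",
    "CO_NESTED", "CO_GENERATOR", "CO_COROUTINE",
    "CO_ITERABLE_COROUTINE", "CO_ASYNC_GENERATOR",
]

def flag_names(flags):
    """Return the names of the known CO_* bits set in flags (hypothetical helper)."""
    return [name for name in KNOWN_FLAGS if flags & getattr(inspect, name)]

print(co.co_argcount, co.co_posonlyargcount, co.co_kwonlyargcount)  # 2 0 1
print(co.co_varnames[:5])       # argument names come before other locals
print(example.__defaults__)     # (10,) -- defaults live on the function object
print(example.__kwdefaults__)   # {'key': None}
print(flag_names(co.co_flags))
```

Note that default values belong to the function object, not to the code object, which is why the same code object can back several functions with different defaults.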

Bytecode Layout

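The co_code attribute contains the actual instructions. Since Python 3.6, bytecode is "word code": every instruction is a 2-byte unit, one byte for the opcode and one for the argument. Instructions that need arguments larger than 255 are preceded by EXTENDED_ARG prefixes. From Python 3.11 on, some instructions are also followed by inline CACHE entries used by the specializing interpreter, which dis hides by default.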

import dis

def square(n):
    return n * n

dis.dis(square)
# Output (Python 3.12):
#   RESUME          0
#   LOAD_FAST       0 (n)
#   LOAD_FAST       0 (n)
#   BINARY_OP       5 (*)
#   RETURN_VALUE

The dis module decodes co_code and maps argument indices back to the variable tables. LOAD_FAST 0 means “load local variable at index 0 in co_varnames”, which is n.
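
To see the 2-byte layout directly, here is a small sketch that walks the same function with dis.get_instructions and prints each instruction's byte offset. The exact opcode names and offsets depend on the CPython version, so no output is shown.

```python
import dis

def square(n):
    return n * n

raw = square.__code__.co_code

# Word code: the raw byte string always has an even length.
print(len(raw), len(raw) % 2 == 0)

instructions = list(dis.get_instructions(square))
for ins in instructions:
    # ins.offset is the byte offset into co_code; ins.arg is None for
    # opcodes that take no argument.
    print(ins.offset, ins.opname, ins.arg, ins.argrepr)
```

On Python 3.11 and later, gaps larger than 2 between consecutive offsets indicate hidden inline cache entries following an instruction.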

The Constants Pool

co_consts is a tuple containing the constants the function's bytecode refers to: integers, floats, strings, bytes, None, True, False, tuples of constants, and nested code objects. The compiler deduplicates equal constants within a code object, so two identical string literals normally share one entry.

Nested code objects appear here because inner function definitions are themselves constants from the compiler’s perspective:

def outer():
    def inner():
        return 42
    return inner

# outer.__code__.co_consts contains inner's code object
code_objects = [c for c in outer.__code__.co_consts
                if isinstance(c, type(outer.__code__))]
print(code_objects[0].co_name)  # 'inner'
print(code_objects[0].co_consts)  # (None, 42) on CPython 3.12; may differ in other versions
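
The deduplication mentioned above can be observed with a short sketch. It relies on CPython's compiler behaviour rather than a language guarantee; the function name greet is made up for this example.

```python
def greet():
    first = "hello, world"
    second = "hello, world"   # the same literal a second time
    return first, second

consts = greet.__code__.co_consts

# The literal appears twice in the source but only once in the pool.
print(consts.count("hello, world"))
print(consts)
```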

Variable Scope Resolution

The compiler determines variable scope at compile time, not runtime. This decision is encoded in which table a name appears in:

co_varnames → LOAD_FAST / STORE_FAST (array index, fastest)
co_cellvars → LOAD_DEREF / STORE_DEREF (closure cell)
co_freevars → LOAD_DEREF / STORE_DEREF (cell received from enclosing scope)
co_names → LOAD_GLOBAL / LOAD_ATTR (dictionary lookup, slowest)

This is why local variables are faster than globals: LOAD_FAST indexes into an array, while LOAD_GLOBAL involves a dictionary lookup in the module namespace (and then in builtins if the name is not found there).

def closure_example():
    count = 0
    def increment():
        nonlocal count
        count += 1
        return count
    return increment

outer_co = closure_example.__code__
print(outer_co.co_cellvars)  # ('count',)

inner_co = [c for c in outer_co.co_consts
            if hasattr(c, 'co_code')][0]
print(inner_co.co_freevars)  # ('count',)
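
The mapping from name tables to opcode families can be checked on a single function that touches a local, a closure variable, and a global. This is a sketch only; the names counter, outer, inner, and opnames are made up for the illustration, and newer CPython versions may emit specialized variants of LOAD_FAST.

```python
import dis

counter = 0  # module-level name, resolved through co_names

def outer():
    total = 0                 # captured by inner, so it becomes a cell variable
    def inner(step):
        nonlocal total
        total += step + counter
        return total
    return inner

def opnames(code):
    """Set of opcode names used by a code object (hypothetical helper)."""
    return {ins.opname for ins in dis.get_instructions(code)}

inner_code = outer().__code__
print(inner_code.co_varnames)   # ('step',)
print(inner_code.co_freevars)   # ('total',)
print(inner_code.co_names)      # ('counter',)
print(sorted(opnames(inner_code)))
```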

Creating Code Objects Programmatically

You can construct code objects using types.CodeType, though the constructor signature is extensive and changes between Python versions. The positional layout below matches CPython 3.11 and 3.12:

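import dis
import types

# A minimal code object that returns 42.
# Opcode numbers change between CPython versions, so look them up by name
# instead of hard-coding them (100 and 83 are only valid on some releases).
bytecode = bytes([
    dis.opmap['LOAD_CONST'], 0,    # LOAD_CONST 0 (42)
    dis.opmap['RETURN_VALUE'], 0,  # RETURN_VALUE
])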

code = types.CodeType(
    0,              # argcount
    0,              # posonlyargcount
    0,              # kwonlyargcount
    0,              # nlocals
    1,              # stacksize
    0,              # flags
    bytecode,       # codestring
    (42,),          # constants
    (),             # names
    (),             # varnames
    '<dynamic>',    # filename
    'answer',       # name
    'answer',       # qualname
    1,              # firstlineno
    b'',            # linetable
    b'',            # exceptiontable
    (),             # freevars
    (),             # cellvars
)

func = types.FunctionType(code, {})
print(func())  # 42

The code.replace() method (Python 3.8+) is safer for modifying existing code objects — it copies all attributes and lets you override specific ones:

new_code = square.__code__.replace(co_name='square_v2')
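
A slightly fuller sketch of the same idea shows that replace() returns a new code object and leaves the original untouched. Wrapping the result in types.FunctionType is one way to make it callable; the name square_v2 is only illustrative.

```python
import types

def square(n):
    return n * n

original = square.__code__
renamed = original.replace(co_name="square_v2")

print(original.co_name)       # square
print(renamed.co_name)        # square_v2
print(renamed is original)    # False: code objects are never modified in place

# Build a callable from the new code object, reusing the current globals.
square_v2 = types.FunctionType(renamed, globals(), "square_v2")
print(square_v2(7))           # 49
```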

The Line Number Table

Tracebacks, debuggers, and coverage tools need to map bytecode offsets back to source lines. Python 3.10 replaced co_lnotab with co_linetable, a more compact encoding exposed through co_lines(). Python 3.11 changed the format again to record column offsets as well, and added co_positions() for the exact source range of each instruction.

# Python 3.11+
for pos in square.__code__.co_positions():
    print(pos)  # (lineno, end_lineno, col_offset, end_col_offset)

This granularity powers the precise error messages in Python 3.11+ that underline the exact expression that caused an error.

Exception Tables (Python 3.11+)

Python 3.11 introduced co_exceptiontable, replacing the old block stack mechanism. This table maps bytecode ranges to exception handlers, enabling so-called zero-cost exception handling: entering a try block executes no setup instructions, so there is almost no runtime overhead when no exception is raised.
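
A rough way to see the table is to compare a function with a try block against one without. The sketch below reads the attribute with getattr so that it also runs on versions before 3.11, where the attribute does not exist; the function names are made up.

```python
def plain(x):
    return x + 1

def guarded(x):
    try:
        return 1 / x
    except ZeroDivisionError:
        return 0

for func in (plain, guarded):
    # co_exceptiontable is a bytes object on Python 3.11+, absent before.
    table = getattr(func.__code__, "co_exceptiontable", None)
    print(func.__name__, table)
```

On Python 3.11 and later, dis.dis(guarded) also prints a decoded ExceptionTable section after the instructions, which is easier to read than the raw bytes.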

`.pyc` Files: Serialized Code Objects

When Python imports a module from source, it serializes the module’s code object (and all nested code objects) using the marshal module, writing the result to __pycache__/<name>.cpython-3XX.pyc. The file starts with a 16-byte header (a magic number identifying the bytecode version, a flags field, and either the source file's modification time and size or a hash of the source, used for invalidation), followed by the marshalled code object. On subsequent imports, if the .pyc is still valid, Python loads it directly, skipping parsing and compilation.
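
The serialization step can be sketched in memory without touching __pycache__. This is not the import system's actual code path, only an illustration of the marshal round trip on a code object compiled from a made-up source string.

```python
import importlib.util
import marshal

source = "def answer():\n    return 42\n"
module_code = compile(source, "<demo>", "exec")

payload = marshal.dumps(module_code)   # the bytes that follow the .pyc header
restored = marshal.loads(payload)      # only safe for data you produced yourself

namespace = {}
exec(restored, namespace)
print(namespace["answer"]())           # 42

# The magic number is specific to the running interpreter's bytecode version.
print(importlib.util.MAGIC_NUMBER)
# Where the import system would cache a hypothetical pkg/mod.py:
print(importlib.util.cache_from_source("pkg/mod.py"))
```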

Security Implications

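Code objects can be marshalled (the standard pickle module refuses to pickle them, although third-party serializers can). Deserializing an untrusted code object and running it is as dangerous as eval(), because the bytecode can contain arbitrary instructions, and malformed bytecode can crash the interpreter. Never load .pyc files or marshalled code from untrusted sources. The compile() built-in does not execute anything by itself, but running its result on untrusted source is just as unsafe, and marshal.loads() can produce executable code objects from arbitrary bytes.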

Practical Debugging Patterns

def inspect_code(func):
    co = func.__code__
    print(f"Name: {co.co_name}")
    print(f"Args: {co.co_argcount} positional, {co.co_kwonlyargcount} kw-only")
    print(f"Locals: {co.co_varnames}")
    print(f"Constants: {co.co_consts}")
    print(f"Stack size: {co.co_stacksize}")
    print(f"Flags: {co.co_flags:#06x}")
    print(f"Bytecode: {co.co_code.hex()}")

This kind of introspection is invaluable when debugging import hooks, understanding optimizer behavior, or building code analysis tools.

One thing to remember: The code object is the bridge between your source code and the interpreter’s execution engine. Understanding its structure — the bytecode, the constants pool, the variable tables, and the scope resolution rules — gives you insight into Python’s performance characteristics and opens the door to advanced metaprogramming, debugging, and tooling.
