Python 3.13 New Features — Deep Dive

Technical overview

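Python 3.13 (October 2024) shipped two experimental features with deep architectural implications: free threading (PEP 703) and a copy-and-patch JIT compiler (PEP 744). Neither is active in a default build: free threading requires a separate interpreter built with --disable-gil (usually installed as python3.13t), and the JIT must be compiled in with --enable-experimental-jit. Together they represent the most fundamental changes to CPython’s execution model since the GIL was introduced in 1992.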

Free-threaded CPython — architecture

The problem with removing the GIL

The GIL protected CPython from data races on:

  1. Reference counts: every object has ob_refcnt, incremented and decremented constantly
  2. Container internals: dicts, lists, and sets are not thread-safe data structures
  3. Memory allocation: pymalloc, CPython’s small-object allocator, is not thread-safe (the free-threaded build uses mimalloc instead)
  4. Global state: interpreter state, module registries, import machinery

Simply removing the GIL without replacing these protections would cause crashes and data corruption.
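To make the hazard concrete, here is a minimal sketch of a read-modify-write on shared state. The names Counter and hammer are illustrative and not part of any CPython API. An unsynchronised `value += 1` can lose updates when threads interleave; guarding it with a lock keeps the result deterministic on both GIL and free-threaded builds.

```python
import threading


class Counter:
    """Shared counter whose increment is guarded by a lock (illustrative only)."""

    def __init__(self) -> None:
        self.value = 0
        self._lock = threading.Lock()

    def increment(self) -> None:
        # `self.value += 1` is a load, an add, and a store; the lock makes
        # the three steps appear atomic to other threads.
        with self._lock:
            self.value += 1


def hammer(counter: Counter, threads: int, iterations: int) -> int:
    """Increment `counter` from several threads and return the final value."""

    def work() -> None:
        for _ in range(iterations):
            counter.increment()

    workers = [threading.Thread(target=work) for _ in range(threads)]
    for t in workers:
        t.start()
    for t in workers:
        t.join()
    return counter.value
```

Calling hammer(Counter(), threads=4, iterations=10_000) always returns 40000 because every increment happens under the lock. CPython's own internals face the same problem at the C level, which is what the sections below address.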

Biased reference counting

The most critical change. Each object carries two reference counts, one touched only by its owning thread and one shared by all other threads:

// Simplified from the free-threaded object header; other fields omitted
struct _object {
    uintptr_t ob_tid;          // ID of the owning thread
    uint16_t ob_flags;
    uint32_t ob_ref_local;     // Local refcount, updated only by the owner
    Py_ssize_t ob_ref_shared;  // Shared refcount (atomic operations)
};

  • The owning thread updates ob_ref_local without atomics (fast path)
  • Other threads update ob_ref_shared with atomic operations (slow path)
  • When ob_ref_local drops to zero, the owner merges it into the shared count atomically
  • The object is deallocated once the combined count reaches zero

This is “biased” because most reference count operations happen on the owning thread, keeping the fast path truly fast.
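The fast-path and slow-path split can be sketched as a toy model in Python. This is not CPython's real layout or algorithm: BiasedRefCount is a hypothetical class, a threading.Lock stands in for atomic instructions, and the model assumes each thread only releases references it acquired itself (CPython handles the general case by merging the two counts).

```python
import threading


class BiasedRefCount:
    """Toy model of biased reference counting (hypothetical, for illustration)."""

    def __init__(self) -> None:
        self.owner = threading.get_ident()  # plays the role of ob_tid
        self.local = 1                      # plays the role of ob_ref_local
        self.shared = 0                     # plays the role of ob_ref_shared
        self._atomic = threading.Lock()     # stands in for atomic instructions
        self.freed = False

    def incref(self) -> None:
        if threading.get_ident() == self.owner:
            self.local += 1                 # fast path: plain update
        else:
            with self._atomic:              # slow path: synchronised update
                self.shared += 1

    def decref(self) -> None:
        if threading.get_ident() == self.owner:
            self.local -= 1
            if self.local == 0:
                # The owner holds no more references: consult the shared count.
                with self._atomic:
                    self._maybe_free()
        else:
            with self._atomic:
                self.shared -= 1
                self._maybe_free()

    def _maybe_free(self) -> None:
        # Caller must hold self._atomic.
        if self.local == 0 and self.shared == 0:
            self.freed = True
```

In this model an object created and dropped by one thread never touches the lock until its final decref, which is the property that keeps single-threaded code cheap.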

Per-object locking

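Per-object critical sections protect container mutations:

// C API (3.13+); these macros are no-ops in the default GIL build
Py_BEGIN_CRITICAL_SECTION(dict);
// ... mutate dict internals ...
Py_END_CRITICAL_SECTION();

These are built on lightweight per-object mutexes that:

  • Live in the object header (no separate allocation)
  • Avoid deadlock by design: when a thread blocks, its active critical sections are suspended and their locks released, and two-object sections acquire their locks in a fixed order
  • Are complemented by stop-the-world pauses for operations that need a consistent view of every thread, such as garbage collection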

Immortal objects

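Common objects (None, True, False, small integers, interned strings) are marked immortal (PEP 683, introduced in 3.12): their reference count is never modified:

// In the free-threaded build, immortal objects carry a sentinel local refcount.
// The exact constant is an implementation detail and differs between builds.
#define _Py_IMMORTAL_REFCNT_LOCAL UINT32_MAX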

This eliminates contention on the most frequently shared objects.
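One way to observe immortality from Python is to compare the reported reference count before and after creating many extra references. The helper refcount_delta below is a hypothetical name. The sketch assumes Python 3.12 or later for the immortal case, and the absolute values returned by sys.getrefcount are implementation details.

```python
import sys


def refcount_delta(obj: object, extra_refs: int = 1000) -> int:
    """Return how much sys.getrefcount(obj) changes while extra references exist."""
    before = sys.getrefcount(obj)
    holders = [obj] * extra_refs   # creates `extra_refs` additional references
    after = sys.getrefcount(obj)
    del holders
    return after - before


# On 3.12+ the count reported for None is expected not to move, because
# increments and decrements on immortal objects are skipped.
print("None:", refcount_delta(None))
# An ordinary object gains one count per extra reference.
print("object():", refcount_delta(object()))
```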

Deferred reference counting

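Some long-lived objects (such as top-level functions, code objects, modules, and type objects) use deferred reference counting: the interpreter may skip reference count updates for them, so their counts are no longer exact and they are reclaimed only by the garbage collector, which computes the true count during a pause. This removes contended atomic operations on objects that almost every thread touches.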

Performance characteristics

Workload             GIL build   Free-threaded (1 thread)   Free-threaded (4 threads)
pyperformance avg    1.00×       0.92×                      N/A (single-threaded)
CPU-bound parallel   1.00×       0.90×                      3.2×
I/O-bound parallel   1.00×       0.95×                      ~1.0× (I/O bound)

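In these figures, single-threaded code is 5-10% slower on the free-threaded build, and CPU-bound work reaches 3.2× on 4 threads (about 80% parallel efficiency). The overhead comes from atomic reference counting and per-object locking. In 3.13 the specializing adaptive interpreter (PEP 659) is also disabled in the free-threaded build, and the official 3.13 documentation puts the pyperformance overhead closer to 40%, so treat the numbers above as indicative and measure your own workload.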
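A small harness along these lines can be used to check scaling on your own machine. It is a sketch, not a rigorous benchmark: count_primes, timed_run, and gil_enabled are hypothetical helpers, and the workload size is arbitrary. On a GIL build the speedup should stay near 1×; on a free-threaded build it should grow with the number of cores.

```python
import sys
import time
from concurrent.futures import ThreadPoolExecutor


def count_primes(limit: int) -> int:
    """CPU-bound work: count primes below `limit` by trial division."""
    count = 0
    for n in range(2, limit):
        if all(n % d for d in range(2, int(n ** 0.5) + 1)):
            count += 1
    return count


def timed_run(workers: int, tasks: int, limit: int) -> float:
    """Run `tasks` identical jobs on `workers` threads; return elapsed seconds."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=workers) as pool:
        list(pool.map(count_primes, [limit] * tasks))
    return time.perf_counter() - start


def gil_enabled() -> bool:
    """Report whether the GIL is active in this process."""
    # sys._is_gil_enabled() exists on 3.13+; older versions always have a GIL.
    check = getattr(sys, "_is_gil_enabled", None)
    return True if check is None else check()


if __name__ == "__main__":
    base = timed_run(workers=1, tasks=4, limit=50_000)
    parallel = timed_run(workers=4, tasks=4, limit=50_000)
    print(f"GIL enabled: {gil_enabled()}, speedup on 4 threads: {base / parallel:.2f}x")
```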

Copy-and-patch JIT compiler

Architecture

The JIT uses a technique called “copy and patch”, described by Xu and Kjolstad (2021):

  1. Stencil generation (build time): Each Tier 2 micro-op is compiled to native machine code by Clang/LLVM, producing a “stencil”: a template with holes for operands
  2. Patching (runtime): When a hot trace is detected, its stencils are copied into a buffer and the holes are filled with concrete values (object pointers, offsets)
  3. Execution: The patched buffer is marked executable and jumped into directly

Build time:                        Runtime:
┌─────────┐     ┌──────────┐      ┌──────────────────┐
│ C code  │ ──→ │ Stencils │ ──→  │ Copy + Patch     │
│ per     │     │ (.h data)│      │ concrete values  │
│ opcode  │     └──────────┘      └────────┬─────────┘
└─────────┘                                │
                                    ┌──────▼──────┐
                                    │ Executable  │
                                    │ native code │
                                    └─────────────┘
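The copy-then-patch step can be illustrated with a toy model in Python. Everything here is made up for illustration: the stencils are byte templates for an invented stack-machine encoding rather than real machine code, and run() is a tiny interpreter standing in for the CPU. Only the shape of the technique (prebuilt templates with holes, copied and then filled in) mirrors what the JIT does.

```python
import struct

HOLE = b"\xAA" * 8  # placeholder bytes marking an 8-byte operand hole

# "Build time": one prebuilt template per operation (invented encoding).
STENCILS = {
    "PUSH": b"\x01" + HOLE,  # opcode byte followed by an operand hole
    "ADD": b"\x02",
    "MUL": b"\x03",
}


def copy_and_patch(trace):
    """Concatenate the stencils for `trace` and fill each hole with its operand."""
    buf = bytearray()
    for op, operand in trace:
        stencil = STENCILS[op]
        start = len(buf)
        buf += stencil                                   # copy
        hole = stencil.find(HOLE)
        if hole != -1:
            struct.pack_into("<q", buf, start + hole, operand)  # patch
    return bytes(buf)


def run(code):
    """Stand-in for the CPU: execute the patched buffer on a tiny stack machine."""
    stack, pc = [], 0
    while pc < len(code):
        op = code[pc]
        if op == 0x01:
            stack.append(struct.unpack_from("<q", code, pc + 1)[0])
            pc += 9
        elif op in (0x02, 0x03):
            b, a = stack.pop(), stack.pop()
            stack.append(a + b if op == 0x02 else a * b)
            pc += 1
        else:
            raise ValueError(f"unknown opcode {op:#x}")
    return stack.pop()


trace = [("PUSH", 2), ("PUSH", 3), ("ADD", None), ("PUSH", 4), ("MUL", None)]
print(run(copy_and_patch(trace)))  # (2 + 3) * 4 = 20
```

Note that copy_and_patch does no analysis of the trace at all; it only copies bytes and writes operands, which is why the real technique is so cheap at runtime.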

Why “copy and patch” instead of a traditional JIT?

  • Simpler: No IR, no register allocator, no instruction scheduler at runtime
  • Faster compilation: Copying and patching takes microseconds vs. milliseconds for LLVM-based JITs
  • Consistent with the interpreter: Stencils are generated at build time from the same C definitions the interpreter uses, so the two are unlikely to drift apart
  • Maintainable: CPython developers write C, not assembly

Current limitations

  • Only traces of Tier 2 (optimised) bytecodes are JIT-compiled
  • No inlining across function boundaries (planned for 3.14)
  • No loop unrolling or constant folding beyond what the Tier 2 optimiser provides
  • Platform support: x86-64, AArch64 (ARM64)

Enabling and measuring

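# The JIT must be compiled in first (needs LLVM at build time)
./configure --enable-experimental-jit=yes-off

# Enable JIT at run time (only has an effect on a JIT-capable build)
PYTHON_JIT=1 python3.13 script.py

# Check whether the build was configured with the JIT (Unix builds).
# sys._jit only exists from 3.14 onwards, so inspect the configure arguments.
python3.13 -c "import sysconfig; print(sysconfig.get_config_var('CONFIG_ARGS'))"

# Disable for comparison
PYTHON_JIT=0 python3.13 script.py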
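The comparison can be scripted so that both runs use the same interpreter and workload. The sketch below is one possible approach; time_script and WORKLOAD are hypothetical names. It measures whole-process wall-clock time, so interpreter startup is included, and the two settings will differ only if the interpreter was built with the JIT.

```python
import os
import subprocess
import sys
import time


def time_script(code: str, jit: bool, python: str = sys.executable) -> float:
    """Run `code` in a fresh interpreter with PYTHON_JIT set; return seconds."""
    env = dict(os.environ, PYTHON_JIT="1" if jit else "0")
    start = time.perf_counter()
    subprocess.run([python, "-c", code], env=env, check=True)
    return time.perf_counter() - start


# An arbitrary hot loop; replace it with code representative of your application.
WORKLOAD = "total = 0\nfor i in range(2_000_000):\n    total += i * i\n"

if __name__ == "__main__":
    for jit in (False, True):
        best = min(time_script(WORKLOAD, jit) for _ in range(5))
        print(f"PYTHON_JIT={int(jit)}: best of 5 = {best:.3f}s")
```

Taking the best of several runs reduces noise from the operating system; for anything beyond a quick check, a dedicated tool such as pyperf is a better fit.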

New REPL implementation

_pyrepl internals

The new REPL is based on PyPy’s pyrepl, adapted for CPython:

Input → _pyrepl.reader → _pyrepl.commands → _pyrepl.console
             │
             ├── _pyrepl.completing_reader (tab completion)
             └── _pyrepl.historical_reader (history management)

Key differences from the old readline-based REPL:

  • Block-aware: Knows about Python indentation and multi-line constructs
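  • Customisable: Key bindings are defined in Python inside the _pyrepl source, but there is no documented configuration file; the supported switches are environment variables such as PYTHON_BASIC_REPL and PYTHON_COLORS
  • No readline dependency: Written in Python, no libreadline or libedit needed (on Unix it still relies on the curses and termios modules)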

Compatibility

The new REPL falls back to the classic REPL when stdin is not a terminal (for example, piped input) or when PYTHON_BASIC_REPL is set. It respects PYTHONSTARTUP. IPython-style magic commands are not supported (use IPython for those).

Extension module compatibility

Free-threaded builds

Extension modules must declare thread-safety:

static struct PyModuleDef_Slot module_slots[] = {
    {Py_mod_multiple_interpreters, Py_MOD_PER_INTERPRETER_GIL_SUPPORTED},
    {Py_mod_gil, Py_MOD_GIL_NOT_USED},  // Declares GIL-free safety
    {0, NULL}
};

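On a free-threaded build, importing a module that does not declare Py_MOD_GIL_NOT_USED re-enables the GIL for the whole process and emits a warning. Running with PYTHON_GIL=0 or -X gil=0 overrides this and keeps the GIL off.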
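To find out whether a particular import switched the GIL back on, the build and runtime state can be inspected from Python. A minimal sketch, assuming Python 3.13+ for the runtime check; gil_report and import_and_check are hypothetical helper names.

```python
import sys
import sysconfig


def gil_report() -> dict:
    """Summarise whether this build can run without the GIL and whether it does."""
    # Py_GIL_DISABLED is 1 on free-threaded builds, and 0 or None otherwise.
    free_threaded_build = bool(sysconfig.get_config_var("Py_GIL_DISABLED"))
    # sys._is_gil_enabled() was added in 3.13; assume a GIL on older versions.
    check = getattr(sys, "_is_gil_enabled", None)
    gil_enabled = True if check is None else check()
    return {"free_threaded_build": free_threaded_build, "gil_enabled": gil_enabled}


def import_and_check(module_name: str) -> bool:
    """Import a module and return True if the GIL is enabled afterwards."""
    __import__(module_name)
    return gil_report()["gil_enabled"]


if __name__ == "__main__":
    print(gil_report())
    print("GIL enabled after importing json:", import_and_check("json"))
```

Running import_and_check on each C extension your application depends on, under a free-threaded interpreter, shows which one forces the GIL back on.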

Impact on package ecosystem

As of early 2026:

  • NumPy 2.1+ supports free-threaded builds
  • Cython 3.1+ can generate free-threaded compatible code
  • pybind11 2.13+ has experimental free-threaded support
  • Many smaller packages still need updates

locals() semantics (PEP 667)

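PEP 667 replaced the old “sometimes a view, sometimes a copy” behaviour with defined semantics. Inside a function, every locals() call returns an independent snapshot, while frame.f_locals is now a write-through proxy onto the frame’s real variables:

import sys

def example():
    x = 1
    snapshot = locals()          # Independent snapshot: {'x': 1}
    frame = sys._getframe()
    x = 2
    print(snapshot["x"])         # 1: a snapshot is never updated afterwards
    print(frame.f_locals["x"])   # 2: the proxy reads the live variable
    frame.f_locals["x"] = 3      # Writes through to the real local
    print(x)                     # 3 on Python 3.13+

Debuggers and profilers that relied on mutating the locals() dict to change variables must now write through frame.f_locals, which does affect the frame.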

Migration strategy

  1. Test with 3.13 default build — most code works unchanged
  2. Test with free-threaded build only if you have CPU-bound threading workloads
  3. Audit C extensions for global state and thread safety if targeting free-threaded
  4. Remove deprecated stdlib imports — 19 modules were removed
  5. Don’t depend on locals() mutation — ensure code works with snapshot semantics
  6. Try the JIT on benchmarks — measure, don’t assume improvement
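Step 4 can be partly automated. The sketch below scans source text for imports of the 19 modules removed in 3.13 under PEP 594; find_removed_imports is a hypothetical helper, and it only catches static import statements, not importlib calls or imports performed by dependencies.

```python
import ast

# The 19 modules removed in Python 3.13 under PEP 594.
REMOVED_IN_313 = frozenset({
    "aifc", "audioop", "cgi", "cgitb", "chunk", "crypt", "imghdr",
    "mailcap", "msilib", "nis", "nntplib", "ossaudiodev", "pipes",
    "sndhdr", "spwd", "sunau", "telnetlib", "uu", "xdrlib",
})


def find_removed_imports(source: str) -> list[str]:
    """Return the sorted names of removed modules imported by `source`."""
    found = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            names = [alias.name for alias in node.names]
        elif isinstance(node, ast.ImportFrom) and node.level == 0 and node.module:
            names = [node.module]  # absolute "from X import ..." only
        else:
            continue
        for name in names:
            top = name.split(".")[0]
            if top in REMOVED_IN_313:
                found.add(top)
    return sorted(found)


sample = "import cgi, os\nfrom telnetlib import Telnet\n"
print(find_removed_imports(sample))  # ['cgi', 'telnetlib']
```

Feeding each file of a project through find_removed_imports gives a quick list of imports that will raise ModuleNotFoundError on 3.13.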

The one thing to remember: Python 3.13 is the inflection point — biased reference counting and copy-and-patch JIT are the technical foundations for a Python that runs on all cores and approaches compiled-language speed over the next several releases.

