Python JIT Compiler (Copy and Patch) — Core Concepts
What is Python’s JIT compiler?
Python 3.13 introduced an experimental JIT (Just-In-Time) compiler that converts frequently executed bytecode into native machine code at runtime. It uses a technique called “copy and patch” that’s simpler and more maintainable than traditional JIT architectures.
How Python runs code (the tiers)
To understand the JIT, you first need to know CPython’s execution pipeline:
Tier 1: The bytecode interpreter
Your .py source is compiled to bytecode (imported modules cache it on disk as .pyc files). The interpreter executes the bytecode one instruction at a time. This is how CPython ran all code before 3.11.
Tier 1.5: Adaptive specialisation (3.11+)
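After an instruction has run a few times with the same operand types, the interpreter replaces it with a specialised variant: for example, BINARY_OP becomes BINARY_OP_ADD_INT when both operands are ints. This specialising adaptive interpreter (PEP 659) was a major contributor to the roughly 25% average speedup in 3.11, alongside cheaper frames and inlined Python-to-Python calls.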
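You can watch specialisation happen with the standard dis module: `dis.get_instructions(..., adaptive=True)` (available since 3.11) reports the instructions a function is currently using, including specialised ones. A short sketch; the exact specialised names are an implementation detail and may differ between versions:

```python
import dis
import sys


def add(a, b):
    return a + b


# Warm the function up with int arguments so the adaptive interpreter
# has a chance to specialise the generic BINARY_OP instruction.
for i in range(1000):
    add(i, i)

binary_ops = []
if sys.version_info >= (3, 11):
    # adaptive=True shows the specialised instructions currently in use
    # instead of the generic bytecode the compiler originally emitted.
    binary_ops = [
        ins.opname
        for ins in dis.get_instructions(add, adaptive=True)
        if ins.opname.startswith("BINARY_OP")
    ]
print(binary_ops)  # typically ['BINARY_OP_ADD_INT'] on 3.11-3.13
```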
Tier 2: The micro-op optimiser (3.13+)
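When a hot code path (typically a loop) is detected, the interpreter records a “trace”: a linear sequence of micro-operations (uops) translated from the specialised bytecodes. The trace is then optimised (dead code removal, guard elimination). In 3.13 this tier is off unless CPython was built with --enable-experimental-jit.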
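To make guard elimination concrete, here is a toy pass over a hypothetical trace format. The uop names GUARD_INT and ADD_INT are invented for illustration (CPython’s real uops and optimiser are more involved); the pass tracks which values are already known to be ints and drops guards that would re-check them:

```python
from typing import List, Optional, Set, Tuple

# Hypothetical trace entry: (uop name, input value names, output value name).
Uop = Tuple[str, Tuple[str, ...], Optional[str]]


def eliminate_redundant_guards(trace: List[Uop]) -> List[Uop]:
    """Drop GUARD_INT uops whose inputs are already known to be ints."""
    known_ints: Set[str] = set()
    optimised: List[Uop] = []
    for name, inputs, output in trace:
        if name == "GUARD_INT":
            if all(v in known_ints for v in inputs):
                continue  # every input was already checked: guard is redundant
            known_ints.update(inputs)  # after the guard, inputs are known ints
        elif name == "ADD_INT" and output is not None:
            known_ints.add(output)  # int + int always produces an int
        elif output is not None:
            known_ints.discard(output)  # unknown uop: forget the output's type
        optimised.append((name, inputs, output))
    return optimised


# Trace for something like `t = x + y; u = t + y`
trace: List[Uop] = [
    ("GUARD_INT", ("x", "y"), None),
    ("ADD_INT", ("x", "y"), "t"),
    ("GUARD_INT", ("t", "y"), None),  # redundant: t and y are already ints
    ("ADD_INT", ("t", "y"), "u"),
]

for uop in eliminate_redundant_guards(trace):
    print(uop)
```

The second guard disappears because the optimiser can prove both of its inputs were already checked (or produced by an int-only operation) earlier in the same linear trace.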
Tier 3: JIT compilation (3.13+, experimental)
The optimised trace is compiled to native machine code using copy-and-patch.
The copy-and-patch technique
Build time
Each micro-operation is written in C and compiled by Clang/LLVM into a machine code template called a “stencil.” The stencil has holes where runtime-specific values need to go (object pointers, jump targets).
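A simplified, illustrative stencil for BINARY_OP_ADD_INT (not real compiler output). Python ints are arbitrary-precision objects, so the stencil calls CPython’s integer-addition routine rather than issuing a single add instruction, and the operands come from the value stack at run time; the holes are only for values fixed when the trace is compiled:
```
┌────────────────────────────────────────────────────────┐
│ load left, right    ← from the value stack             │
│ call [HOLE_1]       ← CPython's int-add routine        │
│ push result         ← back onto the value stack        │
│ jz [HOLE_2]         ← error handler, if result is NULL │
│ jmp [HOLE_3]        ← next stencil in the trace        │
└────────────────────────────────────────────────────────┘
```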
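The stencils are emitted as byte arrays in a generated C header (jit_stencils.h) that is compiled into CPython, so Clang/LLVM is needed only to build CPython, not to run it.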
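CPython’s build script finds the holes by reading the relocation records in the object files Clang produces. As a simplified stand-in that needs no compiler, the sketch below marks each hole in a hand-written template with a hypothetical 6-byte sentinel plus a 2-byte hole id, then records where each hole is and what it needs. That pair (raw bytes plus a hole list) is the essential content of a stencil:

```python
from dataclasses import dataclass, field
from typing import List

SENTINEL = b"\xde\xad\xbe\xef\xca\xfe"  # hypothetical 6-byte hole marker
SLOT_SIZE = 8  # marker + 2-byte hole id = one 8-byte slot


@dataclass
class Hole:
    offset: int  # byte offset of the slot inside the stencil body
    kind: str    # what gets patched in later, e.g. "operand" or "continue"


@dataclass
class Stencil:
    body: bytes
    holes: List[Hole] = field(default_factory=list)


def make_slot(hole_id: int) -> bytes:
    """Placeholder bytes for one hole: the marker followed by its id."""
    return SENTINEL + hole_id.to_bytes(SLOT_SIZE - len(SENTINEL), "little")


def extract_stencil(template: bytes, kinds: List[str]) -> Stencil:
    """Scan a template for hole slots and record each slot's offset and kind."""
    holes: List[Hole] = []
    start = 0
    while (pos := template.find(SENTINEL, start)) != -1:
        hole_id = int.from_bytes(template[pos + len(SENTINEL) : pos + SLOT_SIZE], "little")
        holes.append(Hole(offset=pos, kind=kinds[hole_id]))
        start = pos + SLOT_SIZE
    return Stencil(body=template, holes=holes)


# Filler bytes stand in for real machine code around two holes.
template = b"\x90\x90" + make_slot(0) + b"\x90" + make_slot(1)
stencil = extract_stencil(template, kinds=["operand", "continue"])
print([(h.offset, h.kind) for h in stencil.holes])  # [(2, 'operand'), (11, 'continue')]
```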
Runtime
When a trace is compiled:
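- Allocate a writable memory region large enough for the whole trace
- For each uop in the trace, copy its stencil into the region
- Patch the holes with concrete values
- Mark the memory as executable (and no longer writable)
- The next time execution reaches the start of the trace, jump directly into the native code instead of interpreting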
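The runtime steps above can be simulated in a few lines. This sketch is hypothetical throughout: stencils are plain bytes with zero-filled 8-byte hole slots, the “memory region” is a bytearray at a pretend base address, and the final step is only described in a comment, because actually running generated code needs platform-specific memory-protection calls (which CPython’s C implementation makes through the operating system):

```python
import struct
from dataclasses import dataclass
from typing import Dict, List, Tuple

SLOT = 8  # every hole is an 8-byte little-endian value


@dataclass
class Stencil:
    body: bytes                   # template bytes; holes are zero-filled slots
    holes: List[Tuple[int, str]]  # (byte offset within body, hole kind)


# Two hypothetical stencils; the non-zero bytes stand in for real instructions.
STENCILS: Dict[str, Stencil] = {
    "LOAD_OPERAND": Stencil(
        b"\x01" + bytes(SLOT) + b"\x02" + bytes(SLOT),
        [(1, "operand"), (10, "continue")],
    ),
    "EXIT": Stencil(b"\x03" + bytes(SLOT), [(1, "exit")]),
}


def copy_and_patch(trace: List[Tuple[str, int]], base: int, exit_addr: int) -> bytearray:
    """Lay out one stencil per uop, patching each hole with a concrete value."""
    # Step 1: allocate a writable region big enough for the whole trace.
    region = bytearray(sum(len(STENCILS[name].body) for name, _ in trace))
    offset = 0
    for name, operand in trace:
        stencil = STENCILS[name]
        end = offset + len(stencil.body)
        # Step 2: copy the stencil's bytes into place.
        region[offset:end] = stencil.body
        # Step 3: patch each hole with a value known only now.
        values = {"operand": operand, "continue": base + end, "exit": exit_addr}
        for hole_offset, kind in stencil.holes:
            struct.pack_into("<Q", region, offset + hole_offset, values[kind])
        offset = end
    # Step 4 (not simulated): the real JIT now makes the region executable and
    # read-only, and later jumps to `base` to run the trace.
    return region


# A two-uop trace placed at a pretend address 0x1000.
code = copy_and_patch([("LOAD_OPERAND", 42), ("EXIT", 0)], base=0x1000, exit_addr=0x9000)
print(len(code))                                   # 27
print(struct.unpack_from("<Q", code, 1)[0])        # 42: the patched operand
print(hex(struct.unpack_from("<Q", code, 10)[0]))  # 0x1012: address of the EXIT stencil
```

No code generation happens at run time: compiling a trace is just memcpy-style copying plus a handful of integer writes, which is why copy-and-patch compiles so quickly.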
Why this approach?
| Approach | Compile speed | Code quality | Complexity |
|---|---|---|---|
| Interpreter | N/A | Baseline | Low |
| Method JIT (JVM, V8) | Slow (~ms) | Excellent | Very high |
| Tracing JIT (LuaJIT) | Medium | Very good | High |
| Copy-and-patch | Very fast (~µs) | Good | Low |
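Copy-and-patch trades peak code quality for simplicity. CPython’s core developers, many of them volunteers, can maintain it without specialist JIT-compiler expertise: adding or changing a micro-op means writing C, not assembly. And because every stencil’s machine code is generated by a production compiler (Clang) from the same C definitions the interpreter uses, rather than written by hand, there is far less room for code-generation bugs.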
Performance impact
Benchmarks so far show modest gains (results vary by machine, OS, and build configuration):
| Benchmark | Speedup |
|---|---|
| richards | 1.05× |
| nbody | 1.09× |
| spectral_norm | 1.04× |
| json_loads | 1.02× |
| pyperformance avg | 1.02-1.05× |
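Because numbers like these depend heavily on hardware and build flags, it is worth measuring on your own machine. A rough sketch: run the same hypothetical hot-loop workload in fresh interpreters with PYTHON_JIT=0 and PYTHON_JIT=1 and compare the best of three wall-clock times (which include interpreter start-up). On a build without the JIT the variable has no effect and the ratio should hover around 1.0; for serious measurements, the pyperformance suite is the standard tool.

```python
import os
import subprocess
import sys
import time

# Hypothetical workload: a hot loop of small-int arithmetic.
WORKLOAD = """
total = 0
for i in range(1_000_000):
    total += i * 2 % 7
"""


def time_run(jit_flag: str) -> float:
    """Run WORKLOAD in a fresh interpreter with PYTHON_JIT set; return seconds."""
    env = dict(os.environ, PYTHON_JIT=jit_flag)
    start = time.perf_counter()
    subprocess.run([sys.executable, "-c", WORKLOAD], env=env, check=True)
    return time.perf_counter() - start


# Best of three reduces noise from other processes on the machine.
off = min(time_run("0") for _ in range(3))
on = min(time_run("1") for _ in range(3))
print(f"JIT off: {off:.3f}s  JIT on: {on:.3f}s  speedup: {off / on:.2f}x")
```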
The modest numbers are expected: this is the foundation, not the finish line. Planned future optimisations include:
- Function inlining: eliminating call overhead for small functions
- Constant folding: computing known values at compile time
- Register allocation: keeping values in CPU registers to reduce memory loads and stores
- Broader guard elimination: removing more of the redundant type checks the current optimiser leaves in place
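Constant folding is the easiest of these to picture. CPython’s bytecode compiler already folds literal expressions such as `2 + 3` in source code; the interest at the trace level is folding values that only become known once a trace is recorded. A toy peephole pass over a hypothetical stack-machine trace (the opcodes CONST, LOAD, ADD and MUL are invented for illustration, not CPython uops) replaces a binary operation on two constants with its precomputed result:

```python
import operator
from typing import Any, Callable, Dict, List, Tuple

# Hypothetical stack-machine instruction: (opcode, argument or None).
Instr = Tuple[str, Any]

# Binary operations this toy folder knows how to evaluate ahead of time.
FOLDABLE: Dict[str, Callable[[Any, Any], Any]] = {"ADD": operator.add, "MUL": operator.mul}


def fold_constants(trace: List[Instr]) -> List[Instr]:
    """Peephole pass: CONST a, CONST b, <binop>  ->  CONST (a <binop> b)."""
    out: List[Instr] = []
    for op, arg in trace:
        if op in FOLDABLE and len(out) >= 2 and out[-1][0] == "CONST" and out[-2][0] == "CONST":
            right = out.pop()[1]
            left = out.pop()[1]
            out.append(("CONST", FOLDABLE[op](left, right)))  # folded result is a CONST too
        else:
            out.append((op, arg))
    return out


# x * (2 + 3): the constant subexpression collapses to a single CONST 5.
trace: List[Instr] = [("LOAD", "x"), ("CONST", 2), ("CONST", 3), ("ADD", None), ("MUL", None)]
print(fold_constants(trace))  # [('LOAD', 'x'), ('CONST', 5), ('MUL', None)]
```

Because a folded result is itself a CONST, chains such as `(2 + 3) * 4` collapse all the way down to a single constant.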
How to use it
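```shell
# In a 3.13 build configured with --enable-experimental-jit=yes-off, switch the JIT on:
PYTHON_JIT=1 python3.13 my_script.py
# In a build configured with plain --enable-experimental-jit it is on by default; switch it off:
PYTHON_JIT=0 python3.13 my_script.py
# 3.14+ only: check whether the JIT is available and enabled (sys._jit is private)
python3.14 -c "import sys; print(sys._jit.is_available(), sys._jit.is_enabled())"
```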
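The JIT requires a CPython build compiled with --enable-experimental-jit, which needs LLVM/Clang at build time. Standard 3.13 builds, including the python.org installers, do not include it. From 3.14, the official Windows and macOS binaries ship the JIT switched off by default (enable it with PYTHON_JIT=1).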
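To check from inside Python, a best-effort sketch: on 3.14+ it asks the private `sys._jit` namespace; on older versions it falls back to a heuristic that inspects the configure flags recorded by `sysconfig`, which only works for Unix-style builds (`CONFIG_ARGS` may be absent on Windows, in which case this reports False). The helper name `jit_status` is hypothetical:

```python
import re
import sys
import sysconfig
from typing import Dict, Optional


def jit_status() -> Dict[str, Optional[bool]]:
    """Best-effort report on whether this interpreter can use the JIT."""
    jit = getattr(sys, "_jit", None)  # private namespace, CPython 3.14+
    if jit is not None:
        return {"available": jit.is_available(), "enabled": jit.is_enabled()}
    # Fallback heuristic: inspect the configure flags of a Unix-style build.
    config_args = sysconfig.get_config_var("CONFIG_ARGS") or ""
    match = re.search(r"--enable-experimental-jit(?:=([\w-]+))?", config_args)
    # "=no" and "=interpreter" builds contain no machine-code JIT.
    built_with_jit = match is not None and match.group(1) not in ("no", "interpreter")
    # Whether it is switched on also depends on PYTHON_JIT, so report "unknown".
    return {"available": built_with_jit, "enabled": None}


print(jit_status())
```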
What doesn’t get JIT-compiled
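- C extension code: work done inside NumPy and other C extensions is already native code; the JIT only compiles the Python-level code around those calls
- Short-running code: code that doesn’t run long enough to become hot and trigger tracing
- Exception-heavy paths: raising an exception leaves the compiled trace and falls back to the interpreter
- Eval/exec: code run once through eval() or exec() rarely gets hot enough to be traced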
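Why short-running code misses out can be modelled with a toy counter: tracing starts only after a loop’s backward jump has been taken often enough. The threshold below (HOT_THRESHOLD = 16) and the LoopCounter class are hypothetical; CPython’s real counters and thresholds are internal details that have changed between releases:

```python
HOT_THRESHOLD = 16  # hypothetical: backward jumps needed before a loop is traced


class LoopCounter:
    """Toy model of a per-loop 'hotness' counter on a backward jump."""

    def __init__(self) -> None:
        self.count = 0
        self.traced = False

    def on_backward_jump(self) -> None:
        if self.traced:
            return  # already compiled: later iterations would run native code
        self.count += 1
        if self.count >= HOT_THRESHOLD:
            self.traced = True  # here the real interpreter would record a trace


def run_loop(iterations: int) -> bool:
    """Simulate a loop and report whether it ever became hot enough to trace."""
    counter = LoopCounter()
    for _ in range(iterations):
        counter.on_backward_jump()  # each iteration ends with a backward jump
    return counter.traced


print(run_loop(5))    # False: the loop finishes before it gets hot
print(run_loop(100))  # True: the loop crosses the threshold and gets traced
```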
Common misconception
“Python finally has a JIT like Java or JavaScript, so it’ll be as fast.” The V8 and HotSpot JITs have decades of optimisation work behind them and use far more complex techniques (aggressive inlining, escape analysis, sophisticated register allocation). CPython’s copy-and-patch JIT is intentionally simpler. It won’t close the performance gap with compiled languages overnight, but it is the architectural foundation for narrowing that gap incrementally.
The one thing to remember: copy-and-patch gives CPython a JIT that’s simple enough to maintain without specialist JIT expertise, fast enough to compile traces in microseconds, and built from machine code generated by a mature compiler rather than written by hand. It’s a platform for the next decade of Python performance work.