Python 3.11 New Features — Deep Dive

Inside CPython's adaptive specialisation, the new exception group semantics, TaskGroup internals, and benchmarking real-world 3.11 speedups.

Technical overview

Python 3.11 (released October 2022) put performance first: the official release notes report an average 1.25× speedup over 3.10 on the pyperformance suite. The Faster CPython project delivered adaptive interpreter specialisation, and the language gained exception groups plus asyncio.TaskGroup for structured concurrency. This deep dive covers the internals that make it all work.

Adaptive interpreter specialisation

How it works

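CPython 3.11’s interpreter monitors bytecode execution. Once a code object is hot (after roughly 8 calls or loop iterations in CPython 3.11), its instructions become adaptive: each one observes the types it receives and, when they look stable, replaces itself with a specialised variant: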

BINARY_OP → BINARY_OP_ADD_INT when both operands are int
LOAD_ATTR → LOAD_ATTR_INSTANCE_VALUE when the object layout is known
CALL → CALL_PY_EXACT_ARGS when calling a Python function with matching arity

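If the type assumption breaks (a different type appears), the specialised instruction falls back to the generic path for that execution. After repeated misses it reverts to its adaptive form and may specialise again for the new types.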

Quickening

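Specialisation happens through “quickening”: the interpreter rewrites instructions in place inside the code object, while the co_code attribute still returns the original, unspecialised bytecode. The following snippet makes the rewrite visible: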

import dis

def add(a, b):
    return a + b

# Call enough times to trigger specialisation
for _ in range(10):
    add(1, 2)

# Show adaptive bytecodes
dis.dis(add, adaptive=True)
# Output shows BINARY_OP_ADD_INT instead of BINARY_OP

The adaptive=True flag in dis.dis is new in 3.11 and reveals the specialised opcodes.
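The same flag is accepted by dis.get_instructions, which makes it possible to watch an instruction change as the argument types change. The following is a minimal sketch, not a description of the interpreter's exact thresholds: the helper name opnames and the loop counts are illustrative assumptions, and the names printed depend on the CPython version, so no output is shown.

```python
import dis
import sys


def opnames(func):
    """Return instruction names, including specialised ones where supported."""
    if sys.version_info >= (3, 11):
        # adaptive=True was added in 3.11 and exposes specialised opcodes.
        instructions = dis.get_instructions(func, adaptive=True)
    else:
        instructions = dis.get_instructions(func)
    return [ins.opname for ins in instructions]


def add(a, b):
    return a + b


# Warm the function up with ints, then look at the binary operation.
for _ in range(1000):
    add(1, 2)
after_ints = [name for name in opnames(add) if name.startswith("BINARY")]

# Switch to strings so the int assumption no longer holds.
for _ in range(1000):
    add("a", "b")
after_strs = [name for name in opnames(add) if name.startswith("BINARY")]

print("after int calls:", after_ints)
print("after str calls:", after_strs)
```

On a specialising interpreter the two lists would be expected to differ, because the instruction re-specialises for the new operand types.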

Measured improvements

The pyperformance benchmark suite showed:

Benchmark	Speedup
richards	1.26×
raytrace	1.21×
regex_compile	1.19×
json_loads	1.09×
django_template	1.26×
async_tree_io	1.55×
spectral_norm	1.08×

Code that spends most of its time waiting on real I/O, or inside C extensions such as NumPy, sees little gain. Pure-Python computation and template rendering benefit most.
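Speedup ratios like these are normally summarised with a geometric mean rather than an arithmetic one. The sketch below does that for the seven rows quoted above only. It is not the full pyperformance suite, so the result should not be expected to match the officially reported average.

```python
from statistics import geometric_mean

# Speedup ratios copied from the table above (3.11 relative to 3.10).
speedups = {
    "richards": 1.26,
    "raytrace": 1.21,
    "regex_compile": 1.19,
    "json_loads": 1.09,
    "django_template": 1.26,
    "async_tree_io": 1.55,
    "spectral_norm": 1.08,
}

# The geometric mean is the appropriate average for ratios.
overall = geometric_mean(speedups.values())
print(f"geometric mean speedup over {len(speedups)} benchmarks: {overall:.2f}x")
```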

Memory impact

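Specialisation stores inline cache entries alongside the bytecode, which makes code objects larger. The 3.11 release notes state that memory use is not expected to exceed 20% more than 3.10, and that the increase is offset by leaner frame objects and object dictionaries. For most applications this is negligible. Note that PYTHONDONTWRITEBYTECODE=1 does not disable quickening: it only stops CPython from writing .pyc files. CPython 3.11 has no documented runtime switch for turning specialisation off.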

Zero-cost exception handling

In Python ≤3.10, entering a try block had a small overhead: the interpreter executed a SETUP_FINALLY instruction that pushed a block onto the frame's block stack. In 3.11, a try block in which nothing is raised costs almost nothing at runtime. The exception handler information is stored in a static table attached to the code object (co_exceptiontable), not on the runtime stack.

import timeit

# This is essentially free in 3.11
def with_try():
    try:
        x = 1 + 2
    except Exception:
        pass
    return x

# Virtually identical performance
def without_try():
    x = 1 + 2
    return x

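This makes defensive try/except blocks in hot paths nearly free as long as no exception is raised. Actually raising and catching an exception still has a real cost.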
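The claim is easy to check on your own interpreter. Here is a minimal timing sketch that repeats the two functions above so it runs on its own. The helper name best_of and the repeat and number values are arbitrary choices for illustration, and the timings will vary by machine and Python version.

```python
import timeit


def with_try():
    try:
        x = 1 + 2
    except Exception:
        pass
    return x


def without_try():
    x = 1 + 2
    return x


def best_of(func, repeat=5, number=200_000):
    """Return the fastest of `repeat` runs, each timing `number` calls."""
    return min(timeit.repeat(func, repeat=repeat, number=number))


t_try = best_of(with_try)
t_plain = best_of(without_try)

print(f"with try:    {t_try:.4f} s")
print(f"without try: {t_plain:.4f} s")
print(f"ratio:       {t_try / t_plain:.2f}")
```

Taking the minimum of several runs reduces noise from other processes. On 3.11 the ratio would be expected to sit close to 1.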

Exception groups — semantics and internals

The `ExceptionGroup` class hierarchy

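BaseException
├── BaseExceptionGroup  (can contain any BaseException instances)
│   └── ExceptionGroup  (can only contain Exception instances)
├── KeyboardInterrupt
└── Exception
    └── ExceptionGroup  (the same class, listed again: it inherits from both)

BaseExceptionGroup can wrap any exception, including KeyboardInterrupt and SystemExit. ExceptionGroup inherits from both BaseExceptionGroup and Exception and only wraps Exception instances, so a plain except Exception also catches it.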

`except*` semantics in detail

try:
    raise ExceptionGroup("errors", [
        ValueError("a"),
        TypeError("b"),
        ValueError("c"),
    ])
except* ValueError as eg:
    print(f"Caught {len(eg.exceptions)} ValueErrors")
    # eg is ExceptionGroup("errors", [ValueError("a"), ValueError("c")])
except* TypeError as eg:
    print(f"Caught {len(eg.exceptions)} TypeErrors")

Critical rules:

All except* clauses are checked — they don’t short-circuit like except
Each exception in the group is handled by at most one except* clause: the first one whose type matches
Unmatched exceptions propagate as a new ExceptionGroup
You cannot mix except and except* in the same try block
Re-raising in except* re-raises only the matched sub-group

`ExceptionGroup.subgroup()` and `.split()`

eg = ExceptionGroup("all", [ValueError("x"), TypeError("y"), ValueError("z")])

# subgroup: returns matching exceptions only
val_errors = eg.subgroup(ValueError)
# ExceptionGroup("all", [ValueError("x"), ValueError("z")])

# split: returns (match, rest)
match, rest = eg.split(TypeError)
# match = ExceptionGroup("all", [TypeError("y")])
# rest = ExceptionGroup("all", [ValueError("x"), ValueError("z")])

These methods preserve the nesting structure — nested ExceptionGroup instances are recursively filtered.

`asyncio.TaskGroup` internals

TaskGroup is built on ExceptionGroup:

async with asyncio.TaskGroup() as tg:
    task1 = tg.create_task(coro1())
    task2 = tg.create_task(coro2())

Implementation details:

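As soon as one task fails with an exception other than asyncio.CancelledError, the remaining tasks are cancelled via task.cancel(); if the body of the async with block is still running, the enclosing task is cancelled too
On __aexit__, the group waits for all tasks to finish (including cancelled ones)
All exceptions other than CancelledError are collected into an ExceptionGroup (or a BaseExceptionGroup); a KeyboardInterrupt or SystemExit raised by a task is re-raised directly instead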
If a single task raised, you still get an ExceptionGroup with one exception (consistent interface)

This is structured concurrency — the lifetime of spawned tasks is bound to the async with block. No orphaned tasks.

Migration from `gather()`

# Old pattern (silent failure risk)
results = await asyncio.gather(coro1(), coro2(), return_exceptions=True)
for r in results:
    if isinstance(r, BaseException):  # CancelledError is not an Exception subclass
        handle(r)

# New pattern (exceptions cannot be silently ignored)
try:
    async with asyncio.TaskGroup() as tg:
        tg.create_task(coro1())
        tg.create_task(coro2())
except* ConnectionError:
    handle_connection_failures()
except* ValueError:
    handle_validation_failures()

`tomllib` implementation notes

tomllib is a vendored copy of tomli (by Taneli Hukkinen). It’s a spec-compliant TOML v1.0.0 parser. Key characteristics:

Read-only by design — writing TOML is a separate concern with different trade-offs
tomllib.load() requires a file opened in binary mode ("rb") so the parser can decode the UTF-8 itself; tomllib.loads() takes a str instead
Returns native Python types: dict, list, int, float, str, bool, datetime.datetime, datetime.date, datetime.time

TypeVarTuple (PEP 646) — variadic generics

Enables typing for variadic structures like tensors:

from typing import TypeVarTuple, Generic, Unpack

Ts = TypeVarTuple("Ts")

class Tensor(Generic[*Ts]):
    def __init__(self, *shape: Unpack[Ts]) -> None:
        self.shape = shape

# Type checkers can verify the number (and types) of dimensions
def matrix_multiply(
    a: Tensor[int, int],
    b: Tensor[int, int]
) -> Tensor[int, int]:
    ...

This was initially designed for NumPy/PyTorch tensor shape checking but has broader applications for any variadic generic container.

Migration strategy

Benchmark first — run your test suite under 3.11 and measure wall-clock time improvement
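Search for collections.Callable and the other ABC aliases in collections: they were removed in 3.10, so code coming from 3.9 or earlier must switch to collections.abc.Callable. Removals specific to 3.11 include inspect.getargspec() and the @asyncio.coroutine decorator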
Adopt TaskGroup over gather() in new async code

Replace tomli with tomllib — use conditional import for 3.10 compat:

try:
    import tomllib
except ModuleNotFoundError:
    import tomli as tomllib

Add except* only where you genuinely handle concurrent failures — don’t use it as a fancier except

The one thing to remember: Python 3.11’s adaptive specialisation proved that a 25% speedup was achievable without a JIT compiler — and exception groups finally gave Python the concurrent error handling model it needed.
