Python Deferred Computation — Deep Dive

Generator Protocol Internals

Generators implement the iterator protocol through a suspended frame object. When Python encounters yield, it:

  1. Saves the current execution frame (local variables, instruction pointer, exception state).
  2. Returns the yielded value to the caller.
  3. Suspends execution — the generator’s frame stays on the heap, not the call stack.
  4. On next(), restores the frame and resumes from the instruction after yield.
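
The suspend-and-resume steps above can be observed from outside the generator with the standard inspect module. This is a minimal sketch; the countdown generator is a made-up example, not something from the article.

```python
import inspect

def countdown(n):
    """Hypothetical example generator: yields n, n-1, ..., 1."""
    while n > 0:
        yield n
        n -= 1

gen = countdown(3)
states = [inspect.getgeneratorstate(gen)]      # body has not started running yet

first = next(gen)                              # runs up to the first yield
states.append(inspect.getgeneratorstate(gen))  # paused at the yield
locals_at_pause = dict(gen.gi_frame.f_locals)  # locals preserved in the suspended frame

second = next(gen)                             # resumes after the yield: n -= 1, then yields again
remaining = list(gen)                          # drains the generator
states.append(inspect.getgeneratorstate(gen))  # finished
frame_after = gen.gi_frame                     # the frame is released once the generator ends

print(states, locals_at_pause, first, second, remaining, frame_after)
```

Once the generator finishes, gi_frame becomes None, which is the point at which objects referenced only by its locals can be reclaimed.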

This frame suspension is key to understanding generator memory behavior. A generator holding references to large objects in its local scope keeps those objects alive as long as the generator exists:

def process_chunks(filepath):
    with open(filepath, 'rb') as f:
        while True:
            chunk = f.read(1_048_576)  # 1 MB
            if not chunk:
                break
            yield transform(chunk)
    # File handle closed here, but only when generator is exhausted or closed

gen = process_chunks("huge_file.bin")
next(gen)  # File is now open and stays open
# ... if we forget about gen, the file handle leaks until GC collects it

Call .close() explicitly on generators that manage resources, or wrap them in contextlib.closing() so the cleanup runs even when iteration stops early.
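
One way to make that cleanup automatic is contextlib.closing, which calls the generator's close() when the with block exits. In the sketch below, managed_numbers and the events list are hypothetical stand-ins for a generator that holds a file handle, so the example runs without any file on disk.

```python
from contextlib import closing

events = []  # records when the "resource" is acquired and released

def managed_numbers():
    """Hypothetical generator that owns a resource for its whole lifetime."""
    events.append("open")           # stands in for opening a file
    try:
        for i in range(1000):
            yield i
    finally:
        events.append("closed")     # stands in for closing the file

with closing(managed_numbers()) as gen:
    first = next(gen)               # consume one item, then stop early

# On exit, closing() called gen.close(), which raised GeneratorExit at the
# paused yield and ran the finally block.
print(first, events)
```

The same finally block also runs if the generator is exhausted normally, so the cleanup code lives in one place.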

Generator Send and Throw

Generators support bidirectional communication through .send() and .throw():

def accumulator():
    total = 0
    while True:
        value = yield total
        if value is None:
            break
        total += value

acc = accumulator()
next(acc)          # Prime the generator → yields 0
acc.send(10)       # → yields 10
acc.send(20)       # → yields 30
acc.send(5)        # → yields 35

This turns generators into coroutines that can receive input at each suspension point — the foundation of asyncio before async/await syntax existed.
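
The accumulator above only exercises .send(). As a sketch of .throw(), the variant below treats an injected exception as a reset signal; the Reset class and resettable_accumulator are hypothetical names introduced for illustration.

```python
class Reset(Exception):
    """Hypothetical signal used to reset the running total."""

def resettable_accumulator():
    total = 0
    while True:
        try:
            value = yield total
        except Reset:
            total = 0            # the exception surfaced at the paused yield
        else:
            total += value

acc = resettable_accumulator()
next(acc)                        # prime the generator
a = acc.send(10)
b = acc.send(5)
c = acc.throw(Reset())           # raises Reset inside the generator, returns the next yielded value
d = acc.send(7)
acc.close()

print(a, b, c, d)
```

.throw() returns whatever the generator yields next, just like .send(), so the caller sees the reset total immediately.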

Building a Lazy Evaluation Framework

For complex deferred computation, you can build a Lazy wrapper that delays any computation:

import threading

class Lazy:
    """Thread-safe lazy evaluation wrapper."""
    _SENTINEL = object()

    def __init__(self, func, *args, **kwargs):
        self._func = func
        self._args = args
        self._kwargs = kwargs
        self._result = self._SENTINEL
        self._lock = threading.Lock()
        self._exception = None

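    @property
    def value(self):
        # Fast path: skip the lock once an outcome (result or error) is cached.
        if self._result is self._SENTINEL and self._exception is None:
            with self._lock:
                # Re-check under the lock: another thread may have finished first.
                if self._result is self._SENTINEL and self._exception is None:
                    try:
                        self._result = self._func(
                            *self._args, **self._kwargs
                        )
                    except Exception as e:
                        # Cache the failure so later accesses re-raise it
                        # instead of calling func again.
                        self._exception = e
                        raise
        if self._exception is not None:
            raise self._exception
        return self._result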

    def is_evaluated(self):
        return self._result is not self._SENTINEL

# Usage
config = Lazy(load_config_from_remote, "https://config.example.com")
# No HTTP call yet

if needs_config:
    print(config.value)  # HTTP call happens now, result cached
    print(config.value)  # Returns cached result instantly

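This double-checked locking pattern ensures thread safety while avoiding lock acquisition on subsequent accesses. The first check is the lock-free fast path; the second check, made while holding the lock, stops two threads that both saw the sentinel from both running the function.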

Lazy Module Loading

The standard library has supported lazy module loading since Python 3.5 through importlib.util.LazyLoader:

import importlib.util
import sys

def lazy_import(name):
    """Import a module lazily; actual loading is deferred until first attribute access."""
    spec = importlib.util.find_spec(name)
    loader = importlib.util.LazyLoader(spec.loader)
    spec.loader = loader
    module = importlib.util.module_from_spec(spec)
    sys.modules[name] = module  # Register first so later imports reuse the lazy module
    loader.exec_module(module)  # Only arms the lazy loader; module code does not run yet
    return module

# numpy isn't actually loaded until you access an attribute
np = lazy_import("numpy")
type(np)  # A lazy subclass of the module type; numpy code hasn't executed yet
np.array([1, 2, 3])  # NOW numpy fully loads

Instagram’s Python server reportedly uses lazy imports extensively, with a reported 60% reduction in startup time from deferring imports of modules that aren’t needed on every request path.
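
The deferral performed by the LazyLoader recipe can be checked without third-party packages. The sketch below applies the same recipe to the standard-library colorsys module, chosen here only because it is small. It assumes the current CPython behavior of swapping the module's class for a placeholder until first attribute access, which is an implementation detail.

```python
import importlib.util
import sys
import types

def lazy_import(name):
    """LazyLoader recipe: defer executing the module until first attribute access."""
    spec = importlib.util.find_spec(name)
    loader = importlib.util.LazyLoader(spec.loader)
    spec.loader = loader
    module = importlib.util.module_from_spec(spec)
    sys.modules[name] = module
    loader.exec_module(module)   # arms the lazy loader; module code has not run
    return module

colorsys = lazy_import("colorsys")

# Before any attribute access the object is a placeholder subclass of ModuleType.
lazy_before = type(colorsys) is not types.ModuleType

hsv = colorsys.rgb_to_hsv(1.0, 0.0, 0.0)   # first attribute access executes the module

# After loading, the placeholder has turned into an ordinary module object.
lazy_after = type(colorsys) is not types.ModuleType

print(lazy_before, hsv, lazy_after)
```

Note that type() does not trigger loading, which is why the placeholder class is still visible in the first check.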

PEP 690: Lazy Imports

PEP 690 proposed a -L flag to make all imports lazy by default. The Steering Council rejected the PEP in 2022, so it is not part of CPython, but the concept is implemented in Cinder, Meta’s performance-oriented fork of CPython, and has reportedly demonstrated significant improvements:

  • Server startup time reportedly reduced from 12s to 4s at Instagram scale
  • Memory usage reduced by up to 40% for CLI tools that import large frameworks but only use a subset

Deferred Descriptor Chains

Descriptors enable per-attribute deferred computation with fine-grained control:

class LazyDescriptor:
    def __init__(self, func):
        self.func = func
        self.attr_name = f"_lazy_{func.__name__}"

    def __set_name__(self, owner, name):
        self.attr_name = f"_lazy_{name}"

    def __get__(self, obj, objtype=None):
        if obj is None:
            return self
        try:
            return getattr(obj, self.attr_name)
        except AttributeError:
            value = self.func(obj)
            setattr(obj, self.attr_name, value)
            return value

    def __delete__(self, obj):
        try:
            delattr(obj, self.attr_name)
        except AttributeError:
            pass

class DataPipeline:
    def __init__(self, raw_data):
        self.raw_data = raw_data

    @LazyDescriptor
    def cleaned(self):
        print("Cleaning data...")
        return [x.strip().lower() for x in self.raw_data]

    @LazyDescriptor
    def tokenized(self):
        print("Tokenizing...")
        return [s.split() for s in self.cleaned]

    @LazyDescriptor
    def vocabulary(self):
        print("Building vocabulary...")
        words = set()
        for tokens in self.tokenized:
            words.update(tokens)
        return sorted(words)

pipeline = DataPipeline(["  Hello World  ", "Python IS Great  "])
# Nothing computed yet

print(pipeline.vocabulary)
# Prints: Building vocabulary... Tokenizing... Cleaning data...
# (each method prints before it touches the attribute it depends on)
# Then the sorted vocabulary

print(pipeline.vocabulary)
# Returns cached result — no recomputation

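The __delete__ method allows cache invalidation: del pipeline.cleaned forces cleaned to be recomputed on next access. The invalidation does not cascade, though. tokenized and vocabulary keep their cached values until they are deleted too, so dependents must be invalidated explicitly when an upstream value changes.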
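
Deleting one cached attribute leaves the caches of attributes derived from it untouched. The sketch below shows the stale read and one possible remedy. The _dependents table and invalidate method are hypothetical additions, not part of the descriptor above, and the descriptor is repeated in compact form so the example runs on its own.

```python
class LazyDescriptor:
    """Compact copy of the caching descriptor, repeated so this runs standalone."""
    def __init__(self, func):
        self.func = func

    def __set_name__(self, owner, name):
        self.attr_name = f"_lazy_{name}"

    def __get__(self, obj, objtype=None):
        if obj is None:
            return self
        try:
            return getattr(obj, self.attr_name)
        except AttributeError:
            value = self.func(obj)
            setattr(obj, self.attr_name, value)
            return value

    def __delete__(self, obj):
        obj.__dict__.pop(self.attr_name, None)

class Pipeline:
    # Hypothetical table: each lazy attribute maps to the attributes derived from it.
    _dependents = {"cleaned": ("tokenized",), "tokenized": ()}

    def __init__(self, raw):
        self.raw = raw

    @LazyDescriptor
    def cleaned(self):
        return [s.strip().lower() for s in self.raw]

    @LazyDescriptor
    def tokenized(self):
        return [s.split() for s in self.cleaned]

    def invalidate(self, name):
        """Drop the cache for name and, recursively, for everything built on it."""
        for dep in self._dependents[name]:
            self.invalidate(dep)
        delattr(self, name)

p = Pipeline(["  Hello World  "])
before = p.tokenized           # computes and caches cleaned and tokenized

p.raw = ["  Goodbye  "]
del p.cleaned                  # drops only the cleaned cache
stale = p.tokenized            # still the old value: the downstream cache was not touched

p.invalidate("cleaned")        # drops tokenized first, then cleaned
fresh = p.tokenized            # recomputed from the new raw data

print(before, stale, fresh)
```

An explicit dependency table is the simplest option; tracking dependencies automatically during __get__ is possible but adds considerable complexity.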

Deferred Iteration with itertools Recipes

Advanced deferred pipelines combine multiple itertools functions:

import itertools
from typing import Iterable, Iterator, TypeVar

T = TypeVar('T')

def lazy_batch(iterable: Iterable[T], size: int) -> Iterator[list[T]]:
    """Lazily batch items without loading all into memory."""
    it = iter(iterable)
    while True:
        batch = list(itertools.islice(it, size))
        if not batch:
            break
        yield batch

def lazy_flatmap(func, iterable):
    """Lazily apply func to each item and flatten results."""
    return itertools.chain.from_iterable(map(func, iterable))

def lazy_deduplicate(iterable, key=None):
    """Lazily deduplicate, preserving order, streaming."""
    seen = set()
    for item in iterable:
        k = key(item) if key else item
        if k not in seen:
            seen.add(k)
            yield item

# Compose a fully deferred pipeline
with open("access.log") as raw_lines:  # Lazy file iteration; closed on exit
    parsed = map(parse_log_line, raw_lines)
    errors = filter(lambda r: r.status >= 500, parsed)
    unique_ips = lazy_deduplicate(errors, key=lambda r: r.ip)
    batches = lazy_batch(unique_ips, 100)

    for batch in batches:
        alert_service.send(batch)

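This pipeline processes a multi-gigabyte log file without ever holding the whole file in memory. Memory is bounded by the current batch of up to 100 records plus the seen set in lazy_deduplicate, which grows with the number of distinct IPs rather than the number of lines. Each line flows through parsing, filtering, and deduplication before the next line is read.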
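
The one-item-at-a-time flow can be made visible by logging each stage. This is a minimal in-memory sketch: the source generator, the traced helper, and the squaring and filtering steps are hypothetical stand-ins for the file, parser, and error filter above.

```python
import itertools

trace = []  # ordered log of what each stage did

def source():
    """Stand-in for a file: records each read before yielding it."""
    for n in range(1, 7):
        trace.append(f"read {n}")
        yield n

def traced(label, func):
    """Wrap func so every call is logged with its argument."""
    def wrapper(x):
        trace.append(f"{label} {x}")
        return func(x)
    return wrapper

squared = map(traced("square", lambda x: x * x), source())
odd_squares = filter(traced("filter", lambda x: x % 2 == 1), squared)
first_two = list(itertools.islice(odd_squares, 2))

print(first_two)
print(trace)
```

The trace interleaves read, square, and filter for each item, and the source is never asked for items beyond the third because islice stops once it has two results.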

Performance: Eager vs Deferred Tradeoffs

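Benchmarking on a range of 10 million integers, computing squares and filtering evens (the timings in the comments are illustrative and will vary with hardware and Python version):

import itertools
import time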

data = range(10_000_000)

# Eager: list comprehensions
start = time.perf_counter()
squares = [x**2 for x in data]
evens = [x for x in squares if x % 2 == 0]
result_eager = evens[:10]
eager_time = time.perf_counter() - start
# ~3.2s, ~400 MB peak memory

# Deferred: generator pipeline
start = time.perf_counter()
squares = (x**2 for x in data)
evens = (x for x in squares if x % 2 == 0)
result_lazy = list(itertools.islice(evens, 10))
lazy_time = time.perf_counter() - start
# ~0.00003s, ~0 MB peak memory

When only 10 results are needed, the deferred approach is roughly 100,000x faster in this run because it stops after squaring the first 19 inputs (0 through 18) instead of all 10 million.
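
How little of the input the deferred pipeline touches can be checked directly by counting the items pulled from the source. The counted wrapper and the counter dictionary below are hypothetical instrumentation added for this sketch.

```python
import itertools

counter = {"consumed": 0}  # how many inputs the pipeline actually pulled

def counted(iterable):
    """Pass items through unchanged while counting them."""
    for x in iterable:
        counter["consumed"] += 1
        yield x

squares = (x**2 for x in counted(range(10_000_000)))
evens = (x for x in squares if x % 2 == 0)
result = list(itertools.islice(evens, 10))

print(result)
print(counter["consumed"])
```

The tenth even square comes from the input 18, so the source is read only up to that point and the remaining inputs are never generated.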

But if all results are consumed:

# Eager: ~3.2s for full list
# Deferred: ~4.1s to iterate all generators (per-item suspend/resume overhead)

In this measurement, generators add roughly 20-30% overhead per item compared to list comprehensions when all items are consumed; the exact figure depends on the Python version and the work done per item. The choice depends on whether you’re processing subsets or everything.
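
The memory side of the tradeoff can be measured with the standard tracemalloc module instead of relying on quoted figures. The sketch below uses a smaller input (N is an arbitrary choice) so it runs quickly; peak_bytes, eager, and deferred are hypothetical helper names, and the absolute numbers will differ between machines and Python versions.

```python
import tracemalloc

def peak_bytes(func):
    """Run func and return (result, peak traced allocation in bytes)."""
    tracemalloc.start()
    try:
        result = func()
        _, peak = tracemalloc.get_traced_memory()
    finally:
        tracemalloc.stop()
    return result, peak

N = 200_000  # arbitrary size, small enough to run quickly

def eager():
    squares = [x * x for x in range(N)]               # full list in memory
    return sum([x for x in squares if x % 2 == 0])    # second full list

def deferred():
    squares = (x * x for x in range(N))               # nothing computed yet
    return sum(x for x in squares if x % 2 == 0)      # one item alive at a time

eager_total, eager_peak = peak_bytes(eager)
lazy_total, lazy_peak = peak_bytes(deferred)

print(f"eager peak: {eager_peak:,} bytes, deferred peak: {lazy_peak:,} bytes")
```

Both versions consume every item here, so this isolates the memory difference from the early-exit speedup discussed earlier.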

async Generators for Deferred I/O

Async generators combine deferred computation with non-blocking I/O:

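import asyncio
from contextlib import aclosing  # Python 3.10+

import httpx

async def fetch_pages(base_url, max_pages=100):
    """Lazily fetch paginated API results."""
    async with httpx.AsyncClient() as client:
        for page in range(1, max_pages + 1):
            response = await client.get(
                f"{base_url}?page={page}"
            )
            response.raise_for_status()  # Fail loudly on HTTP errors
            data = response.json()
            if not data["results"]:
                break
            for item in data["results"]:
                yield item

async def main():
    # Only fetches pages as items are consumed. aclosing() makes sure the
    # generator, and the HTTP client inside it, is closed right after the break.
    async with aclosing(fetch_pages("https://api.example.com/users")) as users:
        async for user in users:
            if user["role"] == "admin":
                await notify(user)
                break  # Stops fetching further pages

asyncio.run(main())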

This pattern is used in production API clients, database cursors (asyncpg), and streaming data pipelines.

The one thing to remember: Deferred computation in Python spans simple generators, lazy descriptor chains, and async generators. The key architectural decision is identifying which computations might not be needed and deferring exactly those, while keeping the hot paths that always run to completion eager.
