Code Generation Patterns — Deep Dive

Advanced code generation in Python — AST-based generators, exec() compilation tricks, import hooks, and production patterns from real frameworks.

How Python Frameworks Generate Code

Code generation is not an exotic technique — it is fundamental to many of the most popular Python libraries. Understanding how they do it reveals patterns you can apply in your own projects.

dataclasses: String Compilation

The dataclasses module generates __init__, __repr__, __eq__, and other methods by building Python source code as strings and compiling them:

# Simplified version of what dataclasses does internally
def _create_init(fields):
    args = ', '.join(f'{f.name}: {f.type.__name__}' for f in fields)
    body = '\n'.join(f'  self.{f.name} = {f.name}' for f in fields)
    source = f'def __init__(self, {args}):\n{body}'

    # Compile and execute in a controlled namespace
    ns = {}
    exec(compile(source, '<generated>', 'exec'), ns)
    return ns['__init__']

Why strings instead of AST? The generated source preserves argument names and type annotations in a way that inspection tools (inspect.signature(), IDEs) can read. The exec() is safe here because the input is entirely controlled by the framework, not user data.

SQLAlchemy: Metaclass-Driven Generation

SQLAlchemy’s declarative base uses a metaclass that inspects Column descriptors and generates mapping code:

class UserMeta(type):
    def __new__(mcs, name, bases, namespace):
        columns = {k: v for k, v in namespace.items()
                   if isinstance(v, Column)}
        cls = super().__new__(mcs, name, bases, namespace)
        cls._columns = columns
        # Generate query methods, relationship loaders, etc.
        return cls

The metaclass intercepts class creation and augments it with ORM functionality. No source code is written — everything happens through object manipulation.

Pydantic v2: Rust-Generated Validators

Pydantic v2 takes a hybrid approach: it uses Rust (via pydantic-core) to generate optimized validation functions at model creation time. The Python class definition is analyzed, and a Rust-based schema compiler produces a validation plan that runs at near-native speed.

AST-Based Generation in Depth

For cases where you need programmatic code construction with guaranteed syntactic correctness, AST building is the right tool.

Building Complex Functions

import ast
import types

def generate_validator(field_name, field_type, constraints):
    """Generate a validation function for a single field."""
    checks = []

    if 'min' in constraints:
        checks.append(ast.If(
            test=ast.Compare(
                left=ast.Name(id='value', ctx=ast.Load()),
                ops=[ast.Lt()],
                comparators=[ast.Constant(value=constraints['min'])]
            ),
            body=[ast.Raise(exc=ast.Call(
                func=ast.Name(id='ValueError', ctx=ast.Load()),
                args=[ast.Constant(
                    value=f"{field_name} must be >= {constraints['min']}"
                )],
                keywords=[]
            ))],
            orelse=[]
        ))

    if 'max_length' in constraints and field_type == str:
        checks.append(ast.If(
            test=ast.Compare(
                left=ast.Call(
                    func=ast.Name(id='len', ctx=ast.Load()),
                    args=[ast.Name(id='value', ctx=ast.Load())],
                    keywords=[]
                ),
                ops=[ast.Gt()],
                comparators=[ast.Constant(value=constraints['max_length'])]
            ),
            body=[ast.Raise(exc=ast.Call(
                func=ast.Name(id='ValueError', ctx=ast.Load()),
                args=[ast.Constant(
                    value=f"{field_name} exceeds max length {constraints['max_length']}"
                )],
                keywords=[]
            ))],
            orelse=[]
        ))

    checks.append(ast.Return(
        value=ast.Name(id='value', ctx=ast.Load())
    ))

    func = ast.FunctionDef(
        name=f'validate_{field_name}',
        args=ast.arguments(
            posonlyargs=[], args=[ast.arg(arg='value')],
            vararg=None, kwonlyargs=[], kw_defaults=[],
            kwarg=None, defaults=[]
        ),
        body=checks or [ast.Pass()],
        decorator_list=[], returns=None,
        lineno=1, col_offset=0
    )

    module = ast.Module(body=[func], type_ignores=[])
    ast.fix_missing_locations(module)
    code = compile(module, f'<validator:{field_name}>', 'exec')
    ns = {}
    exec(code, ns)
    return ns[f'validate_{field_name}']

# Usage
validate_age = generate_validator('age', int, {'min': 0})
validate_name = generate_validator('name', str, {'max_length': 100})

Preserving Source Information

Generated code often produces unhelpful stack traces. You can improve this by setting meaningful co_filename and line numbers:

# When compiling, use a descriptive filename
code = compile(tree, 'generated://UserModel/validate_age', 'exec')

# Stack traces will show:
#   File "generated://UserModel/validate_age", line 3

For ast.unparse()-based generation, write the source to a temporary file and compile from there — this gives debuggers and coverage tools real source to display.

Import Hooks for Transparent Generation

Python’s import system supports hooks that can intercept module loading and inject generated code:

import importlib.abc
import importlib.machinery
import sys

class SchemaLoader(importlib.abc.Loader):
    def __init__(self, schema_path):
        self.schema_path = schema_path

    def create_module(self, spec):
        return None  # use default module creation

    def exec_module(self, module):
        schema = load_schema(self.schema_path)
        for model_name, fields in schema.items():
            cls = generate_model_class(model_name, fields)
            setattr(module, model_name, cls)

class SchemaFinder(importlib.abc.MetaPathFinder):
    def find_spec(self, fullname, path, target=None):
        if fullname.startswith('generated_models.'):
            schema_name = fullname.split('.')[-1]
            schema_path = f'schemas/{schema_name}.yaml'
            if os.path.exists(schema_path):
                return importlib.machinery.ModuleSpec(
                    fullname,
                    SchemaLoader(schema_path)
                )
        return None

sys.meta_path.insert(0, SchemaFinder())

# Now you can do:
# from generated_models.users import User

This pattern makes generated code feel like regular imports while keeping the generation logic centralized.

The exec() Compilation Pattern

When frameworks use exec() for code generation, they follow a specific safe pattern:

def _compile_method(name, source, globals_dict=None):
    """Compile a method from trusted source with proper metadata."""
    # 1. Source is NEVER from user input
    # 2. Compile first to catch syntax errors
    code = compile(source, f'<generated:{name}>', 'exec')

    # 3. Execute in an isolated namespace
    local_ns = {}
    exec(code, globals_dict or {}, local_ns)

    # 4. Extract and return the function
    func = local_ns[name]

    # 5. Optionally set qualname for better debugging
    func.__qualname__ = f'Generated.{name}'
    return func

The critical safety property: the source string is constructed entirely by the framework from trusted data (class definitions, annotations, schema files). User-provided strings never reach exec().

Template Engines for Code Generation

Jinja2 is the most common template engine for code generation. Best practices for using it:

from jinja2 import Environment, FileSystemLoader

env = Environment(
    loader=FileSystemLoader('templates/'),
    trim_blocks=True,
    lstrip_blocks=True,
    keep_trailing_newline=True,
    undefined=StrictUndefined,  # fail fast on missing variables
)

template = env.get_template('model.py.j2')
source = template.render(
    class_name='User',
    fields=[
        {'name': 'id', 'type': 'int', 'required': True},
        {'name': 'email', 'type': 'str', 'required': True},
        {'name': 'bio', 'type': 'str', 'required': False},
    ]
)

# Validate the generated code compiles
compile(source, '<generated>', 'exec')

# Write to file
Path('src/models/user.py').write_text(source)

Key practices:

Use StrictUndefined to catch template variable errors
Compile the output to verify syntax before writing
Run formatters (Black, isort) on generated code
Include a “DO NOT EDIT — generated by X” header

Testing Generated Code

Generated code needs testing at two levels:

Generator tests verify the generator produces correct output for known inputs:

def test_model_generator():
    source = generate_model("User", [("name", str), ("age", int)])
    # Verify it compiles
    code = compile(source, "<test>", "exec")
    # Verify it runs
    ns = {}
    exec(code, ns)
    user = ns["User"](name="Alice", age=30)
    assert user.name == "Alice"

Snapshot tests catch unintended changes to generated output:

def test_model_snapshot(snapshot):
    source = generate_model("User", [("name", str), ("age", int)])
    assert source == snapshot  # compare against stored snapshot

Integration tests verify the generated code works correctly in the full application context — imports resolve, types are correct, serialization works.

Performance Considerations

Code generation adds startup cost but can improve runtime performance:

Specialized methods generated for specific types avoid runtime type checking
Inlined logic avoids function call overhead for simple operations
Pre-computed dispatch tables replace dynamic lookups

However, excessive code generation can slow down imports (many exec() calls during module loading) and increase memory usage (many code objects). Profile before assuming generation is faster.

Anti-Patterns to Avoid

Generating code from user input — This is eval() with extra steps. Always generate from trusted schemas or definitions.

Generating code that generates code — Meta-meta-programming is almost always a sign that you need a better abstraction, not another layer of generation.

Generating unformatted code — Always run generated code through a formatter. Future developers will need to read it for debugging.

Not documenting the generation pipeline — If files are generated, document which tool generates them, from what source, and how to regenerate.

One thing to remember: Code generation in Python ranges from simple string templates to sophisticated AST construction and import hooks. The best frameworks use it to eliminate boilerplate while keeping the generated code inspectable and debuggable. The key safety principle is absolute: generate from trusted definitions only, never from user input. And always make it easy for the next developer to understand what was generated, why, and how to regenerate it.

pythonmetaprogramminglanguage-implementation