Code Generation Patterns — Deep Dive
How Python Frameworks Generate Code
Code generation is not an exotic technique — it is fundamental to many of the most popular Python libraries. Understanding how they do it reveals patterns you can apply in your own projects.
dataclasses: String Compilation
The dataclasses module generates __init__, __repr__, __eq__, and other methods by building Python source code as strings and compiling them:
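# Simplified version of what dataclasses does internally.
# Assumes every f.type is a class that resolves by name (a builtin such as int);
# the real module binds each type to a placeholder name in the exec namespace instead.
def _create_init(fields):
    args = ', '.join(f'{f.name}: {f.type.__name__}' for f in fields)
    # Fall back to "pass" so a class without fields still compiles
    body = '\n'.join(f'    self.{f.name} = {f.name}' for f in fields) or '    pass'
    source = f'def __init__(self, {args}):\n{body}'
    # Compile and execute in a controlled namespace
    ns = {}
    exec(compile(source, '<generated>', 'exec'), ns)
    return ns['__init__']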
Why strings instead of AST? A source template is short to write and easy to read back when you debug the generator, and the compiled function carries real parameter names and annotations, so inspection tools (inspect.signature(), IDEs) see an ordinary __init__. The exec() is safe here because the framework builds the source itself from the class definition, not from user data.
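To make the string-compilation idea concrete, here is a minimal runnable sketch. The helper name make_init and the (name, type) pair format are assumptions for illustration; this is not the dataclasses implementation, only the same technique in miniature.

```python
import inspect

def make_init(fields):
    """Build an __init__ from (name, type) pairs by compiling a source string.

    Hypothetical helper: assumes every type resolves by name as a builtin.
    """
    args = ", ".join(f"{name}: {tp.__name__}" for name, tp in fields)
    body = "\n".join(f"    self.{name} = {name}" for name, _ in fields) or "    pass"
    header = f"def __init__(self, {args}):" if args else "def __init__(self):"
    source = f"{header}\n{body}\n"
    namespace = {}
    exec(compile(source, "<generated __init__>", "exec"), namespace)
    return namespace["__init__"]

class Point:
    pass

# Attach the generated method to an ordinary class
Point.__init__ = make_init([("x", int), ("y", int)])

p = Point(1, y=2)
print(p.x, p.y)
# The generated function has real parameter names that inspect can read
print(inspect.signature(Point.__init__))
```

Because the function is compiled from real source text, keyword arguments and introspection behave exactly as they would for a hand-written method.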
SQLAlchemy: Metaclass-Driven Generation
SQLAlchemy’s declarative base uses a metaclass that inspects the Column attributes declared in the class body and sets up the mapping from them. A simplified sketch of the idea (not SQLAlchemy's actual metaclass):
from sqlalchemy import Column

class UserMeta(type):
    def __new__(mcs, name, bases, namespace):
        # Collect every Column declared in the class body
        columns = {k: v for k, v in namespace.items()
                   if isinstance(v, Column)}
        cls = super().__new__(mcs, name, bases, namespace)
        cls._columns = columns
        # Generate query methods, relationship loaders, etc.
        return cls
The metaclass intercepts class creation and augments it with ORM functionality. No source code is written — everything happens through object manipulation.
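The following sketch shows the same pattern end to end without any ORM installed. The Column class here is a hypothetical stand-in, not SQLAlchemy's, and the generated as_dict method is an invented example of "augmenting the class through object manipulation".

```python
class Column:
    """Hypothetical stand-in for an ORM column; not SQLAlchemy's Column."""
    def __init__(self, type_):
        self.type_ = type_

class ModelMeta(type):
    def __new__(mcs, name, bases, namespace):
        columns = {k: v for k, v in namespace.items() if isinstance(v, Column)}
        cls = super().__new__(mcs, name, bases, namespace)
        cls._columns = columns
        names = tuple(columns)

        # Build a method as a closure and attach it; no source text involved
        def as_dict(self):
            return {n: getattr(self, n) for n in names}

        cls.as_dict = as_dict
        return cls

class User(metaclass=ModelMeta):
    id = Column(int)
    email = Column(str)

    def __init__(self, id, email):
        # Instance attributes shadow the class-level Column objects
        self.id = id
        self.email = email

user = User(1, "a@example.com")
print(user.as_dict())
```

The generated method is a plain closure, so it costs nothing to compile, but unlike string-compiled code it cannot be specialized per field at the source level.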
Pydantic v2: Rust-Generated Validators
Pydantic v2 takes a hybrid approach. At model creation time the Python class definition is analyzed into a core schema, and pydantic-core, which is written in Rust, compiles that schema into a validator. No Python source is generated; validation then runs in compiled Rust code rather than in interpreted Python.
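The "build a plan once, run it many times" idea can be illustrated in pure Python. This is only an analogy under stated assumptions: compile_plan and validate are hypothetical names, and nothing here reflects how pydantic-core is actually implemented.

```python
def compile_plan(schema):
    """Turn {'field': type} into a list of (name, check) steps, built once."""
    steps = []
    for name, expected in schema.items():
        # Default arguments freeze name/expected for this particular step
        def check(value, name=name, expected=expected):
            if not isinstance(value, expected):
                raise TypeError(
                    f"{name}: expected {expected.__name__}, "
                    f"got {type(value).__name__}"
                )
            return value
        steps.append((name, check))
    return steps

def validate(plan, data):
    """Run a precomputed plan; no schema inspection happens here."""
    return {name: check(data[name]) for name, check in plan}

# The plan is built once, at "model creation time"
plan = compile_plan({"id": int, "email": str})
print(validate(plan, {"id": 1, "email": "a@example.com"}))
```

The schema is inspected only in compile_plan; every later call to validate just walks the prepared steps.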
AST-Based Generation in Depth
For cases where you need programmatic code construction with guaranteed syntactic correctness, AST building is the right tool.
Building Complex Functions
import ast

def generate_validator(field_name, field_type, constraints):
    """Generate a validation function for a single field."""
    checks = []
    if 'min' in constraints:
        # if value < min: raise ValueError(...)
        checks.append(ast.If(
            test=ast.Compare(
                left=ast.Name(id='value', ctx=ast.Load()),
                ops=[ast.Lt()],
                comparators=[ast.Constant(value=constraints['min'])]
            ),
            body=[ast.Raise(exc=ast.Call(
                func=ast.Name(id='ValueError', ctx=ast.Load()),
                args=[ast.Constant(
                    value=f"{field_name} must be >= {constraints['min']}"
                )],
                keywords=[]
            ))],
            orelse=[]
        ))
    if 'max_length' in constraints and field_type is str:
        # if len(value) > max_length: raise ValueError(...)
        checks.append(ast.If(
            test=ast.Compare(
                left=ast.Call(
                    func=ast.Name(id='len', ctx=ast.Load()),
                    args=[ast.Name(id='value', ctx=ast.Load())],
                    keywords=[]
                ),
                ops=[ast.Gt()],
                comparators=[ast.Constant(value=constraints['max_length'])]
            ),
            body=[ast.Raise(exc=ast.Call(
                func=ast.Name(id='ValueError', ctx=ast.Load()),
                args=[ast.Constant(
                    value=f"{field_name} exceeds max length {constraints['max_length']}"
                )],
                keywords=[]
            ))],
            orelse=[]
        ))
    # The final return keeps the body non-empty even without constraints
    checks.append(ast.Return(
        value=ast.Name(id='value', ctx=ast.Load())
    ))
    func = ast.FunctionDef(
        name=f'validate_{field_name}',
        args=ast.arguments(
            posonlyargs=[], args=[ast.arg(arg='value')],
            vararg=None, kwonlyargs=[], kw_defaults=[],
            kwarg=None, defaults=[]
        ),
        body=checks,
        decorator_list=[], returns=None
    )
    module = ast.Module(body=[func], type_ignores=[])
    # Fill in the lineno/col_offset values that compile() requires
    ast.fix_missing_locations(module)
    code = compile(module, f'<validator:{field_name}>', 'exec')
    ns = {}
    exec(code, ns)
    return ns[f'validate_{field_name}']

# Usage
validate_age = generate_validator('age', int, {'min': 0})
validate_name = generate_validator('name', str, {'max_length': 100})
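Hand-built node trees are hard to review. One way to check them, sketched below, is to turn the tree back into source with ast.unparse() (Python 3.9+) before compiling it. The function and message used here are illustrative only.

```python
import ast

# if value < 0: raise ValueError('age must be >= 0')
check = ast.If(
    test=ast.Compare(
        left=ast.Name(id="value", ctx=ast.Load()),
        ops=[ast.Lt()],
        comparators=[ast.Constant(value=0)],
    ),
    body=[ast.Raise(
        exc=ast.Call(
            func=ast.Name(id="ValueError", ctx=ast.Load()),
            args=[ast.Constant(value="age must be >= 0")],
            keywords=[],
        ),
        cause=None,
    )],
    orelse=[],
)
func = ast.FunctionDef(
    name="validate_age",
    args=ast.arguments(
        posonlyargs=[], args=[ast.arg(arg="value", annotation=None)],
        vararg=None, kwonlyargs=[], kw_defaults=[], kwarg=None, defaults=[],
    ),
    body=[check, ast.Return(value=ast.Name(id="value", ctx=ast.Load()))],
    decorator_list=[], returns=None, type_comment=None,
)
module = ast.fix_missing_locations(ast.Module(body=[func], type_ignores=[]))

# Human-readable view of exactly what will be compiled
source = ast.unparse(module)
print(source)

namespace = {}
exec(compile(module, "<validator:age>", "exec"), namespace)
print(namespace["validate_age"](5))
```

Printing the unparsed source during development catches a wrong operator or a misplaced node much faster than reading the nested constructor calls.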
Preserving Source Information
Generated code often produces unhelpful stack traces. You can improve this by setting meaningful co_filename and line numbers:
# When compiling, use a descriptive filename
code = compile(tree, 'generated://UserModel/validate_age', 'exec')
# Stack traces will show:
# File "generated://UserModel/validate_age", line 3
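For ast.unparse()-based generation, write the source to a real file and pass that path as the filename when compiling, so debuggers and coverage tools have real source to display. The file has to outlive the compile step: a temporary file that is deleted straight away leaves them with nothing to show.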
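When writing a file is not practical, a commonly used alternative is to register the generated text in the standard library's linecache, which the traceback module consults for source lines. The sketch below assumes it is acceptable to write to linecache.cache directly; the helper name compile_with_source is hypothetical.

```python
import linecache
import traceback

def compile_with_source(source, filename):
    """Compile source and register its text so tracebacks can show the lines."""
    code = compile(source, filename, "exec")
    # Entry layout: (size, mtime, lines, fullname). mtime=None keeps
    # linecache.checkcache() from discarding an entry with no file on disk.
    linecache.cache[filename] = (
        len(source), None, source.splitlines(True), filename
    )
    namespace = {}
    exec(code, namespace)
    return namespace

source = (
    "def validate_age(value):\n"
    "    if value < 0:\n"
    "        raise ValueError('age must be >= 0')\n"
    "    return value\n"
)
ns = compile_with_source(source, "generated://UserModel/validate_age")

report = ""
try:
    ns["validate_age"](-1)
except ValueError:
    # The report includes the generated filename and the failing source line
    report = traceback.format_exc()
    print(report)
```

With the cache entry in place, the traceback shows the failing line of generated code instead of a bare filename and line number.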
Import Hooks for Transparent Generation
Python’s import system supports hooks that can intercept module loading and inject generated code:
import importlib.abc
import importlib.machinery
import os
import sys

# load_schema() and generate_model_class() are assumed to be defined elsewhere

class SchemaLoader(importlib.abc.Loader):
    def __init__(self, schema_path):
        self.schema_path = schema_path

    def create_module(self, spec):
        return None  # use default module creation

    def exec_module(self, module):
        # Populate the empty module with one generated class per model
        schema = load_schema(self.schema_path)
        for model_name, fields in schema.items():
            cls = generate_model_class(model_name, fields)
            setattr(module, model_name, cls)

class SchemaFinder(importlib.abc.MetaPathFinder):
    def find_spec(self, fullname, path, target=None):
        if fullname.startswith('generated_models.'):
            schema_name = fullname.split('.')[-1]
            schema_path = f'schemas/{schema_name}.yaml'
            if os.path.exists(schema_path):
                return importlib.machinery.ModuleSpec(
                    fullname,
                    SchemaLoader(schema_path)
                )
        return None  # let the other finders handle everything else

sys.meta_path.insert(0, SchemaFinder())
# Now you can do the following, provided a generated_models package is itself
# importable (this finder only handles its submodules):
# from generated_models.users import User
This pattern makes generated code feel like regular imports while keeping the generation logic centralized.
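Here is a self-contained variant you can run without YAML files. It is a sketch under assumptions: the SCHEMAS dict, the package name generated_models_demo, and make_class are all hypothetical. It also serves the parent package from the finder, because Python imports the parent before any submodule.

```python
import importlib.abc
import importlib.machinery
import sys

# Hypothetical in-memory "schemas" standing in for files on disk
SCHEMAS = {"users": {"User": ["id", "email"]}}
PACKAGE = "generated_models_demo"

def make_class(name, field_names):
    """Hypothetical generator: a class whose __init__ takes the fields as keywords."""
    def __init__(self, **kwargs):
        for field in field_names:
            setattr(self, field, kwargs[field])
    return type(name, (), {"__init__": __init__, "_fields": tuple(field_names)})

class SchemaLoader(importlib.abc.Loader):
    def __init__(self, models):
        self.models = models

    def create_module(self, spec):
        return None  # use default module creation

    def exec_module(self, module):
        for model_name, field_names in self.models.items():
            setattr(module, model_name, make_class(model_name, field_names))

class SchemaFinder(importlib.abc.MetaPathFinder):
    def find_spec(self, fullname, path, target=None):
        if fullname == PACKAGE:
            # Serve an empty parent package so submodule imports can proceed
            return importlib.machinery.ModuleSpec(
                fullname, SchemaLoader({}), is_package=True
            )
        prefix = PACKAGE + "."
        if fullname.startswith(prefix):
            models = SCHEMAS.get(fullname[len(prefix):])
            if models is not None:
                return importlib.machinery.ModuleSpec(fullname, SchemaLoader(models))
        return None

sys.meta_path.insert(0, SchemaFinder())

from generated_models_demo.users import User

user = User(id=1, email="a@example.com")
print(user.email)
```

Returning None for every name the finder does not recognize is what keeps the hook from interfering with ordinary imports.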
The exec() Compilation Pattern
When frameworks use exec() for code generation, they follow a specific safe pattern:
def _compile_method(name, source, globals_dict=None):
    """Compile a method from trusted source with proper metadata."""
    # 1. Source is NEVER from user input
    # 2. Compile first to catch syntax errors
    code = compile(source, f'<generated:{name}>', 'exec')
    # 3. Execute in an isolated namespace
    local_ns = {}
    exec(code, {} if globals_dict is None else globals_dict, local_ns)
    # 4. Extract and return the function
    func = local_ns[name]
    # 5. Optionally set qualname for better debugging
    func.__qualname__ = f'Generated.{name}'
    return func
The critical safety property: the source string is constructed entirely by the framework from trusted data (class definitions, annotations, schema files). User-provided strings never reach exec().
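Even with trusted inputs, it is cheap to check that any name interpolated into source is a plain identifier. The sketch below shows one way to do that; safe_identifier and compile_getter are hypothetical helpers, not part of any framework.

```python
import keyword

def safe_identifier(name):
    """Reject anything that is not a plain, non-keyword Python identifier."""
    if not isinstance(name, str) or not name.isidentifier() or keyword.iskeyword(name):
        raise ValueError(f"not a valid Python identifier: {name!r}")
    return name

def compile_getter(field_name):
    """Generate get_<field>() only after the name has been checked."""
    field_name = safe_identifier(field_name)
    func_name = f"get_{field_name}"
    source = f"def {func_name}(self):\n    return self.{field_name}\n"
    namespace = {}
    exec(compile(source, f"<generated:{func_name}>", "exec"), namespace)
    return namespace[func_name]

class User:
    email = "a@example.com"

get_email = compile_getter("email")
print(get_email(User()))

try:
    compile_getter("email; import os")
except ValueError as exc:
    print(exc)
```

The check does not make untrusted input safe to generate from; it only turns a schema typo or a stray character into a clear error before anything is compiled.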
Template Engines for Code Generation
Jinja2 is the most common template engine for code generation. Best practices for using it:
from pathlib import Path

from jinja2 import Environment, FileSystemLoader, StrictUndefined

env = Environment(
    loader=FileSystemLoader('templates/'),
    trim_blocks=True,
    lstrip_blocks=True,
    keep_trailing_newline=True,
    undefined=StrictUndefined,  # fail fast on missing variables
)
template = env.get_template('model.py.j2')
source = template.render(
    class_name='User',
    fields=[
        {'name': 'id', 'type': 'int', 'required': True},
        {'name': 'email', 'type': 'str', 'required': True},
        {'name': 'bio', 'type': 'str', 'required': False},
    ]
)
# Validate the generated code compiles (compile() checks syntax without running it)
compile(source, '<generated>', 'exec')
# Write to file
Path('src/models/user.py').write_text(source)
Key practices:
- Use StrictUndefined to catch template variable errors
- Compile the output to verify syntax before writing
- Run formatters (Black, isort) on generated code
- Include a “DO NOT EDIT — generated by X” header
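Two of these practices, the syntax check and the header, fit into one small post-processing step that works with any template engine. The function name finalize_generated and the header wording are assumptions for illustration.

```python
def finalize_generated(source, generator, template):
    """Prepend a provenance header and verify that the result still compiles."""
    header = f"# DO NOT EDIT: generated by {generator} from {template}\n"
    full_source = header + source
    # compile() raises SyntaxError on bad output and does not execute anything
    compile(full_source, "<generated>", "exec")
    return full_source

rendered = "class User:\n    id: int\n    email: str\n"
final = finalize_generated(rendered, "gen_models.py", "model.py.j2")
print(final)

try:
    finalize_generated("class User\n    id: int\n", "gen_models.py", "model.py.j2")
except SyntaxError as exc:
    print("rejected:", exc.msg)
```

Running the check before the file is written means a broken template never overwrites a working module.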
Testing Generated Code
Generated code needs testing at three levels:
Generator tests verify the generator produces correct output for known inputs:
def test_model_generator():
    source = generate_model("User", [("name", str), ("age", int)])
    # Verify it compiles
    code = compile(source, "<test>", "exec")
    # Verify it runs
    ns = {}
    exec(code, ns)
    user = ns["User"](name="Alice", age=30)
    assert user.name == "Alice"
Snapshot tests catch unintended changes to generated output:
def test_model_snapshot(snapshot):
    source = generate_model("User", [("name", str), ("age", int)])
    assert source == snapshot  # compare against stored snapshot
Integration tests verify the generated code works correctly in the full application context — imports resolve, types are correct, serialization works.
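The tests above call a generate_model function that the article does not define. The sketch below supplies a hypothetical minimal version so the generator and snapshot checks can be run as-is; the inline EXPECTED string plays the role of a stored snapshot.

```python
def generate_model(class_name, fields):
    """Hypothetical generator: source for a class with a keyword-only __init__."""
    params = ", ".join(f"{name}: {tp.__name__}" for name, tp in fields)
    lines = [
        f"class {class_name}:",
        f"    def __init__(self, *, {params}):",
    ]
    lines += [f"        self.{name} = {name}" for name, _ in fields]
    return "\n".join(lines) + "\n"

# Stand-in for a snapshot file kept under version control
EXPECTED = (
    "class User:\n"
    "    def __init__(self, *, name: str, age: int):\n"
    "        self.name = name\n"
    "        self.age = age\n"
)

source = generate_model("User", [("name", str), ("age", int)])

# Snapshot check: the exact text has not drifted
assert source == EXPECTED

# Generator check: the text compiles and behaves as intended
ns = {}
exec(compile(source, "<test>", "exec"), ns)
user = ns["User"](name="Alice", age=30)
print(user.name, user.age)
```

The two checks fail for different reasons: the snapshot catches any textual change, while the behavioral check still passes after a harmless reformatting.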
Performance Considerations
Code generation adds startup cost but can improve runtime performance:
- Specialized methods generated for specific types avoid runtime type checking
- Inlined logic avoids function call overhead for simple operations
- Pre-computed dispatch tables replace dynamic lookups
However, excessive code generation can slow down imports (many exec() calls during module loading) and increase memory usage (many code objects). Profile before assuming generation is faster.
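A quick way to follow that advice is to time a generic implementation against a generated one with timeit. The sketch below is illustrative only: the Point class and both functions are invented for the comparison, and the numbers depend on your interpreter and machine.

```python
import timeit

FIELDS = ("x", "y", "z")

class Point:
    def __init__(self, x, y, z):
        self.x, self.y, self.z = x, y, z

def generic_astuple(obj):
    """Generic version: looks every field up by name on each call."""
    return tuple(getattr(obj, name) for name in FIELDS)

def make_astuple(fields):
    """Generated version: attribute access is written out once, up front.

    Assumes a non-empty sequence of trusted field names.
    """
    items = ", ".join(f"obj.{name}" for name in fields)
    source = f"def astuple(obj):\n    return ({items},)\n"
    namespace = {}
    exec(compile(source, "<generated:astuple>", "exec"), namespace)
    return namespace["astuple"]

fast_astuple = make_astuple(FIELDS)
point = Point(1, 2, 3)

# Measure both before deciding that generation is worth its startup cost
for label, func in (("generic", generic_astuple), ("generated", fast_astuple)):
    seconds = timeit.timeit(lambda: func(point), number=50_000)
    print(f"{label}: {seconds:.4f}s")
```

Measure the generation step as well if many such functions are built at import time, since that cost is paid on every process start.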
Anti-Patterns to Avoid
Generating code from user input — This is eval() with extra steps. Always generate from trusted schemas or definitions.
Generating code that generates code — Meta-meta-programming is almost always a sign that you need a better abstraction, not another layer of generation.
Generating unformatted code — Always run generated code through a formatter. Future developers will need to read it for debugging.
Not documenting the generation pipeline — If files are generated, document which tool generates them, from what source, and how to regenerate.
One thing to remember: Code generation in Python ranges from simple string templates to sophisticated AST construction and import hooks. The best frameworks use it to eliminate boilerplate while keeping the generated code inspectable and debuggable. The key safety principle is absolute: generate from trusted definitions only, never from user input. And always make it easy for the next developer to understand what was generated, why, and how to regenerate it.