Python Startup Optimization — Deep Dive
Anatomy of CPython startup
Before any user code runs, CPython goes through a series of initialization phases. The most time-consuming are:
# Approximate startup phases (rough, hardware-dependent timings)
# Phase 1: Interpreter core
# - Initialize memory allocator
# - Create type objects (int, str, list, dict, etc.)
# - Initialize the GIL
# - Set up the main thread state
# Time: ~5-8ms on modern hardware
# Phase 2: Import system bootstrap
# - Initialize importlib._bootstrap and importlib._bootstrap_external
#   (both are frozen into the interpreter, so no files are read)
# Time: ~3-5ms
# Phase 3: site module (skipped with python -S)
# - Add site-packages directories (derived from sys.prefix) to sys.path
# - Process .pth files
# - Import sitecustomize and usercustomize, if present
# Time: ~5-15ms
# Phase 4: User imports
# - Import your script and its dependencies
# Time: variable, often 50-500ms+
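One way to see what phase 3 adds on your own interpreter is to start fresh interpreters with and without -S and compare what ends up in sys.modules. This is a minimal sketch; the helper name startup_modules is ours, and the exact module lists vary by Python version, platform, and installed .pth files.

```python
import subprocess
import sys


def startup_modules(*flags):
    """Return the set of module names loaded when a fresh interpreter starts."""
    result = subprocess.run(
        [sys.executable, *flags, "-c", "import sys; print(*sorted(sys.modules))"],
        capture_output=True,
        text=True,
        check=True,
    )
    return set(result.stdout.split())


no_site = startup_modules("-S")  # phases 1-2 only
with_site = startup_modules()    # phases 1-3
print(f"{len(no_site)} modules without site, {len(with_site)} with site")
# Modules pulled in by the site module and whatever it imports
print(sorted(with_site - no_site))
```

Because each call spawns a new process, the result reflects a cold interpreter rather than the one running the sketch.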
Advanced import profiling
Building an import profiler
import builtins
import sys
import time
class ImportProfiler:
    """Times first-time imports by wrapping builtins.__import__.
    Timings are cumulative: a module's time includes the imports it triggers.
    Only import statements pass through the hook; importlib.import_module()
    calls are not seen directly.
    """
    def __init__(self):
        self.timings = {}
        # Use the builtins module: __builtins__ is a plain dict outside __main__
        self._original_import = builtins.__import__
    def _profiled_import(self, name, *args, **kwargs):
        # Modules already in sys.modules cost only a dict lookup; don't time them
        if name in sys.modules:
            return self._original_import(name, *args, **kwargs)
        start = time.perf_counter_ns()
        try:
            return self._original_import(name, *args, **kwargs)
        finally:
            self.timings[name] = time.perf_counter_ns() - start
    def start(self):
        builtins.__import__ = self._profiled_import
    def stop(self):
        builtins.__import__ = self._original_import
    def report(self, top_n=20):
        sorted_timings = sorted(
            self.timings.items(),
            key=lambda x: x[1],
            reverse=True,
        )
        print(f"\n{'Module':<50} {'Time (ms)':>10}")
        print("-" * 61)
        for name, ns in sorted_timings[:top_n]:
            ms = ns / 1_000_000
            print(f"{name:<50} {ms:>10.2f}")
# Usage
profiler = ImportProfiler()
profiler.start()
import your_application  # placeholder: your program's entry-point module
profiler.stop()
profiler.report()
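The profiler above records cumulative time, so a package that merely imports a slow dependency looks slow too. A sketch of a context-manager variant that also subtracts nested imports to get self time (the figure -X importtime prints in its first column). The name import_self_times is hypothetical, and like the class above it only sees import statements.

```python
import builtins
import contextlib
import sys
import time
from collections import defaultdict


@contextlib.contextmanager
def import_self_times():
    """Yield a dict of module name -> self time (ns) for first-time imports.

    Self time is the time inside an import minus the time spent in the
    first-time imports it triggered.
    """
    results = {}
    child_ns = defaultdict(int)  # name -> ns spent in imports it triggered
    stack = []                   # names of imports currently in progress
    original = builtins.__import__

    def timed_import(name, *args, **kwargs):
        if name in sys.modules:
            return original(name, *args, **kwargs)
        stack.append(name)
        start = time.perf_counter_ns()
        try:
            return original(name, *args, **kwargs)
        finally:
            elapsed = time.perf_counter_ns() - start
            stack.pop()
            results[name] = elapsed - child_ns.pop(name, 0)
            if stack:
                child_ns[stack[-1]] += elapsed  # charge this time to the parent

    builtins.__import__ = timed_import
    try:
        yield results
    finally:
        builtins.__import__ = original  # always restore the real hook


with import_self_times() as self_times:
    import fractions  # any module not yet imported works as a demo

for name, ns in sorted(self_times.items(), key=lambda kv: kv[1], reverse=True)[:10]:
    print(f"{name:<30} {ns / 1e6:8.2f} ms self")
```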
Using importtime output programmatically
# -X importtime writes to stderr, hence 2>&1; keep imports slower than 5 ms cumulative
python -X importtime -c "import flask" 2>&1 | \
python -c "
import sys, re
for line in sys.stdin:
    m = re.match(r'import time:\s+(\d+)\s+\|\s+(\d+)\s+\|\s+(.*)', line)
    if m:
        self_us, cum_us, name = int(m[1]), int(m[2]), m[3].strip()
        if cum_us > 5000:  # cumulative time above 5 ms
            print(f'{cum_us/1000:8.1f}ms {name}')
" | sort -rn
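The same parsing can live in a Python helper that also keeps self time and nesting depth. This sketch assumes the line format CPython uses, "import time: <self us> | <cumulative us> | <indent><name>", where one space follows the second bar and each nesting level adds two more; the helper name importtime_rows is ours.

```python
import re
import subprocess
import sys

LINE_RE = re.compile(r"import time:\s+(\d+)\s+\|\s+(\d+)\s+\|(\s*)(\S.*)")


def importtime_rows(statement):
    """Run `statement` in a fresh interpreter; return (name, self_us, cum_us, depth) rows."""
    proc = subprocess.run(
        [sys.executable, "-X", "importtime", "-c", statement],
        capture_output=True,
        text=True,
        check=True,
    )
    rows = []
    for line in proc.stderr.splitlines():  # importtime writes to stderr
        m = LINE_RE.match(line)
        if m:  # the header line has no digits, so it never matches
            depth = (len(m[3]) - 1) // 2
            rows.append((m[4].rstrip(), int(m[1]), int(m[2]), depth))
    return rows


rows = importtime_rows("import json")
for name, self_us, cum_us, depth in sorted(rows, key=lambda r: r[2], reverse=True)[:10]:
    print(f"{cum_us / 1000:8.2f}ms cum {self_us / 1000:8.2f}ms self  {'  ' * depth}{name}")
```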
Lazy import architectures
Pattern 1: Module-level lazy imports with __getattr__
Python 3.7+ supports module-level __getattr__, enabling clean lazy imports:
# mypackage/__init__.py
import importlib  # already loaded during startup, so this is cheap
_LAZY_IMPORTS = {
    "heavy_module": "mypackage._heavy",
    "another": "mypackage._another",
}
def __getattr__(name):
    # Called only when normal attribute lookup on the module fails (PEP 562)
    if name in _LAZY_IMPORTS:
        module = importlib.import_module(_LAZY_IMPORTS[name])
        globals()[name] = module  # Cache so later access skips __getattr__
        return module
    raise AttributeError(f"module {__name__!r} has no attribute {name!r}")
def __dir__():
    return sorted(list(globals()) + list(_LAZY_IMPORTS))
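To watch the pattern work end to end, this sketch writes a throwaway package to a temporary directory and checks sys.modules before and after the first attribute access. The package name lazypkg and submodule _heavy are hypothetical stand-ins for mypackage and its heavy submodules.

```python
import importlib
import sys
import tempfile
import textwrap
from pathlib import Path

# Build a throwaway package on disk (hypothetical names)
root = Path(tempfile.mkdtemp())
pkg = root / "lazypkg"
pkg.mkdir()
(pkg / "__init__.py").write_text(textwrap.dedent("""
    import importlib

    _LAZY_IMPORTS = {"heavy": "lazypkg._heavy"}

    def __getattr__(name):
        if name in _LAZY_IMPORTS:
            module = importlib.import_module(_LAZY_IMPORTS[name])
            globals()[name] = module
            return module
        raise AttributeError(f"module {__name__!r} has no attribute {name!r}")
"""))
(pkg / "_heavy.py").write_text("VALUE = 42\n")

sys.path.insert(0, str(root))
importlib.invalidate_caches()
import lazypkg

print("lazypkg._heavy" in sys.modules)  # False: submodule not imported yet
print(lazypkg.heavy.VALUE)              # 42: first access runs __getattr__
print("lazypkg._heavy" in sys.modules)  # True
```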
Pattern 2: importlib.util.LazyLoader
import importlib.util
import sys
def lazy_import(name):
    # Reuse an already-imported module instead of replacing it
    if name in sys.modules:
        return sys.modules[name]
    spec = importlib.util.find_spec(name)
    if spec is None:
        raise ImportError(f"No module named {name!r}")
    loader = importlib.util.LazyLoader(spec.loader)
    spec.loader = loader
    module = importlib.util.module_from_spec(spec)
    sys.modules[name] = module
    loader.exec_module(module)
    return module
# The module object exists immediately, but its code runs on first attribute access
np = lazy_import("numpy")
# numpy's code has NOT run yet
result = np.array([1, 2, 3])  # NOW numpy's code runs
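Because numpy may not be installed, this self-contained check repeats the lazy_import helper and uses a temporary module whose top-level code records that it ran. The module name lazy_demo_mod and the environment variable LAZY_DEMO_EXECUTED are hypothetical markers chosen for the demo.

```python
import importlib
import importlib.util
import os
import sys
import tempfile
from pathlib import Path


def lazy_import(name):
    """LazyLoader recipe: return a module whose code runs on first attribute access."""
    if name in sys.modules:
        return sys.modules[name]
    spec = importlib.util.find_spec(name)
    if spec is None:
        raise ImportError(f"No module named {name!r}")
    loader = importlib.util.LazyLoader(spec.loader)
    spec.loader = loader
    module = importlib.util.module_from_spec(spec)
    sys.modules[name] = module
    loader.exec_module(module)
    return module


# Hypothetical module whose body sets an environment variable when it executes
tmp = Path(tempfile.mkdtemp())
(tmp / "lazy_demo_mod.py").write_text(
    "import os\nos.environ['LAZY_DEMO_EXECUTED'] = '1'\nANSWER = 42\n"
)
sys.path.insert(0, str(tmp))
importlib.invalidate_caches()

mod = lazy_import("lazy_demo_mod")
print(os.environ.get("LAZY_DEMO_EXECUTED"))  # None: module body has not run
print(mod.ANSWER)                            # 42: attribute access runs the body
print(os.environ.get("LAZY_DEMO_EXECUTED"))  # 1
```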
Pattern 3: Deferred import for CLI subcommands
For CLI tools where each subcommand needs different dependencies:
import sys
def main():
    if len(sys.argv) < 2:
        print("Usage: tool <command>")
        return
    command = sys.argv[1]
    if command == "analyze":
        # Only import pandas when the analyze command is used
        from myapp.analyze import run_analysis
        run_analysis(sys.argv[2:])
    elif command == "serve":
        # Only import flask when the serve command is used
        from myapp.server import start_server
        start_server(sys.argv[2:])
    elif command == "version":
        # No heavy imports needed
        print("1.0.0")
    else:
        print(f"Unknown command: {command}")
if __name__ == "__main__":
    main()
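Deferred imports are easy to break by accident, for example when someone adds a top-level import to a shared module. A sketch of a regression check that runs code in a fresh interpreter and reports which heavy modules it loaded; modules_loaded_by is a hypothetical helper, and myapp.cli in the commented line stands in for wherever your CLI lives.

```python
import subprocess
import sys


def modules_loaded_by(code, candidates):
    """Run `code` in a fresh interpreter; return which `candidates` ended up in sys.modules."""
    probe = (
        f"{code}\n"
        "import sys\n"
        f"print(*[m for m in {list(candidates)!r} if m in sys.modules])"
    )
    result = subprocess.run(
        [sys.executable, "-c", probe], capture_output=True, text=True, check=True
    )
    return set(result.stdout.split())


# Hypothetical check: importing the CLI module must not pull in pandas or flask
# assert not modules_loaded_by("import myapp.cli", ["pandas", "flask"])
print(modules_loaded_by("import json", ["json", "decimal"]))  # {'json'}
```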
Frozen modules
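CPython can “freeze” modules by compiling them into the interpreter binary, so importing them reads no .py or .pyc files. Since Python 3.11, the standard library modules needed at startup (such as os, site, and codecs) are frozen by default in installed builds; -X frozen_modules=off disables this, for example when editing the stdlib source: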
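import _imp  # private CPython module; its API may change between versions
import sys
# Check whether a frozen version of a module is available
print(_imp.is_frozen("_frozen_importlib"))  # True: the import system itself is frozen
print(_imp.is_frozen("os"))  # True on typical 3.11+ installs, False on 3.10 and earlier
# List already-imported modules that were loaded from frozen code
frozen_names = [
    name for name, mod in list(sys.modules.items())
    if getattr(getattr(mod, "__spec__", None), "origin", None) == "frozen"
]
print(sorted(frozen_names))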
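For your own code, tools such as cx_Freeze and PyInstaller bundle your bytecode with an interpreter into a standalone application, and Nuitka compiles modules to C. These are freezing in a broader sense than CPython's built-in frozen modules.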
Serverless-specific optimizations
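On AWS Lambda and similar platforms, a cold start adds latency to the first request a new execution environment serves, and depending on the platform and configuration the initialization time may also be billed. Specific strategies: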
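# 1. Import at module level only what every invocation needs
import json  # Always needed, lightweight
import os    # Always needed, lightweight
# 2. Defer heavy imports that only some code paths need
def lambda_handler(event, context):
    if event.get("action") == "report":  # hypothetical event field
        # Loaded the first time this path runs, then reused from sys.modules
        # by later invocations in the same execution environment
        import boto3
        import pandas as pd
        # ... build the report with boto3 and pd ...
    # Process event...
    return {"statusCode": 200}
# 3. Use Lambda layers to package dependencies separately from function code;
#    layers simplify deployment but do not by themselves make imports faster
# 4. Use provisioned concurrency to keep pre-initialized environments ready,
#    which avoids cold starts for traffic within the provisioned capacity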
Measuring Lambda cold starts
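import time
_MODULE_START = time.perf_counter()  # first statement: the module-level work below is timed
# ... heavy module-level imports and client setup go here ...
_INIT_MS = (time.perf_counter() - _MODULE_START) * 1000
_COLD_START = True
def handler(event, context):
    global _COLD_START
    if _COLD_START:
        # Covers module-level init only, not interpreter or runtime startup;
        # Lambda logs the whole init phase as "Init Duration" in the REPORT line
        print(f"Cold start (module init): {_INIT_MS:.0f}ms")
        _COLD_START = False
    start = time.perf_counter()
    # ... handler logic ...
    execution_ms = (time.perf_counter() - start) * 1000
    print(f"Execution: {execution_ms:.0f}ms")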
Compile-time optimizations
Pre-compilation strategies
# Compile all .py files in a directory tree
python -m compileall -b /app/  # -b writes legacy .pyc files next to each .py
# Optimization level 1 (removes asserts); writes *.opt-1.pyc, used only when running with -O
python -O -m compileall /app/
# Optimization level 2 (also removes docstrings); writes *.opt-2.pyc, used only with -OO
python -OO -m compileall /app/
# In a Dockerfile: compile at build time so containers don't compile on first run
FROM python:3.11-slim
COPY . /app
RUN python -m compileall -q /app
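For images whose source never changes after the build, compileall can also write unchecked-hash .pyc files (PEP 552), which the import system does not revalidate against the source. This sketch uses a temporary directory as a stand-in for /app and reads the flags field from the .pyc header to confirm the mode. The catch is that edits to the .py file are ignored until you recompile.

```python
import compileall
import importlib.util
import py_compile
import tempfile
from pathlib import Path

app = Path(tempfile.mkdtemp())  # stand-in for /app
(app / "hello.py").write_text("GREETING = 'hi'\n")

# Unchecked-hash .pyc files are never compared with the source at import time
ok = compileall.compile_dir(
    str(app),
    quiet=1,
    invalidation_mode=py_compile.PycInvalidationMode.UNCHECKED_HASH,
)
pyc = Path(importlib.util.cache_from_source(str(app / "hello.py")))
# PEP 552 header: bytes 4-8 hold flags; bit 0 = hash-based, bit 1 = check source
flags = int.from_bytes(pyc.read_bytes()[4:8], "little")
print(ok, pyc.exists(), flags)  # True True 1
```

The command-line equivalent is python -m compileall --invalidation-mode unchecked-hash /app.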
Source-less deployment
You can deploy only .pyc files without .py source:
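python -m compileall -b /app/  # -b writes legacy .pyc files next to each .py
find /app -name "*.py" -delete  # Remove source files
# This works only because of -b: without the source, Python imports a legacy .pyc
# next to where the .py was, but ignores .pyc files in __pycache__/
This saves a source-file check on each import and removes editable source from the image. The .pyc files run only on the Python minor version that produced them, and bytecode can still be disassembled, so treat this as a deployment convenience rather than code protection.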
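A quick way to confirm the legacy-layout requirement: compile with legacy=True (the library form of -b), delete the source, and import the module. The module name sourceless_demo is hypothetical.

```python
import compileall
import importlib
import sys
import tempfile
from pathlib import Path

app = Path(tempfile.mkdtemp())    # stand-in for /app
src = app / "sourceless_demo.py"  # hypothetical module name
src.write_text("VALUE = 'loaded from bytecode'\n")

compileall.compile_dir(str(app), quiet=1, legacy=True)  # same as compileall -b
src.unlink()                                            # delete the source

sys.path.insert(0, str(app))
importlib.invalidate_caches()
import sourceless_demo

print(sourceless_demo.__file__.endswith(".pyc"))  # True
print(sourceless_demo.VALUE)                      # loaded from bytecode
```

Running the same steps without legacy=True leaves only a __pycache__ copy, and the import then fails with ModuleNotFoundError.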
Reducing sys.path length
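Each import that misses sys.modules walks sys.path in order, paying at least a directory stat for every existing entry before the one that holds the module. Failed imports are the worst case: an optional dependency probed with try/except ImportError checks every entry before giving up. (Non-existent entries are cheaper, because the import system caches a None finder for them after the first miss.)
import os
import sys
# Audit sys.path for entries that don't exist on disk
for i, p in enumerate(sys.path):
    exists = p == "" or os.path.exists(p)  # "" means the current directory
    print(f"  [{i}] {'✓' if exists else '✗'} {p}")
# Remove non-existent entries, keeping "" and zip files (os.path.isdir would drop both)
sys.path = [p for p in sys.path if p == "" or os.path.exists(p)]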
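To gauge the cost on a given machine, this sketch times a lookup for a module that does not exist, first with the normal sys.path and then with 50 real but empty directories prepended. The module name surely_missing_module_xyz is a deliberately nonexistent placeholder, and absolute numbers depend heavily on the filesystem and OS caches.

```python
import importlib.util
import sys
import tempfile
import timeit
from pathlib import Path


def failed_lookup_us(extra_dirs, number=200):
    """Average microseconds to look up a missing module with extra_dirs prepended to sys.path."""
    saved = list(sys.path)
    sys.path[:0] = [str(d) for d in extra_dirs]
    try:
        total = timeit.timeit(
            lambda: importlib.util.find_spec("surely_missing_module_xyz"), number=number
        )
    finally:
        sys.path[:] = saved  # restore the original search path
    return total / number * 1e6


empty_dirs = [Path(tempfile.mkdtemp()) for _ in range(50)]  # real but empty directories
baseline = failed_lookup_us([])
padded = failed_lookup_us(empty_dirs)
print(f"missing-module lookup: {baseline:.1f} us -> {padded:.1f} us with 50 extra entries")
```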
Benchmark comparison
Real-world startup times for common configurations:
| Configuration | Time | Notes |
|---|---|---|
| python -S -c "pass" | ~8ms | Minimal interpreter |
| python -c "pass" | ~20ms | With site module |
| python -c "import json" | ~22ms | Lightweight stdlib |
| python -c "import flask" | ~120ms | Medium framework |
| python -c "import django" | ~250ms | Heavy framework |
| python -c "import pandas" | ~350ms | Data library |
| python -c "import torch" | ~800ms | ML framework |
These numbers are from CPython 3.11 on an AMD Ryzen 7 with NVMe SSD. Python 3.12 and 3.13 show 5–15% improvements through additional frozen modules and import system optimizations.
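To produce comparable numbers on your own hardware, a small harness can time each configuration and skip third-party packages that aren't installed. This is a sketch: it measures wall-clock time including process spawn, does one warm-up run per configuration so the OS file cache and __pycache__ are populated, and reports the median of a few runs.

```python
import importlib.util
import statistics
import subprocess
import sys
import time

# (label, interpreter arguments, module that must be installed or None)
CONFIGS = [
    ('python -S -c "pass"', ["-S", "-c", "pass"], None),
    ('python -c "pass"', ["-c", "pass"], None),
    ('python -c "import json"', ["-c", "import json"], "json"),
    ('python -c "import flask"', ["-c", "import flask"], "flask"),
    ('python -c "import pandas"', ["-c", "import pandas"], "pandas"),
]


def median_ms(args, runs=5):
    """Median wall-clock ms to run the current interpreter with args."""
    subprocess.run([sys.executable, *args], check=True)  # warm-up run
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        subprocess.run([sys.executable, *args], check=True)
        samples.append((time.perf_counter() - start) * 1000)
    return statistics.median(samples)


results = {}
for label, args, module in CONFIGS:
    if module and importlib.util.find_spec(module) is None:
        continue  # skip packages that aren't installed here
    results[label] = median_ms(args)
    print(f"| {label} | {results[label]:.0f}ms |")
```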
The zipapp approach
For distributing CLI tools with fast startup, zipapp bundles everything into a single file:
# Create a zip application (myapp/ needs a __main__.py, or pass -m "pkg.module:function")
python -m zipapp myapp/ -p "/usr/bin/env python3" -o myapp.pyz
# The .pyz file is a self-contained executable
./myapp.pyz
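Zip imports can be faster than directory imports because the archive's index is read once, replacing many per-directory filesystem lookups. Two caveats apply. zipimport never writes bytecode caches, so unless the archive already contains .pyc files (for example, run python -m compileall -b myapp/ before packing), every module is recompiled from source on each run. And C extensions cannot be loaded from zip files.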
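The same packaging can be done from Python with zipapp.create_archive, precompiling to legacy .pyc files first so zipimport can load bytecode instead of recompiling. This sketch builds a hypothetical two-file app in a temporary directory, packs it, and runs the archive with the current interpreter.

```python
import compileall
import subprocess
import sys
import tempfile
import zipapp
from pathlib import Path

work = Path(tempfile.mkdtemp())
src = work / "myapp"  # hypothetical app directory
src.mkdir()
(src / "__main__.py").write_text("import greet\nprint(greet.message())\n")
(src / "greet.py").write_text("def message():\n    return 'hello from a zipapp'\n")

# Legacy .pyc files sit next to the .py files, which is where zipimport looks
compileall.compile_dir(str(src), quiet=1, legacy=True)

target = work / "myapp.pyz"
zipapp.create_archive(src, target, interpreter="/usr/bin/env python3")
out = subprocess.run([sys.executable, str(target)], capture_output=True, text=True, check=True)
print(out.stdout.strip())  # hello from a zipapp
```

As with sourceless deployment, the bundled .pyc files only help on the Python minor version that compiled them; on any other version zipimport falls back to the .py source.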
The one thing to remember: Systematic startup optimization follows a clear priority: profile with -X importtime, defer heavy imports with lazy loading patterns, reduce the dependency chain, pre-compile .pyc files for deployment, and for the most latency-sensitive cases consider frozen modules or zipapp packaging — always measure before and after each change.
See Also
- Python Ast Module Code Analysis: How Python's ast module reads your code like a grammar teacher diagrams sentences, turning source text into a tree you can inspect and change.
- Python Dis Module Bytecode: How Python's dis module lets you peek at the secret instructions your computer actually runs when it executes your Python code.
- Python Gc Module Internals: How Python's garbage collector automatically cleans up memory you are no longer using, like a tidy roommate for your program.
- Python Importlib Custom Loaders: How Python's importlib lets you teach Python to load code from anywhere: databases, zip files, the internet, or even generated on the fly.
- Python Site Customization: How Python's site module sets up your environment before your code even starts running, the invisible first step of every Python program.