Database Migrations with Alembic — Deep Dive
Setting up Alembic properly
Project initialization
pip install alembic
alembic init migrations
This creates a migrations/ directory with env.py, script.py.mako (template), and a versions/ folder for migration files.
Configuring env.py for autogenerate
The default env.py needs modification to point at your models:
# migrations/env.py
from myapp.models import Base # your SQLAlchemy declarative base
target_metadata = Base.metadata
def run_migrations_online():
connectable = engine_from_config(
config.get_section(config.config_ini_section),
prefix="sqlalchemy.",
)
with connectable.connect() as connection:
context.configure(
connection=connection,
target_metadata=target_metadata,
compare_type=True, # detect column type changes
compare_server_default=True, # detect default value changes
include_schemas=True, # handle multiple schemas
)
with context.begin_transaction():
context.run_migrations()
compare_type=True is critical — without it, Alembic ignores changes like String(50) → String(100).
Customizing autogenerate behavior
Autogenerate can be noisy or miss things. Control what it sees:
def include_object(object, name, type_, reflected, compare_to):
# Skip tables managed by other systems
if type_ == "table" and name.startswith("legacy_"):
return False
# Skip all views
if type_ == "table" and object.info.get("is_view", False):
return False
return True
context.configure(
connection=connection,
target_metadata=target_metadata,
include_object=include_object,
render_as_batch=True, # required for SQLite ALTER TABLE support
)
Writing migrations
Auto-generated migration
alembic revision --autogenerate -m "add phone_number to users"
This produces:
"""add phone_number to users
Revision ID: a3f8b7c2d1e4
Revises: b5d9e1f3a2c6
Create Date: 2026-03-28 10:15:32.456789
"""
from alembic import op
import sqlalchemy as sa
revision = 'a3f8b7c2d1e4'
down_revision = 'b5d9e1f3a2c6'
def upgrade():
op.add_column('users', sa.Column('phone_number', sa.String(20), nullable=True))
def downgrade():
op.drop_column('users', 'phone_number')
Always review autogenerated migrations. Common issues to watch for:
- Renamed columns appear as drop + add (data loss)
- Enum type changes may need manual handling
- Index names might conflict across databases
Data migrations
Schema changes often require data transformations. Alembic handles this through the op.execute() method or by using SQLAlchemy Core:
from sqlalchemy import table, column, String
def upgrade():
# Add new column
op.add_column('users', sa.Column('full_name', sa.String(200)))
# Backfill data using raw SQL for performance
op.execute(
"UPDATE users SET full_name = first_name || ' ' || last_name"
)
# Make non-nullable after backfill
op.alter_column('users', 'full_name', nullable=False)
def downgrade():
op.drop_column('users', 'full_name')
Never import your application models in migration files. Models change over time, but migration files are frozen in time. Use sa.table() and sa.column() for data migrations:
# Wrong — model may change after this migration was written
from myapp.models import User
# Correct — self-contained schema snapshot
users = table('users',
column('id', sa.Integer),
column('email', sa.String),
column('email_domain', sa.String),
)
def upgrade():
op.add_column('users', sa.Column('email_domain', sa.String(100)))
connection = op.get_bind()
connection.execute(
users.update().values(
email_domain=sa.func.split_part(users.c.email, '@', 2)
)
)
Zero-downtime migrations
Production databases can’t go offline for migrations. The key constraint: old application code continues running while migrations execute, and new code must work with both old and new schema states.
The expand-contract pattern
Instead of modifying columns in place, use a multi-step approach:
Step 1 — Expand (add new structure):
def upgrade():
op.add_column('orders', sa.Column('status_v2', sa.String(20)))
Step 2 — Migrate data (backfill):
def upgrade():
op.execute("""
UPDATE orders SET status_v2 = CASE
WHEN status = 0 THEN 'pending'
WHEN status = 1 THEN 'shipped'
WHEN status = 2 THEN 'delivered'
END
""")
Step 3 — Deploy code that reads from both, writes to both
Step 4 — Contract (remove old structure):
def upgrade():
op.drop_column('orders', 'status')
op.alter_column('orders', 'status_v2', new_column_name='status')
Each step is a separate migration and a separate deployment. This pattern takes longer but avoids downtime and data loss.
Dangerous operations to avoid in production
| Operation | Risk | Safe Alternative |
|---|---|---|
DROP COLUMN | Running code may reference it | Deploy code removal first, then drop |
ALTER COLUMN NOT NULL | Existing NULLs cause failure | Backfill first, then add constraint |
CREATE INDEX | Locks table for duration | Use CREATE INDEX CONCURRENTLY |
RENAME COLUMN | Old code references old name | Expand-contract pattern |
For concurrent index creation in Alembic:
from alembic import op
def upgrade():
op.execute(
"CREATE INDEX CONCURRENTLY idx_users_email ON users (email)"
)
# Must disable transaction for CONCURRENTLY
def downgrade():
op.drop_index("idx_users_email", table_name="users")
And in env.py, detect migrations that need non-transactional execution or use autocommit:
# In the migration file itself
from alembic import context
# Flag this migration as non-transactional
non_transactional = True
Branch management
Handling migration conflicts
When two developers branch from the same revision:
$ alembic heads
a3f8b7c2d1e4 (head)
c7d2e5f8a1b3 (head)
$ alembic merge -m "merge user and order migrations" a3f8b7c2d1e4 c7d2e5f8a1b3
This creates a merge point. The merge migration is usually empty — it just records that both branches have been applied.
Prevention strategy
Keep migration conflicts rare:
- Generate migrations just before merging to main
- In CI, check for multiple heads:
alembic heads | wc -lshould return 1 - Rebase migrations during code review
Testing migrations
Test both directions
import pytest
from alembic.command import upgrade, downgrade
from alembic.config import Config
@pytest.fixture
def alembic_config():
config = Config("alembic.ini")
return config
def test_migrations_up_and_down(alembic_config):
# Apply all migrations
upgrade(alembic_config, "head")
# Roll back all migrations
downgrade(alembic_config, "base")
# Apply again to verify clean cycle
upgrade(alembic_config, "head")
CI integration
Add to your CI pipeline:
# .github/workflows/test.yml
- name: Check for multiple heads
run: |
heads=$(alembic heads | wc -l)
if [ "$heads" -gt 1 ]; then
echo "Multiple migration heads detected — merge required"
exit 1
fi
- name: Test migrations
run: |
alembic upgrade head
alembic downgrade base
alembic upgrade head
Alembic with async SQLAlchemy
For applications using asyncio and AsyncEngine:
# migrations/env.py
from sqlalchemy.ext.asyncio import create_async_engine
import asyncio
def run_async_migrations():
connectable = create_async_engine(
config.get_main_option("sqlalchemy.url"),
)
async def do_migrations():
async with connectable.connect() as connection:
await connection.run_sync(do_run_migrations)
await connectable.dispose()
asyncio.run(do_migrations())
def do_run_migrations(connection):
context.configure(connection=connection, target_metadata=target_metadata)
with context.begin_transaction():
context.run_migrations()
Production workflow summary
- Develop schema changes in models
- Run
alembic revision --autogenerate -m "description" - Review and edit the generated migration
- Test locally:
upgrade head,downgrade -1,upgrade head - Code review the migration file alongside model changes
- CI runs migration tests against a real database
- Deploy: migration runs as a pre-deploy step
- Monitor: check
alembic_versiontable matches expected revision
One thing to remember: Migration files are immutable historical records. Once a migration has been applied to any shared environment, never modify it — create a new migration to fix issues. Editing applied migrations causes divergent database states that are extremely difficult to reconcile.
See Also
- Python Aioredis Understand Aioredis through a practical analogy so your Python decisions become faster and clearer.
- Python Alembic Understand Alembic through a practical analogy so your Python decisions become faster and clearer.
- Python Asyncpg Database asyncpg is the fastest way for Python to talk to PostgreSQL without making your program sit around waiting.
- Python Asyncpg Understand Asyncpg through a practical analogy so your Python decisions become faster and clearer.
- Python Cassandra Python Understand Cassandra Python through a practical analogy so your Python decisions become faster and clearer.