Python Regression Testing — Deep Dive
Regression test architecture in Python
A well-structured regression suite in Python uses pytest as the foundation and organizes tests to balance thoroughness with execution speed.
# tests/regression/test_billing_regressions.py
"""
Regression tests for billing module.
Each test documents a specific bug that was fixed.
"""
import pytest

from billing.calculator import calculate_total


class TestBillingRegressions:
    """Bug fixes in the billing calculation engine."""

    def test_zero_quantity_items_excluded(self):
        """
        Regression: BUG-1234
        Zero-quantity items were included in total calculation,
        causing invoices to show phantom line items.
        Fixed: 2026-01-15
        """
        items = [
            {"name": "Widget", "price": 10.00, "quantity": 5},
            {"name": "Gadget", "price": 20.00, "quantity": 0},
        ]
        total = calculate_total(items)
        assert total == 50.00
        # Verify zero-quantity item isn't in line items
        result = calculate_total(items, include_breakdown=True)
        assert len(result["line_items"]) == 1

    def test_negative_discount_rejected(self):
        """
        Regression: BUG-1301
        Negative discounts effectively increased the price,
        allowing manipulation via API.
        Fixed: 2026-02-03
        """
        with pytest.raises(ValueError, match="Discount must be non-negative"):
            calculate_total(
                [{"name": "Widget", "price": 10.00, "quantity": 1}],
                discount=-5.00,
            )
Documenting the original bug ID, what went wrong, and when it was fixed makes each test a living record. When someone asks “why does this test exist?”, the answer is right there.
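For readers who want to run the tests above, here is a minimal sketch of a function that would satisfy them. It is a hypothetical stand-in, not the real billing.calculator.calculate_total: the flat-amount discount, the clamp at zero, and the shape of the breakdown dictionary are all assumptions made for illustration.

```python
def calculate_total(items, discount=0.0, include_breakdown=False):
    """Hypothetical stand-in for billing.calculator.calculate_total."""
    # BUG-1301 fix: reject negative discounts up front.
    if discount < 0:
        raise ValueError("Discount must be non-negative")

    # BUG-1234 fix: drop zero-quantity items before building line items.
    line_items = [item for item in items if item["quantity"] > 0]

    subtotal = sum(item["price"] * item["quantity"] for item in line_items)
    # Assumption: discount is a flat amount and the total never goes below zero.
    total = max(subtotal - discount, 0.0)

    if include_breakdown:
        return {"total": total, "line_items": line_items}
    return total
```

Production billing code would normally use decimal.Decimal rather than floats for money; floats are kept here only to match the test data.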
Test selection: running only what matters
Full regression suites grow large. Running every test on every commit wastes time when a change only affects one module. Pytest plugins enable intelligent test selection:
# pytest-testmon: only run tests affected by changed code
pip install pytest-testmon
pytest --testmon
# pytest-changed: run tests for changed files only
pip install pytest-changed
pytest --changed-only
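pytest-testmon builds on coverage.py to record which code each test executes and stores that mapping in a local .testmondata file. When you change billing/calculator.py, it reruns only the tests that previously executed the changed code. For a narrowly scoped change, this can cut a long suite (say, 20 minutes) to well under a minute, though the savings depend on how widely the changed code is used.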
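The selection idea can be sketched in a few lines. This is a simplified, file-level illustration and not how pytest-testmon is implemented; the function name select_affected_tests and the sample coverage_map are made up for the example.

```python
def select_affected_tests(coverage_map, changed_files):
    """Return test IDs whose recorded coverage touches any changed file.

    coverage_map: dict mapping test ID -> set of source files it executed.
    changed_files: iterable of source file paths modified in this change.
    """
    changed = set(changed_files)
    return sorted(
        test_id
        for test_id, files in coverage_map.items()
        if changed & set(files)
    )


# Hypothetical coverage data collected on a previous full run.
coverage_map = {
    "tests/test_billing.py::test_total": {"billing/calculator.py"},
    "tests/test_auth.py::test_login": {"auth/session.py"},
    "tests/test_invoice.py::test_render": {
        "billing/calculator.py",
        "billing/render.py",
    },
}

selected = select_affected_tests(coverage_map, ["billing/calculator.py"])
```

Here only the two tests that executed billing/calculator.py are selected; the auth test is skipped.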
For CI pipelines, a common pattern pairs targeted selection on pull requests with a full run on main:
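# .github/workflows/test.yml
on: [push, pull_request]
jobs:
  quick-check:
    runs-on: ubuntu-latest
    if: github.event_name == 'pull_request'
    steps:
      - uses: actions/checkout@v4
        with:
          fetch-depth: 0
      # Python setup and dependency installation omitted for brevity.
      # testmon only skips tests if its .testmondata file is restored
      # from an earlier run (for example with a cache step).
      - name: Run affected tests
        run: pytest --testmon --tb=short
  full-regression:
    runs-on: ubuntu-latest
    if: github.event_name == 'push' && github.ref == 'refs/heads/main'
    steps:
      - uses: actions/checkout@v4
      - name: Full test suite
        run: pytest --tb=short -x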
Pull requests get fast, targeted testing. Merges to main trigger the full regression suite.
Managing flaky tests
Flaky tests — tests that sometimes pass and sometimes fail without code changes — are the biggest threat to regression testing culture. When developers see random failures, they learn to ignore all failures.
Identify flaky tests systematically:
# conftest.py - Track test failures as flaky candidates
import json
from pathlib import Path

import pytest

FLAKY_LOG = Path("tests/.flaky-log.json")


@pytest.hookimpl(hookwrapper=True)
def pytest_runtest_makereport(item, call):
    outcome = yield
    report = outcome.get_result()
    if report.when == "call" and report.failed:
        # Record every failure; tests that fail intermittently across runs
        # of unchanged code are the flaky candidates.
        log = json.loads(FLAKY_LOG.read_text()) if FLAKY_LOG.exists() else {}
        key = item.nodeid
        log.setdefault(key, {"failures": 0, "last_error": ""})
        log[key]["failures"] += 1
        # Keep a short excerpt of the most recent failure message.
        log[key]["last_error"] = str(report.longrepr)[:200]
        FLAKY_LOG.write_text(json.dumps(log, indent=2))
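A failure count alone cannot tell a flaky test from one that is consistently broken. One way to separate them, sketched below, is to look for tests with both passing and failing outcomes on the same commit. The function name find_flaky and the (commit, test ID, passed) record format are assumptions for illustration, not part of pytest.

```python
from collections import defaultdict


def find_flaky(runs):
    """Return test IDs that both passed and failed on the same commit.

    runs: iterable of (commit_sha, test_id, passed) tuples, one per execution.
    """
    outcomes = defaultdict(set)
    for commit, test_id, passed in runs:
        outcomes[(commit, test_id)].add(bool(passed))
    # Two distinct outcomes for one (commit, test) pair means the result
    # changed without any code change.
    return sorted(
        {test_id for (_, test_id), seen in outcomes.items() if len(seen) == 2}
    )


# Hypothetical run history gathered from CI.
history = [
    ("abc123", "tests/test_db.py::test_concurrent_writes", True),
    ("abc123", "tests/test_db.py::test_concurrent_writes", False),
    ("abc123", "tests/test_export.py::test_csv", False),
    ("abc123", "tests/test_export.py::test_csv", False),
    ("def456", "tests/test_auth.py::test_login", True),
]

flaky = find_flaky(history)
```

In this sample, test_csv fails every time on the same commit, so it is treated as broken rather than flaky.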
Once identified, deal with flaky tests decisively:
- Fix the root cause — often timing dependencies, shared state, or test ordering
- Quarantine — move to a separate directory that runs independently, not blocking CI
- Delete — if the test can’t be made reliable and doesn’t cover critical functionality
# Rerun marker provided by the pytest-rerunfailures plugin
# Reason: database connection timing
@pytest.mark.flaky(reruns=3)
def test_concurrent_writes():
    """Quarantined: intermittent failure under load."""
    ...
Test data management for regression suites
Regression tests need reproducible data. Three patterns work well:
Factories: Generate test data programmatically using libraries like factory_boy:
import factory

from models import User, Order


class UserFactory(factory.Factory):
    class Meta:
        model = User

    name = factory.Faker("name")
    email = factory.Faker("email")


class OrderFactory(factory.Factory):
    class Meta:
        model = Order

    user = factory.SubFactory(UserFactory)
    total = factory.Faker("pydecimal", left_digits=3, right_digits=2, positive=True)
Fixtures with snapshots: Capture real production scenarios (anonymized) as test fixtures:
import json
from pathlib import Path

import pytest


@pytest.fixture
def complex_order_from_prod():
    """Anonymized reproduction of BUG-1567 order structure."""
    return json.loads(Path("tests/fixtures/bug-1567-order.json").read_text())
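Before a production record is saved as a fixture, personal fields have to be masked. The sketch below shows one possible approach: it walks a JSON-like structure and replaces sensitive values with a short hash. The SENSITIVE_KEYS set and the helper names are hypothetical, and plain hashing is pseudonymization at best, so check it against your own privacy requirements.

```python
import hashlib

# Hypothetical list of field names to mask; adjust to your schema.
SENSITIVE_KEYS = {"name", "email", "phone"}


def _mask(value):
    """Replace a value with a stable, non-reversible placeholder."""
    digest = hashlib.sha256(str(value).encode("utf-8")).hexdigest()[:12]
    return f"anon-{digest}"


def anonymize(record):
    """Return a copy of a JSON-like structure with sensitive fields masked."""
    if isinstance(record, dict):
        return {
            key: _mask(value) if key in SENSITIVE_KEYS else anonymize(value)
            for key, value in record.items()
        }
    if isinstance(record, list):
        return [anonymize(value) for value in record]
    return record


order = {
    "id": 1567,
    "customer": {"name": "Jane Doe", "email": "jane@example.com"},
    "lines": [{"sku": "W-1", "quantity": 2}],
}
clean = anonymize(order)
```

Because the mask is deterministic, the same customer maps to the same placeholder across records, which keeps relationships in the data intact.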
Database seeding: For integration tests, maintain a seed script that creates a known state:
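from pathlib import Path

import pytest
from sqlalchemy import text


@pytest.fixture(scope="session")
def seeded_db(db_engine):
    """Create known database state for regression suite."""
    # db_engine is assumed to be a SQLAlchemy Engine fixture defined elsewhere.
    seed_data = Path("tests/seeds/regression_baseline.sql").read_text()
    with db_engine.begin() as conn:
        conn.execute(text(seed_data))
    yield db_engine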
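The same seeding idea can be tried without any external database. The sketch below uses the standard library's sqlite3 module and an in-memory database; the orders table and its rows are invented for the example and stand in for a real baseline script.

```python
import sqlite3

# Hypothetical baseline: a real suite would load this from a .sql file.
SEED_SQL = """
CREATE TABLE orders (
    id INTEGER PRIMARY KEY,
    customer TEXT NOT NULL,
    total_cents INTEGER NOT NULL
);
INSERT INTO orders (customer, total_cents) VALUES ('alice', 5000);
INSERT INTO orders (customer, total_cents) VALUES ('bob', 1250);
"""


def make_seeded_connection():
    """Return an in-memory SQLite connection in a known baseline state."""
    conn = sqlite3.connect(":memory:")
    # executescript runs multiple statements in one call.
    conn.executescript(SEED_SQL)
    return conn


conn = make_seeded_connection()
row_count = conn.execute("SELECT COUNT(*) FROM orders").fetchone()[0]
total_cents = conn.execute("SELECT SUM(total_cents) FROM orders").fetchone()[0]
```

Every call to make_seeded_connection starts from the same rows, so a test that depends on the baseline sees identical data on each run.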
Measuring regression suite health
Track these metrics in your CI dashboard:
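# scripts/regression_metrics.py
"""Generate regression suite health metrics."""


def analyze_suite():
    # parse_junit_xml and get_coverage_percent are project-specific helpers
    # assumed to exist; each result exposes .passed, .duration and .retried.
    results = parse_junit_xml("test-results.xml")
    if not results:
        # Avoid dividing by zero when no tests were collected.
        return {"total_tests": 0}
    return {
        "total_tests": len(results),
        "pass_rate": sum(1 for r in results if r.passed) / len(results),
        "avg_duration_sec": sum(r.duration for r in results) / len(results),
        # Slowest test first
        "slowest_10": sorted(results, key=lambda r: r.duration, reverse=True)[:10],
        "flaky_candidates": [r for r in results if r.retried],
        "coverage_pct": get_coverage_percent(),
    }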
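The JUnit parsing step could look something like the following sketch, which uses only the standard library. CaseResult and parse_junit_xml_string are illustrative names, and the sample XML is hand-written rather than real pytest output. Counting skipped tests as not passed is a simplifying assumption.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass


@dataclass
class CaseResult:
    name: str
    duration: float
    passed: bool


def parse_junit_xml_string(xml_text):
    """Parse JUnit-style XML text into a list of CaseResult objects."""
    root = ET.fromstring(xml_text)
    results = []
    # iter() finds <testcase> elements at any depth, so this works whether
    # the root element is <testsuites> or <testsuite>.
    for case in root.iter("testcase"):
        failed = case.find("failure") is not None or case.find("error") is not None
        skipped = case.find("skipped") is not None
        results.append(
            CaseResult(
                name=f'{case.get("classname", "")}::{case.get("name", "")}',
                duration=float(case.get("time", 0.0)),
                passed=not failed and not skipped,
            )
        )
    return results


SAMPLE_XML = """<testsuite name="pytest" tests="3">
  <testcase classname="tests.test_billing" name="test_total" time="0.012"/>
  <testcase classname="tests.test_billing" name="test_discount" time="1.500">
    <failure message="assert 1 == 2">traceback</failure>
  </testcase>
  <testcase classname="tests.test_auth" name="test_login" time="0.300"/>
</testsuite>"""

results = parse_junit_xml_string(SAMPLE_XML)
pass_rate = sum(1 for r in results if r.passed) / len(results)
slowest = max(results, key=lambda r: r.duration)
```

A file-based variant would call ET.parse(path).getroot() instead of ET.fromstring.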
Key thresholds to monitor:
- Suite duration: Alert if total runtime grows >20% month-over-month
- Flaky rate: More than 2% flaky tests signals a maintenance problem
- Test-to-code ratio: Below 1:1 (lines of test code to lines of source code) can indicate thin coverage; treat it as a rough signal and confirm with coverage data
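The first two thresholds are simple enough to check automatically. The sketch below is one way to do it; the function name check_thresholds and its parameters are assumptions, and the limits are the 20% and 2% figures from the list above.

```python
def check_thresholds(current_runtime_sec, previous_runtime_sec, flaky_count, total_tests):
    """Return a list of alert messages for breached suite-health thresholds."""
    alerts = []

    # Suite duration: alert when runtime grows by more than 20% month-over-month.
    if previous_runtime_sec > 0:
        growth = (current_runtime_sec - previous_runtime_sec) / previous_runtime_sec
        if growth > 0.20:
            alerts.append(f"runtime grew {growth:.0%} month-over-month")

    # Flaky rate: alert when more than 2% of tests are flaky.
    if total_tests > 0:
        flaky_rate = flaky_count / total_tests
        if flaky_rate > 0.02:
            alerts.append(f"flaky rate is {flaky_rate:.1%}")

    return alerts


# Hypothetical numbers: 1200s last month, 1500s now, 3 flaky tests out of 100.
alerts = check_thresholds(1500, 1200, flaky_count=3, total_tests=100)
```

A CI job could fail or post a notification whenever the returned list is non-empty.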
Historical regression analysis
Maintain a regression catalog that connects tests to incidents. Over time, this reveals patterns:
## Regression Catalog (excerpt)
| Bug ID | Module | Root Cause | Test File | Date Fixed |
|----------|-------------|---------------------|------------------------------|------------|
| BUG-1234 | billing | Off-by-one in loop | test_billing_regressions.py | 2026-01-15 |
| BUG-1301 | billing | Input validation | test_billing_regressions.py | 2026-02-03 |
| BUG-1445 | auth | Race condition | test_auth_regressions.py | 2026-02-20 |
| BUG-1502 | export | Encoding mismatch | test_export_regressions.py | 2026-03-01 |
Patterns emerge: if billing has the most regressions, it needs more refactoring attention. If race conditions recur, the team needs concurrency training. The regression catalog becomes a diagnostic tool for codebase health.
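Once the catalog is machine-readable, spotting these patterns takes a couple of counters. The sketch below hard-codes the four catalog rows shown above as dictionaries; in practice they would be parsed from wherever the catalog is stored, and the key names are assumptions.

```python
from collections import Counter

# The catalog excerpt, transcribed by hand for illustration.
catalog = [
    {"bug_id": "BUG-1234", "module": "billing", "root_cause": "Off-by-one in loop"},
    {"bug_id": "BUG-1301", "module": "billing", "root_cause": "Input validation"},
    {"bug_id": "BUG-1445", "module": "auth", "root_cause": "Race condition"},
    {"bug_id": "BUG-1502", "module": "export", "root_cause": "Encoding mismatch"},
]

# Count regressions per module and per root cause.
by_module = Counter(entry["module"] for entry in catalog)
by_root_cause = Counter(entry["root_cause"] for entry in catalog)

# The module with the most recorded regressions.
hotspot, hotspot_count = by_module.most_common(1)[0]
```

With only four rows the result is not statistically meaningful; the counts become useful once the catalog covers months of incidents.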
One thing to remember: Regression testing is a long game. Each test you add after a bug fix is an investment that pays off on every future change, so the suite becomes more valuable as the codebase ages.