Test Coverage Strategies — Deep Dive
How coverage.py instruments code
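Under the hood, coverage.py registers a trace function through Python’s sys.settrace hook to intercept every line execution. The trace function is implemented in C by default, with a slower pure-Python tracer as a fallback, and recent versions can use sys.monitoring on Python 3.12 and later. When branch coverage is enabled, it also tracks which arcs (source line → destination line) were taken. This arc data is what differentiates branch coverage from simple line counting.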
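To make the line and arc idea concrete, here is a minimal sketch of a tracer built on sys.settrace. It is not how coverage.py is implemented: the helper trace_arcs and the sample function sign are hypothetical, line numbers are reported as offsets from the def line, and real arc data also includes function entry and exit arcs.

```python
import sys


def trace_arcs(func, *args, **kwargs):
    """Run func once; return (executed_lines, arcs) as offsets from its def line."""
    code = func.__code__
    first_line = code.co_firstlineno
    executed = []  # line offsets in execution order

    def local_tracer(frame, event, arg):
        # "line" fires each time execution reaches a new source line in this frame.
        if event == "line":
            executed.append(frame.f_lineno - first_line)
        return local_tracer

    def global_tracer(frame, event, arg):
        # Only trace frames that run the target function's code object.
        if event == "call" and frame.f_code is code:
            return local_tracer
        return None

    previous = sys.gettrace()
    sys.settrace(global_tracer)
    try:
        func(*args, **kwargs)
    finally:
        sys.settrace(previous)  # always restore whatever tracer was active

    # An arc is a pair of consecutively executed lines.
    arcs = set(zip(executed, executed[1:]))
    return set(executed), arcs


def sign(x):          # offset 0
    if x < 0:         # offset 1
        return "neg"  # offset 2
    return "non-neg"  # offset 3


lines, arcs = trace_arcs(sign, 5)
print(sorted(lines), sorted(arcs))  # [1, 3] [(1, 3)]
```

Calling trace_arcs(sign, -1) records the other arc, (1, 2). A line-only report cannot distinguish "line 1 ran" from "both exits of line 1 ran"; the arc set can.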
The instrumentation adds roughly 20–40% overhead to test execution time with the C tracer, and considerably more with the pure-Python tracer. For large test suites, this overhead matters and affects CI pipeline design.
Configuration for real projects
A production-grade pyproject.toml configuration:
[tool.coverage.run]
source = ["src/mypackage"]
branch = true
parallel = true
concurrency = ["thread", "multiprocessing"]

[tool.coverage.report]
fail_under = 85
show_missing = true
exclude_lines = [
    "pragma: no cover",
    "if TYPE_CHECKING:",
    "if __name__ == .__main__.",
    "raise NotImplementedError",
    "@overload",
]

[tool.coverage.html]
directory = "htmlcov"
Key decisions here:
- parallel = true: essential when tests run across multiple processes (pytest-xdist). Each process writes its own data file named .coverage.<hostname>.<pid>.<random>; after the run, coverage combine merges them.
- concurrency: tells coverage.py to track execution across threads and multiprocessing workers.
- exclude_lines: removes lines that are structurally untestable (type-checking blocks, overloads, abstract method stubs).
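Conceptually, combining parallel data files is a per-file union of the lines each process executed. The sketch below models only that idea with plain dictionaries. The function combine_line_data and the worker data are hypothetical, and the real coverage combine command also merges arcs and contexts and remaps paths.

```python
from collections import defaultdict


def combine_line_data(worker_results):
    """Merge per-worker {filename: set_of_executed_lines} mappings by set union."""
    combined = defaultdict(set)
    for result in worker_results:
        for filename, lines in result.items():
            combined[filename] |= set(lines)
    return dict(combined)


# Hypothetical data from two pytest-xdist workers.
worker_a = {"src/mypackage/core.py": {1, 2, 3}}
worker_b = {"src/mypackage/core.py": {3, 7}, "src/mypackage/util.py": {1}}

merged = combine_line_data([worker_a, worker_b])
print(merged)
```

Reporting from a single worker's file would show core.py line 7 as missed even though another worker ran it, which is why the combine step has to happen before any report or threshold check.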
The coverage ratchet in CI
Implementing a ratchet requires storing the current threshold and failing the build if coverage drops:
# scripts/check_coverage_ratchet.py
"""Fail the build if coverage drops below the stored threshold; raise it on improvement."""
import json
import sys
from pathlib import Path

RATCHET_FILE = Path(".coverage-ratchet.json")


def main():
    # Current coverage percentage, passed as the first CLI argument.
    current = float(sys.argv[1])

    # On the first run there is no ratchet file, so any coverage passes.
    if RATCHET_FILE.exists():
        data = json.loads(RATCHET_FILE.read_text())
        threshold = data["threshold"]
    else:
        threshold = 0.0

    if current < threshold:
        print(f"Coverage dropped: {current:.1f}% < {threshold:.1f}%")
        sys.exit(1)

    if current > threshold:
        # Persist the new high-water mark; the file must be committed to stick.
        RATCHET_FILE.write_text(json.dumps({"threshold": current}))
        print(f"Ratchet updated: {threshold:.1f}% → {current:.1f}%")
    else:
        print(f"Coverage held at {current:.1f}%")


if __name__ == "__main__":
    main()
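One possible refinement is to separate the decision from the file I/O so it can be unit tested, and to add a small tolerance so that rounding noise in the reported percentage neither fails the build nor bumps the threshold. This is a sketch under those assumptions: ratchet_decision and its tolerance parameter are hypothetical and not part of the script above.

```python
def ratchet_decision(current: float, threshold: float, tolerance: float = 0.05) -> str:
    """Return "fail", "raise", or "hold" for a measured coverage percentage.

    tolerance is in percentage points and is a hypothetical knob for this sketch.
    """
    if current < threshold - tolerance:
        return "fail"   # real regression: break the build
    if current > threshold + tolerance:
        return "raise"  # real improvement: store the new threshold
    return "hold"       # within noise: leave the threshold alone


print(ratchet_decision(84.0, 85.0))   # fail
print(ratchet_decision(85.02, 85.0))  # hold
print(ratchet_decision(86.0, 85.0))   # raise
```

The trade-off is that a tolerance allows coverage to drift down by up to that amount per change, so it should stay well below the size of a meaningful regression.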
In a GitHub Actions workflow:
- name: Run tests with coverage
  run: pytest --cov=src/mypackage --cov-branch --cov-report=json
- name: Check ratchet
  run: |
    TOTAL=$(python -c "import json; print(json.load(open('coverage.json'))['totals']['percent_covered'])")
    python scripts/check_coverage_ratchet.py "$TOTAL"
Diff coverage for incremental enforcement
diff-cover compares a coverage report against a git diff to find uncovered new code. It reads XML coverage reports, so export one with coverage xml first:
pip install diff-cover
coverage xml
diff-cover coverage.xml --compare-branch=origin/main --fail-under=90
This approach is powerful for legacy codebases. You don’t have to retroactively cover 50,000 lines of old code — you just ensure every new commit meets the standard. Over months, overall coverage climbs naturally.
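The calculation behind diff coverage is simple enough to sketch: take the lines a change touched, keep only the executable ones, and ask what fraction the tests ran. The function diff_coverage and the line numbers below are hypothetical illustrations of that idea, not diff-cover's implementation.

```python
def diff_coverage(changed_lines, executed_lines, missing_lines):
    """Return (percent, uncovered) for the executable lines touched by a diff.

    Changed lines that are neither executed nor missing (comments, blank
    lines) are not executable and are ignored.
    """
    changed = set(changed_lines)
    covered = changed & set(executed_lines)
    uncovered = changed & set(missing_lines)
    measurable = len(covered) + len(uncovered)
    if measurable == 0:
        return 100.0, []  # nothing executable changed
    return 100.0 * len(covered) / measurable, sorted(uncovered)


percent, uncovered = diff_coverage(
    changed_lines={10, 11, 12, 13, 14},  # lines touched by the commit
    executed_lines={1, 2, 10, 11, 12},   # lines the test run executed
    missing_lines={13, 20},              # executable lines never run
)
print(percent, uncovered)  # 75.0 [13]
```

Line 20 is uncovered legacy code, but because the commit did not touch it, it does not count against the change.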
Branch coverage analysis
Branch coverage reveals hidden complexity. Consider:
def categorize(value: int, strict: bool = False) -> str:
    if value > 100:
        if strict:
            raise ValueError("Value too high in strict mode")
        return "high"
    elif value > 50:
        return "medium"
    else:
        return "low"
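In this function every branch ends on a line of its own, so a suite that never passes strict=True with value > 100 leaves the raise unexecuted, and even line coverage flags it. Branch coverage goes further: it records both arcs leaving each if (from the strict check to the raise, and from the strict check to the return of "high") and reports any arc that was never taken. That matters most when a branch has no line of its own, such as an if with no else, where a single test can execute every line while the skip path stays untested.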
The coverage report marks partial branches with -> indicators:
6     if value > 100:
7         if strict:  # 7->8 (missed), 7->9 (hit)
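Line and branch coverage disagree most clearly on an if with no else. The sketch below uses a hypothetical shipping_cost function: one test executes every line, yet the path that skips the if body is never exercised, and only branch coverage would report the missing arc.

```python
def shipping_cost(weight_kg: float, express: bool = False) -> float:
    cost = 5.0 + 1.5 * weight_kg
    if express:
        cost += 10.0  # only reached when express is True
    return cost


# This single test executes every line: 100% line coverage.
assert shipping_cost(2.0, express=True) == 18.0

# The arc from "if express:" straight to "return cost" has not been taken yet.
# Branch coverage flags the if as partial until a test like this one exists:
assert shipping_cost(2.0) == 8.0
```

If the second assertion is deleted, a line-only report stays at 100% while a branch report shows the if statement as partially covered.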
Combining with mutation testing
Coverage tells you what ran. Mutation testing tells you if your tests would notice a change. The combination is the strongest quality signal available.
pip install mutmut
mutmut run --paths-to-mutate=src/mypackage/core.py
mutmut results
mutmut modifies your code (swaps > with >=, removes return statements, changes constants) and re-runs tests. Surviving mutants indicate weak assertions. A file with 100% coverage but 40% mutation score has tests that execute code without verifying behavior.
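A hand-written mutant shows what a surviving mutation looks like. This is an illustration, not mutmut output: categorize is a simplified version of the earlier function, categorize_mutant applies a single > to >= swap by hand, and the two test helpers are hypothetical.

```python
def categorize(value):
    if value > 100:
        return "high"
    elif value > 50:
        return "medium"
    return "low"


def categorize_mutant(value):
    if value >= 100:  # mutation: > became >=
        return "high"
    elif value > 50:
        return "medium"
    return "low"


def weak_test(fn):
    # Executes every branch but only checks values far from the boundary.
    return fn(150) == "high" and fn(75) == "medium" and fn(10) == "low"


def boundary_test(fn):
    # Pins the behavior at the exact threshold.
    return fn(100) == "medium" and fn(101) == "high"


print(weak_test(categorize), weak_test(categorize_mutant))          # True True
print(boundary_test(categorize), boundary_test(categorize_mutant))  # True False
```

The weak test gives full branch coverage and still passes against the mutant, so the mutant survives. The boundary test fails against the mutant, which is what "killing" it means.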
Target mutation score for critical paths: 80%+. For utility code: 60%+ is reasonable. Running mutmut is slow (it re-runs tests per mutation), so limit it to changed files in CI:
git diff --name-only origin/main -- 'src/**.py' | \
  xargs -I{} mutmut run --paths-to-mutate={}
Coverage for async and concurrent code
Async code needs special handling. coverage.py supports asyncio natively since version 6.0, but there are gotchas:
- Tasks that are cancelled mid-execution appear as partially covered even though the cancellation is expected behavior.
- Background tasks spawned via asyncio.create_task may complete after the test assertion, meaning the coverage data is recorded but the test didn’t verify the outcome.
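The background-task gotcha can be reproduced with the standard library alone. In this sketch, record_event and handle_request are hypothetical stand-ins for application code. The first check inspects state before the task has run, and the second awaits the task first.

```python
import asyncio


async def record_event(log, event):
    await asyncio.sleep(0)  # yield to the event loop, as real I/O would
    log.append(event)


async def handle_request(log):
    # Fire-and-forget style: the handler returns before the task finishes.
    return asyncio.create_task(record_event(log, "saved"))


async def check_without_waiting():
    log = []
    task = await handle_request(log)
    snapshot = list(log)  # what an assertion placed here would see
    await task            # avoid leaving a pending task behind
    return snapshot


async def check_after_waiting():
    log = []
    task = await handle_request(log)
    await task            # let the background work finish first
    return list(log)


print(asyncio.run(check_without_waiting()))  # []
print(asyncio.run(check_after_waiting()))    # ['saved']
```

In both cases the body of record_event executes and shows up as covered, but only the second check could assert on its effect.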
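For multiprocessing, set concurrency = ["multiprocessing"] so that coverage.py starts measurement in each child process. Subprocesses launched by other means need the COVERAGE_PROCESS_START environment variable set and coverage.process_startup() called at interpreter startup (for example from a .pth file). With pytest-xdist, the --cov flag from pytest-cov collects and combines worker data automatically.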
Context-aware coverage
coverage.py supports contexts, which tag each covered line with the test that hit it:
[tool.coverage.run]
dynamic_context = "test_function"
This generates a database where you can query: “Which tests cover line 42 of core.py?” Useful for:
- Identifying which tests to run when a file changes (test impact analysis).
- Finding tests that are the sole exerciser of a code path — if that test is deleted, coverage drops.
- Debugging flaky tests by seeing which code paths they uniquely cover.
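The "sole exerciser" query from the list above can be sketched over a plain mapping of line numbers to context names. That is the shape coverage.py's CoverageData.contexts_by_lineno(filename) is documented to return, but the sketch takes the mapping as input so it runs without a data file. The function sole_exercisers and the sample test names are hypothetical.

```python
def sole_exercisers(contexts_by_line):
    """Map each test to the lines that only it executes.

    contexts_by_line has the shape {line_number: [context_name, ...]}.
    """
    result = {}
    for lineno, contexts in contexts_by_line.items():
        named = {c for c in contexts if c}  # drop the empty default context
        if len(named) == 1:
            (test,) = named
            result.setdefault(test, []).append(lineno)
    return {test: sorted(lines) for test, lines in result.items()}


# Hypothetical context data for core.py.
sample = {
    41: ["tests.test_core.test_high", "tests.test_core.test_strict"],
    42: ["tests.test_core.test_strict"],
    43: [""],  # executed at import time, outside any test
}
print(sole_exercisers(sample))  # {'tests.test_core.test_strict': [42]}
```

Here deleting test_strict would drop line 42 from coverage, while deleting test_high would change nothing in this file.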
Practical coverage thresholds
Based on real-world projects:
| Project type | Realistic target | Critical paths |
|---|---|---|
| Web API (FastAPI/Django) | 80–85% | Auth, payments: 95%+ |
| Data pipeline | 75–80% | Transform logic: 90%+ |
| CLI tool | 70–80% | Core commands: 90%+ |
| Library/SDK | 85–95% | Public API: 95%+ |
The key insight: one global number is less useful than per-module targets. Your authentication module deserves different scrutiny than your logging configuration.
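Per-module targets can be checked against the JSON report, which lists a percent_covered value under each file's summary. The sketch below works on an already-parsed report. The function check_module_targets, the prefix-based targets mapping, and the sample numbers are assumptions made for illustration.

```python
def check_module_targets(report, targets, default=0.0):
    """Return (path, percent, required) for each file below its target.

    report is a parsed coverage.json; targets maps path prefixes to minimum
    percentages (a hypothetical convention for this sketch).
    """
    failures = []
    for path, info in report["files"].items():
        percent = info["summary"]["percent_covered"]
        # The longest matching prefix wins, so specific targets override broad ones.
        matches = [prefix for prefix in targets if path.startswith(prefix)]
        required = targets[max(matches, key=len)] if matches else default
        if percent < required:
            failures.append((path, percent, required))
    return sorted(failures)


# Hypothetical report excerpt and targets.
report = {
    "files": {
        "src/mypackage/auth/tokens.py": {"summary": {"percent_covered": 91.0}},
        "src/mypackage/api/routes.py": {"summary": {"percent_covered": 88.0}},
        "src/mypackage/logging_config.py": {"summary": {"percent_covered": 62.0}},
    }
}
targets = {
    "src/mypackage/": 60.0,
    "src/mypackage/api/": 80.0,
    "src/mypackage/auth/": 95.0,
}
print(check_module_targets(report, targets))
```

With these numbers only the auth module fails: 91% would pass a global 85% gate, but it falls short of the stricter 95% target set for that path.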
Reporting and visualization
For team-level visibility, publish HTML coverage reports as CI artifacts and track trends with tools like Codecov or Coveralls. The trend line — not the absolute number — is what predicts quality trajectory.
Extract the headline percentage to feed a README badge:
coverage json
python -c "
import json
data = json.load(open('coverage.json'))
pct = data['totals']['percent_covered_display']
print(f'Coverage: {pct}%')
"
The one thing to remember: The strongest coverage strategy combines branch analysis for finding gaps, a ratchet for preventing regression, diff-cover for incremental enforcement, and mutation testing for verifying assertion quality.