Python Coverage Measurement — Core Concepts
Why coverage matters
Every Python codebase has code that tests never touch. Coverage measurement quantifies this gap. It gives teams a concrete metric: what percentage of your codebase has been exercised by tests?
This matters because untested code is a liability. It might work today, but when someone changes it tomorrow, there’s no safety net. Coverage data helps teams prioritize where to write tests next.
Types of coverage
Line coverage is the simplest form. It tracks which executable lines of source code ran during your test run. If a function has 10 executable lines and tests exercised 8 of them, that function has 80% line coverage.
Branch coverage goes deeper. Line coverage can report 100% for code whose decision points were only ever taken one way: an if statement with no else is fully line-covered by a single test in which the condition is true, even though the case where the body is skipped never ran. Branch coverage tracks whether each outcome of every decision point was taken. A function with if x > 0 has two branches, the true path and the false path, and both must be exercised for full branch coverage.
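A minimal sketch of the gap between the two metrics, using a hypothetical clamp_to_zero function invented for illustration. One call with a negative argument executes every line, so line coverage would report 100%, yet the outcome where the condition is false is never taken.

```python
def clamp_to_zero(x):
    """Return x, or 0 if x is negative (hypothetical example function)."""
    if x < 0:
        x = 0  # executed only when the condition is true
    return x


# This single call executes all three statements: full line coverage.
# The false outcome of `x < 0` (skipping the body) is never taken, so a
# branch-aware tool would still flag the `if` as partially covered.
result_negative = clamp_to_zero(-5)

# A second call with a non-negative argument exercises the missing branch.
result_positive = clamp_to_zero(3)
```

With coverage.py, branch measurement is opt-in: coverage run --branch -m pytest.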
Path coverage considers every combination of branch outcomes through a function. In practice, the number of paths can grow exponentially with the number of decision points, and loops make it larger still, so most teams stick with branch coverage as a reasonable middle ground.
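The growth can be made concrete with a small sketch. It assumes the simplest case, a function made of independent two-way decisions in sequence with no loops; count_paths is a hypothetical helper, not part of any coverage tool.

```python
from itertools import product


def count_paths(num_decisions):
    """Count distinct paths through `num_decisions` independent two-way branches."""
    # A path is one combination of True/False outcomes across all decisions.
    return sum(1 for _ in product((True, False), repeat=num_decisions))


for n in (1, 3, 10):
    # Branch coverage needs each outcome once (2 per decision);
    # path coverage needs every combination.
    print(f"{n} decisions: {2 * n} branch outcomes, {count_paths(n)} paths")
```

Under these assumptions, ten decisions give 20 branch outcomes to cover but 1024 distinct paths.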
How coverage.py works
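The standard tool for Python coverage is coverage.py. It relies on the interpreter's tracing hooks, traditionally a trace function of the kind installed with sys.settrace(), to monitor which lines execute during a program run. Recent versions can also use the sys.monitoring API available in Python 3.12 and later.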
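To illustrate the tracing mechanism, here is a toy line recorder built directly on sys.settrace(). It is a sketch of the general idea only, not how coverage.py is implemented; record_executed_lines and sign are hypothetical names.

```python
import sys


def record_executed_lines(func, *args, **kwargs):
    """Run func and return (result, set of line offsets executed inside func)."""
    executed = set()
    target_code = func.__code__

    def tracer(frame, event, arg):
        # Ignore frames that do not belong to the function under observation.
        if frame.f_code is not target_code:
            return None
        if event == "line":
            # Store the line number relative to the `def` line.
            executed.add(frame.f_lineno - target_code.co_firstlineno)
        return tracer

    previous = sys.gettrace()
    sys.settrace(tracer)
    try:
        result = func(*args, **kwargs)
    finally:
        sys.settrace(previous)  # always restore the earlier trace function
    return result, executed


def sign(x):
    if x > 0:
        return "positive"      # offset 2 from the def line
    return "non-positive"      # offset 3 from the def line


_, lines_for_positive = record_executed_lines(sign, 5)
_, lines_for_negative = record_executed_lines(sign, -5)
```

Each call reaches only one of the two return statements, which is the kind of per-line data a coverage tool accumulates over a whole test run.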
When you run coverage run -m pytest, the tool hooks into the interpreter, records every line that runs in the measured source files, and saves the result to a data file (.coverage by default). After tests finish, coverage report reads that data and compares it against your source files to calculate percentages.
The HTML report (coverage html) is especially useful. It renders your source code with color-coded lines: green for covered, red for missed. Scanning a red-highlighted file instantly shows where tests need attention.
Common misconceptions
“100% coverage means no bugs.” Coverage only proves code was executed, not that it was verified. A test that calls a function but never checks its return value adds coverage without adding confidence.
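A small sketch of that difference, using a hypothetical apply_discount function with a deliberate bug. Both checks execute the same lines and therefore yield the same coverage, but only one of them can fail.

```python
def apply_discount(price, percent):
    """Hypothetical function with a deliberate bug: it adds instead of subtracting."""
    return price + price * percent / 100


def weak_test():
    # Runs every line of apply_discount, so coverage reports 100%,
    # but nothing is checked and the bug goes unnoticed.
    apply_discount(200, 10)


def strong_test():
    # Identical coverage, but the assertion exposes the bug.
    assert apply_discount(200, 10) == 180
```

Calling weak_test() passes silently, while strong_test() raises AssertionError because the buggy function returns 220.0.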
“Low coverage means bad tests.” Some code is genuinely hard to test — error handlers for rare conditions, platform-specific branches, or third-party integration points. A well-tested codebase might reasonably sit at 85% with all critical paths covered.
“Coverage should always increase.” Adding a new feature with complex logic might temporarily drop overall coverage. What matters is that new code comes with appropriate tests, not that a percentage never decreases.
Practical guidelines
A common target is somewhere in the 80-90% range. Below 70% usually suggests significant blind spots. Pushing past 95% can be a sign that tests are being written to hit a number rather than to catch bugs, although some projects reach that level legitimately.
Focus coverage efforts on business logic, not boilerplate. Testing every getter and setter adds percentage points without meaningful safety. Testing the function that calculates pricing, processes payments, or handles user authentication adds real value.
Set coverage as a CI gate: not as an absolute requirement, but as a “don’t drop below X%” check. This prevents regressions where someone adds untested code without realizing it.
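coverage.py supports this kind of check through the --fail-under option of coverage report, which makes the command exit with a non-zero status when total coverage is below the given minimum. The logic behind such a gate can be sketched as follows; coverage_gate is a hypothetical helper written for illustration, not part of coverage.py.

```python
def coverage_gate(covered_lines, total_lines, minimum_percent):
    """Return (percent, passed) for a 'do not drop below X%' check."""
    if total_lines == 0:
        # Nothing to measure: treat as fully covered (an assumption of this sketch).
        return 100.0, True
    percent = 100.0 * covered_lines / total_lines
    return percent, percent >= minimum_percent


percent, passed = coverage_gate(covered_lines=85, total_lines=100, minimum_percent=80)
print(f"coverage {percent:.1f}%: {'ok' if passed else 'below threshold'}")
```

In a CI script, a failed gate would typically translate into a non-zero exit code so the pipeline stops.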
One thing to remember: coverage reliably shows what was not tested, not what was tested well. A missed line is certainly unverified; a covered line was merely executed, and how well it was checked depends entirely on the assertions in your tests.