Python Coverage Measurement — Core Concepts

Why coverage matters

Nearly every Python codebase has code that tests never touch. Coverage measurement quantifies this gap with a concrete metric: what percentage of the codebase has been exercised by tests?

This matters because untested code is a liability. It might work today, but when someone changes it tomorrow, there’s no safety net. Coverage data helps teams prioritize where to write tests next.

Types of coverage

Line coverage is the simplest form. It tracks which lines of source code executed during your test run. If a function has 10 lines and tests exercised 8, that function has 80% line coverage.

Branch coverage goes deeper. Line coverage can report a block as fully covered even when one outcome of a decision never ran. The classic case is an if with no else: a test where the condition is true executes every line, yet the path that skips the if body is never taken. Branch coverage tracks whether each outcome of every decision point was taken. A function with if x > 0 has two branches, the true path and the false path, and both must be exercised.
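As a minimal illustration, consider the hypothetical function below (apply_discount and both tests are invented for this example). Running only the first test executes every line, so line coverage reports 100%, while coverage.py in branch mode (coverage run --branch) would flag the untaken path from the if straight to the return.

```python
def apply_discount(price_cents, is_member):
    """Hypothetical example: members get 10% off (prices in integer cents)."""
    if is_member:
        price_cents = price_cents - price_cents // 10
    return price_cents


def test_member_discount():
    # Executes every line of apply_discount, so line coverage is complete.
    # The "condition is false" path (if -> return) is never taken.
    assert apply_discount(1000, True) == 900


def test_non_member_pays_full_price():
    # Takes the false branch, which completes branch coverage.
    assert apply_discount(1000, False) == 1000
```

The second test adds no new lines to the line-coverage total, which is exactly why line coverage alone cannot reveal that it was missing.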

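Path coverage considers every combination of branch outcomes through a function. The number of paths can grow exponentially with the number of decisions (n independent if statements yield up to 2^n paths, and loops can push the count higher still), so most teams settle on branch coverage as a reasonable middle ground.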
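The growth is easy to see with a small sketch. It assumes a function made of independent, sequential if statements (the simplest case) and uses a hypothetical helper, enumerate_paths, to list every combination of outcomes.

```python
from itertools import product


def enumerate_paths(num_decisions):
    """List every True/False combination for independent decisions."""
    return list(product((True, False), repeat=num_decisions))


for n in (1, 3, 10):
    paths = len(enumerate_paths(n))  # 2 ** n combinations
    branch_outcomes = 2 * n          # each decision has two outcomes
    print(f"{n} decisions: {paths} paths, {branch_outcomes} branch outcomes")
```

With ten decisions there are 1024 paths but only 20 branch outcomes to cover, which is why branch coverage stays tractable where path coverage does not.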

How coverage.py works

The standard tool for Python coverage is coverage.py. It uses Python’s tracing hooks to monitor which lines execute during a program run: traditionally sys.settrace(), with sys.monitoring available as an alternative on Python 3.12 and later.

When you run coverage run -m pytest, the tool hooks into the interpreter and records every line that runs, saving the results to a data file (.coverage by default). After tests finish, coverage report reads that data and compares it against the executable lines in your source files to calculate percentages.
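The mechanism can be sketched with sys.settrace() directly. This is a toy illustration of the idea, not how coverage.py is implemented internally; trace_lines, classify, and BODY_LINES are hypothetical names, and the set of executable lines is written by hand rather than derived from the source.

```python
import sys


def trace_lines(func, *args):
    """Run func(*args); return the line offsets (from the def line) that ran."""
    code = func.__code__
    executed = set()

    def tracer(frame, event, arg):
        # Ignore frames that do not belong to the function being measured.
        if frame.f_code is not code:
            return None
        if event == "line":
            executed.add(frame.f_lineno - code.co_firstlineno)
        return tracer

    previous = sys.gettrace()
    sys.settrace(tracer)
    try:
        func(*args)
    finally:
        sys.settrace(previous)  # restore any tracer that was active before
    return executed


def classify(x):
    if x > 0:
        return "positive"
    return "non-positive"


BODY_LINES = {1, 2, 3}  # offsets of the three executable lines in classify

one_test = trace_lines(classify, 5)
both_tests = one_test | trace_lines(classify, -5)

for label, seen in (("one test", one_test), ("both tests", both_tests)):
    percent = 100 * len(seen & BODY_LINES) / len(BODY_LINES)
    print(f"{label}: {percent:.0f}% of lines executed")
```

A single call with a positive argument reaches two of the three body lines; adding the negative case reaches all three. A real tool does the same bookkeeping across every file and persists the result between the run and report steps.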

The HTML report (coverage html) is especially useful. It renders your source code with color-coded lines: green for covered, red for missed. Scanning a red-highlighted file instantly shows where tests need attention.

Common misconceptions

“100% coverage means no bugs.” Coverage only proves code was executed, not that it was verified. A test that calls a function but never checks its return value adds coverage without adding confidence.

“Low coverage means bad tests.” Some code is genuinely hard to test — error handlers for rare conditions, platform-specific branches, or third-party integration points. A well-tested codebase might reasonably sit at 85% with all critical paths covered.
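For code that genuinely cannot run in the test environment, coverage.py recognizes the comment # pragma: no cover as an exclusion marker, and placing it on a line that opens a block excludes the whole block from the report. The sketch below assumes tests run on Linux or macOS; default_config_dir and both paths are hypothetical.

```python
import sys


def default_config_dir():
    """Hypothetical helper: pick a configuration directory per platform."""
    if sys.platform == "win32":  # pragma: no cover
        return r"C:\ProgramData\exampleapp"
    return "/etc/exampleapp"
```

Excluding the branch keeps the percentage honest about what the test environment can reach, and the marker documents that the gap is deliberate.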

“Coverage should always increase.” Adding a new feature with complex logic might temporarily drop overall coverage. What matters is that new code comes with appropriate tests, not that a percentage never decreases.

Practical guidelines

Many teams set a threshold somewhere in the 80-90% range. Coverage below 70% usually points to significant blind spots. Pushing past 95% tends to bring diminishing returns and can signal tests written to hit a number rather than to catch bugs.

Focus coverage efforts on business logic, not boilerplate. Testing every getter and setter adds percentage points without meaningful safety. Testing the function that calculates pricing, processes payments, or handles user authentication adds real value.

Set coverage as a CI gate — not as an absolute requirement, but as a “don’t drop below X%” check. This prevents regressions where someone adds untested code without realizing it.
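coverage.py supports this kind of gate directly: coverage report --fail-under=85 exits with a non-zero status when total coverage is below the threshold. The underlying logic is simple enough to sketch; coverage_gate and the numbers used here are hypothetical, and a real CI job would feed in the measured total.

```python
def coverage_gate(current_percent, minimum_percent):
    """Return a process exit code: 0 if the gate passes, 1 if it fails."""
    if current_percent < minimum_percent:
        print(f"FAIL: coverage {current_percent:.1f}% is below {minimum_percent:.1f}%")
        return 1
    print(f"OK: coverage {current_percent:.1f}% meets {minimum_percent:.1f}%")
    return 0


# Illustrative numbers only, not measurements from any real project.
exit_code = coverage_gate(current_percent=86.4, minimum_percent=85.0)
```

Setting the minimum slightly below the current total leaves room for normal fluctuation while still catching a large untested change.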

One thing to remember: coverage is a floor indicator, not a ceiling. It tells you how much of your code was exercised at all; the quality of that exercise depends entirely on the assertions in your tests.
