Python timeit Best Practices — Core Concepts

What timeit actually does

The timeit module measures small code snippets by running them many times in a controlled loop. By default it disables garbage collection during measurement and uses time.perf_counter() as its high-resolution timer.

There are two key parameters:

  • number: how many times to run the code in one batch (default: 1,000,000 in the Python API; the CLI chooses it adaptively)
  • repeat: how many separate batches to run (default: 5 in timeit.repeat())

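The distinction matters. Each “repeat” is an independent measurement of one whole batch, so the reported value is the total time for number executions, not the time per execution. The minimum of the repeats is usually the best estimate of true performance, because higher values mostly reflect interference from other processes rather than variation in the code itself.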
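
As a minimal sketch of how the two parameters interact, the snippet below runs five batches of 10,000 executions and converts the fastest batch into a per-call time. The statement being timed and the batch sizes are arbitrary choices for illustration.

```python
import timeit

NUMBER = 10_000  # executions per batch
REPEAT = 5       # independent batches

# Each entry is the total time in seconds for one whole batch.
batch_times = timeit.repeat("sum(range(100))", repeat=REPEAT, number=NUMBER)

# Take the fastest batch, then divide by NUMBER for a per-call estimate.
best_batch = min(batch_times)
per_call = best_batch / NUMBER

print(f"best of {REPEAT}: {per_call * 1e6:.3f} usec per call")
```

Note that timeit.repeat() returns batch totals, so forgetting the division by number is an easy way to misread results by several orders of magnitude.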

The setup parameter

Code that prepares data should go in setup, not in the timed code:

import timeit

# WRONG: json is never imported in the namespace where timeit runs the statement
timeit.timeit('json.dumps({"key": "value"})', number=100000)
# NameError: name 'json' is not defined

# RIGHT: setup runs once before timing starts
timeit.timeit(
    'json.dumps({"key": "value"})',
    setup='import json',
    number=100000
)

Setup runs once per repeat (that is, once per timed batch), not once per iteration. This is the right behavior: you want to measure the operation, not the imports.
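
One way to check this for yourself is to pass a callable as setup and count how often it runs. The counting_setup function below is a hypothetical helper written only for this demonstration.

```python
import timeit

setup_calls = []

def counting_setup():
    """Hypothetical setup callable that records each time it runs."""
    setup_calls.append(1)

# 3 batches of 1,000 executions: 3,000 runs of the statement in total.
timeit.repeat("sum(range(10))", setup=counting_setup, repeat=3, number=1_000)

print(len(setup_calls))  # 3: one setup call per batch, not per execution
```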

Command-line usage

The CLI interface is surprisingly powerful:

# Basic timing
python -m timeit "sum(range(1000))"

# With setup
python -m timeit -s "import json; data={'a': 1}" "json.dumps(data)"

# Multiple statements (each quoted argument becomes a separate statement)
python -m timeit -s "xs = list(range(1000))" "xs.sort()" "xs.reverse()"

# Control repetitions
python -m timeit -n 10000 -r 7 "list(range(100))"

The CLI auto-calibrates number if you don’t specify -n, which is usually what you want.
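
The Python API offers a similar calibration step through Timer.autorange(), which increases the loop count until a batch takes at least 0.2 seconds. A minimal sketch, with an arbitrary statement chosen for illustration:

```python
import timeit

timer = timeit.Timer("sum(range(1000))")

# autorange() returns the loop count it settled on and the total
# time in seconds that the final batch took.
loops, total = timer.autorange()

print(f"{loops} loops, {total / loops * 1e6:.2f} usec per loop")
```

The loop count it picks depends on the machine, so treat the number as a starting point for a later timeit.repeat() call rather than a fixed constant.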

Five common traps

1. Timing operations that mutate their data

# WRONG: sort() mutates xs in place, so after the first iteration it is already sorted
timeit.timeit('xs.sort()', setup='import random; xs = random.sample(range(1000), 1000)', number=10000)
# This mostly benchmarks sorting an already-sorted list, not a shuffled one

# RIGHT: sorted() returns a new list and leaves xs shuffled for every iteration
timeit.timeit('sorted(xs)', setup='import random; xs = random.sample(range(1000), 1000)', number=10000)
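
If the operation you actually care about is the in-place xs.sort(), one possible approach is to sort a fresh copy on every iteration and time the copy separately so its overhead can be subtracted. This is a rough sketch of that idea, not a precise method; the subtraction assumes the copy cost is stable between the two runs.

```python
import timeit

setup = "import random; xs = random.sample(range(1000), 1000)"
NUMBER = 1_000

# In-place sort on a fresh shuffled copy each iteration.
sort_with_copy = min(
    timeit.repeat("ys = xs.copy(); ys.sort()", setup=setup, repeat=5, number=NUMBER)
)

# The copy alone, measured the same way.
copy_only = min(
    timeit.repeat("ys = xs.copy()", setup=setup, repeat=5, number=NUMBER)
)

# Approximate per-call cost of list.sort() on shuffled input.
sort_estimate = (sort_with_copy - copy_only) / NUMBER
print(f"approx. {sort_estimate * 1e6:.2f} usec per sort")
```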

2. Forgetting that setup state is shared

If your function needs a database connection or a large data structure, put the creation in setup so its cost is not measured. But be aware that objects created in setup are shared across all iterations in a batch, so mutations accumulate.
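
The accumulation is easy to observe by handing timeit a namespace you can inspect afterwards. In this sketch the namespace dict and the list xs are illustrative names only.

```python
import timeit

# A namespace we control, so the shared object can be examined later.
namespace = {"xs": []}

timeit.timeit("xs.append(0)", globals=namespace, number=1_000)

# Every iteration appended to the same list.
print(len(namespace["xs"]))  # 1000
```

By the last iteration the statement is appending to a 999-element list, which is a different workload from the first iteration.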

3. Comparing across separate timeit calls

# Unreliable: system load may differ between calls
time_a = timeit.timeit('method_a(data)', setup=setup, number=N)
time_b = timeit.timeit('method_b(data)', setup=setup, number=N)

Better: use timeit.repeat() for each and compare the minimums, or use a proper benchmarking framework.
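
A minimal sketch of that comparison follows. The bodies of method_a and method_b are hypothetical stand-ins, since the original snippet does not define them.

```python
import timeit

def method_a(data):
    """Hypothetical candidate: list comprehension."""
    return [x * 2 for x in data]

def method_b(data):
    """Hypothetical candidate: map() with a lambda."""
    return list(map(lambda x: x * 2, data))

data = list(range(1_000))
namespace = {"method_a": method_a, "method_b": method_b, "data": data}
NUMBER = 1_000

# Best per-call time for each candidate, taken from 5 batches.
best = {}
for name in ("method_a", "method_b"):
    times = timeit.repeat(f"{name}(data)", globals=namespace, repeat=5, number=NUMBER)
    best[name] = min(times) / NUMBER

for name, seconds in sorted(best.items(), key=lambda item: item[1]):
    print(f"{name}: {seconds * 1e6:.2f} usec per call")
```

Small differences between the two minimums can still be noise, so it is worth rerunning before drawing conclusions.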

4. Number too high for slow code

If your function takes 1 second, number=1000000 means waiting about 11.6 days (1,000,000 seconds divided by 86,400 seconds per day). Use a smaller number:

# For slow functions, reduce iterations
timeit.timeit('slow_function()', setup='...', number=100)
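
The arithmetic is worth doing before starting a long run. The helper below, estimated_wait_seconds, is a hypothetical name used only for this sketch.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400

def estimated_wait_seconds(seconds_per_call, number, repeat=1):
    """Rough total runtime: every batch runs the statement `number` times."""
    return seconds_per_call * number * repeat

total = estimated_wait_seconds(1.0, 1_000_000)
print(f"{total / SECONDS_PER_DAY:.1f} days")  # 11.6 days

print(f"{estimated_wait_seconds(1.0, 100):.0f} seconds")  # 100 seconds
```

Remember to multiply by repeat as well when using timeit.repeat(), since each batch runs the full number of iterations.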

5. Forgetting globals parameter in scripts

def my_function():
    return sum(range(100))

# WRONG: timeit can't see my_function
timeit.timeit('my_function()')

# RIGHT: pass the current namespace
timeit.timeit('my_function()', globals=globals())
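
An alternative that avoids the namespace question is to pass the function object itself, since timeit accepts a zero-argument callable as the statement. A short sketch:

```python
import timeit

def my_function():
    return sum(range(100))

# Pass the callable directly; no string lookup, so no NameError.
elapsed = timeit.timeit(my_function, number=10_000)

print(f"{elapsed / 10_000 * 1e6:.3f} usec per call")
```

The callable form adds one extra function call per iteration, which can matter when the code being timed is very short.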

Common misconception: timeit is only for micro-benchmarks

While timeit excels at microsecond-level measurements, it works fine for operations taking milliseconds or even seconds. Just adjust number downward. The real limitation is that timeit doesn’t provide statistical analysis — it gives you raw times. For histograms, percentiles, and regression tracking, tools like pyperf or pytest-benchmark build on similar principles with richer output.
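
For a quick look at the spread without installing anything, the raw times from timeit.repeat() can be summarized with the standard statistics module. This is only a rough sketch, not a replacement for the tools mentioned above; the statement and batch sizes are arbitrary.

```python
import statistics
import timeit

NUMBER = 1_000

# 7 raw batch totals, converted to per-call times.
raw = timeit.repeat("sorted(range(100, 0, -1))", repeat=7, number=NUMBER)
per_call = [t / NUMBER for t in raw]

summary = {
    "min": min(per_call),
    "median": statistics.median(per_call),
    "stdev": statistics.stdev(per_call),
}

for label, seconds in summary.items():
    print(f"{label}: {seconds * 1e6:.3f} usec")
```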

When to use timeit vs alternatives

Scenario                                Tool
Quick comparison of two expressions     timeit CLI
Benchmarks in test suite                pytest-benchmark
Full statistical analysis               pyperf
Production latency tracking             time.perf_counter() in instrumentation
Profiling a whole program               cProfile or py-spy

Three things to remember: always separate setup from measured code, use the minimum of multiple repeats as your estimate, and never time operations that mutate state without resetting that state.
