Iterators & Generators in Python — Deep Dive
Iterators and generators are not just syntactic conveniences; they are core performance and architecture tools in Python. If your system touches large datasets, streaming APIs, queues, or log pipelines, a solid understanding of lazy iteration can prevent memory blowups and improve composability.
Iterator Protocol Under the Hood
Python iteration is built on two dunder methods:
- __iter__() returns an iterator object
- __next__() returns the next item or raises StopIteration
for loops, comprehensions, sum, max, and most built-ins all consume this protocol.
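As a rough sketch of what a for loop does with this protocol, the snippet below drives a list's iterator by hand. The variable names are illustrative only.

```python
# Roughly what `for item in items: collected.append(item)` does under the hood.
items = ["a", "b", "c"]
collected = []

iterator = iter(items)          # calls items.__iter__()
while True:
    try:
        item = next(iterator)   # calls iterator.__next__()
    except StopIteration:       # raised once the iterator is exhausted
        break
    collected.append(item)

print(collected)
```

An iterator returns itself from __iter__(), which is why it can be passed anywhere an iterable is expected.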
class Countdown:
    def __init__(self, start: int):
        self.current = start

    def __iter__(self):
        return self

    def __next__(self):
        if self.current < 0:
            raise StopIteration
        value = self.current
        self.current -= 1
        return value

for n in Countdown(3):
    print(n)
This works, but writing custom iterator classes can become verbose.
Generator Functions: Iterator Classes with Less Boilerplate
Generator functions use yield; calling one returns a generator object that implements __iter__() and __next__() for you.
def countdown(start: int):
    current = start
    while current >= 0:
        yield current
        current -= 1
Each yield suspends the function's frame and preserves its local state. The next call to next() resumes execution immediately after that yield.
That suspension model is why generators are great for staged pipelines.
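To make the suspend-and-resume behavior visible, here is a minimal sketch in which a generator records its progress in a list. The function name staged and the log parameter are hypothetical, chosen only for this illustration.

```python
def staged(log):
    # `log` is a hypothetical list used only to record when each stage runs.
    log.append("started")
    yield 1
    log.append("resumed after first yield")
    yield 2
    log.append("finished")

log = []
gen = staged(log)
# Calling the generator function runs none of its body yet.
assert log == []

first = next(gen)       # runs up to the first yield
second = next(gen)      # resumes right after the first yield
remaining = list(gen)   # runs the rest; no further values are yielded
```

Nothing in the body executes until the first next() call, and each later call picks up exactly where the previous one stopped.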
Generator Expressions and Memory Behavior
List comprehension:
squares = [n * n for n in range(1_000_000)]
Generator expression:
squares = (n * n for n in range(1_000_000))
The list comprehension builds all one million values up front; the generator expression computes each value on demand. For large one-pass workloads, the generator version can keep peak memory far lower.
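One rough way to see the difference is sys.getsizeof, sketched below. The exact byte counts depend on the Python version and platform, so none are quoted here; the variable names are illustrative.

```python
import sys

squares_list = [n * n for n in range(1_000_000)]
squares_gen = (n * n for n in range(1_000_000))

# getsizeof reports only the container itself: for the list that is the
# array of references, not the int objects it points to.
list_bytes = sys.getsizeof(squares_list)
gen_bytes = sys.getsizeof(squares_gen)
print(f"list: {list_bytes:,} bytes, generator: {gen_bytes:,} bytes")

# Both produce the same total; the generator never holds all values at once.
total_from_gen = sum(squares_gen)
```

Because getsizeof ignores the integers the list refers to, the real gap in memory use is larger than the printed numbers suggest.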
Iterator Exhaustion and Reusability
One common production bug: attempting to reuse consumed iterators.
gen = (x for x in range(3))
print(list(gen)) # [0, 1, 2]
print(list(gen)) # []
If consumers need multiple passes, either:
- regenerate the iterator
- materialize once into a list
- redesign API contract to single-pass semantics
Document this clearly in function names/docs.
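One way to "regenerate the iterator" is to wrap the logic in an iterable object whose __iter__() builds a new generator on every call. The class name Squares below is hypothetical; this is a sketch of the pattern, not a prescribed design.

```python
class Squares:
    """Re-iterable container: every __iter__ call starts a fresh pass."""

    def __init__(self, limit: int):
        self.limit = limit

    def __iter__(self):
        # A generator method: each call returns a new, independent iterator.
        for n in range(self.limit):
            yield n * n

squares = Squares(3)
first_pass = list(squares)
second_pass = list(squares)
```

Unlike a bare generator, the object can be looped over any number of times, at the cost of recomputing the values on each pass.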
yield from for Delegation
yield from simplifies composing generators.
def all_lines(paths):
    for path in paths:
        with open(path, "r", encoding="utf-8") as f:
            yield from f
Without yield from, you’d write an inner loop manually. Delegation keeps pipeline code concise.
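For comparison, the sketch below places the manual inner loop next to the delegating form, using in-memory lists instead of files so it runs anywhere. Both function names are hypothetical.

```python
def flatten_manual(groups):
    for group in groups:
        for item in group:      # explicit inner loop
            yield item

def flatten_delegating(groups):
    for group in groups:
        yield from group        # same items, delegated to the sub-iterable

groups = [["a\n", "b\n"], [], ["c\n"]]
manual = list(flatten_manual(groups))
delegated = list(flatten_delegating(groups))
```

For plain iteration the two are equivalent. yield from additionally forwards send() and throw() to a delegated sub-generator, which the manual loop does not.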
Sending Values Into Generators
Generators are coroutines in a limited sense: callers can push values back via .send().
def accumulator():
    total = 0
    while True:
        value = yield total
        if value is None:
            break
        total += value
This pattern is less common in modern async-heavy code but still useful for custom streaming transformations.
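A short usage sketch for the accumulator may help, since the calling side is where .send() tends to confuse. The generator must be primed with next() before the first send(); the variable names here are illustrative.

```python
def accumulator():
    total = 0
    while True:
        value = yield total
        if value is None:
            break
        total += value

acc = accumulator()
initial = next(acc)        # prime: run to the first yield, which produces 0
after_five = acc.send(5)   # resumes with value = 5, yields the new total
after_ten = acc.send(10)

try:
    acc.send(None)         # None breaks the loop, so the generator returns
    finished = False
except StopIteration:
    finished = True
```

Calling send() with a non-None value on a generator that has not been primed raises TypeError, which is why the initial next() is needed.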
Error Handling Inside Generators
Exceptions propagate through generators unless handled internally.
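def safe_parse(lines):
    for line in lines:
        try:
            value = int(line.strip())
        except ValueError:
            continue  # skip records that are not valid integers
        # Yield outside the try block so the handler covers only parsing,
        # not exceptions thrown into the generator at the yield.
        yield value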
This allows resilient stream processing where bad records are skipped rather than crashing entire jobs.
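Silently dropping records can hide data problems, so one possible variation keeps track of what was skipped. In the sketch below, parse_ints and its rejected parameter are hypothetical names introduced for illustration.

```python
def parse_ints(lines, rejected):
    # `rejected` is a hypothetical caller-supplied list that collects bad input.
    for line in lines:
        try:
            value = int(line.strip())
        except ValueError:
            rejected.append(line)
            continue
        yield value

rejected = []
parsed = list(parse_ints(["1\n", "oops\n", " 3 \n", ""], rejected))
```

The job still completes, and the caller can inspect rejected afterwards to decide whether the skip rate is acceptable.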
Backpressure and Streaming Pipelines
A major advantage of lazy iterators: natural backpressure. Consumers pull values at their pace, so producers don’t need to generate everything immediately.
Pipeline example:
def read_lines(path):
    with open(path, "r", encoding="utf-8") as f:
        for line in f:
            yield line

def non_empty(lines):
    for line in lines:
        if line.strip():
            yield line

def lower(lines):
    for line in lines:
        yield line.lower()

stream = lower(non_empty(read_lines("events.log")))
for line in stream:
    process(line)  # process() stands for the caller's own handler
Each stage is testable in isolation, and memory stays bounded because only one line moves through the pipeline at a time.
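The pull-based behavior can be checked directly. In this sketch, an unbounded producer records how far it has run, and the consumer asks for only three items; the names numbers, doubled, and produced are hypothetical.

```python
from itertools import islice

produced = []

def numbers():
    # Unbounded producer; `produced` records how far it has actually run.
    n = 0
    while True:
        produced.append(n)
        yield n
        n += 1

def doubled(values):
    for value in values:
        yield value * 2

# The consumer asks for three items, so the producer runs exactly three steps.
first_three = list(islice(doubled(numbers()), 3))
```

The producer never gets ahead of the consumer, which is what keeps an otherwise infinite source safe to use.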
Performance Considerations
Good for:
- large sequential processing
- one-pass transformations
- reduced peak memory
Less ideal for:
- heavy random access needs
- frequent repeated passes
- micro-optimizations where function-call overhead dominates
In CPython, every yielded item carries a small overhead for resuming the generator frame. For numeric-heavy workloads, vectorized libraries (NumPy, pandas) may outperform iterator chains.
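If per-item overhead is a concern, measuring is more reliable than guessing. The sketch below uses timeit to compare a two-stage generator chain with an equivalent single loop. Both function names are hypothetical, and no timings are quoted because they vary by machine and Python version.

```python
import timeit

def via_generator_chain(n):
    evens = (x for x in range(n) if x % 2 == 0)
    squared = (x * x for x in evens)
    return sum(squared)

def via_single_loop(n):
    total = 0
    for x in range(n):
        if x % 2 == 0:
            total += x * x
    return total

n = 100_000
chain_seconds = timeit.timeit(lambda: via_generator_chain(n), number=5)
loop_seconds = timeit.timeit(lambda: via_single_loop(n), number=5)
print(f"generator chain: {chain_seconds:.4f}s, single loop: {loop_seconds:.4f}s")
```

Run it against representative data sizes before restructuring a pipeline for speed.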
Interop with Standard Library Tools
itertools provides high-performance iterator building blocks:
- islice for slicing streams
- chain for concatenation
- tee for duplicating iterator streams (with caveats)
- groupby for adjacent grouping
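tee buffers items that one branch has consumed but another has not, so memory grows if one branch lags far behind. If one branch will consume most of the data before the other starts, materializing a list is usually the better choice.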
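A compact sketch of these four tools on small in-memory inputs follows. The data and variable names are made up for illustration.

```python
from itertools import chain, groupby, islice, tee

first_three = list(islice(iter(range(10)), 3))   # take a prefix of a stream
joined = list(chain([1, 2], (3, 4)))              # concatenate iterables lazily

# groupby only merges adjacent equal keys, so "a" appears as two groups here.
runs = [(key, len(list(group))) for key, group in groupby("aabbba")]

# tee gives two independent iterators over one source; use the copies,
# not the original iterator, from this point on.
left, right = tee(iter([10, 20, 30]))
left_items = list(left)    # tee buffers these three items for `right`
right_items = list(right)
```

Because groupby works on adjacent items, input usually needs to be sorted by the grouping key first when one group per key is expected.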
Async Generators vs Sync Generators
When the data source is asynchronous (network streams, message brokers), async generators are more appropriate:
async def stream_events(client):
    async for event in client.events():
        yield event
Do not mix sync and async iteration models casually; API boundaries should make execution mode explicit.
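To show the consuming side, here is a runnable sketch with a stand-in client. FakeClient and collect are hypothetical; the only assumption carried over from the example above is that the client exposes an events() async iterator.

```python
import asyncio

class FakeClient:
    """Hypothetical stand-in for a real client; only events() is assumed."""

    async def events(self):
        for name in ("connect", "message", "close"):
            await asyncio.sleep(0)   # yield control to the event loop
            yield name

async def stream_events(client):
    async for event in client.events():
        yield event

async def collect(client):
    # Async generators are consumed with `async for`, not a plain `for`.
    return [event async for event in stream_events(client)]

collected = asyncio.run(collect(FakeClient()))
```

A plain for loop over stream_events(...) raises TypeError, which is one concrete reason to keep the execution mode explicit at API boundaries.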
Designing Iterator-Friendly APIs
Best practices for library authors:
- Accept any iterable, not only lists.
- Return iterables when streaming is possible.
- Document one-pass behavior.
- Offer explicit materialization helpers where needed.
Example:
def normalize(records):
    for r in records:
        yield {
            "id": r["id"],
            "email": r["email"].strip().lower(),
        }
Callers can choose list(normalize(...)) or stream directly.
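Both calling styles might look like the sketch below. The sample records and the seen_domains set are made up for illustration.

```python
def normalize(records):
    for r in records:
        yield {"id": r["id"], "email": r["email"].strip().lower()}

raw = [
    {"id": 1, "email": "  Ada@Example.COM "},
    {"id": 2, "email": "bob@example.com\n"},
]

# Streaming: handle one record at a time; any iterable works as input.
seen_domains = set()
for record in normalize(iter(raw)):
    seen_domains.add(record["email"].split("@")[1])

# Materializing: the caller opts in when random access or reuse is needed.
as_list = list(normalize(raw))
```

The function itself stays the same either way; the decision to pay for a full list sits with the caller.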
Common Anti-Patterns
- Converting generator to list too early, losing memory advantage.
- Hiding expensive I/O behind innocent-looking iterators without documentation.
- Reusing exhausted iterators unintentionally.
- Side effects in deeply chained generator expressions that are hard to debug.
Debugging Iterator Pipelines
Strategies:
- insert temporary taps (itertools.islice, logging wrappers)
- consume small prefixes during debugging
- unit test each stage with tiny fixtures
- assert output types/contracts explicitly
Streaming bugs are often order/consumption bugs, not algorithm bugs.
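A tap can be as small as a pass-through generator. In the sketch below, tap is a hypothetical helper that records each item in a list as it flows past; a real one might call a logger instead.

```python
from itertools import islice

def tap(items, label, sink):
    # Hypothetical debugging helper: records each item in `sink` as it
    # flows past, then passes it through unchanged.
    for item in items:
        sink.append((label, item))
        yield item

def non_empty(lines):
    for line in lines:
        if line.strip():
            yield line

seen = []
lines = ["a\n", "\n", "b\n", "c\n"]
stream = tap(non_empty(tap(lines, "raw", seen)), "kept", seen)

# Consume only a small prefix while debugging.
prefix = list(islice(stream, 2))
```

The recorded entries interleave "raw" and "kept" items in the order they were actually pulled, and the last input line is never read, which makes consumption order visible.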
One Thing to Remember
Iterators and generators are Python’s streaming backbone: design your data flow around lazy, single-pass pipelines and your programs will handle bigger workloads with less memory and cleaner architecture.