Python Iterators — Deep Dive
Iterators are one of Python’s strongest abstraction layers: a common contract for finite containers, infinite streams, generators, files, sockets, and custom pipelines.
Once you design APIs around iterables/iterators instead of concrete lists, systems become more composable and memory-efficient.
Protocol Details and Guarantees
An iterator must:
- implement __next__()
- return itself from __iter__()
- raise StopIteration permanently once exhausted
That third point is important: after exhaustion, further next() calls should continue raising StopIteration. Reviving an exhausted iterator breaks caller assumptions.
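As a quick check of these guarantees, the sketch below uses a built-in list iterator. The variable names are illustrative only.

```python
it = iter([1, 2])

# An iterator returns itself from __iter__().
same_object = iter(it) is it

first = next(it)
second = next(it)

# Once exhausted, every further next() call raises StopIteration again.
raised = 0
for _ in range(3):
    try:
        next(it)
    except StopIteration:
        raised += 1

# next() with a default returns the default instead of raising.
fallback = next(it, None)
```

The two-argument form of next() is often the simplest way to probe an iterator without writing a try/except block.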
Iterable Containers vs Iterator Objects
Containers usually produce fresh iterators each time:
nums = [1, 2, 3]
print(list(iter(nums))) # [1, 2, 3]
print(list(iter(nums))) # [1, 2, 3] again
Iterator objects are stateful and usually one-shot:
it = iter(nums)
print(list(it)) # [1, 2, 3]
print(list(it)) # []
API design implication: if consumers may iterate multiple times, return an iterable container or a factory function, not a one-shot iterator.
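A minimal sketch of the difference, assuming two hypothetical API functions (read_numbers_once and numbers_factory are invented for illustration):

```python
from typing import Callable, Iterator


def read_numbers_once() -> Iterator[int]:
    """Hypothetical one-shot API: the caller gets a single pass."""
    return iter([1, 2, 3])


def numbers_factory() -> Callable[[], Iterator[int]]:
    """Hypothetical repeatable API: each call to the result starts a new pass."""
    data = (1, 2, 3)
    return lambda: iter(data)


one_shot = read_numbers_once()
total_a = sum(one_shot)  # consumes the iterator
total_b = sum(one_shot)  # nothing left, so the sum of an empty iterator is 0

make_iter = numbers_factory()
total_c = sum(make_iter())
total_d = sum(make_iter())  # a fresh iterator, so the full total again
```

The silent 0 in the second pass is the kind of bug that a factory or container return type prevents.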
StopIteration Semantics and Generator Interaction
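Inside generators, a StopIteration that leaks out of an inner call (for example a bare next() on an empty iterator) used to end the generator silently. Since Python 3.7, PEP 479 converts any StopIteration that escapes a generator body into RuntimeError, so the bug surfaces as an error instead of as truncated output.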
When writing iterator-based helpers, be explicit about termination boundaries and avoid using StopIteration for non-termination control flow.
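The following sketch illustrates the PEP 479 behavior on Python 3.7 or later. Both helper functions are hypothetical examples, not part of any library.

```python
def first_items(iterators):
    """Yield the first item of each iterator (buggy on empty inputs)."""
    for it in iterators:
        # next() raises StopIteration if `it` is empty. Under PEP 479 that
        # exception leaves the generator body as RuntimeError.
        yield next(it)


def first_items_safe(iterators):
    """Same idea, with the termination boundary made explicit."""
    for it in iterators:
        try:
            item = next(it)
        except StopIteration:
            continue  # skip empty inputs instead of ending the generator
        yield item


try:
    list(first_items([iter([1]), iter([]), iter([3])]))
    outcome = "no error"
except RuntimeError:
    outcome = "RuntimeError"

safe = list(first_items_safe([iter([1]), iter([]), iter([3])]))
```

Catching StopIteration right at the next() call keeps the decision about what an empty input means local and visible.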
Sentinel Iterators
iter(callable, sentinel) creates an iterator that calls the function repeatedly with no arguments and stops as soon as a returned value equals the sentinel.
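def read_chunk(f, size=8192):
    # Return up to `size` bytes; read() returns b"" at end of file.
    return f.read(size)

with open("data.bin", "rb") as f:
    # process() stands in for whatever handles each chunk.
    for chunk in iter(lambda: read_chunk(f), b""):
        process(chunk)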
This is elegant for chunked I/O loops and reduces boilerplate.
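A self-contained variant of the same pattern, assuming an in-memory io.BytesIO buffer as a stand-in for a real file and functools.partial in place of the lambda:

```python
import io
from functools import partial

buffer = io.BytesIO(b"abcdefghij")  # stand-in for a real binary file

# partial(buffer.read, 4) is a zero-argument callable. Iteration stops
# when it returns b"", which is what read() returns at end of stream.
chunks = list(iter(partial(buffer.read, 4), b""))
```

The last chunk may be shorter than the requested size, so downstream code should not assume fixed-length chunks.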
itertools as a Stream Algebra
itertools functions are optimized, lazy combinators. A few high-value patterns:
chain and chain.from_iterable
from itertools import chain
all_rows = chain(rows_day1, rows_day2, rows_day3)
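chain.from_iterable covers the case where the inputs themselves arrive lazily. In this sketch, daily_batches is a hypothetical producer that yields one list of rows per day:

```python
from itertools import chain


def daily_batches():
    """Hypothetical producer: yields one list of rows per day."""
    yield ["d1-r1", "d1-r2"]
    yield []  # a day with no rows is simply skipped over
    yield ["d3-r1"]


# Flattens one level without materializing the outer sequence first.
rows = list(chain.from_iterable(daily_batches()))
```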
islice for bounded consumption
from itertools import islice
preview = list(islice(big_stream, 10))
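One detail worth seeing in action: islice advances the underlying iterator, so the previewed items are gone from the stream afterwards. A small sketch using itertools.count as an infinite source:

```python
from itertools import count, islice

stream = count(start=0, step=5)  # infinite: 0, 5, 10, ...

preview = list(islice(stream, 3))  # takes exactly three items
next_item = next(stream)           # the stream continues after the preview
```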
tee caveat
from itertools import tee
it1, it2 = tee(source, 2)
tee buffers internally when one branch lags behind the other, which can grow memory unexpectedly. In high-throughput pipelines, re-create the iterator from the underlying source where possible rather than leaning heavily on tee.
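The worst case for buffering is draining one branch completely before touching the other, as in this small sketch:

```python
from itertools import tee

source = iter(range(5))
left, right = tee(source, 2)
# After tee(), use only the branches; advancing `source` directly
# would make both branches miss those items.

left_items = list(left)    # tee now has to hold all 5 items for `right`
right_items = list(right)  # served from tee's internal buffer
```

With five items this is harmless. With millions, the buffer holds everything the lagging branch has not yet seen.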
Writing Robust Custom Iterators
Separate iterable and iterator when repeatability matters
class RangeLike:
    def __init__(self, start, stop):
        self.start = start
        self.stop = stop

    def __iter__(self):
        return RangeLikeIterator(self.start, self.stop)


class RangeLikeIterator:
    def __init__(self, current, stop):
        self.current = current
        self.stop = stop

    def __iter__(self):
        return self

    def __next__(self):
        if self.current >= self.stop:
            raise StopIteration
        v = self.current
        self.current += 1
        return v
This avoids shared mutable cursor bugs when the same object is looped over in nested contexts.
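To make the shared-cursor bug concrete, here is a sketch contrasting two hypothetical classes. SharedCursor is the anti-pattern; FreshCursor uses a generator-based __iter__, which is a shorter alternative to writing a separate iterator class and is not the exact structure shown above.

```python
class SharedCursor:
    """Anti-pattern: the container is its own iterator."""

    def __init__(self, stop):
        self.stop = stop
        self.current = 0

    def __iter__(self):
        return self

    def __next__(self):
        if self.current >= self.stop:
            raise StopIteration
        v = self.current
        self.current += 1
        return v


class FreshCursor:
    """Each __iter__() call starts an independent pass."""

    def __init__(self, stop):
        self.stop = stop

    def __iter__(self):
        # A generator function gives every loop its own cursor.
        current = 0
        while current < self.stop:
            yield current
            current += 1


def pairs(obj):
    """Nested iteration over the same object."""
    return [(a, b) for a in obj for b in obj]


shared_pairs = pairs(SharedCursor(3))  # inner loop steals the outer loop's items
fresh_pairs = pairs(FreshCursor(2))    # full cross product
```

With the shared cursor, the inner loop exhausts the object during the first outer step, so the result is a short, misleading list rather than an error.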
Implement __length_hint__ cautiously
Some iterators provide length hints to help consumers preallocate, but incorrect hints can hurt performance or break assumptions that consumers build on them. Omit __length_hint__ unless the size estimate is reliable.
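A minimal sketch of a reliable hint, using a hypothetical Countdown iterator whose remaining size is always known exactly. Hints are read through operator.length_hint:

```python
from operator import length_hint


class Countdown:
    """Hypothetical iterator that knows exactly how many items remain."""

    def __init__(self, start):
        self.remaining = start

    def __iter__(self):
        return self

    def __next__(self):
        if self.remaining <= 0:
            raise StopIteration
        self.remaining -= 1
        return self.remaining + 1

    def __length_hint__(self):
        # Exact here; only worth implementing when the estimate is trustworthy.
        return self.remaining


c = Countdown(3)
hint_before = length_hint(c)
first = next(c)
hint_after = length_hint(c)
rest = list(c)

# Objects without __len__ or __length_hint__ fall back to the default (0).
no_hint = length_hint(x for x in range(3))
```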
Performance Considerations
Iterator-driven code often wins on peak memory and startup latency, but might lose in raw throughput versus vectorized approaches.
Tradeoff matrix:
- Iterator pipeline: low memory, composable, Python-level overhead
- Materialized list: faster repeated indexing, higher memory
- Vectorized arrays (NumPy/Pandas): best CPU throughput for numeric batch operations, less flexible for irregular stream logic
Profile on realistic input size and access patterns before choosing architecture.
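As a rough illustration of the memory side of that tradeoff, the sketch below compares container sizes with sys.getsizeof. It measures only the list object and the generator object, not the integers they reference, so treat the numbers as indicative rather than as a benchmark.

```python
import sys

n = 100_000

materialized = [i * i for i in range(n)]  # all results held at once
lazy = (i * i for i in range(n))          # results produced on demand

list_bytes = sys.getsizeof(materialized)  # grows with n
gen_bytes = sys.getsizeof(lazy)           # small and independent of n

# Both forms produce the same values; only the memory profile differs.
total_list = sum(materialized)
total_lazy = sum(lazy)
```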
Concurrency and Iterators
Iterators are usually not thread-safe. Sharing one iterator across threads without synchronization can interleave next() calls unpredictably.
Patterns:
- keep iterator ownership single-threaded
- push items into queue.Queue for worker fan-out
- for async contexts, prefer async iterators (__aiter__, __anext__)
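The first two patterns can be sketched together: one thread owns the iterator and feeds a queue.Queue, and workers read only from the queue. The fan_out function, its doubling "work", and the STOP marker are assumptions made for illustration.

```python
import queue
import threading


def fan_out(items, worker_count=3):
    """Consume `items` in the calling thread and hand each one to a worker."""
    work = queue.Queue(maxsize=100)  # bounded, so the producer cannot run far ahead
    results = []
    results_lock = threading.Lock()
    STOP = object()  # unique marker telling a worker to exit

    def worker():
        while True:
            item = work.get()
            if item is STOP:
                break
            with results_lock:
                results.append(item * 2)  # placeholder for real work

    threads = [threading.Thread(target=worker) for _ in range(worker_count)]
    for t in threads:
        t.start()

    for item in items:  # only this thread ever calls next() on the iterator
        work.put(item)
    for _ in threads:
        work.put(STOP)  # one stop marker per worker
    for t in threads:
        t.join()
    return results


doubled = sorted(fan_out(iter(range(10))))
```

Results arrive in nondeterministic order, which is why the example sorts them before use.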
Async Iterator Parallels
The async iterator protocol mirrors the sync one:
- __aiter__() returns the async iterator object
- __anext__() returns an awaitable and raises StopAsyncIteration once exhausted
Used in streaming APIs, websocket message loops, and async DB cursors.
Understanding sync iterators first makes async iteration much easier to reason about.
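A minimal sketch of the async protocol, using a hypothetical Ticker class. asyncio.sleep(0) stands in for real awaited I/O.

```python
import asyncio


class Ticker:
    """Hypothetical async iterator yielding 1..limit."""

    def __init__(self, limit):
        self.limit = limit
        self.current = 0

    def __aiter__(self):
        return self

    async def __anext__(self):
        if self.current >= self.limit:
            raise StopAsyncIteration
        await asyncio.sleep(0)  # placeholder for a network or disk wait
        self.current += 1
        return self.current


async def collect():
    # `async for` (here in a comprehension) drives __anext__ until
    # StopAsyncIteration is raised.
    return [value async for value in Ticker(3)]


ticks = asyncio.run(collect())
```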
Real-World Iterator-Centric APIs
- Database cursor fetch iterators
- SDK pagination objects yielding records lazily
- Log processors with transform/filter chains
- Web crawlers yielding discovered URLs incrementally
- ETL stages that consume/produce streams for constant memory behavior
Defensive Consumption Patterns
When consuming external iterators, guard against infinite streams and malformed producers. Use bounded utilities like islice, explicit timeouts in I/O-backed iterators, and counters for safety limits. In production batch systems, this prevents jobs from hanging forever on unexpected upstream behavior.
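One way to sketch a safety limit is a wrapper that fails loudly instead of truncating silently. The bounded helper and the LimitExceeded exception are hypothetical names, not library features.

```python
from itertools import count, islice


class LimitExceeded(RuntimeError):
    """Raised when a producer yields more items than the caller allows."""


def bounded(iterable, max_items):
    """Yield at most max_items items, then raise if the producer has more."""
    it = iter(iterable)
    yield from islice(it, max_items)
    # Probe for one extra item; the default avoids raising StopIteration here.
    missing = object()
    if next(it, missing) is not missing:
        raise LimitExceeded(f"producer yielded more than {max_items} items")


within_limit = list(bounded([1, 2, 3], 5))

collected = []
try:
    for item in bounded(count(), 4):  # count() never ends on its own
        collected.append(item)
    hit_limit = False
except LimitExceeded:
    hit_limit = True
```

If silent truncation is acceptable, plain islice is enough; the wrapper is for cases where exceeding the limit signals an upstream fault.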
Contract Documentation
Iterator-returning functions should document four things clearly: whether output is one-pass, whether order is stable, whether exhaustion is final, and whether iteration triggers side effects (network calls, disk reads, mutation). These details are often more important than the nominal element type because they shape how downstream code can safely reuse results.
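A docstring is one reasonable place to record that contract. The function below, iter_records, is a hypothetical example written only to show the shape of such documentation.

```python
from typing import Iterator


def iter_records(path: str) -> Iterator[str]:
    """Yield stripped, non-empty lines from a text file.

    Iteration contract:
    - One-pass: the result is a generator; call the function again to re-read.
    - Order: lines are yielded in file order.
    - Exhaustion: final; an exhausted generator stays exhausted.
    - Side effects: the file is opened and read lazily, on the first next()
      call rather than when iter_records() is called.
    """
    with open(path, encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:
                yield line
```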
Observability for Streaming Iteration
In long-running iterator pipelines, emit counters for consumed records, dropped records, and end-of-stream events. These metrics make iterator health visible and help separate true upstream starvation from local parser failures. Streaming correctness is easier to maintain when iteration progress is measurable.
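A minimal sketch of such counters, assuming a plain collections.Counter as the metrics sink. The counted wrapper and the metric names are illustrative; a real system would forward them to its own metrics client.

```python
from collections import Counter


def counted(iterable, metrics, keep=lambda item: True):
    """Pass items through while recording consumed, dropped, and end-of-stream."""
    for item in iterable:
        metrics["consumed"] += 1
        if keep(item):
            yield item
        else:
            metrics["dropped"] += 1
    # Reached only when the upstream iterable is fully drained.
    metrics["end_of_stream"] += 1


metrics = Counter()
kept = list(counted(["ok", "", "ok", ""], metrics, keep=bool))
```

Because the end-of-stream counter is incremented only after the loop finishes, a consumer that stops early leaves it at zero, which is itself a useful signal.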
One Thing to Remember
Iterator design is API design: deciding whether data is repeatable, one-pass, finite, or infinite determines correctness, memory use, and how safely your code composes with the rest of Python.