Python Iterators — Deep Dive

Iterators are one of Python’s strongest abstraction layers: a common contract for finite containers, infinite streams, generators, files, sockets, and custom pipelines.

Once you design APIs around iterables/iterators instead of concrete lists, systems become more composable and memory-efficient.

Protocol Details and Guarantees

An iterator must:

  1. implement __next__()
  2. return itself from __iter__()
  3. raise StopIteration permanently once exhausted

That third point is important: after exhaustion, further next() calls should continue raising StopIteration. Reviving an exhausted iterator breaks caller assumptions.
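The three requirements can be sketched with a small hand-written iterator. `Countdown` is a hypothetical name chosen for illustration, not something from the standard library.

```python
class Countdown:
    """Hypothetical iterator that yields n, n-1, ..., 1."""

    def __init__(self, n):
        self._remaining = n

    def __iter__(self):
        # Requirement 2: an iterator returns itself.
        return self

    def __next__(self):
        # Requirement 3: once exhausted, every later call raises again.
        if self._remaining <= 0:
            raise StopIteration
        value = self._remaining
        self._remaining -= 1
        return value


it = Countdown(2)
first_pass = list(it)            # [2, 1]
second_pass = list(it)           # [] because the iterator stays exhausted
is_own_iterator = iter(it) is it
```

Because the exhaustion check runs before any state changes, the object cannot come back to life by accident.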

Iterable Containers vs Iterator Objects

Containers usually produce fresh iterators each time:

nums = [1, 2, 3]
print(list(iter(nums)))  # [1, 2, 3]
print(list(iter(nums)))  # [1, 2, 3] again

Iterator objects are stateful and usually one-shot:

it = iter(nums)
print(list(it))  # [1, 2, 3]
print(list(it))  # []

API design implication: if consumers may iterate multiple times, return an iterable container or a factory function, not an already-consumed iterator.
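A minimal sketch of that difference, assuming a hypothetical score-parsing helper. `read_scores_once` and `Scores` are illustrative names only.

```python
def read_scores_once(rows):
    """Returns a generator: one-shot, used up by the first full pass."""
    return (int(r) for r in rows)


class Scores:
    """Re-iterable wrapper: each __iter__ call builds a fresh generator."""

    def __init__(self, rows):
        self._rows = list(rows)

    def __iter__(self):
        return (int(r) for r in self._rows)


one_shot = read_scores_once(["1", "2"])
total = sum(one_shot)
leftover = list(one_shot)        # [] because sum() already consumed it

repeatable = Scores(["1", "2"])
total_again = sum(repeatable)
highest = max(repeatable)        # second pass still sees every row
```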

StopIteration Semantics and Generator Interaction

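Inside a generator, a StopIteration that escapes from an inner call (for example, a bare next() on an empty iterator) used to end the generator silently, which hid bugs. PEP 479 changed this: since Python 3.7, a StopIteration raised inside a generator body is converted to RuntimeError.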

When writing iterator-based helpers, be explicit about termination boundaries and avoid using StopIteration for non-termination control flow.
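The sketch below shows the failure mode on Python 3.7 or later, plus one way to make the termination boundary explicit. Both helper functions are hypothetical examples.

```python
def first_of_each(groups):
    """Buggy on purpose: next() on an empty group raises StopIteration."""
    for group in groups:
        yield next(iter(group))


def first_of_each_safe(groups):
    """Passes a default to next() so an empty group is skipped explicitly."""
    missing = object()
    for group in groups:
        head = next(iter(group), missing)
        if head is not missing:
            yield head


data = [[1, 2], [], [3]]

try:
    list(first_of_each(data))
    outcome = "no error"
except RuntimeError:
    # The leaked StopIteration surfaces as RuntimeError (PEP 479).
    outcome = "RuntimeError"

safe_result = list(first_of_each_safe(data))
```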

Sentinel Iterators

iter(callable, sentinel) creates an iterator that calls a zero-argument function repeatedly and stops as soon as the function returns a value equal to the sentinel.

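def read_chunk(f, size=8192):
    # Returns b"" at end of file, which serves as the sentinel below.
    return f.read(size)

with open("data.bin", "rb") as f:
    # process() stands for whatever the caller does with each chunk.
    for chunk in iter(lambda: read_chunk(f), b""):
        process(chunk)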

This is elegant for chunked I/O loops and reduces boilerplate.
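A self-contained variant of the same idea, using functools.partial instead of a lambda and an in-memory io.BytesIO as a stand-in for a real file. The 4-byte chunk size is arbitrary.

```python
import io
from functools import partial

stream = io.BytesIO(b"abcdefghij")   # stand-in for a real binary file

# partial(stream.read, 4) is a zero-argument callable; iteration stops
# when it returns the sentinel b"" at end of stream.
chunks = list(iter(partial(stream.read, 4), b""))
```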

itertools as a Stream Algebra

itertools functions are optimized, lazy combinators. A few high-value patterns:

chain and chain.from_iterable

from itertools import chain

all_rows = chain(rows_day1, rows_day2, rows_day3)
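chain.from_iterable covers the case where the inputs themselves arrive lazily, so they cannot be listed as separate arguments. A small sketch, with `daily_batches` as a hypothetical producer:

```python
from itertools import chain


def daily_batches():
    """Hypothetical producer: yields one list of rows per day."""
    yield ["a1", "a2"]
    yield []                 # a day with no rows is simply skipped over
    yield ["c1"]


# Flattens one level without building an intermediate list of batches.
flat = list(chain.from_iterable(daily_batches()))
```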

islice for bounded consumption

from itertools import islice
preview = list(islice(big_stream, 10))

tee caveat

from itertools import tee
it1, it2 = tee(source, 2)

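tee buffers every item that one branch has consumed and another has not, so memory grows with the gap between the fastest and slowest branch. Once tee has been called, the original iterator should not be advanced directly, or the branches will miss items. In high-throughput pipelines, if the source can be re-created cheaply, build a second iterator from it instead of relying on tee.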
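The two options side by side, as a rough sketch. `numbers` is a hypothetical source that is cheap to re-create.

```python
from itertools import tee


def numbers():
    """Hypothetical source that can be rebuilt on demand."""
    yield from range(5)


# Option A: tee. Draining `a` first makes tee hold all five items for `b`.
a, b = tee(numbers(), 2)
total = sum(a)
largest = max(b)

# Option B: re-create the source; nothing is buffered between passes.
total_again = sum(numbers())
largest_again = max(numbers())
```

Option B only applies when producing the data twice is acceptable; for a network stream or other one-time source, tee or an explicit list may be the only choice.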

Writing Robust Custom Iterators

Separate iterable and iterator when repeatability matters

class RangeLike:
    def __init__(self, start, stop):
        self.start = start
        self.stop = stop

    def __iter__(self):
        return RangeLikeIterator(self.start, self.stop)

class RangeLikeIterator:
    def __init__(self, current, stop):
        self.current = current
        self.stop = stop

    def __iter__(self):
        return self

    def __next__(self):
        if self.current >= self.stop:
            raise StopIteration
        v = self.current
        self.current += 1
        return v

This avoids shared mutable cursor bugs when the same object is looped over in nested contexts.
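A common shorthand for the same separation is to write __iter__ as a generator method, so each call gets its own cursor without a second class. `RangeLikeGen` is a hypothetical variant of the RangeLike class above.

```python
class RangeLikeGen:
    """Hypothetical variant of RangeLike using a generator-based __iter__."""

    def __init__(self, start, stop):
        self.start = start
        self.stop = stop

    def __iter__(self):
        # Local variable, so every iteration has an independent cursor.
        current = self.start
        while current < self.stop:
            yield current
            current += 1


r = RangeLikeGen(0, 2)
# Nested loops over the same object do not interfere with each other.
pairs = [(i, j) for i in r for j in r]
```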

Implement __length_hint__ cautiously

Some iterators implement __length_hint__ so that consumers such as list() can preallocate. The value is only an estimate: a badly wrong hint wastes memory or forces extra reallocation, and consumers must never rely on it for correctness. Omit it unless the size estimate is reliable.
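A sketch of a reliable hint, read through operator.length_hint. `CountedIterator` is a hypothetical class whose remaining size is always known exactly.

```python
from operator import length_hint


class CountedIterator:
    """Hypothetical iterator over a known-size snapshot of items."""

    def __init__(self, items):
        self._items = list(items)
        self._index = 0

    def __iter__(self):
        return self

    def __next__(self):
        if self._index >= len(self._items):
            raise StopIteration
        value = self._items[self._index]
        self._index += 1
        return value

    def __length_hint__(self):
        # Exact here; in general this only needs to be a good estimate.
        return len(self._items) - self._index


it = CountedIterator("abc")
hint_before = length_hint(it)
next(it)
hint_after = length_hint(it)

# A generator offers no hint, so the caller-supplied default is returned.
no_hint = length_hint((x for x in ()), 7)
```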

Performance Considerations

Iterator-driven code often wins on peak memory and startup latency, but might lose in raw throughput versus vectorized approaches.

Tradeoff matrix:

  • Iterator pipeline: low memory, composable, Python-level overhead
  • Materialized list: faster repeated indexing, higher memory
  • Vectorized arrays (NumPy/Pandas): best CPU throughput for numeric batch operations, less flexible for irregular stream logic

Profile on realistic input size and access patterns before choosing architecture.

Concurrency and Iterators

Iterators are usually not thread-safe. Sharing one iterator across threads without synchronization can interleave next() calls unpredictably.

Patterns:

  • keep iterator ownership single-threaded
  • push items into queue.Queue for worker fan-out
  • for async contexts, prefer async iterators (__aiter__, __anext__)
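The first two patterns can be combined as in the sketch below: one thread owns the iterator and feeds a queue.Queue, and workers only ever touch the queue. `fan_out`, the squaring step, and the queue size are illustrative assumptions.

```python
import queue
import threading


def fan_out(items, worker_count=3):
    """Single producer owns the iterator; workers read from the queue."""
    work = queue.Queue(maxsize=8)    # bounded, so the producer cannot race ahead
    results = []
    results_lock = threading.Lock()
    stop = object()                  # unique end-of-work marker

    def worker():
        while True:
            item = work.get()
            if item is stop:
                break
            with results_lock:
                results.append(item * item)

    threads = [threading.Thread(target=worker) for _ in range(worker_count)]
    for t in threads:
        t.start()
    for item in items:               # only this thread advances the iterator
        work.put(item)
    for _ in threads:                # one stop marker per worker
        work.put(stop)
    for t in threads:
        t.join()
    return sorted(results)


squares = fan_out(iter(range(5)))
```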

Async Iterator Parallels

The async iterator protocol mirrors the sync one:

  • __aiter__() returns the async iterator itself
  • __anext__() returns an awaitable and raises StopAsyncIteration when exhausted

Used in streaming APIs, websocket message loops, and async DB cursors.
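A minimal sketch of the protocol, with `Ticker` as a hypothetical async iterator and asyncio.sleep(0) standing in for real I/O:

```python
import asyncio


class Ticker:
    """Hypothetical async iterator that counts down to zero."""

    def __init__(self, count):
        self._remaining = count

    def __aiter__(self):
        return self

    async def __anext__(self):
        if self._remaining <= 0:
            raise StopAsyncIteration
        self._remaining -= 1
        await asyncio.sleep(0)       # placeholder for awaiting real I/O
        return self._remaining


async def collect():
    # `async for` calls __anext__ and awaits each result.
    return [tick async for tick in Ticker(3)]


ticks = asyncio.run(collect())
```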

Understanding sync iterators first makes async iteration much easier to reason about.

Real-World Iterator-Centric APIs

  • Database cursor fetch iterators
  • SDK pagination objects yielding records lazily
  • Log processors with transform/filter chains
  • Web crawlers yielding discovered URLs incrementally
  • ETL stages that consume/produce streams for constant memory behavior

Defensive Consumption Patterns

When consuming external iterators, guard against infinite streams and malformed producers. Use bounded utilities like islice, explicit timeouts in I/O-backed iterators, and counters for safety limits. In production batch systems, this prevents jobs from hanging forever on unexpected upstream behavior.
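One possible shape for such a guard is sketched below. `bounded` and `TooManyItems` are hypothetical names; whether to raise or silently truncate at the limit is a policy decision for the caller.

```python
from itertools import count, islice


class TooManyItems(RuntimeError):
    """Hypothetical error for a source that exceeds its safety limit."""


_MISSING = object()


def bounded(iterable, limit):
    """Yield at most `limit` items, then fail loudly if more remain."""
    it = iter(iterable)
    yield from islice(it, limit)
    # Pull one more item; the default keeps StopIteration from leaking.
    if next(it, _MISSING) is not _MISSING:
        raise TooManyItems(f"source exceeded {limit} items")


preview = list(islice(count(), 3))   # truncating an infinite stream
small = list(bounded([1, 2], 5))     # under the limit, passes through

try:
    list(bounded(count(), 3))
    overflowed = False
except TooManyItems:
    overflowed = True
```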

Contract Documentation

Iterator-returning functions should document four things clearly: whether output is one-pass, whether order is stable, whether exhaustion is final, and whether iteration triggers side effects (network calls, disk reads, mutation). These details are often more important than the nominal element type because they shape how downstream code can safely reuse results.

Observability for Streaming Iteration

In long-running iterator pipelines, emit counters for consumed records, dropped records, and end-of-stream events. These metrics make iterator health visible and help separate true upstream starvation from local parser failures. Streaming correctness is easier to maintain when iteration progress is measurable.
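As a rough sketch, those counters can live in a thin generator wrapper. `observed` is a hypothetical helper, and a collections.Counter stands in for a real metrics client.

```python
from collections import Counter


def observed(records, parse, stats):
    """Wrap a stream, counting consumed, dropped, and end-of-stream events."""
    for raw in records:
        stats["consumed"] += 1
        try:
            value = parse(raw)
        except ValueError:
            # Local parse failure: count it and keep the stream moving.
            stats["dropped"] += 1
            continue
        yield value
    # Reached only when the upstream iterator is exhausted.
    stats["end_of_stream"] += 1


stats = Counter()
parsed = list(observed(["1", "x", "3"], int, stats))
```

A stalled consumed counter with no end-of-stream event points at upstream starvation, while a rising dropped counter points at the local parser.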

One Thing to Remember

Iterator design is API design: deciding whether data is repeatable, one-pass, finite, or infinite determines correctness, memory use, and how safely your code composes with the rest of Python.
