Python Generators — Core Concepts
Generators are iterators that you write as an ordinary function or expression. They let you produce a sequence lazily: each value is computed when it is requested, not all at once.
That single behavior changes both performance and architecture.
Why Generators Exist
A list comprehension computes all results immediately:
squares = [x * x for x in range(1_000_000)]
A generator expression defers computation:
squares = (x * x for x in range(1_000_000))
The generator version can hand over its first value immediately and holds only one result at a time, so it uses far less memory than the million-element list.
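A rough way to see the difference is to compare the size of the two objects. This is only a sketch: sys.getsizeof measures the container itself, not the integers it refers to, and the exact byte counts vary by Python version and platform. The names squares_list and squares_gen are illustrative.

```python
import sys

# One million squares, materialised in full versus produced on demand.
squares_list = [x * x for x in range(1_000_000)]
squares_gen = (x * x for x in range(1_000_000))

# getsizeof reports only the container, not the integers it points to,
# so the real memory gap is larger than these two numbers suggest.
list_bytes = sys.getsizeof(squares_list)
gen_bytes = sys.getsizeof(squares_gen)
print(f"list: {list_bytes:,} bytes, generator: {gen_bytes:,} bytes")

# The generator has done no work so far; the first square is computed here.
first = next(squares_gen)
print(first)  # 0
```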
Generator Function Basics
Any function containing yield becomes a generator function.
def read_chunks(items, size):
    for i in range(0, len(items), size):
        yield items[i:i + size]
Calling read_chunks(...) does not run any of the body. It returns a generator object. The first next() call (or the first loop iteration) starts execution, and each later request resumes it until the next yield.
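A small experiment makes the deferred start visible. The events list below is an addition for illustration only; it records when the function body actually begins to run.

```python
events = []  # illustrative log of when the body runs

def read_chunks(items, size):
    events.append("body started")
    for i in range(0, len(items), size):
        yield items[i:i + size]

chunks = read_chunks([1, 2, 3, 4, 5], 2)
started_after_call = list(events)   # still empty: the body has not run

first = next(chunks)                # runs up to the first yield
started_after_next = list(events)   # now contains "body started"

remaining = list(chunks)            # drains the rest of the generator
print(first, remaining)             # [1, 2] [[3, 4], [5]]
```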
Lifecycle: Start, Pause, Resume, Finish
Generator execution model:
- Created (not started)
- Runs until first yield
- Pauses and returns value
- Resumes on next request
- Ends with StopIteration
This pause/resume statefulness makes generators ideal for stream processing.
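The lifecycle above can be observed with inspect.getgeneratorstate from the standard library. The countdown generator here is a hypothetical example chosen to keep the trace short.

```python
import inspect

def countdown(n):
    while n > 0:
        yield n
        n -= 1

gen = countdown(2)
states = [inspect.getgeneratorstate(gen)]      # created, not started

first = next(gen)                              # runs to the first yield
states.append(inspect.getgeneratorstate(gen))  # paused at a yield

second = next(gen)                             # resumes, pauses again
try:
    next(gen)                                  # loop ends, body returns
    finished = False
except StopIteration:
    finished = True
states.append(inspect.getgeneratorstate(gen))  # finished

print(states)  # ['GEN_CREATED', 'GEN_SUSPENDED', 'GEN_CLOSED']
```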
Generator Expressions vs Functions
Use generator expressions for short inline logic:
total = sum(x * x for x in range(10_000))
Use generator functions when logic has multiple steps, branching, or cleanup.
def valid_emails(rows):
    for row in rows:
        email = row.get("email", "").strip().lower()
        if "@" in email:
            yield email
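valid_emails covers the multi-step and branching cases. For cleanup, one common pattern is a try/finally around the yield, sketched below. The tracked_rows function and its log list are hypothetical stand-ins for a real resource such as a file or database cursor.

```python
def tracked_rows(rows, log):
    log.append("open")           # stand-in for acquiring a resource
    try:
        for row in rows:
            yield row
    finally:
        log.append("close")      # runs even if the consumer stops early

log = []
gen = tracked_rows([{"email": "a@x.io"}, {"email": "b@x.io"}], log)
first = next(gen)                # consume only one row

# close() raises GeneratorExit at the paused yield, so the finally block runs.
gen.close()
print(log)  # ['open', 'close']
```

A generator expression has no place to put this kind of cleanup, which is one reason to prefer a function once resources are involved.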
Real Pipeline Example
def read_lines(path):
    with open(path, "r", encoding="utf-8") as f:
        for line in f:
            yield line

def parse_errors(lines):
    for line in lines:
        if "ERROR" in line:
            yield line

def extract_codes(error_lines):
    for line in error_lines:
        parts = line.split()
        if len(parts) > 2:
            yield parts[2]

codes = extract_codes(parse_errors(read_lines("app.log")))
for code in codes:
    print(code)
Each stage pulls from the previous one lazily. You can process very large logs without building giant intermediate lists.
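To check that the stages really pull on demand, the sketch below swaps read_lines for an endless in-memory source and counts how many lines are consumed. The numbered_lines helper and the log line layout (date, level, code, message) are assumptions made for this illustration, not part of the original pipeline.

```python
from itertools import islice

def parse_errors(lines):
    for line in lines:
        if "ERROR" in line:
            yield line

def extract_codes(error_lines):
    for line in error_lines:
        parts = line.split()
        if len(parts) > 2:
            yield parts[2]

def numbered_lines(consumed):
    # Hypothetical endless source; every third line is an error.
    n = 0
    while True:
        n += 1
        consumed.append(n)  # record each line handed downstream
        level = "ERROR" if n % 3 == 0 else "INFO"
        yield f"2024-01-01 {level} E{n:03d} something happened"

consumed = []
codes = extract_codes(parse_errors(numbered_lines(consumed)))

# Ask for two codes only; the endless source is read just far enough.
first_two = list(islice(codes, 2))
print(first_two)      # ['E003', 'E006']
print(len(consumed))  # 6
```

Because the source never ends, an eager version that built a list at any stage would not terminate; the lazy pipeline stops as soon as the consumer does.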
yield from
When one generator delegates to another, yield from keeps code concise.
def chain(*iterables):
    for it in iterables:
        yield from it
Equivalent manual loops are more verbose and easier to get wrong.
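For plain iteration, the manual form is a nested loop, shown here as chain_manual for comparison. yield from also passes the inner generator's return value back to the caller, which a plain loop does not do. The inner and outer functions are illustrative names.

```python
def chain(*iterables):
    for it in iterables:
        yield from it

def chain_manual(*iterables):
    # Same output as chain for plain iteration, with one more loop level.
    for it in iterables:
        for item in it:
            yield item

delegated = list(chain([1, 2], "ab", range(2)))
manual = list(chain_manual([1, 2], "ab", range(2)))
print(delegated == manual)  # True

def inner():
    yield 1
    return "done"             # becomes the value of the yield from expression

def outer():
    result = yield from inner()
    yield result

print(list(outer()))  # [1, 'done']
```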
Common Misconception
Misconception: generators are always faster.
Reality: they are often more memory efficient and can reduce startup time, but per-item overhead can make them slower than vectorized or list-based operations for small datasets. Choose based on workload:
- huge/streaming data → generators shine
- tiny in-memory transforms → a list may be simpler and fast enough
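If the choice matters for a particular workload, it is easy to measure rather than guess. The sketch below uses timeit on a deliberately small input; the function names and the input size are arbitrary, and the timings depend on the machine and Python version, so no expected numbers are shown.

```python
import timeit

data = list(range(100))  # small, already in memory

def with_list():
    # Builds the full list of squares, then sums it.
    return sum([x * x for x in data])

def with_generator():
    # Feeds squares to sum one at a time.
    return sum(x * x for x in data)

list_time = timeit.timeit(with_list, number=10_000)
gen_time = timeit.timeit(with_generator, number=10_000)

print(f"list comprehension:   {list_time:.4f} s")
print(f"generator expression: {gen_time:.4f} s")
```

Both functions return the same total; only the time and peak memory differ, and on inputs this small the difference is usually too slight to drive the design.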
One Thing to Remember
Generators trade immediate full results for lazy, incremental processing; that trade unlocks memory-efficient pipelines for real-world data flows.