Generators and yield: Sequences That Pay as They Go

Pyford Notes, July 1, 2026
Key points
  • A generator function uses yield instead of return and produces values one at a time.
  • The function body pauses at each yield and resumes on the next call to next().
  • Memory use stays roughly constant however long the sequence is, because only one value is held at a time.
  • Generator expressions ((x for x in ...)) give you the same benefit without defining a function.

The memory problem

Imagine reading a 2 GB log file to find lines matching a pattern. A list-based approach loads the entire file into memory before you process a single line:

with open("server.log") as fh:
    lines = fh.readlines()   # the whole 2 GB file is now in RAM
matches = [line for line in lines if "ERROR" in line]

On a machine with 4 GB of RAM, this leaves very little room for anything else. The problem is not the processing—it is the eagerness. You asked for a list, so Python built the whole list before returning control to you. Generators flip this: they produce each value only when the consumer asks for it.

How yield works

Any function containing yield is a generator function. Calling it does not execute the body at all—it returns a generator object. Values come out one at a time through next():

def count_up(start, stop):
    n = start
    while n <= stop:
        yield n
        n += 1

gen = count_up(1, 5)
print(next(gen))  # 1
print(next(gen))  # 2
print(next(gen))  # 3

At each yield n, the function pauses. Its entire local state—the value of n, where execution stopped, the call stack frame—is preserved. The next call to next() resumes exactly where it left off. When the function body exits (or hits return), a StopIteration exception is raised, signalling the end of the sequence.
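The lifecycle described above can be observed directly. The following is a minimal sketch; `countdown` and the `log` list are illustrative names introduced here, not part of the earlier example.

```python
log = []  # illustrative helper: records when the body actually runs

def countdown(n):
    log.append("body started")
    while n > 0:
        yield n
        n -= 1
    log.append("body finished")

gen = countdown(2)
started_on_call = bool(log)   # False: calling the function ran none of the body

first = next(gen)             # runs the body up to the first yield
second = next(gen)            # resumes after the yield, with n preserved

try:
    next(gen)                 # the body runs off the end
    exhausted = False
except StopIteration:
    exhausted = True

print(started_on_call, first, second, exhausted)
print(log)
```

Nothing is appended to `log` until the first `next()` call, and the third `next()` raises StopIteration only after the rest of the body has run.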

A for loop calls next() automatically and catches StopIteration, so you rarely need to call next() directly:

for n in count_up(1, 1_000_000):
    process(n)   # only one integer in memory at a time
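Roughly speaking, the for loop above behaves like the hand-written loop below. This is a sketch of the iteration protocol, not the interpreter's actual implementation; the `seen` list is only there to collect results.

```python
def count_up(start, stop):
    n = start
    while n <= stop:
        yield n
        n += 1

seen = []
it = iter(count_up(1, 3))    # a generator is its own iterator
while True:
    try:
        n = next(it)         # ask for the next value
    except StopIteration:    # the generator has finished
        break
    seen.append(n)

print(seen)
```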

Rewriting the log example as a generator function keeps memory usage to roughly one line at a time instead of 2 GB:

def error_lines(path):
    with open(path) as fh:
        for line in fh:
            if "ERROR" in line:
                yield line.rstrip()

for line in error_lines("server.log"):
    print(line)

Generator expressions

For simple one-liners, a generator expression uses the same syntax as a list comprehension but with parentheses instead of brackets:

# list comprehension: builds the entire list immediately
squares_list = [n ** 2 for n in range(10_000)]

# generator expression: computes one value per iteration
squares_gen = (n ** 2 for n in range(10_000))

total = sum(squares_gen)   # sum consumes the generator without storing squares

When passing a generator expression as the sole argument to a function, the extra set of parentheses is not needed:

total = sum(n ** 2 for n in range(10_000))
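One rough way to see the difference is `sys.getsizeof`. It reports only the size of the container object itself, not the integers a list refers to, so treat the numbers as an indication rather than a full measurement. Exact values vary by Python version and platform.

```python
import sys

squares_list = [n ** 2 for n in range(10_000)]
squares_gen = (n ** 2 for n in range(10_000))

list_bytes = sys.getsizeof(squares_list)  # grows with the number of items
gen_bytes = sys.getsizeof(squares_gen)    # small and independent of the range size

print(list_bytes, gen_bytes)

# Both produce the same values; only the storage strategy differs.
same_total = sum(squares_gen) == sum(squares_list)
print(same_total)
```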

Chaining generators into pipelines

Because generators are lazy, you can compose them into processing pipelines where each stage only runs when the next stage pulls a value:

def read_lines(path):
    with open(path) as fh:
        yield from fh

def only_errors(lines):
    for line in lines:
        if "ERROR" in line:
            yield line.strip()

def parse_timestamp(lines):
    for line in lines:
        ts, _, msg = line.partition(" ")
        yield ts, msg

pipeline = parse_timestamp(only_errors(read_lines("server.log")))
for ts, msg in pipeline:
    print(ts, msg)

The file is read one line at a time. Each line passes through only_errors and parse_timestamp before the next line is even read. Memory stays flat regardless of file size.
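The interleaving can be checked with a small trace. This sketch swaps the file for an in-memory list and adds an `events` log, both of which are illustrative assumptions. The stages mirror `read_lines` and `only_errors` above.

```python
events = []  # illustrative trace of what each stage does, in order

def read_lines(lines):
    for line in lines:
        events.append(f"read {line!r}")
        yield line

def only_errors(lines):
    for line in lines:
        if "ERROR" in line:
            events.append(f"match {line!r}")
            yield line

sample = ["INFO ok", "ERROR disk", "ERROR net"]
pipeline = only_errors(read_lines(sample))

nothing_ran_yet = events == []   # building the pipeline does no work
first = next(pipeline)           # pull exactly one result through

print(first)
print(events)
```

After the single `next()` call, the trace contains the first two reads and one match. The third line has not been touched yet.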

Sending values into a generator

Generators can also receive values from their consumer through gen.send(value). The call resumes the generator, and the sent value becomes the result of the paused yield expression. The generator must first be advanced to a yield with next(gen), because a just-started generator only accepts send(None). This pattern underpinned generator-based coroutines and early async frameworks; it is an advanced topic beyond typical data-pipeline use.
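As a minimal sketch of send(), here is a running-average generator. The name `running_average` and its behaviour are assumptions made for illustration.

```python
def running_average():
    total = 0.0
    count = 0
    average = None
    while True:
        value = yield average   # send(value) resumes here with the sent value
        total += value
        count += 1
        average = total / count

avg = running_average()
next(avg)           # prime the generator: advance it to the first yield
a = avg.send(10)    # average of [10]
b = avg.send(20)    # average of [10, 20]
c = avg.send(30)    # average of [10, 20, 30]
avg.close()         # stop the otherwise endless loop

print(a, b, c)
```

Each `send()` returns whatever the generator yields next, so the consumer pushes a value in and gets the updated average back in a single call.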

Frequently asked questions

Can I iterate a generator twice?

No. Once exhausted, a generator object is done. If you need to iterate the same sequence multiple times, either convert it to a list first or re-create the generator by calling the generator function again.
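A short demonstration, reusing `count_up` from earlier:

```python
def count_up(start, stop):
    n = start
    while n <= stop:
        yield n
        n += 1

gen = count_up(1, 3)
first_pass = list(gen)           # consumes every value
second_pass = list(gen)          # the same object is now exhausted

fresh = list(count_up(1, 3))     # calling the function again gives a new generator

print(first_pass, second_pass, fresh)
```

The second pass yields nothing and raises no error, which makes accidental reuse easy to miss.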

When should I use a generator vs a list?

Use a generator when you only need to iterate once and the dataset is large or potentially infinite. Use a list when you need random access, multiple passes, or the len() of the result.
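To illustrate the trade-off, the sketch below defines an infinite generator and takes a bounded slice with `itertools.islice`. The `naturals` function is a hypothetical example, not something defined earlier in the article.

```python
from itertools import islice

def naturals():
    """Yield 0, 1, 2, ... without end."""
    n = 0
    while True:
        yield n
        n += 1

# A list of all naturals could never be built, but a slice of the generator can.
first_five = list(islice(naturals(), 5))

# Generators support neither len() nor indexing.
try:
    len(naturals())
    has_len = True
except TypeError:
    has_len = False

print(first_five, has_len)
```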

What does yield from do?

yield from iterable delegates to another iterable, yielding each of its values in turn. It is shorthand for a loop that yields each item, and also correctly threads .send() and .throw() calls down to the sub-generator.
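The two forms can be compared side by side. In this sketch (all function names are illustrative), `yield from` also evaluates to the sub-generator's return value, which a plain loop discards.

```python
def inner():
    yield 1
    yield 2
    return "done"            # becomes the value of the yield from expression

def with_loop():
    for item in inner():     # plain loop: the return value is lost
        yield item

def with_yield_from():
    result = yield from inner()
    yield result             # re-yield it here only to make it visible

loop_values = list(with_loop())
delegated_values = list(with_yield_from())

print(loop_values, delegated_values)
```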