Blog Post #128: Building Your Own Generators with the yield Keyword

In Post #127, we learned about generator expressions, a concise, one-line syntax for creating generators. They are perfect for simple, lazy iteration. But what happens when the logic for generating your sequence is too complex for a single line? What if it requires if statements, helper variables, or other multi-line logic?

For these situations, Python allows us to create generator functions using the powerful yield keyword. In this post, we’ll demystify yield and see how it allows a function to “pause” and “resume” its state.

What is a Generator Function?

A generator function looks like a normal function defined with def, but instead of using return to send back a final value, it uses the yield keyword.

The presence of yield anywhere in a function’s body automatically transforms it into a generator function. When you call a generator function, none of its body runs. Instead, the call immediately returns a special generator object, and the body only starts executing when the first value is requested.
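As a quick check of that claim, here is a minimal sketch. The function name count_up and the calls list are hypothetical, made up for this illustration; inspect.isgeneratorfunction and inspect.isgenerator come from the standard library.

```python
import inspect

calls = []  # records when the function body actually runs


def count_up(limit):
    """Hypothetical generator function, used only for illustration."""
    calls.append("body started")
    for n in range(limit):
        yield n


gen = count_up(3)  # calling the function only builds a generator object

print(inspect.isgeneratorfunction(count_up))  # True
print(inspect.isgenerator(gen))               # True
print(calls)                                  # [] because the body has not run yet

first = next(gen)    # the first request runs the body up to the first yield
print(first, calls)  # 0 ['body started']
```

The empty list after the call is the point: nothing inside count_up executes until something asks the generator for a value.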

The Magic of yield: Pause and Resume

To understand yield, it’s best to contrast it with return.

  • return: Stops a function completely and sends a single value back to the caller. The function’s local state is destroyed.
  • yield: Pauses the function, sends a value back to the caller, and saves its current state. The next time the caller asks for a value, the function resumes execution right where it left off.

Think of it like a movie. A return statement is like stopping the movie and ejecting the disc. A yield statement is like hitting the pause button; it gives you the current frame, and you can press “play” again to continue right from that spot.
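To make the contrast concrete, here is a small sketch with two hypothetical functions that differ only in the keyword they use. The names first_with_return and each_with_yield are invented for this example.

```python
def first_with_return(items):
    """Hypothetical: return exits on the first item; the rest are never reached."""
    for item in items:
        return item


def each_with_yield(items):
    """Hypothetical: yield hands back one item, then pauses until asked again."""
    for item in items:
        yield item


letters = ["a", "b", "c"]

# Every call to the return version starts over from the top.
returned = [first_with_return(letters), first_with_return(letters)]
print(returned)  # ['a', 'a']

# The yield version remembers its position between requests.
gen = each_with_yield(letters)
yielded = [next(gen), next(gen), next(gen)]
print(yielded)  # ['a', 'b', 'c']
```

Calling the return version twice gives the same item twice, because each call is a fresh start. The generator keeps its place in the loop, so each next() call picks up where the previous one paused.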

Building a Generator Function: A Step-by-Step Trace

The best way to understand yield is to see it in action. Let’s build a generator that yields the first few square numbers. We’ll add lots of print statements to trace the execution flow.

def square_generator(limit):
    """A generator function that yields the squares of 0, 1, ..., limit - 1."""
    print("--- Generator Started ---")
    n = 0
    while n < limit:
        print(f"--> About to yield {n ** 2}")
        yield n ** 2
        n += 1
        print("--> Resumed after yield")
    print("--- Generator Finished ---")

# 1. Create the generator object. Note: NO code inside the function has run yet!
my_gen = square_generator(3)
print("Generator object created. Now, let's start the loop...\n")

# 2. The for loop will ask for values one by one
for num in my_gen:
    print(f"Loop received: {num}\n")

Let’s trace the output of this code, which is the key to understanding the process:

Generator object created. Now, let's start the loop...

--- Generator Started ---
--> About to yield 0
Loop received: 0

--> Resumed after yield
--> About to yield 1
Loop received: 1

--> Resumed after yield
--> About to yield 4
Loop received: 4

--> Resumed after yield
--- Generator Finished ---

Here is exactly what happened:

  1. my_gen = square_generator(3) is called. The function does not run. It just returns a generator object, ready to start.
  2. The for loop begins and asks my_gen for its first value.
  3. Execution enters the square_generator function. It prints “Generator Started” and sets n = 0.
  4. The while condition is true, so it prints “About to yield 0” and then hits yield n ** 2. The function pauses and sends the value 0 back to the for loop.
  5. The for loop receives 0, assigns it to num, and prints “Loop received: 0”.
  6. The loop begins its next iteration and asks my_gen for its next value.
  7. Execution resumes inside the generator right after the yield statement. It increments n to 1, prints “Resumed after yield”, and goes back to the top of the while loop.
  8. It hits yield n ** 2 again. It pauses and sends 1 back to the for loop.
  9. This cycle repeats until n becomes 3. The while condition is now false, so the function prints “Generator Finished” and returns. The generator signals that it has no more values by raising StopIteration, which the for loop catches silently before ending.
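The for loop hides the mechanics of “asking for the next value”. As a rough sketch of what it does on our behalf, the same generator can be driven by hand with the built-in next() function. The function squares below is a hypothetical, print-free copy of square_generator, and the received list is only there to collect results.

```python
def squares(limit):
    """Same logic as square_generator above, minus the print statements."""
    n = 0
    while n < limit:
        yield n ** 2
        n += 1


gen = squares(3)
received = []

while True:
    try:
        value = next(gen)      # ask the generator for its next value
    except StopIteration:      # raised when the function body finishes
        break
    received.append(value)

print(received)  # [0, 1, 4]

# Once exhausted, a generator stays exhausted; it does not restart.
leftover = list(gen)
print(leftover)  # []
```

To iterate again, call squares(3) a second time to get a fresh generator object.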

What’s Next?

The yield keyword is the key to creating complex generators in Python. By turning a function into a stateful, pausable machine, yield allows you to create memory-efficient iterators for any sequence, no matter how complex the logic is to generate it.
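As one illustration of logic that would be awkward to squeeze into a generator expression, here is a sketch that combines a helper variable with an if statement. The names running_total, skip_negative, and naturals are hypothetical choices for this example; itertools.islice is from the standard library.

```python
from itertools import islice


def running_total(numbers, skip_negative=True):
    """Hypothetical example: yield a running sum, optionally ignoring negatives."""
    total = 0  # helper variable that survives between yields
    for x in numbers:
        if skip_negative and x < 0:
            continue  # skip this value without yielding anything
        total += x
        yield total


totals = list(running_total([5, -2, 3, 10]))
print(totals)  # [5, 8, 18]


def naturals():
    """Hypothetical endless source: 0, 1, 2, ... produced one at a time."""
    n = 0
    while True:
        yield n
        n += 1


# Because values are produced lazily, an endless source is fine
# as long as we only take a finite slice of the result.
first_five = list(islice(running_total(naturals()), 5))
print(first_five)  # [0, 1, 3, 6, 10]
```

The second half chains one generator into another. Neither ever builds a full list in memory, which is where the memory efficiency comes from.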

We’ve now seen two ways to create sequences: eagerly with list comprehensions and lazily with generator expressions. We know that generators are more memory-efficient, but does that come at a cost? How do they compare in terms of speed? In Post #129, we will have a performance showdown between list comprehensions and generator expressions.

Author

Debjeet Bhowmik

Experienced Cloud & DevOps Engineer with hands-on experience in AWS, GCP, Terraform, Ansible, ELK, Docker, Git, GitLab, Python, PowerShell, Shell, and theoretical knowledge on Azure, Kubernetes & Jenkins. In my free time, I write blogs on ckdbtech.com
