In Post #125, we discovered the major drawback of lists: when they get very large, they consume a huge amount of memory and can be slow to create. We called this “eager evaluation” because the entire list is built before we can use a single item from it.
The solution to this problem is “lazy evaluation,” and in Python, the primary tool for this is the generator.
In this post, we’ll learn what generators are, how they work, and why they are the key to writing memory-efficient code for large datasets.
What is a Generator?
A generator is a special type of iterator that produces items one at a time and only when you ask for them. Instead of storing all its values in memory at once like a list, it generates each value on-the-fly as you loop over it.
Let’s use an analogy.
- A list is like a complete transcript of a long speech, printed out and stored in a giant book. You need a lot of paper (memory) to hold it all.
- A generator is like the speaker giving the speech. They only produce the next sentence when it’s time to say it. They don’t have the entire speech written down; they just know the rules for generating the next part.
This “one at a time” process is called lazy evaluation.
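To make the "only when you ask" behavior concrete, here is a small sketch using Python's built-in iter() and next() functions, which are how Python pulls values from any lazy sequence one at a time:

```python
# Pull values from an iterator one at a time, only on request.
nums = iter(range(3))  # an iterator over 0, 1, 2

print(next(nums))  # 0
print(next(nums))  # 1 -- nothing beyond this value has been produced yet
```

Calling next() once more would produce 2, and after that the iterator is exhausted: any further call raises StopIteration.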
You’ve Already Been Using a Generator-like Object!
The concept of lazy evaluation might seem new, but you’ve been using an object that behaves this way all along: the range() function.
When you write my_range = range(10_000_000), Python does not create a list of 10 million integers in your computer’s memory. Instead, it creates a very small, smart range object that simply remembers three things: your start, stop, and step values.
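We can actually peek at those three stored values, which the range object exposes as attributes (a quick sketch):

```python
# A range object records only its parameters; no numbers are materialized.
r = range(0, 10_000_000, 2)
print(r.start, r.stop, r.step)  # 0 10000000 2

# Even membership tests are answered arithmetically, not by scanning:
print(9_999_998 in r)  # True
print(9_999_999 in r)  # False -- odd numbers are skipped by step=2
```

Because membership is computed from start, stop, and step, checking whether a number is in a huge range is instant.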
We can prove this by checking the memory usage. The sys.getsizeof() function tells us how many bytes an object takes up in memory.
import sys
# Create a list of 100,000 numbers
list_of_nums = [i for i in range(100000)]
print(f"Size of list: {sys.getsizeof(list_of_nums)} bytes")
# Create a range object for 100,000 numbers
range_of_nums = range(100000)
print(f"Size of range object: {sys.getsizeof(range_of_nums)} bytes")
The output is dramatic (exact numbers may vary slightly):
Size of list: 800984 bytes
Size of range object: 48 bytes
The list takes up hundreds of thousands of bytes, while the range object takes up almost nothing, regardless of how large its range is.
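We can push that last claim further: since the range object stores only its three parameters, its reported size should not grow with the span it covers (a quick check; exact byte counts can vary between Python versions):

```python
import sys

# Both objects store just start, stop, and step -- same size either way.
small = range(10)
huge = range(10**12)

print(sys.getsizeof(small) == sys.getsizeof(huge))  # True
```

The list comprehension equivalent of the huge range would need roughly 8 terabytes of memory, so this difference is not just academic.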
How to Use a Generator
The beauty of generators is that you use them in your code just like you would use a list in a for loop. The for loop automatically handles the process of asking the generator for the “next” item on each iteration.
# Even with a huge range, this code uses very little memory
for i in range(1_000_000):
    # The number 'i' is generated on-the-fly for this one iteration.
    # The full sequence of one million numbers never exists in memory at once.
    if i % 100_000 == 0:
        print(f"Processing number {i}...")
In this loop, the range object generates 0, gives it to the loop, and then effectively forgets it. When the loop asks for the next item, it generates 1, and so on. This is incredibly memory-efficient.
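Under the hood, that "ask for the next item" step is just repeated calls to next() until the iterator signals it is finished. Here is a rough sketch of what a for loop does for you (illustrative, not the exact interpreter code):

```python
# Roughly what `for i in range(3): print(i)` does behind the scenes.
it = iter(range(3))        # ask the range for an iterator
while True:
    try:
        i = next(it)       # request the next value, generated on demand
    except StopIteration:  # raised when the sequence is exhausted
        break
    print(i)
```

You will rarely write this loop yourself, but knowing it exists explains why any object that supports iter() and next() works in a for loop.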
What’s Next?
Generators are the Pythonic solution to working with large sequences of data. By using lazy evaluation to produce items one at a time, they allow you to process massive datasets with a minimal memory footprint.
range() is a built-in function that gives us a generator-like object. But how can we create our own custom generators? The simplest way is with a syntax that looks almost identical to a list comprehension. In Post #127, we will learn about generator expressions.
Author

Experienced Cloud & DevOps Engineer with hands-on experience in AWS, GCP, Terraform, Ansible, ELK, Docker, Git, GitLab, Python, PowerShell, Shell, and theoretical knowledge on Azure, Kubernetes & Jenkins. In my free time, I write blogs on ckdbtech.com