Blog Post #77: The Main Event: Set Operations (Union and Intersection)

In Post #75, we learned that sets are collections of unique, unordered items. While their ability to automatically handle duplicates is useful on its own, their true power comes from fast, built-in mathematical set operations.

In this post, we’ll explore the two most fundamental set operations: union, for combining sets, and intersection, for finding what they have in common. Both come up constantly in data analysis, whenever you need to merge collections or find the overlap between them.

Setting the Stage

To demonstrate these operations, let’s imagine we have two sets of musical artists from two different playlists: a local favorites playlist and a global hits playlist.

local_artists = {"Taylor Swift", "Ed Sheeran", "Dua Lipa", "Harry Styles"}
global_artists = {"Ed Sheeran", "The Weeknd", "Harry Styles", "Adele"}

Union: Combining All Unique Items

A union of two sets creates a new set that contains all the unique items from both original sets. It merges the two sets, and any item that appears in both is kept only once.
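Conceptually, a union works like adding every item from both sets into an empty set: because a set ignores an item it already contains, each duplicate survives only once. The loop below is a sketch of that idea for illustration only; it is not how Python implements union internally, and the operator and method covered next do the same job in one step.

```python
# The two example playlists from above
local_artists = {"Taylor Swift", "Ed Sheeran", "Dua Lipa", "Harry Styles"}
global_artists = {"Ed Sheeran", "The Weeknd", "Harry Styles", "Adele"}

# Conceptual union: add every artist from both sets to an empty set.
# Adding an artist that is already present has no effect, so shared
# artists such as "Ed Sheeran" end up in the result only once.
manual_union = set()
for artist_set in (local_artists, global_artists):
    for artist in artist_set:
        manual_union.add(artist)

print(len(manual_union))  # 6
```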

Using the | Operator

The most common and “Pythonic” way to perform a union is with the vertical bar | operator.

# Create a new set with all artists from both playlists
all_artists = local_artists | global_artists

print(all_artists)

The output will be a single set containing every unique artist from both original sets (the order may vary):

{'Adele', 'Taylor Swift', 'The Weeknd', 'Dua Lipa', 'Harry Styles', 'Ed Sheeran'}
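Because sets have no guaranteed order, the printed order can differ between runs. When you want a stable, readable display (in a report or a test, say), one common approach is to pass the set to sorted(), which returns a new list in alphabetical order:

```python
local_artists = {"Taylor Swift", "Ed Sheeran", "Dua Lipa", "Harry Styles"}
global_artists = {"Ed Sheeran", "The Weeknd", "Harry Styles", "Adele"}
all_artists = local_artists | global_artists

# sorted() always returns a list in the same (alphabetical) order,
# so this output is reproducible, unlike printing the set directly.
print(sorted(all_artists))
# ['Adele', 'Dua Lipa', 'Ed Sheeran', 'Harry Styles', 'Taylor Swift', 'The Weeknd']
```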

Using the .union() Method

Alternatively, you can use the .union() method. It is called on one set and takes another set (or, more generally, any iterable) as an argument, producing the same result.

all_artists_method = local_artists.union(global_artists)
print(all_artists_method)

Both are valid, but the | operator is often preferred for its brevity and mathematical look.
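The two forms are not fully interchangeable, though. The sketch below reuses the two playlists plus a hypothetical extra artist, "Sia", chosen purely for illustration. It shows .union() accepting a list and several arguments at once, while the | operator raises a TypeError when one operand is a list rather than a set.

```python
local_artists = {"Taylor Swift", "Ed Sheeran", "Dua Lipa", "Harry Styles"}
global_artists = {"Ed Sheeran", "The Weeknd", "Harry Styles", "Adele"}

# .union() accepts any iterable, such as a list ("Sia" is a made-up extra entry)
with_list = local_artists.union(["Adele", "Sia"])

# It also accepts several arguments in one call
everything = local_artists.union(global_artists, ("Sia", "Adele"))

print(len(with_list))   # 6
print(len(everything))  # 7

# The | operator, by contrast, requires both operands to be sets
try:
    local_artists | ["Adele", "Sia"]
except TypeError as err:
    print("TypeError:", err)
```

If your data starts out as a list, either call the method or convert it first with set(...) before using |.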

Intersection: Finding Common Items

An intersection of two sets creates a new set that contains only the items that are present in both original sets. It’s how you find what two collections have in common.
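Conceptually, an intersection keeps each item of the first set only if it is also a member of the second. The set comprehension below expresses that idea directly; it is an illustration, not a replacement for the built-in operator and method shown next.

```python
local_artists = {"Taylor Swift", "Ed Sheeran", "Dua Lipa", "Harry Styles"}
global_artists = {"Ed Sheeran", "The Weeknd", "Harry Styles", "Adele"}

# Keep an artist from the first set only if the second set also contains it
manual_common = {artist for artist in local_artists if artist in global_artists}

print(sorted(manual_common))  # ['Ed Sheeran', 'Harry Styles']
```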

Using the & Operator

The standard way to perform an intersection is with the ampersand & operator. Let’s find out which artists appear on both of our playlists.

# Create a new set with only the artists that are in BOTH sets
common_artists = local_artists & global_artists

print(common_artists)

The output shows only the two artists that appear in both local_artists and global_artists (again, the order may vary):

{'Harry Styles', 'Ed Sheeran'}
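Union and intersection are linked by a simple counting rule (inclusion-exclusion): the size of a union equals the sizes of both sets added together, minus the size of their intersection, because shared items would otherwise be counted twice. With our playlists that is 4 + 4 - 2 = 6, which this short check confirms:

```python
local_artists = {"Taylor Swift", "Ed Sheeran", "Dua Lipa", "Harry Styles"}
global_artists = {"Ed Sheeran", "The Weeknd", "Harry Styles", "Adele"}

union_size = len(local_artists | global_artists)
intersection_size = len(local_artists & global_artists)

# len(a) + len(b) counts the shared artists twice, so subtract them once
assert union_size == len(local_artists) + len(global_artists) - intersection_size
print(union_size, intersection_size)  # 6 2
```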

Using the .intersection() Method

Just like with union, there is also a corresponding method: .intersection().

common_artists_method = local_artists.intersection(global_artists)
print(common_artists_method)

Again, the & operator is generally more common in practice.
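As with .union(), the .intersection() method accepts any iterable and several arguments at once. In this sketch, radio_artists is a hypothetical third playlist stored as a list; the result holds only the artists present in all three collections.

```python
local_artists = {"Taylor Swift", "Ed Sheeran", "Dua Lipa", "Harry Styles"}
global_artists = {"Ed Sheeran", "The Weeknd", "Harry Styles", "Adele"}
radio_artists = ["Ed Sheeran", "Adele", "Dua Lipa"]  # hypothetical third playlist

# Artists that appear in all three collections; the list is accepted as-is
on_every_list = local_artists.intersection(global_artists, radio_artists)
print(on_every_list)  # {'Ed Sheeran'}

# With the operator, the list must be converted to a set first
print(local_artists & global_artists & set(radio_artists))  # {'Ed Sheeran'}
```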

What’s Next?

You now have the power to combine and compare sets using union (|) and intersection (&). These operations are the foundation of using sets for data analysis, allowing you to quickly merge collections and find common elements between them.

Union and intersection are just the beginning. Two other fundamental set operations are equally useful, and both deal with items that appear in one set but not the other. In Post #78, we will explore difference and symmetric difference.

Author

Debjeet Bhowmik

Experienced Cloud & DevOps Engineer with hands-on experience in AWS, GCP, Terraform, Ansible, ELK, Docker, Git, GitLab, Python, PowerShell, Shell, and theoretical knowledge of Azure, Kubernetes & Jenkins. In my free time, I write blogs on ckdbtech.com
