Blog Post #75: No Duplicates Allowed: An Introduction to Sets

So far in Part 3, we have explored lists, tuples, and dictionaries. Each has a specific purpose: lists for ordered, mutable collections; tuples for ordered, immutable collections; and dictionaries for key-value mappings. Now, we’ll introduce the fourth and final major built-in data structure in Python: the set.

Sets are a specialized data structure designed for one primary purpose: to hold a collection of unique items. In this post, we’ll learn the two core properties that define a set and make it so useful.

What is a Set?

A set is an unordered collection of unique elements.

This definition is simple but packed with meaning. Let’s break down those two key properties, “unique” and “unordered”, as they are what make sets different from the lists and tuples we already know.

Property 1: Sets Contain Only Unique Elements

This is the defining feature of a set. A set cannot contain duplicate items. If you add an item that is already in the set, the set simply remains unchanged.
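Here is a minimal sketch of that behavior. It uses the set's add() method, which this series has not formally covered yet, and the name colors is just an illustration:

```python
# A small set to experiment with (the name "colors" is hypothetical)
colors = {"red", "green"}

colors.add("blue")  # "blue" is new, so the set grows to three items
colors.add("red")   # "red" is already present, so nothing changes

print(len(colors))  # 3
```

Adding "red" a second time does not raise an error. The set quietly stays the same.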

This makes sets incredibly useful for tasks like finding the unique items from a list that contains duplicates. The easiest way to create a set is to use the set() constructor on an existing iterable, like a list.

numbers_with_duplicates = [1, 2, 2, 3, 4, 4, 4, 5]

# Create a set from the list
unique_numbers = set(numbers_with_duplicates)

print(f"Original list: {numbers_with_duplicates}")
print(f"Set created from list: {unique_numbers}")

The output will be:

Original list: [1, 2, 2, 3, 4, 4, 4, 5]
Set created from list: {1, 2, 3, 4, 5}

As you can see, the set automatically and efficiently discarded all the duplicate values, leaving only one instance of each number. This is the primary superpower of sets.
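One practical use of this, sketched below, is checking whether a list contains any duplicates at all by comparing its length with the length of the set built from it. The names duplicates_removed and has_duplicates are made up for this example:

```python
numbers_with_duplicates = [1, 2, 2, 3, 4, 4, 4, 5]
unique_numbers = set(numbers_with_duplicates)

# The list has 8 items, the set has 5, so 3 duplicates were dropped
duplicates_removed = len(numbers_with_duplicates) - len(unique_numbers)
has_duplicates = duplicates_removed > 0

print(f"Duplicates removed: {duplicates_removed}")  # 3
print(f"List had duplicates: {has_duplicates}")     # True
```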

Property 2: Sets are Unordered

Unlike lists and tuples, sets are unordered. This means the items in a set do not have a defined position or index. There is no “first item” or “last item” in a set.

The main implication of this is that you cannot use an index or slice to access items in a set.

my_set = {1, 2, 3}

# This will cause a TypeError!
# first_item = my_set[0]

The indexing line is commented out so the snippet runs safely. If you uncomment it and run the code, Python raises TypeError: 'set' object is not subscriptable, and the program stops. This error happens because the concept of an item “at index 0” has no meaning for an unordered collection.
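If you would like to see the error message without stopping the program, one option is to wrap the indexing attempt in try/except. This is only a sketch, and it assumes you are comfortable with exception handling, which may not have been covered at this point in the series. The name message is illustrative:

```python
my_set = {1, 2, 3}

try:
    # Sets do not support indexing, so this line raises TypeError
    first_item = my_set[0]
except TypeError as error:
    message = str(error)
    print(f"TypeError: {message}")
```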

You can still loop over a set with a for loop (e.g., for item in my_set:), but the order in which the items are returned is not guaranteed. You should never write code that relies on a set’s order.
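As a rough illustration, the loop below visits every item exactly once, but the order it prints in may differ from the order you typed and can change between runs. When you do need a predictable order, one common approach is to pass the set to the built-in sorted() function, which returns a new sorted list. The fruit names are just sample data:

```python
my_set = {"banana", "apple", "cherry"}

# Every item is visited once, but the order is not guaranteed
for item in my_set:
    print(item)

# sorted() builds a new list with a predictable order
ordered_items = sorted(my_set)
print(ordered_items)  # ['apple', 'banana', 'cherry']
```

Note that sorted() leaves the set itself untouched. Only the new list is ordered.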

What’s Next?

You’ve now been introduced to the set: an unordered collection of unique items. Its ability to automatically handle uniqueness makes it a powerful tool for data cleaning and for performing mathematical set operations (like union and intersection), which we will see in upcoming posts.

In our example, we created a set using the set() constructor. You can also create them using curly braces {}, similar to dictionaries. However, there’s a small but important ‘gotcha’ when it comes to creating an empty set. In Post #76, we will look at the specific syntax for creating sets and this common pitfall.

Author

Debjeet Bhowmik

Experienced Cloud & DevOps Engineer with hands-on experience in AWS, GCP, Terraform, Ansible, ELK, Docker, Git, GitLab, Python, PowerShell, Shell, and theoretical knowledge on Azure, Kubernetes & Jenkins. In my free time, I write blogs on ckdbtech.com
