Blog Post #80: The Most Common Use Case: Finding Unique Items in a List

In the last few posts, we’ve explored the mathematical side of sets: union, intersection, and difference (Post #77, Post #78). These operations are powerful, but in day-to-day programming you will use sets for one task far more than any other.

In this post, we’ll cover the most frequent and practical application of sets: quickly and efficiently removing duplicate items from a list.

The Problem: A List with Duplicates

Imagine you have a list of data that contains duplicate entries. This is a very common scenario. Perhaps you’ve collected a list of user tags, and some tags have been entered more than once. Your goal is to get a clean list containing only the unique tags.

repeated_tags = ["python", "coding", "tips", "python", "guide", "coding"]

How can we get a list that just contains ["python", "coding", "tips", "guide"]?

The Set Conversion Trick

Because sets, by their very nature, cannot contain duplicates (as we learned in Post #75), they provide the perfect tool for this job. The process is a simple and highly efficient two-step trick:

  1. Convert the list to a set. This automatically discards all duplicate items in a single pass over the list.
  2. Convert the set back to a list. This gives you a list again, which is often what you need for further processing.

Let’s see it in action with our list of tags.

repeated_tags = ["python", "coding", "tips", "python", "guide", "coding"]
print(f"Original list with duplicates: {repeated_tags}")

# Step 1: Convert the list to a set to remove duplicates
unique_tags_set = set(repeated_tags)
print(f"The intermediate set (unique items): {unique_tags_set}")

# Step 2: Convert the set back to a list
unique_tags_list = list(unique_tags_set)
print(f"Final list with unique items: {unique_tags_list}")

The output will look something like this (remember, the order of items in a set is not guaranteed):

Original list with duplicates: ['python', 'coding', 'tips', 'python', 'guide', 'coding']
The intermediate set (unique items): {'guide', 'coding', 'tips', 'python'}
Final list with unique items: ['guide', 'coding', 'tips', 'python']
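Because the printed order can differ on your machine, it helps to check the result in a way that does not depend on order. Here is a small sketch of two such checks, reusing the variable names from the example above:

```python
repeated_tags = ["python", "coding", "tips", "python", "guide", "coding"]
unique_tags_list = list(set(repeated_tags))

# Compare as sets so the order of the list does not matter
has_expected_items = set(unique_tags_list) == {"python", "coding", "tips", "guide"}
print(f"Contains exactly the expected tags: {has_expected_items}")

# Count how many duplicate entries were dropped
duplicates_removed = len(repeated_tags) - len(unique_tags_list)
print(f"Duplicates removed: {duplicates_removed}")
```

Both checks give the same answer no matter which order the set happens to produce.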

An Important Caveat: Order is Not Preserved

There is one important thing to remember when using this technique: sets are unordered. When you convert your list to a set and back again, the order of the items in the final list is not guaranteed to match their original order. For strings, it can even change from one run of the program to the next, because Python randomizes string hashes per process by default.

This method is perfect when you only care about getting the unique items and the final order doesn’t matter. If you need to preserve the original order of the items, you need a different technique.
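One common way to keep the original order is to use dictionary keys instead of a set. Dictionary keys are unique, and since Python 3.7 a dictionary remembers the order in which keys were inserted. The sketch below is only one possible approach, and the helper name unique_in_order is made up for this illustration:

```python
def unique_in_order(items):
    """Return the unique items, each kept at the position of its first occurrence."""
    # dict.fromkeys() builds a dict whose keys are the items; duplicate keys
    # collapse into one, and insertion order is preserved (Python 3.7+).
    return list(dict.fromkeys(items))


repeated_tags = ["python", "coding", "tips", "python", "guide", "coding"]
unique_tags_ordered = unique_in_order(repeated_tags)

print(f"Unique items in original order: {unique_tags_ordered}")
# Unique items in original order: ['python', 'coding', 'tips', 'guide']
```

Like the set version, this only works when every item is hashable (strings, numbers, and tuples are fine; lists and dictionaries are not).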

The One-Liner

Experienced Python programmers will typically combine these two steps into a single, readable line of code. It works from the inside out: set() is called on the list first, and then list() is called on the resulting set.

repeated_tags = ["python", "coding", "tips", "python", "guide", "coding"]

# The same result in a single line
unique_tags = list(set(repeated_tags))

print(f"Final list created in one line: {unique_tags}")
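If you want the unique items in a predictable order, and sorted order is acceptable, a closely related one-liner is to pass the set to sorted(). This is a small variation on the idiom rather than a replacement for it. Note that sorted() already returns a list, so no extra list() call is needed:

```python
repeated_tags = ["python", "coding", "tips", "python", "guide", "coding"]

# Deduplicate with set(), then sort the unique items alphabetically
sorted_unique_tags = sorted(set(repeated_tags))

print(f"Unique items, sorted: {sorted_unique_tags}")
# Unique items, sorted: ['coding', 'guide', 'python', 'tips']
```

This gives the same output on every run, which is handy for tests and for anything you show to users.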

What’s Next?

This list(set(my_list)) pattern is an essential and “Pythonic” idiom. It’s the go-to solution for deduplicating a list when the order of the final result is not important. You will see and use this trick all the time.

We’ve seen that sets, like lists, are mutable: you can add and remove items from them. But what if you need a set that is immutable, just like a tuple? What if you wanted to use a set as a key in a dictionary? For this, Python provides a special type of set. In Post #81, we will meet the frozenset.

Author

Debjeet Bhowmik

Experienced Cloud & DevOps Engineer with hands-on experience in AWS, GCP, Terraform, Ansible, ELK, Docker, Git, GitLab, Python, PowerShell, and Shell, and theoretical knowledge of Azure, Kubernetes, and Jenkins. In my free time, I write blogs on ckdbtech.com.
