Blog Post #82: Part 3 Recap & Project: Analyzing Commonalities in Two Lists

Welcome to the final post of Part 3! Over this section, we have taken a deep dive into Python’s four essential built-in data structures, the containers that hold and organize all of our program’s data.

To wrap up, we will first briefly recap the unique role of each data structure. Then, we’ll build a final project that showcases the power of sets to efficiently analyze and compare two lists of data.

Part 3 Recap: Your Data Structure Toolkit

You now have a complete toolkit for storing collections of data, with each tool designed for a specific job:

  • Lists ([]): Your go-to for ordered, changeable collections. Perfect for to-do lists, a collection of user posts, or any sequence of items that needs to grow, shrink, or be modified (as seen in Posts #46, #49, #50, #51).
  • Tuples (()): The immutable, ordered sequence. Use these for fixed data records such as coordinates, or for returning multiple values from a function, whenever you want to guarantee that the sequence itself cannot be changed (Posts #58, #61).
  • Dictionaries ({}): Your tool for labeled data. Dictionaries store data in key-value pairs, making them perfect for representing objects like a user profile or configuration settings where you need to look up values by a meaningful name (Posts #64, #70).
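  • Sets ({"a", "b"} or set()): The specialist for unique items. Sets are unordered collections that automatically discard duplicates and provide powerful mathematical operations for comparing collections. Note that empty braces, {}, create an empty dictionary, so an empty set must be written as set() (Posts #75, #77).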
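
As a quick refresher, here is roughly how the four structures look side by side. The variable names and values below are made up for illustration and are not taken from the earlier posts.

```python
# Hypothetical sample data, for illustration only.
todo_items = ["write post", "test code"]          # list: ordered and changeable
todo_items.append("publish")                      # lists can grow

point = (3, 4)                                    # tuple: ordered and immutable
x, y = point                                      # unpack the fixed record

user_profile = {"name": "Ada", "role": "admin"}   # dict: look up values by key
role = user_profile["role"]

tags = {"python", "sets", "python"}               # set: the duplicate collapses
empty_set = set()                                 # {} would create an empty dict

print(len(todo_items), x + y, role, len(tags), type(empty_set).__name__)
# 3 7 admin 2 set
```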

Project: Analyzing List Commonalities

Let’s imagine two friends are comparing their favorite movies to decide what to watch. Our program will take their two lists of movies and generate a report showing which movies they both like, which movies are unique to each friend, and a complete list of all unique movies available to watch.

Step 1: The Initial Data

First, in a new file, let’s define our two lists of movie titles. Notice that there are some duplicates within each list (e.g., “Inception”) and some overlap between the lists (e.g., “Dune”).

friend_a_movies = ["The Matrix", "Inception", "The Dark Knight", "Inception", "Dune"]
friend_b_movies = ["Dune", "Parasite", "The Dark Knight", "Joker", "Parasite"]

Step 2: Convert Lists to Sets

A simple and efficient way to compare these lists is to convert them into sets. As we learned in Post #80, this automatically removes the duplicates from each list and gives us access to the powerful set operations.

set_a = set(friend_a_movies)
set_b = set(friend_b_movies)

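After this step, set_a will contain {'The Matrix', 'Inception', 'The Dark Knight', 'Dune'} and set_b will contain {'Dune', 'Parasite', 'The Dark Knight', 'Joker'}. Sets are unordered, so Python may display the titles in a different order than shown here.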
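
To check the conversion without relying on display order, you could compare lengths and use set equality. This is only a small sketch that reuses the two lists from Step 1:

```python
friend_a_movies = ["The Matrix", "Inception", "The Dark Knight", "Inception", "Dune"]
friend_b_movies = ["Dune", "Parasite", "The Dark Knight", "Joker", "Parasite"]

set_a = set(friend_a_movies)
set_b = set(friend_b_movies)

# Each list has 5 entries, one of them a duplicate, so each set has 4.
print(len(friend_a_movies), len(set_a))   # 5 4
print(len(friend_b_movies), len(set_b))   # 5 4

# Set equality ignores order, so this holds however the set is displayed.
print(set_a == {"Dune", "The Dark Knight", "Inception", "The Matrix"})   # True

# Membership tests work with the in operator.
print("Joker" in set_a)   # False
```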

Step 3: Perform the Set Operations

Now we can use the set operators from Post #77 and Post #78 to answer our questions with single lines of code.

# 1. Find the movies they both like (Intersection)
common_movies = set_a & set_b

# 2. Find movies only Friend A likes (Difference)
a_only_movies = set_a - set_b

# 3. Find movies only Friend B likes (Difference)
b_only_movies = set_b - set_a

# 4. Find all unique movies from both lists (Union)
all_unique_movies = set_a | set_b
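
Each operator also has an equivalent set method (intersection, difference, union), and some readers find the method names easier to scan. The sketch below writes the sets out literally so it can run on its own, and adds a sanity check that is not part of the original project: the three groups together should make up the union.

```python
set_a = {"The Matrix", "Inception", "The Dark Knight", "Dune"}
set_b = {"Dune", "Parasite", "The Dark Knight", "Joker"}

common_movies = set_a.intersection(set_b)   # same as set_a & set_b
a_only_movies = set_a.difference(set_b)     # same as set_a - set_b
b_only_movies = set_b.difference(set_a)     # same as set_b - set_a
all_unique_movies = set_a.union(set_b)      # same as set_a | set_b

# Sanity check: shared + A-only + B-only should equal the full union.
assert common_movies | a_only_movies | b_only_movies == all_unique_movies

print(sorted(common_movies))     # ['Dune', 'The Dark Knight']
print(sorted(a_only_movies))     # ['Inception', 'The Matrix']
print(sorted(b_only_movies))     # ['Joker', 'Parasite']
print(len(all_unique_movies))    # 6
```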

Step 4: Display the Report

Finally, let’s use f-strings to print our results in a clean, readable format.

print("--- Movie Analysis Report ---")
print(f"Movies both friends like: {common_movies}")
print(f"Movies only Friend A likes: {a_only_movies}")
print(f"Movies only Friend B likes: {b_only_movies}")
print(f"All unique movie options: {all_unique_movies}")
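
Printing a set directly shows the braces and quotes, and the order of the titles can change from run to run. If you would prefer a stable, alphabetical report, one option is to combine sorted() with str.join(). This is a sketch of an alternative display step, not a required change; the two sets are written out literally so the snippet runs on its own.

```python
common_movies = {"Dune", "The Dark Knight"}
a_only_movies = {"The Matrix", "Inception"}

# sorted() returns an alphabetical list; join() turns it into one string.
common_line = ", ".join(sorted(common_movies))
a_only_line = ", ".join(sorted(a_only_movies))

print(f"Movies both friends like: {common_line}")
# Movies both friends like: Dune, The Dark Knight
print(f"Movies only Friend A likes: {a_only_line}")
# Movies only Friend A likes: Inception, The Matrix
```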

The Complete Program

Here is the full script. It’s a concise and powerful example of using the right data structure for the job.

# 1. The Initial Data
friend_a_movies = ["The Matrix", "Inception", "The Dark Knight", "Inception", "Dune"]
friend_b_movies = ["Dune", "Parasite", "The Dark Knight", "Joker", "Parasite"]

# 2. Convert Lists to Sets to automatically handle duplicates
set_a = set(friend_a_movies)
set_b = set(friend_b_movies)

# 3. Perform the Set Operations
common_movies = set_a & set_b
a_only_movies = set_a - set_b
b_only_movies = set_b - set_a
all_unique_movies = set_a | set_b

# 4. Display the Report
print("--- Movie Analysis Report ---")
print(f"Movies both friends like: {common_movies}")
print(f"Movies only Friend A likes: {a_only_movies}")
print(f"Movies only Friend B likes: {b_only_movies}")
print(f"All unique movie options: {all_unique_movies}")

Congratulations and What’s Next for Part 4

Excellent work! You’ve not only learned about Python’s core data structures but also built a practical program that uses their unique strengths to solve a common data analysis problem. You now have a solid foundation for organizing almost any kind of information in your programs.

So far, we’ve been writing our code in single, linear scripts. As our programs grow, this becomes messy and hard to manage. The key to professional programming is organizing code into reusable, logical blocks. In Part 4: Functions and Modularity, we will learn how to define our own functions to stop repeating ourselves and make our code more powerful and maintainable.

Author

Debjeet Bhowmik

Experienced Cloud & DevOps Engineer with hands-on experience in AWS, GCP, Terraform, Ansible, ELK, Docker, Git, GitLab, Python, PowerShell, and Shell, and theoretical knowledge of Azure, Kubernetes, and Jenkins. In my free time, I write blogs on ckdbtech.com.
