Blog Post #73: Performance: Why Are Dictionaries So Fast? (A Look at Hashing)

Table of Contents

Dictionaries are one of the most used data structures in Python, prized for their ability to store and retrieve data with a key. One of their most amazing features is that they are incredibly fast. Looking up a key in a dictionary with ten million items is, on average, just as fast as looking it up in a dictionary with ten items. How is this possible?

In this post, we’ll take a high-level, intuitive look at the clever computer science concept that powers dictionaries: hashing and hash tables.

The Slow Search: How Lists Find Things

To understand why dictionaries are fast, we first need to understand why a simple search in a list can be slow.

Imagine you’re looking for a specific book in a massive, disorganized library shelf. To find the book you want, you have to start at one end and check every single book, one by one, until you find it. If your book is at the very end, you have to scan the entire shelf. This is called a linear search.

This is how searching works in a list. If you write if "apple" in my_list:, Python has to check each item from the beginning until it finds “apple”. The time it takes grows in direct proportion to the size of the list.

The Dictionary’s Secret: Hash Tables

Dictionaries don’t do a linear search. They use a much smarter data structure called a hash table.

Let’s use an analogy. Think of a hash table as a magical filing cabinet with thousands of numbered drawers.

Storing Data:

When you want to store a new key-value pair, like my_dict[“name”] = “Alice”, you don’t just put it in the next empty drawer.

You first take the key ("name") to a special attendant called a hash function.
The hash function is like a super-fast, consistent recipe. It takes your key and instantly calculates a number. For example, the hash of "name" might always be 157.
This number tells you exactly which drawer to use. You go directly to drawer #157 and store both the key ("name") and the value ("Alice") inside.

Retrieving Data:

Later, when you want to get the value for “name” by writing user_name = my_dict[“name”]:

You don’t search all the drawers.
You take the key ("name") back to the same hash function. Because the function is consistent, it instantly gives you the same number back: 157.
You go directly to drawer #157, open it, and retrieve the value “Alice”.

The Result: Near-Instant Lookups

Because of hashing, the size of the dictionary has almost no impact on the time it takes to find an item. The process is always the same: hash the key and go directly to the calculated location. This is why we say dictionary lookups are, on average, constant time operations.

Whether the filing cabinet has 10 folders or 10 million, as long as the hash function is good, finding the right drawer takes the same, tiny amount of time.

A Final Mystery Solved: Why Keys Must Be Immutable

This explains a rule we learned in Post #63: dictionary keys must be immutable (like strings, numbers, or tuples). Why?

The hash function’s recipe depends on the content of the key. If the key could change after being stored (if it were mutable, like a list), its hash value would also change.

In our analogy, this would be like changing the label on a folder after it’s been filed. The hash function’s recipe for the new label would point to a different drawer, and you’d never be able to find your original data again. The system would break down. Immutability guarantees that a key’s hash value will remain constant forever, ensuring that your data is always findable.

What’s Next?

You don’t need to know the deep details of how to implement a hash table, but understanding the core concept of hashing helps you appreciate why dictionaries are so fast and efficient, and why their keys must be immutable. It’s a clever solution to the problem of searching for data.

We’ve now covered all the fundamental theory of dictionaries. It’s time to put it all together in a practical project. In Post #74, we will use our knowledge of dictionary operations and nesting to build a simple contact book application.

Author

Debjeet Bhowmik

Experienced Cloud & DevOps Engineer with hands-on experience in AWS, GCP, Terraform, Ansible, ELK, Docker, Git, GitLab, Python, PowerShell, Shell, and theoretical knowledge on Azure, Kubernetes & Jenkins. In my free time, I write blogs on ckdbtech.com