Hash Tables Internals — Core Concepts
Why hash table internals matter
Python dictionaries and sets are built on hash tables. They are among the most-used data structures in the language — function keyword arguments, class attributes, module namespaces, and JSON parsing all rely on dict. Understanding how hash tables work helps you write faster code, debug unexpected behavior, and know when to reach for alternatives.
How hashing works
A hash function converts a key into an integer (the hash value). Python uses the built-in hash() function:
hash("hello") # e.g., 2534051032171071756
hash(42) # 42 (integers hash to themselves for small values)
hash((1, 2, 3)) # tuples are hashable
# hash([1, 2, 3]) # TypeError: unhashable type: 'list'
Rules for hash functions:
- Deterministic: the same key always produces the same hash.
- Fast: hashing should be O(1) for typical keys.
- Uniform: hash values should spread evenly across the table.
- Consistent with equality: if
a == b, thenhash(a) == hash(b).
Only immutable types are hashable in Python: strings, numbers, tuples (of hashable elements), frozensets. This is why lists and dicts cannot be dictionary keys.
From hash to index
The hash value is converted to a table index using modular arithmetic:
index = hash(key) % table_size
If the table has 8 slots and hash("hello") = 2534051032171071756, the index is 2534051032171071756 % 8 = 4. The key-value pair goes in slot 4.
Collision resolution
When two keys hash to the same index, you have a collision. Python uses open addressing with probing — it looks for the next available slot using a probe sequence.
CPython’s probe formula:
next_index = (5 * current_index + 1 + perturb) % table_size
perturb >>= 5
The perturb value starts as the full hash and is shifted right each step. This creates a pseudo-random probe sequence that avoids clustering (where collisions bunch together).
Load factor and resizing
The load factor is the ratio of filled slots to total slots. When the table becomes too full, lookups slow down because more probing is needed.
Python resizes the table when it reaches about 2/3 full:
- Tables start with 8 slots.
- When resized, the table grows to the next power of 2 that keeps the load factor below 2/3.
- All existing entries are re-inserted (rehashed) into the new table.
Resizing is O(n), but it happens infrequently enough that the amortized cost per insertion remains O(1).
Dictionary ordering
Since Python 3.7, dictionaries maintain insertion order. This is implemented using a compact layout:
- A hash table (sparse array) stores indices into a dense array.
- A dense array stores entries in insertion order:
[(hash, key, value), ...].
This design uses less memory than the old layout (which stored entries directly in the sparse table with many empty slots) and preserves order as a free bonus.
Sets vs. dictionaries
A set is essentially a dict without values — it uses the same hash table mechanism. This is why sets have O(1) average-case membership testing:
my_set = {1, 2, 3, 4, 5}
3 in my_set # O(1) — uses hash table lookup
Performance characteristics
| Operation | Average case | Worst case |
|---|---|---|
Lookup (d[key]) | O(1) | O(n) |
Insert (d[key] = val) | O(1) | O(n) |
Delete (del d[key]) | O(1) | O(n) |
| Iteration | O(n) | O(n) |
The O(n) worst case happens when many keys collide — extremely rare with Python’s hash function for typical data.
Common misconception
“Dictionaries are slow because they use more memory.” Dictionaries trade memory for speed. A list lookup by value is O(n); a dict lookup by key is O(1). For anything beyond a handful of items, the speed advantage of dictionaries far outweighs the memory cost. Python’s compact dict implementation also reduced memory usage by 20-25% compared to Python 3.5.
One thing to remember: When you need fast lookups by key, reach for a dictionary. When you need fast membership checks, reach for a set. Both are hash tables underneath, and they are among the most optimized data structures in CPython.
See Also
- Python Backtracking Algorithms How computers solve puzzles by trying every path, backing up when stuck, and never giving up until they find the answer.
- Python Big O Complexity Analysis Why some programs finish in a blink while others take forever — explained with pizza delivery and toy cleanup.
- Python Binary Search Implementation Find anything in a sorted list insanely fast using the same trick you already use with dictionaries and phone books.
- Python Dynamic Programming The clever trick of remembering answers you already figured out so you never solve the same puzzle twice.
- Python Graph Algorithms How computers navigate maps, friendships, and connections using dots and lines — explained without any math.