Python Consistent Hashing — Core Concepts

Learn how consistent hashing distributes keys across Python servers with minimal disruption when nodes join or leave.

The problem with simple hashing

A naive approach to distributing data across N servers: server = hash(key) % N. This works perfectly until N changes. Add one server and nearly every key maps to a different server. For a cache cluster, that means almost every cached entry becomes unreachable — a massive “cache storm” that floods the database.

How consistent hashing works

Instead of modular arithmetic, consistent hashing places both servers and keys on a virtual ring (imagine a clock face):

Hash each server to a position on the ring (0 to 2³²-1).
Hash each key to a position on the same ring.
Walk clockwise from the key’s position until you find the first server. That server owns the key.

When a server is removed, only the keys that were assigned to it need to move — they shift to the next server clockwise. All other keys stay where they are.

When a server is added, it takes over a portion of keys from the next server clockwise. Again, only a fraction of keys move.

Virtual nodes

A problem with basic consistent hashing: servers aren’t evenly distributed on the ring. One server might own 60% of the ring while another owns 10%.

Virtual nodes solve this by placing each physical server at multiple positions on the ring. Instead of one point per server, you might place 150 virtual nodes per server. This smooths out the distribution so each server owns roughly the same fraction of the key space.

Real-world impact

Event	Simple hash (% N)	Consistent hashing
Remove 1 of 4 servers	~75% keys remap	~25% keys remap
Add 1 server to 4	~80% keys remap	~20% keys remap
Remove 1 of 100 servers	~99% keys remap	~1% keys remap

The larger the cluster, the bigger the benefit.

Where Python applications use it

Distributed caching — Memcached clients use consistent hashing to route keys to cache servers. When one server fails, only its keys are redistributed.
Load balancing — route requests to backend servers based on client IP or session ID, ensuring the same client hits the same server (session affinity).
Distributed databases — systems like Cassandra and DynamoDB use consistent hashing to assign data partitions to nodes.
Sharded task queues — route tasks to specific workers based on a partition key to maintain ordering per entity.

Replication with consistent hashing

To replicate data across multiple servers for fault tolerance, assign each key to the next K servers clockwise on the ring (where K is the replication factor). If one server goes down, the other K-1 replicas still serve the data.

Common misconception

Developers sometimes think consistent hashing eliminates all data movement when the cluster changes. It doesn’t — it minimizes it. When a server leaves, all its keys must move somewhere. The innovation is that keys belonging to other servers stay untouched.

The one thing to remember: consistent hashing maps both servers and keys to a ring, so cluster changes only affect the keys owned by the changing server — everything else stays in place.

pythondistributed-systemsalgorithms