A visual guide to consistent hashing

Caching is an important aspect of high-performance applications. As the data volume increases, the cached data needs to be distributed across multiple servers. We need to make sure the following objectives are met while doing so.

  • Maximize the cache hits: This will reduce the load on the primary data source and reduce the overall latency.
  • Distribute the data and traffic evenly: This ensures optimal use of servers and avoids overloading a subset of the servers.

As the title of this post suggests, we will look into how consistent hashing can be used to achieve the above objectives. Before that, let’s look at using a straightforward approach to solving the problem.

Note: This article was initially published here. It has been adapted for medium format using CodePen for embedding simulations.

Modulo Hashing

In this approach, we hash the requests based on a key and use the formula hash(key) % number_of_servers to route the request to the appropriate cache server. For example: If a key "apple" hashes to 14 and we have 3 cache servers, the request for "apple" will be forwarded to server number 2 as 14 % 3 = 2.

Let’s simulate this for 3 cache servers, 100 unique keys, 300 random requests and see how it performs.

Click the CodePen below and check the stats. Increase the speed to fast forward.

Note: If you are on mobile, please click here for a better view of the simulation.

Modulo Hashing: Static nodes

Let’s analyze the results

  • Cache Hit Ratio: As expected, 2/3rd of the requests(67%) are returned from the cache. This is good.
  • Data and load distribution: The load is not equally distributed across all servers, but it is fairly distributed. It is purely based on the distribution property of the hash function.

Modulo hashing works well for a fixed number of servers. But in many cases, we need to add or remove servers as per the variation in traffic volume. And, servers can crash sometimes. Let’s simulate the following dynamic nodes scenario with modulo hashing and see how it performs.

  • 3 servers(S0, S1, S2), 100 unique keys, 300 random requests as earlier
  • 1 server(S3) is added after the 100th request
  • 1 server(S1) is removed after the 200th request

Note: If you are on mobile, please click here for a better view of the simulation.

Modulo Hashing: Elastic nodes

Let’s analyze the results

  • Cache Hit Ratio: This drops from ~67% earlier to ~45%.
  • Data and load distribution: The total number of keys across the servers has increased. This indicates some keys are stored in multiple servers.

This is because many keys are mapped to a different server when the number of servers changes. For example, a key “orange” with hash value 11 is initially routed to the server S2 when there are 3 servers(11 % 3 = 2), whereas it is routed to the server S3 when there are 4 servers(11 % 4 = 3). This leads to ineffective use of cache.

Consistent Hashing

Consistent Hashing has a different approach to address the drawbacks of the modulo hashing with dynamic nodes. Let’s start with the basic concepts of consistent hashing.

  • The servers(called nodes) are hashed and mapped to a number in a fixed range. This range can be imagined as a real number(not an integer) between 0 and 360 to represent it as points on a circle. The node is placed on this circular ring based on its hash value mapped to range 0–360. For example: If the hash value of the server S1 maps to 90, it will be placed as the point at 90 degrees on the circumference of the circle.
  • The keys are hashed similarly and mapped to a point on the circle. The request for this key is routed to the closest node in the clockwise direction (It can be anticlockwise as well, as long as the same direction is used for all the keys).

Let’s play this simulation at 0.5x speed and visualize this basic concept.

Note: If you are on mobile, please click here for a better view of the simulation.

Consistent Hashing: Basic Concept

Now that we understand the basic concept, let’s run the simulation and observe the stats for 3 servers, 100 unique keys, 300 random requests. (Please increase the simulation speed when required to fast forward to the final stats)

Note: If you are on mobile, please click here for a better view of the simulation.

Consistent Hashing(Basic): Static nodes

We can observe that cache hit ratio and load distribution is very similar to that of modulo hashing. This is expected as the algorithm behaves almost the same for a fixed number of servers.

Let’s see how this basic concept of consistent hashing handles the addition and removal of the nodes using the below scenario

  • 3 servers(S1, S2, S3), 100 unique keys, 300 random requests as earlier
  • 1 server(S4) is added after the 100th request
  • 1 server(S1) is removed after the 200th request

Note: If you are on mobile, please click here for a better view of the simulation.

Consistent Hashing(Basic): Elastic nodes

Let’s analyze the results

  • Cache Hit Ratio: The cache hit ratio is ~60% compared to ~50% in modulo hashing. This is because very few keys are remapped when a node is added or removed.
  • Data and load distribution: The load distribution is skewed. In the above scenario(with the chosen node names), the new node S4 doesn't get many requests due to its proximity to the node S3.

Consistent hashing solves this load distribution problem by placing each node at multiple points on the ring. These points are called virtual nodes. For example, if we need to represent node S1 as 4 points on the ring, we place virtual nodes S1-1 to S1-4 on the ring using the same logic as earlier. This allows multiple small fragments of the ring to be mapped to a single node.

Let’s simulate the previous elastic nodes scenario with 12 virtual nodes per node.

Note: If you are on mobile, please click here for a better view of the simulation.

Consistent Hashing(Full): Elastic nodes

Let’s analyze the results

  • Cache Hit Ratio: The cache hit ratio remains good i.e. ~60% compared to ~50% in modulo hashing.
  • Data and load distribution: The load is distributed a lot better now. The new node S4 gets a fair amount of traffic compared to earlier. This is because the node S4 is mapped multiple fragments of the ring, increasing its chance of a fair share of traffic.

Conclusion

Consistent hashing has proven to be a useful technique since its inception in 1997 and it is used in many well-known distributed systems because of the simplicity and the benefits it offers. The optimization of consistent hashing does not end with what we have read so far. For example: check out this blog or the video by Vimeo engineering on their practical usage and adaptation.

Acknowledgments

  • The idea of the visual simulation was inspired by the amazing interactive posts like this by @bciechanowski

Human being running on curiosity