A visual guide to consistent hashing

  • Maximize the cache hits: This will reduce the load on the primary data source and reduce the overall latency.
  • Distribute the data and traffic evenly: This ensures optimal use of servers and avoids overloading a subset of the servers.

Modulo Hashing

Modulo Hashing: Static nodes
  • Cache Hit Ratio: As expected, 2/3rd of the requests(67%) are returned from the cache. This is good.
  • Data and load distribution: The load is not equally distributed across all servers, but it is fairly distributed. It is purely based on the distribution property of the hash function.
  • 3 servers(S0, S1, S2), 100 unique keys, 300 random requests as earlier
  • 1 server(S3) is added after the 100th request
  • 1 server(S1) is removed after the 200th request
Modulo Hashing: Elastic nodes
  • Cache Hit Ratio: This drops from ~67% earlier to ~45%.
  • Data and load distribution: The total number of keys across the servers has increased. This indicates some keys are stored in multiple servers.

Consistent Hashing

  • The servers(called nodes) are hashed and mapped to a number in a fixed range. This range can be imagined as a real number(not an integer) between 0 and 360 to represent it as points on a circle. The node is placed on this circular ring based on its hash value mapped to range 0–360. For example: If the hash value of the server S1 maps to 90, it will be placed as the point at 90 degrees on the circumference of the circle.
  • The keys are hashed similarly and mapped to a point on the circle. The request for this key is routed to the closest node in the clockwise direction (It can be anticlockwise as well, as long as the same direction is used for all the keys).
Consistent Hashing: Basic Concept
Consistent Hashing(Basic): Static nodes
  • 3 servers(S1, S2, S3), 100 unique keys, 300 random requests as earlier
  • 1 server(S4) is added after the 100th request
  • 1 server(S1) is removed after the 200th request
Consistent Hashing(Basic): Elastic nodes
  • Cache Hit Ratio: The cache hit ratio is ~60% compared to ~50% in modulo hashing. This is because very few keys are remapped when a node is added or removed.
  • Data and load distribution: The load distribution is skewed. In the above scenario(with the chosen node names), the new node S4 doesn't get many requests due to its proximity to the node S3.
Consistent Hashing(Full): Elastic nodes
  • Cache Hit Ratio: The cache hit ratio remains good i.e. ~60% compared to ~50% in modulo hashing.
  • Data and load distribution: The load is distributed a lot better now. The new node S4 gets a fair amount of traffic compared to earlier. This is because the node S4 is mapped multiple fragments of the ring, increasing its chance of a fair share of traffic.

Conclusion

Acknowledgments

  • The idea of the visual simulation was inspired by the amazing interactive posts like this by @bciechanowski

--

--

Get the Medium app

A button that says 'Download on the App Store', and if clicked it will lead you to the iOS App store
A button that says 'Get it on, Google Play', and if clicked it will lead you to the Google Play store