Comparison of Consistent Hash (Ring Hash) and JumpStringHash for DBLE Sharding
This article explains the principles, characteristics, and performance trade‑offs of the consistent‑hash (ring‑hash) and jumpstringhash sharding algorithms used by DBLE, presents test results on variance, latency and data balance, and concludes why DBLE prefers jumpstringhash.
Background
MyCat provides three sharding modes for string‑type fields: modulo hash, jumpstringhash, and consistent hash (ring hash). DBLE inherits MyCat’s modulo hash but recommends the higher‑performance, more balanced jumpstringhash algorithm.
Introduction
The following sections briefly introduce the principles, features, and pros/cons of consistent hash (ring hash) and jumpstringhash.
Consistent Hash (Ring Hash)
The algorithm works by generating a fixed set of identifiers (e.g., SHARD‑1‑NODE‑1) for X × N shards, computing their hash values, sorting them in a map, and then assigning an input string to the shard whose hash is just smaller than the input’s hash.
JumpStringHash
The algorithm distributes data uniformly by:
Using a formula (shown in the image) to generate pseudo‑random values.
For each input string, computing its hash and repeatedly generating pseudo‑random numbers; if a number is less than 1/x for the current node, the node is selected, and the node with the highest index becomes the final placement.
In practice, a reverse‑checking method from the highest node is used: when a generated random value exceeds x, that node is chosen.
Data Comparison
Tests were conducted on 350,595 real data rows to compare variance, time consumption, and maximum deviation of data distribution across different shard counts.
Test Results – Variance
1. Variance decreases as shard count increases for both methods. 2. When the number of shards is large enough, the two methods show little difference. 3. With relatively few nodes, jumpstringhash achieves the best balance. 4. Increasing the number of rings in consistent hash improves balance, but the improvement diminishes as shard count grows.
Test Results – Time Consumption
1. Consistent hash is an order of magnitude slower than jumpstringhash. 2. Jumpstringhash’s time cost grows slightly with shard count. 3. Consistent hash’s time cost also grows slightly as the number of rings increases.
Test Results – Maximum Deviation
1. Maximum deviation for both methods decreases as shard count increases, and differences vanish at high shard counts. 2. With few shards, jumpstringhash provides the best balance, while consistent hash performs worst. 3. Adding more rings to consistent hash improves balance, but the benefit diminishes with more shards.
Conclusion
Overall, consistent hash shows poor performance across all tests, leading DBLE to adopt jumpstringhash as its default sharding algorithm. Users migrating from MyCat who still use consistent hash can implement a custom sharding algorithm.
For custom sharding algorithm details, see: GitHub – DBLE route function spec
Aikesheng Open Source Community
The Aikesheng Open Source Community provides stable, enterprise‑grade MySQL open‑source tools and services, releases a premium open‑source component each year (1024), and continuously operates and maintains them.
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.