Horizontal Table Partitioning: MD5 Hash vs. Bit‑Shift Sharding
This article explains two practical horizontal sharding techniques for large databases—using an MD5 hash of the user UID to distribute rows across 256 tables, and right‑shifting the UID to create scalable table ranges—while discussing their capacity limits and extensibility.
In large‑scale projects, horizontal partitioning (sharding) is employed to reduce the load on a single database or table. The article presents two sharding methods commonly used in the author's SNS system, where the auto‑incrementing user UID serves as the sharding key.
Method 1: MD5 Hash
The UID is hashed with MD5 and the first two hexadecimal characters are taken as a suffix, mapping the user to tables named user_00 … user_ff (256 tables). This distributes users almost uniformly across the tables.
Limitation: if the number of users grows beyond the capacity of each table, the fixed two‑digit suffix cannot be expanded, leading back to the original scaling problem.
Method 2: Bit Shifting
The UID is right‑shifted by 20 bits, grouping roughly the first 1 million users into user_0000, the next million into user_0001, and so on. By reserving a four‑digit suffix, up to 10 000 tables can be created, each holding about 1 million rows (≈10 billion rows total).
If the dataset exceeds this, the suffix length can be increased (e.g., six digits) to support 100 million tables, allowing storage of up to 100 billion rows.
The algorithm can be made more flexible, as illustrated in the following diagram:
Summary and Recommendations
Both methods require estimating the maximum expected user volume and the per‑table capacity. For the bit‑shifting approach, shifting 20 bits ensures each table holds about 1 million rows; the four‑digit suffix allows easy addition of new tables as the user base grows. For the MD5 approach, increasing the number of hex digits (e.g., using three or four characters) expands the table count proportionally.
Overall, the bit‑shifting method offers better scalability, while the MD5 hash provides a simple uniform distribution when the total number of tables is modest.
Signed-in readers can open the original source through BestHub's protected redirect.
This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.
ITPUB
Official ITPUB account sharing technical insights, community news, and exciting events.
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.
