Databases 5 min read

Horizontal Table Partitioning: MD5 Hash vs. Bit‑Shift Sharding

This article explains two practical horizontal sharding techniques for large databases—using an MD5 hash of the user UID to distribute rows across 256 tables, and right‑shifting the UID to create scalable table ranges—while discussing their capacity limits and extensibility.

ITPUB
ITPUB
ITPUB
Horizontal Table Partitioning: MD5 Hash vs. Bit‑Shift Sharding

In large‑scale projects, horizontal partitioning (sharding) is employed to reduce the load on a single database or table. The article presents two sharding methods commonly used in the author's SNS system, where the auto‑incrementing user UID serves as the sharding key.

Method 1: MD5 Hash

The UID is hashed with MD5 and the first two hexadecimal characters are taken as a suffix, mapping the user to tables named user_00user_ff (256 tables). This distributes users almost uniformly across the tables.

Limitation: if the number of users grows beyond the capacity of each table, the fixed two‑digit suffix cannot be expanded, leading back to the original scaling problem.

Method 2: Bit Shifting

The UID is right‑shifted by 20 bits, grouping roughly the first 1 million users into user_0000, the next million into user_0001, and so on. By reserving a four‑digit suffix, up to 10 000 tables can be created, each holding about 1 million rows (≈10 billion rows total).

If the dataset exceeds this, the suffix length can be increased (e.g., six digits) to support 100 million tables, allowing storage of up to 100 billion rows.

The algorithm can be made more flexible, as illustrated in the following diagram:

Summary and Recommendations

Both methods require estimating the maximum expected user volume and the per‑table capacity. For the bit‑shifting approach, shifting 20 bits ensures each table holds about 1 million rows; the four‑digit suffix allows easy addition of new tables as the user base grows. For the MD5 approach, increasing the number of hex digits (e.g., using three or four characters) expands the table count proportionally.

Overall, the bit‑shifting method offers better scalability, while the MD5 hash provides a simple uniform distribution when the total number of tables is modest.

Original Source

Signed-in readers can open the original source through BestHub's protected redirect.

Sign in to view source
Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactadmin@besthub.devand we will review it promptly.

Scalabilitydatabase shardinghorizontal partitioningbit shiftingMD5 hash
ITPUB
Written by

ITPUB

Official ITPUB account sharing technical insights, community news, and exciting events.

0 followers
Reader feedback

How this landed with the community

Sign in to like

Rate this article

Was this worth your time?

Sign in to rate
Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.