Databases 5 min read

How to Horizontally Partition Tables Using MD5 Hashing and Bit Shifting

This article explains two practical horizontal sharding techniques for large‑scale projects—MD5‑based hashing and bit‑shifting of user IDs—detailing their PHP implementations, scalability limits, and how to extend table counts as data grows.

21CTO
21CTO
21CTO
How to Horizontally Partition Tables Using MD5 Hashing and Bit Shifting

In medium and large projects, horizontal partitioning of databases or tables is often used to reduce the load on a single database or table. The article presents two common sharding methods that route data to specific tables based on the user’s UID, which is a unique auto‑incrementing identifier.

Method 1: MD5 Hashing

The UID is hashed with MD5 and the first two characters of the hash are used as a suffix, directing the record to a table named user_xx. This distributes users across 256 tables (from user_00 to user_ff), achieving near‑uniform distribution.

function getTable($uid) {
    $ext = substr(md5($uid), 0, 2);
    return "user_" . $ext;
}

While this method spreads data evenly, it may become insufficient if the number of users grows beyond the capacity of 256 tables, requiring a longer hash prefix.

Method 2: Bit Shifting

This approach right‑shifts the UID by a configurable number of bits (commonly 20) and formats the result as a zero‑padded suffix. Each suffix corresponds to a separate table, e.g., user_0000, user_0001, etc.

public function getTable($uid) {
    return "user_" . sprintf("%04d", ($uid >> 20));
}

With a four‑digit suffix, up to 10,000 tables can be created, each holding about one million rows, allowing storage of up to 100 billion users. The suffix length can be increased (e.g., six digits) to support even larger scales.

A more flexible version lets you specify the suffix length and shift amount:

/**
 * Get table name based on UID sharding algorithm
 * @param int $uid   // User ID
 * @param int $bit   // Number of digits to keep in suffix
 * @param int $seed  // Number of bits to shift right
 */
function getTable($uid, $bit, $seed) {
    return "user_" . sprintf("%0{$bit}d", ($uid >> $seed));
}

Both methods require an upfront estimate of the maximum data volume per table. The bit‑shifting method offers better extensibility because new tables can be added simply by increasing the suffix range, whereas the MD5 method would need a longer hash prefix to accommodate more tables.

Original Source

Signed-in readers can open the original source through BestHub's protected redirect.

Sign in to view source
Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactadmin@besthub.devand we will review it promptly.

database shardingPHPhorizontal partitioningMD5 hashingbit shifting
21CTO
Written by

21CTO

21CTO (21CTO.com) offers developers community, training, and services, making it your go‑to learning and service platform.

0 followers
Reader feedback

How this landed with the community

Sign in to like

Rate this article

Was this worth your time?

Sign in to rate
Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.