How to Horizontally Partition Tables Using MD5 Hashing and Bit Shifting
This article explains two practical horizontal sharding techniques for large‑scale projects—MD5‑based hashing and bit‑shifting of user IDs—detailing their PHP implementations, scalability limits, and how to extend table counts as data grows.
In medium and large projects, horizontal partitioning of databases or tables is often used to reduce the load on a single database or table. The article presents two common sharding methods that route data to specific tables based on the user’s UID, which is a unique auto‑incrementing identifier.
Method 1: MD5 Hashing
The UID is hashed with MD5 and the first two characters of the hash are used as a suffix, directing the record to a table named user_xx. This distributes users across 256 tables (from user_00 to user_ff), achieving near‑uniform distribution.
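function getTable($uid) {
    // Use the first two hex characters of the MD5 digest as the table suffix.
    $ext = substr(md5((string) $uid), 0, 2);
    return "user_" . $ext;
}
While this method spreads data evenly, the table count is fixed at 256. If the rows per table grow too large, a longer hash prefix is needed (three characters give 4,096 tables). A longer prefix maps existing UIDs to different table names, so rows already stored have to be moved.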
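To make the capacity trade-off concrete, here is a minimal sketch that makes the prefix length a parameter. The function name `getTableByHashPrefix` and the `$prefixLen` parameter are illustrative assumptions, not part of the original code. A prefix of n hex characters yields 16^n possible tables.

```php
<?php
// Hypothetical helper (not from the original article): same MD5 routing,
// but with a configurable number of hex characters in the suffix.
function getTableByHashPrefix(int $uid, int $prefixLen = 2): string
{
    // md5() returns 32 lowercase hex characters, so the prefix is capped at 32.
    if ($prefixLen < 1 || $prefixLen > 32) {
        throw new InvalidArgumentException('prefixLen must be between 1 and 32');
    }
    return 'user_' . substr(md5((string) $uid), 0, $prefixLen);
}

echo getTableByHashPrefix(1), "\n";    // user_c4
echo getTableByHashPrefix(1, 3), "\n"; // user_c4c
echo 16 ** 2, "\n";                    // 256 tables with a 2-character prefix
echo 16 ** 3, "\n";                    // 4096 tables with a 3-character prefix
```

Note that UID 1 moves from `user_c4` to `user_c4c` when the prefix grows, which is why widening the prefix on a live system implies a data migration.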
Method 2: Bit Shifting
This approach right‑shifts the UID by a configurable number of bits (commonly 20) and formats the result as a zero‑padded suffix. Each suffix corresponds to a separate table, e.g., user_0000, user_0001, etc.
function getTable($uid) {
    // $uid >> 20 is integer division by 2^20 (1,048,576).
    return "user_" . sprintf("%04d", ($uid >> 20));
}
With a four-digit suffix, up to 10,000 tables can be created, each holding 2^20 (about one million) rows, for a total of roughly 10 billion users. The suffix length can be increased (e.g., six digits) to support even larger scales.
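The capacity figures follow from the shift itself: `$uid >> 20` is integer division by 2^20 = 1,048,576, so each table covers one contiguous block of that many UIDs. The sketch below works through the arithmetic. The helper names `uidRangeForTable` and `shardCapacity` are hypothetical, and the code assumes 64-bit PHP integers and non-negative UIDs.

```php
<?php
// Hypothetical helpers for illustration; not part of the original article.

// Returns the [first, last] UID stored in table number $index for a given shift.
function uidRangeForTable(int $index, int $shift = 20): array
{
    $first = $index << $shift;
    $last  = (($index + 1) << $shift) - 1;
    return [$first, $last];
}

// Total UIDs addressable with a zero-padded decimal suffix of $digits digits.
// Assumes the result fits in a 64-bit integer.
function shardCapacity(int $digits, int $shift = 20): int
{
    return (10 ** $digits) * (1 << $shift);
}

[$first, $last] = uidRangeForTable(1);
echo "user_0001 holds UIDs $first to $last\n"; // 1048576 to 2097151
echo shardCapacity(4), "\n";                   // 10485760000
```

With a shift of 20 and four digits, the total is 10,000 × 1,048,576 = 10,485,760,000 UIDs.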
A more flexible version lets you specify the suffix length and shift amount:
/**
 * Get table name based on UID sharding algorithm
 * @param int $uid  User ID
 * @param int $bit  Number of digits in the zero-padded suffix
 * @param int $seed Number of bits to shift right
 * @return string   Table name, e.g. user_0001
 */
function getTable($uid, $bit, $seed) {
    return "user_" . sprintf("%0{$bit}d", ($uid >> $seed));
}
Both methods require an upfront estimate of the maximum data volume per table. The bit-shifting method offers better extensibility: because UIDs auto-increment, new UIDs fall into new, higher-numbered tables and existing rows stay where they are. The MD5 method would need a longer hash prefix to accommodate more tables, which changes the table that existing UIDs map to.
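One detail worth guarding against: `sprintf` pads to the requested width but never truncates, so a UID beyond the planned range silently produces a longer suffix (for example `user_10000` with a four-digit format). The following is a sketch of a checked variant. The name `getTableChecked`, its default arguments, and the choice of exceptions are assumptions made for illustration.

```php
<?php
// Hypothetical wrapper for illustration; not part of the original article.
function getTableChecked(int $uid, int $digits = 4, int $shift = 20): string
{
    if ($uid < 0) {
        throw new InvalidArgumentException('UID must be non-negative');
    }
    $index = $uid >> $shift;
    // Reject indexes whose decimal form would be wider than $digits.
    if ($index >= 10 ** $digits) {
        throw new OutOfRangeException("UID {$uid} exceeds the {$digits}-digit table range");
    }
    return 'user_' . sprintf("%0{$digits}d", $index);
}

echo getTableChecked(1), "\n";              // user_0000
echo getTableChecked(1048576), "\n";        // user_0001
echo getTableChecked(5000000), "\n";        // user_0004
echo getTableChecked(5000000, 6, 10), "\n"; // 5000000 >> 10 = 4882, so user_004882
```

Failing loudly here is one possible design choice. An alternative would be to widen the suffix deliberately and create the new tables ahead of time.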
21CTO
21CTO (21CTO.com) offers developers community, training, and services, making it your go‑to learning and service platform.
