Choosing and Generating Sharding Keys for Database Partitioning
This article explains how to select appropriate sharding keys based on business requirements and data access patterns, describes vertical and horizontal partitioning methods, and presents common techniques such as equal‑value, index, and gene methods along with key generation strategies like auto‑increment, UUID, Snowflake algorithm, and open‑source middleware.
When a single relational database reaches its limits in load, connections, or concurrency, it is necessary to split the database and its tables to improve performance and scalability. This article discusses vertical and horizontal splitting, the importance of choosing the right sharding key, and various methods for generating sharding keys.
Vertical and Horizontal Splitting
Vertical splitting separates a large database into multiple logical databases, each containing a subset of tables (e.g., user, product, and order databases). It can also be applied within a single table: the table's columns are divided among several related tables that store different aspects of the same entity.
Horizontal splitting divides a single table into multiple tables that share the same schema but store different rows (e.g., orders are distributed across several order tables). The table that holds a given row is usually determined by the value of a sharding key.
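As a minimal sketch of how a sharding key can decide where a row lives, the class below routes an order to one of several tables by taking the key modulo the table count. The class name ShardRouter, the orders_N table naming, and the choice of modulo routing are assumptions made for illustration; the article does not prescribe a specific routing rule.

```java
/** Hypothetical router: maps a numeric sharding key to one of N order tables. */
final class ShardRouter {
    private final int tableCount;

    ShardRouter(int tableCount) {
        if (tableCount <= 0) {
            throw new IllegalArgumentException("tableCount must be positive");
        }
        this.tableCount = tableCount;
    }

    /** Index of the table that holds the row with this sharding key. */
    int tableIndex(long orderId) {
        // floorMod keeps the result in [0, tableCount) even for negative keys.
        return (int) Math.floorMod(orderId, (long) tableCount);
    }

    /** Physical table name, using an assumed "orders_<index>" convention. */
    String tableName(long orderId) {
        return "orders_" + tableIndex(orderId);
    }
}
```

With four tables, new ShardRouter(4).tableName(7) resolves to orders_3. Modulo routing spreads sequential keys evenly, but changing the table count later moves most rows, which is one reason the table count is usually fixed up front.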
Choosing the Sharding Key
The sharding key determines how data is distributed across databases/tables. Generally, the dimension most frequently used in queries should be selected as the sharding key (e.g., order number for the order table, product code for the product table). When queries involve non‑sharding fields, three approaches can be used:
Equal‑value method: Align the non‑sharding field value with the sharding key (e.g., make the waybill number identical to the order number).
Index method: Create an index that maps the non‑sharding field to the sharding key, then query via the sharding key.
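Gene method: Generate the sharding and non‑sharding identifiers with a common rule, for example by giving both the same low bits of a 64‑bit ID. Those shared bits (the "gene") determine the shard, so either identifier resolves directly to the same shard without an extra lookup.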
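The gene method can be sketched as follows, assuming the number of shards is a power of two so that the shard index is simply the low bits of an ID. The class GeneIds, the 4-bit gene width, and the uniquePart parameter are hypothetical names chosen for this example, not part of the original article.

```java
/** Hypothetical helper illustrating the gene method with a 4-bit gene (16 shards). */
final class GeneIds {
    static final int GENE_BITS = 4;                       // assumed: 2^4 = 16 shards
    static final long GENE_MASK = (1L << GENE_BITS) - 1;  // 0b1111

    private GeneIds() {}

    /** The shard index is the low GENE_BITS bits of any identifier. */
    static int shardOf(long id) {
        return (int) (id & GENE_MASK);
    }

    /**
     * Builds a waybill number whose low bits copy the order's gene.
     * uniquePart is any value that is unique per waybill (for example a sequence);
     * it must fit in 64 - GENE_BITS bits, otherwise the shift discards high bits.
     */
    static long waybillCode(long uniquePart, long orderId) {
        return (uniquePart << GENE_BITS) | (orderId & GENE_MASK);
    }
}
```

Because the waybill number carries the order's gene, GeneIds.shardOf(waybillCode) equals GeneIds.shardOf(orderId), so a query by waybill number can be routed to the right shard without consulting an index table.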
Example Tables
Order table (sharding key: Order_id)
Sharding Key | Product Code | Address
Order_id | Sku_code | address
Waybill table (sharding key: Waybill_code)
Sharding Key | Order Number | Weight
Waybill_code | Order_id | weight
Generating the Sharding Key
Several common strategies are used to generate sharding keys, chosen according to business volume and concurrency:
Database auto‑increment primary key: Suitable for low‑concurrency scenarios.
UUID: Can be generated in Java with java.util.UUID; globally unique but not human‑readable.
Snowflake algorithm: Produces a 64‑bit integer composed of a sign bit, 41‑bit timestamp, 10‑bit machine identifier, and 12‑bit sequence number, allowing up to 4096 IDs per millisecond per machine.
Open‑source middleware: Examples include Meituan's Leaf and Baidu's Uid‑Generator.
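To make the Snowflake bit layout described above concrete, here is a minimal single-process sketch: 41 bits of millisecond timestamp, 10 bits of machine identifier, and 12 bits of sequence. The class name, the custom epoch of 2020-01-01 UTC, and the policy of throwing when the clock moves backwards are assumptions of this sketch; production generators such as the middleware listed above handle clock drift and machine ID assignment more carefully.

```java
/** Illustrative Snowflake-style generator; not the reference implementation. */
final class SnowflakeIdGenerator {
    private static final long EPOCH = 1577836800000L;   // assumed custom epoch: 2020-01-01T00:00:00Z
    private static final int MACHINE_BITS = 10;
    private static final int SEQUENCE_BITS = 12;
    private static final long MAX_MACHINE_ID = (1L << MACHINE_BITS) - 1; // 1023
    private static final long MAX_SEQUENCE = (1L << SEQUENCE_BITS) - 1;  // 4095

    private final long machineId;
    private long lastTimestamp = -1L;
    private long sequence = 0L;

    SnowflakeIdGenerator(long machineId) {
        if (machineId < 0 || machineId > MAX_MACHINE_ID) {
            throw new IllegalArgumentException("machineId must be in [0, " + MAX_MACHINE_ID + "]");
        }
        this.machineId = machineId;
    }

    /** Returns the next ID; synchronized so one instance can be shared across threads. */
    synchronized long nextId() {
        long now = System.currentTimeMillis();
        if (now < lastTimestamp) {
            // Assumed policy: fail fast rather than risk issuing duplicate IDs.
            throw new IllegalStateException("Clock moved backwards");
        }
        if (now == lastTimestamp) {
            sequence = (sequence + 1) & MAX_SEQUENCE;
            if (sequence == 0) {
                // All 4096 sequence values used in this millisecond: wait for the next one.
                while (now <= lastTimestamp) {
                    now = System.currentTimeMillis();
                }
            }
        } else {
            sequence = 0L;
        }
        lastTimestamp = now;
        return ((now - EPOCH) << (MACHINE_BITS + SEQUENCE_BITS))
                | (machineId << SEQUENCE_BITS)
                | sequence;
    }
}
```

IDs from one instance increase over time, and the machine identifier can be recovered with (id >> 12) & 1023. Each machine must be given a distinct machineId; how that assignment is coordinated is outside the scope of this sketch.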
Conclusion
Effective sharding requires careful selection of the sharding key based on query patterns, and a suitable generation method that balances uniqueness, readability, and performance. Combining vertical and horizontal partitioning with appropriate key strategies enables databases to scale with growing business demands.
JD Tech