Database Sharding: Storage Options, Implementation Strategies, and Migration Considerations
This article explains how to choose storage options, design sharding keys, apply partitioning strategies, modify business code, handle historical data migration, and plan future scaling for large‑scale e‑commerce systems using MySQL, NoSQL, and NewSQL solutions.
Storage Options After Partitioning
Before discussing the selection, the author describes the background of an e‑commerce system with two main entities: User (tens of millions of records, daily growth of over 100k) and Order (hundreds of millions of records, daily growth of millions). A single database and table cannot handle such I/O and CPU load, so sharding is necessary.
The author evaluated several storage technologies and finally chose sharding based on the following categories:
Relational databases: MySQL, Oracle, SqlServer (focus on MySQL).
NoSQL: MongoDB, Elasticsearch.
NewSQL: TiDB.
1. Relational Databases
MySQL is widely used by modern enterprises; it provides strong constraints, transaction control, SQL syntax, and locking, which satisfy most business requirements.
2. NoSQL
MongoDB offers sharding and can handle large data volumes, but it lacks strong constraints, transaction support, and may require extensive code changes when replacing relational storage.
3. NewSQL
TiDB is a newer technology that can store massive data, but its stability and team familiarity must be considered before adoption.
Sharding Based on MySQL
Sharding (or partitioning) splits a large table into multiple identical tables and a large database into multiple identical databases.
Two main sharding modes are used:
Proxy mode : Business‑transparent, all routing and result merging happen in a proxy service (e.g., MyCat , ShardingSphere ).
Client mode : Business code includes the sharding logic via a client library (e.g., Sharding‑JDBC), requiring a JAR dependency.
Both modes have clear pros and cons: Proxy mode decouples resources but raises operational cost; Client mode offers flexible control with lower operational cost but introduces language constraints and upgrade difficulty.
Implementation Ideas for Sharding
When implementing sharding, five key points must be addressed:
1. Choosing the Sharding Key
For an order system, candidate fields include user_id , order_id , order_time , and store_id . The author selects user_id because C‑end user queries dominate the workload.
2. Sharding Strategy
Three common strategies are described:
Range sharding : Split by numeric ranges (e.g., every 1 million users per database, every 100k per table).
Hash modulo : Apply a hash function such as hash(user_id%8) to distribute data across 8 tables (prefer powers of two for future expansion).
Hybrid : First range‑split into databases, then hash‑split within each database.
3. Modifying Business Code
Sharding is easier in micro‑service architectures where only the affected service needs changes; in monoliths it is more complex. Foreign key constraints are usually avoided, and query‑separation patterns often combine sharding with an Elasticsearch index for fast reads.
4. Migrating Historical Data
Data migration is time‑consuming and must be seamless. The recommended approach mirrors the query‑separation solution: migrate existing data in bulk, capture incremental changes via binlog and Canal, verify consistency, then switch traffic gradually.
5. Future Scaling Plans
When scaling further, ensure the sharding strategy allows migration from a single old shard to a new one (hence the preference for power‑of‑two hash). Data migration follows the same process as historical migration.
Limitations of Sharding
Despite its benefits, sharding still faces challenges such as ensuring consistency during incremental migration and handling sudden spikes in order volume.
For more detailed discussions, the author invites readers to join his knowledge community, where additional articles, videos, and source code are shared.
Code Ape Tech Column
Former Ant Group P8 engineer, pure technologist, sharing full‑stack Java, job interview and career advice through a column. Site: java-family.cn
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.