Databases 12 min read

Database Sharding: Storage Options, Implementation Strategies, and Migration Considerations

This article explains how to choose storage options, design sharding keys, apply partitioning strategies, modify business code, handle historical data migration, and plan future scaling for large‑scale e‑commerce systems using MySQL, NoSQL, and NewSQL solutions.

Code Ape Tech Column

Oct 20, 2022

Database Sharding: Storage Options, Implementation Strategies, and Migration Considerations

Storage Options After Partitioning

Before discussing the selection, the author describes the background of an e‑commerce system with two main entities: User (tens of millions of records, daily growth of over 100k) and Order (hundreds of millions of records, daily growth of millions). A single database and table cannot handle such I/O and CPU load, so sharding is necessary.

The author evaluated several storage technologies and finally chose sharding based on the following categories:

Relational databases: MySQL, Oracle, SqlServer (focus on MySQL).

NoSQL: MongoDB, Elasticsearch.

NewSQL: TiDB.

1. Relational Databases

MySQL is widely used by modern enterprises; it provides strong constraints, transaction control, SQL syntax, and locking, which satisfy most business requirements.

2. NoSQL

MongoDB offers sharding and can handle large data volumes, but it lacks strong constraints, transaction support, and may require extensive code changes when replacing relational storage.

3. NewSQL

TiDB is a newer technology that can store massive data, but its stability and team familiarity must be considered before adoption.

Sharding Based on MySQL

Sharding (or partitioning) splits a large table into multiple identical tables and a large database into multiple identical databases.

Two main sharding modes are used:

Proxy mode : Business‑transparent, all routing and result merging happen in a proxy service (e.g., MyCat, ShardingSphere).

Client mode : Business code includes the sharding logic via a client library (e.g., Sharding‑JDBC), requiring a JAR dependency.

Both modes have clear pros and cons: Proxy mode decouples resources but raises operational cost; Client mode offers flexible control with lower operational cost but introduces language constraints and upgrade difficulty.

Implementation Ideas for Sharding

When implementing sharding, five key points must be addressed:

1. Choosing the Sharding Key

For an order system, candidate fields include user_id, order_id, order_time, and store_id. The author selects user_id because C‑end user queries dominate the workload.

2. Sharding Strategy

Three common strategies are described:

Range sharding : Split by numeric ranges (e.g., every 1 million users per database, every 100k per table).

Hash modulo : Apply a hash function such as hash(user_id%8) to distribute data across 8 tables (prefer powers of two for future expansion).

Hybrid : First range‑split into databases, then hash‑split within each database.

3. Modifying Business Code

Sharding is easier in micro‑service architectures where only the affected service needs changes; in monoliths it is more complex. Foreign key constraints are usually avoided, and query‑separation patterns often combine sharding with an Elasticsearch index for fast reads.

4. Migrating Historical Data

Data migration is time‑consuming and must be seamless. The recommended approach mirrors the query‑separation solution: migrate existing data in bulk, capture incremental changes via binlog and Canal, verify consistency, then switch traffic gradually.

5. Future Scaling Plans

When scaling further, ensure the sharding strategy allows migration from a single old shard to a new one (hence the preference for power‑of‑two hash). Data migration follows the same process as historical migration.

Limitations of Sharding

Despite its benefits, sharding still faces challenges such as ensuring consistency during incremental migration and handling sudden spikes in order volume.

For more detailed discussions, the author invites readers to join his knowledge community, where additional articles, videos, and source code are shared.

Original Source

Signed-in readers can open the original source through BestHub's protected redirect.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.

mysql NewSQL NoSQL

Written by

Code Ape Tech Column

Former Ant Group P8 engineer, pure technologist, sharing full‑stack Java, job interview and career advice through a column. Site: java-family.cn

0 followers

Reader feedback

How this landed with the community

Rate this article

Was this worth your time?

Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.