Databases 12 min read

Mastering MySQL Sharding: When and How to Split Databases and Tables

Facing massive user and order data growth, this article examines storage options, compares relational, NoSQL, and NewSQL databases, and provides a detailed guide to implementing MySQL‑based sharding with proxy and client modes, key selection, partition strategies, migration steps, and future scaling considerations.

ITPUB

Jan 2, 2023

Mastering MySQL Sharding: When and How to Split Databases and Tables

Storage Options After Splitting

Background: an e‑commerce system with tens of millions of users and billions of orders, daily growth of over 100,000 users and millions of orders, cannot be served by a single database due to I/O and CPU limits, so sharding is adopted.

Relational Databases

MySQL is the preferred relational database because it offers strong constraints, transaction control, mature SQL syntax, and proven stability, meeting all business requirements.

NoSQL

MongoDB provides sharding and can handle large data volumes, but it lacks strong constraints, transaction support, and complicates migration for critical order data, making it unsuitable for storing orders that require strict consistency.

NewSQL

TiDB represents a newer NewSQL option; its adoption depends on team familiarity and perceived stability, and it is generally used for less critical data in early stages.

MySQL‑Based Sharding

Sharding splits a large table into multiple identical tables (horizontal partition) and a large database into multiple identical databases (vertical partition). The approach chosen relies on minimal third‑party dependencies and flexible business logic.

Two main deployment modes exist:

Proxy mode – business‑transparent; a proxy service handles SQL routing, result merging, etc. Examples include MyCat and ShardingSphere.

Client mode – business‑invasive; the sharding logic resides in the client via a library such as Sharding-JDBC. The architecture diagram is shown below.

Available middleware for sharding is illustrated in the following image.

Implementation Ideas

1. Choosing the Sharding Key

Candidate fields considered: user_id, order_id, order_time, store_id. The final choice is user_id because C‑side user queries dominate the workload, followed by backend city queries and B‑side store statistics.

Key selection criteria include uniform data distribution across shards, minimizing cross‑shard queries, and immutability of the key value.

2. Sharding Strategy

Three common strategies are described:

Range sharding – e.g., each 1,000,000 user IDs form a database, each 100,000 IDs form a table.

Hash modulo – e.g., hash(user_id%8) distributes rows into 8 tables; using powers of two simplifies future expansion.

Hybrid range‑then‑hash – first split by range into databases, then apply hash modulo within each database to create tables.

All three have clear trade‑offs, which are omitted for brevity.

3. Business Code Modifications

Code changes depend heavily on the existing architecture. In microservice environments the impact is limited to the affected service, whereas monolithic applications face greater complexity. Foreign‑key constraints are typically avoided in large‑scale internet systems. Sharding is often combined with query‑separation: data is indexed in Elasticsearch for fast queries, while detailed records may reside in HBase.

4. Historical Data Migration

Migration is time‑consuming; downtime must be minimized. The recommended approach reuses the earlier query‑separation design: existing data is bulk‑migrated, while incremental changes are captured via binlog listeners (Canal) and applied in real time.

The migration workflow includes:

Deploy Canal to capture binlog events and trigger incremental migration.

Test migration scripts and move historical data to the new sharded schema.

Ensure no data loss by synchronizing the time gap between incremental and historical migration.

Validate the new database contains the full dataset before switching traffic.

Roll out the new version of the application, choosing either a gray release or full cut‑over, and prepare a rollback plan.

5. Future Scaling Plans

If data growth exceeds the current sharding design, expansion relies on two principles: using a power‑of‑two number of tables to simplify re‑sharding, and reusing the same migration process described above to move data to new shards.

Drawbacks of Sharding

Incremental migration consistency – ensuring data consistency and high availability during ongoing migration.

Short‑term order spikes – extreme bursts may still overwhelm the sharded system, requiring additional mitigation strategies.

Original Source

Signed-in readers can open the original source through BestHub's protected redirect.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.

Data Migration mysql database partitioning

Written by

ITPUB

Official ITPUB account sharing technical insights, community news, and exciting events.

0 followers

Reader feedback

How this landed with the community

Rate this article

Was this worth your time?

Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.