Databases 27 min read

From Monolith to Sharded MySQL: A Complete End‑to‑End Sharding Case Study

This article walks through a real‑world large‑scale MySQL sharding project, covering business refactoring, storage architecture design, data migration, incremental upgrades, best‑practice tips, and stability safeguards, while sharing concrete steps, pitfalls, and lessons learned from start to production rollout.

ITPUB

Sep 6, 2022

From Monolith to Sharded MySQL: A Complete End‑to‑End Sharding Case Study

1. Introduction

Massive data growth makes a single MySQL instance a bottleneck; CPU, disk, and memory limits force a split‑database‑and‑table (sharding) solution. Two options exist: replace MySQL with a distributed store (e.g., HBase, PolarDB, TiDB) or keep MySQL and apply sharding. This article focuses on the latter, presenting a complete end‑to‑end process.

2. Phase 1 – Business Refactoring (Optional)

For well‑designed micro‑services, only storage changes may be needed; business refactoring is optional.

The project involved two huge tables (A and B) with ~80 million rows each, inherited from a monolithic system, linked to 50+ online services and 20+ offline jobs. Refactoring required merging the tables, removing redundant fields, and ensuring no query was missed.

2.1 Query Statistics

Using a distributed tracing system, the team listed all services that accessed the tables, including offline analytics, to avoid missing any data source.

2.2 Query Migration

A jar named projected‑1.0.0‑SNAPSHOT was created to host all collected queries. Existing mapper calls were replaced with projectdb.xxxMethod(), enabling later replacement with RPC calls.

Facilitates further query analysis.

Allows seamless switch to middle‑tier service calls by upgrading the jar version.

2.3 Joint Query Analysis

The team classified queries to decide which could be split, which required joins, which fields could be merged, redundant, or discarded, and identified sharding keys.

2.4 New Table Design

Based on the analysis, a new unified table schema was drafted, reviewed by all business owners, and indexed according to the new query patterns.

2.5 First Upgrade

Version 2.0.0‑SNAPSHOT of the jar updated all SQL to the new schema. Services were upgraded in batches (core vs. non‑core) to mitigate risk.

2.6 Best Practices

Keep original column names when possible.

Redesign indexes after merging tables; avoid blindly copying old indexes.

3. Phase 2 – Storage Architecture Design (Core)

The core of any sharding project is the storage architecture.

3.1 Overall Architecture

Analysis showed >80% of queries use three primary keys (pk1, pk2, pk3). The design introduced a database middleware, data‑sync tools, and a search platform (OpenSearch/Elasticsearch) for the remaining 20% of queries.

3.1.1 MySQL Sharding

Sharding keys pk1/pk2/pk3 cover most queries. Two full‑copy shards (pk1 and pk3) were maintained due to real‑time requirements; pk2 was stored only in a mapping table.

3.1.2 Search Index

The search index stores primary keys and necessary fields for the remaining queries, avoiding full data duplication unless fields are small.

Search‑engine‑to‑DB sync introduces latency; queries that need strong consistency should hit MySQL directly.

3.1.3 Data Synchronization

Both full‑copy and incremental sync were used. Initially a sync‑only approach was chosen, but real‑time latency forced a switch to dual‑write (snapshot 3.0.0‑SNAPSHOT) for critical paths.

The sync topology includes:

Full sync from old tables to new main tables.

Full sync from new main tables to secondary tables.

Sync from new main tables to mapping tables.

Sync from new main tables to the search engine.

3.2 Capacity Planning

Storage and performance were estimated from QPS and table sizes, but actual sync required ~50% more capacity due to storage gaps.

3.3 Data Validation

Because the migration involved heterogeneous schema changes, a dedicated validation service compared full and incremental data to ensure logical correctness and sync consistency.

3.4 Best Practices

Beware of traffic amplification caused by sharding (secondary index lookups, IN‑batch queries).

When a sharding key must change (e.g., pk3), use delete‑then‑insert with transactional guarantees.

Ensure message order in Kafka‑based sync; for one‑to‑many relationships, fallback to a reverse‑lookup to guarantee consistency.

Account for latency at each layer (sync platform, read‑replica lag, search index delay).

Allocate ~50% extra storage for sync‑induced gaps.

4. Phase 3 – Refactoring and Production Rollout

After architecture is ready, the team performed careful rollout:

Middle‑tier services switched to single‑read, dual‑write mode.

Old tables kept syncing to new tables.

All services upgraded to projectDB‑3.0.0‑SNAPSHOT and exposed RPC endpoints.

Monitoring ensured no non‑mid‑tier service accessed old tables.

Data sync stopped, old tables dropped.

4.1 Query Refactoring

Read queries were rewritten to use new table names and sharding keys; write queries were changed to dual‑write mode with feature flags for quick rollback.

4.2 Service Refactoring

A new middle‑tier service was created to host the refactored queries, allowing independent upgrades and rollbacks.

4.3 Staged Deployment

Non‑core services were upgraded first, followed by core services, to limit impact. Each batch required thorough API risk assessment and regression testing.

4.4 Old Table Decommission

Confirm no external service accesses old tables via monitoring and SQL audit.

Stop data sync.

Delete old tables.

4.5 Additional Best Practices

Immediate reads after writes may miss data due to sync lag; mitigate with dual‑write or fallback reads.

Replace auto‑increment IDs with distributed sequence IDs to avoid key collisions across shards.

Swap ID ranges between old and new tables before cut‑over to prevent conflicts.

5. Stability Assurance

Key stability measures include thorough table‑design reviews, data‑validation services, rapid rollback plans, batch rollouts, comprehensive monitoring/alerting, and extensive unit/integration testing.

6. Cross‑Team Project Management

Successful large‑scale sharding requires disciplined documentation, clear business communication, ownership assignment, and synchronized progress tracking across all involved teams.

7. Outlook

Future improvements may involve newer middleware that automates sharding or adopting distributed databases (PolarDB, TiDB, HBase) that eliminate the need for manual sharding.

Original Source

Signed-in readers can open the original source through BestHub's protected redirect.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.

sharding mysql stability

Written by

ITPUB

Official ITPUB account sharing technical insights, community news, and exciting events.

0 followers

Reader feedback

How this landed with the community

Rate this article

Was this worth your time?

Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.

1. Introduction

2. Phase 1 – Business Refactoring (Optional)

2.1 Query Statistics

2.2 Query Migration

2.3 Joint Query Analysis

2.4 New Table Design

2.5 First Upgrade

2.6 Best Practices

3. Phase 2 – Storage Architecture Design (Core)

3.1 Overall Architecture

3.1.1 MySQL Sharding

3.1.2 Search Index

3.1.3 Data Synchronization

3.2 Capacity Planning

3.3 Data Validation

3.4 Best Practices

4. Phase 3 – Refactoring and Production Rollout

4.1 Query Refactoring

4.2 Service Refactoring

4.3 Staged Deployment

4.4 Old Table Decommission

4.5 Additional Best Practices

5. Stability Assurance

6. Cross‑Team Project Management

7. Outlook

ITPUB

How this landed with the community

Was this worth your time?

0 Comments

2. Phase 1 – Business Refactoring (Optional)

3. Phase 2 – Storage Architecture Design (Core)

4. Phase 3 – Refactoring and Production Rollout