How We Split a 50‑Million‑Row MySQL Table: Lessons and Strategies
Facing a 50‑million‑row financial transaction table that grew 600,000 rows each month, the team designed a sharding solution using sharding‑jdbc, tackled multi‑datasource transaction and pagination challenges, and executed a staged migration that kept the system stable while eliminating MySQL performance bottlenecks.
Preface
The author took over a company's financial system two years ago and discovered a massive transaction flow table exceeding 50 million rows. The table was growing by more than 600,000 rows each month with no end in sight, already an unsustainable size for a single MySQL table.
System State Before Sharding
Frequent time‑outs on flow‑related APIs; some interfaces had become unusable.
Insert operations slowed dramatically as daily new rows accumulated.
The single table occupied excessive disk space, triggering DBA alerts.
Any ALTER operation caused high replication latency and long table locks.
Sharding Goals
Split the large flow table into multiple shards, keeping each shard around 10 million rows (a safe size for MySQL).
Optimize query conditions for each interface to ensure availability and eliminate slow queries.
Difficulty Analysis
The table is the core of the finance system; many downstream services depend on it, requiring rigorous development, testing, and release processes.
26 business scenarios demanded changes to 32 mapper methods, with countless additional code adjustments.
Data migration had to preserve system stability.
Coordinating multiple teams for interface changes and rollout added significant complexity.
Overall Process
Detailed Implementation
Sharding Middleware Research
After evaluating several options, the team selected sharding‑jdbc as the sharding plugin.
Supports multiple sharding strategies and automatically routes queries with = or IN conditions on the sharding key to the correct shard.
Lightweight Maven dependency with minimal intrusion to existing code.
The team also considered Elasticsearch to accelerate queries but abandoned it because the provided ES service did not match their business scenario.
Choosing the Sharding Key
Horizontal sharding based on transaction_time was chosen for three reasons:
The field appears in almost every query, reducing the amount of code refactoring.
Sharding by month yields roughly 600,000 to 700,000 new rows per shard, comfortably within the 10‑million‑row target.
About 70 % of queries already filter by transaction time.
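To make the monthly strategy concrete, here is a minimal sketch of how a transaction_time value maps to a physical shard table. The class and suffix format are my assumptions for illustration, not the team's actual code:

```java
import java.time.LocalDate;
import java.time.format.DateTimeFormatter;

// Illustrative month-based shard router: a logical table name plus a
// transaction_time resolves to one physical monthly table.
public class MonthShardRouter {
    private static final DateTimeFormatter SUFFIX = DateTimeFormatter.ofPattern("yyyyMM");

    // e.g. "transaction_flow" + 2023-04-15 -> "transaction_flow_202304"
    public static String route(String logicalTable, LocalDate transactionTime) {
        return logicalTable + "_" + transactionTime.format(SUFFIX);
    }
}
```

A range query on transaction time simply expands to the list of monthly tables the range touches, which is why queries that already filter by time needed so little refactoring.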
Technical Challenges
Multi‑DataSource Transaction Issue
sharding‑jdbc requires an independent data source, leading to transaction problems across multiple data sources. The team solved this by creating a custom annotation and AOP‑based transaction interceptor that propagates commits or rollbacks through the call stack.
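The shape of that interceptor can be sketched without Spring: enlist every per-datasource transaction touched during the call, then commit or roll back all of them at the outermost level. All names here are hypothetical, and note this is best-effort coordination, not two-phase commit:

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of what the AOP advice around a custom transaction annotation does:
// every datasource touched in the call enlists its transaction, and the
// outermost interceptor commits or rolls back the whole set together.
public class MultiSourceTxManager {
    public interface Tx { void commit(); void rollback(); }

    private static final ThreadLocal<List<Tx>> ENLISTED =
            ThreadLocal.withInitial(ArrayList::new);

    public static void enlist(Tx tx) { ENLISTED.get().add(tx); }

    public static void runInTransaction(Runnable body) {
        try {
            body.run();
            for (Tx tx : ENLISTED.get()) tx.commit();   // best effort, not 2PC
        } catch (RuntimeException e) {
            for (Tx tx : ENLISTED.get()) tx.rollback();
            throw e;
        } finally {
            ENLISTED.remove();                          // avoid thread-pool leaks
        }
    }
}
```

Because the commits are sequential rather than atomic across datasources, a crash between commits can still leave shards inconsistent; this is part of the transaction overhead the conclusion warns about.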
Pagination Across Shards
Global pagination (e.g. offset = 8, pageSize = 20) must be translated into per‑shard pagination parameters because each shard holds a different number of matching rows. The algorithm:
Query each shard in parallel to obtain the count of matching rows.
Accumulate counts in order to build a cumulative axis.
Identify the shards containing the first and last rows of the requested page.
For each shard strictly between those two, take every matching row (offset = 0, pageSize = that shard's full count).
Calculate precise offset and pageSize for the first and last shards.
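The steps above amount to an interval-overlap calculation on the cumulative axis. A compact sketch (class and field names are mine, not the team's): given each shard's matching-row count and the global (offset, pageSize), compute which shards to hit and with what local offset and limit:

```java
import java.util.ArrayList;
import java.util.List;

// Translate a global page into per-shard (offset, limit) pairs by
// intersecting the requested global row range with each shard's slice
// of the cumulative row axis.
public class ShardPager {
    public static class Page {
        public final int shardIndex, offset, limit;
        Page(int s, int o, int l) { shardIndex = s; offset = o; limit = l; }
    }

    public static List<Page> plan(int[] shardCounts, int globalOffset, int pageSize) {
        List<Page> plan = new ArrayList<>();
        long start = globalOffset;              // first global row wanted
        long end = globalOffset + pageSize;     // one past the last row wanted
        long cumulative = 0;                    // rows in all earlier shards
        for (int i = 0; i < shardCounts.length; i++) {
            long shardStart = cumulative;
            long shardEnd = cumulative + shardCounts[i];
            // overlap of [start, end) with this shard's [shardStart, shardEnd)
            long from = Math.max(start, shardStart);
            long to = Math.min(end, shardEnd);
            if (from < to) {
                plan.add(new Page(i, (int) (from - shardStart), (int) (to - from)));
            }
            cumulative = shardEnd;
        }
        return plan;
    }
}
```

With shard counts {10, 25, 5} and a global page of offset 8, size 20, the first shard contributes its rows 8–9 and the second shard its rows 0–17; fully covered middle shards naturally come out with offset 0 and their entire count as the limit.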
Data Migration Plan
The team evaluated two approaches: DBA‑driven migration and custom code migration. They combined both:
Cold data (transactions older than three months) were migrated by custom code in small, frequent batches (“ant‑style” migration).
Hot data (transactions within the last three months) required a brief write‑stop window; DBA performed a bulk migration, limiting downtime to about two hours.
Only the final DBA migration was allowed to move large volumes, preventing high‑latency spikes on the primary instance.
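The "ant-style" cold-data migration boils down to a keyset-paginated copy loop. This sketch is an assumption about the shape of that code (interface names and batch size are mine); the point is that each batch is small and bounded, so the primary never sees one huge transaction:

```java
import java.util.List;

// Keyset-paginated batch copy: fetch the next batchSize rows after the last
// migrated id, insert them into the sharded tables, repeat until drained.
public class AntMigration {
    public interface Source { List<Long> fetchAfter(long lastId, int limit); }
    public interface Sink { void insertBatch(List<Long> ids); }

    // Returns the total number of rows moved.
    public static long run(Source src, Sink sink, int batchSize) {
        long lastId = 0, moved = 0;
        while (true) {
            List<Long> batch = src.fetchAfter(lastId, batchSize);
            if (batch.isEmpty()) break;        // cold data exhausted
            sink.insertBatch(batch);
            lastId = batch.get(batch.size() - 1);
            moved += batch.size();
            // a real migrator would sleep or rate-limit here between batches
        }
        return moved;
    }
}
```

Seeking by primary key (`WHERE id > lastId LIMIT n`) rather than by OFFSET keeps each batch cheap even deep into a 50-million-row table.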
Overall Deployment Process
Stage 1: Create sharded tables, migrate data, enable dual‑write (old and new tables), route all queries to the shards (validation phase).
Stage 2: Stop writes to the old table, switch external interfaces to read from the new sharded tables (validation phase).
Stage 3: Decommission the original large table.
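The dual-write mechanics of Stage 1 and the cutover in Stage 2 can be pictured with a small wrapper; the class name and toggle are illustrative assumptions, since the article does not show the team's DAO layer:

```java
// Dual-write wrapper for the staged rollout: during Stage 1 every insert
// goes to both the old table and the new shards; Stage 2 flips a toggle
// to stop old-table writes without a redeploy.
public class DualWriteDao {
    public interface Store { void insert(String row); }

    private final Store oldTable;
    private final Store shardedTables;
    private volatile boolean writeOldTable = true; // turned off in Stage 2

    public DualWriteDao(Store oldTable, Store shardedTables) {
        this.oldTable = oldTable;
        this.shardedTables = shardedTables;
    }

    public void insert(String row) {
        if (writeOldTable) oldTable.insert(row);
        shardedTables.insert(row);
    }

    public void stopOldTableWrites() { writeOldTable = false; }
}
```

Keeping the old table fully written during Stage 1 is what makes the rollback plan cheap: if shard reads misbehave during validation, queries can be pointed back at the old table with no data loss.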
Conclusion
Further research on sharding middleware is needed; sharding‑jdbc’s independent data source caused extra transaction overhead.
Thread‑pool sizing must be carefully analyzed to avoid exhausting CPU resources.
All business scenarios should be enumerated and mapped to code locations before refactoring.
Robust data‑migration, rollback, and degradation plans are essential for stability.
Effective communication and coordination among teams are as critical as technical skills for backend engineers.
Side Note
The author reflects on the importance of soft skills—especially communication—for backend developers, who must balance business understanding, technical depth, and project management.