How We Split a 50‑Million‑Row MySQL Table: Process, Pitfalls, and Lessons

Facing a 50‑million‑row financial transaction table that was growing by more than 6 million rows per month, the team analyzed the problem, set clear goals for the split, evaluated sharding middleware, designed a custom pagination algorithm, built a hybrid data‑migration plan, and executed a three‑stage rollout that safely replaced the monolithic table with monthly shards kept under roughly 10 million rows each.

Architect

Apr 17, 2024

Background

Two years ago, the financial system stored all transaction flows in a single MySQL table. The table held more than 50 million rows and grew by over 6 million rows each month. It was projected to exceed 100 million rows within six months, a size the team did not expect a single MySQL table to handle.

System State Before Splitting

Frequent time‑outs on transaction‑flow APIs; some interfaces became unusable.

Insert performance degraded dramatically; daily inserts slowed to a crawl.

The table consumed excessive disk space, triggering constant DBA alerts.

Any ALTER operation caused high replication lag and long table locks.

Splitting Goals

Partition the large flow table into multiple shards so that each shard holds roughly 10 million rows (the team’s experience shows MySQL handles this comfortably).

Optimize query conditions for each interface to keep all external and internal APIs usable and eliminate MySQL slow queries.

Key Challenges

The flow table is the core of the financial system; any mistake could break many downstream services.

26 distinct usage scenarios required changes to 32 mapper methods and numerous call sites.

Data volume is huge; migration must keep the system stable.

High‑availability requirements demanded a migration and rollout plan that minimized downtime.

Schema changes forced downstream systems to adapt, requiring coordinated development, testing, and deployment.

Solution Overview

The work was divided into five major parts:

Middleware research and selection.

Choosing the sharding key.

Addressing technical difficulties (multi‑datasource transactions, cross‑table pagination).

Designing a data‑migration strategy.

Planning the overall release process.

1. Sharding Middleware Selection

The team chose sharding‑jdbc as the sharding middleware because it supports multiple sharding strategies, routes queries automatically when the sharding key appears in an = or IN condition, and is lightweight: it is added as a Maven dependency with minimal intrusion into existing code.

2. Sharding Key Decision

After evaluating vertical, horizontal, and fixed‑bucket sharding, the team concluded that horizontal sharding best fits transaction‑flow data because the data cannot be purged. The transaction time field was selected as the sharding key because:

The field is mandatory, so every row carries a value that can be used for routing.

Sharding by month yields tables of roughly 6–7 million rows, in line with growth of about 6 million rows per month and under the 10 million target.

About 70 % of queries already filter by transaction time.
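As a rough illustration of month‑based routing on the transaction time, the sketch below maps a timestamp to a monthly table and lists the tables a time‑range query would touch. The base name `transaction_flow`, the `_YYYYMM` suffix, and both function names are assumptions made for this example; the article does not state the team's naming scheme, and in practice sharding‑jdbc's own configuration would perform this routing.

```python
from datetime import date, datetime


def shard_table(ts, base="transaction_flow"):
    """Return the monthly table for one transaction time (hypothetical naming)."""
    return f"{base}_{ts:%Y%m}"


def shards_for_range(start, end, base="transaction_flow"):
    """Return every monthly table touched by the inclusive range [start, end]."""
    tables = []
    year, month = start.year, start.month
    while (year, month) <= (end.year, end.month):
        tables.append(f"{base}_{year:04d}{month:02d}")
        month += 1
        if month > 12:  # roll over into January of the next year
            year, month = year + 1, 1
    return tables
```

A query that filters on a narrow time range touches only a few tables. The roughly 30 % of queries without a time filter would have to scan every shard or be rewritten to include one.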

3. Technical Challenges

Multi‑datasource transaction issue: sharding‑jdbc requires an independent datasource per shard, leading to transaction coordination problems. The team solved this by creating a custom annotation and aspect that opens a transaction, then walks the call stack to commit or roll back across all involved datasources.
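The article does not show the annotation or aspect, so the following is only a sketch of the general idea under stated assumptions: open a transaction on every datasource, keep the open transactions on a stack, then commit all of them or roll all of them back. In‑memory SQLite connections stand in for the real datasources, and `multi_transaction` and `open_datasource` are hypothetical names. This is best‑effort coordination, not two‑phase commit: if one commit fails after another has succeeded, the datasources can still diverge.

```python
import sqlite3
from contextlib import contextmanager


@contextmanager
def multi_transaction(connections):
    """Open a transaction on every connection; commit all or roll back all.

    Assumption: connections are in autocommit mode (isolation_level=None),
    so BEGIN / COMMIT / ROLLBACK are issued explicitly here.
    """
    opened = []  # stack of connections that have an open transaction
    try:
        for conn in connections:
            conn.execute("BEGIN")
            opened.append(conn)
        yield connections
    except Exception:
        # Unwind in reverse order so the last-opened transaction ends first.
        while opened:
            opened.pop().execute("ROLLBACK")
        raise
    else:
        while opened:
            opened.pop().execute("COMMIT")


def open_datasource():
    """Hypothetical stand-in for one datasource: an in-memory SQLite database."""
    conn = sqlite3.connect(":memory:", isolation_level=None)
    conn.execute("CREATE TABLE flow (id INTEGER PRIMARY KEY, amount INTEGER)")
    return conn
```

Wrapping a unit of work in `with multi_transaction([a, b]):` means an exception raised inside the block leaves neither datasource with partial writes.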

Cross‑table pagination: After sharding, a single LIMIT clause no longer works because the rows matching a query are spread across shards with different row counts, so a global offset does not map directly onto any one table. The team designed a pagination conversion algorithm:

Run a thread per shard to count matching rows.

Accumulate counts in order to build a number line.

Determine which shards contain the first and last rows of the requested page.

For shards fully inside the range, set offset=0 and pageSize=totalCount; for edge shards compute specific offset and pageSize.

Figure (image not available): conversion of a global request (offset=8, pageSize=20) into per‑shard parameters.
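The conversion steps can be sketched as follows. This is a minimal illustration, not the team's code: the function names, the shard names, and the per‑shard counts (10, 12, 15) are invented for the example. It also assumes rows are ordered consistently with shard order, for example by transaction time, so that concatenating the shards gives the global ordering.

```python
from concurrent.futures import ThreadPoolExecutor


def count_shards(shards, count_fn, max_workers=4):
    """Count matching rows per shard in parallel, preserving shard order.

    count_fn(shard) is a hypothetical callable that would run the COUNT query.
    max_workers caps the pool so many shards do not mean many threads.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        counts = list(pool.map(count_fn, shards))
    return list(zip(shards, counts))


def split_page(shard_counts, offset, page_size):
    """Translate a global (offset, page_size) into per-shard (offset, limit)."""
    start, end = offset, offset + page_size  # requested rows, half-open range
    plan = []
    shard_start = 0  # position of this shard's first row on the number line
    for name, count in shard_counts:
        shard_end = shard_start + count
        lo, hi = max(start, shard_start), min(end, shard_end)
        if lo < hi:  # this shard overlaps the requested page
            plan.append((name, lo - shard_start, hi - lo))
        shard_start = shard_end
    return plan


if __name__ == "__main__":
    fake_counts = {"flow_202401": 10, "flow_202402": 12, "flow_202403": 15}
    counted = count_shards(list(fake_counts), fake_counts.__getitem__)
    for name, shard_offset, limit in split_page(counted, offset=8, page_size=20):
        print(f"SELECT ... FROM {name} LIMIT {limit} OFFSET {shard_offset}")
```

With these example counts, the request offset=8, pageSize=20 becomes offset 8 / limit 2 on the first shard, offset 0 / limit 12 on the second (fully inside the range), and offset 0 / limit 6 on the third, for 20 rows in total.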

4. Data Migration Plan

The team evaluated two approaches: (1) ask the DBA to migrate data, (2) write custom code to migrate data. Considering time cost and impact on the production database, they combined both:

Cold data (transactions older than three months) is migrated incrementally by custom code in small batches, moving a little at a time "ant‑style", to limit load on the production database.

Hot data (the most recent three months) is frozen before the final cut‑over; the DBA performs a bulk migration during a short maintenance window.

This hybrid approach limits downtime to roughly 2 hours and allows fine‑grained control of each migration batch.
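A small‑batch copy loop for the cold data might look like the sketch below. It is an assumption‑laden illustration rather than the team's migration code: the table names `flow_old` and `flow_new`, the column names, and the single target table are invented, and SQLite stands in for MySQL. A real job would route each row to its monthly shard. It pages by primary key so each batch is a cheap range scan, and it pauses between batches to throttle load.

```python
import sqlite3
import time


def migrate_cold_rows(conn, cutoff, batch_size=1000, pause_s=0.0):
    """Copy rows with txn_time < cutoff from flow_old to flow_new in batches.

    Returns the number of source rows processed. Re-running is safe because
    rows that already exist in flow_new are skipped (INSERT OR IGNORE).
    """
    last_id = 0
    processed = 0
    while True:
        rows = conn.execute(
            "SELECT id, txn_time, amount FROM flow_old "
            "WHERE id > ? AND txn_time < ? ORDER BY id LIMIT ?",
            (last_id, cutoff, batch_size),
        ).fetchall()
        if not rows:
            return processed
        conn.executemany(
            "INSERT OR IGNORE INTO flow_new (id, txn_time, amount) "
            "VALUES (?, ?, ?)",
            rows,
        )
        conn.commit()  # one short transaction per batch
        last_id = rows[-1][0]  # resume after the last id seen
        processed += len(rows)
        time.sleep(pause_s)  # throttle between batches
```

Tuning `batch_size` and `pause_s` is the "fine‑grained control" lever: smaller batches and longer pauses trade migration time for lower impact on production traffic.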

5. Release Process

The rollout is split into three stages to ensure stability:

Stage 1 – Build shards and enable dual‑write: Create the new sharded tables, migrate existing data, write every new transaction to both the old table and the new shards, and route all queries to the sharded tables for observation.

Stage 2 – Switch writes to shards: Stop writing to the old table, redirect internal services to the new sharding‑aware APIs, and continue monitoring.

Stage 3 – Decommission the monolithic table: After confirming no regressions, drop the original large table.
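The first two stages can be pictured with a toy model like the one below. It is purely illustrative: the `Stage` and `FlowStore` names are hypothetical, and plain dictionaries stand in for the monolithic table and the shards. The point is that switching stages is a single flag change, and that a comparison between old and new copies is available while dual‑write is active.

```python
from enum import Enum


class Stage(Enum):
    DUAL_WRITE = 1   # Stage 1: write old table and shards, read from shards
    SHARDS_ONLY = 2  # Stage 2: write and read shards only


class FlowStore:
    """Toy model of the staged rollout; dicts stand in for real tables."""

    def __init__(self, stage=Stage.DUAL_WRITE):
        self.stage = stage
        self.old_table = {}  # stand-in for the monolithic table
        self.shards = {}     # stand-in for the sharded tables

    def write(self, txn_id, row):
        self.shards[txn_id] = row
        if self.stage is Stage.DUAL_WRITE:
            self.old_table[txn_id] = row

    def read(self, txn_id):
        # Reads go to the shards in both stages.
        return self.shards.get(txn_id)

    def mismatches(self):
        """IDs whose old and new copies differ; meaningful during dual-write."""
        ids = self.old_table.keys() | self.shards.keys()
        return sorted(i for i in ids if self.old_table.get(i) != self.shards.get(i))
```

While the stage is `DUAL_WRITE`, an empty `mismatches()` result is one signal that it is safe to move on, and keeping the old table written gives a fallback if the shards misbehave.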

Lessons Learned

Further research on sharding middleware is needed; sharding‑jdbc’s features were under‑utilized for this special use case, and its independent datasource design introduced extra transaction complexity.

Thread‑pool sizing must be analyzed carefully to avoid exhausting CPU cores when many shards are queried in parallel.

When refactoring an existing project, enumerate every business scenario and map it to the corresponding classes and methods to ensure full coverage.

Prepare comprehensive data‑inconsistency handling and rollback plans; consider both time cost and data accuracy.

Design explicit rollback and degradation strategies before deploying complex changes to guarantee system stability.
