Databases 11 min read

Splitting a 500‑Million‑Row MySQL Table: Practical Sharding‑JDBC Lessons

Facing a MySQL transaction table exceeding 50 million rows and growing rapidly, a finance team redesigned the schema using sharding‑jdbc, chose transaction time as the shard key, tackled multi‑data‑source transactions and cross‑shard pagination, and executed a phased migration to ensure stability and performance.

dbaplus Community

Jun 11, 2023

Splitting a 500‑Million‑Row MySQL Table: Practical Sharding‑JDBC Lessons

Background

Two years ago the author took over a company's financial system, which contained a massive transaction table of over 50 million rows. The table grew by more than 6 million rows each month and was projected to exceed 100 million rows within six months, making MySQL maintenance impossible.

Pre‑split System Issues

Frequent timeouts on transaction‑related APIs, some interfaces became unusable.

Slow inserts caused daily data‑growth delays.

The table occupied excessive storage, triggering DBA alerts.

Any ALTER operation caused high replication lag and long table locks.

Goals of the Split

Divide the large table into multiple shards, each around 10 million rows (a size MySQL handles comfortably).

Optimize query conditions for each interface to eliminate slow queries and keep all APIs stable.

Challenges

The transaction table is core to the finance system; many downstream systems depend on it, requiring rigorous development, testing, and release processes.

26 business scenarios involve 32 mapper methods, leading to extensive code changes.

Massive data volume demands a stable migration process.

High‑traffic, critical functionality requires minimal downtime and robust rollback, downgrade, and migration strategies.

Schema changes affect downstream systems, necessitating coordinated refactoring across teams.

Overall Process

Implementation Details

1. Sharding Middleware Selection

Sharding‑JDBC was chosen as the sharding plugin because it supports multiple sharding strategies, automatically detects "=" or "IN" conditions, and is a lightweight Maven dependency with low intrusion.

An attempt to use Elasticsearch for query acceleration was abandoned after the provided ES service proved unsuitable for the business scenario. Instead, the team adopted a per‑table query‑thread model to improve query speed.

2. Choosing the Sharding Key

Horizontal sharding was the only viable method for immutable financial transaction data. The "transaction time" field was selected as the shard key because it appears in about 70% of queries, distributes data evenly when split by month, and each month yields roughly 6‑7 million rows.

3. Technical Difficulties

Multi‑DataSource Transaction Issue : Sharding‑JDBC requires an independent data source, leading to transaction problems across multiple data sources. The team solved this with custom annotations and AOP‑based transaction management (code omitted for confidentiality).

Cross‑Table Pagination : Traditional LIMIT no longer works when data is spread across shards. The team designed a pagination algorithm that first queries each shard for matching row counts, then converts the global offset/pageSize into per‑shard offset/pageSize. The diagram below illustrates the conversion.

Query each shard in parallel to obtain the number of matching rows.

Accumulate counts to locate the first and last rows of the requested page.

Set offset/pageSize for the first and last shards accordingly; other shards use offset 0 and pageSize equal to their total count.

Calculate offset and pageSize for the first shard; set offset 0 for the last shard.

4. Data Migration Strategy

Two options were evaluated: DBA‑driven migration vs. custom code migration. The final approach combined both: cold data (older than three months) was migrated via controlled incremental code "ant‑migration", while hot data (last three months) required a brief write‑stop window and DBA bulk migration.

5. Deployment Phases

Phase 1: Create shards, migrate historical data, enable dual‑write (old and new tables), route all queries to shards.

Phase 2: Stop writes to the old table, switch external interfaces to the new sharded tables.

Phase 3: Decommission the original large table.

Takeaways

Further research on sharding middleware is needed; many Sharding‑JDBC features were under‑utilized due to the specific sharding key.

Multi‑data‑source transactions added development overhead.

Thread‑pool sizing must be carefully analyzed to avoid exhausting CPU resources.

Comprehensive scenario mapping is essential when refactoring an existing project.

Robust migration, rollback, and downgrade plans are critical for stability.

Original Source

Signed-in readers can open the original source through BestHub's protected redirect.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.

Data Migration mysql database sharding Large Tables Sharding-JDBC

Written by

dbaplus Community

Enterprise-level professional community for Database, BigData, and AIOps. Daily original articles, weekly online tech talks, monthly offline salons, and quarterly XCOPS&DAMS conferences—delivered by industry experts.

0 followers

Reader feedback

How this landed with the community

Rate this article

Was this worth your time?

Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.