Splitting a Massive MySQL Financial Transaction Table: Challenges, Sharding Strategy, and Migration Process
This article describes how a finance team tackled a 500 million‑row MySQL transaction table by analyzing pre‑split issues, defining sharding goals, selecting sharding‑jdbc, addressing multi‑datasource transaction and pagination challenges, designing a hybrid data‑migration plan, and executing a three‑stage rollout to ensure system stability and performance.
The author took over a company's financial system and found a single transaction table that held more than 500 million rows and was growing by roughly 600,000 rows per month. The table was causing interface timeouts, slow inserts, heavy disk usage, and locking problems.
Pre‑split system state:
Frequent interface timeouts related to the transaction table.
Very slow daily inserts.
Table occupied excessive disk space, triggering DBA alerts.
Any ALTER operation caused high replication latency and long table locks.
Split goals:
Divide the large table into multiple shards of around 10 million rows each, a size the team considered comfortable for a single MySQL table.
Optimize query conditions for each interface to eliminate slow queries and maintain availability.
Middleware research: The team evaluated sharding‑jdbc, noting its support for multiple sharding strategies, lightweight Maven integration, and low intrusion. An initial plan to use Elasticsearch for faster queries was abandoned after compatibility tests.
Sharding basis selection: After analyzing 26 usage scenarios and 32 mapper methods, the team chose horizontal sharding based on the "transaction time" field because it appears in 70 % of queries, distributes data evenly (≈600‑700 k rows per month), and is always present.
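To make the time-based routing concrete, here is a minimal sketch of how a transaction time could be mapped to a physical table. The source does not state the shard granularity or the table names, so the monthly split and the `transaction_yyyyMM` naming scheme are assumptions made purely for illustration; in practice sharding‑jdbc would perform this routing through its own configured sharding algorithm.

```java
import java.time.LocalDate;
import java.time.YearMonth;
import java.time.format.DateTimeFormatter;
import java.util.ArrayList;
import java.util.List;

final class TimeShardRouter {
    // Assumption: one physical table per calendar month, named transaction_yyyyMM.
    private static final DateTimeFormatter SUFFIX = DateTimeFormatter.ofPattern("uuuuMM");
    private static final String LOGICAL_TABLE = "transaction";

    /** Resolves the single shard that holds a row with the given transaction date. */
    static String tableFor(LocalDate transactionDate) {
        return LOGICAL_TABLE + "_" + YearMonth.from(transactionDate).format(SUFFIX);
    }

    /** Resolves every shard that a query over [from, to] (both inclusive) has to touch. */
    static List<String> tablesFor(LocalDate from, LocalDate to) {
        if (to.isBefore(from)) {
            throw new IllegalArgumentException("'to' must not be before 'from'");
        }
        List<String> tables = new ArrayList<>();
        YearMonth last = YearMonth.from(to);
        for (YearMonth m = YearMonth.from(from); !m.isAfter(last); m = m.plusMonths(1)) {
            tables.add(LOGICAL_TABLE + "_" + m.format(SUFFIX));
        }
        return tables;
    }

    public static void main(String[] args) {
        // prints transaction_202303
        System.out.println(tableFor(LocalDate.of(2023, 3, 15)));
        // prints [transaction_202212, transaction_202301, transaction_202302]
        System.out.println(tablesFor(LocalDate.of(2022, 12, 20), LocalDate.of(2023, 2, 3)));
    }
}
```

Because the sharding key is a time value, a query with a bounded time range touches only a few tables, while a query without a time condition has to fan out to every shard. That is one reason the 70 % figure matters when choosing the key.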
Technical challenges:
Multi‑datasource transaction issue: sharding‑jdbc runs on its own datasource, separate from the one the application already uses, so an operation that touches both can no longer rely on a single local transaction. The team solved this with custom annotations and AOP‑based transaction management (code omitted for confidentiality).
Cross‑table pagination: A single LIMIT offset/pageSize clause no longer works once a result set spans several shards. The solution translates the global offset and pageSize into a per‑shard offset and page size, then queries the affected shards in parallel threads.
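The offset translation for cross‑table pagination can be sketched as follows. This is an illustration under stated assumptions, not the team's actual code: it assumes results are sorted by transaction time so that shard order matches sort order, and that the number of matching rows per shard is already known (for example from a COUNT query with the same filter). The class, method, and table names are hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

final class ShardPagePlanner {
    /** One per-shard query: read `limit` rows from `table`, starting at `offset`. */
    static final class Slice {
        final String table;
        final long offset;
        final int limit;

        Slice(String table, long offset, int limit) {
            this.table = table;
            this.offset = offset;
            this.limit = limit;
        }

        @Override
        public String toString() {
            return table + "[offset=" + offset + ", limit=" + limit + "]";
        }
    }

    /**
     * Translates a global (offset, pageSize) into per-shard slices.
     * The map must iterate in the same order as the query's sort order.
     */
    static List<Slice> plan(Map<String, Long> matchingRowsPerShard, long globalOffset, int pageSize) {
        if (globalOffset < 0 || pageSize <= 0) {
            throw new IllegalArgumentException("offset must be >= 0 and pageSize > 0");
        }
        List<Slice> slices = new ArrayList<>();
        long toSkip = globalOffset; // rows still to skip before the page starts
        int remaining = pageSize;   // rows still needed to fill the page
        for (Map.Entry<String, Long> shard : matchingRowsPerShard.entrySet()) {
            if (remaining == 0) {
                break;
            }
            long rows = shard.getValue();
            if (toSkip >= rows) {   // the page starts after this shard
                toSkip -= rows;
                continue;
            }
            int take = (int) Math.min(remaining, rows - toSkip);
            slices.add(new Slice(shard.getKey(), toSkip, take));
            remaining -= take;
            toSkip = 0;             // later shards are read from their first row
        }
        return slices;
    }

    public static void main(String[] args) {
        Map<String, Long> counts = new LinkedHashMap<>(); // newest shard first
        counts.put("transaction_202303", 25L);
        counts.put("transaction_202302", 40L);
        counts.put("transaction_202301", 10L);
        System.out.println(plan(counts, 20, 10));
    }
}
```

With 25, 40, and 10 matching rows and a request for offset 20, page size 10, the plan reads 5 rows from the first shard starting at offset 20 and 5 rows from the second shard starting at offset 0. The slices are independent of each other, so they can be run in parallel and concatenated in shard order.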
Data migration plan: Two approaches were considered – DBA‑driven migration and custom code migration. The final hybrid strategy migrates "cold" data (older than three months) via controlled batch scripts, while "hot" data (last three months) is migrated by the DBA after a brief write‑stop window before go‑live.
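A "controlled batch" copy of cold data might look like the loop below. The source does not describe the scripts themselves, so this is only a sketch: the `Source` interface, the use of an ascending numeric id as the cursor, and the fixed pause between batches are all assumptions, and rows are reduced to their ids to keep the example self-contained.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

final class BatchMigrator {
    /** Hypothetical reader: returns up to `limit` ids greater than `afterId`, in ascending order. */
    interface Source {
        List<Long> fetchAfter(long afterId, int limit);
    }

    /**
     * Copies rows in fixed-size batches, pausing between batches to limit load
     * on the source database. Returns the number of rows copied.
     */
    static long migrate(Source source, Consumer<List<Long>> sink, int batchSize, long pauseMillis) {
        long lastId = 0; // cursor: highest id copied so far
        long copied = 0;
        while (true) {
            List<Long> batch = source.fetchAfter(lastId, batchSize);
            if (batch.isEmpty()) {
                return copied;
            }
            sink.accept(batch);
            copied += batch.size();
            lastId = batch.get(batch.size() - 1);
            try {
                Thread.sleep(pauseMillis); // throttle so business traffic is not starved
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("migration interrupted after id " + lastId, e);
            }
        }
    }

    public static void main(String[] args) {
        List<Long> oldTable = new ArrayList<>();
        for (long id = 1; id <= 25; id++) {
            oldTable.add(id);
        }
        List<Long> newShard = new ArrayList<>();
        Source source = (afterId, limit) -> {
            List<Long> out = new ArrayList<>();
            for (Long id : oldTable) {
                if (id > afterId && out.size() < limit) {
                    out.add(id);
                }
            }
            return out;
        };
        long copied = migrate(source, newShard::addAll, 10, 0L);
        System.out.println(copied + " rows copied, targets match: " + newShard.equals(oldTable));
    }
}
```

Reading by cursor rather than by a growing OFFSET keeps every batch equally cheap, and the returned row count gives a first value to compare against the source when checking consistency.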
Overall rollout process (three stages):
Stage 1 – Create shards, migrate historical data, enable dual‑write (old and new tables) and route all queries to shards for validation.
Stage 2 – Stop writes to the old table, switch business services to the new sharded tables, and continue monitoring.
Stage 3 – Decommission the original large table.
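The dual‑write step in Stage 1 can be pictured with the small wrapper below. The source does not say how dual‑write was implemented, so everything here is an assumption: the old table is treated as the source of truth while validation is under way, and a failed write to the new shards is recorded for later repair instead of failing the business request.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;
import java.util.function.Consumer;

final class DualWriter<T> {
    private final Consumer<T> oldTable;      // source of truth during validation
    private final Consumer<T> shardedTables; // new target being validated
    private final List<T> pendingRepair = new CopyOnWriteArrayList<>();

    DualWriter(Consumer<T> oldTable, Consumer<T> shardedTables) {
        this.oldTable = oldTable;
        this.shardedTables = shardedTables;
    }

    void write(T row) {
        oldTable.accept(row); // a failure here propagates to the caller as before
        try {
            shardedTables.accept(row);
        } catch (RuntimeException e) {
            pendingRepair.add(row); // keep the request successful, reconcile later
        }
    }

    /** Rows that reached the old table but not the shards. */
    List<T> pendingRepair() {
        return new ArrayList<>(pendingRepair);
    }

    public static void main(String[] args) {
        List<String> oldRows = new ArrayList<>();
        List<String> newRows = new ArrayList<>();
        DualWriter<String> writer = new DualWriter<>(oldRows::add, row -> {
            if (row.equals("b")) {
                throw new IllegalStateException("simulated shard failure");
            }
            newRows.add(row);
        });
        for (String row : new String[] {"a", "b", "c"}) {
            writer.write(row);
        }
        // prints [a, b, c] [a, c] [b]
        System.out.println(oldRows + " " + newRows + " " + writer.pendingRepair());
    }
}
```

Whichever policy is chosen for failures on the new side, the repair list (or an equivalent log) has to be drained before Stage 2, because after the switch the shards become the only copy.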
Summary:
Further research on sharding middleware is needed; sharding‑jdbc’s features were under‑utilized and its independent datasource introduced extra transaction complexity.
Thread‑pool sizing must be carefully tuned to avoid exhausting server threads.
Comprehensive scenario mapping is essential when refactoring an existing project.
Data‑migration plans must include consistency checks and rollback strategies.
Robust rollback and degradation measures are critical for complex releases.
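The thread‑pool point in the summary can be illustrated with a bounded executor for the per‑shard queries. The sizes used here are placeholders rather than the team's values, and the choice of `CallerRunsPolicy` is an assumption: it makes the calling thread run the overflow work itself, so a burst of requests slows down instead of creating unbounded threads or queue entries.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Future;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

final class ShardQueryPool {
    /** Builds a pool with a hard cap on threads and on queued tasks. */
    static ThreadPoolExecutor create(int maxThreads, int queueCapacity) {
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                maxThreads, maxThreads,
                60L, TimeUnit.SECONDS,
                new ArrayBlockingQueue<>(queueCapacity),
                new ThreadPoolExecutor.CallerRunsPolicy()); // overflow runs on the caller
        pool.allowCoreThreadTimeOut(true); // release idle threads after 60 seconds
        return pool;
    }

    /** Runs one query per shard and returns the results in the order the queries were given. */
    static <T> List<T> queryAll(ExecutorService pool, List<Callable<T>> shardQueries) {
        try {
            List<T> results = new ArrayList<>();
            for (Future<T> future : pool.invokeAll(shardQueries)) {
                results.add(future.get());
            }
            return results;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while querying shards", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("shard query failed", e.getCause());
        }
    }

    public static void main(String[] args) {
        ThreadPoolExecutor pool = create(4, 8);
        try {
            List<Callable<String>> queries = new ArrayList<>();
            for (String table : new String[] {"transaction_202301", "transaction_202302"}) {
                queries.add(() -> "rows from " + table); // stand-in for a real shard query
            }
            System.out.println(queryAll(pool, queries));
        } finally {
            pool.shutdown();
        }
    }
}
```

A shared, capped pool like this keeps the total number of shard‑query threads constant no matter how many requests arrive, which is the failure mode the summary warns about.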
Additionally, the author reflects on the importance of communication and soft skills for backend engineers, who must balance business understanding, technical depth, and coordination across teams.
Architecture Digest
Focusing on Java backend development, covering application architecture from top-tier internet companies (high availability, high performance, high stability), big data, machine learning, Java architecture, and other popular fields.
