How to Handle Billion‑Row MySQL Tables: Performance Tests, Indexing, and Sharding Strategies
This article explores MySQL performance when storing and querying billions of financial transaction records, detailing table creation, massive data generation, the impact of indexes on query and insert speed, pagination limits, and a practical time‑based sharding approach for bank‑level cash‑flow logs.
The author investigates MySQL performance and handling of massive (up to 100 million rows) cash‑flow data, a typical scenario for bank transaction logs, and shares practical experiments, pitfalls, and architectural recommendations.
Table definition and data generation
A cash‑flow table yun_cashflow is created with eight columns (id, userid, type, operatoruserid, withdrawdepositid, money, runid, createtime) and an InnoDB engine.
DROP TABLE IF EXISTS `yun_cashflow`;
CREATE TABLE `yun_cashflow` (
`id` bigint(20) NOT NULL AUTO_INCREMENT,
`userid` int(11) DEFAULT NULL,
`type` int(11) DEFAULT NULL COMMENT '1、入账,2提现',
`operatoruserid` int(11) DEFAULT NULL COMMENT '操作员ID',
`withdrawdepositid` bigint(20) DEFAULT NULL COMMENT '提现ID',
`money` double DEFAULT NULL COMMENT '钱数',
`runid` bigint(20) DEFAULT NULL COMMENT '工单ID',
`createtime` timestamp NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
PRIMARY KEY (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=63 DEFAULT CHARSET=utf8;To populate the table, a stored procedure test_insert inserts 10 000 rows per call using a loop and random values. The initial approach proved very slow (≈40‑60 rows/s), prompting a bulk‑insert technique that concatenates many VALUES rows in a single statement, dramatically increasing insert speed.
Data size and baseline query performance
Inserting 100 million rows results in a data file of about 7.78 GB. Without any indexes, simple SELECT queries (time range, user filter, money filter, or their combinations) take roughly 1‑2 minutes, confirming that full‑table scans on such a dataset are impractical.
Adding indexes
Indexes are added on userid, money, and createtime:
ALTER TABLE yun_cashflow ADD INDEX index_userid (userid);
ALTER TABLE yun_cashflow ADD INDEX index_money (money);
ALTER TABLE yun_cashflow ADD INDEX index_createtime (createtime);Creating these indexes on the 7.8 GB table takes about 1 400 seconds (≈23 minutes). After indexing, queries that return a small result set (e.g., a few dozen rows) execute in sub‑second time. However, when the result set grows beyond a few million rows, query time rises proportionally (e.g., 1 000 000 rows ≈ 1 second, 10 000 000 rows ≈ several seconds).
Pagination impact
Using LIMIT offset, count shows that performance degrades as the offset increases: low offsets (<1 000) are fast, but offsets in the millions become noticeably slower, making deep pagination unsuitable for very large tables.
Insert speed with indexes
With indexes in place, insert speed drops to about 20 rows per second, confirming that maintaining indexes during heavy writes can be a bottleneck.
Sharding strategy
To mitigate both read and write bottlenecks, the author proposes time‑based sharding: split the massive table into multiple tables (e.g., one million rows per table) and maintain a routing table CreateTimeIndexTable that maps time ranges to the appropriate shard.
CreateTimeIndexTable
ID TableName CreateTimeStart CreateTimeEnd
1 yun_cashflow_1 2018-10-22 09:06:58 2018-10-26 09:06:58
2 yun_cashflow_2 2018-10-26 09:06:58 2018-10-29 09:06:58
3 yun_cashflow_3 2018-11-12 09:06:58 2018-11-22 09:06:58
4 yun_cashflow_4 2018-11-22 09:06:58 2018-11-26 09:06:58Query routing works in two steps: first query the index table to find the relevant shard(s), then query the actual shard(s). If multiple shards match, results are merged in the application layer.
Multi‑threaded query considerations
Running many concurrent connections (e.g., 100 threads) on a single MySQL instance quickly exhausts connection limits, leading to timeouts. The author suggests using asynchronous coordination mechanisms (e.g., Java CountDownLatch or Akka) to aggregate results from parallel shard queries.
Practical recommendations for bank transaction logs
Separate write‑heavy operation tables from read‑heavy historical tables; keep indexes only on the historical table.
Periodically move older data from the operation table to the appropriate time‑sharded historical tables (e.g., nightly batch).
Index the most selective fields (userid, createtime, money) to accelerate common queries.
Limit query time ranges (e.g., three months to one year) to keep result sets manageable.
Use pagination with reasonable offsets; avoid deep offsets on a single massive table.
Consider a sharding scheme of ~1 million rows per shard to keep index rebuild times short.
Overall, the experiments demonstrate that raw MySQL can handle hundred‑million‑row tables, but careful indexing, sharding, and query design are essential to achieve acceptable performance for banking transaction workloads.
Signed-in readers can open the original source through BestHub's protected redirect.
This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.
ITPUB
Official ITPUB account sharing technical insights, community news, and exciting events.
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.
