How Read‑Write Splitting and Columnar Storage Rescued a 100× MySQL Load Spike
A MySQL‑based receipt‑tracking service suffered a sudden 100‑fold load increase, prompting a step‑by‑step optimization that combined read‑write splitting, middleware‑less data routing, columnar storage with Infobright, and dynamic scheduling to dramatically lower CPU/IO pressure and restore performance.
Background
The author, a senior DBA with nearly a decade of experience, maintains a high‑traffic business database that records receipt‑like data (e.g., read counts, comments) without transactional requirements.
Problem
During a traffic surge the primary MySQL instance’s load jumped from a peak of 100% to 100 times the normal level, causing severe write and read contention, especially from full‑table‑scan statistics run every seven minutes.
Initial Read‑Write Splitting
To quickly relieve write pressure, the statistics workload was moved from the primary to a replica. Monitoring showed CPU usage on the primary dropped from >90% to <10% and IO from ~100% to ~25%.
Monitoring Results
Primary node: CPU <10%, IO ~25% after offloading reads.
Replica node: IO remained high due to full‑table scans.
Index Optimization
Adding indexes reduced full‑table scans to index scans but only yielded modest gains; the replica’s IO pressure persisted.
Proposed Architecture – Option 1
Introduce two additional data nodes and a middleware layer for distributed routing. The replica was expanded via a multi‑instance setup on a single server to handle write load.
Risk: middleware becomes a new bottleneck under heavy statistical queries, and IO limits on the replica remain.
Proposed Architecture – Option 2
Implement application‑level data routing: map groups of business services to specific nodes (e.g., services 1‑2 to node A, 3‑5 to node B). This approach is more controllable and scalable.
Columnar Storage with Infobright
To accelerate statistics, a columnar store (Infobright) was added. Tests on tables with ~35 million rows showed query times dropping from minutes to seconds (e.g., a 5‑statement batch reduced from 18 minutes to <3 minutes).
Example table definition:
CREATE TABLE `receipt_12149_428` (
`id` int(11) NOT NULL COMMENT '自增主键',
`userid` int(11) NOT NULL DEFAULT '0' COMMENT '用户ID',
`action` int(11) NOT NULL DEFAULT '0' COMMENT '动作',
`readtimes` int(11) NOT NULL DEFAULT '0' COMMENT '阅读次数',
`create_time` datetime NOT NULL COMMENT '创建时间'
);Export command used for bulk loading:
select * from ${tab_name} where create_time between xxx and xxxx into outfile '/data/dump_data/${tab_name}.csv' FIELDS TERMINATED BY ' ' ENCLOSED BY '"';Dynamic Scheduling Script
A custom script automates full‑ and incremental data sync to Infobright, accepting start and end timestamps. It ensures the primary‑replica lag is minimal before a full dump and then schedules incremental syncs every 10‑30 minutes.
Business Routing for Scaling
Further scaling was achieved by routing traffic at the business layer, allowing seamless addition of new data nodes without disrupting existing services. IO pressure on replicas decreased noticeably after routing.
Final Outcome
The combined optimizations reduced primary CPU load to under 10%, cut replica IO by more than 70%, and brought statistical query latency from half an hour to under two minutes, enabling the system to handle exponential traffic growth.
Signed-in readers can open the original source through BestHub's protected redirect.
This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.
dbaplus Community
Enterprise-level professional community for Database, BigData, and AIOps. Daily original articles, weekly online tech talks, monthly offline salons, and quarterly XCOPS&DAMS conferences—delivered by industry experts.
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.
