How to Slash MySQL Slow Queries on a 100M‑Row Table: Index Tuning and Batch Deletion
The article walks through a real‑world MySQL performance case where a 100‑million‑row table caused SLA alerts, analyzes slow‑query logs, demonstrates index redesign, compares online DDL with pt‑osc, and shows how batch deletions by primary key dramatically reduce delete time and replication lag.
Background
When the author joined a new company, a MySQL instance with one master and one slave was generating SLA alerts around midnight. The alerts signaled heavy replication lag, which meant that a master-to-slave failover at that moment would have to wait for the replica to catch up.
Investigation showed that the slow‑query log contained many queries that scanned tens of millions of rows, especially select count(*) from arrival_record … and a daily delete from arrival_record where receive_time < … that each took hundreds of seconds.
Analysis
Using pt-query-digest --since=148h mysql-slow.log, the author measured a total slow-query time of 25 403 s over the past week. The longest query took 266 s, the average slow query took 5 s, and the average number of rows scanned was 17.66 M.
The select count(*) query on arrival_record scanned up to 56 M rows (average 1.72 M). The composite index IXFK_arrival_record(product_id,station_no,sequence,receive_time,arrival_time) could be used only through its leftmost column product_id, whose cardinality is low, so the optimizer still had to read a very large part of the index.
Explain output showed type: ref, rows: 32261320, and Extra: Using index condition; Using where. The show index output confirmed that only one composite index existed and that product_id had a cardinality of 1 344, far too small to be selective.
The author concluded that a separate index on receive_time would let the query use a more selective range scan.
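The selectivity argument can be checked directly. The following is a minimal sketch, assuming the table and column names used in the article and the standard information_schema.STATISTICS view; the cardinality values it returns are optimizer estimates, not exact counts.

```sql
-- Estimated cardinality per indexed column (the same figures SHOW INDEX reports).
SELECT index_name, seq_in_index, column_name, cardinality
FROM information_schema.statistics
WHERE table_schema = DATABASE()
  AND table_name = 'arrival_record'
ORDER BY index_name, seq_in_index;

-- Exact distinct counts for comparison.
-- This reads the whole table, so it is better run on a test copy.
SELECT COUNT(DISTINCT product_id)   AS distinct_product_ids,
       COUNT(DISTINCT receive_time) AS distinct_receive_times,
       COUNT(*)                     AS total_rows
FROM arrival_record;
```

A column whose distinct count is close to the total row count is a much better candidate for a leading index column than one with only a few thousand distinct values.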
Testing
The table contains about 112 M rows (≈48 GB on disk, 31 GB in InnoDB) and suffers from fragmentation caused by previous large‑scale deletions.
Backup was performed with mydumper (32 parallel threads, 2 M rows per chunk) producing a 1.2 GB compressed dump in 52 s. The dump was copied to a test node and re‑imported with myloader, taking 126 m 42 s.
Two DDL approaches were compared on the test instance: MySQL's native online DDL and the pt‑osc tool. Online DDL completed in 34 minutes, while pt‑osc took 57 minutes, so online DDL needed roughly 40 % less time.
Implementation
On the replica the author dropped the original composite index and created a new composite index idx_product_id_sequence_station_no(product_id,sequence,station_no) plus a single‑column index idx_receive_time(receive_time). The DDL script also removed the foreign key, performed the index changes, and re‑added the foreign key after the operation.
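The sequence described above might look roughly like the sketch below. The index names come from the article; the constraint name fk_arrival_record_product and the parent table product(id) are hypothetical placeholders, because the article does not give the foreign-key definition. ALGORITHM=INPLACE and LOCK=NONE ask MySQL for a native online operation and make the statement fail instead of silently falling back to a table copy.

```sql
-- 1. Drop the foreign key so the old index is no longer required by it.
--    (Hypothetical constraint name.)
ALTER TABLE arrival_record
  DROP FOREIGN KEY fk_arrival_record_product;

-- 2. Swap the indexes in a single online DDL statement.
ALTER TABLE arrival_record
  DROP INDEX IXFK_arrival_record,
  ADD INDEX idx_product_id_sequence_station_no (product_id, sequence, station_no),
  ADD INDEX idx_receive_time (receive_time),
  ALGORITHM=INPLACE, LOCK=NONE;

-- 3. Re-add the foreign key. The new composite index starts with product_id,
--    so it can back the constraint. (Hypothetical parent table and column.)
ALTER TABLE arrival_record
  ADD CONSTRAINT fk_arrival_record_product
  FOREIGN KEY (product_id) REFERENCES product (id);
```

Combining the drop and both additions in one ALTER means the table is processed once rather than three times.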
After the change, explain for the same select showed type: range, key: idx_receive_time, and rows reduced to 7.5 M, confirming the index was used.
Index‑Optimized Delete
Even after adding idx_receive_time, the daily delete still took 77 s because it scanned 110 M rows. The author therefore switched to batch deletion by primary key:
# Get the largest id that falls inside the purge window
SELECT MAX(id) INTO @need_delete_max_id FROM arrival_record WHERE receive_time < '2019-03-01';
# Delete in small chunks by primary key (assumes id grows with receive_time)
DELETE FROM arrival_record WHERE id <= @need_delete_max_id LIMIT 20000;
SELECT ROW_COUNT(); # returns 20000 while a full batch was deleted
# Repeat the DELETE until ROW_COUNT() returns 0
This approach reduced the impact on the master and eliminated the SLA alerts.
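The loop itself is not shown in the article. One way to automate it is a stored procedure, sketched here under several assumptions: the procedure name purge_arrival_record, the 0.5 s pause between batches, and the BIGINT type of id are illustrative choices, and id is assumed to increase with receive_time. DELIMITER is a mysql command-line client directive, not server SQL.

```sql
DELIMITER //

CREATE PROCEDURE purge_arrival_record(IN p_cutoff DATETIME)
BEGIN
  DECLARE v_max_id BIGINT;
  DECLARE v_rows   INT DEFAULT 1;

  -- Upper bound of the primary-key range to purge.
  SELECT MAX(id) INTO v_max_id
  FROM arrival_record
  WHERE receive_time < p_cutoff;

  -- Delete 20000 rows at a time until a batch removes nothing.
  WHILE v_max_id IS NOT NULL AND v_rows > 0 DO
    DELETE FROM arrival_record
    WHERE id <= v_max_id
    ORDER BY id
    LIMIT 20000;

    SET v_rows = ROW_COUNT();  -- rows removed by the DELETE just above

    DO SLEEP(0.5);             -- short pause so the replica can keep up
  END WHILE;
END //

DELIMITER ;

CALL purge_arrival_record('2019-03-01');
```

With autocommit enabled and no surrounding transaction, each DELETE commits on its own, so the replica receives many small transactions instead of one very large one.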
Summary
When a table grows beyond tens of millions of rows, both query latency and maintenance cost (DDL time, delete time) must be considered.
Choose the appropriate DDL method based on table size, foreign‑key constraints, and required downtime.
For massive deletes, use small‑batch primary‑key deletes to lower load and avoid replication lag.
Architect
Professional architect sharing high‑quality architecture insights. Topics include high‑availability, high‑performance, high‑stability architectures, big data, machine learning, Java, system and distributed architecture, AI, and practical large‑scale architecture case studies. Open to ideas‑driven architects who enjoy sharing and learning.
