MySQL Large Table Index Optimization and Batch Deletion Strategies
By dropping a low‑selectivity composite index, adding targeted indexes on product_id‑sequence‑station_no and receive_time, and using online DDL with pt‑osc, the team reduced row scans and replication lag, then implemented batch deletions by primary‑key ranges, cutting delete time from minutes to seconds on a 100‑million‑row table.
Background: A production MySQL instance (one master, one slave) triggered SLA alerts because heavy slow-query load on the arrival_record table caused high replication lag. The table holds over 100 million rows and has a single composite index, IXFK_arrival_record (product_id, station_no, sequence, receive_time, arrival_time). Slow SELECT and DELETE statements scanned millions of rows each, leading to long execution times.
Analysis: Using pt-query-digest on the weekly mysql-slow.log revealed that the most expensive queries were:
pt-query-digest --since=148h mysql-slow.log | less
The query SELECT count(*) FROM arrival_record WHERE product_id=26 AND receive_time BETWEEN '2019-03-25 14:00:00' AND '2019-03-25 15:00:00' AND receive_spend_ms>=0 scanned ~5.6 million rows, because the composite index could only be used on its leftmost column, product_id, whose cardinality is low.
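Selectivity here means the ratio of distinct values to rows: the lower it is, the more rows each index probe matches. A minimal sketch of this check, using SQLite as a stand-in for MySQL (the table contents and value distribution below are invented for illustration):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE arrival_record (id INTEGER PRIMARY KEY, "
    "product_id INTEGER, receive_time TEXT)")
# Simulate a low-cardinality product_id: only 5 distinct values over 10,000 rows.
conn.executemany(
    "INSERT INTO arrival_record (product_id, receive_time) VALUES (?, ?)",
    [(i % 5, f"2019-03-{1 + i % 28:02d}") for i in range(10_000)])

distinct, total = conn.execute(
    "SELECT COUNT(DISTINCT product_id), COUNT(*) FROM arrival_record").fetchone()
selectivity = distinct / total
print(f"selectivity of product_id: {selectivity:.4f}")  # 0.0005: each value matches ~2,000 rows
```

A selectivity this low means an index whose leftmost column is product_id cannot narrow the scan much, which matches the millions of rows examined in the slow log.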
EXPLAIN showed:
explain select count(*) from arrival_record where product_id=26 and receive_time between '2019-03-25 14:00:00' and '2019-03-25 15:00:00' and receive_spend_ms>=0\G
---
id: 1
type: ref
possible_keys: IXFK_arrival_record
key: IXFK_arrival_record
rows: 32261320
...
Similarly, the daily DELETE FROM arrival_record WHERE receive_time < STR_TO_DATE('2019-02-23','%Y-%m-%d') performed a full table scan (≈109 million rows) and took ~262 seconds.
Optimization Plan:
Drop the low‑selectivity composite index IXFK_arrival_record.
Create a new composite index idx_product_id_sequence_station_no(product_id,sequence,station_no).
Create a single‑column index idx_receive_time(receive_time) to support time‑range queries.
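The expected effect of the new time index can be previewed in miniature with SQLite's EXPLAIN QUERY PLAN as a stand-in for MySQL's EXPLAIN (the schema below is a reduced sketch of the real table):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE arrival_record (
    id INTEGER PRIMARY KEY,
    product_id INTEGER,
    station_no TEXT,
    sequence INTEGER,
    receive_time TEXT)""")
conn.execute("CREATE INDEX idx_receive_time ON arrival_record (receive_time)")

# Ask the planner how it would execute the time-range count.
plan = conn.execute(
    "EXPLAIN QUERY PLAN "
    "SELECT COUNT(*) FROM arrival_record WHERE receive_time < '2019-03-10'"
).fetchall()
detail = plan[0][3]
print(detail)  # a SEARCH via idx_receive_time, not a full-table SCAN
```

The same principle applies in MySQL: once receive_time leads an index, the range predicate becomes an index range scan instead of a full table scan.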
Testing: The table was copied to a test instance; the backup was taken with mydumper (parallel, compressed) and restored with myloader. After rebuilding the indexes, EXPLAIN confirmed that both the SELECT and DELETE statements now use idx_receive_time, dramatically reducing the number of scanned rows.
explain select count(*) from arrival_record where receive_time < STR_TO_DATE('2019-03-10','%Y-%m-%d')\G
---
type: range
key: idx_receive_time
rows: 7540948
...
Implementation: Online DDL (or pt-osc) was used to apply the index changes without long downtime. Sample online DDL script:
mysql -uroot -p$passwd --socket=/datas/mysql/data/${port}/mysqld.sock -e "set sql_log_bin=0;\
ALTER TABLE $db.\`arrival_record\` DROP FOREIGN KEY FK_arrival_record_product,\
DROP INDEX IXFK_arrival_record,\
ADD INDEX idx_product_id_sequence_station_no(product_id,sequence,station_no),\
ADD INDEX idx_receive_time(receive_time);"
After the index change, the DELETE statement still took ~77 seconds because it removed ~3 million rows in one shot. A batch‑delete approach was introduced:
# Get the max id to delete
SELECT MAX(id) INTO @need_delete_max_id FROM arrival_record WHERE receive_time<'2019-03-01';
# Delete in small batches
DELETE FROM arrival_record WHERE id<@need_delete_max_id LIMIT 20000;
SELECT ROW_COUNT();
Repeating the DELETE until ROW_COUNT() drops below the batch size reduces lock time and replication lag; the application was later modified to delete 10‑minute time slices instead (≈1 second per batch), which eliminated the SLA alerts.
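The batching loop can be sketched as follows, again with SQLite as a stand-in (MySQL supports DELETE ... LIMIT directly, while standard SQLite builds need the IN (SELECT ... LIMIT) form used here; the data is invented for illustration):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE arrival_record (id INTEGER PRIMARY KEY, receive_time TEXT)")
conn.executemany(
    "INSERT INTO arrival_record (id, receive_time) VALUES (?, ?)",
    [(i, "2019-02-20" if i <= 70_000 else "2019-03-05") for i in range(1, 100_001)])

BATCH = 20_000
# Step 1: pin the upper bound once, so every batch is a pure primary-key range
# delete and the secondary index on receive_time is probed only once.
(max_id,) = conn.execute(
    "SELECT MAX(id) FROM arrival_record WHERE receive_time < '2019-03-01'").fetchone()

deleted_total = 0
while True:
    # <= so the MAX(id) row itself is also removed; in MySQL this would be
    # DELETE FROM arrival_record WHERE id <= @need_delete_max_id LIMIT 20000.
    cur = conn.execute(
        "DELETE FROM arrival_record WHERE id IN "
        "(SELECT id FROM arrival_record WHERE id <= ? LIMIT ?)", (max_id, BATCH))
    conn.commit()  # commit per batch to keep each transaction short
    deleted_total += cur.rowcount
    if cur.rowcount < BATCH:  # last, partial batch: nothing left in range
        break

print(deleted_total)  # 70000 rows removed across 4 short transactions
```

This sketch assumes id is an auto-increment primary key that grows with receive_time; short per-batch transactions are what keep row-lock time and replication lag low on the real instance.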
Summary:
Large tables require careful index design; low‑cardinality columns should not dominate composite indexes.
When performing DDL on massive tables, consider online DDL or tools like pt‑osc to minimize downtime.
Batch deletions (by primary‑key range or time window) dramatically lower impact on the primary instance and replication lag.