How to Optimize Large MySQL Tables: Index Tuning, Online DDL, and Efficient Deletion
This article walks through diagnosing severe SLA alerts caused by slow queries on a massive MySQL table, analyzes query patterns with pt‑query‑digest, redesigns indexes, applies online DDL and pt‑osc for index changes, and demonstrates batch‑delete techniques that dramatically reduce execution time and replication lag.
Background
In a primary‑replica MySQL instance, daily SLA alerts indicated significant master‑slave lag caused by slow queries on the arrival_record table.
Slow queries often exceed 1 s; a nightly job deletes data older than one month.
Analysis
Used pt-query-digest to examine the last week’s mysql-slow.log:
pt-query-digest --since=148h mysql-slow.log | lessResult part 1 shows total slow‑query time of 25 403 s, longest query 266 s, average 5 s, scanning 17.66 million rows per query.
Result part 2
Most frequent slow query:
select count(*) from arrival_record where product_id=26 and receive_time between '2019-03-25 14:00:00' and '2019-03-25 15:00:00' and receive_spend_ms>=0\\G
Explain plan:
explain select count(*) from arrival_record where product_id=26 and receive_time between '2019-03-25 14:00:00' and '2019-03-25 15:00:00' and receive_spend_ms>=0\GThe composite index IXFK_arrival_record is used, but only the first column product_id is applied, scanning ~32 261 320 rows because product_id has low selectivity.
The table holds over 100 million rows and only one composite index; product_id selectivity is poor.
The WHERE clause does not include station_no, so the full composite index cannot be leveraged.
According to the left‑most prefix rule, only product_id is used, causing massive row scans. receive_time has high cardinality; a separate index on this column would improve performance.
Optimization plan: drop IXFK_arrival_record, create composite index
idx_sequence_station_no_product_id(product_id,sequence,station_no)and single index idx_receive_time(receive_time).
Delete Statement
The delete xxx_record statement scans 110 million rows, taking 262 s.
delete from arrival_record where receive_time < STR_TO_DATE('2019-02-23','%Y-%m-%d')\GExplain shows a full table scan with no usable index.
The delete statement does not use an index, leading to long execution time; adding an index on receive_time resolves this.
Testing
Copied arrival_record to a test instance, performed backup with mydumper, and restored the data.
mydumper -u root -p xxxx -S /datas/mysql/data/3316/mysqld.sock -B cq_new_cimiss -T arrival_record -o /datas/dump_arrival_record -t 32 -r 2000000Backup completed in 52 s, producing a 1.2 G compressed dump (actual table size 48 G).
time myloader -u root -S /datas/mysql/data/3308/mysqld.sock -P 3308 -p root -B test -d /datas/dump_arrival_record -t 32Import took 126 m 42 s; after import the table size matched MySQL’s size (no fragmentation).
Online DDL and pt‑osc
Performed online DDL to drop the foreign key, remove the old composite index, add the new indexes, and then re‑add the foreign key:
ALTER TABLE cq_new_cimiss.arrival_record DROP FOREIGN KEY FK_arrival_record_product, DROP INDEX IXFK_arrival_record, ADD INDEX idx_product_id_sequence_station_no(product_id,sequence,station_no), ADD INDEX idx_receive_time(receive_time);Online DDL took ~34 minutes, while pt‑osc took ~57 minutes; online DDL was roughly half the time.
Post‑Optimization Queries
Explain now uses idx_receive_time and scans far fewer rows:
explain select count(*) from arrival_record where receive_time < STR_TO_DATE('2019-03-10','%Y-%m-%d')\GDelete still costs ~77 s when using the receive_time index.
Batch Deletion
The application was changed to delete 10 minutes of data per run (≈1 s), eliminating SLA alerts. An alternative approach deletes by primary key in small batches:
SELECT MAX(id) INTO @need_delete_max_id FROM arrival_record WHERE receive_time < '2019-03-01';
DELETE FROM arrival_record WHERE id < @need_delete_max_id LIMIT 20000;
SELECT ROW_COUNT();Loop until ROW_COUNT() returns 0, sleeping briefly between batches.
Summary
When a table grows huge, consider not only query latency but also maintenance costs such as long DDL times and slow deletes.
Select DDL methods (online DDL, pt‑osc) based on table characteristics like foreign keys and partitions.
Use small‑batch deletes to reduce load on the primary instance and avoid replication lag.
Signed-in readers can open the original source through BestHub's protected redirect.
This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.
21CTO
21CTO (21CTO.com) offers developers community, training, and services, making it your go‑to learning and service platform.
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.
