Databases 16 min read

How to Optimize Massive MySQL Tables: Index Tuning, Online DDL, and Batch Deletion

This article walks through diagnosing severe SLA alerts caused by slow queries on a huge MySQL table, using pt‑query‑digest and tcpdump to uncover index misuse, then proposes new composite and single‑column indexes, demonstrates online DDL versus pt‑osc, and shows how batch deletions and efficient backup/restore dramatically reduce execution time and replication lag.

Java Backend Technology

Apr 5, 2020

How to Optimize Massive MySQL Tables: Index Tuning, Online DDL, and Batch Deletion

Background

A primary‑replica MySQL instance (one master, one slave) raised SLA alerts every night because the arrival_record table experienced significant master‑slave lag during large delete operations.

The table’s slow‑query log showed many queries taking seconds, especially a SELECT that scanned millions of rows.

Analysis

We used pt-query-digest to analyze the last week’s mysql-slow.log:

pt-query-digest --since=148h mysql-slow.log | less

Result part 1 showed a total slow‑query execution time of 25 403 s, the longest query lasting 266 s, an average of 5 s per slow query, and about 17.66 million rows scanned on average.

Result part 2

The most frequent slow query was:

select count(*) from arrival_record where product_id=26 and receive_time between '2019-03-25 14:00:00' and '2019-03-25 15:00:00' and receive_spend_ms>=0\G

It scanned up to 56 million rows, mainly because the composite index

IXFK_arrival_record(product_id,station_no,sequence,receive_time,arrival_time)

was only using the first column product_id, whose cardinality is very low.

Index Diagnosis

According to the left‑most prefix rule, the query could not use the rest of the composite index. The receive_time column has high cardinality, so a separate index on it would be beneficial.

We also examined the table definition: show index from arrival_record; The table has about 100 million rows and only one composite index.

The product_id field has poor selectivity.

Conclusion: drop the existing composite index, create a new composite index

idx_sequence_station_no_product_id(product_id,sequence,station_no)

, and add a single‑column index idx_receive_time(receive_time).

Capturing Queries with tcpdump

We captured SELECT statements directly from the network:

tcpdump -i bond0 -s 0 -l -w - dst port 3316 | strings | grep select | egrep -i 'arrival_record' > /tmp/select_arri.log

Then extracted the WHERE clauses:

IFS_OLD=$IFS
IFS=$'
'
for i in `cat /tmp/select_arri.log`; do echo ${i#*'from'}; done | less
IFS=$IFS_OLD

The captured queries confirmed that many SELECTs filtered on product_id, station_no, and sequence, which could benefit from the new composite index.

Delete Statement Optimization

The original delete statement:

delete from arrival_record where receive_time < STR_TO_DATE('2019-02-23','%Y-%m-%d')\G

Its execution plan showed a full table scan ( type: ALL) with over 109 million rows examined, leading to a 262 s execution time.

Optimization: create an index on receive_time so the delete can use a range scan.

Testing on a Clone Instance

We copied the table to a test instance, verified its size (≈48 GB on disk, 31 GB in InnoDB), and noted ~17 GB of fragmentation caused by previous deletes.

Backup was performed with mydumper (parallel compressed dump) and restored with myloader. The dump compressed to 1.2 GB and took about 52 s.

Online DDL vs pt‑osc

We applied the index changes using MySQL’s online DDL:

ALTER TABLE arrival_record DROP FOREIGN KEY FK_arrival_record_product,
DROP INDEX IXFK_arrival_record,
ADD INDEX idx_product_id_sequence_station_no(product_id,sequence,station_no),
ADD INDEX idx_receive_time(receive_time);
ALTER TABLE arrival_record ADD CONSTRAINT _FK_arrival_record_product FOREIGN KEY (product_id) REFERENCES cq_new_cimiss.product(id);

The online DDL took ~34 minutes, while the same operation with pt‑osc took ~57 minutes, making online DDL roughly twice as fast.

Post‑Optimization Execution Plans

After adding idx_receive_time, the SELECT count query now uses a range scan on that index, reducing examined rows from >100 million to ~7.5 million.

explain select count(*) from arrival_record where receive_time < STR_TO_DATE('2019-03-10','%Y-%m-%d')\G

Result: type: range, key: idx_receive_time, rows: 7 540 948.

The DELETE still scans ~3 million rows (≈77 s) because it now uses the idx_receive_time range.

Batch Deletion Strategy

To avoid long‑running deletes, we switched to deleting in small batches using the primary key:

SELECT MAX(id) INTO @need_delete_max_id FROM arrival_record WHERE receive_time<'2019-03-01';
DELETE FROM arrival_record WHERE id<@need_delete_max_id LIMIT 20000;
SELECT ROW_COUNT();  -- repeat until 0

The application was also changed to delete only 10‑minute slices at a time, reducing each delete to ~1 s and eliminating SLA alerts.

Summary

When a table grows to hundreds of millions of rows, monitor not only query latency but also maintenance costs such as DDL time and delete performance.

Choose the appropriate DDL method (online DDL vs pt‑osc) based on table size, foreign keys, and required downtime.

For massive tables, replace full‑table deletes with batch deletions or time‑range deletes to reduce load and replication lag.

Original Source

Signed-in readers can open the original source through BestHub's protected redirect.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.

mysql Online DDL Batch Delete Slow query analysis

Written by

Java Backend Technology

Focus on Java-related technologies: SSM, Spring ecosystem, microservices, MySQL, MyCat, clustering, distributed systems, middleware, Linux, networking, multithreading. Occasionally cover DevOps tools like Jenkins, Nexus, Docker, and ELK. Also share technical insights from time to time, committed to Java full-stack development!

0 followers

Reader feedback

How this landed with the community

Rate this article

Was this worth your time?

Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.