Databases 16 min read

Optimizing Massive MySQL Tables: Index Tuning, Online DDL & Batch Deletion

Facing SLA alerts due to slow queries on a billion‑row MySQL table, the article analyzes slow‑query logs, reveals index misuse, and demonstrates how removing a poor composite index, adding targeted indexes, employing online DDL or pt‑osc, and using batch deletions dramatically reduce query times and maintenance overhead.

Java Backend Technology

Sep 27, 2022

Optimizing Massive MySQL Tables: Index Tuning, Online DDL & Batch Deletion

Background

In a primary‑secondary MySQL instance, SLA alerts were triggered nightly because a large arrival_record table (over 100 million rows) generated slow queries that delayed replication.

Analysis

pt‑query‑digest was used to examine the mysql‑slow.log for the past week.

pt-query-digest --since=148h mysql-slow.log | less

Result part 1 showed a total slow‑query execution time of 25 403 s, the longest query lasting 266 s, an average of 5 s per query, and about 17.6 million rows scanned on average.

Result part 2 displayed the most frequent slow query:

select count(*) from arrival_record where product_id=26 and receive_time between '2019-03-25 14:00:00' and '2019-03-25 15:00:00' and receive_spend_ms>=0

This query scanned roughly 32 million rows, averaging 5 s per execution, with a maximum execution time of 266 s.

Execution Plan

explain select count(*) from arrival_record where product_id=26 and receive_time between '2019-03-25 14:00:00' and '2019-03-25 15:00:00' and receive_spend_ms>=0\G

The plan used the composite index IXFK_arrival_record, but only the first column ( product_id) was effective; its low selectivity caused the optimizer to scan over 32 million rows.

According to the left‑most prefix rule, the query could not benefit from the remaining indexed columns ( station_no, sequence, receive_time).

Recommended optimizations:

Drop the composite index IXFK_arrival_record.

Create a new composite index

idx_sequence_station_no_product_id(product_id,sequence,station_no)

Create a single‑column index idx_receive_time(receive_time) because receive_time has high cardinality.

Delete Statement

The delete operation

delete from arrival_record where receive_time < STR_TO_DATE('2019-02-23','%Y-%m-%d')

scanned 109 million rows and took about 262 s, as no suitable index existed.

explain select * from arrival_record where receive_time < STR_TO_DATE('2019-02-23','%Y-%m-%d')\G

The plan shows a full table scan (type=ALL) with no possible keys.

Testing and Backup

Table statistics:

SELECT COUNT(*) FROM cq_new_cimiss.arrival_record;  -- 112,294,946 rows
-- Disk usage: 48 GB (InnoDB size 31 GB, ~17 GB fragmentation)

A parallel backup was performed with mydumper (32 threads), producing a 1.2 GB compressed dump in 52 s.

mydumper -u root -p xxxx -S /datas/mysql/data/3316/mysqld.sock -B cq_new_cimiss -T arrival_record -o /datas/dump_arrival_record -t 32 -r 2000000

Online DDL vs pt‑osc

Two approaches were used to rebuild indexes on the test instance:

Online DDL : executed an ALTER TABLE that dropped the foreign key, removed the old composite index, added the new indexes, and later re‑added the foreign key. Total time ≈ 34 min.

pt‑osc : performed the same changes using the Percona Toolkit's pt‑osc. Total time ≈ 57 min.

# Example online DDL
ALTER TABLE cq_new_cimiss.arrival_record DROP FOREIGN KEY FK_arrival_record_product,
  DROP INDEX IXFK_arrival_record,
  ADD INDEX idx_product_id_sequence_station_no(product_id,sequence,station_no),
  ADD INDEX idx_receive_time(receive_time);
# pt‑osc command omitted for brevity

Post‑Optimization Results

After adding idx_receive_time, both the select and delete queries use the new index, dramatically reducing the number of scanned rows.

explain select count(*) from arrival_record where receive_time < STR_TO_DATE('2019-03-10','%Y-%m-%d')\G
-- rows: 7,540,948 (type=range, key=idx_receive_time)

explain select count(*) from arrival_record where product_id=26 and receive_time between '2019-03-25 14:00:00' and '2019-03-25 15:00:00' and receive_spend_ms>=0\G
-- rows: 291,448 (type=range, key=idx_receive_time)

Even with the new index, the delete of 300 million rows still required about 77 s.

Batch Deletion Strategy

To avoid long pauses and replication lag, records are deleted in small batches using the primary key range:

# Determine the maximum id to delete
SELECT MAX(id) INTO @need_delete_max_id FROM arrival_record WHERE receive_time < '2019-03-01';
# Delete in batches of 20 000 rows
DELETE FROM arrival_record WHERE id < @need_delete_max_id LIMIT 20000;
SELECT ROW_COUNT();  -- returns number of rows deleted

The loop repeats until ROW_COUNT() returns 0, ensuring the table is cleaned without overwhelming the primary instance.

Conclusion

Large tables need carefully designed indexes; low‑selectivity composite indexes should be avoided.

Online DDL or pt‑osc can safely rebuild indexes with minimal downtime.

Batch deletions reduce impact on the primary server and keep replication lag low.

Original Source

Signed-in readers can open the original source through BestHub's protected redirect.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.

performance tuning mysql Batch Deletion Large Tables Online DDL

Written by

Java Backend Technology

Focus on Java-related technologies: SSM, Spring ecosystem, microservices, MySQL, MyCat, clustering, distributed systems, middleware, Linux, networking, multithreading. Occasionally cover DevOps tools like Jenkins, Nexus, Docker, and ELK. Also share technical insights from time to time, committed to Java full-stack development!

0 followers

Reader feedback

How this landed with the community

Rate this article

Was this worth your time?

Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.