Databases 15 min read

How to Optimize Large MySQL Tables: Index Tuning, Online DDL, and Efficient Deletion

This article walks through diagnosing severe SLA alerts caused by slow queries on a massive MySQL table, analyzes query patterns with pt‑query‑digest, redesigns indexes, applies online DDL and pt‑osc for index changes, and demonstrates batch‑delete techniques that dramatically reduce execution time and replication lag.

21CTO
21CTO
21CTO
How to Optimize Large MySQL Tables: Index Tuning, Online DDL, and Efficient Deletion

Background

In a primary‑replica MySQL instance, daily SLA alerts indicated significant master‑slave lag caused by slow queries on the arrival_record table.

Slow queries often exceed 1 s; a nightly job deletes data older than one month.

Analysis

Used pt-query-digest to examine the last week’s mysql-slow.log:

pt-query-digest --since=148h mysql-slow.log | less

Result part 1 shows total slow‑query time of 25 403 s, longest query 266 s, average 5 s, scanning 17.66 million rows per query.

Result part 2

Most frequent slow query:

select count(*) from arrival_record where product_id=26 and receive_time between '2019-03-25 14:00:00' and '2019-03-25 15:00:00' and receive_spend_ms>=0\\G

Explain plan:

explain select count(*) from arrival_record where product_id=26 and receive_time between '2019-03-25 14:00:00' and '2019-03-25 15:00:00' and receive_spend_ms>=0\G

The composite index IXFK_arrival_record is used, but only the first column product_id is applied, scanning ~32 261 320 rows because product_id has low selectivity.

The table holds over 100 million rows and only one composite index; product_id selectivity is poor.

The WHERE clause does not include station_no, so the full composite index cannot be leveraged.

According to the left‑most prefix rule, only product_id is used, causing massive row scans. receive_time has high cardinality; a separate index on this column would improve performance.

Optimization plan: drop IXFK_arrival_record, create composite index

idx_sequence_station_no_product_id(product_id,sequence,station_no)

and single index idx_receive_time(receive_time).

Delete Statement

The delete xxx_record statement scans 110 million rows, taking 262 s.

delete from arrival_record where receive_time < STR_TO_DATE('2019-02-23','%Y-%m-%d')\G

Explain shows a full table scan with no usable index.

The delete statement does not use an index, leading to long execution time; adding an index on receive_time resolves this.

Testing

Copied arrival_record to a test instance, performed backup with mydumper, and restored the data.

mydumper -u root -p xxxx -S /datas/mysql/data/3316/mysqld.sock -B cq_new_cimiss -T arrival_record -o /datas/dump_arrival_record -t 32 -r 2000000

Backup completed in 52 s, producing a 1.2 G compressed dump (actual table size 48 G).

time myloader -u root -S /datas/mysql/data/3308/mysqld.sock -P 3308 -p root -B test -d /datas/dump_arrival_record -t 32

Import took 126 m 42 s; after import the table size matched MySQL’s size (no fragmentation).

Online DDL and pt‑osc

Performed online DDL to drop the foreign key, remove the old composite index, add the new indexes, and then re‑add the foreign key:

ALTER TABLE cq_new_cimiss.arrival_record DROP FOREIGN KEY FK_arrival_record_product, DROP INDEX IXFK_arrival_record, ADD INDEX idx_product_id_sequence_station_no(product_id,sequence,station_no), ADD INDEX idx_receive_time(receive_time);

Online DDL took ~34 minutes, while pt‑osc took ~57 minutes; online DDL was roughly half the time.

Post‑Optimization Queries

Explain now uses idx_receive_time and scans far fewer rows:

explain select count(*) from arrival_record where receive_time < STR_TO_DATE('2019-03-10','%Y-%m-%d')\G

Delete still costs ~77 s when using the receive_time index.

Batch Deletion

The application was changed to delete 10 minutes of data per run (≈1 s), eliminating SLA alerts. An alternative approach deletes by primary key in small batches:

SELECT MAX(id) INTO @need_delete_max_id FROM arrival_record WHERE receive_time < '2019-03-01';
DELETE FROM arrival_record WHERE id < @need_delete_max_id LIMIT 20000;
SELECT ROW_COUNT();

Loop until ROW_COUNT() returns 0, sleeping briefly between batches.

Summary

When a table grows huge, consider not only query latency but also maintenance costs such as long DDL times and slow deletes.

Select DDL methods (online DDL, pt‑osc) based on table characteristics like foreign keys and partitions.

Use small‑batch deletes to reduce load on the primary instance and avoid replication lag.

Original Source

Signed-in readers can open the original source through BestHub's protected redirect.

Sign in to view source
Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactadmin@besthub.devand we will review it promptly.

performance tuningmysqlIndex OptimizationLarge TablesOnline DDLBatch Delete
21CTO
Written by

21CTO

21CTO (21CTO.com) offers developers community, training, and services, making it your go‑to learning and service platform.

0 followers
Reader feedback

How this landed with the community

Sign in to like

Rate this article

Was this worth your time?

Sign in to rate
Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.