Databases 17 min read

How to Optimize Massive MySQL Tables: Index Tuning, Online DDL, and Batch Deletion Strategies

This article analyzes a large MySQL table with frequent SLA alerts, demonstrates how to identify slow queries using pt‑query‑digest, proposes index redesign and online DDL versus pt‑osc methods, and shows testing, implementation, and batch‑delete techniques to dramatically reduce query time and maintenance overhead.

Java High-Performance Architecture
Java High-Performance Architecture
Java High-Performance Architecture
How to Optimize Massive MySQL Tables: Index Tuning, Online DDL, and Batch Deletion Strategies

Background

In a master‑slave MySQL instance, daily SLA alerts indicate master‑slave latency caused by slow queries on the arrival_record table, which stores over 100 million rows.

Analysis

Using pt-query-digest to analyze the recent mysql-slow.log:

pt-query-digest --since=148h mysql-slow.log | less

Result part 1 shows a total slow‑query execution time of 25,403 s, the longest query lasting 266 s, an average of 5 s per query, and about 17.66 million rows scanned on average.

Result part 2 reveals that the select arrival_record statement accounts for more than 40 k slow queries with an average response time of 4 s, while the corresponding delete arrival_record appears only 6 times but takes an average of 258 s.

select xxx_record statement

The slow select arrival_record queries share the same WHERE clause with different parameter values:

select count(*) from arrival_record where product_id=26 and receive_time between '2019-03-25 14:00:00' and '2019-03-25 15:00:00' and receive_spend_ms>=0\G

The query scans up to 56 million rows, averaging 1.72 million rows, indicating that the large row scan is the main cause of the long execution time.

Execution plan:

explain select count(*) from arrival_record where product_id=26 and receive_time between '2019-03-25 14:00:00' and '2019-03-25 15:00:00' and receive_spend_ms>=0\G;
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: arrival_record
type: ref
possible_keys: IXFK_arrival_record
key: IXFK_arrival_record
key_len: 8
ref: const
rows: 32261320
filtered: 3.70
Extra: Using index condition; Using where
1 row in set, 1 warning (0.00 sec)

The composite index IXFK_arrival_record is used, but because only the first column product_id (low selectivity) is matched, MySQL scans millions of rows.

Key observations:

The table has about 108 million rows and only one composite index; product_id has low cardinality.

The WHERE clause lacks station_no, so the composite index cannot be fully utilized.

According to the left‑most prefix rule, only product_id is used, leading to massive row scans.

The receive_time column has high cardinality and would benefit from a dedicated index.

Optimization recommendations:

Drop the composite index IXFK_arrival_record.

Create a new composite index idx_sequence_station_no_product_id on (product_id, sequence, station_no).

Create a single‑column index indx_receive_time on receive_time.

delete xxx_record statement

The delete operation scans about 110 million rows and takes ~262 s. Example statement:

delete from arrival_record where receive_time < STR_TO_DATE('2019-02-23','%Y-%m-%d')\G

Execution plan shows a full table scan:

explain select * from arrival_record where receive_time < STR_TO_DATE('2019-02-23','%Y-%m-%d')\G;
*************************** 1. row ***************************
id: 1
select_type: SIMPLE
table: arrival_record
type: ALL
possible_keys: NULL
key: NULL
rows: 109501508
filtered: 33.33
Extra: Using where

Optimization: add an index on receive_time.

Testing

Copy arrival_record to a test instance and perform delete‑and‑reindex operations.

Table size information:

du -sh /datas/mysql/data/3316/cq_new_cimiss/arrival_record*
12K    arrival_record.frm
48G    arrival_record.ibd

Backup with mydumper (parallel compression):

user=root
passwd=xxxx
socket=/datas/mysql/data/3316/mysqld.sock
db=cq_new_cimiss
table_name=arrival_record
backupdir=/datas/dump_$table_name
mkdir -p $backupdir
nohup echo `date +%T` && mydumper -u $user -p $passwd -S $socket -B $db -c -T $table_name -o $backupdir -t 32 -r 2000000 && echo `date +%T` &

Backup took 52 s and produced a 1.2 G compressed dump (original disk usage ~48 G).

Import with myloader (32 threads):

time myloader -u root -S /datas/mysql/data/3308/mysqld.sock -P 3308 -p root -B test -d /datas/dump_arrival_record -t 32
real 126m42.885s

After import, the table occupies 30 G with no fragmentation.

Use both online DDL and pt‑osc to drop and recreate indexes. Online DDL took 34 min, pt‑osc 57 min (online DDL roughly half the time).

Implementation

Because the environment is a master‑slave pair, the online DDL is executed on the replica after stopping replication:

function red_echo() {
    local what="$*"
    echo -e "$(date +%F-%T)  ${what}"
}
function check_last_comm() {
    if [ "$1" != "0" ]; then
        red_echo "$2"
        echo "exit 1"
        exit 1
    fi
}
red_echo "stop slave"
mysql -uroot -p$passwd --socket=/datas/mysql/data/${port}/mysqld.sock -e "stop slave"
check_last_comm "$?" "stop slave failed"
red_echo "online ddl begin"
mysql -uroot -p$passwd --socket=/datas/mysql/data/${port}/mysqld.sock -e "set sql_log_bin=0;select now() as ddl_start;ALTER TABLE $db_.\`$table_name\` DROP FOREIGN KEY FK_arrival_record_product, DROP INDEX IXFK_arrival_record, ADD INDEX idx_product_id_sequence_station_no(product_id,sequence,station_no), ADD INDEX idx_receive_time(receive_time); select now() as ddl_stop" >>${log_file} 2>&1
red_echo "online ddl stop"
red_echo "add foreign key"
mysql -uroot -p$passwd --socket=/datas/mysql/data/${port}/mysqld.sock -e "set sql_log_bin=0;ALTER TABLE $db_.${table_name} ADD CONSTRAINT _FK_${table_name}_product FOREIGN KEY (product_id) REFERENCES cq_new_cimiss.product (id) ON DELETE NO ACTION ON UPDATE NO ACTION;" >>${log_file} 2>&1
check_last_comm "$?" "add foreign key error"
red_echo "add foreign key stop"
red_echo "start slave"
mysql -uroot -p$passwd --socket=/datas/mysql/data/${port}/mysqld.sock -e "start slave"
check_last_comm "$?" "start slave failed"

The whole DDL process took 28 min; adding the foreign key took another 48 min.

Post‑optimization

After adding idx_receive_time, the delete still needs 77 s to remove 300 k rows:

delete from arrival_record where receive_time < STR_TO_DATE('2019-03-10','%Y-%m-%d')\G

Batch deletion for large tables

The application was changed to delete only 10 minutes of data per run (≈1 s), eliminating SLA alerts:

Another approach deletes by primary key in small batches of 20 000 rows:

# Get the maximum id that satisfies the time condition
SELECT MAX(id) INTO @need_delete_max_id FROM `arrival_record` WHERE receive_time<'2019-03-01';
# Delete in batches
DELETE FROM arrival_record WHERE id<@need_delete_max_id LIMIT 20000;
SELECT ROW_COUNT();  # returns 20000
# Loop until ROW_COUNT() = 0
DELETE FROM arrival_record WHERE id<@need_delete_max_id LIMIT 20000;
SELECT ROW_COUNT();
# Sleep 0.5s between iterations

Conclusion

When a table grows to hundreds of millions of rows, both query latency and maintenance cost (long DDL, heavy deletes) must be considered.

Choosing the proper DDL method depends on table specifics such as foreign keys and partitioning.

For massive tables, performing deletes in small batches greatly reduces load on the primary instance and mitigates replication lag.

Original Source

Signed-in readers can open the original source through BestHub's protected redirect.

Sign in to view source
Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactadmin@besthub.devand we will review it promptly.

performance tuningmysqlBatch DeletionIndex OptimizationOnline DDLDatabase Maintenance
Java High-Performance Architecture
Written by

Java High-Performance Architecture

Sharing Java development articles and resources, including SSM architecture and the Spring ecosystem (Spring Boot, Spring Cloud, MyBatis, Dubbo, Docker), Zookeeper, Redis, architecture design, microservices, message queues, Git, etc.

0 followers
Reader feedback

How this landed with the community

Sign in to like

Rate this article

Was this worth your time?

Sign in to rate
Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.