
MySQL Slow Query Analysis and Index Optimization for Large Tables

This article presents a detailed investigation of a MySQL master‑slave instance suffering from nightly SLA alerts due to slow‑query latency, analyzes the root causes with pt‑query‑digest and execution plans, proposes index redesign, demonstrates backup and restore with mydumper, compares online DDL and pt‑osc, and finally shows batch‑delete techniques to reduce load on large tables.

Top Architect

Background – A master‑slave MySQL instance (one primary, one replica) generated nightly SLA alerts because of significant replication lag caused by slow SELECT and DELETE statements on the arrival_record table.

Analysis – Using pt-query-digest --since=148h mysql-slow.log | less, the author identified that the slow queries mainly involved a SELECT counting rows with filters on product_id and receive_time, and a DELETE removing old records. The slow‑query report showed a total slow‑query time of 25,403 s, a longest query of 266 s, and an average of 5 s per query, with tens of millions of rows scanned.

Execution Plan Inspection – EXPLAIN SELECT count(*) FROM arrival_record WHERE ... revealed that the existing composite index IXFK_arrival_record(product_id, station_no, sequence, receive_time, arrival_time) was not fully utilized because the WHERE clause omitted station_no and sequence. Under MySQL's leftmost‑prefix rule, the optimizer could use only the first column, product_id, whose cardinality is low, leading to large row scans.
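The leftmost‑prefix behaviour is easy to reproduce in the client. A sketch of the kind of query involved, with placeholder filter values (the real parameters came from the application, not from the article):

```sql
-- With the composite index IXFK_arrival_record(product_id, station_no,
-- sequence, receive_time, arrival_time), this query can use only the
-- product_id prefix: station_no and sequence are absent from the WHERE
-- clause, so the index stops matching after the first column and
-- receive_time cannot be used for range pruning.
EXPLAIN SELECT count(*)
FROM arrival_record
WHERE product_id = 26                      -- placeholder value
  AND receive_time BETWEEN '2019-03-25 14:00:00'
                       AND '2019-03-25 15:00:00';
```

In the EXPLAIN output this shows up as a small key_len (only the product_id bytes) together with a very large rows estimate.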

Recommendations – Drop the ineffective composite index and create a new composite index idx_product_id_sequence_station_no(product_id,sequence,station_no) plus a single‑column index idx_receive_time(receive_time). This lets the SELECT queries use the more selective receive_time index and dramatically reduces the number of scanned rows.
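Before committing to a redesign like this, the relative selectivity of the candidate columns can be sanity‑checked. A minimal sketch (it full‑scans the table, so run it on a replica or a restored copy, not the busy primary):

```sql
-- Ratio of distinct values to total rows: closer to 1 means more selective.
-- A low product_id ratio and a high receive_time ratio would confirm why
-- filtering on receive_time prunes far more rows.
SELECT COUNT(DISTINCT product_id)   / COUNT(*) AS product_id_selectivity,
       COUNT(DISTINCT receive_time) / COUNT(*) AS receive_time_selectivity
FROM arrival_record;
```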

Packet Capture – A tcpdump -i bond0 -s 0 -l -w - dst port 3316 | strings | grep select | egrep -i 'arrival_record' > /tmp/select_arri.log command captured the actual WHERE clauses sent by the application, confirming the missing fields in the index.

Testing – The table (≈112 M rows, ~48 GB on disk) was dumped with mydumper (parallel compression; 1.2 GB dump in 52 s) and restored with myloader (≈126 min). Both online DDL and pt‑osc were exercised; online DDL completed in ~34 min versus ~57 min for pt‑osc.

Implementation – A Bash script stopped replication, applied the new indexes with binary logging disabled (set sql_log_bin=0, so the DDL would not replicate and could be run on each node independently), then restarted replication. Sample snippet:

```bash
function red_echo() {
    local what="$*"
    echo -e "$(date +%F-%T) ${what}"
}

function check_last_comm() {
    if [ "$1" != "0" ]; then
        red_echo "$2"
        exit 1
    fi
}

red_echo "stop slave"
mysql -uroot -p$passwd --socket=/datas/mysql/data/${port}/mysqld.sock -e "stop slave"
check_last_comm "$?" "stop slave failed"

red_echo "online ddl begin"
mysql -uroot -p$passwd --socket=/datas/mysql/data/${port}/mysqld.sock \
    -e "set sql_log_bin=0;
        ALTER TABLE ${db_}.\`${table_name}\`
            DROP FOREIGN KEY FK_arrival_record_product,
            DROP INDEX IXFK_arrival_record,
            ADD INDEX idx_product_id_sequence_station_no(product_id,sequence,station_no),
            ADD INDEX idx_receive_time(receive_time);" >> $log_file 2>&1
check_last_comm "$?" "online ddl failed"
red_echo "online ddl stop"
```

Post‑Optimization Results – After the new indexes were in place, EXPLAIN showed the queries using the idx_receive_time index with row estimates dropping from millions to a few hundred thousand. The DELETE still took ~77 s for 300 k rows, so the author switched to batch deletion: first obtain the maximum id to delete, then repeatedly execute DELETE FROM arrival_record WHERE id < @need_delete_max_id LIMIT 20000; and pause briefly, dramatically reducing impact on the primary.
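The batch‑delete pattern described above can be sketched as follows; the cutoff timestamp is a placeholder for whatever retention boundary applies:

```sql
-- Step 1: compute the cutoff id once, so each batch is a cheap
-- primary-key range delete instead of a repeated receive_time scan.
SELECT MAX(id) INTO @need_delete_max_id
FROM arrival_record
WHERE receive_time < '2019-03-10 00:00:00';   -- placeholder cutoff

-- Step 2: repeat this pair until ROW_COUNT() returns 0.
-- Small batches keep each transaction short, so the replica
-- applies them without building up lag.
DELETE FROM arrival_record WHERE id < @need_delete_max_id LIMIT 20000;
SELECT ROW_COUNT();

-- Step 3: pause briefly between batches to let the primary
-- and replication catch up.
SELECT SLEEP(0.5);
```

In practice the loop would be driven from a shell script or scheduler rather than typed by hand; the key point is that each iteration touches at most 20 000 rows by primary key.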

Conclusion – For very large tables, monitor not only query latency but also maintenance costs such as DDL time and delete overhead. Choose index strategies based on actual query patterns, use online DDL or pt‑osc appropriately, and prefer small‑batch deletes to avoid long‑running full‑table scans and replication lag.

Tags: MySQL, Index Optimization, Large Tables, database performance, slow query, pt-query-digest
Written by Top Architect

Top Architect focuses on sharing practical architecture knowledge, covering enterprise, system, website, large‑scale distributed, and high‑availability architectures, as well as architecture evolution with internet technologies. We welcome idea‑driven, sharing‑oriented architects to exchange and learn together.
