How to Optimize Large MySQL Tables: Index Tuning, Online DDL, and Batch Deletion
This article walks through diagnosing slow queries on a very large MySQL table. It identifies index misuse and proposes dropping the old composite index, adding targeted indexes, applying the schema change with online DDL or pt‑osc, and deleting in batches to reduce latency and storage fragmentation.
Background
A production MySQL instance (one master, one slave) generates daily SLA alerts due to replication lag caused by heavy SELECT and DELETE operations on the arrival_record table, which holds over 100 million rows.
Slow‑query analysis
Running pt‑query‑digest on the past week's mysql‑slow.log shows:
Total slow‑query time: 25,403 s; the longest single query took 266 s.
SELECT on arrival_record: about 40,000 executions, averaging 5 s each and scanning up to 56 million rows.
DELETE on arrival_record: 6 executions, averaging 262 s each.
Execution plans
The SELECT uses index IXFK_arrival_record, but only its first column, product_id. Because product_id has low cardinality, each lookup still scans a very large number of rows. The DELETE performs a full table scan (type=ALL) because no suitable index is available for its filter.
Index findings
The table has a single composite index:
IXFK_arrival_record(product_id,station_no,sequence,receive_time,arrival_time). Because product_id has poor selectivity and the common queries do not filter on station_no, the second index column, MySQL cannot use the columns after product_id, so the index is ineffective for those queries.
Proposed optimizations
Drop the composite index IXFK_arrival_record.
Create a new composite index
idx_product_id_sequence_station_no(product_id,sequence,station_no).
Create a single‑column index idx_receive_time(receive_time) to support time‑range queries.
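The reasoning behind the index swap can be sketched with a simplified model of the leftmost-prefix rule. The helper name usable_index_prefix and the query shape (equality filters on product_id and sequence) are assumptions for illustration, not taken from the original queries, and the model ignores range conditions and other optimizer features.

```python
from typing import Iterable, List, Sequence


def usable_index_prefix(
    index_columns: Sequence[str], equality_columns: Iterable[str]
) -> List[str]:
    """Return the leading index columns usable for equality lookups.

    A B-tree composite index is matched left to right, and matching
    stops at the first index column the query does not constrain.
    """
    filtered = set(equality_columns)
    prefix: List[str] = []
    for column in index_columns:
        if column not in filtered:
            break
        prefix.append(column)
    return prefix


OLD_INDEX = ["product_id", "station_no", "sequence", "receive_time", "arrival_time"]
NEW_INDEX = ["product_id", "sequence", "station_no"]

# Assumed query shape: equality filters on product_id and sequence only.
query_columns = ["product_id", "sequence"]

print(usable_index_prefix(OLD_INDEX, query_columns))  # ['product_id']
print(usable_index_prefix(NEW_INDEX, query_columns))  # ['product_id', 'sequence']
```

Under that assumption, the old column order stops matching at station_no, while the proposed order lets the lookup narrow on sequence as well.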
Backup and restore workflow
Use mydumper for parallel compressed backup (≈52 s, 1.2 GB) and myloader for parallel restore (≈126 min). Verify table size and fragmentation after restore.
Schema change methods
Two approaches were tested on a test instance:
Online DDL: Execute ALTER TABLE … DROP FOREIGN KEY … DROP INDEX … ADD INDEX … with sql_log_bin=0. Total time ≈34 min.
pt‑osc: Perform the same changes using pt‑osc (pt‑online‑schema‑change). Total time ≈57 min.
Online DDL finished in about 60% of the time pt‑osc needed (≈34 min versus ≈57 min) for this workload.
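As a rough sketch of the online DDL variant, the helper below assembles the statements for one session. The function name build_online_ddl is hypothetical, the ALGORITHM=INPLACE and LOCK=NONE clauses are an assumption about how the change was requested, and the DROP FOREIGN KEY clause is left out because the source elides the constraint name.

```python
from typing import Dict, List, Sequence


def build_online_ddl(
    table: str, drop_index: str, add_indexes: Dict[str, Sequence[str]]
) -> List[str]:
    """Build the statements for one in-place index swap in a single session."""
    clauses = [f"DROP INDEX `{drop_index}`"]
    for name, columns in add_indexes.items():
        column_list = ", ".join(f"`{c}`" for c in columns)
        clauses.append(f"ADD INDEX `{name}` ({column_list})")
    # Assumption: request an in-place, non-blocking change explicitly so that
    # MySQL raises an error rather than falling back to a blocking table copy.
    clauses += ["ALGORITHM=INPLACE", "LOCK=NONE"]
    alter = f"ALTER TABLE `{table}`\n  " + ",\n  ".join(clauses)
    return [
        "SET SESSION sql_log_bin = 0",  # keep this DDL out of the binary log
        alter,
        "SET SESSION sql_log_bin = 1",  # restore binary logging afterwards
    ]


statements = build_online_ddl(
    "arrival_record",
    "IXFK_arrival_record",
    {
        "idx_product_id_sequence_station_no": ["product_id", "sequence", "station_no"],
        "idx_receive_time": ["receive_time"],
    },
)
for statement in statements:
    print(statement + ";")
```

Because sql_log_bin=0 keeps the ALTER out of the binary log, the change is not replicated, so the same statements would have to be run on the slave separately.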
Batch‑delete strategy
Even after adding idx_receive_time, the DELETE that removes rows older than a given date still takes ~77 s for 300 k rows. To mitigate impact, replace the single massive DELETE with a loop that deletes a limited number of rows (e.g., 20 000) per iteration, sleeping briefly between batches. This reduces lock time and replication lag.
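A minimal sketch of such a loop follows. It assumes a DB-API 2.0 connection whose driver uses the %s parameter style (PyMySQL and mysqlclient do); the function name delete_in_batches and the 0.5 s pause are illustrative choices, not values from the original article.

```python
import time
from typing import Any


def delete_in_batches(
    conn: Any, cutoff: str, batch_size: int = 20_000, pause_seconds: float = 0.5
) -> int:
    """Delete rows older than `cutoff` in small, separately committed batches.

    Returns the total number of rows deleted.
    """
    sql = "DELETE FROM arrival_record WHERE receive_time < %s LIMIT %s"
    total = 0
    while True:
        cursor = conn.cursor()
        try:
            cursor.execute(sql, (cutoff, batch_size))
            deleted = cursor.rowcount
        finally:
            cursor.close()
        conn.commit()  # end the transaction so locks are released per batch
        total += deleted
        if deleted < batch_size:
            return total  # a short batch means nothing older than cutoff remains
        time.sleep(pause_seconds)  # brief pause so the slave can catch up
```

Committing after every batch keeps each transaction short, which is what limits lock time and the size of each replicated event. Under statement-based replication, MySQL treats DELETE with LIMIT and no ORDER BY as unsafe, so this shape is better suited to row-based replication.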
Results
After applying the new indexes: SELECT queries now use idx_receive_time, scanning far fewer rows (e.g., ~7.5 M → ~291 k rows).
Replication lag disappears because the application now deletes data in 10‑minute chunks, each taking ~1 s.
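One way to picture the 10‑minute chunking is a generator of half-open time windows, each bound to one small range DELETE that idx_receive_time can serve. The helper name ten_minute_windows, the statement text, and the example dates are assumptions for illustration only.

```python
from datetime import datetime, timedelta
from typing import Iterator, Tuple


def ten_minute_windows(
    start: datetime, end: datetime, step: timedelta = timedelta(minutes=10)
) -> Iterator[Tuple[datetime, datetime]]:
    """Yield half-open [lower, upper) windows that cover start..end."""
    lower = start
    while lower < end:
        upper = min(lower + step, end)  # the last window may be shorter
        yield lower, upper
        lower = upper


# Each window would be bound to this statement as (lower, upper).
DELETE_SQL = (
    "DELETE FROM arrival_record "
    "WHERE receive_time >= %s AND receive_time < %s"
)

# Hypothetical 25-minute span, split into windows of at most 10 minutes.
windows = list(
    ten_minute_windows(datetime(2019, 3, 1, 0, 0), datetime(2019, 3, 1, 0, 25))
)
for lower, upper in windows:
    print(lower.isoformat(), upper.isoformat())
```

Half-open windows avoid deleting a boundary row twice or skipping it, since each timestamp falls into exactly one window.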
Conclusion
When a table grows to tens of gigabytes, it is essential to monitor not only query latency but also maintenance costs such as DDL duration and delete performance. Choosing the right indexing strategy, using online DDL, and performing batch deletions can dramatically improve both response times and overall system stability.
21CTO