
Optimizing Large MySQL Tables: Index Tuning, Slow Query Analysis, and Efficient Deletion Strategies

This article details a comprehensive approach to optimizing a large MySQL table, including slow‑query analysis with pt‑query‑digest, index redesign, backup and restore using mydumper, online DDL versus pt‑osc, and batch deletion techniques to reduce latency and storage fragmentation.

Top Architect

The author, a senior architect, investigates performance issues on a large MySQL table (arrival_record) that stores over 100 million rows, where slow queries and SLA alerts indicate significant master‑slave replication lag.

Using pt-query-digest on the recent mysql-slow.log, the analysis reveals that the most expensive queries are SELECT COUNT(*) and SELECT ... WHERE product_id=26 AND receive_time BETWEEN ..., scanning tens of millions of rows due to a low‑selectivity composite index, IXFK_arrival_record. The SELECT statements scan up to 32 million rows, while the DELETE statement performs a full table scan.
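The problematic statements take roughly the following shape. The product_id value and table name come from the article; the column list and time range are illustrative placeholders, not quotes from the original slow log:

```sql
-- Representative slow-query shapes from the digest (dates are illustrative).
SELECT COUNT(*)
FROM arrival_record
WHERE product_id = 26
  AND receive_time BETWEEN '2019-12-01 00:00:00' AND '2019-12-02 00:00:00';

-- The deletion workload follows the same predicate and, with no usable
-- index on receive_time, degenerates into a full table scan.
DELETE FROM arrival_record
WHERE receive_time < '2019-12-01 00:00:00';
```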

Key findings include:

The table has only one composite index, and its leading column product_id has low cardinality, so the index filters poorly.

The receive_time column has high cardinality and would benefit from a dedicated index.

Many queries also filter on station_no and sequence , which are not usable with the current index order.
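The cardinality claims behind these findings can be sanity‑checked with a query like the one below (a sketch; on a 100‑million‑row table this full‑scans, so SHOW INDEX cardinality estimates are the cheaper first look):

```sql
-- Compare distinct values per candidate column against total rows.
-- A low distinct/total ratio (product_id) makes a poor leading index
-- column; a high ratio (receive_time) indexes well.
SELECT COUNT(DISTINCT product_id)   AS distinct_product_ids,
       COUNT(DISTINCT receive_time) AS distinct_receive_times,
       COUNT(*)                     AS total_rows
FROM arrival_record;
```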

Optimization steps proposed and executed:

Drop the existing composite index IXFK_arrival_record .

Create a new composite index idx_product_id_sequence_station_no covering (product_id, sequence, station_no).

Add a single‑column index idx_receive_time on receive_time .
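In SQL, the three steps above amount to something like the following (the statement form is a sketch; the article does not quote its exact DDL):

```sql
-- 1. Drop the low-selectivity composite index.
ALTER TABLE arrival_record DROP INDEX IXFK_arrival_record;

-- 2. Recreate the composite index with the filter columns queries actually use.
ALTER TABLE arrival_record
  ADD INDEX idx_product_id_sequence_station_no (product_id, sequence, station_no);

-- 3. Give the high-cardinality time column its own index.
ALTER TABLE arrival_record
  ADD INDEX idx_receive_time (receive_time);
```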

Backup and restore were performed with mydumper and myloader, producing a 1.2 GB compressed dump of the 48 GB table in 52 seconds. After restoring to a test instance, the table size matched the original, with no fragmentation.

DDL changes were applied using two methods:

Online DDL (native MySQL) – completed in ~34 minutes.

pt‑osc tool – completed in ~57 minutes.
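For the native path, the intent can be made explicit by requesting an in‑place, non‑locking build, so the statement fails fast instead of silently falling back to a blocking table copy (a sketch, not the article's verbatim DDL):

```sql
-- Native online DDL: INPLACE avoids a full table rebuild for index adds,
-- and LOCK=NONE keeps the table readable and writable during the build.
ALTER TABLE arrival_record
  ADD INDEX idx_receive_time (receive_time),
  ALGORITHM=INPLACE, LOCK=NONE;
```

pt‑osc takes a different route: it creates a shadow copy of the table, copies rows over in chunks, and keeps the copy in sync with triggers, which explains the longer runtime here but makes it the safer choice where native online DDL is unavailable or would lock.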

Post‑optimization EXPLAIN output shows both SELECT queries now use the idx_receive_time index, reducing scanned rows from millions to a few hundred thousand.
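Verifying the new plan is a matter of re‑running EXPLAIN on the rewritten queries; the expected key is paraphrased from the article's result, and the date range is illustrative:

```sql
-- After the index changes, the optimizer should report
-- key: idx_receive_time with a row estimate in the hundreds of
-- thousands rather than tens of millions.
EXPLAIN
SELECT COUNT(*)
FROM arrival_record
WHERE receive_time BETWEEN '2019-12-01 00:00:00' AND '2019-12-02 00:00:00';
```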

Despite the index improvements, the DELETE operation still takes ~77 seconds because it deletes millions of rows based on receive_time. Two further strategies are suggested:

Batch deletion of small time windows (e.g., 10‑minute intervals) to keep each delete under 1 second.

Iterative deletion by primary key range (e.g., 20 000 rows per batch) with a pause between batches.
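Both strategies translate into short, bounded DELETE statements that are repeated from application code or a script until nothing remains. The 10‑minute window and 20 000‑row batch size come from the text; the date boundaries and the assumption of an auto‑increment id primary key are illustrative:

```sql
-- Strategy 1: delete one small time window per statement, keeping each
-- delete fast and the replication stream in small transactions.
DELETE FROM arrival_record
WHERE receive_time >= '2019-11-01 00:00:00'
  AND receive_time <  '2019-11-01 00:10:00';

-- Strategy 2: delete fixed-size batches in primary-key order, looping
-- until the statement affects zero rows, with a pause between batches.
DELETE FROM arrival_record
WHERE receive_time < '2019-11-01 00:00:00'
ORDER BY id
LIMIT 20000;
SELECT SLEEP(0.5);  -- brief pause so the replica can catch up
```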

After applying batch deletions, the application no longer experiences SLA alerts, and the table’s maintenance cost is reduced.

Conclusion: For very large tables, monitor not only query latency but also maintenance overhead. Choose appropriate DDL tools based on table characteristics (foreign keys, size), and prefer incremental batch deletions to avoid long‑running full‑table scans.

Tags: MySQL, Index Optimization, Backup, database performance, slow query, DDL
Written by

Top Architect

Top Architect focuses on sharing practical architecture knowledge, covering enterprise, system, website, large‑scale distributed, and high‑availability architectures, as well as architecture evolution using internet technologies. We welcome idea‑driven, sharing‑oriented architects to exchange and learn together.
