How to Measure and Eliminate Slow SQL in Large‑Scale MySQL Deployments
This article explains what MySQL slow queries are, why they cause system failures, proposes multi‑dimensional metrics to assess their severity, outlines concrete guidelines and change standards, and shares real‑world optimization cases and daily operational practices for eliminating slow SQL.
Why address slow SQL?
In MySQL, a "slow query" is any statement whose execution time exceeds the long_query_time threshold (default 1 s in AliSQL‑X‑Cluster). The slow‑query log records not only SELECTs but also INSERT, UPDATE, and other DML statements that cross this limit.
Typical checks:
show variables like "slow_query_log%";
show variables like "long_query_time";When slow query logging is enabled, MySQL records every statement that runs longer than the configured threshold.
How slow queries lead to failures
Slow SQL often triggers massive row scans, temporary file sorting, or frequent disk flushes, which increase disk I/O, cause other queries to become slow, and can result in widespread time‑outs.
Metrics for evaluating application‑level slow‑SQL impact
1. Micro‑average
Formula:
sum(app slow‑SQL executions) / sum(app total SQL executions). Values range from 0 (no slow SQL) to 1 (all executions are slow). This metric lacks discrimination when overall QPS is high and slow queries are rare.
2. Macro‑average
Formula: ( Σ slow‑SQL_i executions / Σ SQL_i executions ) / n, where *i* iterates over distinct SQL statements. It reduces the influence of occasional slow queries by weighting each statement equally.
3. Execution count
Use the average daily count of slow‑SQL executions over the past week to avoid distortion from low‑traffic applications.
4. Slow‑SQL template count
Track the number of distinct slow‑SQL templates globally to monitor historical trends.
Goals
Core applications: eliminate all slow SQL.
Regular applications: reduce the micro‑average metric by 50%.
CTO‑level reporting
Aggregate the multi‑dimensional metrics per CTO‑D group, compute a weighted average, rank applications from low to high, highlight the top‑3 and bottom‑3, and publish weekly.
Why I lead this effort
My background in C/C++ and large‑scale multi‑active MySQL architectures gives me insight into MySQL performance, and my team’s low‑traffic services provide a clean slate for inserting this governance across business lines.
Actionable support
1. MySQL index guidelines (excerpt)
Prohibit joins involving more than three tables; ensure join columns have identical data types and are indexed.
When indexing VARCHAR columns, specify an appropriate prefix length based on the column’s discriminative power (e.g., 20 characters for >90% selectivity).
Enforce explicit index lengths for CHAR, VARCHAR, INT, TIMESTAMP, DATETIME, considering NULLability.
Avoid implicit type conversion that disables index usage.
Do not create indexes indiscriminately; assess selectivity and maintenance cost.
2. DB change standards
DDL changes must be paced, use gray‑release strategies, respect defined change windows, and control concurrency.
Real‑world optimization examples
1. Data distribution imbalance
Eight databases with sixteen tables each were found to concentrate most rows in only two tables per database, indicating a flawed sharding strategy.
2. Index misuse
A composite index idx_logistics_corp_id_special_id existed, yet its selectivity was low. Re‑evaluating the field combination (logistics_corp_id, transport_type_id) yielded a far higher discriminative power.
3. Problematic query
SELECT COUNT(0) AS `tmp_count`
FROM (
SELECT `table_holder`.`user_id`, `table_holder`.`sc_item_id`,
SUM(CASE WHEN `table_holder`.`inventory_type` = 1 THEN `table_holder`.`quantity` ELSE 0 END) AS `saleable_quantity`,
SUM(CASE WHEN `table_holder`.`inventory_type` = 1 THEN `table_holder`.`lock_quantity` ELSE 0 END) AS `saleable_lock_quantity`,
SUM(CASE WHEN `table_holder`.`inventory_type` = 401 THEN `table_holder`.`quantity` ELSE 0 END) AS `transfer_on_way_quantity`,
`table_holder`.`store_code`,
MAX(`table_holder`.`gmt_modified`) AS `gmt_modified`
FROM `table_holder`
WHERE `table_holder`.`is_deleted` = 0
AND `table_holder`.`quantity` > 0
AND `table_holder`.`user_id` IN (3405569954)
AND `table_holder`.`store_code` IN ('ZJJHBHYTJJ0001', '...1000多个')
GROUP BY `table_holder`.`user_id`, `table_holder`.`sc_item_id`
ORDER BY `table_holder`.`user_id` ASC, `table_holder`.`sc_item_id` ASC
) `a`;The query used an existing store_code index, yet the optimizer chose a full table scan because the IN‑list exceeded ~200 values, causing the index to be ignored.
Replacing the index with a composite key on (is_deleted, quantity) resolved the issue.
4. Cases that could not be solved
Even with high‑selectivity indexes, some workloads required multi‑field combination queries without mandatory fields, making index‑only solutions impractical. In such scenarios, moving the search to a dedicated search engine or redesigning the data model is recommended.
Daily operationalization
Each week, slow‑SQL work orders are generated, owners address them, and the tickets are closed, establishing a routine of “zero‑slow‑SQL” for core services. The team’s ranking improved from bottom‑3 to top‑3 after systematic governance.
Conclusion
The systematic definition of metrics, clear index standards, and weekly operational cadence transformed slow‑SQL from an occasional pain point into a routine, measurable, and largely eliminated issue across the organization.
Signed-in readers can open the original source through BestHub's protected redirect.
This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.
dbaplus Community
Enterprise-level professional community for Database, BigData, and AIOps. Daily original articles, weekly online tech talks, monthly offline salons, and quarterly XCOPS&DAMS conferences—delivered by industry experts.
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.
