How to Tackle MySQL Slow Queries: Metrics, Strategies, and Real Cases
This article explains what constitutes a MySQL slow query, why they cause failures, defines quantitative metrics such as micro‑average and macro‑average to assess severity, outlines target goals, presents concrete optimization examples, and shares operational practices for ongoing slow‑SQL governance.
Why Address Slow SQL?
What Is a Slow SQL?
In MySQL, a slow query is any statement whose execution time exceeds the long_query_time threshold.
MySQL records such statements in the slow‑query log (slowlog), which captures not only SELECTs but also INSERT, UPDATE, and other DML statements that surpass the threshold.
How to Check Slow‑Query Settings
# Check if slow query logging is enabled
show variables like "slow_query_log%";
# Check the threshold (seconds)
show variables like "long_query_time";In AliSQL‑X‑Cluster (XDB), slow‑query logging is enabled by default with long_query_time set to 1 second.
Why Slow Queries Cause Failures
Real slow SQL often triggers massive row scans, temporary file sorting, or frequent disk flushes, raising disk I/O and turning normal queries into slow ones, leading to widespread timeouts.
After the 2022 Double‑11 incident, the CTO‑D team launched a dedicated slow‑SQL remediation project, with my team as the sponsor.
What Level Should Be Reached?
Measuring Severity
Micro‑average
sum(app_slow_sql_executions) / sum(app_total_sql_executions)A higher value indicates greater impact; a value of 1 means every execution is slow, 0 means none are.
However, this metric can be diluted for high‑QPS apps where occasional slow queries are hidden.
Macro‑average
(sum(slow_sql1_executions) + … + sum(slow_sql_n_executions)) / (sum(sql1_executions) + … + sum(sql_n_executions))This reduces the influence of rare, false‑positive slow queries but may be biased for low‑QPS applications.
Execution Count sum(app_slow_sql_executions) / 7 Using the average daily slow‑SQL executions over the past week mitigates macro‑average distortion.
Slow‑SQL Template Count
count(distinct(app_slow_sql_template))Targets
Core applications: eliminate all slow SQL.
Regular applications: reduce the micro‑average metric by 50%.
CTO Report
Metrics are aggregated per CTO‑D line, weighted, and ranked from low to high. The top‑3 and bottom‑3 are highlighted in a weekly report.
Actionable Guidelines
MySQL Standards
Prohibit joins on more than three tables. Ensure join fields have identical data types and are indexed.
When indexing VARCHAR columns, specify an appropriate prefix length based on field selectivity (e.g., a 20‑character prefix often yields >90% selectivity).
Avoid left‑most or full wildcard searches; use a search engine for such cases.
Prevent implicit type conversion that can invalidate indexes.
Do not over‑index nor under‑index; balance index coverage with update overhead.
DB Change Standards
DDL changes must be paced, respect gray‑release windows, and control concurrency.
Real‑World Optimization Cases
Uneven Data Distribution
Eight databases with sixteen tables each were found to concentrate data in only two tables per database, indicating a flawed sharding strategy.
Index Issues
A composite index idx_logistics_corp_id_special_id had low selectivity. Switching to a more selective combination (logistics_corp_id, transport_type_id) improved performance.
Problematic Query Example
SELECT COUNT(0) AS `tmp_count`
FROM (
SELECT `table_holder`.`user_id`,
`table_holder`.`sc_item_id`,
SUM(CASE `table_holder`.`inventory_type`
WHEN 1 THEN `table_holder`.`quantity`
ELSE 0 END) AS `saleable_quantity`,
SUM(CASE `table_holder`.`inventory_type`
WHEN 1 THEN `table_holder`.`lock_quantity`
ELSE 0 END) AS `saleable_lock_quantity`,
SUM(CASE `table_holder`.`inventory_type`
WHEN 401 THEN `table_holder`.`quantity`
ELSE 0 END) AS `transfer_on_way_quantity`,
`table_holder`.`store_code`,
MAX(`table_holder`.`gmt_modified`) AS `gmt_modified`
FROM `table_holder`
WHERE `table_holder`.`is_deleted` = 0
AND `table_holder`.`quantity` > 0
AND `table_holder`.`user_id` IN (3405569954)
AND `table_holder`.`store_code` IN ('ZJJHBHYTJJ0001', '...1000+ others')
GROUP BY `table_holder`.`user_id`, `table_holder`.`sc_item_id`
ORDER BY `table_holder`.`user_id` ASC, `table_holder`.`sc_item_id` ASC
) `a`;The table had a store_code index, yet the optimizer chose a full table scan because the IN list exceeded 200 values, causing the index to be ignored.
Adjusting the query to use a composite index on (is_deleted, quantity) resolved the issue.
External Influence
High‑selectivity index fields were still causing scans when many instances shared the same physical host, leading to disk I/O contention and temporary file generation.
Multiple MySQL leader nodes co‑located on the same machine shared disk buffers, causing occasional slow‑SQL spikes that could not be eliminated solely by indexing.
Unsolvable Cases
Some scenarios, such as overly complex multi‑field queries without mandatory fields, cannot be efficiently handled by indexes alone and may require a dedicated search engine.
Daily Operations
Across CTO‑D lines, metrics have dramatically improved: core apps have cleared all slow queries, and our team moved from the bottom‑3 to the top‑3 in weekly rankings.
We now run a weekly slow‑SQL ticket pipeline: owners receive assignments, resolve issues, and close tickets, establishing a routine of zero‑slow‑SQL weeks.
Conclusion
This retrospective outlines the strategy formulation, problem analysis, and resolution process for MySQL slow‑SQL governance, offering valuable lessons for similar large‑scale database operations.
Signed-in readers can open the original source through BestHub's protected redirect.
This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.
Alibaba Cloud Developer
Alibaba's official tech channel, featuring all of its technology innovations.
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.
