Cutting Slow SQL: Real‑World Strategies that Reduced Thousands to Double‑Digits
This article details how the activity middle‑platform team identified the root causes of slow SQL queries—such as missing indexes, large data volumes, and inefficient joins—and applied data cleaning, partitioning, indexing, and query refactoring to slash daily slow‑query counts from thousands to just dozens.
1. What Is Slow SQL?
Slow SQL refers to queries whose execution time is unusually long, often accompanied by extensive row scans, temporary file sorting, or frequent disk flushes, which increase disk I/O and can cause timeouts. Contrary to common belief, slow‑query logs record not only SELECT statements but also any DML (INSERT, UPDATE, DELETE) that exceeds the long_query_time threshold.
From a business perspective, slow SQL degrades user experience and reduces product satisfaction. From the database side, each slow query consumes a share of limited I/O resources, potentially starving other queries and leading to queueing or even system crashes.
2. How Slow SQL Is Generated
Lack of Indexes : Queries on non‑indexed columns force full‑table scans.
Poor Query Conditions : Overly complex WHERE clauses, unnecessary JOINs, or sub‑queries increase execution cost.
Huge Data Volumes : Even with indexes, tables with tens of millions of rows can become sluggish.
Lock Waits : Accessing rows locked by other transactions blocks execution.
Insufficient Hardware : CPU, memory, or disk bottlenecks slow down query processing.
Bad Schema Design : Over‑normalization or redundant data hampers performance.
Stale Statistics : Inaccurate optimizer statistics lead to poor execution plans.
3. Slow SQL Governance Practices
3.1 Problem Analysis & Solutions
The monitoring platform showed thousands of slow SQLs per day, prompting a multi‑pronged investigation.
Data Volume
Tables had grown to tens of millions, even over a hundred million rows. The team adopted three mitigation steps:
Manual data cleanup: delete obsolete historical rows in batches to avoid overwhelming the live system.
Sharding (分库分表): introduce more partitions and a routing table to direct queries efficiently.
Offload massive queries to Elasticsearch or cache when MySQL cannot meet latency requirements.
Example of a full‑table scan: select * from a where id = 0; Optimized to fetch only needed columns:
select result from a where id = 0;3.2 Index Usage
Indexes dramatically reduce I/O. Adding an index on the test column changed the execution plan from type=ALL (full scan) to type=ref (single‑row lookup), cutting execution time from ~4286 ms to 16 ms.
Typical index benefits and drawbacks are listed:
Improves query speed and reduces I/O.
Ensures uniqueness when needed.
Supports faster GROUP BY and ORDER BY.
Consumes storage space and adds maintenance overhead.
Common MySQL EXPLAIN types:
ALL : full table scan (worst).
index : full index scan.
range : range scan using part of an index.
index_merge : combines multiple indexes.
ref : uses a non‑unique index for a single row.
3.3 Join Queries
Joins on large tables often exceed the 1 s slow‑SQL threshold. Splitting a join into two separate queries reduced execution time from 1432 ms to under 800 ms each.
select * from a left join b on a.id = b.id where a.id = 0;Split version:
select * from a where id = 0;</code><code>select * from b where id = 0;3.4 Complex Conditions
Nested sub‑queries and IN clauses can cause extreme latency (e.g., 12 648 ms). The team replaced the complex query with a pre‑aggregated table to eliminate the heavy join.
select * from a where id in (select id from b) and time > '2024-03-29';3.5 Overall Strategy
Even after data cleanup and query tuning, the scheduled deletion job still generated slow SQL. The revised strategy includes:
Avoiding joins; prefer simple, independent queries.
Driving deletions by activity rather than by time, using activity‑based sharding.
Leveraging the routing table to target only relevant partitions.
Configuration example (JSON‑like):
{
"分表数量": 7,
"表名": "table",
"条件": "condition",
"删除数量": 1,
"删除策略": 1,
"开始时间": "",
"结束时间": ""
}After applying these changes, daily slow‑SQL counts dropped from several thousand to a stable double‑digit figure, markedly improving system stability.
4. Lessons Learned
Prevent slow SQL at the source by enforcing coding standards and proper schema design.
Do not rely on offline databases to hide performance problems; they can still affect production.
Early, thoughtful database design reduces the need for costly refactoring later.
Continuously review and share slow‑SQL incidents to avoid repeated pitfalls.
Signed-in readers can open the original source through BestHub's protected redirect.
This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.
dbaplus Community
Enterprise-level professional community for Database, BigData, and AIOps. Daily original articles, weekly online tech talks, monthly offline salons, and quarterly XCOPS&DAMS conferences—delivered by industry experts.
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.
