Databases 17 min read

Cutting Slow SQL: Real‑World Strategies that Reduced Thousands to Double‑Digits

This article details how the activity middle‑platform team identified the root causes of slow SQL queries—such as missing indexes, large data volumes, and inefficient joins—and applied data cleaning, partitioning, indexing, and query refactoring to slash daily slow‑query counts from thousands to just dozens.

dbaplus Community
dbaplus Community
dbaplus Community
Cutting Slow SQL: Real‑World Strategies that Reduced Thousands to Double‑Digits

1. What Is Slow SQL?

Slow SQL refers to queries whose execution time is unusually long, often accompanied by extensive row scans, temporary file sorting, or frequent disk flushes, which increase disk I/O and can cause timeouts. Contrary to common belief, slow‑query logs record not only SELECT statements but also any DML (INSERT, UPDATE, DELETE) that exceeds the long_query_time threshold.

From a business perspective, slow SQL degrades user experience and reduces product satisfaction. From the database side, each slow query consumes a share of limited I/O resources, potentially starving other queries and leading to queueing or even system crashes.

2. How Slow SQL Is Generated

Lack of Indexes : Queries on non‑indexed columns force full‑table scans.

Poor Query Conditions : Overly complex WHERE clauses, unnecessary JOINs, or sub‑queries increase execution cost.

Huge Data Volumes : Even with indexes, tables with tens of millions of rows can become sluggish.

Lock Waits : Accessing rows locked by other transactions blocks execution.

Insufficient Hardware : CPU, memory, or disk bottlenecks slow down query processing.

Bad Schema Design : Over‑normalization or redundant data hampers performance.

Stale Statistics : Inaccurate optimizer statistics lead to poor execution plans.

3. Slow SQL Governance Practices

3.1 Problem Analysis & Solutions

The monitoring platform showed thousands of slow SQLs per day, prompting a multi‑pronged investigation.

Data Volume

Tables had grown to tens of millions, even over a hundred million rows. The team adopted three mitigation steps:

Manual data cleanup: delete obsolete historical rows in batches to avoid overwhelming the live system.

Sharding (分库分表): introduce more partitions and a routing table to direct queries efficiently.

Offload massive queries to Elasticsearch or cache when MySQL cannot meet latency requirements.

Example of a full‑table scan: select * from a where id = 0; Optimized to fetch only needed columns:

select result from a where id = 0;

3.2 Index Usage

Indexes dramatically reduce I/O. Adding an index on the test column changed the execution plan from type=ALL (full scan) to type=ref (single‑row lookup), cutting execution time from ~4286 ms to 16 ms.

Typical index benefits and drawbacks are listed:

Improves query speed and reduces I/O.

Ensures uniqueness when needed.

Supports faster GROUP BY and ORDER BY.

Consumes storage space and adds maintenance overhead.

Common MySQL EXPLAIN types:

ALL : full table scan (worst).

index : full index scan.

range : range scan using part of an index.

index_merge : combines multiple indexes.

ref : uses a non‑unique index for a single row.

3.3 Join Queries

Joins on large tables often exceed the 1 s slow‑SQL threshold. Splitting a join into two separate queries reduced execution time from 1432 ms to under 800 ms each.

select * from a left join b on a.id = b.id where a.id = 0;

Split version:

select * from a where id = 0;</code><code>select * from b where id = 0;

3.4 Complex Conditions

Nested sub‑queries and IN clauses can cause extreme latency (e.g., 12 648 ms). The team replaced the complex query with a pre‑aggregated table to eliminate the heavy join.

select * from a where id in (select id from b) and time > '2024-03-29';

3.5 Overall Strategy

Even after data cleanup and query tuning, the scheduled deletion job still generated slow SQL. The revised strategy includes:

Avoiding joins; prefer simple, independent queries.

Driving deletions by activity rather than by time, using activity‑based sharding.

Leveraging the routing table to target only relevant partitions.

Configuration example (JSON‑like):

{
  "分表数量": 7,
  "表名": "table",
  "条件": "condition",
  "删除数量": 1,
  "删除策略": 1,
  "开始时间": "",
  "结束时间": ""
}

After applying these changes, daily slow‑SQL counts dropped from several thousand to a stable double‑digit figure, markedly improving system stability.

4. Lessons Learned

Prevent slow SQL at the source by enforcing coding standards and proper schema design.

Do not rely on offline databases to hide performance problems; they can still affect production.

Early, thoughtful database design reduces the need for costly refactoring later.

Continuously review and share slow‑SQL incidents to avoid repeated pitfalls.

Original Source

Signed-in readers can open the original source through BestHub's protected redirect.

Sign in to view source
Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactadmin@besthub.devand we will review it promptly.

indexingmysqlDatabase OptimizationPartitioning
dbaplus Community
Written by

dbaplus Community

Enterprise-level professional community for Database, BigData, and AIOps. Daily original articles, weekly online tech talks, monthly offline salons, and quarterly XCOPS&DAMS conferences—delivered by industry experts.

0 followers
Reader feedback

How this landed with the community

Sign in to like

Rate this article

Was this worth your time?

Sign in to rate
Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.