How We Reduced Thousands of Slow SQL Queries to Double‑Digit Levels
This article details a real‑world case study of the activity middle‑platform system’s slow‑SQL problem, explaining the definition, causes, and harmful impacts of slow queries, then outlines step‑by‑step mitigation strategies—including data cleanup, partitioning, indexing, query refactoring, and scheduling—to dramatically reduce slow‑SQL occurrences from thousands to double‑digit counts.
What Is a Slow SQL?
Slow SQL refers to queries whose execution time is long, often accompanied by large row scans, temporary file sorting, or frequent disk flushes, leading to increased disk I/O and timeouts.
Slow‑query logs record not only SELECT statements but also INSERT/UPDATE/DML that exceed the long_query_time threshold.
Impact of Slow SQL
Business impact: Degraded user experience and reduced product satisfaction.
Database impact: Consumes disproportionate I/O resources; a single slow query can occupy 30% of resources for a minute, starving other queries.
How Slow SQL Is Generated
Lack of appropriate indexes, causing full‑table scans.
Improper query conditions, complex joins, sub‑queries.
Huge data volume even with indexes.
Lock waiting due to locked resources.
Insufficient hardware resources (CPU, memory, disk).
Poor database design (over‑normalization, redundancy).
Inaccurate statistics leading to bad execution plans.
Practical Governance Measures
1. Data Volume Management
Identify tables with tens of millions to over a hundred million rows; clean obsolete data.
Cleaning strategy: batch delete 2–3 tables per run, delete by primary key, and avoid massive single‑transaction deletions.
"分表数量":7,
"表名":"table",
"条件":"condition"2. Partitioning
Introduce more shards; store routing information in a configuration center to enable dynamic expansion without redeployment.
3. Offloading Large Queries to Elasticsearch
For high‑throughput scenarios, move massive data to ES or cache.
4. SQL Statement Optimization
Examples: select * from a where id = 0; Optimized to: select result from a where id = 0; Index usage dramatically reduces execution time (e.g., from 4286 ms to 16 ms).
Common index types and their trade‑offs are listed.
Explain output shows type=ALL for full‑table scans versus type=ref for indexed lookups.
5. Join Query Refactoring
Split a join into two separate queries and aggregate results in application code, reducing execution time from 1432 ms to ~730 ms each.
select * from a left join b on a.id = b.id where a.id = 0; select * from a where id = 0;
select * from b where id = 0;6. Complex Condition Simplification
Avoid nested sub‑queries and combine conditions; consider denormalizing data for frequent accounting queries.
7. Overall Strategy
Schedule nightly jobs to clean historical data based on activity dimensions rather than time alone, use partition routing to target specific shards, and avoid unnecessary scans.
Configuration example:
"分表数量":7,
"表名":"table",
"条件":"condition",
"删除数量":1,
"删除策略":1,
"开始时间":"",
"结束时间":""Results
After implementing the above measures, the system’s daily slow‑SQL count dropped from several thousand to double‑digit numbers, significantly improving stability.
Key Takeaways
Prevent slow SQL at the source by adhering to coding standards.
Do not ignore slow queries in offline databases; they can still affect production.
Design databases with future scalability in mind.
Continuously review and share lessons to avoid repeated pitfalls.
Signed-in readers can open the original source through BestHub's protected redirect.
This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.
Architect
Professional architect sharing high‑quality architecture insights. Topics include high‑availability, high‑performance, high‑stability architectures, big data, machine learning, Java, system and distributed architecture, AI, and practical large‑scale architecture case studies. Open to ideas‑driven architects who enjoy sharing and learning.
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.
