How We Cut 98% of Slow MySQL Queries by Tuning Thread‑Pool Settings
This article walks through a real‑world MySQL performance case at Meituan‑Dianping, showing how systematic monitoring, packet capture, and step‑by‑step analysis identified the Thread‑Pool as the bottleneck and how disabling and fine‑tuning its parameters reduced slow queries from thousands to just a few dozen.
Problem Description
Meituan‑Dianping, a large internet‑plus lifestyle platform, experienced intermittent timeouts in online services because certain SQL statements showed high latency (99th‑percentile response times often exceeding 100 ms, sometimes up to 1 s). The slow queries were simple primary‑key lookups on very small tables, yet they were not recorded in the MySQL slow‑log.
Overview Analysis
By examining CAT monitoring data, the author broke down the total response time into three components:
App Server – time to send the SQL and receive the result.
Network – transmission time of the request and result.
MySQL Server – time spent inside MySQL from receipt to result.
TcpDump was deployed at both the application and MySQL servers to capture timestamps at four monitoring points (APP → MySQL request, MySQL receipt, MySQL response, APP receipt). The measurements showed that the majority of the delay (≈606 ms) occurred inside MySQL.
Solution Iterations
Idea 1: Identify the Slowest Stage
Method: Use TcpDump to record timestamps at the four points listed above. Result: Most of the latency originated in MySQL (≈606 ms).
Idea 2: Decompose MySQL Processing Stages
Method: Refer to "High Performance MySQL" to map MySQL into three layers – connection/thread handling, the MySQL server layer (parser, optimizer, cache, etc.), and the storage‑engine layer. This clarified where the execution time could be spent.
Idea 3: Suspect Thread Management
Since the slow queries were not logged by the slow‑log, the problem must lie before the storage engine – likely in MySQL's thread handling.
Idea 4: Disable Thread‑Pool (MySQL 5.6)
Method: Change thread_handling from pool-of-threads to One-Connection-Per-Thread. Result: After disabling the Thread‑Pool, slow‑SQL count dropped from 3788 to 818 (≈78 % reduction).
Idea 5: Tune Thread‑Pool Parameters
Method: Investigate Thread‑Pool configuration parameters (e.g., thread_pool_stall_limit) and set thread_pool_stall_limit=10. Result: Slow‑SQL count fell from 3788 to 63, a 98.3 % reduction.
Problem Analysis
Before Thread‑Pool, MySQL used a one‑thread‑per‑connection model, which degrades performance under high connection counts. The Thread‑Pool introduced in MySQL 5.6 creates a pool of worker threads grouped into thread groups, each handling connections to keep a limited number of active threads and reduce context‑switch overhead.
Each thread group can be in one of four states: LISTENER, IDLE, ACTIVE, or WAITING. When a thread is waiting for I/O, it is marked WAITING; if a new request arrives while a thread is waiting, the pool may create or wake another thread depending on thread_pool_stall_limit (default 500 ms). Idle threads self‑terminate after thread_pool_idle_timeout (default 60 s).
How Thread‑Pool Works
A dedicated listener thread accepts new connections. In the one‑thread‑per‑connection model, the listener pulls a thread from the cache or creates a new one for each connection. In the Thread‑Pool model, the listener enqueues the connection into a thread‑group queue; a worker thread from that group then handles the connection, keeping the number of active threads per CPU core low.
Solution
Set thread_pool_stall_limit=10 so that threads blocked longer than 10 ms are marked stalled, prompting the pool to spawn additional workers and preventing other queries from waiting excessively.
Value Delivered
For the xxx‑service example, slow‑SQL count decreased from 3788 to 63 (98.3 % reduction).
The fix benefited hundreds of product lines, raising overall service availability above 99.99 %.
Summary
The case study demonstrates a systematic approach to diagnosing end‑to‑end database performance issues: collect monitoring data, isolate the latency‑heavy stage, use packet capture to pinpoint MySQL internals, and apply targeted configuration changes. By focusing on the Thread‑Pool introduced in MySQL 5.6, the team first disabled it to confirm the hypothesis, then fine‑tuned thread_pool_stall_limit, achieving dramatic reductions in slow queries and improving platform reliability.
References
https://my.oschina.net/andylucc/blog/820624
https://yq.aliyun.com/articles/41078
http://blog.chinaunix.net/uid-28364803-id-3427833.html
http://blog.chinaunix.net/uid-28364803-id-3431242.html
https://www.percona.com/doc/percona-server/5.6/performance/threadpool.html
https://mariadb.com/kb/en/mariadb/thread-pool-in-mariadb/
https://www.safaribooksonline.com/library/view/high-performance-mysql/9781449332471/ch01.html#mysqlas_logical_architecture
Signed-in readers can open the original source through BestHub's protected redirect.
This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.
dbaplus Community
Enterprise-level professional community for Database, BigData, and AIOps. Daily original articles, weekly online tech talks, monthly offline salons, and quarterly XCOPS&DAMS conferences—delivered by industry experts.
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.
