Java Thread Pool Optimization and Dynamic Monitoring Practice at ZhiZhuan Platform
This article shares the author's experience with Java thread pools, covering initial concepts, parameter tuning, dynamic adjustment architecture, implementation details, and monitoring solutions applied in the ZhiZhuan platform to improve performance and reliability under high concurrency.
As an internet programmer frequently facing high‑concurrency scenarios, the author discusses the importance of thread pools in Java concurrent programming and shares practical experiences from the ZhiZhuan platform.
1. Initial Understanding of Thread Pools
Thread pools allow reuse of threads, reducing creation overhead and limiting the number of concurrent threads to avoid resource waste. The author first used a thread pool in 2019 to aggregate data from multiple services for a user request, improving response time by executing tasks in parallel. Java provides ThreadPoolExecutor in java.util.concurrent, and the pool operates in four stages (illustrated by an image).
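The parallel aggregation described above can be sketched roughly as follows. This is a minimal illustration, not the author's code: the class name ParallelAggregator and the method fetchAll are hypothetical, and each Callable stands in for a call to one downstream service.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Future;

// Hypothetical helper: runs several service calls in parallel on a shared pool.
class ParallelAggregator {
    /** Submits every call to the pool and returns the results in submission order. */
    static <T> List<T> fetchAll(ExecutorService pool, List<Callable<T>> calls) {
        try {
            // invokeAll blocks until all calls have completed (or failed).
            List<Future<T>> futures = pool.invokeAll(calls);
            List<T> results = new ArrayList<>();
            for (Future<T> future : futures) {
                results.add(future.get());
            }
            return results;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while aggregating", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("a service call failed", e.getCause());
        }
    }
}
```

With this shape, the total latency of a request is close to that of the slowest call rather than the sum of all calls, provided the pool has enough free threads.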
Key configuration parameters include:
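corePoolSize – number of threads kept in the pool even when idle, sized according to load and hardware.
maximumPoolSize – upper bound on the number of threads; threads beyond corePoolSize are created only when the queue is full.
keepAliveTime – how long an idle thread beyond corePoolSize waits for work before it is terminated.
workQueue – task queue type and size (e.g., ArrayBlockingQueue, LinkedBlockingQueue).
rejectedExecutionHandler – policy applied when a task cannot be accepted, that is, when the queue is full and the pool is already at maximumPoolSize.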
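How these parameters interact is easiest to see on a deliberately tiny pool. The sketch below (class name PoolStagesDemo is hypothetical) uses one core thread, a maximum of two threads, and a queue of capacity one, then submits four tasks that block until released. It records the pool size and queue size after each submission.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.RejectedExecutionException;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

class PoolStagesDemo {
    /** Returns a trace of "poolSize/queueSize" after each submission. */
    static String run() {
        CountDownLatch release = new CountDownLatch(1);
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                1,                                     // corePoolSize
                2,                                     // maximumPoolSize
                30, TimeUnit.SECONDS,                  // keepAliveTime for extra threads
                new ArrayBlockingQueue<>(1),           // workQueue with capacity 1
                new ThreadPoolExecutor.AbortPolicy()); // throw when a task is rejected
        // Each task blocks until released, so no thread becomes free during the demo.
        Runnable blocker = () -> {
            try {
                release.await();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        };
        StringBuilder trace = new StringBuilder();
        for (int i = 1; i <= 4; i++) {
            try {
                pool.execute(blocker);
                trace.append(pool.getPoolSize()).append('/')
                     .append(pool.getQueue().size()).append(' ');
            } catch (RejectedExecutionException e) {
                trace.append("rejected");
            }
        }
        release.countDown(); // let the blocked tasks finish
        pool.shutdown();
        return trace.toString();
    }
}
// PoolStagesDemo.run() returns "1/0 1/1 2/1 rejected"
```

The trace shows the order ThreadPoolExecutor follows: the first task starts a core thread, the second waits in the queue, the third forces a non-core thread because the queue is full, and the fourth is handed to the rejection policy.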
The author ultimately settled on a balanced configuration: a thread count of CPUs*2 for I/O‑bound workloads and a queue length of 1000, estimated from average task duration and QPS. This proved sufficient in production.
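As a rough sketch, that configuration could be expressed like this. The class and method names are hypothetical, and two details are assumptions the article does not state: core and maximum size are set to the same value, and the keep-alive time is 60 seconds.

```java
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// Hypothetical factory for the I/O-bound configuration quoted in the text.
class IoPoolFactory {
    static final int QUEUE_LENGTH = 1000; // queue length quoted in the text

    /** Builds a pool with cpus * 2 threads; core == max is an assumption of this sketch. */
    static ThreadPoolExecutor newIoBoundPool(int cpus) {
        int threads = cpus * 2;
        return new ThreadPoolExecutor(
                threads, threads,
                60, TimeUnit.SECONDS, // assumed keep-alive, not from the article
                new LinkedBlockingQueue<>(QUEUE_LENGTH));
    }

    /** Convenience overload that reads the CPU count from the running JVM. */
    static ThreadPoolExecutor newIoBoundPool() {
        return newIoBoundPool(Runtime.getRuntime().availableProcessors());
    }
}
```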
2. Optimization and Practice
By 2021, increased traffic exposed several thread‑pool issues during a full‑link stress test before the 618 promotion:
Insufficient pool size causing request delays.
Excessive pool size leading to resource waste.
Queue saturation resulting in request rejections.
Long task execution times affecting other tasks.
Interference among multiple pools within the same microservice.
These problems required different pool settings for different business scenarios (e.g., larger core size for user requests, larger queue for data export). Adjusting parameters required code releases and extensive testing, which was time‑consuming.
To address this, a dynamically adjustable and monitorable thread pool was designed, consisting of three parts: client, monitoring platform, and configuration backend.
2.1 Overall Architecture
The client extends ThreadPoolExecutor, preserving all native capabilities while adding creation, registration, warm‑up, and parameter‑update functions. The configuration backend manages core parameters (corePoolSize, maximumPoolSize, workQueueCapacity) and pushes updates without redeploying services; a configuration center such as Apollo or Nacos could play the same role. The monitoring platform tracks pool activity, queue saturation, and task blocking time, providing real‑time alerts.
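A minimal sketch of what such a client might look like is shown below. It is not the platform's implementation: the names ManagedThreadPool, create, and onConfigChange are hypothetical, the registry is a plain in-process map, and the wiring to a real configuration backend is left out.

```java
import java.util.Map;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// Hypothetical client-side pool that registers itself by name.
class ManagedThreadPool extends ThreadPoolExecutor {
    // In-process registry; a real client would also report to the backend.
    private static final Map<String, ManagedThreadPool> REGISTRY = new ConcurrentHashMap<>();

    private ManagedThreadPool(int core, int max, BlockingQueue<Runnable> queue) {
        super(core, max, 60, TimeUnit.SECONDS, queue);
    }

    /** Creation, registration, and warm-up in one step. */
    static ManagedThreadPool create(String name, int core, int max, int queueCapacity) {
        ManagedThreadPool pool =
                new ManagedThreadPool(core, max, new LinkedBlockingQueue<>(queueCapacity));
        if (REGISTRY.putIfAbsent(name, pool) != null) {
            pool.shutdown();
            throw new IllegalStateException("pool already registered: " + name);
        }
        pool.prestartAllCoreThreads(); // warm-up: start core threads before traffic arrives
        return pool;
    }

    /** Entry point a config listener could call when new parameters are pushed. */
    static boolean onConfigChange(String name, int core, int max) {
        ManagedThreadPool pool = REGISTRY.get(name);
        if (pool == null) {
            return false;
        }
        pool.resize(core, max);
        return true;
    }

    /** Applies both sizes in an order that never leaves core above max. */
    void resize(int core, int max) {
        if (max >= getMaximumPoolSize()) {
            setMaximumPoolSize(max); // growing: raise the ceiling first
            setCorePoolSize(core);
        } else {
            setCorePoolSize(core);   // shrinking: lower the core size first
            setMaximumPoolSize(max);
        }
    }
}
```

The ordering in resize matters because setMaximumPoolSize rejects a value below the current core size, and on recent JDKs setCorePoolSize rejects a value above the current maximum.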
2.2 Dynamic Parameter Implementation
Dynamic adjustments rely on ThreadPoolExecutor setter methods:
public void setCorePoolSize(int corePoolSize);
public void setMaximumPoolSize(int maximumPoolSize);
public void setKeepAliveTime(long time, TimeUnit unit);
public void setThreadFactory(ThreadFactory threadFactory);
public void setRejectedExecutionHandler(RejectedExecutionHandler handler);
Using setCorePoolSize and setMaximumPoolSize, the pool can increase or decrease threads smoothly. Since ThreadPoolExecutor does not support dynamic queue resizing, a custom LinkedBlockingQueue with adjustable capacity was implemented (illustrated by an image).
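The article does not show the custom queue, so the following is only one possible way to get an adjustable capacity. LinkedBlockingQueue keeps its capacity in a final field, so this hypothetical ResizableTaskQueue stays unbounded underneath and enforces a volatile soft limit in offer(), which is the method ThreadPoolExecutor.execute uses to enqueue tasks.

```java
import java.util.concurrent.LinkedBlockingQueue;

// Hypothetical queue with a bound that can be changed at runtime.
class ResizableTaskQueue<E> extends LinkedBlockingQueue<E> {
    private volatile int capacity;

    ResizableTaskQueue(int capacity) {
        super(); // unbounded underneath; the limit is enforced in offer()
        setCapacity(capacity);
    }

    /** Takes effect for the next offer(); tasks already queued are never dropped. */
    void setCapacity(int capacity) {
        if (capacity <= 0) {
            throw new IllegalArgumentException("capacity must be positive");
        }
        this.capacity = capacity;
    }

    int getCapacity() {
        return capacity;
    }

    @Override
    public boolean offer(E e) {
        // Check-then-act: concurrent producers may briefly overshoot the limit.
        // put() and the timed offer() are not bounded by this sketch.
        return size() < capacity && super.offer(e);
    }

    @Override
    public int remainingCapacity() {
        return Math.max(0, capacity - size());
    }
}
```

A pool built with new ThreadPoolExecutor(core, max, 60, TimeUnit.SECONDS, new ResizableTaskQueue<>(1000)) can then have its queue bound changed by calling setCapacity on the same queue instance. Because the limit is soft, an implementation that needs a strict bound would more likely copy the LinkedBlockingQueue source and make its capacity field mutable.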
2.3 Thread‑Pool Monitoring Implementation
Monitoring uses ThreadPoolExecutor getter methods:
public int getActiveCount();
public BlockingQueue<Runnable> getQueue();
public int getCorePoolSize();
public int getMaximumPoolSize();
public long getTaskCount();
Two approaches collect metrics:
Override beforeExecute() and afterExecute() to report data.
Subclass ThreadPoolExecutor and add monitoring code.
Key monitoring indicators include pool activity (activeCount / maximumPoolSize), queue saturation (queueSize / queueCapacity), and task blocking time (executeStartTime - inQueueTime). The original article illustrates the monitoring dashboard and alerting with screenshots.
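The three indicators can be computed inside a ThreadPoolExecutor subclass along the following lines. This is a sketch under stated assumptions: MonitoredThreadPool and TimedTask are hypothetical names, the queue capacity is remembered at construction time, and only the longest observed wait is kept instead of being reported to a monitoring platform.

```java
import java.util.Objects;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical pool that exposes activity, queue saturation, and queue wait time.
class MonitoredThreadPool extends ThreadPoolExecutor {
    private final int queueCapacity;
    private final AtomicLong maxQueueWaitNanos = new AtomicLong();

    MonitoredThreadPool(int core, int max, int queueCapacity) {
        super(core, max, 60, TimeUnit.SECONDS, new ArrayBlockingQueue<>(queueCapacity));
        this.queueCapacity = queueCapacity;
    }

    /** Wrapper that remembers when the task was handed to the pool. */
    private static final class TimedTask implements Runnable {
        final Runnable delegate;
        final long submittedAtNanos = System.nanoTime();

        TimedTask(Runnable delegate) {
            this.delegate = delegate;
        }

        @Override
        public void run() {
            delegate.run();
        }
    }

    @Override
    public void execute(Runnable command) {
        // Note: the queue now holds TimedTask wrappers, not the original Runnables.
        super.execute(new TimedTask(Objects.requireNonNull(command)));
    }

    @Override
    protected void beforeExecute(Thread t, Runnable r) {
        super.beforeExecute(t, r);
        if (r instanceof TimedTask) {
            // Blocking time = moment execution starts minus moment of submission.
            long waited = System.nanoTime() - ((TimedTask) r).submittedAtNanos;
            maxQueueWaitNanos.accumulateAndGet(waited, Math::max);
        }
    }

    /** activeCount / maximumPoolSize */
    double activity() {
        return (double) getActiveCount() / getMaximumPoolSize();
    }

    /** queueSize / queueCapacity */
    double queueSaturation() {
        return (double) getQueue().size() / queueCapacity;
    }

    long maxQueueWaitMillis() {
        return TimeUnit.NANOSECONDS.toMillis(maxQueueWaitNanos.get());
    }
}
```

A scheduled job could poll activity() and queueSaturation() at a fixed interval and raise an alert when either crosses a threshold; the thresholds themselves are not given in the article.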
3. Summary
Since its adoption, the dynamic thread pool has enabled timely detection of potential issues, automatic disaster recovery during traffic spikes, and performance tuning through stress tests, ensuring the ZhiZhuan platform’s services remain stable during major events like 618 and Double‑11 without any thread‑pool‑related incidents. The author hopes this sharing helps others facing similar challenges.
About the author
Wu Ao, Backend Development, Platform Technology Department, ZhiZhuan.
