Thread Pool Isolation and Monitoring Design for Mobile Applications
The design splits the original shared I/O pool into dedicated network, I/O, and polling thread pools, adds comprehensive monitoring of task duration and frequency, enforces unified polling rules, and automatically tunes pool parameters. The result is a 76% reduction in UI lag and significantly easier troubleshooting.
Introduction
As the application has evolved, the number of asynchronous tasks in the retail app has grown dramatically. Network requests, local database operations, and polling tasks all share a single thread pool, which is easily saturated (e.g., a slow network request can block the pool). This leads to task backlog and noticeable UI lag. Worse, when users experience an unresponsive UI they tend to retry operations, further increasing the load and compounding the lag. Three problems stand out:
Tasks are not isolated; long‑running tasks affect short‑running ones.
Polling tasks consume excessive resources because there is no unified polling mechanism; each business unit creates its own threads or uses the default I/O pool.
There is no monitoring of thread‑pool health, so problems are extremely difficult to troubleshoot.
Figure: Short‑term task surge within a few milliseconds.
Overall Design
Goals
Task isolation to prevent long‑running tasks from blocking interactive tasks.
Unified polling to reduce resource overhead.
Task monitoring to avoid misuse by business teams.
Information collection and monitoring for rapid issue localization.
The core of the optimization lies in separation and monitoring.
Separation
The original I/O thread pool is split, moving “slow” and “frequent” tasks out of it. This ensures the I/O pool can handle many fast local tasks efficiently.
Avoid short tasks waiting for long tasks.
Avoid high‑frequency tasks occupying most of the pool resources.
Specific strategies include:
Move network‑related tasks to a dedicated network thread pool.
Move polling tasks to a dedicated polling thread pool, preventing them from degrading the general pool.
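The separation above can be sketched with plain `java.util.concurrent` executors. This is a minimal illustration, not the actual Youzan implementation; the pool names, sizes, and queue bounds are assumptions chosen for the example.

```java
import java.util.concurrent.*;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch: three isolated pools so that slow network calls
// cannot starve fast local I/O tasks. All sizes are illustrative.
public final class AppExecutors {
    private static ThreadFactory named(String prefix) {
        AtomicInteger seq = new AtomicInteger();
        return r -> new Thread(r, prefix + "-" + seq.incrementAndGet());
    }

    // Network pool: bounded, tolerates long-blocking requests.
    public static final ExecutorService NETWORK =
            new ThreadPoolExecutor(8, 16, 60, TimeUnit.SECONDS,
                    new LinkedBlockingQueue<>(128), named("net"));

    // IO pool: many short local tasks (DB, file, cache).
    public static final ExecutorService IO =
            new ThreadPoolExecutor(4, 8, 30, TimeUnit.SECONDS,
                    new LinkedBlockingQueue<>(256), named("io"));

    // Polling pool: scheduled, owned by the unified polling framework.
    public static final ScheduledExecutorService POLLING =
            Executors.newScheduledThreadPool(4, named("poll"));
}
```

Naming each pool's threads (`net-`, `io-`, `poll-`) also makes misuse visible in stack traces and monitoring data.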
Monitoring
In addition to separating known network and polling tasks, the design adds monitoring for all thread‑pool activity. By measuring each task’s execution time, the system can identify “slow” tasks (those that take unusually long) and “many” tasks (high‑frequency bursts) for further optimization.
Monitor task duration to separate long‑running tasks and decide whether they belong in the default I/O pool.
Detect bursts of short‑duration tasks that fill the pool and optimize them.
After isolating “slow” tasks, continue to monitor their resource consumption, especially for network requests.
Provide API monitoring to ensure business teams use polling responsibly.
Technical Implementation
1. Thread Management Library
Objectives:
Manage all thread pools and thread creation within the project.
Monitor sub‑thread tasks.
Isolate task execution across different pools.
Unify polling tasks and filter erroneous tasks.
Automatically tune thread‑pool parameters.
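The last objective, automatic parameter tuning, is commonly implemented by deriving pool sizes from the device's core count. The heuristic below is a widely used rule of thumb, not the actual Youzan formula.

```java
import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch of device-aware tuning: size pools from the number
// of available cores. The "2N + 1" rule for IO-bound work is a common
// heuristic and only an assumption here.
public final class PoolTuner {
    public static int cpuBoundSize() {
        return Math.max(2, Runtime.getRuntime().availableProcessors());
    }

    public static int ioBoundSize() {
        // IO-bound tasks spend most of their time blocked, so oversubscribe.
        return cpuBoundSize() * 2 + 1;
    }

    public static ThreadPoolExecutor tunedIoPool() {
        return new ThreadPoolExecutor(cpuBoundSize(), ioBoundSize(),
                30, TimeUnit.SECONDS, new LinkedBlockingQueue<>(256));
    }
}
```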
Three thread pools are provided:
Network thread pool – dedicated to network tasks; replacement is transparent to business code.
IO thread pool – handles local asynchronous tasks; RxJava is hooked at app startup to replace the default pool.
Polling thread pool – handles polling tasks; requires business‑side integration.
Thread‑pool API definition
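A public surface for such a management library might look like the following. All names here are illustrative assumptions, not the actual Youzan API.

```java
import java.util.concurrent.ExecutorService;

// Hypothetical sketch of the thread-management library's API: one entry
// point owning the three pools and the unified polling registry.
public interface ThreadPoolManager {
    /** Dedicated pool for network requests. */
    ExecutorService networkPool();

    /** Pool for short local asynchronous tasks (DB, file, cache). */
    ExecutorService ioPool();

    /**
     * Register a polling task with the unified framework.
     * @param intervalSeconds polling interval; must be >= 1 second
     * @param cycles number of polling cycles; a negative value means infinite
     */
    void registerPollTask(Runnable task, long intervalSeconds, int cycles);

    /** Stop and deregister a previously registered polling task. */
    void unregisterPollTask(Runnable task);
}
```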
2. Unified Polling
Disallow polling intervals shorter than 1 second.
Inspect all polling tasks every second.
Polling runs in an independent thread that sleeps when no tasks are registered.
Two callback modes:
Single‑thread callback (the default): at most one thread is used. If the previous cycle has not finished, pollTaskExceptionCallback() is invoked; otherwise pollTaskCallback() runs.
Multi‑thread callback: up to 30 threads can be used, with the same callback logic.
Business can specify a fixed number of polling cycles or infinite polling.
Dynamic expansion of the polling pool: start with 30 threads, double when core threads are fully utilized, up to a maximum of 120 threads.
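The rules above (a one-second inspection tick, a minimum interval of one second) can be sketched as a small registry driven by a single scheduler thread. The tick is driven externally in this sketch so it stays deterministic; class and method names are assumptions.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical sketch of the unified polling loop: one thread ticks every
// second, dispatching tasks whose interval has elapsed. Sub-second
// intervals are rejected at registration, per the design rules.
public final class UnifiedPoller {
    private final Map<Runnable, Long> intervalSec = new ConcurrentHashMap<>();
    private final Map<Runnable, Long> nextDueSec = new ConcurrentHashMap<>();
    private long clockSec = 0;

    public boolean register(Runnable task, long seconds) {
        if (seconds < 1) return false;              // disallow intervals < 1s
        intervalSec.put(task, seconds);
        nextDueSec.put(task, clockSec + seconds);
        return true;
    }

    /** Called once per second by the polling thread; returns tasks fired. */
    public int tick() {
        clockSec++;
        int fired = 0;
        for (Map.Entry<Runnable, Long> e : nextDueSec.entrySet()) {
            if (clockSec >= e.getValue()) {
                e.getKey().run();                   // real code hands off to the pool
                e.setValue(clockSec + intervalSec.get(e.getKey()));
                fired++;
            }
        }
        return fired;
    }
}
```

In the real design the polling thread would sleep when the registry is empty, and task execution would be handed to the polling pool rather than run inline.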
Polling task flowchart
Task Filtering
Normal polling invokes pollTaskCallback(); exceptions are routed to pollTaskExceptionCallback(). The filtering mechanism records every task, validates thread usage on each execution, and resets records after completion.
Record every task.
Before execution, check whether the task has exhausted its allocated threads; if so, trigger the exception callback.
Reset records after task finishes.
Exception callbacks are triggered in three scenarios:
Single‑thread task does not release within timeout.
Multi‑thread task exceeds the maximum thread count.
Polling thread pool is fully occupied and times out.
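The per-task thread check described above can be sketched as a small guard consulted before each poll cycle: single-thread mode is simply the case where the allowed thread count is 1. Names are assumptions for illustration.

```java
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch of the filtering check: before each cycle, verify the
// task has not exhausted its allowed threads. When it has, the framework
// routes to pollTaskExceptionCallback() instead of running it again.
public final class PollTaskGuard {
    private final int maxThreads;                 // 1 = single-thread mode
    private final AtomicInteger inFlight = new AtomicInteger();

    public PollTaskGuard(int maxThreads) {
        this.maxThreads = maxThreads;
    }

    /** Returns true if the normal callback may run; false triggers the exception callback. */
    public boolean tryAcquire() {
        while (true) {
            int n = inFlight.get();
            if (n >= maxThreads) return false;    // previous cycle still running
            if (inFlight.compareAndSet(n, n + 1)) return true;
        }
    }

    /** Reset the record after the task finishes. */
    public void release() {
        inFlight.decrementAndGet();
    }
}
```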
Task filtering flowchart
Engineering Changes
Replace the gateway request thread pool with a custom network pool by hooking RxJava observables.
Replace the default RxJava thread pool with a custom IO pool at app startup.
Integrate business polling into the unified polling framework.
Intercept incorrect usage of the network thread pool to prevent IO threads from performing network requests.
Expose thread‑task monitoring to APM for reporting.
Note: Only polling requires explicit business integration; other changes are transparent.
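One simple way to intercept the misuse described above (network requests issued from IO threads) is to give each pool's threads a recognizable name prefix and have the network layer refuse blocking calls on IO-named threads. The prefix convention and class name here are assumptions.

```java
// Hypothetical sketch of misuse interception: the network layer rejects a
// blocking request made on a thread belonging to the IO pool, identified
// here by the thread-name prefix the pool's ThreadFactory assigns.
public final class NetworkGuard {
    static final String IO_PREFIX = "io-";   // assumed naming convention

    public static void checkNotOnIoThread() {
        String name = Thread.currentThread().getName();
        if (name.startsWith(IO_PREFIX)) {
            throw new IllegalStateException(
                    "Network request issued on IO-pool thread: " + name);
        }
    }
}
```

In practice the violation would likely be reported to APM rather than thrown in release builds.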
Information Collection and Monitoring
When a task is submitted, record its call stack and submission time. When the task starts and finishes, compute total execution time and waiting time to identify “slow” tasks.
If a task’s waiting time exceeds a threshold while the pool is full, capture the task stack for debugging.
Thread‑pool listening API
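The collection scheme above (capture the call stack and submit time on submission, then compute wait and run time around execution) can be sketched as a Runnable wrapper. The slow-task threshold and reporting sink are assumptions; real code would forward to APM.

```java
// Hypothetical sketch of task monitoring: wrap each submitted Runnable to
// record submit-to-start wait time and execution time, and flag "slow"
// tasks. The 1 s threshold mirrors the article's sub-thread lag definition.
public final class MonitoredRunnable implements Runnable {
    static final long SLOW_RUN_MS = 1000;

    private final Runnable delegate;
    private final long submittedAtMs = System.currentTimeMillis();
    private final String submitStack;          // captured for later debugging
    volatile long waitMs = -1, runMs = -1;

    public MonitoredRunnable(Runnable delegate) {
        this.delegate = delegate;
        // Record the call stack at submission so a blocked pool can be traced.
        StringBuilder sb = new StringBuilder();
        for (StackTraceElement e : Thread.currentThread().getStackTrace())
            sb.append(e).append('\n');
        this.submitStack = sb.toString();
    }

    @Override public void run() {
        long start = System.currentTimeMillis();
        waitMs = start - submittedAtMs;
        try {
            delegate.run();
        } finally {
            runMs = System.currentTimeMillis() - start;
            if (runMs > SLOW_RUN_MS) report();  // hand off to APM in real code
        }
    }

    private void report() {
        System.err.println("slow task (" + runMs + " ms), submitted at:\n" + submitStack);
    }
}
```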
Improvement Effect
Definition of “lag”: a main‑thread task taking >300 ms, an ANR, a sub‑thread task taking >1 s, a blocked thread pool, or a page render taking >200 ms.
Lag count trend (down 76%)
Future Plans
Support more dimensions of task monitoring and add automatic alerting.
Provide device‑specific optimal thread‑pool configurations to maximize resource reuse.
Gradually enhance monitoring of pthreads and task counts.
Youzan Coder
The official Youzan tech channel, sharing technical insights and periodic updates from the Youzan engineering team.