How a Dubbo 2.7.12 Bug Caused Memory Leaks and Service Outages – Diagnosis and Fix
After a late-night incident in which a Dubbo 2.7.12 service crashed, the author traced high memory and CPU usage to a full-GC spike, identified a HashedWheelTimer thread-pool bug that caused request timeouts to go undetected, reproduced the leak, and confirmed that the issue was fixed in Dubbo 2.7.13.
Background
One night the author received a call that a Dubbo service had failed. The system consists of three services (A, B, C) that communicate via Dubbo RPC. When the incident occurred, several instances of service B had gone down, and the surviving instances saw a surge in request volume and latency.
Investigation
Monitoring showed that the problematic machines had memory usage around 80% and increased CPU consumption. Full GC time also rose sharply, indicating a memory leak.
The JVM full‑GC monitor confirmed the spike.
A WARN log from HashedWheelTimer showed a RejectedExecutionException:
[dubbo-future-timeout-thread-1] WARN org.apache.dubbo.common.timer.HashedWheelTimer$HashedWheelTimeout (HashedWheelTimer.java:651)
- [DUBBO] An exception was thrown by TimerTask., dubbo version: 2.7.12, current host: xxx.xxx.xxx.xxx
java.util.concurrent.RejectedExecutionException:
Task org.apache.dubbo.remoting.exchange.support.DefaultFuture$TimeoutCheckTask$Lambda$674/1067077932@13762d5a
rejected from java.util.concurrent.ThreadPoolExecutor@7a9f0e84[Terminated, pool size = 0, active threads = 0, queued tasks = 0, completed tasks = 21]

The service was running Dubbo 2.7.12. The author searched related GitHub issues (e.g., #6820, #8172, #8188), but the exact symptom was not documented there.
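The exception itself is easy to reproduce outside Dubbo. The minimal sketch below uses only the JDK (no Dubbo classes): submitting a task to a ThreadPoolExecutor after shutdown() triggers the default AbortPolicy and produces a RejectedExecutionException equivalent to the one in the WARN log.

import java.util.concurrent.LinkedBlockingQueue;
import java.util.concurrent.ThreadPoolExecutor;
import java.util.concurrent.TimeUnit;

public class RejectedDemo {
    public static void main(String[] args) {
        ThreadPoolExecutor pool = new ThreadPoolExecutor(
                1, 1, 0L, TimeUnit.MILLISECONDS, new LinkedBlockingQueue<>());

        // Shut the pool down first, mirroring the shared executor that
        // Dubbo closes when the provider connection is torn down.
        pool.shutdown();

        // Any task submitted afterwards is rejected by the default
        // AbortPolicy with a RejectedExecutionException.
        pool.execute(() -> System.out.println("never runs"));
    }
}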
To reproduce the problem, three services were set up and the provider was forced to block indefinitely:

Thread.sleep(Integer.MAX_VALUE);

When a provider instance was killed, the shared executor tied to its connection was shut down, so subsequent timeout-check tasks were rejected. The HashedWheelTimer relies on that thread pool to detect request timeouts; once the pool is closed, timeout detection stops, and pending requests that are never checked accumulate until memory is exhausted.
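The failure chain can be modeled without Dubbo internals. The sketch below is a simplified, hypothetical model, not Dubbo's actual code: java.util.Timer stands in for HashedWheelTimer, and a plain map stands in for the table of in-flight futures. Because timeout eviction is delegated to an executor that has already been shut down, no entry is ever removed and the map only grows.

import java.util.Map;
import java.util.Timer;
import java.util.TimerTask;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.RejectedExecutionException;

public class LeakModel {
    // Stand-in for the table of in-flight requests awaiting a response.
    static final Map<Long, byte[]> PENDING = new ConcurrentHashMap<>();

    public static void main(String[] args) throws Exception {
        ExecutorService shared = Executors.newSingleThreadExecutor();
        shared.shutdown(); // models the shared executor closed when the provider died

        Timer timer = new Timer(true); // stands in for HashedWheelTimer
        for (long id = 1; id <= 10_000; id++) {
            final long requestId = id;
            PENDING.put(requestId, new byte[1024]); // request that never gets a response
            timer.schedule(new TimerTask() {
                @Override public void run() {
                    try {
                        // Timeout eviction is delegated to the shared executor...
                        shared.execute(() -> PENDING.remove(requestId));
                    } catch (RejectedExecutionException e) {
                        // ...and the rejection is swallowed, mirroring the
                        // catch (Throwable) in HashedWheelTimer's expire().
                    }
                }
            }, 10); // 10 ms "timeout"
        }

        Thread.sleep(1000); // give every timeout a chance to fire
        // Nothing was evicted: all 10,000 entries are still pending.
        System.out.println("still pending: " + PENDING.size());
    }
}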
The following code from HashedWheelTimer shows what happens when a scheduled timeout task fires; note that any Throwable thrown by the task is caught and merely logged at WARN level:
public void expire() {
    if (!compareAndSetState(ST_INIT, ST_EXPIRED)) {
        return;
    }
    try {
        // Runs the TimeoutCheckTask, which hands off to the shared
        // executor; if that executor has been terminated, this throws
        // RejectedExecutionException.
        task.run(this);
    } catch (Throwable t) {
        // The rejection lands here and is only logged at WARN level,
        // so timeout detection fails silently.
        if (logger.isWarnEnabled()) {
            logger.warn("An exception was thrown by " + TimerTask.class.getSimpleName() + '.', t);
        }
    }
}

By continuously sending requests to the blocked provider, the author reproduced the memory blow-up; with working timeout detection, memory stayed low.
Conclusion
The issue occurs only with asynchronous calls where the provider goes offline abnormally while blocked, so in-flight requests never return. The bug was introduced in Dubbo 2.7.10 and fixed in 2.7.13.
Post‑mortem
When doing damage control, preserve the incident scene (e.g., dump the heap or capture traffic) before restarting services.
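For JVM services, the scene can usually be preserved in seconds before a restart using standard JDK tooling (replace <pid> with the process id):

jmap -dump:format=b,file=heap.hprof <pid>   # heap dump for offline leak analysis
jstack <pid> > threads.txt                  # thread dump
jstat -gcutil <pid> 1000 10                 # ten 1-second GC utilization snapshots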
Observability is crucial: logs, request metrics, machine metrics (CPU, memory, network), and JVM metrics (thread pools, GC) should be comprehensive.
Open‑source projects often have searchable logs; leveraging community knowledge can dramatically reduce troubleshooting time.
