Why Did My Dubbo Gateway Leak Memory? Uncovering the Hidden Zookeeper Subscription Bug
This article describes a memory leak in a Dubbo microservice gateway caused by repeated Zookeeper subscriptions, explains how the issue was diagnosed using heap dumps and source inspection, and shows the simple fix: disabling the reference check.
Background
In a microservice architecture, each service has its own network address, while clients reach all services through a single unified address, so a gateway is needed to bridge client and service communication. The gateway provides authentication, lifecycle management, load balancing, circuit breaking, monitoring, and risk control.
The subject of this article is an internally developed Dubbo gateway that converts HTTP to Dubbo protocol, whose core feature is Dubbo generic invocation. Generic invocation is used when the client lacks the API interface and model classes; parameters and return values are represented as maps.
A typical Dubbo consumer imports the provider's JAR containing the interface, but a gateway cannot import every provider's JAR, nor can it be re-released each time an interface changes, so generic invocation is required. The official Dubbo documentation provides example code for generic invocation.
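To make the shape of a generic call concrete, here is a minimal sketch. It does not use the Dubbo library: GenericServiceSketch is a local stand-in that copies the signature of Dubbo's GenericService.$invoke, and the fake provider, the createUser method, and the com.example.User type name are hypothetical. In a real gateway the service object would be obtained from a ReferenceConfig configured for generic invocation.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.Map;

// Local stand-in that copies the shape of Dubbo's GenericService.$invoke,
// declared here so the sketch compiles without the Dubbo dependency.
interface GenericServiceSketch {
    Object $invoke(String method, String[] parameterTypes, Object[] args);
}

class GenericCallSketch {
    // Hypothetical in-memory "provider" that only echoes what it received.
    static final GenericServiceSketch FAKE_PROVIDER = (method, parameterTypes, args) -> {
        Map<String, Object> result = new HashMap<>();
        result.put("method", method);
        result.put("parameterTypes", Arrays.asList(parameterTypes));
        result.put("echo", args[0]);
        return result;
    };

    // The caller knows only names: the method name, the parameter type names,
    // and a map that stands in for the provider's model class.
    static Object callCreateUser(GenericServiceSketch service) {
        Map<String, Object> user = new HashMap<>();
        user.put("name", "alice");
        user.put("age", 30);
        return service.$invoke(
                "createUser",
                new String[] {"com.example.User"},
                new Object[] {user});
    }

    public static void main(String[] args) {
        System.out.println(callCreateUser(FAKE_PROVIDER));
    }
}
```

Because nothing in the caller depends on the provider's classes, the gateway can route a request to any interface it learns about at runtime.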
Problem Description
The gateway ran stably for a while, then began experiencing frequent full GCs, rising CPU usage, and an increasing error rate, even though request volume had not changed. A restart cleared the symptoms, but about a week later, while the heap dump was still being analyzed, the problem recurred.
Investigation
The investigation started from the heap dump, opened in Eclipse MAT.
The dump contained more than 7,000 RegistryDirectory objects, which pointed to a leak. Their owners traced back to CuratorZookeeperClient$CuratorWatcherImpl instances.
Only two code paths create such watchers: ReferenceConfig proxy creation (via RegistryProtocol.doRefer) and the retry logic in FailbackRegistry. The retry logic was suspected first.
Testing ruled it out: a Zookeeper reconnection does not create new watcher objects, because they are cached per URL.
An online search turned up GitHub issues #376 and #4587, but neither appeared to match the observed behavior.
Further analysis revealed that the gateway caches ReferenceConfig objects using a key composed of interface, version, group, and timeout. When a service has no provider, the gateway still attempts to create a proxy, leading to repeated subscriptions.
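The caching behavior described above can be modeled in a few lines. This is a simplified sketch, not the gateway's actual code: the class and method names are hypothetical, the key is a plain list of the four fields, and the model assumes that the registry subscription happens before the provider check fails, so a failed creation leaves a subscription behind while nothing is cached.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicInteger;

class ReferenceCacheSketch {
    // Counts registry subscriptions; in this model none of them is ever released.
    final AtomicInteger subscriptions = new AtomicInteger();
    final Map<List<Object>, Object> cache = new ConcurrentHashMap<>();

    // Cache key composed of interface, version, group, and timeout.
    static List<Object> key(String iface, String version, String group, int timeoutMs) {
        return Arrays.asList(iface, version, group, timeoutMs);
    }

    Object getOrCreate(List<Object> key, boolean providerExists) {
        Object cached = cache.get(key);
        if (cached != null) {
            return cached;
        }
        Object proxy = createProxy(providerExists); // throws when no provider exists
        cache.put(key, proxy);                      // reached only on success
        return proxy;
    }

    private Object createProxy(boolean providerExists) {
        subscriptions.incrementAndGet(); // assumed: subscribe first, check afterwards
        if (!providerExists) {
            throw new IllegalStateException("No provider available");
        }
        return new Object();
    }

    public static void main(String[] args) {
        ReferenceCacheSketch gateway = new ReferenceCacheSketch();
        List<Object> key = key("com.example.MissingService", "1.0.0", "default", 3000);
        for (int request = 0; request < 3; request++) {
            try {
                gateway.getOrCreate(key, false);
            } catch (IllegalStateException e) {
                // A real gateway would return an error to the HTTP caller here.
            }
        }
        System.out.println(gateway.subscriptions.get() + " subscriptions, "
                + gateway.cache.size() + " cached references");
    }
}
```

In this model every request for the provider-less service adds a subscription while the cache stays empty, so the cache never gets a chance to stop the repetition.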
By instrumenting ZookeeperRegistry’s zkListeners field via reflection and logging it periodically, the repeated subscriptions were captured.
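The reflection technique can be sketched as follows. FakeRegistry is a hypothetical stand-in so that the example runs without Dubbo or Zookeeper; only the private field name zkListeners is taken from the article, and the field's real type inside Dubbo is not assumed beyond it being a map.

```java
import java.lang.reflect.Field;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical stand-in for the registry; only the field name matches the article.
class FakeRegistry {
    private final Map<String, Object> zkListeners = new ConcurrentHashMap<>();

    void subscribe(String url) {
        zkListeners.put(url, new Object());
    }
}

class ListenerProbe {
    // Reads the private map through reflection and returns how many entries it holds.
    static int listenerCount(Object registry) {
        try {
            Field field = registry.getClass().getDeclaredField("zkListeners");
            field.setAccessible(true);
            return ((Map<?, ?>) field.get(registry)).size();
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("zkListeners field not readable", e);
        }
    }

    public static void main(String[] args) {
        FakeRegistry registry = new FakeRegistry();
        registry.subscribe("consumer://gateway/com.example.MissingService?timestamp=1");
        registry.subscribe("consumer://gateway/com.example.MissingService?timestamp=2");
        System.out.println("zkListeners size = " + listenerCount(registry));
    }
}
```

In a running gateway, a call like listenerCount could be scheduled with a ScheduledExecutorService and its result written to the log, so that a steadily growing count becomes visible over time.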
The offending service had no provider, causing the gateway to generate a new URL each time (different timestamp), which was cached in a list and repeatedly referenced, leading to a quadratic growth of RegistryDirectory objects (1+2+3+…+n).
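The 1+2+3+…+n growth can be checked with a small simulation. It is a model of the behavior described above, not Dubbo code: each failed attempt appends one URL with a fresh timestamp to a list, then every URL in the list is subscribed again and each subscription is assumed to leave one RegistryDirectory behind. The URL format is invented for illustration.

```java
import java.util.ArrayList;
import java.util.List;

class SubscriptionGrowthSketch {
    // Returns how many directory objects exist after the given number of failed attempts.
    static long directoriesAfter(int failedAttempts) {
        List<String> cachedUrls = new ArrayList<>();
        long directories = 0;
        for (int attempt = 1; attempt <= failedAttempts; attempt++) {
            // A new timestamp makes the URL look different on every attempt.
            cachedUrls.add("consumer://gateway/com.example.MissingService?timestamp=" + attempt);
            // Every cached URL is subscribed again, one new object per subscription.
            directories += cachedUrls.size();
        }
        return directories;
    }

    public static void main(String[] args) {
        for (int n : new int[] {10, 50, 120}) {
            System.out.println(n + " attempts -> " + directoriesAfter(n) + " directories");
        }
    }
}
```

The result equals n(n+1)/2, so under this model about 120 failed attempts are already enough to exceed 7,000 objects, which is why even a rarely called provider-less service can exhaust the heap.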
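The fix is straightforward: set the check flag to false when using generic invocation. Proxy creation then no longer fails when no provider exists, so the reference is created and cached once instead of being re-created, and re-subscribed, on every request.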
Summary
This was a Dubbo bug; version 2.7.5 removed the timestamp from subscription URLs, ensuring each URL is subscribed only once (see issue #4587).
When using generic invocation, set reference.check=false to avoid memory leaks; normal XML‑configured calls are unaffected because check=true would cause startup failure without a provider.
Problem difficulty ranking: directly traceable via monitoring, code, or logs < stably reproducible < unstably reproducible < non-reproducible; this case falls into the unstably reproducible category.
Xiao Lou's Tech Notes
