Analyzing and Optimizing ZooKeeper WatchManager Memory Usage
By replacing ZooKeeper’s default WatchManager hash‑set tables with concurrent maps and bitmap‑based structures, the authors cut watch‑related heap usage from several gigabytes to under 12 MB, lowered lock contention, and achieved 5‑6× latency gains, delivering up to 91 % memory reduction and ten‑fold SLA improvement in production clusters.
Background: ZooKeeper is a distributed coordination service used widely, for example by Kafka, Flink, and task-scheduling systems. The team observed steadily growing JVM heap usage in its self‑built clusters, eventually leading to OOM errors.
Exploration: A failure case was captured in which two nodes were close to OOM. Heap dumps showed that most of the memory was held by childWatches and dataWatches, both managed by WatchManager.
WatchManager stores its mappings in two hash tables: watchTable (ZNode path → Watcher set) and watch2Paths (Watcher → ZNode path set). In large‑scale deployments (e.g., 200 k ZNodes, 5 k Watchers) the relationship count can reach 10⁸, consuming several gigabytes of heap.
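A rough back-of-envelope check shows why 10⁸ relationships land in the gigabyte range. The sketch below is only an estimate: the 32-byte per-entry cost is an assumption (roughly one HashMap.Node on a 64-bit JVM with compressed object pointers), and the class and method names are hypothetical. With these inputs it works out to about 6.4 GB for the two tables combined, before counting the backing arrays or the path strings.

```java
// Back-of-envelope heap estimate for the two HashSet-based watch tables.
// BYTES_PER_ENTRY is an assumption (about one HashMap.Node on a 64-bit JVM
// with compressed oops), not a figure taken from the article.
class WatchHeapEstimate {
    static final long BYTES_PER_ENTRY = 32L;

    /** Heap bytes for `relationships` (path, watcher) pairs kept in `tables` hash tables. */
    static long estimateBytes(long relationships, int tables) {
        return relationships * tables * BYTES_PER_ENTRY;
    }

    public static void main(String[] args) {
        long relationships = 100_000_000L;            // 10^8, the order of magnitude in the text
        long bytes = estimateBytes(relationships, 2); // forward table plus reverse table
        System.out.println(bytes + " bytes");
    }
}
```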
Unexpected finding: The default implementation keeps its data in plain (non‑thread‑safe) HashSet instances and guards them with coarse‑grained synchronized methods. The per‑entry HashSet overhead inflates memory usage, and the single lock causes high contention.
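To make the shape of that design concrete, here is a simplified illustration of a dual-table watch manager with synchronized methods. It is a sketch written for this summary, not the ZooKeeper source: the class name SimpleWatchManager, the generic watcher type W, and the size() helper are all hypothetical.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Simplified illustration of the dual-table layout; not the ZooKeeper source.
class SimpleWatchManager<W> {
    private final Map<String, Set<W>> watchTable = new HashMap<>();  // path -> watchers
    private final Map<W, Set<String>> watch2Paths = new HashMap<>(); // watcher -> paths

    // Every method takes the same monitor, so all callers serialize here.
    synchronized void addWatch(String path, W watcher) {
        watchTable.computeIfAbsent(path, p -> new HashSet<>()).add(watcher);
        watch2Paths.computeIfAbsent(watcher, w -> new HashSet<>()).add(path);
    }

    // One-shot trigger: remove and return the watchers of a path,
    // and clean up the reverse table as well.
    synchronized Set<W> triggerWatch(String path) {
        Set<W> watchers = watchTable.remove(path);
        if (watchers == null) {
            return new HashSet<>();
        }
        for (W w : watchers) {
            Set<String> paths = watch2Paths.get(w);
            if (paths != null) {
                paths.remove(path);
                if (paths.isEmpty()) {
                    watch2Paths.remove(w);
                }
            }
        }
        return watchers;
    }

    // Number of (path, watcher) relationships currently stored.
    synchronized int size() {
        int n = 0;
        for (Set<W> s : watchTable.values()) {
            n += s.size();
        }
        return n;
    }
}
```

Each relationship is stored twice (once per table), and each copy is a separate hash entry, which is where both the memory overhead and the single-lock bottleneck come from.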
Optimization exploration:
Lock optimization – replace synchronized blocks with ConcurrentHashMap and ReadWriteLock.
Storage optimization – replace HashSet‑based tables with bitmap‑based structures (BitHashSet, BitMap).
Logic optimization – leverage concurrent data structures to achieve O(1) add/remove/trigger operations.
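The three ideas above can be sketched together as follows. This is a minimal illustration under stated assumptions, not the article's or ZooKeeper's implementation: it uses java.util.BitSet in place of BitHashSet, and the class name BitmapWatchManager, the idOf helper, and the generic watcher type W are hypothetical. Each watcher gets a small integer id once, and every path then stores only one bit per watcher instead of a full hash entry.

```java
import java.util.BitSet;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.locks.ReentrantReadWriteLock;

// Hypothetical sketch: bitmap-backed watch storage with fine-grained locking.
class BitmapWatchManager<W> {
    // Watcher <-> bit id mapping, guarded by a read/write lock.
    private final ReentrantReadWriteLock idLock = new ReentrantReadWriteLock();
    private final Map<W, Integer> watcherToId = new HashMap<>();
    private final Map<Integer, W> idToWatcher = new HashMap<>();
    private int nextId = 0;

    // path -> bitmap of watcher ids; the map itself handles per-key locking.
    private final ConcurrentHashMap<String, BitSet> pathWatches = new ConcurrentHashMap<>();

    // Return the watcher's bit id, assigning a new one on first sight.
    private int idOf(W watcher) {
        idLock.readLock().lock();
        try {
            Integer id = watcherToId.get(watcher);
            if (id != null) {
                return id;
            }
        } finally {
            idLock.readLock().unlock();
        }
        idLock.writeLock().lock();
        try {
            Integer id = watcherToId.get(watcher); // re-check under the write lock
            if (id == null) {
                id = nextId++;
                watcherToId.put(watcher, id);
                idToWatcher.put(id, watcher);
            }
            return id;
        } finally {
            idLock.writeLock().unlock();
        }
    }

    void addWatch(String path, W watcher) {
        final int id = idOf(watcher);
        // compute() runs atomically per key, so the BitSet is never mutated concurrently.
        pathWatches.compute(path, (p, bits) -> {
            BitSet b = (bits == null) ? new BitSet() : bits;
            b.set(id);
            return b;
        });
    }

    // One-shot trigger: detach the path's bitmap and resolve ids back to watchers.
    Set<W> triggerWatch(String path) {
        BitSet bits = pathWatches.remove(path);
        Set<W> result = new HashSet<>();
        if (bits == null) {
            return result;
        }
        idLock.readLock().lock();
        try {
            for (int i = bits.nextSetBit(0); i >= 0; i = bits.nextSetBit(i + 1)) {
                W w = idToWatcher.get(i);
                if (w != null) {
                    result.add(w);
                }
            }
        } finally {
            idLock.readLock().unlock();
        }
        return result;
    }
}
```

The sketch omits watcher removal and id reuse, which a production version would need so that bit ids of closed sessions do not accumulate.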
Code changes (excerpt):
WatchManager.java:
private final Map<String, Set<Watcher>> watchTable = new HashMap<>();
private final Map<Watcher, Set<String>> watch2Paths = new HashMap<>();
WatchManagerOptimized.java:
private final ConcurrentHashMap<String, BitHashSet> pathWatches = new ConcurrentHashMap<>();
private final BitMap<Watcher> watcherBitIdMap = new BitMap<>();
Performance benchmarks (JMH) show the optimized version reduces memory from ~5.9 GB to ~11.7 MB and improves operation latency by 5‑6×.
Capacity tests on a 3‑node ZooKeeper 3.6.4 cluster (32 CPU cores, 60 GB memory) compare the default and optimized WatchManager under two scenarios: 200 k short‑path ZNodes and 200 k long‑path ZNodes. In both scenarios the optimized version lowers watch memory usage, election time, fsync time, and overall latency.
Gradual (gray‑release) upgrades on three production clusters confirm the memory reduction (up to 91 %) and the latency improvements (election time ↓ 60‑64 %, max latency ↓ 53‑95 %).
Conclusion: WatchManagerOptimized dramatically cuts memory footprint and improves stability, leading to a ten‑fold SLA improvement for ZooKeeper deployments.
Recommendations: put dataDir and dataLogDir on separate disks, choose an appropriate JDK and garbage collector (G1 or ZGC) with suitable settings, increase the snapshot count (snapCount), and enable the optimized WatchManager.
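As an illustration, these recommendations might translate into a configuration along the following lines. The directory paths and the snapCount value are placeholders chosen for this sketch, not values from the article; the key names and the watchManagerName system property are standard ZooKeeper settings (the property was introduced in 3.6.0).

```
# zoo.cfg (paths and values below are placeholders)
dataDir=/data/zookeeper/snapshot
dataLogDir=/datalog/zookeeper/txnlog
snapCount=500000

# JVM flag (Java system property only; set it wherever server JVM options are defined)
-Dzookeeper.watchManagerName=org.apache.zookeeper.server.watch.WatchManagerOptimized
```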
DeWu Technology
A platform for sharing and discussing tech knowledge, guiding you toward the cloud of technology.
