
Analyzing and Optimizing ZooKeeper WatchManager Memory Usage

By replacing ZooKeeper's default WatchManager hash tables with concurrent maps and bitmap-based structures, the authors cut watch-related heap usage from several gigabytes to under 12 MB and reduced lock contention, yielding 5-6× lower operation latency, up to 91% less watch memory in production clusters, and a roughly ten-fold SLA improvement.

DeWu Technology

Background: ZooKeeper is a distributed coordination service widely used as the coordination layer for systems such as Kafka, Flink, and task-scheduling platforms. The team observed steadily increasing JVM heap usage leading to OOM in their self-built clusters.

Exploration: A failure case was captured in which two nodes were close to OOM. Heap dumps showed that most of the memory was held by the childWatches and dataWatches tables managed by WatchManager.

WatchManager stores the mappings in two hash tables: watchTable (ZNode path → set of Watchers) and watch2Paths (Watcher → set of ZNode paths). In large-scale deployments (e.g., 200 k ZNodes and 5 k watchers) the number of path-watcher relationships can reach 10⁸, consuming several gigabytes of heap.
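To make the double bookkeeping concrete, here is a minimal simplified model of the two tables (watchers reduced to string ids; this is an illustration, not ZooKeeper's actual code). Every registered watch adds an entry to both maps, so heap usage grows with the path × watcher relationship count rather than with either dimension alone:

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Simplified model of the default WatchManager's dual hash tables:
// every (path, watcher) relationship is stored twice, once per table.
class DualTableModel {
    private final Map<String, Set<String>> watchTable = new HashMap<>();   // ZNode path -> watcher ids
    private final Map<String, Set<String>> watch2Paths = new HashMap<>();  // watcher id -> ZNode paths

    public void addWatch(String path, String watcher) {
        watchTable.computeIfAbsent(path, p -> new HashSet<>()).add(watcher);
        watch2Paths.computeIfAbsent(watcher, w -> new HashSet<>()).add(path);
    }

    // Total number of (path, watcher) relationships held in memory.
    public long relationshipCount() {
        return watchTable.values().stream().mapToLong(Set::size).sum();
    }

    public static void main(String[] args) {
        DualTableModel m = new DualTableModel();
        // 1,000 ZNodes each watched by 100 clients -> 100,000 relationships,
        // and twice that many set entries across the two tables.
        for (int z = 0; z < 1000; z++) {
            for (int w = 0; w < 100; w++) {
                m.addWatch("/node-" + z, "watcher-" + w);
            }
        }
        System.out.println(m.relationshipCount()); // prints 100000
    }
}
```

At the scales cited in the article, each relationship costs two HashSet entries plus object references, which is how the total climbs into the gigabytes.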

Unexpected finding: the default implementation stores these relationships in plain HashMap/HashSet tables guarded by coarse-grained synchronized methods, which causes heavy lock contention on top of the per-entry memory overhead.

Optimization exploration:

Lock optimization – replace synchronized blocks with ConcurrentHashMap and ReadWriteLock.

Storage optimization – replace HashSet‑based tables with bitmap‑based structures (BitHashSet, BitMap).

Logic optimization – leverage concurrent data structures to achieve O(1) add/remove/trigger operations.
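The storage idea behind the second and third points can be sketched as follows. This is an illustrative reconstruction, not the real BitHashSet/BitMap source: each watcher is assigned a small integer bit id once, and a path's watcher set becomes a java.util.BitSet, so membership costs one bit instead of a HashSet entry holding an object reference:

```java
import java.util.BitSet;
import java.util.HashMap;
import java.util.Map;

// Hedged sketch of the bitmap technique (names are illustrative, not
// ZooKeeper's actual BitHashSet/BitMap classes). Single-threaded for
// brevity; the real optimized manager layers concurrency on top.
class BitmapWatchSet {
    private final Map<String, Integer> watcherToBit = new HashMap<>(); // watcher id -> bit id
    private int nextBit = 0;
    private final BitSet watchers = new BitSet(); // one bit per registered watcher

    // Amortized O(1): resolve (or assign) the bit id, then set a single bit.
    public void add(String watcherId) {
        int bit = watcherToBit.computeIfAbsent(watcherId, k -> nextBit++);
        watchers.set(bit);
    }

    public boolean contains(String watcherId) {
        Integer bit = watcherToBit.get(watcherId);
        return bit != null && watchers.get(bit);
    }

    public int size() {
        return watchers.cardinality();
    }
}
```

With 5,000 watchers, one such BitSet occupies on the order of 625 bytes, versus kilobytes of node objects and references for an equivalent HashSet, which is the core of the memory saving.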

Code changes (excerpt):

WatchManager.java:

```java
private final Map<String, Set<Watcher>> watchTable = new HashMap<>();
private final Map<Watcher, Set<String>> watch2Paths = new HashMap<>();
```

WatchManagerOptimized.java:

```java
private final ConcurrentHashMap<String, BitHashSet> pathWatches = new ConcurrentHashMap<>();
private final BitMap<Watcher> watcherBitIdMap = new BitMap<>();
```
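As a rough illustration of how triggering stays cheap with this layout (a hypothetical helper, not the actual WatchManagerOptimized code), firing the watches for a changed path amounts to walking the set bits for that path and resolving each bit id back to its watcher:

```java
import java.util.ArrayList;
import java.util.BitSet;
import java.util.List;

// Illustrative trigger path: when a ZNode changes, walk the bits set for
// that path and map each bit id back to a watcher via an id -> watcher
// lookup (here a plain array standing in for the BitMap).
class TriggerSketch {
    public static List<String> trigger(BitSet pathBits, String[] bitToWatcher) {
        List<String> toNotify = new ArrayList<>();
        // nextSetBit skips over empty words, so cost scales with the number
        // of watchers actually set, not with the total id space.
        for (int bit = pathBits.nextSetBit(0); bit >= 0; bit = pathBits.nextSetBit(bit + 1)) {
            toNotify.add(bitToWatcher[bit]);
        }
        return toNotify;
    }

    public static void main(String[] args) {
        BitSet bits = new BitSet();
        bits.set(0);
        bits.set(3);
        String[] watchers = {"conn-A", "conn-B", "conn-C", "conn-D"};
        System.out.println(trigger(bits, watchers)); // prints [conn-A, conn-D]
    }
}
```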

Performance benchmarks (JMH) show the optimized version reduces memory from ~5.9 GB to ~11.7 MB and improves operation latency by 5‑6×.

Capacity tests on a 3-node ZooKeeper 3.6.4 cluster (32 cores, 60 GB RAM per node) compare the default and optimized WatchManager under two scenarios: 200 k short-path ZNodes and 200 k long-path ZNodes. The optimized version consistently lowers watch memory usage, leader-election time, fsync time, and overall latency.

Gradual (gray-scale) rollouts on three production clusters confirmed the memory reduction (up to 91%) and latency improvements (election time down 60-64%, max latency down 53-95%).

Conclusion: WatchManagerOptimized dramatically cuts memory footprint and improves stability, leading to a ten‑fold SLA improvement for ZooKeeper deployments.

Recommendations: use separate disks for dataDir and dataLogDir, choose appropriate JDK and G1/ZGC settings, increase snapCount, and enable the optimized WatchManager.
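For the last point, ZooKeeper 3.6+ selects the watch manager implementation through the zookeeper.watchManagerName system property. One typical way to set it (the exact file and variable depend on how your deployment passes JVM flags) is:

```shell
# In conf/java.env (sourced by zkServer.sh), ask ZooKeeper to use the
# bitmap-based watch manager instead of the default implementation:
export SERVER_JVMFLAGS="$SERVER_JVMFLAGS \
  -Dzookeeper.watchManagerName=org.apache.zookeeper.server.watch.WatchManagerOptimized"
```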

Tags: Java, Memory Optimization, Performance Testing, WatchManager, ZooKeeper