Analysis of WiredTiger Eviction Cache and Write Hang Issues in MongoDB

This article examines why MongoDB's WiredTiger storage engine experiences intermittent write hangs, detailing the eviction cache design, hazard pointer concurrency, eviction thread model, checkpoint interactions, performance bottlenecks, and practical mitigation strategies.

High Availability Architecture

The WiredTiger storage engine, introduced in MongoDB 3.0, can suffer intermittent write hangs under high-throughput workloads, with stalls sometimes lasting tens of seconds. The root cause lies in the design of its eviction cache, which combines a segmented LRU scan with hazard-pointer-based eviction and has no hierarchical LRU tiers.

WiredTiger's eviction cache works as a page-level LRU buffer: leader threads periodically scan the B-trees and queue evictable pages, and follower threads perform the eviction. An eviction pass runs every 100 ms, or whenever memory usage crosses the configured thresholds (80% of the cache size overall, or 75% for dirty data).
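
The trigger rule and the leader/follower split can be sketched as follows. This is a simplified model, not WiredTiger code: the constant names, the `Page` type, and the use of a per-page `read_gen` counter as the LRU position are assumptions for illustration, and the 75% dirty threshold is assumed to be measured against the total cache size.

```python
import heapq
from dataclasses import dataclass

# Thresholds from the text; the constant names are hypothetical.
EVICT_INTERVAL_MS = 100
USAGE_THRESHOLD = 0.80   # fraction of the cache in use
DIRTY_THRESHOLD = 0.75   # assumed: dirty bytes as a fraction of the cache

@dataclass
class Page:
    page_id: int
    read_gen: int   # higher means read more recently (LRU position)
    size: int

def eviction_needed(used, dirty, cache_size, ms_since_last_pass):
    """Start a pass on the 100 ms timer or when either threshold is crossed."""
    return (ms_since_last_pass >= EVICT_INTERVAL_MS
            or used > USAGE_THRESHOLD * cache_size
            or dirty > DIRTY_THRESHOLD * cache_size)

def leader_scan(trees, max_candidates):
    """Leader: walk every B-tree and queue the least recently read pages."""
    pages = [page for tree in trees for page in tree]
    return heapq.nsmallest(max_candidates, pages, key=lambda p: p.read_gen)

def follower_evict(queue, cache):
    """Follower: drain the queue, removing each page from the cache."""
    freed = 0
    for page in queue:
        if cache.pop(page.page_id, None) is not None:
            freed += page.size
    return freed

# Usage: two B-trees, cache 90% full, so a pass starts before the timer fires.
trees = [[Page(1, read_gen=10, size=4096), Page(2, read_gen=3, size=4096)],
         [Page(3, read_gen=7, size=4096)]]
cache = {page.page_id: page for tree in trees for page in tree}
freed = 0
if eviction_needed(used=900, dirty=100, cache_size=1000, ms_since_last_pass=20):
    queue = leader_scan(trees, max_candidates=2)
    freed = follower_evict(queue, cache)
```

Here the leader picks the two coldest pages (ids 2 and 3) and a follower frees them, leaving only the recently read page 1 cached.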

Hazard pointers provide lock-free concurrency: a thread marks each page it is reading with a hazard pointer, and eviction proceeds only when no hazard pointer references the page; otherwise the attempt is abandoned and the page's eviction score is increased. Combined with the leader-follower thread model, this mechanism can cause many context switches under heavy write load.
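
A minimal sketch of the hazard-pointer handshake described above, assuming one hazard slot per thread. The class and function names are hypothetical, and the Python runs sequentially: a real C implementation depends on atomic stores and memory barriers so that the reader's "publish, then re-check" and the evictor's "mark, then scan slots" cannot both succeed on the same page.

```python
from dataclasses import dataclass

@dataclass
class CachedPage:
    page_id: int
    evict_score: int = 0
    evicting: bool = False

class HazardTable:
    """One hazard-pointer slot per thread (hypothetical, single-slot model)."""

    def __init__(self, n_threads):
        self.slots = [None] * n_threads

    def acquire(self, tid, page):
        # Reader: publish the hazard pointer first, then re-check the page.
        self.slots[tid] = page.page_id
        if page.evicting:            # evictor got there first: back off
            self.slots[tid] = None
            return False
        return True

    def release(self, tid):
        self.slots[tid] = None

def try_evict(page, hazards, cache):
    page.evicting = True             # stop new readers from acquiring it
    if page.page_id in hazards.slots:
        page.evicting = False        # someone is reading: give up for now
        page.evict_score += 1        # as described: bump the eviction score
        return False
    del cache[page.page_id]
    return True

# Usage: thread 0 reads page 1, so the first eviction attempt fails.
cache = {1: CachedPage(1)}
hazards = HazardTable(n_threads=2)
page = cache[1]
reading = hazards.acquire(0, page)
first = try_evict(page, hazards, cache)
hazards.release(0)
second = try_evict(page, hazards, cache)
```

Every failed attempt is wasted work for an eviction thread, which is one way heavy concurrent reading and writing turns into extra scheduling and context switches.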

Checkpointing adds further pressure: a checkpoint writes all dirty pages to the OS page cache and then flushes them to disk, competing with eviction for I/O bandwidth. While a checkpoint holds exclusive access to a B-tree, eviction of that tree's pages is blocked, which can exhaust the cache and stall writes.
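
The stall mechanism can be illustrated with a toy tick-based model. It is an assumption-laden sketch, not a measurement: one B-tree, fixed per-tick write and eviction amounts in arbitrary units, and a checkpoint window during which eviction frees nothing. The function name `stalled_ticks` is hypothetical.

```python
def stalled_ticks(ticks, cache_cap, write_per_tick, evict_per_tick,
                  checkpoint_ticks=range(0)):
    """Count ticks in which an application write cannot fit in the cache.

    Hypothetical model: a single B-tree; during checkpoint_ticks the
    checkpoint holds the tree exclusively, so eviction frees nothing."""
    used = 0
    stalled = 0
    for t in range(ticks):
        if t not in checkpoint_ticks:
            used = max(0, used - evict_per_tick)   # eviction keeps up
        if used + write_per_tick > cache_cap:
            stalled += 1                           # no room: the write waits
        else:
            used += write_per_tick
    return stalled

baseline = stalled_ticks(30, cache_cap=100, write_per_tick=10, evict_per_tick=10)
with_checkpoint = stalled_ticks(30, cache_cap=100, write_per_tick=10,
                                evict_per_tick=10, checkpoint_ticks=range(10, 25))
```

With these made-up numbers the writer never waits without a checkpoint (`baseline == 0`). During the 15-tick checkpoint window the cache fills after nine ticks, and writes then stall for the remaining 6 ticks until the tree is released.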

Performance measurements show that memory bandwidth far exceeds disk throughput, and the gap is wider on SATA disks than on SSDs, so large checkpoint flushes saturate I/O. The article outlines several mitigations: upgrading to WiredTiger 2.8 (which introduces tiered eviction queues and improved checkpointing), enabling direct I/O, reducing the cache size, moving the redo log (the journal) to a dedicated disk, and using SSDs.
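
A back-of-the-envelope calculation shows why cache size and disk type matter. All figures below are hypothetical: the disk throughputs are round illustrative numbers, not benchmarks, and the worst case assumes dirty data sits at the 75% threshold of the whole cache when a checkpoint starts.

```python
def flush_seconds(cache_gb, dirty_fraction, disk_mb_per_s):
    """Seconds to write one checkpoint's dirty data at a given disk throughput."""
    dirty_mb = cache_gb * 1024 * dirty_fraction
    return dirty_mb / disk_mb_per_s

# Hypothetical sequential-write throughputs in MB/s; measure real hardware.
DISKS = {"sata_hdd": 100, "ssd": 500}
DIRTY_FRACTION = 0.75   # worst case: dirty data at the 75% threshold

for cache_gb in (32, 8):
    for name, mb_per_s in DISKS.items():
        secs = flush_seconds(cache_gb, DIRTY_FRACTION, mb_per_s)
        print(f"cache={cache_gb} GB, {name}: {secs:.1f} s")
```

Under these assumed figures, shrinking the cache from 32 GB to 8 GB cuts the worst-case flush on the slower disk from about 246 s to about 61 s, and the faster disk cuts each case by a further factor of five. That is the arithmetic behind reducing the cache size and using SSDs.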

While these measures can alleviate write hangs, they may not eliminate them entirely; ongoing development in WiredTiger continues to address eviction and checkpoint inefficiencies.

Tags: Performance, MongoDB, WiredTiger, database engine, eviction cache, hazard pointer
Written by High Availability Architecture

Official account for High Availability Architecture.
