MongoDB Crash Analysis: WT_SESSION Exhaustion Caused by Index Drop During Background Index Build
This case study examines a MongoDB 4.0.14 replica set crash caused by exceeding the WiredTiger session limit. The primary built a background index and then dropped another index on the same collection; the drop blocked the secondary, connections piled up, sessions were exhausted, and the instance failed.
Recently, a friend ran into a MongoDB instance crash. The primary node successfully created a background index on a large collection and then dropped another index. Shortly afterward, the secondary node crashed with WiredTiger errors such as "out of sessions".
Operation Process
The DBA created an index with:
db.c1.createIndex({name: 1}, {background: true})
and then removed an unused index:
db.c1.dropIndex('idx_age')
The primary completed both commands, but the secondary crashed shortly afterward.
Problem Analysis
Log inspection suggested that the connection limit had been reached. The maximum number of connections is determined by net.maxIncomingConnections and the OS ulimit. In a test environment, however, reaching the connection limit only caused new connections to be rejected; the instance did not crash.
SERVER-30462 pointed to a different limit: the WiredTiger session limit might have been exceeded.
WT_SESSION is the session handle through which the MongoDB server performs operations in the WiredTiger storage engine; running out of sessions can cause severe failures.
Further investigation showed that the session limit (session_max) is hard-coded to 20,000 in mongo/wiredtiger_kv_engine.cpp:
std::stringstream ss;
ss << "create,";
ss << "cache_size=" << cacheSizeMB << "M,";
ss << "cache_overflow=(file_max=" << maxCacheOverflowFileSizeMB << "M),";
ss << "session_max=20000,";
...
When the session count exceeds this limit, WiredTiger logs "out of sessions" and the server crashes.
Root Cause
The secondary was still building the replicated background index when the index drop on the same collection, already completed on the primary, reached it. MongoDB documentation warns that dropping an index on a collection while a background index build is being replicated to a secondary blocks all access to the namespace and halts replication until the background build finishes. With the namespace blocked, read requests piled up on the secondary, each one holding resources while it waited. The backlog eventually exhausted the available WT_SESSION handles and the instance crashed.
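To get a feel for how quickly a blocked namespace can use up the session limit, here is a rough back-of-the-envelope model in Python. It assumes that every blocked request pins exactly one session and that nothing completes while the block lasts, which is a simplification. Only the 20,000 limit comes from the source excerpt above; the arrival rate and the number of sessions already in use are hypothetical.

```python
# Toy model: how long can requests stay blocked before sessions run out?
# Only SESSION_MAX comes from the article; all workload numbers are hypothetical.

SESSION_MAX = 20_000  # session_max hard-coded in MongoDB 4.0


def seconds_until_exhaustion(arrival_rate_per_s: float,
                             sessions_in_use: int = 0,
                             session_max: int = SESSION_MAX) -> float:
    """Seconds until blocked requests consume every remaining session.

    Assumes one session per blocked request and no request finishing
    while the namespace is blocked.
    """
    if arrival_rate_per_s <= 0:
        raise ValueError("arrival rate must be positive")
    free = max(session_max - sessions_in_use, 0)
    return free / arrival_rate_per_s


if __name__ == "__main__":
    # Hypothetical workload: 500 reads/s hitting the blocked collection,
    # with 1,000 sessions already in use when the block begins.
    t = seconds_until_exhaustion(500, sessions_in_use=1_000)
    print(f"sessions exhausted after about {t:.0f} s")
```

Under these assumed numbers the limit is reached after about 38 seconds, far shorter than a background index build on a large collection is likely to take.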
Reproduction Steps
In a test environment, the session limit can be forced to a very low value:
mongod -f /etc/mongod.conf --wiredTigerEngineConfigString="session_max=5"
Then lock the instance against writes:
mongo> db.fsyncLock()
Run a Python script that spawns hundreds of concurrent insert operations (shown in the original article). Then release the lock:
mongo> db.fsyncUnlock()
All pending operations flood WiredTiger at once, exceed the session limit, and crash the instance, reproducing the production issue.
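The load-generating script is not reproduced in this text. The following is a minimal sketch of what such a script could look like, not the author's script. It assumes the third-party pymongo driver, a disposable test instance, and placeholder names for the URI, database, and collection; the helper names flood and run_against_mongod are made up for illustration.

```python
# Sketch of a load generator for the reproduction; not the original script.
import threading


def flood(insert_one, n_threads=300):
    """Start n_threads threads that each call insert_one(i) once, then wait.

    Returns the list of exceptions raised by the workers.
    """
    errors = []
    lock = threading.Lock()

    def worker(i):
        try:
            insert_one(i)
        except Exception as exc:  # e.g. a network error once mongod dies
            with lock:
                errors.append(exc)

    threads = [threading.Thread(target=worker, args=(i,)) for i in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return errors


def run_against_mongod(uri="mongodb://localhost:27017", n_threads=300):
    """Flood a disposable test mongod; requires the pymongo driver."""
    from pymongo import MongoClient  # lazy import: flood() itself has no dependencies

    client = MongoClient(uri, maxPoolSize=None)  # None lifts the driver's pool cap
    coll = client["test"]["c1"]  # placeholder database and collection names
    errors = flood(lambda i: coll.insert_one({"n": i}), n_threads)
    print(f"{len(errors)} of {n_threads} inserts failed")
```

Calling run_against_mongod() while the instance is locked leaves every insert waiting; the call returns only after the lock is released from the mongo shell and the inserts either complete or fail.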
Summary & Recommendations
Configure net.maxIncomingConnections to be lower than the WT_SESSION limit.
Adjust cursor timeout settings to avoid large backlogs.
Avoid executing index creation and index drop back‑to‑back, especially when the index is built in the background.
MongoDB 4.2 ignores the background option and uses a reworked index build process that holds the exclusive collection lock only at the start and end of the build. MongoDB 4.0 is no longer supported, so plan an upgrade.
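The first recommendation can be turned into a simple sanity check. The sketch below assumes roughly one WiredTiger session per active client connection and sets aside a hypothetical reserve for internal threads such as replication and index builds; both the function name and the reserve size are illustrative, not values from MongoDB.

```python
# Sketch: does a given net.maxIncomingConnections leave session headroom?
SESSION_MAX = 20_000  # session_max hard-coded in MongoDB 4.0


def connection_limit_is_safe(max_incoming_connections: int,
                             internal_reserve: int = 1_000,
                             session_max: int = SESSION_MAX) -> bool:
    """True if client connections alone cannot use up all sessions.

    internal_reserve is a hypothetical allowance for sessions held by
    internal threads; tune it to the deployment.
    """
    return max_incoming_connections + internal_reserve <= session_max


if __name__ == "__main__":
    for limit in (65_536, 15_000):  # 65,536 is the documented default
        print(limit, connection_limit_is_safe(limit))
```

With these assumptions the default of 65,536 fails the check, while a limit such as 15,000 passes.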
References:
WiredTiger Session documentation.
WiredTiger API.
Source code: mongo/wiredtiger_kv_engine.cpp
MongoDB dropIndexes command documentation.
Aikesheng Open Source Community
The Aikesheng Open Source Community provides stable, enterprise‑grade MySQL open‑source tools and services, releases a premium open‑source component each year (1024), and continuously operates and maintains them.