
Why Adding Background Indexes Can Crash a Latency‑Sensitive MongoDB Cluster—and How to Prevent It

A latency-sensitive MongoDB cluster suffered severe jitter and connection exhaustion after several background indexes were added one after another. This write-up shows how the index builds overloaded the secondary nodes and triggered alerts, and how the problem can be mitigated with a safer index-addition strategy and the noIndexBuildRetry option.

dbaplus Community

Business Background

The company stores core revenue-critical data in a MongoDB cluster. Any latency spike can cause client-side timeouts, which directly impacts revenue. The workload is read-heavy, with reads and writes separated, peak traffic of 80-100k operations per second, and a data size of about 1 billion documents.

Cluster Architecture

The deployment uses a single sharded cluster with one shard consisting of a replica set of five nodes (one primary and four secondaries). This design tolerates two node failures and improves read throughput by directing reads to secondaries.

Cluster topology diagram
Traffic monitoring curve

Problem Discovery

After three background indexes were added sequentially via the management platform, the monitoring system raised latency alerts (>20 ms) on all secondary nodes while the primary remained normal. Connection-count alerts followed, and eventually the maximum number of connections was reached, which made even mongo shell access impossible.

Latency alert screenshot
Connection usage curve

Investigation Process

An attempt to connect via the mongo shell failed with a network error:

MongoDB shell version v3.6.13
connecting to: mongodb://x.x.x.x:20001/test?gssapiServiceName=mongodb
2021-04-29T11:09:15.049+0800 E QUERY    [thread1] Error: network error while attempting to run command 'isMaster' on host 'x.x.x.x:20001' :
connect@src/mongo/shell/mongo.js:263:13
@(connect):1:6
exception: connect failed

Since the shell could not connect, the team inspected the underlying server logs, which showed that all connections were exhausted.

Connection exhaustion log

System monitoring revealed extremely high disk I/O on the secondary nodes.

Disk I/O spike

Further analysis of mongod logs confirmed that index builds were consuming the I/O.

Mongod log snippet

Root Cause Confirmation

Adding indexes in background caused each secondary to read the collection data and build the index, generating heavy disk I/O. Because three indexes were being built concurrently on the secondaries, the I/O load spiked, leading to latency spikes and connection‑count exhaustion.

Resolution Steps

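The team first tried to kill the index-build operations with killOp, but no new connection could be opened because the connection limit had already been reached.

They then killed the mongod process and restarted it; however, the interrupted index builds resumed automatically on startup.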

MongoDB provides the --noIndexBuildRetry flag, which stops mongod from rebuilding incomplete indexes on the next startup after it was shut down or stopped in the middle of an index build.

mongod -f /home/service/mongodb/conf/mongod_20001.conf --noIndexBuildRetry

Using this flag allowed the secondary to start without re‑executing the interrupted index builds, and the service recovered quickly.
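The killOp attempt described above depends on first finding the operation IDs of the running index builds. The following is a minimal sketch, assuming the `inprog` array from `db.currentOp()` has already been loaded as a list of dicts and that index builds report a `msg` beginning with "Index Build". The function name and the sample entries are hypothetical and only illustrate the filtering step.

```python
import re

# Assumption: index-build operations carry a progress message that
# starts with "Index Build" in currentOp output.
INDEX_BUILD_MSG = re.compile(r"^Index Build")


def index_build_opids(inprog):
    """Return the opid of every entry whose msg looks like an index build."""
    return [
        op["opid"]
        for op in inprog
        if INDEX_BUILD_MSG.match(op.get("msg", ""))
    ]


# Hypothetical sample shaped like the inprog array of db.currentOp()
sample = [
    {"opid": 101, "op": "query", "ns": "app.orders"},
    {"opid": 202, "op": "command", "ns": "app.orders",
     "msg": "Index Build (background)"},
    {"opid": 303, "op": "none", "ns": "app.users",
     "msg": "Index Build (background)"},
]

print(index_build_opids(sample))
```

Each returned ID could then be passed to `db.killOp()` from a session that still holds a connection, which is exactly what was unavailable during this incident.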

createIndex Core Workflow

When a client issues db.collection.createIndex(..., {background:true}), the primary builds the index, returns OK to the client, writes an oplog entry, and the secondaries replay the oplog to build the index locally.

Primary queries the collection and builds the index.

After completion, the primary sends an OK response.

An oplog entry for the index build is created.

Secondaries fetch the oplog and replay the index build.

Why the Issue Appeared

The primary returned OK as soon as its own build finished, so the next index was submitted while the secondaries were still building the previous one. By the time the third index finished on the primary, all three indexes were being built simultaneously on the secondaries, which overwhelmed disk I/O and triggered the latency alerts.
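The overlap can be illustrated with a toy timeline model. This is a sketch under stated assumptions, not a measurement: the durations are invented, and the model simply assumes that a secondary starts each build when the primary finishes it and does not wait for its own earlier builds.

```python
def secondary_build_windows(primary_secs, secondary_secs):
    """Return the (start, end) window of each index build on a secondary.

    Each createIndex is issued as soon as the previous one returns OK on
    the primary. The secondary starts a build when the primary finishes
    it (oplog replay), regardless of builds it is already running.
    """
    windows = []
    clock = 0
    for primary_duration, secondary_duration in zip(primary_secs, secondary_secs):
        clock += primary_duration  # primary finishes and writes the oplog entry
        windows.append((clock, clock + secondary_duration))
    return windows


def peak_concurrency(windows):
    """Return the largest number of windows that overlap at any instant."""
    events = []
    for start, end in windows:
        events.append((start, 1))
        events.append((end, -1))
    # At equal timestamps, process ends (-1) before starts (+1) so that
    # windows which merely touch are not counted as overlapping.
    events.sort(key=lambda e: (e[0], e[1]))
    running = peak = 0
    for _, delta in events:
        running += delta
        peak = max(peak, running)
    return peak


# Invented durations: 10 minutes per build on the primary,
# 40 minutes on a secondary that is also serving reads.
windows = secondary_build_windows([600, 600, 600], [2400, 2400, 2400])
print(windows)
print(peak_concurrency(windows))
```

With these made-up numbers, all three builds overlap on the secondary. If the secondary builds were shorter than the gap between primary completions, the peak would stay at one.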

Mitigation Strategies for Latency‑Sensitive Workloads

Sequential Index Completion: Ensure that an index is fully built on all secondaries before starting the next one. Newer MongoDB versions serialize background index builds on secondaries, reducing contention.

Isolated Index Build: Remove a secondary from the replica set, start it as a standalone node, build the index without the background flag (faster), then re-add it to the replica set.

Both methods allow index addition without noticeable impact on the production workload.
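The first strategy can be sketched as a small gate that refuses to start the next index until every member reports no running build. This is a minimal sketch with hypothetical names: `builds_in_progress` and `create_index` are caller-supplied stand-ins (for example, wrappers around a currentOp check and a createIndex call), not part of any MongoDB driver.

```python
import time


def wait_until_idle(members, builds_in_progress, poll_secs=5.0,
                    timeout_secs=3600.0, sleep=time.sleep):
    """Poll until no member reports a running index build.

    builds_in_progress(member) -> int is a hypothetical probe supplied
    by the caller. Returns True once idle, False if the timeout elapses.
    """
    waited = 0.0
    while True:
        busy = [m for m in members if builds_in_progress(m) > 0]
        if not busy:
            return True
        if waited >= timeout_secs:
            return False
        sleep(poll_secs)
        waited += poll_secs


def add_indexes_one_at_a_time(index_specs, members, create_index,
                              builds_in_progress, **wait_kwargs):
    """Create each index only after all members have finished earlier builds."""
    created = []
    for spec in index_specs:
        if not wait_until_idle(members, builds_in_progress, **wait_kwargs):
            break  # stop instead of piling another build onto busy nodes
        create_index(spec)
        created.append(spec)
    return created


# Demo with stand-ins: the fake probe reports a running build for the
# first four polls, then reports idle.
polls = {"count": 0}


def fake_probe(member):
    polls["count"] += 1
    return 1 if polls["count"] <= 4 else 0


issued = []
created = add_indexes_one_at_a_time(
    [{"user_id": 1}, {"created_at": -1}],
    ["secondary-1", "secondary-2"],
    issued.append,
    fake_probe,
    poll_secs=0,
    sleep=lambda secs: None,
)
print(created)
```

A production version would also need to confirm that the new index actually exists on each secondary before moving on, since a secondary that has not yet replayed the oplog entry would look idle to this probe.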
