
MongoDB High‑Throughput Cluster Optimization: Software, Configuration, and Storage Engine Tuning

This article details how a high‑traffic MongoDB sharded cluster exceeding one million TPS was optimized through software‑level tweaks, configuration changes, WiredTiger storage‑engine tuning, and hardware upgrades, resulting in latency reductions from hundreds of milliseconds to a few milliseconds and significantly improved stability.

Architecture Digest

1. Background

The online cluster peaks above 1 million writes per second (TPS), with average latency over 100 ms and severe jitter as traffic grows. The cluster uses MongoDB's native sharding with balanced data distribution; per-node traffic monitoring is shown in the following images.

The cluster reaches over 1.2 M TPS (excluding delete traffic). Including primary‑node delete operations, total TPS exceeds 1.5 M.

2. Software Optimizations

Without adding hardware, the following software‑level optimizations achieved several‑fold performance gains:

Business‑level optimization

MongoDB configuration optimization

Storage‑engine optimization

2.1 Business‑level Optimization

The cluster stores nearly 10 billion documents, each retained for three days. Expired documents generate a massive delete load during peak hours. The fix shifts expiration-driven deletes to nighttime by storing a nighttime timestamp in an expireAt field and building a TTL index on it:

db.collection.createIndex({ "expireAt": 1 }, { expireAfterSeconds: 0 })

The index deletes documents exactly at the specified expireAt timestamp, e.g.:

db.collection.insert({
  // document will be deleted at July 22, 2019 01:00:00
  "expireAt": new Date('July 22, 2019 01:00:00'),
  "logEvent": 2,
  "logMessage": "Success!"
})

Scheduling expiration at night reduces peak‑hour load, lowering average latency and jitter.
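To make this concrete, here is a minimal sketch of how an application could compute a nighttime expireAt value; the helper name `nightlyExpireAt` is illustrative (not a MongoDB API), and it assumes the desired deletion window is 01:00 local time:

```javascript
// Compute an expireAt timestamp that always falls in the low-traffic
// nighttime window, so the TTL monitor deletes the document at 01:00
// rather than during peak hours.
function nightlyExpireAt(now, retentionDays) {
  // Start from "now + retention period"...
  const expire = new Date(now.getTime() + retentionDays * 24 * 3600 * 1000);
  // ...then push the deletion time to 01:00 of the following day.
  expire.setDate(expire.getDate() + 1);
  expire.setHours(1, 0, 0, 0);
  return expire;
}

// Hypothetical insert: the document logically expires after three days,
// but is physically deleted at 01:00 at night.
// db.collection.insert({
//   "expireAt": nightlyExpireAt(new Date(), 3),
//   "logEvent": 2,
//   "logMessage": "Success!"
// })
```

The trade-off is that documents live slightly past their logical three-day retention (until the next 01:00 window), which is acceptable here because the goal is to move delete I/O off the peak, not to enforce exact retention.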

Delete-expire tip 1: the meaning of expireAfterSeconds

With expireAfterSeconds: 0, the indexed expireAt field holds the absolute expiration time, e.g., 2019-12-22 01:00:
db.log_events.createIndex({ "expireAt": 1 }, { expireAfterSeconds: 0 })
db.log_events.insert({ "expireAt": new Date('Dec 22, 2019 01:00:00'), "logEvent": 2, "logMessage": "Success!" })
With a nonzero expireAfterSeconds, documents are deleted that many seconds after the indexed field's value, e.g., 60 s after createdAt:
db.log_events.createIndex({ "createdAt": 1 }, { expireAfterSeconds: 60 })
db.log_events.insert({ "createdAt": new Date(), "logEvent": 2, "logMessage": "Success!" })
A related tip: why does mongostat show delete traffic only on secondaries? The TTL monitor runs on the primary, which removes expired documents by calling into WiredTiger directly rather than through the normal client command path, so those deletes are not counted in the primary's mongostat output; secondaries apply the replicated deletes through the regular oplog path and therefore do report them.
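The TTL rule in both patterns reduces to one piece of arithmetic: deletion time = indexed field value + expireAfterSeconds. A tiny sketch (the function name `ttlDeletionTime` is illustrative, not a MongoDB API):

```javascript
// TTL semantics: a document becomes eligible for deletion
// expireAfterSeconds seconds after the value stored in the indexed field.
function ttlDeletionTime(indexedValue, expireAfterSeconds) {
  return new Date(indexedValue.getTime() + expireAfterSeconds * 1000);
}

// expireAt pattern ({ expireAfterSeconds: 0 }): deleted exactly at the
// stored timestamp.
// ttlDeletionTime(expireAt, 0)
// createdAt pattern ({ expireAfterSeconds: 60 }): deleted 60 s after
// insertion time.
// ttlDeletionTime(createdAt, 60)
```

Note that in practice deletion is not instantaneous: the TTL monitor wakes up periodically, so this is the earliest time a document can be removed, not a hard deadline.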

2.2 MongoDB Configuration Optimization (Network I/O Reuse & Separation)

High TPS and burst traffic cause the default one‑thread‑per‑connection model to overload the system. MongoDB 3.6+ introduces serviceExecutor: adaptive to dynamically adjust network threads and enable I/O reuse.

2.2.1 Internal Network Thread Model

MongoDB creates a dedicated thread per client connection handling both network and disk I/O, which leads to massive thread creation (up to ~10 k threads) and high load during spikes.

Rapid thread creation under high concurrency.

Thread destruction during low‑traffic periods adds further load.

One thread handles both network and storage I/O, a design flaw.

2.2.2 Optimization Method

Enable serviceExecutor: adaptive to reuse network I/O and separate it from disk I/O, reducing thread‑creation overhead and improving throughput.
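The change itself is a one-line mongod.conf setting; a sketch of the relevant fragment (key names as documented for MongoDB 3.6-4.x; note the adaptive executor was later deprecated and removed in newer MongoDB releases):

```yaml
# mongod.conf: multiplex client connections over a small, dynamically
# sized network thread pool instead of one thread per connection.
net:
  serviceExecutor: adaptive
```

A restart is required for the setting to take effect.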

2.2.3 Performance Comparison Before/After

After enabling the adaptive service executor, average latency dropped to between a half and a third of its previous value, and system load decreased.

2.2.3.1 Load Comparison

2.2.3.2 Slow‑log Comparison

2.2.3.3 Average Latency Comparison

2.3 WiredTiger Storage‑Engine Optimization

Even after service executor tuning, latency remained ~80 ms. Analysis revealed I/O spikes caused by large cacheSize and dirty‑page flushing.

2.3.1 CacheSize Adjustment

With cacheSize set to 110 GB, checkpoints accumulated a huge volume of dirty data that saturated the disks when flushed. Reducing cacheSize to 50 GB capped how much dirty data could be written out in a single burst.
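In mongod.conf this is a single setting; a sketch of the fragment, assuming the standard storage.wiredTiger key layout:

```yaml
# mongod.conf: cap the WiredTiger cache at 50 GB so the dirty set that
# must be flushed at each checkpoint stays small enough for the disks
# to absorb without saturating.
storage:
  wiredTiger:
    engineConfig:
      cacheSizeGB: 50
```

The right value is workload-dependent: too small a cache hurts read hit rates, so 50 GB here is the point this cluster found between cache efficiency and flush burst size.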

2.3.2 Dirty‑Page Eviction Tuning

Adjusted eviction thresholds (target 75 %, trigger 97 %, dirty_target 3 %, dirty_trigger 25 %) and increased eviction threads to mitigate write spikes.

eviction_target: 75%

eviction_trigger: 97%

eviction_dirty_target: 3%

eviction_dirty_trigger: 25%

evict.threads_min: 8

evict.threads_max: 12

2.3.3 Checkpoint Optimization

Reduced checkpoint interval to 25 s (log size 1 GB) to limit dirty data accumulation.

checkpoint=(wait=25,log_size=1GB)
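The eviction and checkpoint values above can be passed to WiredTiger as an engine config string; a sketch of the mongod.conf form, assuming the storage.wiredTiger.engineConfig.configString pass-through option:

```yaml
# mongod.conf: eviction thresholds, eviction thread pool, and checkpoint
# interval from sections 2.3.2-2.3.3, expressed as a raw WiredTiger
# configuration string (applied at startup).
storage:
  wiredTiger:
    engineConfig:
      configString: "eviction=(threads_min=8,threads_max=12),eviction_target=75,eviction_trigger=97,eviction_dirty_target=3,eviction_dirty_trigger=25,checkpoint=(wait=25,log_size=1GB)"
```

The same string can also be applied at runtime without a restart via the wiredTigerEngineRuntimeConfig server parameter, which is convenient for iterating on thresholds during tuning.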

2.3.4 IO Comparison Before/After

2.3.5 Latency Comparison Before/After

Latency dropped from ~200 ms to ~20 ms with reduced jitter.

3. Server System Disk‑IO Issue Resolution

3.1 Hardware Background

When WiredTiger flushed large amounts of dirty data, sustained writes above 500 MB/s drove disk util to 100 % and write throughput periodically dropped to zero, pointing to a hardware-level bottleneck.

The NVMe SSD is rated for 2 GB/s writes, yet the server achieved only 500 MB/s.

3.2 Post‑Upgrade Performance

Migrating primary nodes to servers with 2 GB/s SSDs further reduced latency to 2‑4 ms across three business workloads.

An incompatible kernel version was identified as the cause of the NVMe under-performance; upgrading to kernel 3.10.0-957.27.2.el7.x86_64 restored full SSD throughput.

4. Summary and Open Issues

Combined service-executor tuning, storage-engine adjustments, and hardware upgrades lowered average latency from hundreds of milliseconds to 2-4 ms, a roughly two-order-of-magnitude improvement. Remaining jitter will be addressed in a future article.

OPPO Internet Technology https://segmentfault.com/a/1190000021268588
Tags: performance optimization, sharding, MongoDB, WiredTiger, IO optimization, database tuning
Written by

Architecture Digest

Focusing on Java backend development, covering application architecture from top-tier internet companies (high availability, high performance, high stability), big data, machine learning, Java architecture, and other popular fields.
