Diagnosing Uneven Load in a MongoDB Sharded Cluster Using ExecutionStats

The article recounts a performance incident on a MongoDB 4.2 sharded cluster where one shard’s CPU spiked to 100% during a mail‑sending test, describes the diagnostic steps using monitoring data, slow‑query logs, and executionStats, and concludes with the root cause and remediation.

Aikesheng Open Source Community

Background: The core online service runs on a managed MongoDB RDS instance (version 4.2) deployed as a four-shard cluster. During pre-release load testing of a mail-sending scenario, CPU on shard 3's primary node reached 100% while the other shards stayed at normal levels.

Diagnosis: Monitoring showed shard 3 handling more than 1,300 slow queries per second, with read and update QPS far higher than on shard 4. Insert traffic was similar across shards, which ruled out insert skew. The slow-query logs (hundreds of MB) contained many large update statements on the users collection, which led the team to suspect that data was unevenly distributed across the shards.
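When slow-query logs run to hundreds of MB, grouping entries by namespace and operation is a quick way to see which collections dominate a hot shard. The following is a minimal JavaScript sketch, assuming the log lines have already been parsed into objects with hypothetical ns, op and millis fields (MongoDB 4.2 writes plain-text logs, so the parsing step is left out). The namespaces and timings in the usage lines are made up.

```javascript
// Minimal sketch: rank (namespace, operation) pairs by slow-op count, then by
// cumulative time. Assumes entries were already parsed from the slow-query log
// into objects of the hypothetical shape { ns, op, millis }.
function rankSlowOps(entries, topN = 5) {
  const stats = new Map();
  for (const { ns, op, millis } of entries) {
    const key = `${ns} ${op}`;
    const s = stats.get(key) ?? { ns, op, count: 0, totalMillis: 0 };
    s.count += 1;
    s.totalMillis += millis;
    stats.set(key, s);
  }
  // Most frequent first; ties broken by total time spent, descending.
  return [...stats.values()]
    .sort((a, b) => b.count - a.count || b.totalMillis - a.totalMillis)
    .slice(0, topN);
}

// Illustrative, made-up entries (not data from the incident).
const sample = [
  { ns: "app.users", op: "update", millis: 120 },
  { ns: "app.users", op: "update", millis: 180 },
  { ns: "app.mail_settings", op: "find", millis: 105 },
  { ns: "app.mail_settings", op: "find", millis: 110 },
  { ns: "app.mail_settings", op: "find", millis: 101 },
];
console.log(rankSlowOps(sample));
```

Ranking by frequency as well as duration helps here, because a collection hit by many moderately slow operations can load a shard as much as a handful of very large updates.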

Because the RDS platform does not allow logging in to individual shards, the team considered three approaches: (1) compute the hashed shard-key value of each pid to work out which shard should own it, (2) copy the data into a new sharded collection and compare how it is distributed, and (3) run a range query over the target pid range with explain("executionStats") and read the per-shard statistics.
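Approach (1) amounts to reproducing the router's decision by hand: a hashed shard key maps each pid to a 64-bit hash, and the chunk whose [min, max) range contains that hash determines the owning shard. A minimal sketch of the lookup follows, assuming the hash values were obtained separately (for example with the mongo shell's convertShardKeyToHashed() helper, if your shell version provides it) and that the chunk ranges were exported from config.chunks and simplified into { min, max, shard } objects with BigInt bounds. The chunk layout and hash values in the usage lines are hypothetical.

```javascript
// Minimal sketch: find the shard owning a hashed shard-key value.
// Assumption: chunks are sorted by min, bounds are BigInt, min is inclusive,
// max is exclusive, and null stands in for MinKey / MaxKey.
function shardForHash(chunks, hash) {
  let lo = 0;
  let hi = chunks.length - 1;
  while (lo <= hi) {
    const mid = (lo + hi) >> 1;
    const { min, max, shard } = chunks[mid];
    if (min !== null && hash < min) hi = mid - 1;
    else if (max !== null && hash >= max) lo = mid + 1;
    else return shard; // min <= hash < max
  }
  throw new Error(`no chunk covers hash ${hash}`);
}

// Tally how many of the given hashes each shard would own.
function countByShard(chunks, hashes) {
  const counts = {};
  for (const h of hashes) {
    const shard = shardForHash(chunks, h);
    counts[shard] = (counts[shard] ?? 0) + 1;
  }
  return counts;
}

// Hypothetical four-chunk layout over the signed 64-bit hash space.
const chunks = [
  { min: null, max: -4611686018427387902n, shard: "shard1" },
  { min: -4611686018427387902n, max: 0n, shard: "shard2" },
  { min: 0n, max: 4611686018427387902n, shard: "shard3" },
  { min: 4611686018427387902n, max: null, shard: "shard4" },
];
console.log(countByShard(chunks, [-9000000000000000000n, -5n, 7n, 5000000000000000000n, 12n]));
```

Binary search keeps the lookup cheap even when config.chunks holds thousands of ranges, which matters if every pid in a large range is checked this way.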

Solution: The third method proved most effective. Executing

db.users.find({ "pid": { "$gte": 2600001, "$lt": 2635000 } }).explain("executionStats")

returned a roughly equal number of examined documents on each of the four shards, confirming that the roughly 35,000 records in that range were evenly distributed.
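On a sharded collection, the explain("executionStats") output returned by mongos includes a per-shard breakdown, so the comparison in approach (3) can be scripted. This minimal sketch assumes the breakdown sits in executionStats.executionStages.shards, with shardName, nReturned and totalDocsExamined fields on each entry. Check that layout against your server version, and note that the sample object below is trimmed and made up.

```javascript
// Minimal sketch: summarize per-shard work from a sharded explain("executionStats")
// result. Assumes executionStats.executionStages.shards is an array of
// { shardName, nReturned, totalDocsExamined, ... }.
function summarizeShards(explain) {
  const shards = explain.executionStats.executionStages.shards ?? [];
  const rows = shards.map((s) => ({
    shard: s.shardName,
    docsExamined: s.totalDocsExamined,
    returned: s.nReturned,
  }));
  if (rows.length === 0) return { rows, skew: null };
  const docs = rows.map((r) => r.docsExamined);
  const max = Math.max(...docs);
  const min = Math.min(...docs);
  // max/min close to 1 means the queried range is spread evenly; Infinity
  // means at least one shard examined no documents at all.
  return { rows, skew: min > 0 ? max / min : Infinity };
}

// Trimmed, made-up explain output used only to exercise the function.
const sampleExplain = {
  executionStats: {
    executionStages: {
      shards: [
        { shardName: "shard1", nReturned: 8700, totalDocsExamined: 8700 },
        { shardName: "shard2", nReturned: 8800, totalDocsExamined: 8800 },
        { shardName: "shard3", nReturned: 8750, totalDocsExamined: 8750 },
        { shardName: "shard4", nReturned: 8750, totalDocsExamined: 8750 },
      ],
    },
  },
};
console.log(summarizeShards(sampleExplain));
```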

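Further investigation showed that the real bottleneck was a small, unsharded collection holding only three documents. In a sharded cluster, an unsharded collection lives entirely on its database's primary shard, which here was shard 3, so every operation on that collection landed on shard 3. Moving that logic to Redis eliminated the performance issue.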
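The mechanism is easy to miss because the unsharded collection is tiny: its size does not matter, only where its traffic is routed. The toy model below is a JavaScript sketch that makes this concrete. It is not MongoDB's router. It assumes simple modulo spreading as a stand-in for chunk-based routing of the sharded users collection, sends every operation on a hypothetical unsharded mail_settings collection to the primary shard, and uses a made-up workload.

```javascript
// Toy routing model (not MongoDB's actual router). Sharded collections are
// spread across shards; unsharded collections always go to the primary shard.
const SHARDS = ["shard1", "shard2", "shard3", "shard4"];
const PRIMARY_SHARD = "shard3";

function routeOp(op) {
  if (!op.sharded) return PRIMARY_SHARD; // unsharded: always the primary shard
  return SHARDS[op.key % SHARDS.length]; // stand-in for chunk-based routing
}

function loadPerShard(ops) {
  const load = Object.fromEntries(SHARDS.map((s) => [s, 0]));
  for (const op of ops) load[routeOp(op)] += 1;
  return load;
}

// Each simulated mail send updates one user and touches the tiny unsharded
// collection (hypothetical names and workload).
const ops = [];
for (let pid = 0; pid < 1000; pid++) {
  ops.push({ coll: "users", sharded: true, key: pid });
  ops.push({ coll: "mail_settings", sharded: false });
}
console.log(loadPerShard(ops));
```

With this workload each shard receives 250 user updates, and shard3 additionally receives all 1,000 operations on the unsharded collection, for 1,250 in total.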

Conclusion: The incident highlighted limitations of the managed RDS platform (no direct shard login, incomplete slow‑query visibility) and the importance of verifying query plans and data distribution when diagnosing uneven QPS across shards.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact admin@besthub.dev and we will review it promptly.

Tags: MongoDB, diagnostics, Database operations, ExecutionStats

The Aikesheng Open Source Community provides stable, enterprise‑grade MySQL open‑source tools and services, releases a premium open‑source component each year (1024), and continuously operates and maintains them.
