Analysis and Resolution of MongoDB Sharding Balancer Chunk Migration Failures in Version 3.4.x
A client running a MongoDB sharded cluster reported severe chunk imbalance and nightly balancer migration failures. The failures were traced to a known bug that makes the balancer schedule conflicting migrations. The issue was mitigated by disabling the balancer for the affected collection and resolved by upgrading the cluster to version 3.4.11 or later.
Background : A client observed a severe chunk distribution imbalance across the shards of a MongoDB sharded cluster, and the balancer's nightly migration attempts kept failing, which prompted an investigation.
Impact : Data on each shard was heavily skewed, preventing automatic rebalancing.
Environment : MongoDB 3.4.9 with three mongos, three config servers, and three shard replica sets named shard1, shard2, and shard3.
Diagnosis Process : The sh.status() output revealed that collections db01_xxx.col01_xxxx_info_2019 and db01_xxx.col01_xxxx_info had highly uneven chunk counts, with the balancer attempting to move chunks from the most loaded shard to the least loaded shard.
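As a rough illustration of how the imbalance can be quantified, the sketch below tallies chunks per shard from documents shaped like those in config.chunks. It assumes each document carries `ns` and `shard` fields, as in MongoDB 3.4, and that the documents were fetched beforehand (for example with `db.getSiblingDB("config").chunks.find({ ns: "db01_xxx.col01_xxxx_info" }).toArray()` in the mongo shell). The function names and the sample data are hypothetical, not taken from the client's cluster.

```javascript
// Count chunks per shard for one namespace.
// chunkDocs: array of config.chunks-style documents ({ ns, shard, ... }).
function chunkCountsByShard(chunkDocs, ns) {
  const counts = {};
  for (const doc of chunkDocs) {
    if (doc.ns !== ns) continue; // ignore chunks of other collections
    counts[doc.shard] = (counts[doc.shard] || 0) + 1;
  }
  return counts;
}

// Difference between the most and the least loaded shard.
// allShards lists every shard so that shards holding zero chunks are counted.
function chunkSpread(counts, allShards) {
  const values = allShards.map((shard) => counts[shard] || 0);
  return Math.max(...values) - Math.min(...values);
}

// Hypothetical sample data, for illustration only.
const sampleChunks = [
  { ns: "db01_xxx.col01_xxxx_info", shard: "shard2" },
  { ns: "db01_xxx.col01_xxxx_info", shard: "shard2" },
  { ns: "db01_xxx.col01_xxxx_info", shard: "shard2" },
  { ns: "db01_xxx.col01_xxxx_info", shard: "shard1" },
  { ns: "db01_xxx.col01_xxxx_info_2019", shard: "shard3" },
];

const sampleCounts = chunkCountsByShard(sampleChunks, "db01_xxx.col01_xxxx_info");
const sampleSpread = chunkSpread(sampleCounts, ["shard1", "shard2", "shard3"]);
```

Passing the full shard list to `chunkSpread` matters because a shard that holds no chunks of the collection never appears in the tally, yet it is exactly the shard the balancer would pick as a destination.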
Log analysis showed errors such as:
2019-05-27T00:04:06.140+0800 I SHARDING [Balancer] Balancer move db01_xxx.col01_xxxx_info_2019: [{ col01_column_1: "3177000047924787", sharedDate: new Date(1546561546000) }, { billingContractNo: "3177000049293528", sharedDate: new Date(1548383450000) }], from shard2, to shard1 failed :: caused by :: ConflictingOperationInProgress: Unable to start new migration because this shard is currently donating chunk [{ col01_column_1: "3177000525560227", sharedDate: new Date(1527215797000) }, { col01_column_1: "3177000525560227", sharedDate: new Date(1527217436000) }) for namespace db01_xxx.col01_xxxx_info to shard3
The failures came from conflicting automatic migrations: the balancer tried to move a chunk of db01_xxx.col01_xxxx_info_2019 off shard2 while shard2 was still donating a chunk of db01_xxx.col01_xxxx_info to shard3, so the new migration was rejected with ConflictingOperationInProgress.
This behavior matches a known MongoDB bug (SERVER-29423), in which the balancer schedules migrations for several collections that use the same source or destination shard at once, even though a shard can take part in only one migration at a time.
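To spot this pattern across a large mongos or config server log, the conflicting pair can be pulled out of each error line. The sketch below is one possible way to do that with a regular expression written against the log line quoted above; the pattern, the function name, and the returned field names are assumptions for illustration, and other MongoDB versions may word the message differently.

```javascript
// Pattern derived from the balancer error line shown above (assumption:
// the message wording is the same for every occurrence in the log).
const CONFLICT_PATTERN = new RegExp(
  "Balancer move (\\S+): .*?, from (\\S+), to (\\S+) failed" +
  " :: caused by :: ConflictingOperationInProgress: .*" +
  " for namespace (\\S+) to (\\S+)"
);

// Returns the failed migration and the migration that blocked it,
// or null when the line is not a ConflictingOperationInProgress failure.
function parseBalancerConflict(line) {
  const m = CONFLICT_PATTERN.exec(line);
  if (!m) return null;
  return {
    namespace: m[1],         // collection whose migration failed
    fromShard: m[2],         // intended source shard
    toShard: m[3],           // intended destination shard
    blockingNamespace: m[4], // collection already being migrated
    blockingToShard: m[5],   // destination of the in-progress migration
  };
}

// The log line from this incident, split only for readability.
const sampleLine =
  '2019-05-27T00:04:06.140+0800 I SHARDING [Balancer] Balancer move ' +
  'db01_xxx.col01_xxxx_info_2019: [{ col01_column_1: "3177000047924787", ' +
  'sharedDate: new Date(1546561546000) }, { billingContractNo: "3177000049293528", ' +
  'sharedDate: new Date(1548383450000) }], from shard2, to shard1 failed :: caused by :: ' +
  'ConflictingOperationInProgress: Unable to start new migration because this shard is ' +
  'currently donating chunk [{ col01_column_1: "3177000525560227", ' +
  'sharedDate: new Date(1527215797000) }, { col01_column_1: "3177000525560227", ' +
  'sharedDate: new Date(1527217436000) }) ' +
  'for namespace db01_xxx.col01_xxxx_info to shard3';

const conflict = parseBalancerConflict(sampleLine);
```

For the line above, `conflict.namespace` and `conflict.blockingNamespace` differ while `conflict.fromShard` is the shard already busy donating, which is the signature of the cross-collection conflict described here.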
Fix Verification : The issue can be temporarily mitigated by disabling the balancer for the problematic collection:
// Disable automatic balancing for the affected collection (temporary mitigation)
sh.disableBalancing("db01_xxx.col01_xxxx_info");
// Re-enable automatic balancing, for example once the cluster has been upgraded
sh.enableBalancing("db01_xxx.col01_xxxx_info");
Conclusion : The balancer starts migrating chunks when the number of shards changes (for example, after removeShard) or when the difference in chunk counts between shards reaches the migration threshold, which depends on the collection's total chunk count: fewer than 20 chunks → threshold 2, 20 to 79 → 4, 80 or more → 8. The bug observed here is fixed in MongoDB 3.4.11 and later (including 3.6), so upgrading the cluster resolves the issue permanently.
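The threshold rule can be written out as a small function. This is a simplified sketch of the documented thresholds rather than the balancer's actual implementation: the function names are hypothetical, and treating a spread equal to the threshold as enough to trigger balancing is an assumption.

```javascript
// Migration threshold for a collection, based on its total chunk count
// (fewer than 20 -> 2, 20 to 79 -> 4, 80 or more -> 8).
function migrationThreshold(totalChunks) {
  if (totalChunks < 20) return 2;
  if (totalChunks < 80) return 4;
  return 8;
}

// countsPerShard: object mapping shard name -> chunk count for one collection.
// Assumption: balancing starts once the spread reaches the threshold.
function needsBalancing(countsPerShard) {
  const counts = Object.values(countsPerShard);
  const total = counts.reduce((sum, n) => sum + n, 0);
  const spread = Math.max(...counts) - Math.min(...counts);
  return spread >= migrationThreshold(total);
}

// Hypothetical distributions, for illustration only.
const skewed = needsBalancing({ shard1: 2, shard2: 90, shard3: 5 });
const even = needsBalancing({ shard1: 30, shard2: 31, shard3: 30 });
```

In a cluster like the one in this incident, a collection would keep satisfying this condition night after night, because the migrations meant to reduce the spread were the ones failing.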
Related Links :
SERVER-29423 – Sharding balancer schedules multiple migrations with the same conflicting source or destination
MongoDB Sharding Balancer Administration – Migration Thresholds
Aikesheng Open Source Community
The Aikesheng Open Source Community provides stable, enterprise‑grade MySQL open‑source tools and services, releases a premium open‑source component each year (1024), and continuously operates and maintains them.