Analysis and Resolution of MongoDB Sharding Balancer Chunk Migration Failures in Version 3.4.x

A client's MongoDB sharded cluster showed severe chunk imbalance, and the balancer's nightly migration attempts kept failing. The failures were traced to a known bug that makes the balancer schedule conflicting migrations. Disabling the balancer for the affected collection works around the problem, and upgrading the cluster to version 3.4.11 or later fixes it.

Aikesheng Open Source Community

Background : A client observed a severely uneven chunk distribution across the shards of a MongoDB sharded cluster. The balancer's nightly migration attempts were failing, which prompted an investigation.

Impact : Data was heavily skewed across the shards, and automatic rebalancing could not correct it.

Environment : MongoDB 3.4.9 with three mongos, three config servers, and three shard replica sets named shard1, shard2, and shard3.

Diagnosis Process : The sh.status() output revealed that collections db01_xxx.col01_xxxx_info_2019 and db01_xxx.col01_xxxx_info had highly uneven chunk counts, with the balancer attempting to move chunks from the most loaded shard to the least loaded shard.
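The per-shard chunk counts that sh.status() summarizes can also be tallied directly. The following is a minimal sketch, not the client's actual procedure. It assumes chunk documents shaped like those in config.chunks on MongoDB 3.4 (each with `ns` and `shard` fields). The function names and the sample data are hypothetical.

```javascript
// Sketch: count chunks per shard for one namespace.
// Input is an array shaped like documents from config.chunks in MongoDB 3.4,
// where each chunk document carries `ns` and `shard` fields.
// In the mongo shell (connected to a mongos) the array could be fetched with:
//   db.getSiblingDB("config").chunks.find({ ns: "db01_xxx.col01_xxxx_info" }).toArray()
function countChunksPerShard(chunks, ns) {
  const counts = {};
  for (const chunk of chunks) {
    if (chunk.ns !== ns) continue; // ignore other collections
    counts[chunk.shard] = (counts[chunk.shard] || 0) + 1;
  }
  return counts;
}

// Difference between the most and the least loaded shard.
// `shardNames` lists every shard so that shards holding no chunks count as 0.
function chunkSpread(counts, shardNames) {
  const values = shardNames.map((name) => counts[name] || 0);
  return Math.max(...values) - Math.min(...values);
}

// Hypothetical sample data, not taken from the client's cluster.
const sampleChunks = [
  { ns: "db01_xxx.col01_xxxx_info", shard: "shard2" },
  { ns: "db01_xxx.col01_xxxx_info", shard: "shard2" },
  { ns: "db01_xxx.col01_xxxx_info", shard: "shard2" },
  { ns: "db01_xxx.col01_xxxx_info", shard: "shard1" },
  { ns: "db01_xxx.col01_xxxx_info_2019", shard: "shard3" },
];

const sampleCounts = countChunksPerShard(sampleChunks, "db01_xxx.col01_xxxx_info");
console.log(sampleCounts, chunkSpread(sampleCounts, ["shard1", "shard2", "shard3"]));
```

Passing the full shard list to chunkSpread matters because a shard that holds no chunks of the collection never appears in the counts, yet it is exactly the shard the balancer would pick as a destination.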

Log analysis showed errors such as:

2019-05-27T00:04:06.140+0800 I SHARDING [Balancer] Balancer move db01_xxx.col01_xxxx_info_2019: [{ col01_column_1: "3177000047924787", sharedDate: new Date(1546561546000) }, { col01_column_1: "3177000049293528", sharedDate: new Date(1548383450000) }), from shard2, to shard1 failed :: caused by :: ConflictingOperationInProgress: Unable to start new migration because this shard is currently donating chunk [{ col01_column_1: "3177000525560227", sharedDate: new Date(1527215797000) }, { col01_column_1: "3177000525560227", sharedDate: new Date(1527217436000) }) for namespace db01_xxx.col01_xxxx_info to shard3

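The failures were due to conflicting automatic migrations. The balancer tried to move a chunk of db01_xxx.col01_xxxx_info_2019 from shard2 to shard1 while shard2 was still donating a chunk of db01_xxx.col01_xxxx_info to shard3, so the new migration was rejected with ConflictingOperationInProgress.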
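To see how often such conflicts occur and which collections are involved, the failed-move lines can be extracted from the log. The sketch below is one possible way to do that. The regular expressions are written against the single log line quoted in this article and are an assumption, since other server versions may word the message differently. The function name parseFailedMove is hypothetical.

```javascript
// Sketch: pull the key fields out of a "Balancer move ... failed" log line.
// The patterns mirror the log line quoted in this article and may need
// adjusting for other MongoDB versions.
const FAILED_MOVE =
  /Balancer move (\S+): .*?, from (\S+), to (\S+) failed :: caused by :: (\w+):/;
const BLOCKED_BY = /for namespace (\S+) to (\S+)/;

function parseFailedMove(line) {
  const move = FAILED_MOVE.exec(line);
  if (!move) return null; // not a failed balancer move

  // Present only when the error names the migration that is in the way.
  const blocker = BLOCKED_BY.exec(line);
  return {
    ns: move[1],    // collection whose chunk could not be moved
    from: move[2],  // source shard of the failed move
    to: move[3],    // destination shard of the failed move
    error: move[4], // error code name, e.g. ConflictingOperationInProgress
    blockedBy: blocker ? { ns: blocker[1], to: blocker[2] } : null,
  };
}
```

Applied to the log line above, the function reports a failed move of db01_xxx.col01_xxxx_info_2019 from shard2 to shard1 with error ConflictingOperationInProgress, blocked by a migration of db01_xxx.col01_xxxx_info to shard3.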

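This behavior matches a known MongoDB bug (SERVER-29423): in a single round, the balancer schedules migrations for different collections that use the same source or destination shard. A shard can take part in only one migration at a time, so the later migration fails.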
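The constraint behind the bug can be illustrated with a small model. This is a sketch of the idea only, not the server's actual scheduling code. It assumes that each shard may appear in at most one migration per round. The function name and the candidate list are hypothetical, with shard and collection names borrowed from the log above.

```javascript
// Sketch: accept a migration only if neither its source nor its destination
// shard is already used by a migration accepted earlier in the same round.
// This models the "one migration per shard" constraint, not MongoDB internals.
function selectNonConflicting(candidates) {
  const busyShards = new Set();
  const accepted = [];
  const skipped = [];
  for (const migration of candidates) {
    if (busyShards.has(migration.from) || busyShards.has(migration.to)) {
      skipped.push(migration); // would fail with a conflict if started now
      continue;
    }
    busyShards.add(migration.from);
    busyShards.add(migration.to);
    accepted.push(migration);
  }
  return { accepted, skipped };
}

// Hypothetical round mirroring the situation in the log:
// both collections want to move a chunk away from shard2.
const round = selectNonConflicting([
  { ns: "db01_xxx.col01_xxxx_info", from: "shard2", to: "shard3" },
  { ns: "db01_xxx.col01_xxxx_info_2019", from: "shard2", to: "shard1" },
]);
console.log(round.accepted.length, round.skipped.length);
```

In this model the second migration is held back for a later round instead of being started and failing, which is the outcome the affected versions did not achieve.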

Fix Verification : Until the cluster is upgraded, the issue can be mitigated by disabling balancing for the problematic collection and re-enabling it later:

// Disable automatic balancing for this collection only
sh.disableBalancing("db01_xxx.col01_xxxx_info");

// Re-enable automatic balancing for this collection
sh.enableBalancing("db01_xxx.col01_xxxx_info");
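Whether the setting took effect can be checked from the collection's entry in config.collections, where sh.disableBalancing() sets the noBalance field to true. The helper below is a small sketch. Its name is hypothetical, and it only interprets a document that has already been fetched.

```javascript
// Sketch: interpret a collection's entry from config.collections.
// sh.disableBalancing() sets noBalance: true on that entry;
// a missing or false noBalance means balancing is allowed.
// In the mongo shell (connected to a mongos) the document could be fetched with:
//   db.getSiblingDB("config").collections.findOne({ _id: "db01_xxx.col01_xxxx_info" })
function isBalancingDisabled(collectionDoc) {
  return Boolean(collectionDoc) && collectionDoc.noBalance === true;
}

// Hypothetical documents for illustration.
console.log(isBalancingDisabled({ _id: "db01_xxx.col01_xxxx_info", noBalance: true }));
console.log(isBalancingDisabled({ _id: "db01_xxx.col01_xxxx_info_2019" }));
```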

Conclusion : The balancer starts migrations when the number of shards changes (e.g., after removeShard ) or when the difference in chunk counts between the most and least loaded shards reaches the migration threshold. The threshold depends on the collection's total number of chunks: 2 for fewer than 20 chunks, 4 for 20 to 79 chunks, and 8 for 80 or more. The bug described here is fixed in MongoDB 3.4.11 and later (including 3.6), so upgrading the cluster resolves the issue permanently.
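The threshold rule can be written out as a short sketch. It encodes only the three thresholds listed above and is not the balancer's actual code. The function names and the sample chunk counts are hypothetical.

```javascript
// Sketch: migration threshold as a function of a collection's total chunk count,
// following the values listed above (2, 4, 8).
function migrationThreshold(totalChunks) {
  if (totalChunks < 20) return 2;
  if (totalChunks < 80) return 4;
  return 8;
}

// True when the gap between the most and least loaded shard
// reaches the threshold for this collection.
// `countsByShard` maps shard name to chunk count and should include
// shards that hold zero chunks.
function needsBalancing(countsByShard) {
  const counts = Object.values(countsByShard);
  const total = counts.reduce((sum, n) => sum + n, 0);
  const spread = Math.max(...counts) - Math.min(...counts);
  return spread >= migrationThreshold(total);
}

// Hypothetical chunk counts for illustration.
console.log(needsBalancing({ shard1: 10, shard2: 40, shard3: 12 })); // 62 chunks, gap 30
console.log(needsBalancing({ shard1: 5, shard2: 6, shard3: 5 }));    // 16 chunks, gap 1
```

With 62 chunks the threshold is 4, so a gap of 30 calls for migration. With 16 chunks the threshold is 2, so a gap of 1 does not.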

Related Links :

SERVER-29423 – Sharding balancer schedules multiple migrations with the same conflicting source or destination

MongoDB Sharding Balancer Administration – Migration Thresholds
