Databases 16 min read

Mastering MongoDB Sharding: Primary Shards, Metadata, and Balancer Operations

This article explains MongoDB sharded clusters, covering data distribution, primary shard concepts, metadata collections, commands for moving primary shards, managing balancer settings, handling jumbo chunks, and procedures for adding or removing shards, with practical code examples for each operation.

ITFLY8 Architecture Home
ITFLY8 Architecture Home
ITFLY8 Architecture Home
Mastering MongoDB Sharding: Primary Shards, Metadata, and Balancer Operations

In MongoDB (version 3.2.9), a sharded cluster provides horizontal scalability by distributing data across multiple shards, each storing a subset of the dataset without duplication, thereby increasing overall throughput.

Data is divided into chunks, each containing many documents, and MongoDB tracks chunk locations in metadata stored on config servers (typically three identical config servers). The mongos router and sh helper functions allow safe inspection of this metadata.

Queries to any shard retrieve only the subset of a collection stored on that shard; applications connect to mongos, which transparently routes reads and writes to the appropriate shards, presenting the whole dataset as a single logical database.

Primary Shard

Only collections explicitly sharded with sh.shardCollection() are distributed; unsharded collections reside on a primary shard, which by default is the shard where the database was first created. Each database has its own primary shard.

Each database in a sharded cluster has a primary shard that holds all the un‑sharded collections for that database.

For example, in a three‑shard cluster (shard1, shard2, shard3), if a database blog is created on shard1, MongoDB automatically creates a matching blog database on the other shards, with shard1 as the primary shard.

Changing a database's primary shard is done with the movePrimary command:

db.runCommand({ movePrimary : "test", to : "shard0001" })

After moving the primary shard, the config server metadata is updated, but cached router configs become stale. Refresh them with:

db.adminCommand({"flushRouterConfig":1})

Shard Metadata

Do not query config servers directly; instead use mongos or sh helpers. The sh.status() command displays cluster metadata: sh.status() Connect to mongos and view the config database: mongos> use config shards collection stores shard information: db.shards.find() Each shard stores data in the host specified (replica set or standalone):

{
    "_id" : "shard_name",
    "host" : "replica_set_name/host:port",
    "tag" : [shard_tag1, shard_tag2]
}

databases collection holds all databases, sharded or not: db.databases.find() If sh.enableSharding("db_name") was run, the partitioned field is true; the primary field indicates the primary shard.

{
    "_id" : "test",
    "primary" : "rs0",
    "partitioned" : true
}

collections collection contains metadata for sharded collections (un‑sharded collections are omitted). The shard key is stored in the key field.

db.collections.find()
{
    "_id" : "test.foo",
    "lastmodEpoch" : ObjectId("57dcd4899bd7f7111ec15f16"),
    "lastmod" : ISODate("1970-02-19T17:02:47.296Z"),
    "dropped" : false,
    "key" : { "_id" : 1 },
    "unique" : true
}

chunks collection stores chunk information, including namespace ( ns), min/max shard key values, and the owning shard.

db.chunks.find()
{
    "_id" : "test.foo-_id_MinKey",
    "lastmod" : Timestamp(1,1),
    "lastmodEpoch" : ObjectId("57dcd4899bd7f7111ec15f16"),
    "ns" : "test.foo",
    "min" : { "_id" : 1 },
    "max" : { "_id" : 3087 },
    "shard" : "rs0"
}

changelog records cluster operations such as chunk splits, migrations, and shard additions/removals. The what field indicates the operation type (e.g., multi-split).

"what" : "addShard",
"what" : "shardCollection.start",
"what" : "shardCollection.end",
"what" : "multi-split"

tags collection records shard tags and associated shard‑key ranges:

{
    "_id" : { "ns" : "records.users", "min" : { "zipcode" : "10001" } },
    "ns" : "records.users",
    "min" : { "zipcode" : "10001" },
    "max" : { "zipcode" : "10281" },
    "tag" : "NYC"
}

settings collection holds balancer state and chunk size (default 64 MB):

{ "_id" : "chunksize", "value" : 64 }
{ "_id" : "balancer", "stopped" : false }

locks collection stores a distributed lock ensuring only one mongos instance performs administrative tasks at a time. The balancer acquires this lock by inserting a document like:

{
    "_id" : "balancer",
    "process" : "example.net:40000:1350402818:16807",
    "state" : 2,
    "ts" : ObjectId("507daeedf40e1879df62e5f3"),
    "when" : ISODate("2012-10-16T19:01:01.593Z"),
    "who" : "example.net:40000:1350402818:16807:Balancer:282475249",
    "why" : "doing balance round"
}

Removing Shards

When removing a shard, ensure its data is migrated elsewhere. For sharded collections, use the balancer; for unsharded collections, change the primary shard.

1. Delete data of sharded collections

Step 1: Enable the balancer. sh.setBalancerState(true); Step 2: Migrate all sharded collections to other shards.

use admin
db.adminCommand({"removeShard":"shard_name"})

The removeShard command moves chunks to other shards; large amounts of data may take time.

Step 3: Check migration status.

use admin
db.runCommand({ removeShard: "shard_name" })

The remaining field shows how many chunks or databases are left.

{
    "msg" : "draining ongoing",
    "state" : "ongoing",
    "remaining" : { "chunks" : 42, "dbs" : 1 },
    "ok" : 1
}

Step 4: Confirm completion.

use admin
db.runCommand({ removeShard: "shard_name" })
{
    "msg" : "removeshard completed successfully",
    "state" : "completed",
    "shard" : "shard_name",
    "ok" : 1
}

2. Delete unsharded databases

Step 1: Identify unsharded databases (either partitioned:false or primary:"shard_name").

use config
db.databases.find({$or:[{"partitioned":false},{"primary":"shard_name"}]})

For partitioned:false, all data resides on the current shard; for partitioned:true with a specific primary, unsharded collections must have their primary shard changed.

Step 2: Change the primary shard.

db.runCommand({ movePrimary: "db_name", to: "new_shard" })

Adding Shards

Because each shard stores only a portion of the dataset, high availability is achieved by using a replica set for each shard, even if the set contains a single member. Add a shard via sh helpers: sh.addShard("replica_set_name/host:port") Standalone mongod instances are not recommended as shards:

sh.addShard("host:port")

Jumbo Chunks

Jumbo chunks occur when a chunk grows beyond the configured size because all documents share the same shard key value, preventing automatic splitting and causing uneven distribution.

MongoDB limits each chunk to 250 000 documents or 1.3 × the configured chunk size; the default chunk size is 64 MB.

MongoDB cannot move a chunk that exceeds these limits.

1. View jumbo chunks

Use sh.status(); jumbo chunks are marked with the jumbo flag.

{ "x" : 2 } -->> { "x" : 3 } on : shard-a Timestamp(2, 2) jumbo

2. Distribute jumbo chunks

Step 1: Disable the balancer. sh.setBalancerState(false) Step 2: Temporarily increase the chunk size.

use config
db.settings.save({"_id":"chunksize","value":"1024"})

Step 3: Manually move the jumbo chunk.

sh.moveChunk("db_name.collection_name", {sharded_filed:"value_in_chunk"}, "new_shard_name")

Step 4: Re‑enable the balancer. sh.setBalancerState(true) Step 5: Refresh the router cache.

use admin
db.adminCommand({ flushRouterConfig: 1 })

Balancer

The balancer, running inside mongos, not only routes queries but also balances chunk distribution. Its state can be queried and controlled with sh helpers. sh.getBalancerState() When the command returns true, the balancer is active. It can be stopped with: sh.setBalancerState(false) The balancer acquires a distributed lock stored in config.locks:

use config
db.locks.find({"_id":"balancer"})

If state=2, the balancer is active; state=0 means it is stopped.

Balancing moves or splits chunks to achieve an even number of chunks per shard, though document counts per chunk may still vary. Choosing an appropriate shard key that distributes documents evenly is essential.

Balancer activity can be limited to specific time windows by updating the activeWindow setting:

use config
db.settings.update(
    {"_id":"balancer"},
    {"$set":{"activeWindow":{"start":"23:00","stop":"04:00"}}},
    true)

Source: http://www.uml.org.cn/sjjm/201612082.asp

Original Source

Signed-in readers can open the original source through BestHub's protected redirect.

Sign in to view source
Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactadmin@besthub.devand we will review it promptly.

shardingMongoDBChunkBalancerPrimaryShard
ITFLY8 Architecture Home
Written by

ITFLY8 Architecture Home

ITFLY8 Architecture Home - focused on architecture knowledge sharing and exchange, covering project management and product design. Includes large-scale distributed website architecture (high performance, high availability, caching, message queues...), design patterns, architecture patterns, big data, project management (SCRUM, PMP, Prince2), product design, and more.

0 followers
Reader feedback

How this landed with the community

Sign in to like

Rate this article

Was this worth your time?

Sign in to rate
Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.