MongoDB Cluster Point-in-Time Recovery (PITR) Procedure Using Shard and Config Server Restoration
This article demonstrates a step‑by‑step point‑in‑time recovery of a MongoDB sharded cluster by restoring shard instances, replaying oplog entries to a specific timestamp, updating metadata, and finally rebuilding the config server and mongos to achieve a consistent snapshot with max(id)=15.
1. Background
Most online examples cover PITR for a single‑node MongoDB instance, and the official documentation provides recovery steps for a MongoDB cluster but lacks a concrete PITR workflow. This article presents an experimental setup that simulates an online environment and performs a PITR on a MongoDB sharded cluster.
Original cluster topology
172.16.129.170 shard1 27017 shard2 27018 config 37017 mongos 47017
172.16.129.171 shard1 27017 shard2 27018 config 37017 mongos 47017
172.16.129.172 shard1 27017 shard2 27018 config 37017 mongos 47017
For the demo we restore each shard as a single instance (sufficient for developer queries). The restored topology becomes:
172.16.129.173 shard1 27017 shard2 27018 config 37017 mongos 47017
172.16.129.174 config 37017
172.16.129.175 config 37017
The MongoDB version is Percona Server for MongoDB 4.2.13. Each shard and the config server run scheduled hot-backup scripts and oplog-backup scripts. Test data is created via mongos by creating a hashed-sharded collection and inserting 10 documents:
use admin
db.runCommand({"enablesharding":"renkun"})
sh.shardCollection("renkun.user", { id: "hashed" } )
use renkun
var tmp = [];
for (var i = 0; i < 10; i++) {
    tmp.push({ "id": i, "name": "Kun " + i });
}
db.user.insertMany(tmp);
After taking physical hot-backups of shard1, shard2, and the config server, another 10 documents are inserted, bringing the total to 20 documents (id 0-19). The goal is to restore the cluster to the point where max(id)=15.
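The physical hot-backups mentioned above can be taken with Percona Server for MongoDB's built-in hot backup command. A minimal mongo-shell sketch, run against each shard and the config server (the backupDir path is an assumption that mirrors the backup paths used later in this article, and the directory must already exist and be writable by the mongod process):

```javascript
// Hot backup on Percona Server for MongoDB: copies a consistent
// snapshot of the data files into backupDir while the server runs.
db.adminCommand({ createBackup: 1, backupDir: "/data/backup/202106111849_27017" })
```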
2. Restoring Shard Instances
2.1 Identify the target snapshot
For each shard we dump the oplog BSON file to JSON and locate the entry that inserted id=15:
bsondump oplog.rs.bson > oplog.rs.json
more oplog.rs.json | egrep "\"op\":\"i\",\"ns\":\"renkun\.user\"" | grep "\"Kun 15\""
The matching entry on shard2 shows the timestamp {"t":1623408268,"i":6}. Because mongorestore --oplogLimit is an exclusive upper bound (only entries strictly before the given timestamp are replayed), we add one to the increment, giving 1623408268:7 as the effective limit.
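The grep pipeline above can also be expressed as a small script. A sketch in Python, assuming bsondump's extended-JSON output with one document per line (find_oplog_limit and the two sample entries are illustrative, not part of the original procedure):

```python
import json

def find_oplog_limit(lines, ns, name):
    """Scan bsondump JSON lines for the insert of the target document and
    return the --oplogLimit string: timestamp increment + 1, because the
    limit is an exclusive upper bound."""
    for line in lines:
        entry = json.loads(line)
        if entry.get("op") == "i" and entry.get("ns") == ns \
                and entry.get("o", {}).get("name") == name:
            ts = entry["ts"]["$timestamp"]
            return f'{ts["t"]}:{ts["i"] + 1}'
    return None

# Sample entries mimicking bsondump's output for the oplog.rs collection.
sample = [
    '{"ts":{"$timestamp":{"t":1623408268,"i":5}},"op":"i","ns":"renkun.user","o":{"id":14,"name":"Kun 14"}}',
    '{"ts":{"$timestamp":{"t":1623408268,"i":6}},"op":"i","ns":"renkun.user","o":{"id":15,"name":"Kun 15"}}',
]
print(find_oplog_limit(sample, "renkun.user", "Kun 15"))  # 1623408268:7
```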
2.2 Create a temporary restoration user
Log in as root on each shard and create a user named internal_restore with the __system role. This user is required for replaying the oplog, modifying admin.system.version, and dropping the local database.
use admin
db.createUser({ user: "internal_restore", pwd: "internal_restore", roles: ["__system"] })
Log in as internal_restore and drop the local database:
use local
db.dropDatabase()
2.3 Replay oplog up to the target timestamp
mongorestore -h 127.0.0.1 -u internal_restore -p "internal_restore" --port 27017 \
--oplogReplay --oplogLimit "1623408268:7" --authenticationDatabase admin \
/data/backup/202106111849_27017/local/oplog.rs.bson
mongorestore -h 127.0.0.1 -u internal_restore -p "internal_restore" --port 27018 \
--oplogReplay --oplogLimit "1623408268:7" --authenticationDatabase admin \
/data/backup/202106111850_27018/local/oplog.rs.bson
Verify the restored data on each shard:
--shard1 27017
> db.user.find().sort({"id":1})
{ "_id" : ObjectId("..."), "id" : 3, "name" : "Kun 3" }
... (records up to id 12)
--shard2 27018
> db.user.find().sort({"id":1})
{ "_id" : ObjectId("..."), "id" : 0, "name" : "Kun 0" }
... (records up to id 15)
After replay, the two shards together contain 16 documents, with the maximum id equal to 15.
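A consistent PITR snapshot means the two shards together hold exactly ids 0-15, with no gaps and no duplicates. A quick consistency check, sketched in Python (the per-shard id lists are hypothetical; the real split is determined by the hashed shard key, which distributes ids non-contiguously):

```python
# Hypothetical per-shard id lists, as returned by db.user.find()
# on each shard after the oplog replay.
shard1_ids = [3, 5, 6, 7, 10, 11, 12]
shard2_ids = [0, 1, 2, 4, 8, 9, 13, 14, 15]

all_ids = sorted(shard1_ids + shard2_ids)
assert all_ids == list(range(16))             # no gaps, no duplicates
assert not set(shard1_ids) & set(shard2_ids)  # shards hold disjoint subsets
print("snapshot consistent: max(id) =", max(all_ids))
```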
2.4 Update admin.system.version
Because the shards have changed from three‑node replica sets to single instances, the metadata must be corrected. Using internal_restore on each shard:
use admin
db.system.version.deleteOne({ _id: "minOpTimeRecovery" })
db.system.version.find({"_id" : "shardIdentity" })
db.system.version.updateOne(
{ "_id" : "shardIdentity" },
{ $set : { "configsvrConnectionString" : "configdb/172.16.129.173:37017,172.16.129.174:37017,172.16.129.175:37017" } }
)
Before these changes, the shardIdentity document still pointed to the original config servers (172.16.129.170-172.16.129.172).
3. Restoring the Config Server
Transfer the physical backup of the config server, extract it to the data directory, and start it as a single instance. Create the same internal_restore user with the __system role.
3.1 Replay oplog
mongorestore -h 127.0.0.1 -u internal_restore -p "internal_restore" --port 37017 \
--oplogReplay --oplogLimit "1623408268:7" --authenticationDatabase admin \
/data/backup/202106111850_37017/local/oplog.rs.bson
3.2 Modify metadata
Log in as internal_restore, drop the local database, then update the shard entries in the config.shards collection to point to the restored shard hosts:
use local
db.dropDatabase()
use config
db.shards.find()
db.shards.updateOne({ "_id" : "repl" }, { $set : { "host" : "172.16.129.173:27017" } })
db.shards.updateOne({ "_id" : "repl2" }, { $set : { "host" : "172.16.129.173:27018" } })
db.shards.find()
3.3 Start the config server replica set
Stop the single‑instance config server, then start it in replica‑set mode with the following configuration snippet:
sharding:
  clusterRole: configsvr
replication:
  oplogSizeMB: 10240
  replSetName: configdb
Initialize the replica set and add the two remaining members:
rs.initiate()
rs.add("172.16.129.174:37017")
rs.add("172.16.129.175:37017")
The config server is now a three-node replica set.
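For reference, a fuller config-server configuration file might look like the following sketch. Only the sharding and replication stanzas and the port come from the procedure above; dbPath and bindIp are assumptions and will differ per deployment:

```yaml
storage:
  dbPath: /data/mongodb/config        # assumption: actual data directory may differ
net:
  port: 37017
  bindIp: 127.0.0.1,172.16.129.173    # assumption: host's own address
sharding:
  clusterRole: configsvr
replication:
  oplogSizeMB: 10240
  replSetName: configdb
```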
4. Configuring mongos
Copy the original mongos configuration file and adjust the sharding and net.bindIp parameters to reflect the new config server addresses:
sharding:
  configDB: "configdb/172.16.129.173:37017,172.16.129.174:37017,172.16.129.175:37017"
net:
  port: 47017
  bindIp: 127.0.0.1,172.16.129.173
After starting mongos, query the renkun.user collection; it returns 16 documents with max(id)=15, confirming a successful PITR.
mongos> use renkun
switched to db renkun
mongos> db.user.find().sort({"id":1})
{ "_id" : ObjectId("..."), "id" : 0, "name" : "Kun 0" }
... (up to id 15)
5. Summary
MongoDB 4.0 introduced multi-document transactions, and version 4.2 extended them across shards. Restoring data involved in such transactions requires special handling not covered in this guide; the PITR procedure presented here therefore applies to non-transactional workloads.
Aikesheng Open Source Community
The Aikesheng Open Source Community provides stable, enterprise‑grade MySQL open‑source tools and services, releases a premium open‑source component each year (1024), and continuously operates and maintains them.