How Kernel Optimizations Cut MongoDB Physical Backup Disk Bloat by 90%
MongoDB’s physical backup can cause hidden nodes to balloon in disk usage, but by analyzing WiredTiger’s space‑reclamation process and introducing a checkpoint‑release API, the team reduced backup‑induced disk expansion from up to 200% down to under 10%, saving hundreds of gigabytes and cutting backup time by over 40%.
MongoDB provides two backup methods: logical backup, which scans each document and heavily loads CPU, memory, and network bandwidth, and physical backup, which copies WiredTiger’s *.wt files at the file level. Physical backup is typically 5–10× faster and is the default for large clusters.
A hidden backup node often consumes far more disk space than primary or regular secondary nodes. In a real‑world case, a hidden node grew from ~1 TB to >1.5 TB (≈50% increase) because the space occupied by the backup process was never reclaimed automatically.
The root cause lies in WiredTiger’s space‑reclamation chain. WiredTiger manages data in page units that are stored in contiguous extent regions. It maintains three extent lists:
live.alloc: extents allocated since the last checkpoint.
live.avail: free extents that can be reused for new writes.
live.discard: extents referenced by old checkpoints and therefore not reusable.
During normal operation, when a checkpoint is dropped, extents move from live.discard → ckpt_avail → live.avail, allowing new writes to reuse space and keeping the file size stable.
When a backup cursor is opened, WiredTiger pins the latest checkpoint. All extents referenced by that checkpoint remain in live.discard and cannot be reclaimed. As writes continue, new pages must be appended to the end of the file, so the file keeps growing for as long as the backup runs. The effect is worst for the oplog (local.oplog.rs), a capped collection that is written constantly: none of its extents are reclaimed during backup, which produces the worst‑case bloat, where the file can roughly double in size.
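The difference between the normal path and the pinned path can be illustrated with a toy model. This is a minimal sketch, not WiredTiger code: the class ToyBlockManager, its fields, and the workload numbers are hypothetical, sizes are counted in abstract extents, and the intermediate ckpt_avail step is folded into a single move from discard to avail.

```python
from dataclasses import dataclass


@dataclass
class ToyBlockManager:
    """Toy model of one table file; all sizes are counted in extents."""
    file_extents: int = 0   # current file length
    avail: int = 0          # free extents reusable by new writes (live.avail)
    discard: int = 0        # extents still referenced by an old checkpoint (live.discard)
    pinned: bool = False    # True while a backup cursor pins the checkpoint

    def write(self, n: int) -> None:
        """Rewrite n extents: reuse free space first, then append to the file."""
        reused = min(n, self.avail)
        self.avail -= reused
        self.file_extents += n - reused
        # The old copies of the rewritten pages stay referenced by the
        # previous checkpoint, so they land on the discard list.
        self.discard += n

    def checkpoint(self) -> None:
        """Drop the previous checkpoint unless a backup has pinned it."""
        if not self.pinned:
            self.avail += self.discard
            self.discard = 0


def simulate(rounds: int, pinned: bool) -> int:
    """Return the file length after `rounds` write+checkpoint cycles."""
    mgr = ToyBlockManager(file_extents=100, pinned=pinned)
    for _ in range(rounds):
        mgr.write(10)
        mgr.checkpoint()
    return mgr.file_extents


if __name__ == "__main__":
    print("no backup:    ", simulate(10, pinned=False))
    print("backup pinned:", simulate(10, pinned=True))
```

In this model the unpinned file settles after the first round because every later write reuses reclaimed extents, while the pinned file grows by the full write volume on every round.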
To stop this growth, the team introduced a new C API, WT_CONNECTION::backup_release_checkpoint. Once the external backup service has fully copied a table, the service calls releaseBackupCheckpoint(ident), which removes that table’s checkpoint from the pin list so that the normal drop process can reclaim its extents.
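The copy-then-release flow might look like the following sketch. It is written in Python purely for illustration and does not call WiredTiger: ToyPinList, release_backup_checkpoint, run_backup, and the copy_table callback are hypothetical stand-ins for the real API and the backup service, and the released tables are assumed to be tracked in a set.

```python
from typing import Callable, Iterable, List, Set


class ToyPinList:
    """Hypothetical stand-in for per-table checkpoint pin state during a backup."""

    def __init__(self) -> None:
        self.backup_open = False
        self.released: Set[str] = set()  # idents whose pin has been released

    def open_backup(self) -> None:
        self.backup_open = True
        self.released.clear()

    def release_backup_checkpoint(self, ident: str) -> None:
        """Mark one table as copied so later checkpoints stop pinning it."""
        if not self.backup_open:
            raise RuntimeError("no backup in progress")
        self.released.add(ident)

    def is_pinned(self, ident: str) -> bool:
        """A table is pinned only while the backup is open and it is unreleased."""
        return self.backup_open and ident not in self.released

    def close_backup(self) -> None:
        self.backup_open = False
        self.released.clear()


def run_backup(pins: ToyPinList, idents: Iterable[str],
               copy_table: Callable[[str], None]) -> List[str]:
    """Copy tables one by one, releasing each pin right after its copy finishes."""
    order = list(idents)
    pins.open_backup()
    try:
        for ident in order:
            copy_table(ident)                      # table is still pinned here
            pins.release_backup_checkpoint(ident)  # its extents become reclaimable
        return order
    finally:
        pins.close_backup()
```

The point of the design is that each table is pinned only for as long as its own copy takes, rather than for the duration of the whole backup.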
Three kernel‑side improvements were made:
Fine‑grained checkpoint release: The new API records released tables in a hash set; subsequent checkpoints check this set and skip the pin logic for those tables.
Oplog handling: The backup service immediately releases the oplog checkpoint before any copy begins, preventing its massive growth, and then skips copying oplog.wt altogether. Metadata adjustments ensure that the missing file does not break recovery.
Crash‑recovery sentinel: A sentinel file backup.in_progress is created when a backup starts and removed when it finishes. On startup, mongod checks this file; if it exists, the stale WiredTiger.backup metadata is invalidated, forcing a normal crash recovery instead of a hot‑backup restore that would fail because released checkpoints no longer exist.
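The sentinel check can be sketched as follows. This is an illustration under stated assumptions, not mongod's startup code: the function names are hypothetical, "invalidated" is modeled as simply deleting WiredTiger.backup, and the returned strings are labels invented for the example.

```python
from pathlib import Path

SENTINEL = "backup.in_progress"   # file name taken from the text above
BACKUP_META = "WiredTiger.backup"  # file name taken from the text above


def begin_backup(dbpath: Path) -> None:
    """Create the sentinel before any checkpoint is released."""
    (dbpath / SENTINEL).touch()


def end_backup(dbpath: Path) -> None:
    """Remove the sentinel once the backup has finished cleanly."""
    (dbpath / SENTINEL).unlink(missing_ok=True)


def startup_mode(dbpath: Path) -> str:
    """Decide how to start up, based on leftover files in dbpath."""
    if (dbpath / SENTINEL).exists():
        # A backup was interrupted: its metadata may reference released
        # checkpoints. Assumption: invalidation is modeled as deletion.
        (dbpath / BACKUP_META).unlink(missing_ok=True)
        (dbpath / SENTINEL).unlink()
        return "crash-recovery"
    if (dbpath / BACKUP_META).exists():
        return "backup-restore"
    return "normal"
```

A leftover sentinel always wins over the backup metadata, which is what keeps a crash in the middle of a backup from being mistaken for a restore.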
In addition, the backup service now schedules tables by estimated expansion rate (high‑expansion tables first) to minimize the “expansion window” for each table. In a test with four tables (hot, warm, append‑only, and cold) the hidden node’s peak disk usage dropped an additional 20‑30%.
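A rough way to see why the ordering matters is to add up, for each table, its growth rate multiplied by how long it stays pinned. The sketch below assumes tables are copied sequentially and that each one grows at a constant rate until it is released; the Table fields, the rates, and the copy throughput are hypothetical numbers, not measurements from the source.

```python
from typing import List, NamedTuple


class Table(NamedTuple):
    name: str
    size_gb: float
    growth_gb_per_min: float  # estimated expansion rate while pinned (assumed)


def schedule(tables: List[Table]) -> List[Table]:
    """Order tables so the fastest-growing ones are copied and released first."""
    return sorted(tables, key=lambda t: t.growth_gb_per_min, reverse=True)


def estimated_bloat_gb(order: List[Table], copy_gb_per_min: float) -> float:
    """Sum of rate * pinned time, with tables copied one after another."""
    elapsed_min = 0.0
    total_gb = 0.0
    for t in order:
        elapsed_min += t.size_gb / copy_gb_per_min  # table is released at this point
        total_gb += t.growth_gb_per_min * elapsed_min
    return total_gb


if __name__ == "__main__":
    tables = [
        Table("cold", 100.0, 0.0),
        Table("warm", 100.0, 0.5),
        Table("append_only", 100.0, 1.0),
        Table("hot", 100.0, 2.0),
    ]
    best = schedule(tables)
    print([t.name for t in best])
    print("hot first: ", estimated_bloat_gb(best, copy_gb_per_min=10.0))
    print("cold first:", estimated_bloat_gb(tables, copy_gb_per_min=10.0))
```

With these made-up inputs, copying the hot table first gives an estimated 55 GB of expansion against 120 GB when it is copied last.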
Performance results on a 1‑TB cluster:
Backup total time reduced from ~85.1 min to ~44.7 min (‑44.4%).
Hidden node peak usage fell from 850.9 GB to 524.3 GB (‑326.6 GB, 38% saved).
Disk‑size inflation (strict) dropped from 63.9% to 5.8% (‑91%).
COS upload volume decreased from 850.9 GB to ~524.3 GB (‑38.4%).
Backup‑to‑restore bandwidth and time were similarly reduced.
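The disk figures above can be rechecked with a few lines of arithmetic. The helper pct_drop is a hypothetical name; the inputs are the peak-usage and inflation numbers reported in the list.

```python
def pct_drop(before: float, after: float) -> float:
    """Relative reduction from `before` to `after`, in percent."""
    return (before - after) / before * 100


peak_before_gb, peak_after_gb = 850.9, 524.3
saved_gb = peak_before_gb - peak_after_gb

print(f"saved: {saved_gb:.1f} GB")                                     # 326.6 GB
print(f"peak drop: {pct_drop(peak_before_gb, peak_after_gb):.1f}%")     # 38.4%
print(f"inflation drop: {pct_drop(63.9, 5.8):.1f}%")                    # 90.9%
```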
Customer impact: after applying the optimizations, hidden‑node disk bloat for multi‑table clusters fell from ~50% to ~5%, saving roughly 500 GB per node and cutting storage costs by about 60%.
Overall, the kernel‑level changes transform MongoDB’s hot backup from an all‑or‑nothing operation to a table‑level, low‑impact process, dramatically reducing disk consumption, backup duration, and associated costs.
Tencent Cloud Developer
Official Tencent Cloud community account that brings together developers, shares practical tech insights, and fosters an influential tech exchange community.
