MongoDB MMAPv1 Storage Engine: Data Organization and Record Management
This article explains how MongoDB's MMAPv1 storage engine organizes databases, namespaces, data files, extents, and records. It covers the on-disk structures, the write, delete, update, and query paths, and how fragmentation arises and how the space is reclaimed.
Database
Each MongoDB database consists of a .ns file and a series of data files (mydb.0, mydb.1, …). The first data file is 64 MB, and each subsequent file is twice the size of the previous one, up to a maximum of 2 GB.
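The doubling rule can be illustrated with a short calculation. This is only a sketch of the arithmetic described above; the function name data_file_sizes is made up for the example and is not part of MongoDB.

```python
MB = 1024 * 1024

def data_file_sizes(count, first=64 * MB, cap=2048 * MB):
    """Return the sizes in bytes of the first `count` data files.

    Assumes the rule stated in the text: start at 64 MB, double each
    time, and never exceed 2 GB.
    """
    sizes = []
    size = first
    for _ in range(count):
        sizes.append(size)
        size = min(size * 2, cap)  # stop growing once the cap is reached
    return sizes

sizes_mb = [s // MB for s in data_file_sizes(8)]
print(sizes_mb)  # [64, 128, 256, 512, 1024, 2048, 2048, 2048]
```

Under this rule the sixth file (mydb.5) is the first one to reach the 2 GB cap.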
Namespace
Every database contains multiple namespaces, one for each collection (indexes get their own namespaces as well). The .ns file is a hash table that maps a namespace name to its metadata. Each entry occupies 628 bytes, so a 16 MB .ns file can hold up to 26,715 namespaces.
The namespace metadata includes a fixed‑length 128‑byte key, a hash value, and a value structure that holds further details such as DiskLoc pointers to data file offsets.
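The capacity figure follows directly from the entry size. The sketch below reproduces the arithmetic; the 4-byte width of the hash value is an assumption made for illustration (the text only gives the 628-byte total and the 128-byte key).

```python
NS_FILE_BYTES = 16 * 1024 * 1024  # size of the .ns file
ENTRY_BYTES = 628                 # one hash-table entry
KEY_BYTES = 128                   # fixed-length namespace name
HASH_BYTES = 4                    # ASSUMPTION: hash stored as a 32-bit int

# Whatever is left in an entry is available for the value structure.
value_bytes = ENTRY_BYTES - KEY_BYTES - HASH_BYTES

# Entries cannot straddle the end of the file, so round down.
capacity = NS_FILE_BYTES // ENTRY_BYTES

print(value_bytes)  # 496
print(capacity)     # 26715
```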
Data Files
Data files are divided into extents, each belonging to a single namespace and linked together as a doubly‑linked list. The file header stores version, size, free space information, and pointers to the first and last extents.
Extent
Each extent contains multiple records (MongoDB documents) organized as a doubly‑linked list.
Record
A record represents a MongoDB document and begins with a fixed 16‑byte descriptor. Deleted records are stored as DeletedRecord structures that share the first two fields with normal records.
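A minimal sketch of such a header, using Python's struct module. The text only states that the descriptor is 16 bytes and that a deleted record shares its first two fields; the field names and the choice of four little-endian 32-bit integers below are assumptions for illustration.

```python
import struct
from collections import namedtuple

# ASSUMED layout: four little-endian signed 32-bit integers = 16 bytes.
RECORD_HEADER = struct.Struct("<iiii")

RecordHeader = namedtuple(
    "RecordHeader", "length_with_headers extent_ofs next_ofs prev_ofs")
# A deleted record reuses the first two fields; the remaining 8 bytes are
# assumed to point at the next deleted record (file number + offset).
DeletedHeader = namedtuple(
    "DeletedHeader",
    "length_with_headers extent_ofs next_deleted_file next_deleted_ofs")

def pack_record(payload, extent_ofs, next_ofs=-1, prev_ofs=-1):
    """Prefix a document payload with the 16-byte header."""
    total = RECORD_HEADER.size + len(payload)
    return RECORD_HEADER.pack(total, extent_ofs, next_ofs, prev_ofs) + payload

def read_header(buf):
    return RecordHeader._make(RECORD_HEADER.unpack_from(buf, 0))

def read_deleted_header(buf):
    return DeletedHeader._make(RECORD_HEADER.unpack_from(buf, 0))

# An empty BSON document is 5 bytes long.
raw = pack_record(b"\x05\x00\x00\x00\x00", extent_ofs=8192)
hdr = read_header(raw)
print(RECORD_HEADER.size, hdr.length_with_headers, hdr.extent_ofs)  # 16 21 8192
```

Because the first two fields line up, the same bytes can be read through either view, which is what lets a record be turned into a deleted record in place.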
Writing a Record
Check the namespace’s deleted‑record list for a suitable free slot.
If none, look for a free extent in the file’s free list.
If still none, allocate a new extent (or a new data file if needed) and write the record.
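The three steps above can be sketched as a small allocator. This is a simplified model, not MongoDB's implementation: sizes stand in for real disk locations, and the names allocate, free_extents, and new_extent_size are hypothetical.

```python
def allocate(ns, free_extents, needed, new_extent_size=4096):
    """Decide where a record of `needed` bytes goes.

    ns           -- dict with "deleted" (free slot sizes) and "extents" (sizes)
    free_extents -- sizes of unused extents on the file's free list
    Returns (source, slot_size).
    """
    # Step 1: reuse a deleted record that is large enough.
    for i, size in enumerate(ns["deleted"]):
        if size >= needed:
            ns["deleted"].pop(i)
            return ("deleted-list", size)

    # Step 2: take a free extent and hand it to this namespace.
    for i, size in enumerate(free_extents):
        if size >= needed:
            extent = free_extents.pop(i)
            ns["extents"].append(extent)
            if extent > needed:
                ns["deleted"].append(extent - needed)  # rest stays reusable
            return ("free-extent", needed)

    # Step 3: nothing fits, so allocate a brand-new extent.
    extent = max(new_extent_size, needed)
    ns["extents"].append(extent)
    if extent > needed:
        ns["deleted"].append(extent - needed)
    return ("new-extent", needed)

ns = {"deleted": [64], "extents": []}
free = [1024]
first = allocate(ns, free, 48)     # fits the 64-byte deleted slot
second = allocate(ns, free, 200)   # falls through to the free extent
third = allocate(ns, free, 2000)   # forces a new extent
print(first, second, third)
```

In this model the unused remainder of an extent is tracked through the same deleted list, so later writes find it in step 1.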
Deleting a Record
Deleted records are inserted into the namespace’s deleted‑record list; they may be reused later, but if future writes never match the size class, the space remains fragmented. Running a compact operation can reclaim such fragmentation.
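The fragmentation effect can be shown with a toy model of size-classed deleted lists. The bucket limits below are hypothetical values chosen for the example, and the class DeletedLists is not a MongoDB API.

```python
import bisect

BUCKETS = [32, 64, 128, 256, 512, 1024]  # HYPOTHETICAL size classes

def bucket_for(size):
    """Index of the smallest bucket whose limit is >= size (last one if none)."""
    return min(bisect.bisect_left(BUCKETS, size), len(BUCKETS) - 1)

class DeletedLists:
    def __init__(self):
        self.lists = [[] for _ in BUCKETS]

    def free(self, size):
        """Record a deleted slot in the list for its size class."""
        self.lists[bucket_for(size)].append(size)

    def reuse(self, needed):
        """Return the size of a reusable slot, or None if nothing fits."""
        for b in range(bucket_for(needed), len(BUCKETS)):
            for i, size in enumerate(self.lists[b]):
                if size >= needed:
                    return self.lists[b].pop(i)
        return None

    def free_bytes(self):
        return sum(sum(sizes) for sizes in self.lists)

dl = DeletedLists()
for _ in range(3):
    dl.free(100)           # three 100-byte documents are deleted

big = dl.reuse(500)        # a 500-byte write cannot use any of them
stranded = dl.free_bytes() # so 300 bytes stay allocated but unused
small = dl.reuse(90)       # a 90-byte write can reuse one slot
print(big, stranded, small)  # None 300 100
```

As long as incoming writes are all larger than the freed slots, the 300 bytes stay stranded, which is the situation a compact run is meant to clean up.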
Updating a Record
If the new record is smaller, update in place and possibly add the leftover space to the deleted‑record list.
If larger, treat as delete + insert; the old space becomes a DeletedRecord.
Frequent updates can cause fragmentation; setting appropriate Record Padding can mitigate this.
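A minimal sketch of why padding helps, assuming the slot reserved for a document is its size multiplied by a padding factor. The function names and the example sizes are illustrative only.

```python
import math

def slot_size(doc_bytes, padding_factor=1.0):
    """Bytes reserved for a document, assuming size * padding factor."""
    return math.ceil(doc_bytes * padding_factor)

def apply_update(slot_bytes, new_doc_bytes):
    """Classify an update as in-place or as a move (delete + insert)."""
    if new_doc_bytes <= slot_bytes:
        return "in-place"
    return "move"

# A 100-byte document grows to 120 bytes.
no_padding = apply_update(slot_size(100, 1.0), 120)
with_padding = apply_update(slot_size(100, 1.5), 120)
print(no_padding, with_padding)  # move in-place
```

The trade-off is that padding reserves space up front for every document, including those that never grow.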
Querying a Record
Without indexes, a query must scan the entire collection; creating indexes on frequently queried fields improves performance.
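The difference can be illustrated by counting how many documents each approach examines. This sketch uses an in-memory list and a plain dictionary as a stand-in for an index; MongoDB's real indexes are B-trees, and the sample documents are invented for the example.

```python
from collections import defaultdict

docs = [
    {"_id": 1, "city": "Oslo"},
    {"_id": 2, "city": "Lima"},
    {"_id": 3, "city": "Oslo"},
]

def scan(collection, field, value):
    """Full collection scan: every document is examined."""
    examined = 0
    hits = []
    for doc in collection:
        examined += 1
        if doc.get(field) == value:
            hits.append(doc["_id"])
    return hits, examined

def build_index(collection, field):
    """Map each field value to the positions of the documents holding it."""
    index = defaultdict(list)
    for pos, doc in enumerate(collection):
        index[doc.get(field)].append(pos)
    return index

def lookup(collection, index, value):
    """Indexed lookup: only matching documents are examined."""
    positions = index.get(value, [])
    return [collection[p]["_id"] for p in positions], len(positions)

city_index = build_index(docs, "city")
scan_result = scan(docs, "city", "Oslo")
index_result = lookup(docs, city_index, "Oslo")
print(scan_result)   # ([1, 3], 3)
print(index_result)  # ([1, 3], 2)
```

Both calls return the same documents, but the scan touches all three while the indexed lookup touches only the two matches; the gap widens as the collection grows.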
Source: Database Kernel Monthly
