
MongoDB Storage Engine Evolution, Replication Enhancements, and New Features

The article reviews MongoDB's storage engine developments—including MMAP, WiredTiger, RocksDB, and Memory engines—highlights replication protocol improvements, automatic sharding changes, and introduces new features such as batch and partial indexes, document validation, and join capabilities, while providing performance test results and a Q&A section.

High Availability Architecture

Editor’s note: This high‑availability architecture article, originally shared by Bi Hongyu, is reproduced with attribution to the ArchNotes public account.

Author bio: Bi Hongyu, senior real‑time computing platform engineer at Vipshop, previously DBA at eBay and PPTV, joined Vipshop in 2012 and focuses on databases, big data, and distributed computing.

Storage Engine Development

In version 3.0, MongoDB split its architecture, separating the server layer from the storage layer behind a pluggable storage engine API, and introduced the WiredTiger storage engine.

Supported storage engines now include MMAP, WiredTiger, RocksDB, Memory, and Encryption.

MMAP remains the default engine in 3.0.x, where its main improvement is the move from database-level locks to collection-level locks. From 3.2 onward, WiredTiger is the default.

WiredTiger is the headline feature of MongoDB 3.0. Its key features are:

Document-level concurrency control: fine-grained per-document concurrency control replaces coarse-grained instance- and database-level locks, similar to MVCC in relational databases.

Compression support: Snappy (default) and Zlib.
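The lock granularity change can be made concrete with a simplified JavaScript model of when two concurrent writes contend. It covers database-level locking (MMAP in 2.2 to 2.6), collection-level locking (MMAP in 3.0.x), and document-level concurrency control (WiredTiger). The helper names `lockKey` and `conflicts` are hypothetical, and the model shows only *which* writes contend, not how either engine actually resolves contention.

```javascript
// Simplified model of lock granularity (not how either engine is implemented).
// A write is described as { db, coll, id }, where id is the document's _id.

// The resource a write contends on at each granularity.
function lockKey(op, granularity) {
  switch (granularity) {
    case "database": // MMAP in 2.2 to 2.6: one write lock per database
      return op.db;
    case "collection": // MMAP in 3.0.x: one write lock per collection
      return `${op.db}.${op.coll}`;
    case "document": // WiredTiger: concurrency control per document
      return `${op.db}.${op.coll}/${String(op.id)}`;
    default:
      throw new Error(`unknown granularity: ${granularity}`);
  }
}

// Two concurrent writes contend only when they map to the same resource.
function conflicts(a, b, granularity) {
  return lockKey(a, granularity) === lockKey(b, granularity);
}

const w1 = { db: "shop", coll: "orders", id: 1 };
const w2 = { db: "shop", coll: "orders", id: 2 }; // same collection, other document
const w3 = { db: "shop", coll: "users", id: 1 };  // other collection

for (const g of ["database", "collection", "document"]) {
  console.log(`${g}: w1/w2 conflict=${conflicts(w1, w2, g)}, w1/w3 conflict=${conflicts(w1, w3, g)}`);
}
```

With these three sample writes, only document-level control lets `w1` and `w2` proceed in parallel, because they touch different documents in the same collection.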

In a YCSB test with 1.18 billion records, disk usage dropped from 3.12 TB on MongoDB 2.6 to 270 GB after upgrading to 3.0.x.
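These figures imply roughly a twelvefold reduction. The quick check below assumes binary units (1 TB = 1024 GB, 1 GB = 2^30 bytes) and also gives the average on-disk footprint per record. Those averages are total disk usage divided by record count, so they fold in indexes and other overhead and are not document sizes.

```javascript
// Back-of-the-envelope check of the YCSB disk-usage figures.
// Assumption: binary units (1 TB = 1024 GB, 1 GB = 2^30 bytes).
const records = 1.18e9;
const beforeGB = 3.12 * 1024; // MongoDB 2.6
const afterGB = 270;          // MongoDB 3.0.x
const BYTES_PER_GB = 2 ** 30;

const reductionRatio = beforeGB / afterGB;
const bytesPerRecordBefore = (beforeGB * BYTES_PER_GB) / records;
const bytesPerRecordAfter = (afterGB * BYTES_PER_GB) / records;

console.log(`reduction: about ${reductionRatio.toFixed(1)}x`);
console.log(`average per record: ${Math.round(bytesPerRecordBefore)} B -> ${Math.round(bytesPerRecordAfter)} B`);
```

This gives a ratio of about 11.8x and roughly 2,907 bytes versus 246 bytes per record. With decimal units (1 TB = 1000 GB), the ratio is about 11.6x.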

Benchmarks for write-only, replica-set-enabled, and read-only workloads show changes in throughput. They also show occasional stalls, and write throughput on the primary drops when replication is enabled.

WiredTiger’s eviction, checkpoint, and capped‑collection algorithms are still being refined (see JIRA WT‑1788, SERVER‑16736).

RocksDB, a key-value storage engine open-sourced by Facebook, can be used with MongoDB through the unofficial mongo-rocks plugin.

WiredTiger also supports LSM trees in addition to its default B-tree structure. LSM can be enabled through internal parameters, though this mode may still have memory-leak bugs.

The Memory engine is Enterprise-only and resembles MySQL's MEMORY engine, with table-level locks (see SERVER-1153).

Replica Set Improvements

Since version 3.2, the replication protocol has been enhanced:

An election ID speeds up elections, reducing mean time to recovery (MTTR).

Heartbeat detection now uses metadata written to the oplog, avoiding a “heartbeat storm”.

A configurable election timeout (electionTimeoutMillis) lets operators tune failover to their network conditions.

Read concern was added, enabling strongly consistent reads without relying solely on read preference.
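A small model shows the trade-off behind a tunable election timeout. In MongoDB 3.2, the setting is `settings.electionTimeoutMillis` in the replica set configuration (default 10000 ms), changed through `rs.conf()` and `rs.reconfig()`. The JavaScript below does not talk to a server. Its helper names (`shouldCallElection`, `unnecessaryElections`) and the sample gaps are hypothetical, and real failure detection includes details this sketch omits.

```javascript
// Simplified failure detection: a secondary calls an election when it has not
// heard from the primary for longer than the election timeout.
// Mongo shell equivalent for changing the setting (not run here):
//   cfg = rs.conf(); cfg.settings.electionTimeoutMillis = 5000; rs.reconfig(cfg)
function shouldCallElection(nowMs, lastPrimaryContactMs, electionTimeoutMillis) {
  return nowMs - lastPrimaryContactMs > electionTimeoutMillis;
}

// Count how many silences from a healthy primary (e.g. network hiccups)
// would still trigger an election at a given timeout.
function unnecessaryElections(silenceGapsMs, electionTimeoutMillis) {
  return silenceGapsMs.filter((gap) => gap > electionTimeoutMillis).length;
}

const gapsMs = [800, 1200, 2500, 4000, 900]; // hypothetical hiccups on a flaky network
for (const timeoutMs of [2000, 5000, 10000]) {
  console.log(`timeout ${timeoutMs} ms: ${unnecessaryElections(gapsMs, timeoutMs)} unnecessary elections`);
}
```

A shorter timeout detects a failed primary sooner, which lowers MTTR. On this hypothetical flaky network, however, a 2000 ms timeout would trigger two unnecessary elections, while 5000 ms or 10000 ms would trigger none.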

Automatic Sharding Mechanism

The biggest sharding improvement is that config servers can now be deployed as a replica set (since 3.2). They use majority write concern and primary read preference for consistency and high availability.
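Majority write concern, which the config server replica set relies on, and the "majority" read concern both hinge on the point in the oplog that a majority of members have reached. The minimal JavaScript sketch below assumes that optimes can be modelled as plain integers and that every member votes. The function names are hypothetical.

```javascript
// Simplified model of "majority" in a replica set.
// Each member reports the latest optime (an integer here) it has applied.

// Highest optime that at least a majority of members have applied.
function majorityCommitPoint(memberOptimes) {
  const sorted = [...memberOptimes].sort((a, b) => b - a); // descending
  const majority = Math.floor(sorted.length / 2) + 1;
  return sorted[majority - 1];
}

// A write with write concern "majority" is acknowledged once its optime is at or
// below the majority commit point; a "majority" read sees data only up to that point.
function isMajorityAcknowledged(writeOptime, memberOptimes) {
  return writeOptime <= majorityCommitPoint(memberOptimes);
}

const optimes = [105, 103, 98]; // primary, secondary A, secondary B
console.log(majorityCommitPoint(optimes));
console.log(isMajorityAcknowledged(104, optimes));
console.log(isMajorityAcknowledged(103, optimes));
```

With three members, the write at optime 104 has reached only the primary. It is therefore not yet majority-acknowledged, and a majority read would not see it. The write at 103 has reached two of the three members and is acknowledged.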

Other New Features

Batch Index: createIndexes builds multiple indexes with a single full collection scan, which is more efficient than the pre-3.0 behavior.
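Batching index builds pays off because it reduces the number of collection scans. The JavaScript model below counts documents read when two single-field indexes are built one at a time versus in one pass. It uses hypothetical helpers and in-memory arrays rather than a real collection. In the mongo shell, the batched form is `db.collection.createIndexes([{ a: 1 }, { b: 1 }])`.

```javascript
// Hypothetical in-memory model: compare documents read when building indexes
// one at a time versus feeding all of them from a single collection scan.

// Visit every document once, counting reads.
function scan(docs, stats, visit) {
  for (const doc of docs) {
    stats.docsRead += 1;
    visit(doc);
  }
}

// Order index entries [key, _id] by key, then by _id.
function sortEntries(entries) {
  return entries.sort((x, y) => (x[0] < y[0] ? -1 : x[0] > y[0] ? 1 : x[1] - y[1]));
}

// One full scan per index.
function buildSeparately(docs, fields) {
  const stats = { docsRead: 0 };
  const indexes = {};
  for (const f of fields) {
    const entries = [];
    scan(docs, stats, (doc) => entries.push([doc[f], doc._id]));
    indexes[f] = sortEntries(entries);
  }
  return { indexes, stats };
}

// A single scan that feeds every index.
function buildInOnePass(docs, fields) {
  const stats = { docsRead: 0 };
  const entries = Object.fromEntries(fields.map((f) => [f, []]));
  scan(docs, stats, (doc) => {
    for (const f of fields) entries[f].push([doc[f], doc._id]);
  });
  const indexes = {};
  for (const f of fields) indexes[f] = sortEntries(entries[f]);
  return { indexes, stats };
}

const docs = [
  { _id: 1, a: 3, b: "x" },
  { _id: 2, a: 1, b: "z" },
  { _id: 3, a: 2, b: "y" },
];
const separate = buildSeparately(docs, ["a", "b"]);
const batched = buildInOnePass(docs, ["a", "b"]);
console.log(`documents read: ${separate.stats.docsRead} separately, ${batched.stats.docsRead} in one pass`);
```

Both strategies produce the same index entries. With k indexes over n documents, however, building separately reads k * n documents (6 here), while the single pass reads n (3 here).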

Partial Index: Indexes only a subset of documents, selected with partialFilterExpression. Example:

db.tasks.createIndex(   // "tasks" is a placeholder collection name
  { isDone: 1, add_time: 1 },
  { partialFilterExpression: { isDone: 1 } }
)
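The in-memory JavaScript model below shows which documents end up in the partial index above. It handles only equality filters such as `{ isDone: 1 }`. The real partialFilterExpression also accepts operators such as `$exists: true`, `$gt`/`$gte`/`$lt`/`$lte`, `$type`, and a top-level `$and`. The helper names and sample documents are hypothetical.

```javascript
// In-memory model of the partial index { isDone: 1, add_time: 1 } with
// partialFilterExpression { isDone: 1 }. Helper names are hypothetical.
const partialFilter = { isDone: 1 };
const keyFields = ["isDone", "add_time"];

// Equality-only filter check (real partial filters support more operators).
function matchesFilter(doc, filter) {
  return Object.entries(filter).every(([field, value]) => doc[field] === value);
}

// Compare two compound keys field by field, ascending.
function compareKeys(a, b) {
  for (let i = 0; i < a.length; i++) {
    if (a[i] < b[i]) return -1;
    if (a[i] > b[i]) return 1;
  }
  return 0;
}

// Only matching documents get an index entry; entries are kept in key order.
function buildPartialIndex(docs) {
  return docs
    .filter((doc) => matchesFilter(doc, partialFilter))
    .map((doc) => ({ key: keyFields.map((f) => doc[f]), _id: doc._id }))
    .sort((x, y) => compareKeys(x.key, y.key));
}

const tasks = [
  { _id: "a", isDone: 1, add_time: 30 },
  { _id: "b", isDone: 0, add_time: 10 }, // excluded: does not match the filter
  { _id: "c", isDone: 1, add_time: 20 },
];
console.log(buildPartialIndex(tasks)); // entries for "c" then "a"
```

The index omits documents whose isDone value differs, so MongoDB can use a partial index only for queries whose conditions include the filter expression, for example `{ isDone: 1, add_time: { $gte: 20 } }`.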

Document Validator: Provides schema-like validation using operators such as $type and $exists, similar to check constraints in an RDBMS.
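In the mongo shell, validation rules are attached when a collection is created, for example `db.createCollection("contacts", { validator: { phone: { $type: "string" }, email: { $exists: true } } })`. By default, an insert or update that fails validation is rejected. The JavaScript below does not use MongoDB at all. It is a hypothetical, much-simplified model of how a validator of that shape evaluates a document.

```javascript
// Much-simplified model of a validator built from $type and $exists.
// Supports only top-level fields and a few $type string aliases (hypothetical subset).
const typeChecks = {
  string: (v) => typeof v === "string",
  number: (v) => typeof v === "number", // MongoDB's "number" alias covers numeric BSON types
  bool: (v) => typeof v === "boolean",
};

// Every field rule must pass (implicit AND, as in a query document).
function validate(doc, rules) {
  return Object.entries(rules).every(([field, rule]) => {
    const present = Object.prototype.hasOwnProperty.call(doc, field);
    if ("$exists" in rule && rule.$exists !== present) return false;
    if ("$type" in rule) {
      const check = typeChecks[rule.$type];
      if (!check) throw new Error(`unsupported $type alias in this sketch: ${rule.$type}`);
      if (!present || !check(doc[field])) return false; // a missing field never matches $type
    }
    return true;
  });
}

const rules = { phone: { $type: "string" }, email: { $exists: true } };
console.log(validate({ phone: "555-0100", email: "a@example.com" }, rules)); // true
console.log(validate({ phone: 5550100, email: "a@example.com" }, rules));    // false: phone is not a string
console.log(validate({ phone: "555-0100" }, rules));                         // false: email is missing
```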

Join ($lookup): Introduced as an Enterprise feature in 3.2 and later open-sourced, the $lookup aggregation stage implements left outer joins.
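In the mongo shell, a join looks like `db.orders.aggregate([{ $lookup: { from: "inventory", localField: "item", foreignField: "sku", as: "inventory_docs" } }])`. The JavaScript function below is a simplified model of the left outer join semantics of that stage for scalar fields. The function name and sample data are hypothetical, and array-valued fields are not handled.

```javascript
// Simplified model of $lookup with localField/foreignField (equality match).
// Collection contents are hypothetical; array-valued fields are not handled.
function lookup(localDocs, foreignDocs, { localField, foreignField, as }) {
  return localDocs.map((doc) => {
    // A missing localField is treated as null and matches foreign documents
    // whose foreignField is null or missing.
    const key = doc[localField] ?? null;
    const matches = foreignDocs.filter((f) => (f[foreignField] ?? null) === key);
    // Left outer join: every local document is kept, with [] when nothing matches.
    return { ...doc, [as]: matches };
  });
}

const orders = [
  { _id: 1, item: "almonds", qty: 2 },
  { _id: 2, item: "pecans", qty: 1 },
  { _id: 3 }, // no item field
];
const inventory = [
  { _id: 1, sku: "almonds", instock: 120 },
  { _id: 2, sku: "bread", instock: 80 },
  { _id: 3 }, // no sku field
];

const joined = lookup(orders, inventory, {
  localField: "item",
  foreignField: "sku",
  as: "inventory_docs",
});
for (const o of joined) console.log(o._id, o.inventory_docs.map((d) => d._id));
```

Order 2 has no matching inventory document but is still returned with an empty array, which is what distinguishes a left outer join from an inner join.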

Q & A

1. GIS support vs. PostgreSQL/Solr: The choice depends on team familiarity; MongoDB is valued for its performance and availability in non-transactional workloads.

2. MongoDB vs. MySQL advantages: Built‑in failover, configurable read/write separation, schema‑free design, auto‑sharding, and evolving features like document validator and join.

3. Ideal scenarios and 2016 outlook: MongoDB fits read-heavy, write-light workloads best; future directions include removing mongos and improving WiredTiger stability.

4. MongoDB vs. Redis: Redis offers stable low latency; MongoDB provides automatic failover and read/write splitting but may exhibit latency spikes.

5. NewSQL perspective: Interesting but still emerging; storage remains critical.

6. Performance vs. relational databases: A simpler design without transactions yields higher speed; WiredTiger integration is still maturing.

7. MongoDB and ZooKeeper: No integration; MongoDB’s failover uses its own protocol, similar to trends in HBase and Kafka.

Article planning and editing: Li Qingfeng and Wang Jie; editor: Wang Jie; proofreader: Tim Yang. Please attribute reposts to the ArchNotes public account.
