
MongoDB Master Russell Smith’s Essential Best‑Practice Checklist

This article compiles Russell Smith’s MongoDB best‑practice guide, covering architecture choices, document size limits, write safety, schema design, replication, sharding, security, and performance tuning to help engineers avoid common pitfalls in production deployments.

21CTO

MongoDB has recently drawn criticism on Hacker News, with many developers voicing their frustrations. Russell Smith, a MongoDB Master, Ops consultant for large sites, and co‑founder of the MongoDB London User Group, shares the lessons he learned operating high‑throughput clusters.

32‑bit vs 64‑bit

MongoDB ships both 32‑bit and 64‑bit builds. Because MongoDB memory‑maps its data files, the 32‑bit build is limited to roughly 2 GB of data, so use the 64‑bit build for any dataset that may exceed that size and for sharded deployments.

Document size limits

Older MongoDB versions capped individual documents at 4 MB; newer releases raise the limit to 16 MB. If you keep running into it, consider redesigning your schema, use GridFS for large files, or store very large objects in an external service such as Amazon S3 and keep only a reference in MongoDB.
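The following is a minimal sketch, assuming a Node.js application, of a pre‑insert guard that routes oversized payloads away from a single document. The function names are hypothetical, and using the JSON byte length as a stand‑in for BSON size is an approximation, not how MongoDB measures documents.

```javascript
// ASSUMPTION: JSON byte length approximates BSON size. Real BSON adds type
// tags and length prefixes, so treat results close to the limit with caution.
const MAX_DOC_BYTES = 16 * 1024 * 1024; // 16 MB limit in newer releases

function approxDocBytes(doc) {
  return Buffer.byteLength(JSON.stringify(doc), "utf8");
}

// Hypothetical helper: "document" if the payload should fit in one document,
// "gridfs" if it should go to GridFS (or an external store such as S3).
function chooseStorage(doc, limit = MAX_DOC_BYTES) {
  return approxDocBytes(doc) < limit ? "document" : "gridfs";
}

console.log(chooseStorage({ name: "avatar", bytes: "x".repeat(1024) }));            // document
console.log(chooseStorage({ name: "video", bytes: "x".repeat(17 * 1024 * 1024) })); // gridfs
```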

Write failures

With the drivers of that era, MongoDB writes were fire‑and‑forget (“unsafe”) by default: the driver did not wait for the server to report errors. To confirm that a write succeeded, call getLastError after it or enable safe mode in your driver; for a balance of performance and durability, call getLastError with the j option, which waits until the write has been committed to the journal.
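In the mongo shell, the journaled check looks like `db.runCommand({ getLastError: 1, j: true })`. The sketch below, with hypothetical helper names, only builds that command document and interprets a reply; it does not contact a server, and a real application would send the command through its driver.

```javascript
// Builds a getLastError command document. j: true waits for the journal
// commit; w (optional) asks for acknowledgment from that many replica members.
function getLastErrorCommand({ journal = true, w } = {}) {
  const cmd = { getLastError: 1 };
  if (journal) cmd.j = true;
  if (w !== undefined) cmd.w = w;
  return cmd;
}

// A reply with ok: 1 and err: null means the previous write on this
// connection succeeded; anything else is treated as a failure.
function assertWriteSucceeded(reply) {
  if (reply.ok !== 1) throw new Error("getLastError command failed");
  if (reply.err) throw new Error("write failed: " + reply.err);
  return true;
}

console.log(getLastErrorCommand());         // { getLastError: 1, j: true }
console.log(getLastErrorCommand({ w: 2 })); // { getLastError: 1, j: true, w: 2 }
```

Checking the reply after every important write is what turns the default fire‑and‑forget behavior into a safe write.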

Data‑model considerations

MongoDB does not enforce a rigid schema, allowing flexible document structures, but a well‑designed schema is still crucial for performance; consult MongoDB’s schema‑design guides for best practices.

Default single‑document updates

Update operations affect only one matching document unless the multi flag is set. Example:

db.people.update({age: {$gt: 30}}, {$set: {past_it: true}}, false, true)

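Here the third argument (false) is upsert and the fourth (true) is multi; without multi set to true, only the first matching document would be updated.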
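To make the single‑versus‑multi behavior concrete, here is an in‑memory simulation in plain JavaScript. It is an illustration of the semantics only, not MongoDB's server logic; `simulateUpdate` and the sample data are hypothetical.

```javascript
// Newer mongo shells also accept an options object:
//   db.people.update({ age: { $gt: 30 } }, { $set: { past_it: true } }, { multi: true })
function simulateUpdate(docs, matches, setFields, { multi = false } = {}) {
  let modified = 0;
  for (const doc of docs) {
    if (!matches(doc)) continue;
    Object.assign(doc, setFields); // behaves like $set on top-level fields
    modified += 1;
    if (!multi) break; // default: stop after the first matching document
  }
  return modified;
}

const people = () => [{ name: "A", age: 31 }, { name: "B", age: 45 }, { name: "C", age: 20 }];
const over30 = (d) => d.age > 30;

console.log(simulateUpdate(people(), over30, { past_it: true }));                  // 1
console.log(simulateUpdate(people(), over30, { past_it: true }, { multi: true })); // 2
```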

Case‑sensitive queries

MongoDB queries are case‑sensitive by default. For case‑insensitive matching, use a regular expression with the i flag, but note that such regexes cannot use an index efficiently and can be slow on large collections.
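Both the mongo shell and the Node.js driver accept JavaScript RegExp objects as query values, e.g. `{ name: caseInsensitiveExact("smith") }`. The sketch below escapes user input so it is matched literally and anchors the pattern for an exact match. The helper names and the `name_lower` field are hypothetical; storing a lowercase copy is one common way to keep such lookups index‑friendly.

```javascript
// Escape regex metacharacters so user input is treated as literal text.
function escapeRegex(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Anchored, case-insensitive exact match.
function caseInsensitiveExact(text) {
  return new RegExp("^" + escapeRegex(text) + "$", "i");
}

// Alternative: store a lowercase copy (hypothetical field name_lower) and
// query it with a plain equality match, which can use an ordinary index.
function withLowercaseName(doc) {
  return { ...doc, name_lower: doc.name.toLowerCase() };
}

console.log(caseInsensitiveExact("Smith").test("SMITH"));    // true
console.log(caseInsensitiveExact("Smith").test("Smithson")); // false
```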

No type enforcement

Because MongoDB does not enforce field types, it will store whatever you insert (for example, an age stored as the string "31" instead of the number 31); validate data types in your application before insertion.
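A minimal application‑side validator might look like the following. The `people` document shape and the rules are hypothetical; the point is that the check runs before the insert because the server of that era would not reject a mistyped field (server‑side validation rules only arrived in version 3.2).

```javascript
// Hypothetical rules: name must be a non-empty string, age a non-negative integer.
// Returns a list of problems; an empty list means the document may be inserted.
function validatePerson(doc) {
  const errors = [];
  if (typeof doc.name !== "string" || doc.name.trim() === "") {
    errors.push("name must be a non-empty string");
  }
  if (!Number.isInteger(doc.age) || doc.age < 0) {
    errors.push("age must be a non-negative integer");
  }
  return errors;
}

console.log(validatePerson({ name: "Ann", age: 31 }));   // []
console.log(validatePerson({ name: "Ann", age: "31" })); // [ 'age must be a non-negative integer' ]
```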

Locking

Versions before 2.2 used a single global write lock; 2.2 introduced database‑level locking, and later releases added finer‑grained (collection‑ and document‑level) concurrency. Run the latest stable release for the best write concurrency.

Package installation

On Ubuntu/Debian, prefer the official 10gen (now MongoDB, Inc.) repository over the distribution's packages, which tend to lag behind current releases.

Replica set member count

Replica sets should have an odd number of voting members so that a strict majority can always elect a primary. If cost pushes you toward an even number of data‑bearing members, add an arbiter, a voting member that holds no data.
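The arithmetic behind the odd‑number rule can be sketched in a few lines: a primary needs votes from a strict majority, so a fourth voter raises the majority without letting the set survive any more failures. In the mongo shell an arbiter is added with `rs.addArb("host:port")`; the helper functions below are illustrative only.

```javascript
// Votes needed to elect a primary.
function majority(votingMembers) {
  return Math.floor(votingMembers / 2) + 1;
}

// How many voting members can be lost while a primary can still be elected.
function faultTolerance(votingMembers) {
  return votingMembers - majority(votingMembers);
}

for (const n of [3, 4, 5]) {
  console.log(`${n} voters: majority ${majority(n)}, tolerates ${faultTolerance(n)} failure(s)`);
}
// 3 voters: majority 2, tolerates 1 failure(s)
// 4 voters: majority 3, tolerates 1 failure(s)
// 5 voters: majority 3, tolerates 2 failure(s)
```

Four data‑bearing members plus one arbiter gives five voters, which tolerates two failures at the cost of only a lightweight extra process.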

No joins

MongoDB does not support server‑side joins (the $lookup aggregation stage only arrived in version 3.2); denormalize your schema, embedding related data where sensible, to reduce the number of queries.
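As an illustration of denormalization, the sketch below embeds the author fields a page actually displays into a blog post document, so rendering a post takes one read instead of two. The blog schema and `embedAuthor` are hypothetical.

```javascript
// Hypothetical blog schema: copy only the author fields the page needs into
// the post, instead of looking the author up in a second query.
function embedAuthor(post, author) {
  return {
    _id: post._id,
    title: post.title,
    author: { _id: author._id, name: author.name },
    comments: post.comments || [],
  };
}

console.log(embedAuthor({ _id: 1, title: "Hello" }, { _id: 7, name: "Ann", email: "ann@example.com" }));
// { _id: 1, title: 'Hello', author: { _id: 7, name: 'Ann' }, comments: [] }
```

The trade‑off is duplication: if an author changes their name, every post that embeds it must be updated as well.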

Journaling

Since version 2.0, journaling is enabled by default on 64‑bit builds, shrinking the window of potential data loss from about 60 seconds (the default flush interval) to about 100 ms, at a modest (~5 %) performance cost.

Authentication defaults

MongoDB disables authentication by default, assuming a trusted network. In production, enable authentication (for example, by starting mongod with --auth) and restrict access with firewall rules.
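Once authentication is on, clients must present credentials, usually in the connection string. The sketch below builds a standard `mongodb://` URI; the helper name is hypothetical, and `authSource` is the connection‑string option naming the database that holds the user. Credentials must be percent‑encoded when they contain characters such as `@` or `:`.

```javascript
// Builds a mongodb:// connection string with percent-encoded credentials.
function buildMongoUri({ user, password, hosts, database, authSource = "admin" }) {
  const creds = `${encodeURIComponent(user)}:${encodeURIComponent(password)}@`;
  return `mongodb://${creds}${hosts.join(",")}/${database}?authSource=${encodeURIComponent(authSource)}`;
}

console.log(buildMongoUri({
  user: "app",
  password: "p@ss:word",
  hosts: ["db1.example.com:27017"],
  database: "shop",
}));
// mongodb://app:p%40ss%3Aword@db1.example.com:27017/shop?authSource=admin
```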

Replica‑set data loss

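When a former primary rejoins the set after a failover, any writes it accepted that had not replicated to the new primary are rolled back and saved as BSON files in the rollback directory under the data path, where they must be recovered manually.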

Sharding timing

Shard before the cluster reaches about 80 % of its capacity, because migrating chunks to new shards adds load to servers that are already busy; monitor growth with tools such as MMS, Munin, or CloudWatch.
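A back‑of‑the‑envelope planner helps decide when "before 80 %" actually is. This sketch assumes storage in GB is the capacity metric and that growth is linear; both are simplifying assumptions, and in practice working‑set size versus RAM often matters just as much. The function is hypothetical.

```javascript
// Days until usage crosses the threshold, assuming linear growth.
function daysUntilShardThreshold(usedGB, capacityGB, growthGBPerDay, threshold = 0.8) {
  const limitGB = capacityGB * threshold;
  if (usedGB >= limitGB) return 0;          // already past the threshold: shard now
  if (growthGBPerDay <= 0) return Infinity; // not growing
  return (limitGB - usedGB) / growthGBPerDay;
}

console.log(daysUntilShardThreshold(500, 1000, 10)); // 30
```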

Shard key immutability

A document's shard key value cannot be changed after it is inserted; to change it, delete the document and re‑insert it with the new value.
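The delete‑and‑re‑insert workaround can be expressed as a pair of operations in the shape accepted by the Node.js driver's `collection.bulkWrite(ops, { ordered: true })`. The helper is hypothetical, and the two steps are not atomic: if the process fails after the delete, the document is gone, so keep a copy until the insert is confirmed.

```javascript
// Builds ordered bulkWrite operations that "move" a document to a new shard
// key value. Including the old shard key in the filter lets the delete target
// a single shard.
function reshardKeyOps(doc, shardKeyField, newValue) {
  const replacement = { ...doc, [shardKeyField]: newValue };
  return [
    { deleteOne: { filter: { _id: doc._id, [shardKeyField]: doc[shardKeyField] } } },
    { insertOne: { document: replacement } },
  ];
}

console.log(JSON.stringify(reshardKeyOps({ _id: 1, region: "eu", name: "x" }, "region", "us")));
```

With `ordered: true`, the insert is skipped if the delete fails, which avoids creating a second copy.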

Sharding large collections

Collections larger than 256 GB cannot be sharded in older versions; shard before reaching this size.

Unique indexes and sharding

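On a sharded collection, a unique index can be enforced only on the shard key itself or on an index whose key pattern starts with the shard key; uniqueness on any other field cannot be guaranteed across shards.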
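The prefix rule can be checked mechanically before sharding a collection. In the mongo shell, a unique shard key is requested with the third argument of `sh.shardCollection("app.users", { email: 1 }, true)`; the helper below is a hypothetical pre‑flight check that compares key patterns written as plain objects, where field order matters.

```javascript
// True if the unique index key pattern starts with the shard key fields, in order.
function canEnforceUnique(shardKey, uniqueIndex) {
  const shardFields = Object.keys(shardKey);
  const indexFields = Object.keys(uniqueIndex);
  return shardFields.every((field, i) => indexFields[i] === field);
}

console.log(canEnforceUnique({ email: 1 }, { email: 1, tenant: 1 })); // true
console.log(canEnforceUnique({ email: 1 }, { username: 1 }));         // false
```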

Choosing the right shard key

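Choose the shard key carefully and early: a poor choice (low cardinality or monotonically increasing values) can concentrate writes on a single shard, and changing the key later means migrating the data.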

Unencrypted communication

By default, MongoDB connections are unencrypted; use SSL/TLS (available in the 10gen/Enterprise builds) for public‑network access.

Transactions

MongoDB provides atomicity only at the single‑document level; multi‑document transactions require application‑level workarounds (native multi‑document transactions arrived only in version 4.0).
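A common workaround is to model data so that changes which must happen together live in one document. In this hypothetical order schema, line items are embedded, so pushing an item and updating the totals is a single update that MongoDB applies atomically. Amounts are kept in integer cents to avoid floating‑point rounding; the schema and helper are assumptions for illustration.

```javascript
// Builds one update document that adds an item and adjusts the totals together.
// Sketch of shell usage: db.orders.update({ _id: orderId }, addItemUpdate(item))
function addItemUpdate(item) {
  return {
    $push: { items: item },
    $inc: { totalCents: item.priceCents * item.qty, itemCount: item.qty },
  };
}

console.log(JSON.stringify(addItemUpdate({ sku: "A1", priceCents: 250, qty: 2 })));
// {"$push":{"items":{"sku":"A1","priceCents":250,"qty":2}},"$inc":{"totalCents":500,"itemCount":2}}
```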

Journal pre‑allocation

MongoDB pre‑allocates journal files, which can be slow on sluggish filesystems; the undocumented --nopreallocj flag disables this pre‑allocation.

NUMA on Linux

Running MongoDB on NUMA hardware can cause serious performance regressions; either disable NUMA (for example, in the BIOS) or start mongod with numactl --interleave=all.

Process limits on Linux

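Raise the open‑file and user‑process limits (for example, ulimit -n and ulimit -u to 4096 or higher); with low defaults, mongod can fail under heavy connection load, up to and including segmentation faults.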

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact admin@besthub.dev and we will review it promptly.

Written by 21CTO

21CTO (21CTO.com) offers developers community, training, and services, making it your go‑to learning and service platform.