MongoDB Master Russell Smith’s Essential Best‑Practice Checklist
This article compiles Russell Smith’s comprehensive MongoDB best‑practice guide, covering architecture choices, file limits, write safety, schema design, replication, sharding, security, and performance tuning to help engineers avoid common pitfalls and optimize production deployments.
MongoDB has recently drawn criticism on Hacker News, with many developers voicing their frustrations. MongoDB Master Russell Smith, an ops consultant for large sites and co-founder of the MongoDB London User Group, shares hard-won insights from operating high-throughput clusters.
32‑bit vs 64‑bit
MongoDB offers both 32‑bit and 64‑bit builds; the 32‑bit version is limited to 2 GB of data due to memory‑mapped files, so use the 64‑bit version for datasets larger than 2 GB or for sharded deployments.
Document size limits
Older MongoDB versions capped individual documents at 4 MB, while newer releases raise this to 16 MB. If you keep hitting this limit, it is usually a sign that the schema needs rethinking; use GridFS for large files, or store very large objects in an external service such as Amazon S3.
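As a rough illustration of checking a payload against the 16 MB limit before deciding where to store it, the sketch below estimates size from the JSON encoding. The helper names approximateSize and chooseStorage are assumptions made for this example, and real BSON sizes differ somewhat from JSON sizes, so treat the result as approximate.

```javascript
// Size cap cited above for newer releases (16 MB).
const MAX_DOCUMENT_BYTES = 16 * 1024 * 1024;

// Hypothetical helper: approximate a document's size by its JSON encoding.
// BSON adds type and length overhead, so this is only an estimate.
function approximateSize(doc) {
  return new TextEncoder().encode(JSON.stringify(doc)).length;
}

// Hypothetical helper: pick a storage strategy for a payload.
function chooseStorage(doc, limit = MAX_DOCUMENT_BYTES) {
  return approximateSize(doc) < limit ? "document" : "gridfs-or-external";
}
```

Leaving a safety margin below the hard limit is a reasonable design choice, since documents tend to grow over time.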
Write failures
By default MongoDB performs asynchronous, “unsafe” writes without immediate error reporting. To verify success, use getLastError or enable safe writes; for a balance of performance and safety, use getLastError with the ‘j’ option to wait for journal acknowledgment.
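A minimal sketch of the command document that such a check would send is shown below. The helper buildGetLastError is hypothetical; the j, w, and wtimeout fields are the options the getLastError command accepts. In the mongo shell the result would be passed to db.runCommand(...) right after the write, on the same connection.

```javascript
// Hypothetical helper that builds a getLastError command document.
// j: true asks the server to wait for the journal commit;
// w and wtimeout (optional) control replication acknowledgment.
function buildGetLastError({ journal = true, w, wtimeout } = {}) {
  const cmd = { getLastError: 1 };
  if (journal) cmd.j = true;
  if (w !== undefined) cmd.w = w;
  if (wtimeout !== undefined) cmd.wtimeout = wtimeout;
  return cmd;
}

// Journal acknowledgment only.
const journaled = buildGetLastError();

// Journal acknowledgment plus replication to two members, with a timeout.
const replicated = buildGetLastError({ w: 2, wtimeout: 5000 });
```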
Data‑model considerations
MongoDB does not enforce a rigid schema, allowing flexible document structures, but a well‑designed schema is still crucial for performance; consult MongoDB’s schema‑design guides for best practices.
Default single‑document updates
Update operations affect only one matching document unless the multi flag is set. Example:
db.people.update({age: {$gt: 30}}, {$set: {past_it: true}}, false, true)
The third argument is the upsert flag and the fourth is the multi flag; setting multi to true enables multi-document updates.
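To make the default behavior concrete without a running server, here is a small in-memory model of the semantics. It is not the MongoDB driver; the update function and the sample data are invented for illustration.

```javascript
// In-memory model of update semantics; this is NOT the MongoDB driver.
// With multi = false only the first matching document is changed.
function update(docs, matches, apply, multi = false) {
  let modified = 0;
  for (const doc of docs) {
    if (!matches(doc)) continue;
    apply(doc);
    modified += 1;
    if (!multi) break; // default behavior: stop after one document
  }
  return modified;
}

const people = [{ age: 25 }, { age: 35 }, { age: 45 }];
const olderThan30 = (p) => p.age > 30;
const markPastIt = (p) => { p.past_it = true; };

// Default call: only the first match (age 35) is touched.
const single = update(people, olderThan30, markPastIt);

// With multi = true, both matching documents are touched.
const all = update(people, olderThan30, markPastIt, true);
```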
Case‑sensitive queries
MongoDB queries are case‑sensitive by default; use regular expressions with the i flag for case‑insensitive matching, keeping in mind the performance impact.
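The sketch below shows one way to build such a pattern safely from user input, plus a common alternative that avoids the regex cost by storing a lowercased copy of the field. The helper names and the name_lower field are assumptions for this example.

```javascript
// Escape regex metacharacters so user input is matched literally.
function escapeRegex(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Build an anchored, case-insensitive pattern for an exact match.
function caseInsensitiveExact(value) {
  return new RegExp("^" + escapeRegex(value) + "$", "i");
}

// Alternative that can use an ordinary index: store a lowercased copy.
// The field name name_lower is a hypothetical convention.
function withLowercaseShadow(doc) {
  return { ...doc, name_lower: doc.name.toLowerCase() };
}

const pattern = caseInsensitiveExact("O'Brien (Jr.)");
// In the mongo shell this could be used as: db.people.find({ name: pattern })
```

Querying the lowercased field with a lowercased search term is an exact match, which is generally cheaper than a case-insensitive regex.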
No input tolerance
Because MongoDB lacks predefined field types, it will store any data you insert; ensure you validate data types before insertion.
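A minimal application-side check might look like the following. The schema shape (field name mapped to an expected typeof result) and the validate function are assumptions for illustration, not a MongoDB feature.

```javascript
// Hypothetical schema: field name -> expected typeof result.
const personSchema = { name: "string", age: "number", active: "boolean" };

// Return a list of problems; an empty list means the document passed.
function validate(doc, schema) {
  const errors = [];
  for (const [field, expected] of Object.entries(schema)) {
    if (!(field in doc)) {
      errors.push(`missing field: ${field}`);
    } else if (typeof doc[field] !== expected) {
      errors.push(`${field}: expected ${expected}, got ${typeof doc[field]}`);
    } else if (expected === "number" && !Number.isFinite(doc[field])) {
      errors.push(`${field}: must be a finite number`);
    }
  }
  return errors;
}

// The string "31" would be stored as-is by MongoDB; the check catches it first.
const problems = validate({ name: "Ann", age: "31", active: true }, personSchema);
```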
Locking
Early versions used a global write lock; version 2.2 introduced database-level locks, and later releases added collection-level locks. Use the latest stable version for optimal concurrency.
Package installation
On Ubuntu/Debian, prefer the official 10gen repository to obtain up‑to‑date MongoDB packages.
Replica set member count
Replica sets should have an odd number of voting members so that elections can always reach a majority; if cost limits you to an even number of data-bearing members, add an arbiter, which votes but holds no data.
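The arithmetic behind the odd-number guideline can be sketched as follows, assuming elections need a strict majority of voting members. The function names are invented for this example.

```javascript
// Votes needed to elect a primary: a strict majority of voting members.
function majority(votingMembers) {
  return Math.floor(votingMembers / 2) + 1;
}

// Number of voting members that can be lost while a majority remains.
function faultTolerance(votingMembers) {
  return votingMembers - majority(votingMembers);
}

for (const n of [3, 4, 5]) {
  console.log(n, "members: majority", majority(n), "tolerates", faultTolerance(n), "failure(s)");
}
```

A four-member set tolerates no more failures than a three-member set, which is why an even count buys nothing on its own.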
No joins
MongoDB does not support joins; design your schema to denormalize data and reduce the number of queries.
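As a sketch of what denormalizing means in practice, the example below copies the author fields a page needs into each post, so a single query returns everything. The collections and field names are hypothetical.

```javascript
// Hypothetical normalized data, as it might look in a relational design.
const authors = [{ _id: 1, name: "Ada" }, { _id: 2, name: "Linus" }];
const posts = [
  { _id: 10, author_id: 1, title: "Schema notes" },
  { _id: 11, author_id: 2, title: "Kernel notes" },
];

// Copy the fields a page needs into each post so one query is enough.
function denormalize(posts, authors) {
  const byId = new Map(authors.map((a) => [a._id, a]));
  return posts.map((post) => {
    const author = byId.get(post.author_id);
    return { ...post, author: author ? { _id: author._id, name: author.name } : null };
  });
}

const embedded = denormalize(posts, authors);
```

The trade-off is that duplicated fields, such as the author name here, must be updated in every document that embeds them.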
Journaling
Since version 2.0, journaling is enabled by default, reducing data‑loss windows from 60 seconds to 100 ms at a modest (~5 %) performance cost.
Authentication defaults
MongoDB disables authentication by default, assuming a trusted network; enable authentication and firewall rules for production deployments.
Replica‑set data loss
When a former primary rejoins the set after a failover, any writes it had not yet replicated are rolled back and saved in the rollback directory for manual recovery.
Sharding timing
Enable sharding before the cluster reaches ~80 % of its capacity; monitor with tools like MMS, Munin, or CloudWatch.
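A monitoring check for this threshold could be as simple as the sketch below. The helper is hypothetical, and the usage and capacity figures are assumed to come from whatever monitoring tool you already run.

```javascript
// Threshold from the guideline above: act before roughly 80 % utilization.
const SHARDING_THRESHOLD = 0.8;

// Hypothetical helper: the inputs would come from your monitoring system.
function shouldStartSharding(usedBytes, capacityBytes, threshold = SHARDING_THRESHOLD) {
  if (capacityBytes <= 0) throw new RangeError("capacityBytes must be positive");
  return usedBytes / capacityBytes >= threshold;
}
```

In practice you would alert well before the threshold, because adding shards and rebalancing data itself consumes capacity.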
Shard key immutability
Shard keys cannot be changed after document insertion; to modify, delete and re‑insert the document with a new key.
Sharding large collections
Collections larger than 256 GB cannot be sharded in older versions; shard before reaching this size.
Unique indexes and sharding
On a sharded collection, uniqueness can only be enforced through the shard key, because each shard can check only its own data.
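One way to check a planned index against this constraint is sketched below, assuming the rule that a unique index on a sharded collection must have the shard key as its prefix. The helper name and the example keys are invented for illustration.

```javascript
// Hypothetical helper: true when the unique index's leading fields are
// exactly the shard key fields, in the same order.
function hasShardKeyPrefix(shardKey, indexKey) {
  const shardFields = Object.keys(shardKey);
  const indexFields = Object.keys(indexKey);
  return shardFields.length <= indexFields.length &&
    shardFields.every((field, i) => indexFields[i] === field);
}

const shardKey = { user_id: 1 };
const okIndex = hasShardKeyPrefix(shardKey, { user_id: 1, email: 1 });
const badIndex = hasShardKeyPrefix(shardKey, { email: 1 });
```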
Choosing the right shard key
Select an appropriate shard key early, as changing it later is difficult.
Unencrypted communication
By default, MongoDB connections are unencrypted; use SSL/TLS (available in the 10gen/Enterprise builds) for public‑network access.
Transactions
MongoDB provides atomicity only at the single‑document level; multi‑document transactions require application‑level workarounds.
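A common way to work within single-document atomicity is to keep related fields in one document and change them with one update. The sketch below builds such an update; the accounts collection, the ledger field, and the buildDebit helper are hypothetical.

```javascript
// Hypothetical helper: build a query and update for one account document.
// The balance guard in the query and the $inc/$push in the update are
// applied to a single document, so they succeed or fail together.
function buildDebit(accountId, amount, note) {
  if (!(amount > 0)) throw new RangeError("amount must be positive");
  return {
    query: { _id: accountId, balance: { $gte: amount } },
    update: {
      $inc: { balance: -amount },
      $push: { ledger: { amount: -amount, note } },
    },
  };
}

const debit = buildDebit("acct-1", 25, "order 42");
// In the mongo shell: db.accounts.update(debit.query, debit.update)
```

If the balance is too low, the query matches nothing and neither the balance nor the ledger changes.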
Journal pre-allocation
Pre-allocating journal files can be slow on sluggish filesystems; the undocumented --nopreallocj flag can disable it.
NUMA on Linux
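Running MongoDB on NUMA hardware with the default memory policy is discouraged; disable NUMA, or start mongod under an interleaved memory policy (for example with numactl --interleave=all), to avoid performance regressions.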
Process limits on Linux
Increase the open-file and process limits to 4096 or higher (e.g., ulimit -n 4096 for open files and ulimit -u 4096 for processes); low default limits can lead to segmentation faults under load.
21CTO
21CTO (21CTO.com) offers developers community, training, and services, making it your go‑to learning and service platform.
