
Performance Optimization of MongoDB for Million‑Scale Collections Using Shared Table Space Architecture

The Tencent CMongo team analyzed the severe memory consumption, slow queries, and hour-long startups caused by millions of MongoDB collections, identified bottlenecks in WiredTiger's data-handle management and schema-lock handling, and introduced a shared-table-space architecture that keeps the number of WT tables down to a small fixed set, achieving speedups of 1-2 orders of magnitude and eliminating OOM failures.

Tencent Database Technology

During operation of Tencent MongoDB services, the CMongo team discovered that when the number of collections and indexes grew to the million‑scale, memory usage surged, queries slowed dramatically, and instance startup could take hours, often leading to OOM crashes.

Root‑cause analysis pinpointed three critical issues in the WiredTiger storage engine: excessive data handle (dhandle) creation, inefficient WT Cache and OS Cache usage, and a schema‑lock contention path that blocked CRUD operations, especially before tables were pre‑warmed.

WiredTiger stores each collection in an independent WT table; the engine manages extents via three linked lists: the available list, the discard list, and the allocate list. Data handle (dhandle) objects are cached both globally and per session, and a sweep thread (running every 10 s) closes idle handles only once the close_idle_time threshold is reached. The default value (100 000 s, ≈28 h) caused handles to linger, inflating memory and preventing timely cleanup.

By lowering close_idle_time from the default 100 000 s to 600 s in a 2-core, 4 GB replica set test, the team observed stable memory usage and no OOM even after creating 10 000 collections, confirming that aggressive handle reclamation mitigates the memory problem.
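In a MongoDB deployment, this threshold can be passed through to WiredTiger via the engine configuration string. A sketch of the setting used in the experiment above, assuming the standard storage.wiredTiger.engineConfig.configString option (the exact mongod.conf layout here is illustrative):

```yaml
# mongod.conf fragment: forward file_manager settings to WiredTiger.
# close_idle_time=600 matches the experiment above (stock default: 100000 s);
# close_scan_interval=10 is the sweep-thread interval mentioned in the article.
storage:
  wiredTiger:
    engineConfig:
      configString: "file_manager=(close_idle_time=600,close_scan_interval=10)"
```

Note that this only accelerates reclamation of idle handles; it does not shrink the working set of handles that are actively in use.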

Read‑write performance analysis revealed that the default hash bucket count (512) leads to long lookup chains in the data‑handle cache when millions of tables exist, dramatically increasing latency. The function __wt_session_get_dhandle was identified as a hotspot.
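The effect of a fixed bucket count is easy to estimate: under uniform hashing, the expected chain length grows linearly with the number of cached dhandles. A back-of-the-envelope sketch (the 512-bucket figure is from the article; the million-handle input is illustrative):

```python
# Expected dhandle-cache chain length under uniform hashing: a lookup such
# as __wt_session_get_dhandle walks, on average, about half of one bucket's
# chain before finding (or missing) the handle.

def expected_chain_scan(num_dhandles: int, num_buckets: int = 512) -> float:
    """Average entries examined per lookup, assuming uniform hashing."""
    chain_len = num_dhandles / num_buckets
    return chain_len / 2

# A few thousand collections: chains stay short, lookups are effectively O(1).
print(expected_chain_scan(1_000))       # ≈ 1 entry walked per lookup
# A million collections/indexes: ≈ 976 entries walked per lookup.
print(expected_chain_scan(1_000_000))
```

This is why the hotspot surfaces only at million-table scale: the hash table itself never resizes, so the per-lookup cost degrades from constant to linear in the number of tables.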

To break the scaling barrier, the team designed a shared-table-space architecture at the KVEngine abstraction layer. By mapping many logical MongoDB collections onto a single physical WT table using a prefix scheme recorded in _mdb_catalog, the number of WT tables stays constant (nine files such as WiredTiger.wt, _mdb_catalog.wt, collection.wt, etc.), eliminating the data-handle explosion and the associated lock contention.
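The mapping itself can be pictured as a key-prefixing scheme: each logical collection is assigned a small identifier recorded in the catalog, and every key written to the shared physical table is the concatenation of that prefix and the logical key. A minimal sketch, with the class name and fixed-width prefix encoding being illustrative rather than CMongo's actual on-disk format:

```python
import struct

class SharedTableSpace:
    """Many logical collections multiplexed into one sorted KV table.

    Each collection gets a fixed-width integer prefix (the real design
    records this mapping in _mdb_catalog); a physical key is the prefix
    followed by the logical key, so one table serves every collection.
    """

    def __init__(self):
        self._table = {}      # stand-in for the single physical WT table
        self._prefixes = {}   # collection name -> prefix id
        self._next_prefix = 0

    def _prefix(self, collection: str) -> bytes:
        if collection not in self._prefixes:
            self._prefixes[collection] = self._next_prefix
            self._next_prefix += 1
        return struct.pack(">I", self._prefixes[collection])

    def put(self, collection: str, key: bytes, value: bytes) -> None:
        self._table[self._prefix(collection) + key] = value

    def get(self, collection: str, key: bytes):
        return self._table.get(self._prefix(collection) + key)

    def scan(self, collection: str):
        """A scan of one collection becomes a prefix scan of the shared table."""
        p = self._prefix(collection)
        for k in sorted(self._table):
            if k.startswith(p):
                yield k[len(p):], self._table[k]

ts = SharedTableSpace()
ts.put("users", b"alice", b"{...}")
ts.put("orders", b"alice", b"{...}")   # same logical key, no clash
print(ts.get("users", b"alice"))       # b'{...}'
```

Because the prefix keeps each collection's keys contiguous in the sorted key space, per-collection scans remain efficient while the storage engine only ever opens one set of data handles.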

Benchmarks after the redesign showed read/write throughput improvements of 1-2 orders of magnitude, P99 latency reductions, and startup time shrinking from dozens of minutes to under one minute. Real-world online services (e.g., instant messaging) migrated to the new kernel and reported stable performance with dramatically lower memory footprints.

The paper concludes that sharing table space via prefix mapping is an effective, low‑risk way to scale MongoDB to millions of collections, and outlines future work such as extending multi‑threaded initialization and handling remaining feature limitations.

Tags: performance, optimization, scalability, Storage Engine, MongoDB, Shared Table Space, WiredTiger
Written by

Tencent Database Technology

Tencent's Database R&D team supports internal services such as WeChat Pay, WeChat Red Packets, Tencent Advertising, and Tencent Music, and provides external support on Tencent Cloud for TencentDB products like CynosDB, CDB, and TDSQL. This public account aims to promote and share professional database knowledge, growing together with database enthusiasts.
