Databases 30 min read

How OPPO Turned MongoDB from a Niche DB into a Core, Scalable, Multi‑Data‑Center System

This article details OPPO's journey of adopting MongoDB at massive scale, addressing common misconceptions, designing multi‑data‑center active‑active architectures, optimizing thread models and migration processes, and applying performance‑boosting techniques that yielded multi‑fold improvements for trillion‑row clusters.

dbaplus Community

Jan 26, 2021

How OPPO Turned MongoDB from a Niche DB into a Core, Scalable, Multi‑Data‑Center System

Background

When joining OPPO, several high‑traffic services using MongoDB suffered frequent timeouts and jitter, prompting a migration plan to MySQL. After a month of intensive work on the service layer, storage engine, and deployment configurations, jitter was eliminated and MongoDB remained the primary data store.

Active‑Active Multi‑Data‑Center Architecture

Three deployment patterns were evaluated:

Same‑city three‑data‑center (1mongod+1mongod+1mongod) – each data‑center runs at least two proxy instances for high availability; clients use the nearest read preference. Cross‑region writes appear only when data‑centers are geographically separated.

Same‑city two‑data‑center (2mongod+2mongod+1arbiter) – adds an arbiter to reduce election overhead; the same read‑preference behavior applies.

Inter‑region three‑data‑center (1mongod+1mongod+1mongod) – uses tag‑based routing so that writes are directed to the shard whose primary resides in the same region, eliminating cross‑region writes.

Thread‑Model Bottlenecks and Optimizations

The default model creates one thread per client connection, leading to excessive memory and CPU consumption when connections reach 100 k. MongoDB provides an adaptive thread‑pool model that splits each request into tasks (network I/O, disk I/O) placed in a global queue processed by a pool of worker threads. The global lock on the queue becomes a contention point.

Optimized multi‑queue model partitions the global queue by session hash, reducing lock contention and improving throughput.

// Enable adaptive thread pool (pseudo‑code)
setParameter: 1,
"wiredTigerEngineRuntimeConfig": "cache_size=35GB, eviction=(threads_min=4,threads_max=12)"

Parallel Chunk Migration for Cluster Expansion

To expand a 3‑node shard to 6 nodes, the migration workflow is:

Select chunks to move (S = min(M, N)).

Acquire a lock on config.locks for the target collection.

Notify source shards to start migration.

After migration, wait 10 s and repeat.

Key bottlenecks: distributed‑lock contention, a slow chunk on any source shard, and the fixed 10 s pause. Optimizations include disabling auto‑split or increasing chunksize, running migrations for different shards in parallel, and moving the delay into per‑shard logic with a configurable CLI parameter.

Performance‑Boosting Case Studies

Case 1 – Billion‑Row Cluster

Pre‑sharding and write‑load balancing.

WriteConcern {w:"majority"} and read‑write separation.

Disable enableMajorityReadConcern to reduce latency.

Storage‑engine cache tuning:

eviction_target: 75%
eviction_trigger: 97%
eviction_dirty_target: 3%
eviction_dirty_trigger: 25%
evict.threads_min: 4
evict.threads_max: 16

Checkpoint interval reduced to avoid I/O spikes: checkpoint=(wait=30,log_size=1GB) Sharding the system.sessions collection and staggering updates eliminated massive write spikes.

Enable tcmalloc (gperftools) and increase release rate to mitigate page‑heap fragmentation.

Case 2 – Trillion‑Row Cluster

Original schema stored a single characteristic with millions of sub‑documents in an array, causing massive I/O. Redesign steps:

Group sub‑documents under a group array per characteristic (≈100× read reduction).

Add a hashNum field and split the large array into ~500 shards, each holding a manageable number of sub‑documents, staying below the 64 MB document limit.

Result: dramatic latency reduction; write bottlenecks addressed by limiting array size and using $addToSet wisely.

Cost‑Saving Migration Example

To move several hundred billion rows from a high‑IO SSD cluster to a low‑IO SATA cluster:

Copy data to an intermediate SSD cluster.

Transfer from the SSD cluster to the target SATA servers.

Load data into the destination cluster.

Using zlib compression (4.5‑7.5×) instead of the default snappy (2.2‑3.5×) achieved an ~8:1 storage reduction, allowing the workload to run on a handful of SATA nodes instead of hundreds of SSD nodes.

Future Outlook – MongoDB + SQL Fusion

MongoDB 4.2 introduced distributed transactions; 4.4 plans further NewSQL features. To ease developer adoption, a proposal is to extend mongos to translate 5‑10 % of SQL statements into native MongoDB queries, covering ~90 % of use cases while keeping per‑database least‑privilege accounts.

Operational Tools and Practices

mongostat --discover -h ip:port -u user -p pass --authenticationDatabase=admin

– monitors all shard nodes.

Slow‑log analysis via

tail mongod.log -n 1000000 | grep ms | grep COLLSCAN | grep -v "getMore" | grep -v "oplog.rs"

Kill long‑running ops:

db.currentOp({"secs_running":{$gt:5}}).inprog.forEach(function(i){db.killOp(i.opid)})

Session tagging for write routing:

sh.addShardTag('shard01','junior')
sh.addTagRange('test.users', {age: MinKey}, {age:20}, 'junior')

Security: per‑database users, readWrite roles without destructive privileges, IP whitelist/blacklist, audit and traffic‑control (enterprise or Percona MongoDB), hot‑backup instead of mongodump.

Shard‑key design: ensure data dispersion and keep related reads/writes on the same shard; use range sharding for range queries.

Capacity alerts: disk >80 % → add shard; write >35 k ops/s or read >40 k ops/s on SSD → add node; CPU >80 % → scale CPU.

Key Takeaways

Systematic analysis of MongoDB internals and proper configuration can eliminate jitter and scale to trillion‑row workloads.

Active‑active multi‑data‑center designs with tag‑based routing avoid cross‑region writes.

Adaptive multi‑queue thread model and storage‑engine tuning dramatically improve throughput.

Parallel chunk migration and data‑model redesign enable N‑fold cluster expansion.

Compression and hardware‑tier selection provide massive cost savings.

Original Source

Signed-in readers can open the original source through BestHub's protected redirect.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.

PerformanceOptimization MongoDB DatabaseSecurity CostSaving MultiDataCenter ParallelMigration ThreadModel

Written by

dbaplus Community

Enterprise-level professional community for Database, BigData, and AIOps. Daily original articles, weekly online tech talks, monthly offline salons, and quarterly XCOPS&DAMS conferences—delivered by industry experts.

0 followers

Reader feedback

How this landed with the community

Rate this article

Was this worth your time?

Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.

Background

Active‑Active Multi‑Data‑Center Architecture

Thread‑Model Bottlenecks and Optimizations

Parallel Chunk Migration for Cluster Expansion

Performance‑Boosting Case Studies

Case 1 – Billion‑Row Cluster

Case 2 – Trillion‑Row Cluster

Cost‑Saving Migration Example

Future Outlook – MongoDB + SQL Fusion

Operational Tools and Practices

Key Takeaways

dbaplus Community

How this landed with the community

Was this worth your time?

0 Comments

Case 1 – Billion‑Row Cluster

Case 2 – Trillion‑Row Cluster

Future Outlook – MongoDB + SQL Fusion