Databases 21 min read

How Migrating 40 B Records from MySQL to MongoDB Cut Costs and Boost Performance

A large‑scale IoT service migrated 40 billion rows from MySQL to MongoDB, solving capacity, cost, and data‑skew issues while achieving higher performance, lower storage consumption, and a flexible, highly available architecture, with detailed resource evaluation, migration steps, optimization tactics, and cost‑benefit analysis.

dbaplus Community

May 25, 2021

How Migrating 40 B Records from MySQL to MongoDB Cut Costs and Boost Performance

Background

The IoT core service stored ~40 billion rows in MySQL across 64 sharded clusters (256 nodes). Rapid growth caused storage limits, high disk costs, and data‑skew, prompting migration to MongoDB.

Why MongoDB – Core Advantages

Schema‑free : No strict schema, simplifying data modeling.

Native high availability : Replica sets use Raft, eliminating third‑party HA components.

Distributed architecture : Handles sharding and massive data without pre‑splitting tables.

Advanced balancing & shard strategies : Automatic/manual balance, range or hash sharding, multi‑field shard keys.

Configurable consistency : Adjustable read/write concerns and rollback mechanisms.

High concurrency & performance : Optimized thread model and storage engine.

WiredTiger engine : Low latency, high throughput, strong compression (snappy 2.2‑4.5×, zlib 4.5‑7.5×).

Cost savings : Compression reduces disk usage dramatically.

Multi‑datacenter HA : N‑zone active‑active deployment.

Client‑side read routing : Primary, secondary, nearest etc., via readPreference.

Technical details of WiredTiger are available at http://source.wiredtiger.com/3.2.1/architecture.html.

Resource Evaluation & Deployment Architecture

Data volume: ~40 billion rows (~30 TB raw MySQL storage) growing at ~200 billion rows per month. The MongoDB cluster was sized as follows:

4 shards, each with a replica set of 4 nodes.

Node spec: 16 CPU, 64 GB RAM, 7 TB SSD.

mongos & config server combined in a single container (8 CPU, 8 GB RAM, 50 GB disk).

Cluster Deployment

Two‑city deployment using a 2+2+1 pattern (2 mongod + 2 mongod + 1 arbiter) ensures high availability. Each data center runs two mongos proxies with nearest read preference for locality.

Migration Process

The migration used Alibaba’s open‑source DataX tool for full‑load and incremental sync. Key steps:

Identify ssoid as the shard key (hashed) and create a hashed index.

Pre‑split the collection into 8192 chunks to avoid chunk migrations during load:

sh.shardCollection("db.collection", {ssoid:"hashed"}, false, {numInitialChunks:8192})

Additional settings:

Enable nearest read preference per data‑center.

Disable enableMajorityReadConcern because the workload does not require strict majority reads.

Configure WiredTiger cache size to 42 GB (later increased to 55 GB after migration).

Performance Optimizations

During full‑load, high write volume caused dirty page pressure. The team tuned WiredTiger eviction threads:

db.adminCommand({setParameter:1, "wiredTigerEngineRuntimeConfig":"eviction=(threads_min=4, threads_max=20)"})

After migration, cache size was raised to 55 GB and nightly cache release jobs were added to prevent OOM.

Latency measurements (average):

MySQL (300 B rows) – 7 ms.

MongoDB (500 B rows) – 6 ms.

Cost & Benefit Analysis

At a 400 billion‑row baseline:

CPU : Memory cost ratio – MySQL 4 : 1, MongoDB 16 : 1.

Disk cost ratio – roughly 3.3 : 1 in favor of MongoDB due to WiredTiger compression.

Scaling to 1500 billion rows would require 256 MySQL clusters versus a single 4‑shard MongoDB cluster, yielding multi‑fold cost savings (estimated 10× on container billing).

Best Practices & Future Challenges

Operational recommendations for billion‑scale MongoDB clusters:

Build appropriate indexes before loading data; missing indexes degrade query performance.

Choose shard keys carefully – re‑sharding large collections is costly.

Prefer hot‑backup or file‑copy methods over mongodump/mongorestore for massive data.

When replacing nodes, restore from backups rather than full replication to reduce downtime.

Future scaling to trillions of rows will likely require ~20 shards and cold‑data archiving to SATA drives to control storage costs.

Reference implementation and source code are available at https://github.com/y123456yz/reading-and-annotate-mongodb-3.6.

Original Source

Signed-in readers can open the original source through BestHub's protected redirect.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.

Performance Optimization mysql database migration MongoDB Cost Savings

Written by

dbaplus Community

Enterprise-level professional community for Database, BigData, and AIOps. Daily original articles, weekly online tech talks, monthly offline salons, and quarterly XCOPS&DAMS conferences—delivered by industry experts.

0 followers

Reader feedback

How this landed with the community

Rate this article

Was this worth your time?

Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.