Why and How to Migrate from MongoDB to Elasticsearch: A Practical Guide
This article explains the motivations for moving a high‑volume operation‑log system from MongoDB to Elasticsearch, outlines the existing architecture, details capacity planning, index design, and a step‑by‑step migration process using Kafka, DataX, and Spring Boot, and shares the performance gains and lessons learned.
Background
The author, an experienced Elastic Stack user and Elastic Certified Engineer, describes an express-logistics system that originally stored its daily operation logs in MongoDB. Each log consists of a master record (who did what, when, and where) and detail records describing the individual changes; together they amount to billions of records.
Existing Architecture
Data flow before migration:
Business system writes new or edited records to MySQL.
Canal monitors MySQL binlog, filters tables per module, and forwards changes to a Kafka cluster using dataId as the key.
The log‑record service consumes from Kafka, writes master records to MongoDB and also stores a copy for reverse lookup.
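Using dataId as the message key matters because Kafka only guarantees ordering within a single partition. The sketch below illustrates the idea with a simplified partitioner. It uses CRC32 as a stand-in hash (Kafka's default partitioner uses a different hash function), and the function name, partition count, and sample events are assumptions made for illustration.

```python
import zlib

NUM_PARTITIONS = 6  # assumed partition count, not taken from the article


def partition_for(data_id: str, num_partitions: int = NUM_PARTITIONS) -> int:
    """Map a message key to a partition with a stable hash (CRC32 stand-in)."""
    return zlib.crc32(data_id.encode("utf-8")) % num_partitions


# Every change event with the same dataId is routed to the same partition,
# so a consumer reads those events in the order they were produced.
events = [("1001", "insert"), ("2002", "insert"), ("1001", "update")]
routed = [(key, op, partition_for(key)) for key, op in events]
```

Events for different dataId values may land on different partitions and can be interleaved arbitrarily, which is why only per-key ordering can be assumed downstream.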
Why Switch to Elasticsearch
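MongoDB's B-tree compound indexes follow the leftmost-prefix rule: a query can use an index efficiently only when it filters on the leading fields of that index, in order. The log queries combine conditions too freely for any fixed set of compound indexes to cover them.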
Both master and detail records need exact match and full‑text search; MongoDB performs poorly and often times out.
Elasticsearch offers faster multi-condition queries, shards that can be rebalanced across nodes instead of being bound to fixed servers, and better handling of large-scale time-series data.
Operational costs drop dramatically: a 15‑node MongoDB cluster can be replaced by a 3‑node Elasticsearch cluster.
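The leftmost-prefix limitation in the first point can be illustrated with a deliberately simplified model. The function below only counts how many leading fields of a compound index a query filters on; it ignores sorting, range predicates, and everything else a real query planner considers. The index definition is hypothetical, with field names borrowed from the sample document later in this article.

```python
from typing import Sequence, Set


def usable_prefix_len(index_fields: Sequence[str], filter_fields: Set[str]) -> int:
    """Count how many leading index fields the query filters on."""
    count = 0
    for field in index_fields:
        if field not in filter_fields:
            break  # the prefix is broken; later index fields cannot help
        count += 1
    return count


compound_index = ["moduleCode", "operationId", "departmentId"]  # hypothetical index

usable_prefix_len(compound_index, {"moduleCode", "operationId"})   # 2
usable_prefix_len(compound_index, {"moduleCode", "departmentId"})  # 1
usable_prefix_len(compound_index, {"departmentId"})                # 0
```

With users free to combine any subset of fields, many queries fall into the last case. Elasticsearch indexes each field independently and intersects the per-field results at query time, so field order does not matter there.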
Capacity Planning for Elasticsearch
Assuming 1 billion documents in a MongoDB collection, a test sync of 1 million records occupies ~10 GB. Scaling to production suggests roughly 1 TB of disk space. The proposed Elasticsearch cluster uses three servers, each with 8 CPU cores, 16 GB of RAM, and a 2 TB HDD.
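The estimate above is a linear extrapolation from a sample load, which can be written down as a small helper. This is a sketch: the function names are illustrative, and it assumes on-disk size grows proportionally with document count, which mappings and compression can change. Note that plugging in the figures exactly as quoted (1 million documents at ~10 GB) yields about 10 TB for primary shards alone rather than 1 TB, so the sample figures are worth re-measuring before sizing hardware. A sample of 10 million documents at ~10 GB would match the 1 TB estimate.

```python
def estimate_primary_storage_gb(total_docs: int, sample_docs: int, sample_size_gb: float) -> float:
    """Scale the disk footprint of a sample load up to the full document count."""
    return total_docs * sample_size_gb / sample_docs


def with_replicas(primary_gb: float, replicas: int) -> float:
    """Total footprint once every primary shard has `replicas` copies."""
    return primary_gb * (1 + replicas)


# Figures as quoted in the text: 1 billion documents, 1 million-document sample at ~10 GB.
as_quoted_gb = estimate_primary_storage_gb(1_000_000_000, 1_000_000, 10)  # 10000.0 GB

# The same formula with a 10 million-document sample gives the 1 TB in the text.
alt_sample_gb = estimate_primary_storage_gb(1_000_000_000, 10_000_000, 10)  # 1000.0 GB

# One replica doubles the footprint, which still fits in 3 x 2 TB of raw disk.
alt_total_gb = with_replicas(alt_sample_gb, replicas=1)  # 2000.0 GB
```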
Index Design
Because logs are time‑series, primary indices are created per month, with non‑core data indexed yearly. Queries must include a time range so the backend can determine which indices to search. Elasticsearch’s multi‑index query capability handles cross‑month aggregation.
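A minimal sketch of how a backend might turn a query's time range into a list of monthly indices. The `operation-log-YYYY.MM` naming scheme and the function name are assumptions; the article does not state the actual index names.

```python
from datetime import date
from typing import List

INDEX_PREFIX = "operation-log"  # hypothetical naming scheme


def monthly_indices(start: date, end: date, prefix: str = INDEX_PREFIX) -> List[str]:
    """List one index name per calendar month touched by [start, end]."""
    if start > end:
        raise ValueError("start must not be after end")
    names = []
    year, month = start.year, start.month
    while (year, month) <= (end.year, end.month):
        names.append(f"{prefix}-{year:04d}.{month:02d}")
        month += 1
        if month > 12:
            year, month = year + 1, 1
    return names


targets = monthly_indices(date(2020, 11, 15), date(2021, 1, 3))
# ['operation-log-2020.11', 'operation-log-2020.12', 'operation-log-2021.01']

# Elasticsearch accepts a comma-separated list of indices in the search path.
search_path = "/" + ",".join(targets) + "/_search"
```

Requiring a time range keeps each search confined to the few indices that can contain matches, instead of fanning out to every month ever written.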
Core Implementation Logic
Log records are consumed from Kafka in order. Two scenarios must be handled:
Master data arrives before detail data – the system must assemble a complete record before indexing.
Detail data arrives first – the system must later associate it with the master record.
Because dataId and traceId are not unique, update_by_query cannot be relied upon. Instead, a temporary Elasticsearch index is used as a cache, with _id = dataId + traceId, storing an array of detailId values.
{
  "dataId": 1,
  "traceId": "abc",
  "moduleCode": "crm_01",
  "operationId": 100,
  "operationName": "Zhang San",
  "departmentId": 1000,
  "departmentName": "Customer Department",
  "operationContent": "Visit customer",
  "detailId": [1,2,3,4,5,6]
}
Key Elasticsearch APIs used during migration:
_mget: fetches multiple detail records in a single request.
_bulk: inserts the transformed documents in batches.
_delete_by_query: cleans up the temporary cache index after migration.
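The two arrival orders can be handled with the same cache document, as the following sketch suggests. It replaces the temporary Elasticsearch index with an in-memory dictionary so the logic is runnable on its own. The class and method names, the `hasMaster` marker, and the `_` separator in the document ID are all assumptions, not details from the original implementation.

```python
from typing import Any, Dict, Optional

CacheDoc = Dict[str, Any]


class TempIndexCache:
    """In-memory stand-in for the temporary Elasticsearch cache index."""

    def __init__(self) -> None:
        self._docs: Dict[str, CacheDoc] = {}

    @staticmethod
    def doc_id(data_id: int, trace_id: str) -> str:
        # Separator added (assumption) so that (1, "23") and (12, "3") do not collide.
        return f"{data_id}_{trace_id}"

    def _entry(self, data_id: int, trace_id: str) -> CacheDoc:
        placeholder = {"dataId": data_id, "traceId": trace_id,
                       "detailId": [], "hasMaster": False}
        return self._docs.setdefault(self.doc_id(data_id, trace_id), placeholder)

    def on_master(self, master: CacheDoc) -> None:
        """Merge master fields, keeping any detail IDs that arrived earlier."""
        entry = self._entry(master["dataId"], master["traceId"])
        for key, value in master.items():
            if key != "detailId":
                entry[key] = value
        entry["hasMaster"] = True

    def on_detail(self, data_id: int, trace_id: str, detail_id: int) -> None:
        """Record a detail ID, creating a placeholder if the master is missing."""
        entry = self._entry(data_id, trace_id)
        if detail_id not in entry["detailId"]:
            entry["detailId"].append(detail_id)

    def ready_doc(self, data_id: int, trace_id: str) -> Optional[CacheDoc]:
        """Return the assembled record once the master has been seen, else None."""
        entry = self._docs.get(self.doc_id(data_id, trace_id))
        if entry is None or not entry["hasMaster"]:
            return None
        return {k: v for k, v in entry.items() if k != "hasMaster"}


cache = TempIndexCache()
cache.on_detail(1, "abc", 2)            # detail arrives first
cache.on_master({"dataId": 1, "traceId": "abc", "moduleCode": "crm_01"})
cache.on_detail(1, "abc", 3)            # detail arrives after the master
assembled = cache.ready_doc(1, "abc")
```

In the real system, the detail IDs collected this way would be resolved with `_mget` before the complete record is written with `_bulk`.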
Migration Process
1. Data Migration – DataX is chosen as the ETL tool because the log data is historical, the migration is one‑off, and the volume (tens of billions of rows) requires parallelism and custom transformations (date conversion, _id generation, duplicate handling).
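The three transformations named above (date conversion, _id generation, duplicate handling) can be sketched independently of DataX. The code below is not DataX configuration or plugin code; it is a plain illustration that turns source rows into the newline-delimited body expected by the Elasticsearch `_bulk` API. The `operationTime` field (epoch milliseconds), the function name, and the index name are hypothetical.

```python
import json
from datetime import datetime, timezone
from typing import Any, Dict, Iterable


def to_bulk_body(rows: Iterable[Dict[str, Any]], index_name: str) -> str:
    """Build an NDJSON _bulk body: one action line plus one source line per row."""
    seen_ids = set()
    lines = []
    for row in rows:
        doc_id = f"{row['dataId']}_{row['traceId']}"   # _id generation
        if doc_id in seen_ids:                          # duplicate handling
            continue
        seen_ids.add(doc_id)
        doc = dict(row)
        # Date conversion: epoch milliseconds to an ISO 8601 UTC timestamp.
        moment = datetime.fromtimestamp(row["operationTime"] / 1000, tz=timezone.utc)
        doc["operationTime"] = moment.isoformat()
        lines.append(json.dumps({"index": {"_index": index_name, "_id": doc_id}}))
        lines.append(json.dumps(doc, ensure_ascii=False))
    # The bulk API requires the body to end with a newline.
    return "\n".join(lines) + "\n" if lines else ""


rows = [
    {"dataId": 1, "traceId": "abc", "operationTime": 86_400_000},
    {"dataId": 1, "traceId": "abc", "operationTime": 86_400_000},  # duplicate
]
body = to_bulk_body(rows, "operation-log-1970.01")
```

Using a deterministic `_id` also makes reruns safe: re-sending the same row overwrites the existing document instead of creating a second copy.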
2. Index Settings Adjustment – Temporary settings speed up bulk loading:
"index.number_of_replicas": 0,
"index.refresh_interval": "30s",
"index.translog.flush_threshold_size": "1024M",
"index.translog.durability": "async",
"index.translog.sync_interval": "5s"
After the data load, the original settings are restored.
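Restoring the settings is easy to forget, so it helps to derive the restore request from the same keys that were overridden. The sketch below only builds the JSON bodies that would be sent to the index settings endpoint (`PUT /<index>/_settings`). The saved original values are assumptions, and whether each setting can be changed on a live index depends on the Elasticsearch version in use.

```python
import json
from typing import Any, Dict

# Temporary overrides used during the bulk load, as listed above.
BULK_LOAD_SETTINGS: Dict[str, Any] = {
    "index.number_of_replicas": 0,
    "index.refresh_interval": "30s",
    "index.translog.flush_threshold_size": "1024M",
    "index.translog.durability": "async",
    "index.translog.sync_interval": "5s",
}


def restore_payload(original: Dict[str, Any]) -> Dict[str, Any]:
    """Build a settings body that undoes every bulk-load override.

    Keys missing from `original` map to None (JSON null), which asks
    Elasticsearch to reset that setting to its default value.
    """
    return {key: original.get(key) for key in BULK_LOAD_SETTINGS}


# Assumed pre-migration values; record the real ones before changing anything.
saved = {"index.number_of_replicas": 1, "index.refresh_interval": "1s"}

load_body = json.dumps(BULK_LOAD_SETTINGS)
restore_body = json.dumps(restore_payload(saved))
```

Setting replicas to 0 and translog durability to async trades safety for speed, which is acceptable here only because the source data remains in MongoDB and the load can be repeated.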
3. Application Migration – The Spring Boot log service is extended with two flags:
writeflag.mongodb: true
writeflag.elasticsearch: true
During the cut-over, the service writes every record to both MongoDB and Elasticsearch (dual-write). Once verification shows no discrepancies between the two stores, the MongoDB flag is turned off.
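The flag-driven dual-write and the verification step can be sketched as follows. This is a language-neutral illustration written in Python rather than the Spring Boot service itself; the class names are hypothetical, and plain dictionaries stand in for the MongoDB and Elasticsearch clients.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class WriteFlags:
    """Mirrors writeflag.mongodb and writeflag.elasticsearch."""
    mongodb: bool = True
    elasticsearch: bool = True


@dataclass
class DualWriter:
    flags: WriteFlags
    mongo_store: Dict[str, dict] = field(default_factory=dict)  # stand-in for MongoDB
    es_store: Dict[str, dict] = field(default_factory=dict)     # stand-in for Elasticsearch

    def write(self, doc_id: str, doc: dict) -> None:
        """Write to each store whose flag is enabled."""
        if self.flags.mongodb:
            self.mongo_store[doc_id] = dict(doc)
        if self.flags.elasticsearch:
            self.es_store[doc_id] = dict(doc)

    def discrepancies(self) -> List[str]:
        """Return IDs that are missing from one store or differ between them."""
        all_ids = set(self.mongo_store) | set(self.es_store)
        return sorted(i for i in all_ids
                      if self.mongo_store.get(i) != self.es_store.get(i))


writer = DualWriter(WriteFlags())
writer.write("1_abc", {"moduleCode": "crm_01"})
clean = writer.discrepancies() == []      # both stores agree during dual-write

writer.flags.mongodb = False              # cut over: stop writing to MongoDB
writer.write("2_def", {"moduleCode": "crm_02"})
```

Keeping both flags independent also gives a rollback path: if Elasticsearch misbehaves after cut-over, the MongoDB flag can be switched back on without a redeploy.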
Results and Lessons Learned
Replacing MongoDB with Elasticsearch reduced the server count from 15 to 3, cutting infrastructure costs dramatically. Query performance improved by more than tenfold, and the system now supports flexible, high‑performance searches. The migration required careful handling of data ordering, index design, and temporary caching, but the final outcome validated Elasticsearch’s strengths for large‑scale log analytics.
dbaplus Community
