Why and How to Migrate from MongoDB to Elasticsearch: A Practical Guide

This article explains why a high-volume operation-log system was moved from MongoDB to Elasticsearch. It outlines the existing architecture, capacity planning, and index design, walks through a step-by-step migration using Kafka, DataX, and Spring Boot, and shares the performance gains and lessons learned.

dbaplus Community

Apr 12, 2020

Background

The author, an experienced Elastic Stack user and Elastic Certified Engineer, describes an express-logistics system that originally stored its daily operation logs in MongoDB. Each log consists of master data (who did what, when, and where) and detailed change data, and the records number in the billions.

Existing Architecture

Data flow before migration:

Business system writes new or edited records to MySQL.

Canal monitors MySQL binlog, filters tables per module, and forwards changes to a Kafka cluster using dataId as the key.

The log‑record service consumes from Kafka, writes master records to MongoDB and also stores a copy for reverse lookup.
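Keying each message by dataId matters because Kafka guarantees ordering only within a partition, and records with the same key land in the same partition. A minimal sketch of that property follows. The six-partition topic and the sample events are assumptions, and CRC32 is only a standard-library stand-in for the murmur2 hash that Kafka's default partitioner applies to the key bytes.

```python
import zlib

NUM_PARTITIONS = 6  # assumed partition count, not taken from the article


def partition_for(data_id, num_partitions=NUM_PARTITIONS):
    """Map a key to a partition with a stable hash (CRC32 as a stand-in)."""
    key_bytes = str(data_id).encode("utf-8")
    return zlib.crc32(key_bytes) % num_partitions


# Hypothetical change events in the order Canal would emit them.
events = [
    {"dataId": 1, "op": "insert"},
    {"dataId": 2, "op": "insert"},
    {"dataId": 1, "op": "update"},
]

# Append each event to the partition chosen by its key.
partitions = {}
for event in events:
    partitions.setdefault(partition_for(event["dataId"]), []).append(event)

# All events for dataId 1 sit in one partition, in their original order.
ops_for_1 = [e["op"] for e in partitions[partition_for(1)] if e["dataId"] == 1]
```

Because a consumer reads a partition sequentially, the log-record service sees the changes for a given dataId in the order they were produced.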

Why Switch to Elasticsearch

MongoDB's B-tree compound indexes are used efficiently only when the queried fields match a leftmost prefix of the index, which does not hold for the highly flexible, multi-condition log queries.

Both master and detail records need exact match and full‑text search; MongoDB performs poorly and often times out.

Elasticsearch offers faster queries, sharding that scales out because shards are not bound to fixed nodes, and better handling of large-scale time-series data.

Operational costs drop dramatically: a 15‑node MongoDB cluster can be replaced by a 3‑node Elasticsearch cluster.

Capacity Planning for Elasticsearch

Assume the MongoDB collection holds 1 billion documents. A test sync of 1 million records occupies about 10 GB, and the production estimate used for planning is roughly 1 TB of disk space. The proposed Elasticsearch cluster uses three servers, each with 8 CPU cores, 16 GB of RAM, and a 2 TB HDD.
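The sizing arithmetic can be checked with a few lines. Taken literally, 1 million records at about 10 GB is about 10 KB per document, which extrapolates linearly to about 10 TB for 1 billion documents, whereas a 1 TB estimate corresponds to about 1 KB per document. The sketch below keeps both figures as inputs rather than guessing which one the original measurement intended. Decimal units and the single replica are assumptions.

```python
GB = 10**9
TB = 10**12


def bytes_per_doc(sample_docs, sample_bytes):
    """Average on-disk size per document observed in a test sync."""
    return sample_bytes / sample_docs


def projected_bytes(total_docs, per_doc_bytes, replicas=0):
    """Linear projection of disk usage, including replica copies."""
    return total_docs * per_doc_bytes * (1 + replicas)


total_docs = 1_000_000_000

# Figures as quoted: 1 million documents take about 10 GB.
sample_rate = bytes_per_doc(1_000_000, 10 * GB)
linear_estimate = projected_bytes(total_docs, sample_rate)

# Per-document size implied by the 1 TB planning figure.
implied_rate = (1 * TB) / total_docs

# Raw capacity of the proposed cluster: three servers with 2 TB each.
cluster_raw = 3 * 2 * TB

# Assumed single replica: does the 1 TB estimate fit once it is doubled?
fits_with_replica = projected_bytes(total_docs, implied_rate, replicas=1) <= cluster_raw
```

Whichever per-document size holds in production, rerunning the projection with a larger test sample is a cheap way to confirm the disk budget before the full load.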

Index Design

Because logs are time‑series, primary indices are created per month, with non‑core data indexed yearly. Queries must include a time range so the backend can determine which indices to search. Elasticsearch’s multi‑index query capability handles cross‑month aggregation.
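One way the backend could turn a query's time range into the list of monthly indices to search is sketched below. The index prefix `operation-log-` and the `yyyy.MM` suffix are hypothetical naming choices, not taken from the original system.

```python
from datetime import date


def monthly_indices(start, end, prefix="operation-log-"):
    """Return the monthly index names that cover [start, end], inclusive."""
    if start > end:
        raise ValueError("start must not be after end")
    names = []
    year, month = start.year, start.month
    while (year, month) <= (end.year, end.month):
        names.append(f"{prefix}{year:04d}.{month:02d}")
        month += 1
        if month > 12:  # roll over into the next year
            year, month = year + 1, 1
    return names


# A query spanning a year boundary touches four monthly indices.
targets = monthly_indices(date(2019, 11, 15), date(2020, 2, 3))

# Elasticsearch accepts a comma-separated index list in the search path.
search_path = "/" + ",".join(targets) + "/_search"
```

Restricting the search to the indices that overlap the requested range keeps a one-week query from touching years of data.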

Core Implementation Logic

Log records are consumed from Kafka in order. Two scenarios must be handled:

Master data arrives before detail data – the system must assemble a complete record before indexing.

Detail data arrives first – the system must later associate it with the master record.

Because dataId and traceId are not unique, _update_by_query cannot be relied upon. Instead, a temporary Elasticsearch index serves as a cache: each cached document uses _id = dataId + traceId and stores an array of detailId values.

{
  "dataId": 1,
  "traceId": "abc",
  "moduleCode": "crm_01",
  "operationId": 100,
  "operationName": "Zhang San",
  "departmentId": 1000,
  "departmentName": "Customer Department",
  "operationContent": "Visit customer",
  "detailId": [1,2,3,4,5,6]
}
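The two arrival orders can be handled with the same cache entry. The sketch below is an illustration of the idea, not the author's implementation: an in-memory dict stands in for the temporary Elasticsearch index, the class and method names are hypothetical, and the underscore separator in the cache id is an assumption (the source only says _id = dataId + traceId).

```python
class PendingCache:
    """In-memory stand-in for the temporary cache index."""

    def __init__(self):
        self._entries = {}

    @staticmethod
    def cache_id(data_id, trace_id):
        # Assumed separator; the source only states dataId + traceId.
        return f"{data_id}_{trace_id}"

    def _entry(self, record):
        key = self.cache_id(record["dataId"], record["traceId"])
        return self._entries.setdefault(key, {"master": None, "detailId": []})

    def on_master(self, master):
        """Store the master record, keeping detail ids that arrived earlier."""
        self._entry(master)["master"] = dict(master)

    def on_detail(self, detail):
        """Record a detail id, whether or not the master has arrived yet."""
        ids = self._entry(detail)["detailId"]
        if detail["detailId"] not in ids:
            ids.append(detail["detailId"])

    def assembled(self, data_id, trace_id):
        """Return the complete document, or None while a part is missing."""
        entry = self._entries.get(self.cache_id(data_id, trace_id))
        if not entry or entry["master"] is None or not entry["detailId"]:
            return None
        return {**entry["master"], "detailId": list(entry["detailId"])}


master = {"dataId": 1, "traceId": "abc", "moduleCode": "crm_01"}
details = [{"dataId": 1, "traceId": "abc", "detailId": n} for n in (1, 2)]

# Scenario 1: master first, then details.
master_first = PendingCache()
master_first.on_master(master)
for d in details:
    master_first.on_detail(d)

# Scenario 2: details first, then master.
detail_first = PendingCache()
for d in details:
    detail_first.on_detail(d)
detail_first.on_master(master)
```

Both caches end up with the same assembled document, so the indexing step does not need to know which part arrived first.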

Key Elasticsearch APIs used during migration:

_mget – bulk fetch of detail records.

_bulk – bulk insert of transformed documents.

_delete_by_query – clean up the temporary cache index after migration.
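For readers unfamiliar with these endpoints, the request bodies are simple to build by hand. The sketch below constructs an `_mget` body and a newline-delimited `_bulk` body with the standard library only. The index name and sample documents are hypothetical, and sending the requests (for example through an HTTP client) is left out.

```python
import json


def mget_body(ids):
    """Body for GET /<index>/_mget: fetch several documents by id."""
    return {"ids": [str(i) for i in ids]}


def bulk_body(index, docs):
    """Body for POST /_bulk: one action line, then one source line, per doc.

    `docs` is an iterable of (doc_id, source) pairs. The body must end
    with a newline.
    """
    lines = []
    for doc_id, source in docs:
        lines.append(json.dumps({"index": {"_index": index, "_id": str(doc_id)}}))
        lines.append(json.dumps(source, ensure_ascii=False))
    return "\n".join(lines) + "\n" if lines else ""


# Hypothetical index name and documents.
body = bulk_body(
    "operation-log-2020.04",
    [("1_abc", {"dataId": 1, "traceId": "abc"}), ("2_def", {"dataId": 2, "traceId": "def"})],
)
```

Using a deterministic `_id` with the `index` action means a retried batch overwrites the same documents instead of creating duplicates.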

Migration Process

1. Data Migration – DataX is chosen as the ETL tool because the log data is historical, the migration is one‑off, and the volume (tens of billions of rows) requires parallelism and custom transformations (date conversion, _id generation, duplicate handling).
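The three custom transformations named above can be illustrated independently of DataX. The sketch below is plain Python that shows the logic only, not a DataX plugin. The field name `createTime`, its epoch-millisecond encoding, and the reuse of the MongoDB `_id` as the Elasticsearch `_id` are all assumptions made for the example.

```python
from datetime import datetime, timezone


def to_es_doc(record):
    """Convert one exported record into an (es_id, source) pair."""
    # Date conversion: assumed epoch milliseconds to an ISO 8601 string.
    created = datetime.fromtimestamp(record["createTime"] / 1000, tz=timezone.utc)
    source = {k: v for k, v in record.items() if k != "_id"}
    source["createTime"] = created.isoformat()
    # _id generation: reuse the source id so reruns overwrite, not duplicate.
    return str(record["_id"]), source


def transform_batch(records):
    """Transform a batch, keeping only the first record for each id."""
    seen = set()
    out = []
    for record in records:
        doc_id, source = to_es_doc(record)
        if doc_id in seen:  # duplicate handling within the batch
            continue
        seen.add(doc_id)
        out.append((doc_id, source))
    return out


batch = transform_batch([
    {"_id": "a1", "createTime": 1586649600000, "dataId": 1},
    {"_id": "a1", "createTime": 1586649600000, "dataId": 1},
    {"_id": "b2", "createTime": 1586649600000, "dataId": 2},
])
```

Keeping the transformation a pure function of the input record makes it safe to split the export into parallel ranges and to rerun a failed range.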

2. Index Settings Adjustment – Temporary settings speed up bulk loading:

"index.number_of_replicas": 0,
"index.refresh_interval": "30s",
"index.translog.flush_threshold_size": "1024M",
"index.translog.durability": "async",
"index.translog.sync_interval": "5s"

After the data load, the original settings are restored.
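Restoring the settings is easier if the values in place before the load are captured first. The sketch below builds the two payloads that could be sent to the update index settings API (PUT /<index>/_settings). The saved values are hypothetical, and whether each setting can be changed on a live index depends on the Elasticsearch version. A value of null asks Elasticsearch to reset the setting to its default.

```python
import json

# Temporary settings from the article, applied before the bulk load.
BULK_LOAD_SETTINGS = {
    "index.number_of_replicas": 0,
    "index.refresh_interval": "30s",
    "index.translog.flush_threshold_size": "1024M",
    "index.translog.durability": "async",
    "index.translog.sync_interval": "5s",
}


def restore_payload(saved):
    """Payload that undoes BULK_LOAD_SETTINGS.

    Keys that were not explicitly set before the load map to None, which
    serializes to null and resets the setting to its default.
    """
    return {key: saved.get(key) for key in BULK_LOAD_SETTINGS}


# Hypothetical values read from the index before the load started.
saved_before_load = {
    "index.number_of_replicas": 1,
    "index.refresh_interval": "1s",
}

before_load_json = json.dumps(BULK_LOAD_SETTINGS)
after_load_json = json.dumps(restore_payload(saved_before_load))
```

Deriving the restore payload from the same key list avoids leaving a setting such as async translog durability enabled by accident after the migration.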

3. Application Migration – The Spring Boot log service is extended with two flags:

writeflag.mongodb: true
writeflag.elasticsearch: true

During the cut-over, the service writes to both MongoDB and Elasticsearch (dual write). Once verification shows no discrepancies between the two stores, the MongoDB flag is turned off.
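The flag-gated dual write and the verification step can be sketched as follows. The actual service is a Spring Boot application, so this Python version only illustrates the control flow. The class, the sink callables, and the id-based comparison are hypothetical, and only the two flag names come from the configuration above.

```python
class DualWriter:
    """Route each log record to the stores enabled by the write flags."""

    def __init__(self, flags, mongo_write, es_write):
        self.flags = flags
        self.mongo_write = mongo_write
        self.es_write = es_write

    def write(self, record):
        targets = []
        if self.flags.get("writeflag.mongodb"):
            self.mongo_write(record)
            targets.append("mongodb")
        if self.flags.get("writeflag.elasticsearch"):
            self.es_write(record)
            targets.append("elasticsearch")
        return targets


def missing_ids(mongo_ids, es_ids):
    """Ids present in MongoDB but absent from Elasticsearch."""
    return sorted(set(mongo_ids) - set(es_ids))


# Lists stand in for the two stores during the dual-write phase.
mongo_store, es_store = [], []
flags = {"writeflag.mongodb": True, "writeflag.elasticsearch": True}
writer = DualWriter(flags, mongo_store.append, es_store.append)
writer.write({"id": "1_abc"})

# After verification passes, turn the MongoDB flag off.
flags["writeflag.mongodb"] = False
writer.write({"id": "2_def"})
```

Because the flags are read on every write, the cut-over needs only a configuration change, and turning the MongoDB flag back on is the rollback path.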

Results and Lessons Learned

Replacing MongoDB with Elasticsearch reduced the server count from 15 to 3, cutting infrastructure costs dramatically. Query performance improved by more than tenfold, and the system now supports flexible, high‑performance searches. The migration required careful handling of data ordering, index design, and temporary caching, but the final outcome validated Elasticsearch’s strengths for large‑scale log analytics.
