
Mastering ElasticSearch Data Migration and Disaster Recovery: Practical Strategies

This article presents a comprehensive guide to synchronizing heterogeneous data sources with ElasticSearch, migrating clusters across environments, and implementing robust disaster‑recovery solutions for both intra‑city and inter‑city high‑availability scenarios.

Efficient Ops

Jun 16, 2021


1. Heterogeneous Data Synchronization with ElasticSearch

ElasticSearch can ingest data from several kinds of source:

- relational databases (MySQL, PostgreSQL)
- document stores (MongoDB)
- message queues (Kafka, RabbitMQ)
- the Hadoop ecosystem

Older data can also be archived from ES to object storage (COS, OSS, S3) to reduce storage costs.

MySQL can be synced in two main ways:

- Offline: a full load through the Logstash JDBC input.
- Online: binlog changes are captured in real time with tools such as Canal or Mypipe, then relayed into ES, for example through Kafka and Logstash.

MongoDB can be synced offline with Logstash, or online by tailing the oplog with Monstache. Monstache also supports an initial full load followed by incremental updates.
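The sketch below shows what one run of a polling sync does. It selects rows whose tracking column is newer than the last recorded value, which is the role of the Logstash JDBC input's `tracking_column` and `:sql_last_value` options. It then turns those rows into an NDJSON body for the ES `_bulk` API. Assumptions:

- The `products` table and its integer `updated_at` column are hypothetical.
- The standard-library `sqlite3` module stands in for MySQL.
- The code builds the bulk body but does not send it.

```python
import json
import sqlite3


def fetch_changes(conn, last_value):
    """Rows whose tracking column is newer than last_value, oldest first."""
    cur = conn.execute(
        "SELECT id, name, updated_at FROM products "
        "WHERE updated_at > ? ORDER BY updated_at",
        (last_value,),
    )
    return cur.fetchall()


def to_bulk_body(rows, index):
    """NDJSON body for the ES _bulk API: one action line plus one source line per row."""
    lines = []
    for row_id, name, updated_at in rows:
        # Using the primary key as _id makes a re-sent row overwrite its
        # document instead of creating a duplicate.
        lines.append(json.dumps({"index": {"_index": index, "_id": str(row_id)}}))
        lines.append(json.dumps({"name": name, "updated_at": updated_at}))
    if not lines:
        return ""
    # The _bulk API requires the body to end with a newline.
    return "\n".join(lines) + "\n"


def poll_once(conn, index, last_value):
    """One sync cycle: fetch changed rows, build the bulk body, advance the marker."""
    rows = fetch_changes(conn, last_value)
    new_last_value = rows[-1][2] if rows else last_value
    return to_bulk_body(rows, index), new_last_value
```

Starting from `last_value = 0` performs the full load, and later calls pick up only changed rows. Polling like this cannot observe deleted rows. That is one reason binlog-based tools such as Canal are used for online sync.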

Log data is typically collected by Logstash and sent to Kafka. A second Logstash tier then consumes it and indexes it into ES. Tencent Cloud's Serverless Cloud Function (SCF) can also be triggered by Kafka topics or object-storage events to write data into ES. This keeps costs low because no dedicated ingestion service has to run.

2. ElasticSearch Cluster‑to‑Cluster Data Migration

Migration approaches are either offline, where writes to the source cluster are stopped during the migration, or online, where the source cluster keeps serving traffic.

Common tools include:

elasticsearch-dump: a Node.js utility suited to datasets under roughly 10 GB.

Logstash: suited to migrations that need data filtering or preprocessing, or that span major version gaps.

Reindex API: the native ES API, usable within a cluster or across clusters (reindex from remote). Because every document is re-indexed, it is best suited to datasets under roughly 100 GB.

Snapshot: the ES snapshot API backs up index files to a repository such as COS or HDFS, and the target cluster then restores them. It is suitable for large-scale migrations.

Cross-Cluster Replication (CCR): near-real-time index replication, available from ES 6.5 onward. Both clusters must support the feature.

Dual-write and cluster merging: either write to both clusters at once via Kafka and Logstash, or use nodes with dual NICs to join the target nodes to the source cluster. In the second approach, shards move onto the new nodes and the old nodes are then decommissioned.
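With the Reindex API, inter-cluster migration uses reindex from remote. The request is sent to the target cluster, which pulls documents from the source. The Python sketch below only builds the request. Assumptions:

- The host, index names and credentials are placeholders.
- The target cluster lists the source host in its `reindex.remote.whitelist` setting in `elasticsearch.yml`.

```python
import json


def remote_reindex_request(remote_host, source_index, dest_index,
                           username=None, password=None, batch_size=1000):
    """Build (method, path, body) for a reindex-from-remote call.

    The call is sent to the *destination* cluster, which pulls documents
    from remote_host; remote_host must be in its reindex.remote.whitelist.
    """
    remote = {"host": remote_host}
    if username is not None:
        remote["username"] = username
        remote["password"] = password
    body = {
        "source": {
            "remote": remote,
            "index": source_index,
            # Number of documents fetched per scroll batch from the source.
            "size": batch_size,
        },
        "dest": {"index": dest_index},
    }
    # wait_for_completion=false returns a task id instead of blocking the client.
    return "POST", "/_reindex?wait_for_completion=false", json.dumps(body)
```

Reindex from remote does not support slicing. Large migrations are therefore usually split into several requests, for example one per index.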

2.1 Offline Migration

For multi‑terabyte datasets, Snapshot is preferred because it backs up all index files directly and restores quickly on the destination cluster.
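A snapshot-based migration comes down to four calls:

1. Register a repository on the source.
2. Take the snapshot.
3. Register the same repository on the target.
4. Restore the snapshot on the target.

The sketch below returns those calls as data. It assumes an S3-compatible repository (type `s3`, with the repository plugin installed where the ES version requires it) rather than COS or HDFS, whose plugins use their own settings. The repository, snapshot and bucket names are placeholders.

```python
def snapshot_migration_plan(repo, snapshot, indices, bucket):
    """Return ordered (cluster, method, path, body) calls for a
    snapshot-based migration through an S3-compatible repository."""
    index_list = ",".join(indices)
    # Snapshot and restore only the listed indices; cluster-wide state
    # (persistent settings, templates) is usually not wanted on the target.
    snap_body = {"indices": index_list, "include_global_state": False}
    return [
        # 1. Register the repository on the source cluster.
        ("source", "PUT", f"/_snapshot/{repo}",
         {"type": "s3", "settings": {"bucket": bucket}}),
        # 2. Take the snapshot and block until it finishes.
        ("source", "PUT", f"/_snapshot/{repo}/{snapshot}?wait_for_completion=true",
         snap_body),
        # 3. Register the same bucket on the target as read-only, so that
        #    only the source cluster ever writes to the repository.
        ("target", "PUT", f"/_snapshot/{repo}",
         {"type": "s3", "settings": {"bucket": bucket, "readonly": True}}),
        # 4. Restore the indices on the target cluster.
        ("target", "POST", f"/_snapshot/{repo}/{snapshot}/_restore",
         dict(snap_body)),
    ]
```

Snapshots are incremental. A common pattern is to take a first snapshot while the source is still live, then stop writes and take a final, much smaller snapshot before restoring.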

2.2 Online Migration

Online migration can combine a full load with incremental Logstash jobs, or it can use dual-write or CCR. Dual-write can be implemented with two Logstash pipelines that consume the same Kafka topic, each in its own consumer group. Both clusters then receive every event while they stay online.
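Kafka-based dual-write depends on giving each pipeline its own consumer group. If the pipelines share a `group_id`, Kafka splits the topic's partitions between them, and each cluster receives only part of the data. This Python sketch renders one Logstash pipeline per cluster from a shared template. The broker address, topic and group names are hypothetical, and so is the assumption that each event carries a unique `id` field.

```python
from string import Template

PIPELINE_TEMPLATE = Template("""\
input {
  kafka {
    bootstrap_servers => "$brokers"
    topics => ["$topic"]
    # A distinct consumer group per pipeline: each one receives every message.
    group_id => "$group_id"
    codec => "json"
  }
}
output {
  elasticsearch {
    hosts => ["$es_host"]
    index => "$topic"
    # Assumes each event carries a unique "id" field; reusing it as the
    # document id makes replays overwrite rather than duplicate documents.
    document_id => "%{id}"
  }
}
""")


def dual_write_pipelines(brokers, topic, clusters):
    """Render one Logstash pipeline config per target cluster (name -> ES URL)."""
    return {
        name: PIPELINE_TEMPLATE.substitute(
            brokers=brokers, topic=topic, group_id=f"logstash-{name}", es_host=es_host
        )
        for name, es_host in clusters.items()
    }
```

Using the event id as `document_id` has a useful side effect. A pipeline that restarts and re-reads part of the topic overwrites documents instead of duplicating them.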

CCR requires both clusters to run ES 6.5 or later. The leader index must also have soft deletes enabled (`index.soft_deletes.enabled`) so that followers can replay its operation history. This is the default for indices created in ES 7.0 and later.
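Setting up CCR takes two requests on the follower cluster. The first registers the leader as a remote cluster, and the second creates the follower index. The sketch below returns these calls as data. For 6.x leaders it can also prepend a call that creates the leader index with soft deletes enabled. The remote alias, seed address and index names are placeholders.

```python
def ccr_setup_calls(remote_alias, leader_seeds, leader_index, follower_index,
                    create_leader_with_soft_deletes=False):
    """Ordered (cluster, method, path, body) calls that start replicating
    leader_index into follower_index."""
    calls = []
    if create_leader_with_soft_deletes:
        # Only needed for 6.x leaders: indices created on 7.0+ have soft
        # deletes on by default, and the setting must be set at creation time.
        calls.append(("leader", "PUT", f"/{leader_index}",
                      {"settings": {"index.soft_deletes.enabled": True}}))
    # Tell the follower cluster how to reach the leader (transport addresses).
    calls.append(("follower", "PUT", "/_cluster/settings", {
        "persistent": {"cluster": {"remote": {remote_alias: {"seeds": leader_seeds}}}}
    }))
    # Create the follower index; it starts replicating from the leader index.
    calls.append(("follower", "PUT",
                  f"/{follower_index}/_ccr/follow?wait_for_active_shards=1",
                  {"remote_cluster": remote_alias, "leader_index": leader_index}))
    return calls
```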

3. ElasticSearch Disaster‑Recovery Practices

3.1 Intra‑city DR

There are two main patterns:

- Active-passive clusters, kept in sync by synchronous dual-write or asynchronous replication.
- A single cluster spanning multiple availability zones (AZs), with each primary shard and its replica placed in different AZs.
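In the single-cluster, multi-AZ pattern, shard allocation awareness keeps a primary and its replica in different AZs. The Python sketch below builds the body for `PUT _cluster/settings`. The attribute name `zone` and the AZ names are placeholders. Each node must also declare its own AZ in `elasticsearch.yml`, for example `node.attr.zone: az1`.

```python
def awareness_settings(zones, attribute="zone"):
    """Cluster-settings body that spreads shard copies across the given AZs."""
    return {
        "persistent": {
            # Allocate copies of the same shard to nodes with different values
            # of this node attribute.
            "cluster.routing.allocation.awareness.attributes": attribute,
            # Forced awareness: if one AZ is lost, its replicas stay unassigned
            # instead of being crowded into the surviving AZ.
            f"cluster.routing.allocation.awareness.force.{attribute}.values": ",".join(zones),
        }
    }
```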

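To avoid split-brain during network partitions, dedicated master nodes are deployed in odd numbers across AZs. With only two AZs, one of them must host the majority of master nodes. Losing that AZ then prevents a new master from being elected. Tencent Cloud therefore adds a hidden third AZ that hosts a master node. A majority of master nodes then stays reachable whichever AZ fails, which prevents split-brain and allows automatic master election.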
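A small Python function can check the arithmetic behind the hidden AZ. An election needs a strict majority of master-eligible nodes. A layout therefore survives an AZ outage only if, for every AZ, the masters outside it still form that majority. This is a simplified model: it ignores the automatic voting-configuration adjustments in ES 7.x and later. The AZ names are placeholders.

```python
from collections import Counter


def survives_any_single_az_loss(master_azs):
    """True if a strict majority of master nodes remains after losing any one AZ.

    master_azs lists the AZ of each dedicated master node,
    e.g. ["az1", "az2", "hidden"].
    """
    total = len(master_azs)
    quorum = total // 2 + 1  # strict majority needed to elect a master
    per_az = Counter(master_azs)
    # For every AZ, the masters outside it must still reach the quorum.
    return all(total - count >= quorum for count in per_az.values())
```

Every layout that uses only two AZs fails this check, because one AZ always holds at least half of the masters. Spreading three masters over two real AZs plus a hidden one passes.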

3.2 Inter‑city DR

Cross-region DR typically uses a primary-secondary topology: a primary cluster in one city (e.g., Shanghai) and a follower cluster in another (e.g., Beijing). CCR can replicate indices between them, but follower indices are read-only. To accept writes after a failover, each follower index must be promoted: replication is stopped and the index is converted into a regular, writable index.
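Promoting a follower index uses a fixed sequence of CCR and index APIs, all called on the follower cluster:

1. Pause following.
2. Close the index.
3. Unfollow.
4. Reopen the index.

The sketch below lists those calls for a placeholder index name. It is a plan only and does not send any requests.

```python
def promote_follower_calls(follower_index):
    """Ordered (method, path) calls, sent to the follower cluster, that turn
    a CCR follower index into a regular, writable index."""
    return [
        # Stop fetching operations from the leader.
        ("POST", f"/{follower_index}/_ccr/pause_follow"),
        # Unfollow only works on a closed index.
        ("POST", f"/{follower_index}/_close"),
        # Drop the follower metadata so the index becomes a plain index.
        ("POST", f"/{follower_index}/_ccr/unfollow"),
        # Reopen it; from here on it accepts writes.
        ("POST", f"/{follower_index}/_open"),
    ]
```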

Once the primary region has recovered and caught up with the writes made during the failover, traffic can be switched back to it. A new leader index can then be set up for future replication.


Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact us and we will review it promptly.


Written by

Efficient Ops

This public account is maintained by Xiaotianguo and friends, regularly publishing widely-read original technical articles. We focus on operations transformation and accompany you throughout your operations career, growing together happily.
