How Didi Achieved Cross‑Datacenter Elasticsearch Replication for Strong Consistency

This article explains Didi's self‑developed DCDR system, which replicates Elasticsearch indices across data‑center clusters, detailing its design goals, core mechanisms, link construction, historical data recovery, real‑time sync, and data‑quality validation to ensure high availability and strong consistency.

Didi Tech

Background and Goal

Elasticsearch is an open‑source, distributed, RESTful full‑text search engine built on Lucene. It indexes every field, scales horizontally across hundreds of nodes, and handles terabytes of data. Didi uses Elasticsearch for core search and logging workloads such as map POI search, order search, and internal ELK pipelines. To improve stability, cost, efficiency, and data security, Didi investigated four directions: cross‑data‑center replication for strong consistency and active‑active clusters, upgrading to JDK 17 on Elasticsearch 7.6 for better query performance, enabling ZSTD compression for 5‑10 PB of daily writes, and strengthening security authentication. This article focuses on the cross‑data‑center replication solution.

Technical Basics

When a write request arrives, Elasticsearch first writes to the primary shard, then forwards the request in parallel to all replica shards. The primary returns the result after the replicas acknowledge the operation. Replica recovery consists of two stages: (1) the primary sends segment files (committed data) to the replica, and (2) the primary streams the translog (uncommitted operations) to the replica.

Figure: DCDR architecture diagram
Figure: Primary‑replica write flow
Figure: Replica recovery stages
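
The fan‑out‑and‑wait pattern in the write flow is easiest to see in code. Below is a minimal Java sketch of the path just described; the class and method names are illustrative assumptions, not Elasticsearch source.

import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// Simplified sketch of the native write path: the primary applies the
// operation, fans it out to all replicas in parallel, and acknowledges
// the client only once every replica has acknowledged.
public class PrimaryWriteFlow {

    interface Replica {
        void replicate(String operation); // apply the op on a replica shard
    }

    static void indexOnPrimary(String operation, List<Replica> replicas,
                               ExecutorService executor) throws InterruptedException {
        // Step 1: write on the primary shard (translog append + Lucene apply).
        System.out.println("primary applied: " + operation);

        // Step 2: forward the request to all replica shards in parallel.
        CountDownLatch acks = new CountDownLatch(replicas.size());
        for (Replica replica : replicas) {
            executor.submit(() -> {
                replica.replicate(operation);
                acks.countDown();
            });
        }

        // Step 3: respond to the client only after every replica acknowledges.
        acks.await();
        System.out.println("acknowledged to client: " + operation);
    }

    public static void main(String[] args) throws InterruptedException {
        ExecutorService pool = Executors.newFixedThreadPool(2);
        List<Replica> replicas = List.of(
                op -> System.out.println("replica-1 applied: " + op),
                op -> System.out.println("replica-2 applied: " + op));
        indexOnPrimary("index {\"order_id\": 42}", replicas, pool);
        pool.shutdown();
    }
}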

Design Overview

DCDR (Didi Cross‑Datacenter Replication) treats a follower index shard as a remote replica of a leader index shard. By extending Elasticsearch’s native replica model, DCDR reuses the existing replication logic, minimizing kernel changes.

Figure: Remote replica illustration

Design goals:

Guarantee strong consistency between primary and remote replica.

Provide high availability and fast disaster recovery.

Enable zero‑downtime cross‑cluster index migration.

Support reliable version upgrades without rollback.

Solution Design

The solution consists of four main processes.

DCDR link construction: Store link metadata in the cluster state. Links can be defined at the index‑template level or at the individual‑index level.

Historical data recovery: Copy segment files and the translog from the leader to the follower, mirroring Elasticsearch's native replica recovery.

Real‑time data synchronization: After historical sync, forward each write request from the primary shard to the remote replica, ensuring strong consistency at a modest performance cost.

Primary‑replica data quality verification: Periodically compare DCDR metadata with the actual cluster state; on mismatch, break the link and trigger incremental recovery.

Figure: DCDR workflow

DCDR Link Construction

Links are created via a dedicated Elasticsearch API that updates the cluster state. Example metadata structures, first for a template‑level link and then for an index‑level link:

{
  "templates": {
    "templateA_to_ClusterA": {
      "name": "IndexA_to_ClusterA",
      "template": "templateA",
      "replica_cluster": "ClusterA"
    }
  }
}
{
  "Index_202206/Index_202206(ClusterA)": {
    "primary_index": "Index_202206",
    "replica_index": "Index_202206",
    "replica_cluster": "ClusterA",
    "replication_state": true
  }
}
Figure: Link creation API
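
For illustration, the index‑level metadata above maps naturally onto a small Java model. The record below is a sketch of the structure, not Didi's actual code; the key format follows the "Index_202206/Index_202206(ClusterA)" example.

import java.util.Map;

// Minimal model of a DCDR index-level link, mirroring the JSON fields above.
public record DcdrIndexLink(
        String primaryIndex,
        String replicaIndex,
        String replicaCluster,
        boolean replicationState) {

    // Key format "primaryIndex/replicaIndex(replicaCluster)" as in the example.
    public String key() {
        return primaryIndex + "/" + replicaIndex + "(" + replicaCluster + ")";
    }

    public static void main(String[] args) {
        DcdrIndexLink link =
                new DcdrIndexLink("Index_202206", "Index_202206", "ClusterA", true);
        // Links keyed by this format would live in the cluster-state metadata.
        Map<String, DcdrIndexLink> links = Map.of(link.key(), link);
        System.out.println(links);
    }
}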

Historical Data Recovery

When a DCDR link is first created or a follower shard fails, the system triggers a recovery that copies both segment files and the translog from the leader to the follower. The flow follows Elasticsearch’s native replica recovery, with the addition of a remote‑replica group in the final stage.

Figure: Historical recovery flow
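
A minimal sketch of the two‑stage flow, assuming simple stand‑ins for segment files and translog operations (none of these class names come from the article):

import java.util.List;

// Stage 1 copies committed segment files to the follower; stage 2 replays
// the translog so uncommitted operations are not lost.
public class HistoricalRecovery {

    record SegmentFile(String name, long sizeBytes) {}
    record TranslogOp(long seqNo, String payload) {}

    static void recover(List<SegmentFile> segments, List<TranslogOp> translog) {
        // Stage 1: transfer committed data (segment files).
        for (SegmentFile segment : segments) {
            System.out.printf("copied segment %s (%d bytes) to follower%n",
                    segment.name(), segment.sizeBytes());
        }
        // Stage 2: replay uncommitted operations from the translog in
        // sequence-number order so the follower catches up to the leader.
        translog.stream()
                .sorted((a, b) -> Long.compare(a.seqNo(), b.seqNo()))
                .forEach(op -> System.out.printf("replayed seq %d: %s%n",
                        op.seqNo(), op.payload()));
        // After this point the follower shard joins the remote-replica group
        // and starts receiving real-time writes.
    }

    public static void main(String[] args) {
        recover(
            List.of(new SegmentFile("_0.cfs", 1_048_576)),
            List.of(new TranslogOp(101, "index doc A"),
                    new TranslogOp(102, "delete doc B")));
    }
}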

Real‑Time Data Synchronization

After historical sync, each write request is first committed on the primary shard; the same request is then forwarded to the remote replica. This guarantees strong consistency while incurring a modest write‑throughput penalty, acceptable for Didi’s active‑active workloads.

Figure: Real‑time sync diagram

Failure handling mirrors Elasticsearch’s replica logic: a failed remote replica is removed from the replica group and a periodic task re‑triggers recovery.
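
The forward‑then‑evict behavior can be sketched as follows. The interface and class names are assumptions, and a copy‑on‑write list stands in for Elasticsearch's replica‑group tracking so a replica can be removed safely mid‑iteration:

import java.util.List;
import java.util.concurrent.CopyOnWriteArrayList;

// Each write committed on the primary is forwarded to every remote replica;
// a replica that fails is dropped from the group, to be re-added later by
// the periodic recovery task.
public class RealTimeSync {

    interface RemoteReplica {
        void forward(String operation) throws Exception;
        String clusterName();
    }

    private final List<RemoteReplica> remoteReplicas = new CopyOnWriteArrayList<>();

    void write(String operation) {
        // 1. Commit on the primary shard first.
        System.out.println("primary committed: " + operation);
        // 2. Forward the same request to every remote replica.
        for (RemoteReplica replica : remoteReplicas) {
            try {
                replica.forward(operation);
            } catch (Exception failure) {
                // 3. Mirror native replica handling: remove the failed remote
                // replica; a periodic task will re-trigger its recovery.
                remoteReplicas.remove(replica);
                System.out.println("removed failed remote replica on "
                        + replica.clusterName() + ": " + failure.getMessage());
            }
        }
    }

    public static void main(String[] args) {
        RealTimeSync sync = new RealTimeSync();
        sync.remoteReplicas.add(new RemoteReplica() {
            public void forward(String op) { System.out.println("ClusterA applied: " + op); }
            public String clusterName() { return "ClusterA"; }
        });
        sync.remoteReplicas.add(new RemoteReplica() {
            public void forward(String op) throws Exception { throw new Exception("unreachable"); }
            public String clusterName() { return "ClusterB"; }
        });
        sync.write("index {\"poi\": \"airport\"}");
    }
}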

Data Quality Verification

A scheduled task checks that DCDR metadata stored in the cluster state matches the actual link status. If the primary‑replica data gap exceeds a threshold or the link becomes abnormal, the primary cluster disconnects the link and initiates incremental recovery on the follower.

Figure: Quality check flow
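
Below is a hedged sketch of such a scheduled checker. The gap metric and threshold are hypothetical; the article only specifies that a metadata mismatch or an excessive primary‑replica gap breaks the link and triggers incremental recovery:

import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;

// Periodic task comparing DCDR metadata with actual link status; on mismatch
// or an excessive primary-replica gap, break the link and trigger recovery.
public class DcdrQualityChecker {

    static final long MAX_ALLOWED_GAP = 10_000; // hypothetical threshold (ops)

    interface LinkStatus {
        boolean metadataMatchesClusterState();
        long primaryReplicaGap(); // e.g. sequence-number distance
    }

    static void check(LinkStatus link) {
        if (!link.metadataMatchesClusterState()
                || link.primaryReplicaGap() > MAX_ALLOWED_GAP) {
            // Hypothetical remediation steps, per the description above:
            System.out.println("breaking DCDR link, triggering incremental recovery");
        } else {
            System.out.println("link healthy, gap=" + link.primaryReplicaGap());
        }
    }

    public static void main(String[] args) throws InterruptedException {
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        LinkStatus healthy = new LinkStatus() {
            public boolean metadataMatchesClusterState() { return true; }
            public long primaryReplicaGap() { return 42; }
        };
        scheduler.scheduleAtFixedRate(() -> check(healthy), 0, 1, TimeUnit.SECONDS);
        Thread.sleep(2_500); // let the check run a few times, then stop
        scheduler.shutdown();
    }
}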

Operational Impact

To date, Didi operates six DCDR follower clusters, with over 400 template links and more than 2,000 index links covering core services such as POI, order, and soda. Remaining challenges include query spikes, cross‑query interference, shard‑recovery latency, and write‑performance bottlenecks; these are planned for future optimization.

Tags: distributed systems, Elasticsearch, high availability, data consistency, cross‑datacenter replication, DCDR