How Qunar Built a High‑Availability, Multi‑Version Elasticsearch Sync Platform
This article details Qunar's design and implementation of a data synchronization platform that aggregates MySQL data into Elasticsearch, supporting parallel ES 5.x/7.x clusters, hot‑swap upgrades, flexible scaling, and end‑to‑end consistency with high‑availability mechanisms.
Platform Overview
The data synchronization platform aggregates MySQL data into Elasticsearch (ES) and provides a unified query gateway for complex after‑sale queries on orders, tickets, PNRs, itineraries, and airlines.
Architecture
Data Sync Module : Uses Alibaba Otter to capture MySQL binlog, Kafka as the message queue, and a custom Data Transfer Service (DTS) for mapping and filtering.
Data Middle‑Platform (Crab) : Java‑based crab-client abstracts ES DSL, handles read/write, unified authentication, Hystrix circuit breaking, and traffic routing by appcode+ES index.
Management Platform : Optional UI for configuring sync jobs, managing DTS nodes, and handling ES cluster authentication, rate limiting, and traffic distribution.
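To make the "traffic routing by appcode+ES index" idea concrete, here is a minimal sketch of a routing lookup. The table contents, cluster names, and the function name resolve_cluster are all hypothetical and are not taken from Crab itself; the only thing borrowed from the article is that the routing key is the pair (appcode, index).

```python
from typing import Dict, Tuple

# Hypothetical routing table: (appcode, index) -> ES cluster name.
# In a real deployment this would come from the management platform's config.
ROUTES: Dict[Tuple[str, str], str] = {
    ("after_sale_app", "order_info"): "es7-cluster-a",
    ("ticket_app", "ticket_info"): "es5-cluster-b",
}


def resolve_cluster(appcode: str, index: str) -> str:
    """Return the cluster that serves this caller/index pair.

    Unknown pairs are rejected rather than silently routed somewhere.
    """
    try:
        return ROUTES[(appcode, index)]
    except KeyError:
        raise LookupError(f"no route configured for {appcode}+{index}") from None
```

Because the key includes the index, a single index can be repointed to a different cluster without touching the routes of other indices or callers.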
Motivation for ES 7.x Migration
By Q2 2021 the platform served 10+ business lines and 14+ ES indices. Upgrading from ES 5.x to ES 7.x revealed four pain points: single‑node ES failure, inflexible sync links, lack of monitoring/circuit breaking, and cross‑index impact during failures. Goals were smooth ES scaling/migration and full‑link high availability.
Key Design and Practices
1. Parallel ES 5.x/7.x Deployment & Hot‑Swap
Elastic Rest Client : Adopted the low‑level Java client compatible with all ES versions.
REST APIs : Implemented version‑aware endpoints for Search, Scroll, Document, and Script APIs.
Query DSL & Scripting : Mapped DSL differences (e.g., match behavior), handled nested type updates, and switched to inline JSON scripts for ES 7.x.
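Response Normalization : Added ?track_total_hits=true to ES 7.x queries to obtain exact total hit counts (ES 7.x otherwise caps the reported total at 10,000), then reshaped the response to the ES 5.x format, where hits.total is a plain number rather than an object.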
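The response reshaping can be illustrated with a small sketch. In ES 7.x, hits.total is an object of the form {"value": N, "relation": "eq"}, while ES 5.x returns a plain integer. The function name normalize_total is hypothetical, and a real gateway would likely normalize other fields as well; this only covers the total count.

```python
import copy
from typing import Any, Dict


def normalize_total(response: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of a search response whose hits.total is a plain int."""
    out = copy.deepcopy(response)  # leave the caller's object untouched
    hits = out.get("hits", {})
    total = hits.get("total")
    if isinstance(total, dict):
        # ES 7.x shape: {"value": N, "relation": "eq" | "gte"}
        hits["total"] = total["value"]
    return out


# Example payloads (made up for illustration).
es7_response = {"hits": {"total": {"value": 1520, "relation": "eq"}, "hits": []}}
es5_response = {"hits": {"total": 37, "hits": []}}
```

An ES 5.x response passes through unchanged, so callers see the same shape regardless of which cluster answered.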
Hot‑swap procedure: reindex old cluster → configure crab-gateway → incremental backfill → verification → query cut‑over.
2. Flexible Reindex & Incremental Backfill
Three backfill strategies:
Reindex : Use the ES _reindex API to copy the full data set.
Canal Position Shift : Rewind the Canal binlog position to replay recent binlog events for near‑real‑time backfill.
Diff Backfill Task : Periodically query the source DB, compare with ES, and write missing records via Crab.
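The diff strategy (query the source DB, compare with ES, write what is missing) can be sketched as follows. This is only an outline under stated assumptions: the DB rows and ES IDs are passed in as in-memory collections, and write stands in for a call to Crab. The names find_missing_ids and run_diff_backfill are hypothetical. The sketch detects missing documents only, not documents whose fields are stale.

```python
from typing import Callable, Dict, Iterable, List, Set


def find_missing_ids(db_ids: Iterable[str], es_ids: Iterable[str]) -> List[str]:
    """Return IDs present in the source DB but absent from ES, sorted."""
    es_set = set(es_ids)
    return sorted(i for i in set(db_ids) if i not in es_set)


def run_diff_backfill(
    db_rows: Dict[str, dict],
    es_ids: Set[str],
    write: Callable[[str, dict], None],
) -> List[str]:
    """Write every DB row that ES is missing; return the IDs written."""
    missing = find_missing_ids(db_rows.keys(), es_ids)
    for doc_id in missing:
        # `write` is a stand-in for the real write path (e.g. via Crab).
        write(doc_id, db_rows[doc_id])
    return missing
```

In practice each run would be limited to a recent time window of source rows so the comparison stays cheap enough to run frequently.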
Example remote reindex request, sent to the destination cluster:
curl -H "Content-Type: application/json" -XPOST http://ip:port/_reindex -d '{
  "source": {"remote": {"host": "http://ip:port"}, "index": "order_info_beta_tts8"},
  "dest": {"index": "order_info_beta_tts8"}
}'
3. Data Consistency Guarantees
Preserve the order of changes to the same entity across the pipeline (Otter → Kafka → DTS → Crab → ES) by using a consistent partition key such as db_name+order_id, so that all changes to one order flow through the same partition in sequence.
Failed writes are routed to a retry Kafka topic; DTS reprocesses them after fetching the latest DB state.
Critical indices run minute‑level diff backfill jobs that reconcile discrepancies.
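The ordering guarantee rests on one property: the same key always maps to the same partition. A minimal sketch of that property is below. It uses CRC32 purely for illustration; it is not Kafka's built-in partitioner, and the function name partition_for and the ":" separator are assumptions.

```python
import zlib


def partition_for(db_name: str, order_id: str, num_partitions: int) -> int:
    """Map a db_name+order_id key to a stable partition number."""
    key = f"{db_name}:{order_id}".encode("utf-8")
    # CRC32 is deterministic across processes, unlike Python's built-in hash().
    return zlib.crc32(key) % num_partitions


# Every change to the same order lands in the same partition,
# so a single consumer sees those changes in the order they were produced.
first = partition_for("order_db", "20210601001", 12)
second = partition_for("order_db", "20210601001", 12)
```

Note that changing the partition count remaps keys, so ordering is only guaranteed while the number of partitions stays fixed.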
4. High Availability of the Sync Chain
Otter runs on multiple nodes with pipeline isolation and master‑slave Canal.
Kafka topics have multiple partitions and replicas.
DTS spawns multiple consumer nodes per topic for failover.
Crab uses Hystrix thread pools per appcode+index to isolate load.
ES indices are replicated across clusters; routing is managed by the manager component.
Operational incidents (e.g., ES node failure, Kafka disk full) are mitigated via the hot‑swap backfill approach and manual failover.
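The per appcode+index isolation described above follows the bulkhead pattern: each key gets its own concurrency budget, so one saturated caller cannot starve the others. The sketch below shows the idea with a semaphore per key rather than Hystrix's thread pools. The class and exception names are hypothetical and this is not Crab's implementation.

```python
import threading
from typing import Callable, Dict, Tuple, TypeVar

T = TypeVar("T")


class BulkheadFull(Exception):
    """Raised when a key's concurrency budget is exhausted."""


class Bulkhead:
    def __init__(self, max_concurrent: int) -> None:
        self._max = max_concurrent
        self._lock = threading.Lock()
        self._slots: Dict[Tuple[str, str], threading.Semaphore] = {}

    def _slot(self, key: Tuple[str, str]) -> threading.Semaphore:
        # Create each key's semaphore lazily, guarded against races.
        with self._lock:
            if key not in self._slots:
                self._slots[key] = threading.Semaphore(self._max)
            return self._slots[key]

    def call(self, appcode: str, index: str, fn: Callable[[], T]) -> T:
        slot = self._slot((appcode, index))
        # Fail fast instead of queueing when this key is saturated.
        if not slot.acquire(blocking=False):
            raise BulkheadFull(f"{appcode}+{index} is saturated")
        try:
            return fn()
        finally:
            slot.release()
```

Rejecting immediately when a key is saturated keeps a slow index from tying up request threads that other indices need.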
Results
After migration, query latency for domestic ticket searches dropped from 68 ms to 21 ms and write latency from 34 ms to 6 ms. The platform now supports seamless ES version upgrades, parallel clusters, and automated backfill, while still requiring occasional manual intervention for fault recovery.
Future Work
Planned improvements include configuration‑driven DTS aggregation, automated fault migration, and further reduction of manual operations to enhance scalability and ease of integration for new business lines.
dbaplus Community
Enterprise-level professional community for Database, BigData, and AIOps. Daily original articles, weekly online tech talks, monthly offline salons, and quarterly XCOPS&DAMS conferences—delivered by industry experts.
