Various Data Synchronization Architectures for Real-Time Elasticsearch Integration

This article compares five approaches to synchronizing a database with Elasticsearch: periodic Logstash pulls, synchronous dual writes, asynchronous dual writes through a message queue (MQ), Canal-based binlog streaming, and a Canal + MQ hybrid. For each one it outlines the architecture, the main drawbacks, and the scenarios it suits.

Full-Stack Internet Architecture

Nov 22, 2020

Solution 1 – Periodic Logstash Pull: Logstash reads from the database on a schedule and writes the results to Elasticsearch. Architecture: Database → Logstash → Elasticsearch. Drawbacks: Elasticsearch lags behind the database by up to one polling interval; a short interval increases load on the source database; and large batch syncs incur a higher network transfer cost. Reference: https://www.cnblogs.com/csts/p/6120644.html
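The scheduled-read idea can be sketched in a few lines. This is not Logstash itself: it is a minimal illustration that assumes a hypothetical `products` table with an integer `updated_at` column, uses SQLite in place of the source database, and uses a plain dict as a stand-in for the Elasticsearch index.

```python
import sqlite3


def pull_once(conn, index, last_seen):
    """Copy rows modified after `last_seen` into `index`; return the new watermark."""
    rows = conn.execute(
        "SELECT id, name, updated_at FROM products "
        "WHERE updated_at > ? ORDER BY updated_at",
        (last_seen,),
    ).fetchall()
    for row_id, name, updated_at in rows:
        # Upsert by primary key; `index` is a dict standing in for Elasticsearch.
        index[row_id] = {"name": name, "updated_at": updated_at}
        last_seen = max(last_seen, updated_at)
    return last_seen


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE products (id INTEGER PRIMARY KEY, name TEXT, updated_at INTEGER)"
    )
    conn.execute("INSERT INTO products VALUES (1, 'kettle', 100)")
    index, watermark = {}, 0
    # A scheduler would call pull_once once per polling interval.
    watermark = pull_once(conn, index, watermark)
    print(index, watermark)
```

Each call queries the source table, which is where the extra database load comes from, and a change only becomes searchable on the next call, which is the latency described above. A poll of this kind also cannot see rows that were physically deleted.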

Solution 2 – Synchronous Dual Write: Whenever the business application writes to the database, it also writes the same data to Elasticsearch as part of the same request. Architecture: Business Application → Database & Elasticsearch. Drawbacks: the Elasticsearch write is hard-coded into every write path, the synchronization logic is tightly coupled with business code, and performance suffers because each request waits for both writes.
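A rough sketch of what a dual write looks like inside business code follows. `FakeSearchIndex` and `create_product` are hypothetical names, SQLite stands in for the database, and the in-memory class stands in for an Elasticsearch client; the point is only to show where the coupling sits.

```python
import sqlite3


class FakeSearchIndex:
    """Hypothetical in-memory stand-in for an Elasticsearch index."""

    def __init__(self):
        self.docs = {}
        self.available = True

    def index(self, doc_id, doc):
        if not self.available:
            raise ConnectionError("search index unavailable")
        self.docs[doc_id] = doc


def create_product(conn, search, product_id, name):
    """Write to the database and the search index in the same request."""
    with conn:  # commits if the block succeeds, rolls back if it raises
        conn.execute(
            "INSERT INTO products (id, name) VALUES (?, ?)", (product_id, name)
        )
        # The business code must know about the index and wait for it.
        search.index(product_id, {"name": name})
```

In this variant a failed index write rolls the database insert back, so an outage of the search side also blocks business writes. Other orderings are possible, but each one keeps the second write in the request path.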

Solution 3 – Asynchronous Dual Write with MQ: Introduce a message queue (MQ) and a dedicated data-sync service. The business system acts as the producer and publishes a message for each transaction; the sync service acts as the consumer, reads each message, and writes to Elasticsearch. Architecture: Business Application → MQ → Sync Service → Elasticsearch. Drawbacks: the business code must still publish the messages, so the synchronization logic remains coupled with the business system. Reference: https://blog.csdn.net/lp2388163/article/details/80633190
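The producer and consumer roles can be illustrated with the standard library alone. In this sketch `queue.Queue` stands in for a real MQ, a dict stands in for the Elasticsearch index, and `publish_change` and `run_sync_service` are hypothetical names; the `None` sentinel is just a convenient way to stop the demo worker.

```python
import queue
import threading


def publish_change(mq, doc_id, doc):
    """Producer side: business code calls this after its database write."""
    mq.put({"id": doc_id, "doc": doc})


def run_sync_service(mq, index):
    """Consumer side: apply messages to the index until a None sentinel arrives."""
    while True:
        msg = mq.get()
        if msg is None:
            break
        index[msg["id"]] = msg["doc"]


if __name__ == "__main__":
    mq, index = queue.Queue(), {}
    worker = threading.Thread(target=run_sync_service, args=(mq, index))
    worker.start()
    publish_change(mq, 1, {"name": "kettle"})  # returns without waiting for the index
    mq.put(None)
    worker.join()
    print(index)
```

The request path now only waits for the enqueue, but the call to `publish_change` still lives in business code, which is the remaining coupling noted above.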

Solution 4 – Canal Binlog Streaming: Use Alibaba's Canal to subscribe to the MySQL binlog and push changes to Elasticsearch in real time, without adding synchronization code to the business application. Architecture: MySQL → Canal → Elasticsearch. Drawbacks: the Canal server and client come under performance pressure at high concurrency, and data can be lost if the Canal client crashes. Reference: https://www.jianshu.com/p/9677ca6ca34e
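On the consuming side, binlog-driven sync comes down to replaying row changes against the index in order. The sketch below assumes a simplified, hypothetical event shape (`type`, `pk`, `row`); it is not Canal's actual message format or client API, and a dict again stands in for Elasticsearch.

```python
def apply_row_change(index, event):
    """Apply one row-change event to the index (a dict keyed by primary key)."""
    kind = event["type"]
    key = event["pk"]
    if kind in ("INSERT", "UPDATE"):
        # Writing the full row makes replaying the same event harmless.
        index[key] = dict(event["row"])
    elif kind == "DELETE":
        index.pop(key, None)  # deleting an absent document is not an error
    else:
        raise ValueError(f"unsupported event type: {kind}")


if __name__ == "__main__":
    index = {}
    events = [
        {"type": "INSERT", "pk": 1, "row": {"name": "kettle"}},
        {"type": "UPDATE", "pk": 1, "row": {"name": "teapot"}},
        {"type": "DELETE", "pk": 1, "row": None},
    ]
    for event in events:
        apply_row_change(index, event)
    print(index)
```

Because deletes arrive as events of their own, this approach can remove documents from the index, which a timestamp-based poll cannot do.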

Solution 5 – Canal + MQ Hybrid: Combine Canal with a message queue so that the queue buffers changes, limits the write rate, and smooths traffic peaks. Canal streams binlog changes to the MQ; a sync service consumes the MQ messages and writes to Elasticsearch. Architecture: MySQL → Canal → MQ → Sync Service → Elasticsearch. Drawback: more components to deploy and operate, which increases system complexity. Reference: https://www.cnblogs.com/sanduzxcvbnm/p/11558858.html
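The buffering and peak-shaving role of the queue can be sketched as a consumer that takes changes in fixed-size batches, however fast they arrive. `queue.Queue` stands in for the MQ, the event shape (`type`, `pk`, `row`) is a hypothetical simplification, and `drain_batch` and `bulk_apply` are illustrative names rather than part of Canal or any MQ client.

```python
import queue


def drain_batch(mq, max_batch):
    """Take up to `max_batch` buffered events without blocking."""
    batch = []
    while len(batch) < max_batch:
        try:
            batch.append(mq.get_nowait())
        except queue.Empty:
            break
    return batch


def bulk_apply(index, batch):
    """Apply one batch in order; a real service might send one bulk request here."""
    for event in batch:
        if event["type"] == "DELETE":
            index.pop(event["pk"], None)
        else:
            index[event["pk"]] = dict(event["row"])
    return len(batch)


if __name__ == "__main__":
    mq, index = queue.Queue(), {}
    for i in range(5):  # a burst of five changes arrives at once
        mq.put({"type": "INSERT", "pk": i, "row": {"n": i}})
    while (batch := drain_batch(mq, max_batch=2)):
        bulk_apply(index, batch)  # the index never sees more than two per round
    print(len(index))
```

The burst sits in the queue while the consumer works through it at its own pace, so the write rate against the index is set by the consumer's batch size and cadence rather than by the producer.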

Additional notes: The open-source version of Canal supports only MySQL; for Oracle, consider tools such as Oracle GoldenGate (OGG) or Databus. The MQ can be ActiveMQ, RabbitMQ, RocketMQ, or Kafka, chosen according to your business requirements.
