
Which MySQL‑to‑Elasticsearch Sync Method Wins? 4 Solutions & Tool Picks

The article compares four MySQL‑to‑Elasticsearch synchronization approaches (synchronous dual‑write, asynchronous dual‑write via MQ, timer‑based SQL extraction, and binlog‑based real‑time sync), evaluates their trade‑offs, and recommends practical tools such as Canal, DTS, and Databus for implementation.

Architect

Introduction

MySQL is often used as the primary business database while Elasticsearch (ES) serves as a query engine to achieve read/write separation and relieve query pressure on MySQL. Keeping the two stores consistent is a critical challenge.

Data‑sync strategies

Synchronous dual‑write

When the application writes data to MySQL, it writes the same payload to ES within the same request, before returning to the caller.

Advantages: simple business logic; changes reach ES in real time.

Disadvantages: every MySQL write point must be hard‑coded to also write ES, creating tight coupling; no transaction spans both stores, so if one write fails the two stores diverge and data can be lost from ES; overall performance degrades because the application now bears the cost of two writes on every request.
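A minimal sketch of the synchronous dual‑write path, using dict‑backed stand‑ins instead of real MySQL and ES clients (`FakeStore`, `WriteError`, and `save_product` are hypothetical names). It shows why a failed second write leaves the stores inconsistent: nothing rolls back the first write.

```python
from __future__ import annotations


class WriteError(Exception):
    """Raised by the fake stores to simulate a failed write."""


class FakeStore:
    """Dict-backed stand-in for MySQL or ES (hypothetical, no real client)."""

    def __init__(self, name: str, fail: bool = False) -> None:
        self.name = name
        self.fail = fail
        self.rows: dict[int, dict] = {}

    def upsert(self, doc_id: int, doc: dict) -> None:
        if self.fail:
            raise WriteError(f"{self.name} write failed")
        self.rows[doc_id] = dict(doc)


def save_product(mysql: FakeStore, es: FakeStore, doc_id: int, doc: dict) -> bool:
    """Synchronous dual-write: the request returns only after both writes.

    Returns True when both stores were updated, False when ES failed after
    MySQL had already committed, i.e. the two stores have diverged.
    """
    mysql.upsert(doc_id, doc)  # 1. primary write; assume it commits here
    try:
        es.upsert(doc_id, doc)  # 2. same payload to ES, still on the request path
    except WriteError:
        # No transaction spans MySQL and ES, so the MySQL write stays.
        # The caller must retry, log the gap, or reconcile later.
        return False
    return True
```

The `False` branch is where teams typically bolt on retries or a reconciliation job, which is exactly the extra machinery that the asynchronous approach moves out of the request path.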

Asynchronous dual‑write via message queue

The application writes to MySQL and publishes the change to a message queue (MQ); a separate consumer reads from the MQ and writes to ES. Multiple data sources can publish to the same MQ.

Advantages: higher throughput, because the request path no longer waits for ES; the MQ persists messages, so if ES is unavailable or a write fails, the consumer can retry without losing data; source isolation makes it easy to add new data sources.

Disadvantages: still requires consumer code for each new source; system complexity rises with the introduction of a message broker; asynchronous consumption adds latency before data appears in ES.
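The producer/consumer split can be sketched with Python's `queue.Queue` standing in for the broker and a dict standing in for ES. `FlakyES`, `write_and_publish`, and `drain` are hypothetical names; a real deployment would use a durable broker, whose persistence is what makes retries safe across process restarts.

```python
from __future__ import annotations

import queue


class FlakyES:
    """Dict-backed ES stand-in that fails its first `failures` writes."""

    def __init__(self, failures: int = 0) -> None:
        self.failures = failures
        self.docs: dict[int, dict] = {}

    def index(self, doc_id: int, doc: dict) -> None:
        if self.failures > 0:
            self.failures -= 1
            raise ConnectionError("ES unavailable")
        self.docs[doc_id] = dict(doc)


def write_and_publish(mysql: dict, mq: queue.Queue, doc_id: int, doc: dict) -> None:
    """Producer side: write MySQL, then publish the change instead of calling ES."""
    mysql[doc_id] = dict(doc)
    mq.put({"id": doc_id, "doc": dict(doc), "attempts": 0})


def drain(mq: queue.Queue, es: FlakyES, max_attempts: int = 5) -> list[dict]:
    """Consumer side: apply queued changes to ES, re-queueing on failure.

    Returns the messages that exhausted max_attempts (a dead-letter list).
    """
    dead: list[dict] = []
    while not mq.empty():
        msg = mq.get()
        try:
            es.index(msg["id"], msg["doc"])
        except ConnectionError:
            msg["attempts"] += 1
            if msg["attempts"] < max_attempts:
                mq.put(msg)  # retry later; a real broker would redeliver
            else:
                dead.append(msg)
    return dead
```

Re‑queueing a failed message behind newer ones can reorder updates to the same document. Real pipelines usually partition messages by primary key or carry a version number so an older change cannot overwrite a newer one.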

Timer‑based SQL extraction

To avoid code intrusion, a timestamp column is added to relevant tables. A scheduled timer scans the tables, extracts rows whose timestamp changed, and writes them to ES.

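Add a timestamp column (for example, updated_at) to the target tables; every insert or update sets it to the current time, which MySQL can do automatically with ON UPDATE CURRENT_TIMESTAMP. Physically deleted rows leave nothing to find, so deletions are usually recorded with a soft‑delete flag instead.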

Leave existing CRUD code untouched.

Run a timer program at a fixed interval to query the tables for rows whose timestamp changed during the window.

Write each changed row to ES.
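The four steps above can be sketched as one polling function plus a bounded timer loop. In‑memory SQLite stands in for MySQL so the example runs anywhere; the `product` table, the integer `updated_at` column, and the function names are hypothetical, and a dict keyed by primary key stands in for the ES index.

```python
from __future__ import annotations

import sqlite3
import time


def make_db() -> sqlite3.Connection:
    """In-memory SQLite standing in for MySQL; updated_at is an integer epoch."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE product ("
        " id INTEGER PRIMARY KEY, name TEXT, updated_at INTEGER NOT NULL)"
    )
    return conn


def sync_once(conn: sqlite3.Connection, es: dict[int, dict], watermark: int) -> int:
    """One timer tick: copy rows changed since `watermark` into `es`.

    Uses >= rather than > so rows stamped with the same value as the last
    tick are not skipped; re-sending them is harmless because the ES write is
    an upsert keyed by primary key. Returns the new watermark.
    """
    rows = conn.execute(
        "SELECT id, name, updated_at FROM product"
        " WHERE updated_at >= ? ORDER BY updated_at",
        (watermark,),
    ).fetchall()
    for row_id, name, updated_at in rows:
        es[row_id] = {"name": name, "updated_at": updated_at}  # upsert by _id
        watermark = max(watermark, updated_at)
    return watermark


def run(conn: sqlite3.Connection, es: dict[int, dict], interval_s: float, ticks: int) -> int:
    """Fixed-interval timer loop; `ticks` bounds it so the sketch terminates."""
    watermark = 0
    for _ in range(ticks):
        watermark = sync_once(conn, es, watermark)
        time.sleep(interval_s)
    return watermark
```

In production the watermark would be persisted between runs so a restart does not rescan the whole table, and the query would ideally hit a replica to keep polling load off the primary.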

Advantages: no changes to existing application code; no tight coupling; worker logic is straightforward.

Disadvantages: timeliness suffers because the timer runs at a fixed frequency (even a second‑level interval still introduces delay); polling adds load on the database, which can be mitigated by querying a replica.

Classic solution: Logstash's JDBC input plugin runs a SQL query on a schedule, remembers the last tracked value (:sql_last_value), and pushes only the new or changed rows to ES.

Binlog‑based real‑time sync

This approach leverages MySQL's binary log (binlog), which records every data change; with binlog_format=ROW, each event carries the changed rows themselves.

Read the binlog and filter events for the target tables.

Transform each binlog event into a message and push it to an MQ.

Implement an MQ consumer that receives the messages.

For each consumed message, write the change to ES.
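The four steps can be sketched end to end with in‑memory stand‑ins. The event dicts are hypothetical, loosely shaped like Canal's flat‑message JSON (`database`, `table`, `type`, `pkNames`, `data`); a real pipeline would receive them from a binlog client and a broker rather than a Python list and `queue.Queue`.

```python
from __future__ import annotations

import queue
from collections.abc import Iterable, Iterator

ROW_EVENTS = {"INSERT", "UPDATE", "DELETE"}


def read_and_filter(binlog: Iterable[dict], targets: set[tuple[str, str]]) -> Iterator[dict]:
    """Step 1: keep only row events for the (database, table) pairs we sync."""
    for ev in binlog:
        if (ev["database"], ev["table"]) in targets and ev["type"] in ROW_EVENTS:
            yield ev


def publish(events: Iterable[dict], mq: queue.Queue) -> None:
    """Step 2: push each event to the MQ (an in-memory queue here)."""
    for ev in events:
        mq.put(ev)


def consume(mq: queue.Queue, es: dict[str, dict[str, dict]]) -> None:
    """Steps 3 and 4: apply each message to ES, one index per table."""
    while not mq.empty():
        ev = mq.get()
        index = es.setdefault(ev["table"], {})
        pk = ev["pkNames"][0]  # assume a single-column primary key
        for row in ev["data"]:
            doc_id = str(row[pk])
            if ev["type"] == "DELETE":
                index.pop(doc_id, None)
            else:  # INSERT or UPDATE both become an upsert
                index[doc_id] = row
```

Because DELETE events are in the binlog, this approach sees hard deletes that a timestamp poller misses. Ordering still matters: events for the same primary key must be applied in binlog order, which is why such pipelines usually partition the MQ by primary key.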

Advantages: no code intrusion or hard‑coding; the existing system remains unchanged; high performance; business logic is completely decoupled from the sync pipeline.

Disadvantages: building a reliable binlog ingestion system is complex; if an MQ is used, it re‑introduces the latency risk described in the asynchronous solution.

Binlog sync tool selection

Canal

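Canal pretends to be a MySQL replica (slave): it speaks the MySQL replication protocol, subscribes to the master's binlog, parses the byte stream into structured change events (JSON when delivered to an MQ as flat messages), and hands them to a client over TCP or to an MQ, from which they are written to ES.

The Canal server sends a binlog dump request (COM_BINLOG_DUMP) to the MySQL master, just as a replica would.

The master streams binlog events to Canal, which parses them into structured records.

A Canal client, listening over TCP or consuming from an MQ, forwards the parsed events to ES.

The core pipeline includes an EventParser (pulls the binlog, extracts and converts events) and an EventSink (filters, routes, and enriches data).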
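The last hop, from a parsed event to an ES write, can be sketched as turning one Canal‑style flat message into an Elasticsearch `_bulk` request body, with a small routing table choosing the target index. The message shape and the `index_for` mapping are assumptions for illustration; Canal also ships its own adapter for writing to ES, which this sketch does not reproduce.

```python
from __future__ import annotations

import json


def to_bulk_ndjson(msg: dict, index_for: dict[str, str]) -> str:
    """Convert one Canal-style flat message into an Elasticsearch _bulk body.

    `index_for` maps "database.table" to an ES index name (hypothetical
    routing config). DDL messages and unmapped tables produce an empty body.
    """
    if msg.get("isDdl"):
        return ""
    index = index_for.get(f'{msg["database"]}.{msg["table"]}')
    if index is None:
        return ""
    pk = msg["pkNames"][0]  # assume a single-column primary key
    lines: list[str] = []
    for row in msg["data"]:
        meta = {"_index": index, "_id": str(row[pk])}
        if msg["type"] == "DELETE":
            lines.append(json.dumps({"delete": meta}))  # delete has no source line
        elif msg["type"] in ("INSERT", "UPDATE"):
            lines.append(json.dumps({"index": meta}))  # index = create or replace
            lines.append(json.dumps(row))
    # The _bulk API expects newline-delimited JSON that ends with a newline.
    return "".join(line + "\n" for line in lines)
```

The returned string would be sent as `POST /_bulk` with `Content-Type: application/x-ndjson`; batching several messages into one request is what keeps per‑write overhead on ES low.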

Alibaba Cloud DTS

Data Transmission Service (DTS) supports migration, real‑time subscription, and sync across RDBMS, NoSQL, and OLAP sources.

Supports multiple source types.

Offers migration, real‑time subscription, and sync modes.

Performance: up to 70 MB/s throughput and 200K TPS during peak full‑load migration.

High availability via clustered services; automatic failover.

Documentation: https://help.aliyun.com/product/26590.html

LinkedIn Databus

Databus (open‑sourced by LinkedIn in 2013) captures change logs from databases (MySQL, Oracle) and delivers them with millisecond latency.

Supports multiple sources and high scalability (thousands of consumers).

Preserves transaction order and integrity.

Provides server‑side filtering and unlimited replay for consumers.

GitHub repository: https://github.com/linkedin/databus
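Ordered delivery and replay are what let a consumer recover after a crash: it restarts from its last checkpoint and skips anything already applied. The sketch below is generic and hypothetical, not Databus's client API; `seq` stands in for a change sequence number (Databus calls it an SCN), and events are assumed to arrive in sequence order, as the source guarantees.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class CheckpointedConsumer:
    """Applies ordered change events and remembers the last one applied.

    Hypothetical sketch of replay-safe consumption, not a real client API.
    """

    checkpoint: int = 0
    state: dict[str, str] = field(default_factory=dict)

    def apply(self, events: list[dict]) -> int:
        """Apply events newer than the checkpoint; return how many were applied."""
        applied = 0
        for ev in events:
            if ev["seq"] <= self.checkpoint:
                continue  # already applied, so replaying it again is a no-op
            self.state[ev["key"]] = ev["value"]
            self.checkpoint = ev["seq"]  # persist this in a real consumer
            applied += 1
        return applied
```

Because already‑seen events are skipped, a consumer can safely ask for a replay from an earlier point than strictly necessary, which is the property that makes rebuilding an ES index from the change log practical.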

Other relevant tools

Flink: stateful stream processing engine for bounded and unbounded data streams.

CloudCanal: commercial data migration product (official site: https://www.clougence.com).

Maxwell: lightweight daemon that outputs binlog changes as JSON without requiring a custom client (http://maxwells-daemon.io).

DRDS: Alibaba Cloud's Distributed Relational Database Service, a database middleware focused on scalability and lightweight operation (https://www.aliyun.com/product/drds).

yugong: tool for migrating data from Oracle to MySQL (https://github.com/alibaba/yugong).

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact admin@besthub.dev and we will review it promptly.
