How to Sync MySQL Data to Elasticsearch: 4 Practical Strategies
This article explores four common approaches for synchronizing product data from MySQL to Elasticsearch in e‑commerce systems: synchronous dual write, asynchronous dual write with a message queue, scheduled tasks, and binlog‑based data subscription. For each one it covers the advantages, drawbacks, and implementation considerations.
Hello, I'm Su San. In this article I share a common e‑commerce scenario: synchronizing MySQL data to Elasticsearch.
Product search on e‑commerce sites is typically powered by Elasticsearch. When a product is added, its data is first written to MySQL; the challenge is keeping Elasticsearch in sync.
1. Synchronous Dual Write
The simplest method is to write the product to MySQL and, within the same request, write the same data to Elasticsearch.
Pros: Easy to implement.
Cons:
Business coupling – synchronization code is embedded in product management.
Performance impact – writing to two stores increases latency.
Limited extensibility – hard to support personalized search or data aggregation.
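The coupling described above can be seen in a minimal sketch. It is not a production implementation: InMemoryStore and save_product are hypothetical names, and the in-memory dictionaries only stand in for MySQL and Elasticsearch so that the example runs on its own.

```python
class InMemoryStore:
    """Hypothetical stand-in for a MySQL table or an Elasticsearch index."""

    def __init__(self, fail=False):
        self.rows = {}
        self.fail = fail  # set to True to simulate an outage

    def put(self, key, doc):
        if self.fail:
            raise ConnectionError("store unavailable")
        self.rows[key] = dict(doc)


def save_product(product, mysql, es):
    """Synchronous dual write: both writes sit in the same request path."""
    mysql.put(product["id"], product)  # 1. source of truth
    es.put(product["id"], product)     # 2. search index, written by the same call


mysql, es = InMemoryStore(), InMemoryStore()
save_product({"id": 1, "name": "Red mug", "price": 9.9}, mysql, es)
print(mysql.rows == es.rows)

# When the second store is unavailable, the first write has already happened,
# so the caller has to retry or compensate to bring the two back in line.
broken_es = InMemoryStore(fail=True)
try:
    save_product({"id": 2, "name": "Blue mug", "price": 11.0}, mysql, broken_es)
except ConnectionError:
    pass
print(2 in mysql.rows, 2 in broken_es.rows)
```

Because save_product waits for both stores, its latency is the sum of the two writes, and any change to the indexing logic means touching product management code.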
2. Asynchronous Dual Write
Product data is first placed onto a message queue (MQ). A separate search service subscribes to these messages and updates Elasticsearch, decoupling the product service.
For multi‑dimensional queries, data often needs to be aggregated into a wide table before indexing in Elasticsearch to improve query efficiency.
If aggregation is required, the search service may need to query the original database (a “back‑call”).
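The flow above, including the back‑call, might look roughly like this. It is a sketch under stated assumptions: queue.Queue stands in for the MQ, plain dictionaries stand in for the MySQL tables and the Elasticsearch index, and every function, table, and field name is hypothetical. The message here carries only the product id, which is one possible design and not necessarily what a given system does.

```python
import json
import queue

# Hypothetical stand-ins: dicts for two MySQL tables and the ES index,
# queue.Queue for the message broker.
products = {1: {"id": 1, "name": "Red mug", "category_id": 10}}
categories = {10: {"id": 10, "name": "Kitchen"}}
es_index = {}
mq = queue.Queue()


def publish_product_changed(product_id):
    """Product service: announce the change and return immediately."""
    mq.put(json.dumps({"type": "product_changed", "id": product_id}))


def build_wide_document(product_id):
    """Search service back-call: re-read the source tables and join them."""
    product = products[product_id]
    category = categories[product["category_id"]]
    return {
        "id": product["id"],
        "name": product["name"],
        "category_name": category["name"],
    }


def consume_once():
    """Search service: handle one message and update the index."""
    message = json.loads(mq.get_nowait())
    es_index[message["id"]] = build_wide_document(message["id"])


publish_product_changed(1)
consume_once()
print(es_index[1])
```

The product service only knows about the queue. The wide document is assembled entirely inside the search service, which is where the extra query against the original database comes from.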
Pros:
Decouples product service from synchronization.
Good real‑time performance; synchronization usually completes within seconds.
Cons:
Introduces additional components and complexity.
3. Scheduled Tasks
A simple approach is to run periodic jobs that copy data from MySQL to Elasticsearch.
Choosing the right frequency is difficult: high frequency creates load spikes, while low frequency reduces freshness.
Pros: Easy to implement.
Cons:
Hard to guarantee real‑time freshness.
Adds load on the database, because every run has to query MySQL for the rows to copy.
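One common way to keep each run small is to copy only the rows changed since the previous run. The sketch below assumes the table has an updated_at column that is refreshed on every write. sqlite3 stands in for MySQL so the example is self-contained, a dictionary stands in for the Elasticsearch index, and the table, column, and function names are hypothetical.

```python
import sqlite3

# sqlite3 stands in for MySQL here; table and column names are hypothetical.
db = sqlite3.connect(":memory:")
db.execute(
    "CREATE TABLE product (id INTEGER PRIMARY KEY, name TEXT, updated_at INTEGER)"
)
db.executemany(
    "INSERT INTO product VALUES (?, ?, ?)",
    [(1, "Red mug", 100), (2, "Blue mug", 105)],
)

es_index = {}  # hypothetical stand-in for the Elasticsearch index


def sync_once(last_synced_at):
    """Copy rows changed since the previous run and return the new watermark."""
    rows = db.execute(
        "SELECT id, name, updated_at FROM product "
        "WHERE updated_at > ? ORDER BY updated_at",
        (last_synced_at,),
    ).fetchall()
    for product_id, name, updated_at in rows:
        es_index[product_id] = {"id": product_id, "name": name}
        last_synced_at = max(last_synced_at, updated_at)
    return last_synced_at


watermark = sync_once(0)  # first run copies both rows
db.execute("UPDATE product SET name = 'Green mug', updated_at = 110 WHERE id = 2")
watermark = sync_once(watermark)  # second run copies only the changed row
print(watermark, es_index[2]["name"])
```

A scheduler would call sync_once at a fixed interval and persist the returned watermark between runs. Note that rows deleted from the table never match this query, so a sketch like this would need a soft‑delete flag or a separate cleanup pass to remove them from the index.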
4. Data Subscription (Binlog)
A more recent approach subscribes to the MySQL binlog, the same mechanism MySQL itself uses for master‑slave replication. Frameworks like Canal present themselves to MySQL as a slave, receive the binlog stream, and forward the parsed changes to downstream consumers.
Canal provides adapters, including an Elasticsearch adapter, allowing zero‑code synchronization via canal-adapter. However, for complex aggregations, custom client code is still needed.
Compared with asynchronous dual write, data subscription reduces coupling and offers better real‑time performance.
Pros:
Minimal intrusion into business code.
Good real‑time characteristics.
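When custom client code is needed, its core is usually a function that turns each row‑change event into an index operation. The sketch below illustrates that idea only. The event dictionaries are hand‑written and their shape (type, table, data) is an assumption loosely modeled on the JSON messages such tools emit, not the exact format of any particular one. The dictionary standing in for the Elasticsearch index and the function name are hypothetical as well.

```python
es_index = {}  # hypothetical stand-in for the Elasticsearch index


def apply_row_event(event):
    """Translate one row-change event into an index operation."""
    if event["table"] != "product":
        return  # ignore tables that are not indexed
    for row in event["data"]:
        if event["type"] in ("INSERT", "UPDATE"):
            es_index[row["id"]] = {"id": row["id"], "name": row["name"]}
        elif event["type"] == "DELETE":
            es_index.pop(row["id"], None)


# Hand-written events in an assumed shape, in the order they were committed.
events = [
    {"type": "INSERT", "table": "product", "data": [{"id": 1, "name": "Red mug"}]},
    {"type": "UPDATE", "table": "product", "data": [{"id": 1, "name": "Red mug XL"}]},
    {"type": "INSERT", "table": "orders", "data": [{"id": 7}]},
    {"type": "INSERT", "table": "product", "data": [{"id": 2, "name": "Blue mug"}]},
    {"type": "DELETE", "table": "product", "data": [{"id": 2, "name": "Blue mug"}]},
]
for event in events:
    apply_row_event(event)
print(es_index)
```

The product service does not appear anywhere in this code. The events originate from the database log, which is why this approach intrudes so little on business code, and deletes arrive as ordinary events.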
Popular open‑source data subscription tools include:
Canal (Alibaba, Java, active, supports high availability, multiple client languages, Kafka/RocketMQ, custom message format).
Maxwell (Zendesk, Java, active, supports high availability, JSON format, Kafka/RabbitMQ/Redis).
Python‑Mysql‑Replication (Community, Python, active, no high‑availability support, custom format).
Beyond Elasticsearch, similar synchronization techniques apply when replicating MySQL data to other stores such as HBase.
Su San Talks Tech
Su San, former staff at several leading tech companies, is a top creator on Juejin and a premium creator on CSDN, and runs the free coding practice site www.susan.net.cn.
