5 Proven Ways to Sync MySQL Data to Elasticsearch
This article explains why synchronizing MySQL with Elasticsearch is beneficial and compares five practical solutions (dual-write, scheduled tasks, binlog, Canal, and MQ asynchronous), covering the implementation, advantages, disadvantages, and suitable scenarios of each.
Introduction
Many developers run into slow database queries, especially fuzzy searches and complex aggregations. Offloading those queries to Elasticsearch (ES) as a search engine can be an effective remedy.
This article presents five common approaches for synchronizing MySQL data to Elasticsearch.
Why Sync MySQL to ES?
Full-text search: ES provides full-text capabilities, such as tokenized matching and relevance scoring, that go far beyond MySQL's LIKE.
Complex aggregations: ES supports advanced aggregation queries suitable for big-data analysis.
High-performance queries: ES's inverted index makes term lookups fast without scanning every row.
Horizontal scaling: ES is natively distributed, making scaling straightforward.
Solution 1: Dual‑Write
Write to both MySQL and ES directly in business code.
Example Code
@Service
public class UserService {
    @Autowired
    private UserMapper userMapper;
    @Autowired
    private ElasticsearchTemplate elasticsearchTemplate;
    // Note: @Transactional covers only MySQL; the ES write is not part of the transaction.
    @Transactional
    public void addUser(User user) {
        // Write to MySQL
        userMapper.insert(user);
        // Write to Elasticsearch
        indexUser(user);
    }
    @Transactional
    public void updateUser(User user) {
        // Update MySQL
        userMapper.updateById(user);
        // Re-index the full document; indexing with the same ID overwrites the old one
        indexUser(user);
    }
    // Shared by add and update so both go through the same indexing path
    private void indexUser(User user) {
        IndexQuery indexQuery = new IndexQueryBuilder()
            .withObject(user)
            .withId(user.getId().toString())
            .build();
        elasticsearchTemplate.index(indexQuery);
    }
}
Pros
Simple to implement, no extra components needed.
Real‑time synchronization.
Cons
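Data consistency is hard to guarantee: the two writes are not covered by a single transaction, so a failure between them leaves MySQL and ES out of sync.
Sync logic intrudes into business code and makes it more complex.
Every write is slower because it waits for the ES response.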
Suitable Scenarios
Small data volumes, high real‑time requirements, and tolerance for occasional inconsistency.
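One common way to soften the consistency problem of dual-write is to record failed ES writes and retry them later. The following is a minimal sketch of that idea using only in-memory stand-ins: `SearchIndex`, `DualWriteWithRetry`, and the maps are hypothetical names invented for illustration, not part of Spring Data or the Elasticsearch client.

```java
import java.util.ArrayDeque;
import java.util.HashMap;
import java.util.Map;
import java.util.Queue;

public class DualWriteWithRetry {
    /** Hypothetical stand-in for an Elasticsearch client; not a real ES API. */
    public interface SearchIndex {
        void index(long id, String doc) throws Exception;
    }

    private final Map<Long, String> database = new HashMap<>(); // stands in for MySQL
    private final Queue<Long> pendingIds = new ArrayDeque<>();  // IDs whose index write failed
    private final SearchIndex searchIndex;

    public DualWriteWithRetry(SearchIndex searchIndex) {
        this.searchIndex = searchIndex;
    }

    /** Writes to the database first, then to the index; an index failure is queued, not thrown. */
    public void save(long id, String doc) {
        database.put(id, doc);
        try {
            searchIndex.index(id, doc);
        } catch (Exception e) {
            pendingIds.add(id);
        }
    }

    /** Retries each queued ID once with the latest database value; returns how many remain queued. */
    public int retryPending() {
        int attempts = pendingIds.size();
        for (int i = 0; i < attempts; i++) {
            long id = pendingIds.poll();
            try {
                searchIndex.index(id, database.get(id));
            } catch (Exception e) {
                pendingIds.add(id); // keep it for the next retry round
            }
        }
        return pendingIds.size();
    }

    public int pendingCount() {
        return pendingIds.size();
    }
}
```

Retrying with the current database value rather than the originally failed payload means a later update is never overwritten by a stale retry. In a real system the pending queue would need to be durable, for example a database table, which this sketch does not attempt.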
Solution 2: Scheduled Task
Periodically scan MySQL for changes and sync to ES.
Example Code
@Component
public class UserSyncTask {
    @Autowired
    private UserMapper userMapper;
    @Autowired
    private UserESRepository userESRepository;
    // Run every 5 minutes
    @Scheduled(fixedRate = 5 * 60 * 1000)
    public void syncUserToES() {
        // Capture the start time before querying, so rows changed while this run
        // is in progress are picked up by the next run instead of being skipped
        Date syncStart = new Date();
        Date lastSyncTime = getLastSyncTime();
        List<User> updatedUsers = userMapper.selectUpdatedAfter(lastSyncTime);
        for (User user : updatedUsers) {
            userESRepository.save(user);
        }
        updateLastSyncTime(syncStart);
    }
    private Date getLastSyncTime() {
        // Placeholder: retrieve from DB or Redis
        throw new UnsupportedOperationException("not implemented");
    }
    private void updateLastSyncTime(Date time) { /* store time */ }
}
Pros
Simple, no changes to existing business code.
Database load is controllable by adjusting sync frequency.
Cons
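Poor timeliness: ES lags MySQL by up to one sync interval.
Changes can be lost if the system crashes or the last sync time is recorded incorrectly.
The scan can stress the database, and it degrades to a full-table scan if the update-time column is not indexed.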
Suitable Scenarios
Use cases where real‑time is not critical and data changes are infrequent.
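The part of a scheduled sync that is easiest to get wrong is the "last sync time" bookkeeping. The sketch below illustrates one way to handle it, assuming each row carries an `updatedAt` timestamp: the caller captures `now` before scanning, the scan uses an inclusive comparison, and the watermark advances only to that captured time. `WatermarkSync`, `Row`, and the in-memory table and index are hypothetical stand-ins for MySQL and ES.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class WatermarkSync {
    /** Minimal row model; updatedAt is an epoch-millisecond timestamp. */
    public static final class Row {
        final long id;
        final String name;
        final long updatedAt;

        public Row(long id, String name, long updatedAt) {
            this.id = id;
            this.name = name;
            this.updatedAt = updatedAt;
        }
    }

    private final List<Row> table = new ArrayList<>();       // stands in for the MySQL table
    private final Map<Long, String> index = new HashMap<>(); // stands in for the ES index
    private long watermark = 0L;                              // start time of the last sync run

    /** Inserts or replaces a row, as an application write would. */
    public void writeRow(Row row) {
        table.removeIf(existing -> existing.id == row.id);
        table.add(row);
    }

    /**
     * Copies every row with updatedAt >= watermark into the index and returns the count.
     * 'now' must be captured by the caller before the scan starts.
     */
    public int sync(long now) {
        int synced = 0;
        for (Row row : table) {
            if (row.updatedAt >= watermark) {
                index.put(row.id, row.name); // upsert, so re-syncing a row is harmless
                synced++;
            }
        }
        watermark = now;
        return synced;
    }

    public int indexedCount() {
        return index.size();
    }
}
```

The inclusive comparison can re-sync a row whose timestamp equals the watermark, which is acceptable because the index write is an upsert. Note that a scan based on update time cannot see rows that were physically deleted; those need a soft-delete flag or a separate cleanup pass.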
Solution 3: Binlog Sync
Parse MySQL binary logs (binlog) to capture every data change.
Example Code
public class BinlogSyncService {
    public void startSync() throws IOException {
        BinaryLogClient client = new BinaryLogClient("localhost", 3306, "username", "password");
        client.registerEventListener(event -> {
            EventData data = event.getData();
            if (data instanceof WriteRowsEventData) {
                // handle insert
                processInsertEvent((WriteRowsEventData) data);
            } else if (data instanceof UpdateRowsEventData) {
                // handle update
            } else if (data instanceof DeleteRowsEventData) {
                // handle delete
            }
        });
        // connect() blocks the calling thread while events are streamed
        client.connect();
    }
    private void processInsertEvent(WriteRowsEventData data) { /* sync to ES */ }
    private void syncToElasticsearch(User user, String op) { /* implementation */ }
}
Pros
High real‑time, near‑instant sync.
No intrusion into business code.
Good performance, minimal impact on MySQL.
Cons
Complex implementation; need to parse binlog format.
Compatibility concerns with binlog format changes.
Failover may require re‑synchronization.
Suitable Scenarios
Large data volumes with strict real‑time requirements.
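A practical detail when handling row events: in the binlog client library used above, `WriteRowsEventData.getRows()` returns rows as `Serializable[]` arrays in column order, and `UpdateRowsEventData.getRows()` returns before/after pairs as `Map.Entry`. Column names are generally not part of the row itself, so the consumer has to supply them. The following sketch shows one way to turn such a row into a document; `BinlogRowMapper` and the column list are hypothetical, and the column order is assumed to match the table definition.

```java
import java.io.Serializable;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class BinlogRowMapper {
    /** Pairs each value with its column name; the caller supplies names in table order. */
    public static Map<String, Object> toDocument(List<String> columns, Serializable[] row) {
        if (columns.size() != row.length) {
            throw new IllegalArgumentException(
                "expected " + columns.size() + " values but got " + row.length);
        }
        Map<String, Object> document = new LinkedHashMap<>();
        for (int i = 0; i < row.length; i++) {
            document.put(columns.get(i), row[i]);
        }
        return document;
    }

    /** For an update pair (before image, after image), the document is built from the after image. */
    public static Map<String, Object> afterImage(
            List<String> columns, Map.Entry<Serializable[], Serializable[]> change) {
        return toDocument(columns, change.getValue());
    }
}
```

The length check matters in practice: if a column is added to the table and the column list is not updated, failing loudly is safer than silently indexing values under the wrong field names.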
Solution 4: Canal
Canal is Alibaba’s open‑source binlog subscription component that simplifies binlog handling.
Example Configuration
# conf/example/instance.properties (instance-level settings)
canal.instance.master.address=127.0.0.1:3306
canal.instance.dbUsername=username
canal.instance.dbPassword=password
canal.instance.connectionCharset=UTF-8
canal.instance.filter.regex=.*\\..*
Example Code
public class CanalClientExample {
    public static void main(String[] args) throws Exception {
        CanalConnector connector = CanalConnectors.newSingleConnector(
            new InetSocketAddress("127.0.0.1", 11111), "example", "", "");
        try {
            connector.connect();
            connector.subscribe(".*\\..*");
            while (true) {
                // Fetch up to 100 entries without acknowledging them yet
                Message message = connector.getWithoutAck(100);
                long batchId = message.getId();
                if (batchId == -1 || message.getEntries().isEmpty()) {
                    // Nothing to process; back off briefly before polling again
                    Thread.sleep(1000);
                    continue;
                }
                try {
                    processEntries(message.getEntries());
                    connector.ack(batchId);
                } catch (Exception e) {
                    // Roll back so the batch is delivered again
                    connector.rollback(batchId);
                }
            }
        } finally {
            connector.disconnect();
        }
    }
    private static void processEntries(List<CanalEntry.Entry> entries)
            throws InvalidProtocolBufferException {
        for (CanalEntry.Entry entry : entries) {
            if (entry.getEntryType() == CanalEntry.EntryType.ROWDATA) {
                CanalEntry.RowChange rowChange = CanalEntry.RowChange.parseFrom(entry.getStoreValue());
                for (CanalEntry.RowData rowData : rowChange.getRowDatasList()) {
                    // handle INSERT, UPDATE, DELETE
                }
            }
        }
    }
}
Pros
High real‑time, low latency.
Non‑intrusive to business systems.
Active open‑source community.
Cons
Requires deploying and maintaining a Canal server.
Needs handling of network partitions and recovery.
Potential duplicate sync issues.
Suitable Scenarios
Big data volumes with high real‑time needs and a dedicated team for middleware maintenance.
Solution 5: MQ Asynchronous
Use a message queue to decouple MySQL and ES, improving reliability and scalability.
Example Code
@Service
public class UserService {
    @Autowired
    private UserMapper userMapper;
    @Autowired
    private RabbitTemplate rabbitTemplate;
    // Note: the message is sent before the transaction commits,
    // so a rollback after sending leaves ES ahead of MySQL.
    @Transactional
    public void addUser(User user) {
        userMapper.insert(user);
        rabbitTemplate.convertAndSend("user.exchange", "user.add", user);
    }
    @Transactional
    public void updateUser(User user) {
        userMapper.updateById(user);
        rabbitTemplate.convertAndSend("user.exchange", "user.update", user);
    }
}
@Component
@RabbitListener(queues = "user.queue")
public class UserMQConsumer {
    @Autowired
    private UserESRepository userESRepository;
    // A single listener on the queue; @RabbitHandler dispatches on payload type.
    // Separate listeners on the same queue would compete for each other's messages.
    // Adds and updates both carry a User and save() is an upsert, so one handler covers both.
    @RabbitHandler
    public void processUserUpsert(User user) { userESRepository.save(user); }
    @RabbitHandler
    public void processUserDelete(Long userId) { userESRepository.deleteById(userId); }
}
Pros
Complete decoupling; MySQL and ES operate independently.
High availability with MQ persistence and retry.
Scalable – easy to add more consumers.
Cons
Increased system complexity; need to maintain MQ cluster.
Potential message ordering issues.
Consistency is eventual: ES lags MySQL by however long consumers take to work through the queue.
Suitable Scenarios
Large distributed systems that require high reliability and extensibility.
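The ordering problem listed among the cons can be handled on the consumer side by carrying a version with each message and ignoring anything that is not newer than what is already indexed. The sketch below illustrates the idea with an in-memory map; `VersionedIndexWriter`, `UserMessage`, and the `version` field are hypothetical and would correspond to something like a row version or update timestamp in the real schema. Elasticsearch itself offers external versioning on index requests, which can enforce a similar rule on the server side.

```java
import java.util.HashMap;
import java.util.Map;

public class VersionedIndexWriter {
    /** Message payload; version must increase with every change to the same user. */
    public static final class UserMessage {
        final long id;
        final long version;
        final String name;

        public UserMessage(long id, long version, String name) {
            this.id = id;
            this.version = version;
            this.name = name;
        }
    }

    private final Map<Long, UserMessage> index = new HashMap<>(); // stands in for the ES index

    /** Applies the message only if it is newer than the stored document; returns true if applied. */
    public boolean apply(UserMessage incoming) {
        UserMessage current = index.get(incoming.id);
        if (current != null && current.version >= incoming.version) {
            return false; // stale or duplicate delivery, safe to drop
        }
        index.put(incoming.id, incoming);
        return true;
    }

    /** Returns the indexed name for the ID, or null if nothing is indexed. */
    public String nameOf(long id) {
        UserMessage current = index.get(id);
        return current == null ? null : current.name;
    }
}
```

Because duplicates are rejected by the same comparison, this check also makes redelivered messages harmless.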
Comparison of the Five Solutions
The five methods differ in real-time capability, data consistency, system complexity, and performance impact. Dual-write has the lowest latency but makes consistency hardest to guarantee; scheduled tasks are simple but only eventually consistent; binlog and Canal provide near-real-time sync at moderate to high complexity; MQ adds reliability at the cost of extra infrastructure.
Selection Guidance
For small or startup projects, choose dual‑write or scheduled tasks for simplicity.
For medium to large systems, prefer Canal or MQ asynchronous to ensure reliability and scalability.
When handling massive data with strict real‑time needs, binlog or Canal are optimal.
If existing MQ infrastructure is available, the MQ asynchronous approach maximizes resource reuse.
Key Considerations
Idempotency: Ensure sync operations are idempotent to avoid duplicate data.
Monitoring & Alerts: Build monitoring to detect sync delays or failures.
Data Validation: Periodically verify consistency between MySQL and ES.
Fault Tolerance: Design recovery mechanisms to prevent data loss.
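The data validation point can be sketched as a periodic comparison between the two stores. The example below assumes both sides can be reduced to a map from ID to some comparable value, such as a checksum or an update timestamp; `ConsistencyChecker` and both method names are hypothetical, and values are assumed to be non-null.

```java
import java.util.Map;
import java.util.Objects;
import java.util.Set;
import java.util.TreeSet;

public class ConsistencyChecker {
    /** IDs present in the database whose indexed value is missing or different. */
    public static Set<Long> findStale(Map<Long, String> dbRows, Map<Long, String> indexDocs) {
        Set<Long> stale = new TreeSet<>();
        for (Map.Entry<Long, String> row : dbRows.entrySet()) {
            if (!Objects.equals(row.getValue(), indexDocs.get(row.getKey()))) {
                stale.add(row.getKey());
            }
        }
        return stale;
    }

    /** IDs present in the index that no longer exist in the database. */
    public static Set<Long> findOrphans(Map<Long, String> dbRows, Map<Long, String> indexDocs) {
        Set<Long> orphans = new TreeSet<>(indexDocs.keySet());
        orphans.removeAll(dbRows.keySet());
        return orphans;
    }
}
```

Stale IDs can be re-indexed from MySQL and orphan IDs deleted from ES. For large tables the comparison would typically run in ID ranges rather than loading everything at once.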
Conclusion
Synchronizing MySQL to Elasticsearch is a common requirement in modern applications. Selecting the appropriate solution—whether dual‑write, scheduled tasks, binlog, Canal, or MQ asynchronous—depends on data volume, real‑time needs, system complexity, and existing infrastructure. Understanding each approach’s principles and trade‑offs enables informed technical decisions.
Su San Talks Tech
Su San, former staff at several leading tech companies, is a top creator on Juejin and a premium creator on CSDN, and runs the free coding practice site www.susan.net.cn.
