5 Proven Ways to Sync MySQL Data to Elasticsearch

This article explains why synchronizing MySQL with Elasticsearch is useful and compares five practical approaches (dual‑write, scheduled tasks, binlog parsing, Canal, and asynchronous MQ), covering the implementation, advantages, disadvantages, and suitable scenarios of each.

Su San Talks Tech

Introduction

Slow database queries are a common problem, especially for fuzzy searches and complex aggregations. Offloading those queries to Elasticsearch (ES) as a search engine is often an effective remedy.

This article presents five common approaches for synchronizing MySQL data to Elasticsearch.

Why Sync MySQL to ES?

Full‑text search: ES provides powerful full‑text capabilities far beyond MySQL's LIKE.

Complex aggregations: ES supports advanced aggregation queries suitable for big‑data analysis.

High‑performance queries: ES’s inverted index delivers extremely fast search.

Horizontal scaling: ES is natively distributed, making scaling straightforward.

Solution 1: Dual‑Write

Write to both MySQL and ES directly in business code.

Example Code

@Service
public class UserService {
    @Autowired
    private UserMapper userMapper;
    @Autowired
    private ElasticsearchTemplate elasticsearchTemplate;

    @Transactional
    public void addUser(User user) {
        // Write to MySQL
        userMapper.insert(user);
        // Write to Elasticsearch
        IndexQuery indexQuery = new IndexQueryBuilder()
                .withObject(user)
                .withId(user.getId().toString())
                .build();
        elasticsearchTemplate.index(indexQuery);
    }

    @Transactional
    public void updateUser(User user) {
        // Update MySQL
        userMapper.updateById(user);
        // Re-index the full document under the same ID; this overwrites the old version.
        // Note: @Transactional covers only MySQL. An ES failure here rolls MySQL back,
        // but a failed MySQL commit after this call leaves ES ahead of the database.
        IndexQuery indexQuery = new IndexQueryBuilder()
                .withObject(user)
                .withId(user.getId().toString())
                .build();
        elasticsearchTemplate.index(indexQuery);
    }
}

Pros

Simple to implement, no extra components needed.

Real‑time synchronization.

Cons

Hard to guarantee data consistency: the two writes are not covered by a single transaction, so if one succeeds and the other fails, MySQL and ES diverge.

Business logic becomes intrusive and complex.

Performance impact as each write waits for ES response.

Suitable Scenarios

Small data volumes, high real‑time requirements, and tolerance for occasional inconsistency.
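One way to soften the consistency problem is to treat MySQL as the source of truth and record failed ES writes for a later replay. The following is a minimal sketch under stated assumptions, not the article's implementation: `DualWriter` is a hypothetical helper, the two `Consumer` arguments stand in for the MySQL and ES calls, and the retry queue is kept in memory only for brevity.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.function.Consumer;

// Hypothetical helper, not part of Spring or any Elasticsearch client.
class DualWriter<T> {
    private final Consumer<T> primaryWrite; // stands in for the MySQL write
    private final Consumer<T> indexWrite;   // stands in for the ES write
    // In-memory for brevity; a real system would persist this so it survives a crash.
    private final Deque<T> pendingRetries = new ArrayDeque<>();

    DualWriter(Consumer<T> primaryWrite, Consumer<T> indexWrite) {
        this.primaryWrite = primaryWrite;
        this.indexWrite = indexWrite;
    }

    // The primary write must succeed; an index failure is recorded, not propagated.
    void write(T record) {
        primaryWrite.accept(record); // exceptions propagate: nothing has been written yet
        try {
            indexWrite.accept(record);
        } catch (RuntimeException e) {
            pendingRetries.addLast(record); // remember the record for a later replay
        }
    }

    // Replays each pending record once; returns how many are still pending.
    int retryPending() {
        int attempts = pendingRetries.size();
        for (int i = 0; i < attempts; i++) {
            T record = pendingRetries.pollFirst();
            try {
                indexWrite.accept(record);
            } catch (RuntimeException e) {
                pendingRetries.addLast(record);
            }
        }
        return pendingRetries.size();
    }

    int pendingCount() {
        return pendingRetries.size();
    }
}
```

With this shape, an ES outage no longer fails the user-facing write, at the price of ES lagging until `retryPending()` succeeds. The replay is only safe if the index write is an upsert keyed by the primary key.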

Solution 2: Scheduled Task

Periodically scan MySQL for changes and sync to ES.

Example Code

@Component
public class UserSyncTask {
    @Autowired
    private UserMapper userMapper;
    @Autowired
    private UserESRepository userESRepository;

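    // Run every 5 minutes
    @Scheduled(fixedRate = 5 * 60 * 1000)
    public void syncUserToES() {
        Date lastSyncTime = getLastSyncTime();
        // Capture the cutoff before querying so rows modified while this run is in
        // progress are picked up by the next run instead of being skipped.
        // A row may then be synced twice, which is harmless because save() overwrites.
        Date syncStart = new Date();
        List<User> updatedUsers = userMapper.selectUpdatedAfter(lastSyncTime);
        for (User user : updatedUsers) {
            userESRepository.save(user);
        }
        updateLastSyncTime(syncStart);
    }

    private Date getLastSyncTime() {
        // retrieve from DB or Redis
        throw new UnsupportedOperationException("not implemented in this example");
    }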
    private void updateLastSyncTime(Date time) { /* store time */ }
}

Pros

Simple, no changes to existing business code.

Database load is controllable by adjusting sync frequency.

Cons

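Low real‑time: data in ES can lag by up to one sync interval.

Changes can be missed if the process crashes or the saved sync time advances past rows that were never written to ES.

Scans can stress the database, particularly when the update‑time column is not indexed or a full resync is required; physically deleted rows are also invisible to a timestamp scan.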

Suitable Scenarios

Use cases where real‑time is not critical and data changes are infrequent.
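The incremental scan can be made more robust by advancing the sync time only to the newest change that was actually written, rather than to the wall clock. The sketch below illustrates that idea with plain Java; `ChangedRow` and `WatermarkSync` are hypothetical names, and the in-memory list stands in for a query such as `selectUpdatedAfter`.

```java
import java.time.Instant;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical row shape: a primary key plus the row's last-modified timestamp.
final class ChangedRow {
    final long id;
    final Instant updatedAt;

    ChangedRow(long id, Instant updatedAt) {
        this.id = id;
        this.updatedAt = updatedAt;
    }
}

class WatermarkSync {
    private Instant watermark; // a real task would load and store this in DB or Redis

    WatermarkSync(Instant initialWatermark) {
        this.watermark = initialWatermark;
    }

    // Sends every row changed after the watermark to the sink; returns the count.
    int runOnce(List<ChangedRow> table, Consumer<ChangedRow> sink) {
        Instant highestSeen = watermark;
        int synced = 0;
        for (ChangedRow row : table) {
            // Stands in for: SELECT ... WHERE update_time > :watermark
            if (row.updatedAt.isAfter(watermark)) {
                sink.accept(row);
                synced++;
                if (row.updatedAt.isAfter(highestSeen)) {
                    highestSeen = row.updatedAt;
                }
            }
        }
        // Reached only if every sink call succeeded, so a failed run is retried in full.
        watermark = highestSeen;
        return synced;
    }

    Instant watermark() {
        return watermark;
    }
}
```

Because the watermark moves only after a successful run, a crash mid-batch leads to a repeat of that batch rather than a gap. Rows that share the exact watermark timestamp but commit later can still be missed; querying with `>=` and relying on idempotent writes is one way to cover that case.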

Solution 3: Binlog Sync

Parse MySQL binary logs (binlog) to capture every data change.

Example Code

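import com.github.shyiko.mysql.binlog.BinaryLogClient;
import com.github.shyiko.mysql.binlog.event.*;
import java.io.IOException;

public class BinlogSyncService {
    public void startSync() throws IOException {
        BinaryLogClient client = new BinaryLogClient("localhost", 3306, "username", "password");
        client.registerEventListener(event -> {
            EventData data = event.getData();
            if (data instanceof WriteRowsEventData) {
                processInsertEvent((WriteRowsEventData) data);
            } else if (data instanceof UpdateRowsEventData) {
                // handle update
            } else if (data instanceof DeleteRowsEventData) {
                // handle delete
            }
        });
        // connect() blocks the calling thread while it streams events
        client.connect();
    }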
    private void processInsertEvent(WriteRowsEventData data) { /* sync to ES */ }
    private void syncToElasticsearch(User user, String op) { /* implementation */ }
}

Pros

High real‑time, near‑instant sync.

No intrusion into business code.

Good performance, minimal impact on MySQL.

Cons

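Complex implementation; the client must decode binlog row events and map them to documents.

Requires row‑based logging (binlog_format=ROW), and the event handling must stay compatible with binlog format changes.

After a failover or restart, the client must resume from a saved binlog position or re‑synchronize.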

Suitable Scenarios

Large data volumes with strict real‑time requirements.
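To limit re-synchronization after a restart, a binlog consumer can checkpoint the position of the last change it applied. The sketch below shows only the ordering rule (apply first, then advance the checkpoint); `BinlogPosition` and `BinlogCheckpoint` are hypothetical types, and persistence is left out. With mysql-binlog-connector-java, the saved values would typically be handed to `BinaryLogClient.setBinlogFilename` and `setBinlogPosition` before `connect()`.

```java
// A binlog coordinate: file name plus byte offset within that file.
final class BinlogPosition {
    final String filename;
    final long offset;

    BinlogPosition(String filename, long offset) {
        this.filename = filename;
        this.offset = offset;
    }
}

// Hypothetical checkpoint holder; a real one would persist to disk, MySQL, or Redis.
class BinlogCheckpoint {
    private BinlogPosition lastApplied;

    BinlogCheckpoint(BinlogPosition start) {
        this.lastApplied = start;
    }

    // Applies one change and only then advances the checkpoint.
    void apply(BinlogPosition nextPosition, Runnable applyToIndex) {
        applyToIndex.run();         // if this throws, the checkpoint stays where it was
        lastApplied = nextPosition; // safe to resume after this event from now on
    }

    // Position to resume from after a restart.
    BinlogPosition resumeFrom() {
        return lastApplied;
    }
}
```

Advancing the checkpoint after the write gives at-least-once delivery: a crash between the two steps replays one event, so the ES write should be idempotent.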

Solution 4: Canal

Canal is Alibaba’s open‑source binlog subscription component that simplifies binlog handling.

Example Configuration

# conf/example/instance.properties
canal.instance.master.address=127.0.0.1:3306
canal.instance.dbUsername=username
canal.instance.dbPassword=password
canal.instance.connectionCharset=UTF-8
canal.instance.filter.regex=.*\\..*

Example Code

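import com.alibaba.otter.canal.client.CanalConnector;
import com.alibaba.otter.canal.client.CanalConnectors;
import com.alibaba.otter.canal.protocol.CanalEntry;
import com.alibaba.otter.canal.protocol.Message;
import com.google.protobuf.InvalidProtocolBufferException;
import java.net.InetSocketAddress;
import java.util.List;

public class CanalClientExample {
    public static void main(String[] args) throws InterruptedException {
        CanalConnector connector = CanalConnectors.newSingleConnector(
                new InetSocketAddress("127.0.0.1", 11111), "example", "", "");
        try {
            connector.connect();
            connector.subscribe(".*\\..*");
            while (true) {
                Message message = connector.getWithoutAck(100);
                long batchId = message.getId();
                if (batchId != -1 && !message.getEntries().isEmpty()) {
                    try {
                        processEntries(message.getEntries());
                        connector.ack(batchId);      // confirm only after successful processing
                    } catch (InvalidProtocolBufferException e) {
                        connector.rollback(batchId); // the batch will be delivered again
                    }
                } else {
                    Thread.sleep(1000);              // nothing to process; back off before polling again
                }
            }
        } finally {
            connector.disconnect();
        }
    }
    private static void processEntries(List<CanalEntry.Entry> entries)
            throws InvalidProtocolBufferException {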
        for (CanalEntry.Entry entry : entries) {
            if (entry.getEntryType() == CanalEntry.EntryType.ROWDATA) {
                CanalEntry.RowChange rowChange = CanalEntry.RowChange.parseFrom(entry.getStoreValue());
                for (CanalEntry.RowData rowData : rowChange.getRowDatasList()) {
                    // handle INSERT, UPDATE, DELETE
                }
            }
        }
    }
}

Pros

High real‑time, low latency.

Non‑intrusive to business systems.

Active open‑source community.

Cons

Requires deploying and maintaining a Canal server.

Needs handling of network partitions and recovery.

Potential duplicate sync: a batch that was processed but not acknowledged is delivered again, so handlers must tolerate repeats.

Suitable Scenarios

Big data volumes with high real‑time needs and a dedicated team for middleware maintenance.
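Duplicate delivery is harmless when each change is applied as an upsert or delete keyed by the row's primary key. The following sketch shows that property with a `HashMap` standing in for the ES index; `ChangeType`, `RowChangeEvent`, and `IdempotentIndexer` are hypothetical names chosen for illustration, and the replay is assumed to cover a whole batch in its original order.

```java
import java.util.HashMap;
import java.util.Map;

enum ChangeType { INSERT, UPDATE, DELETE }

// Hypothetical, simplified view of one row change taken from a Canal entry.
final class RowChangeEvent {
    final ChangeType type;
    final String primaryKey;
    final String document; // JSON body; unused for DELETE

    RowChangeEvent(ChangeType type, String primaryKey, String document) {
        this.type = type;
        this.primaryKey = primaryKey;
        this.document = document;
    }
}

class IdempotentIndexer {
    // Stands in for an ES index where the document ID is the MySQL primary key.
    private final Map<String, String> index = new HashMap<>();

    void apply(RowChangeEvent event) {
        switch (event.type) {
            case INSERT:
            case UPDATE:
                // Upsert: writing the same document twice leaves the same state.
                index.put(event.primaryKey, event.document);
                break;
            case DELETE:
                // Deleting an already-deleted document is a no-op.
                index.remove(event.primaryKey);
                break;
        }
    }

    Map<String, String> snapshot() {
        return new HashMap<>(index);
    }
}
```

The key design choice is reusing the primary key as the document ID, so a repeated insert overwrites instead of creating a second document.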

Solution 5: MQ Asynchronous

Use a message queue to decouple MySQL and ES, improving reliability and scalability.

Example Code

@Service
public class UserService {
    @Autowired
    private UserMapper userMapper;
    @Autowired
    private RabbitTemplate rabbitTemplate;

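    @Transactional
    public void addUser(User user) {
        userMapper.insert(user);
        // Caution: the message is sent before the transaction commits. If the commit
        // later fails, consumers still index the user; publishing after commit avoids this.
        rabbitTemplate.convertAndSend("user.exchange", "user.add", user);
    }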

    @Transactional
    public void updateUser(User user) {
        userMapper.updateById(user);
        rabbitTemplate.convertAndSend("user.exchange", "user.update", user);
    }
}

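@Component
public class UserMQConsumer {
    @Autowired
    private UserESRepository userESRepository;

    // Each routing key needs its own queue. Several listeners on one queue compete
    // for messages, so an "add" message could reach the delete handler.
    // The queue names below are illustrative and must match your bindings.
    @RabbitListener(queues = "user.add.queue")
    public void processUserAdd(User user) { userESRepository.save(user); }

    @RabbitListener(queues = "user.update.queue")
    public void processUserUpdate(User user) { userESRepository.save(user); }

    @RabbitListener(queues = "user.delete.queue")
    public void processUserDelete(Long userId) { userESRepository.deleteById(userId); }
}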

Pros

Complete decoupling; MySQL and ES operate independently.

High availability with MQ persistence and retry.

Scalable – easy to add more consumers.

Cons

Increased system complexity; need to maintain MQ cluster.

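Potential message ordering issues: updates to the same row can be consumed out of order, so an older change may overwrite a newer one.

Consistency is eventual; the lag depends on consumer throughput.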

Suitable Scenarios

Large distributed systems that require high reliability and extensibility.
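A common defense against out-of-order messages is to attach a monotonically increasing version to each change and have the consumer drop anything older than what it has already applied. The sketch below assumes the producer supplies such a version (for example a row version column); `VersionedIndex` is a hypothetical class and the maps stand in for ES. Elasticsearch's external versioning (`version_type=external`) can enforce a comparable rule on the server side.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical consumer-side guard against stale messages.
class VersionedIndex {
    private final Map<Long, Long> versions = new HashMap<>();   // id -> last applied version
    private final Map<Long, String> documents = new HashMap<>(); // id -> indexed document

    // Applies the document only if its version is newer than the stored one.
    // Returns true when applied, false when the message was stale or a duplicate.
    boolean applyIfNewer(long id, long version, String document) {
        Long current = versions.get(id);
        if (current != null && current >= version) {
            return false; // an equal or newer change was already applied
        }
        versions.put(id, version);
        documents.put(id, document);
        return true;
    }

    String get(long id) {
        return documents.get(id);
    }
}
```

Rejecting equal versions as well as older ones means the same check also filters redelivered messages.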

Comparison of the Five Solutions

The five methods differ in real‑time capability, data consistency, system complexity, and performance impact. Dual‑write has the lowest latency but makes consistency hardest to guarantee; scheduled tasks are simple but only eventually consistent, with lag up to the sync interval; binlog parsing and Canal deliver near‑real‑time sync at moderate to high complexity; asynchronous MQ improves reliability at the cost of extra infrastructure.

Selection Guidance

For small or startup projects, choose dual‑write or scheduled tasks for simplicity.

For medium to large systems, prefer Canal or MQ asynchronous to ensure reliability and scalability.

For massive data volumes with strict real‑time needs, binlog parsing or Canal is the better fit.

If existing MQ infrastructure is available, the MQ asynchronous approach maximizes resource reuse.

Key Considerations

Idempotency: Ensure sync operations are idempotent to avoid duplicate data.

Monitoring & Alerts: Build monitoring to detect sync delays or failures.

Data Validation: Periodically verify consistency between MySQL and ES.

Fault Tolerance: Design recovery mechanisms to prevent data loss.
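The data validation point can be sketched as a reconciliation pass that compares a per-row fingerprint from each side. In the example below, both maps go from primary key to a fingerprint such as a checksum or update timestamp; how those maps are loaded from MySQL and ES is left out, and `Reconciler` and `Report` are hypothetical names.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

class Reconciler {
    // IDs grouped by the kind of drift found; TreeSet keeps the output ordered.
    static final class Report {
        final Set<Long> missingInIndex = new TreeSet<>();  // in MySQL, absent from ES
        final Set<Long> staleInIndex = new TreeSet<>();    // in both, fingerprints differ
        final Set<Long> orphanedInIndex = new TreeSet<>(); // in ES, absent from MySQL

        boolean isConsistent() {
            return missingInIndex.isEmpty() && staleInIndex.isEmpty() && orphanedInIndex.isEmpty();
        }
    }

    // Both maps: primary key -> fingerprint (e.g. checksum or update timestamp).
    static Report compare(Map<Long, String> source, Map<Long, String> index) {
        Report report = new Report();
        for (Map.Entry<Long, String> row : source.entrySet()) {
            String indexed = index.get(row.getKey());
            if (indexed == null) {
                report.missingInIndex.add(row.getKey());
            } else if (!indexed.equals(row.getValue())) {
                report.staleInIndex.add(row.getKey());
            }
        }
        for (Long id : index.keySet()) {
            if (!source.containsKey(id)) {
                report.orphanedInIndex.add(id);
            }
        }
        return report;
    }
}
```

Missing and stale IDs can be re-indexed from MySQL and orphaned IDs deleted from ES. On large tables the comparison would normally run in primary-key ranges rather than over the whole data set at once.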

Conclusion

Synchronizing MySQL to Elasticsearch is a common requirement in modern applications. The right choice among dual‑write, scheduled tasks, binlog parsing, Canal, and asynchronous MQ depends on data volume, real‑time needs, system complexity, and existing infrastructure. Understanding each approach’s principles and trade‑offs enables informed technical decisions.

Written by

Su San Talks Tech

Su San, former staff at several leading tech companies, is a top creator on Juejin and a premium creator on CSDN, and runs the free coding practice site www.susan.net.cn.
