
Order Data Synchronization Architecture at YouZan: From MySQL to ES and HBase

YouZan synchronizes order data by parsing MySQL binlogs with Canal and publishing the changes to a message queue. A sequential SeqNo used as an optimistic lock, together with HBase's per-column version timestamps, preserves ordering for both single-table and multi-table updates. A Logstash-style configurable pipeline feeds ES for search and HBase for detail queries, and using HBase as an intermediate layer removes the ordered-queue bottleneck while keeping throughput high and data consistent.

Youzan Coder

This article describes the data synchronization architecture at YouZan, a SaaS e-commerce platform serving millions of merchants. As the requirements for order search and order detail queries grew, YouZan adopted an Elasticsearch (ES) plus HBase architecture: ES serves search, and HBase serves detail queries.

1. Single Table Synchronization

Canal parses the MySQL binlog and writes the changes to a message queue (MQ), and a sync system then consumes them. In the single-table case the key challenge is message ordering. Because the binlog is parsed in order, each SQL execution result can be assigned a sequential SeqNo. Using that SeqNo as an optimistic lock in the NoSQL store lets stale messages be rejected, which solves the ordering problem.
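The SeqNo check can be sketched as a conditional write. This is a minimal in-memory illustration, not YouZan's implementation: the `VersionedStore` class, the `put_if_newer` method, and the sample order data are all hypothetical names chosen for the example.

```python
from dataclasses import dataclass, field


@dataclass
class VersionedStore:
    """Toy key-value store that remembers the SeqNo of the last applied write."""
    rows: dict = field(default_factory=dict)  # key -> (seq_no, value)

    def put_if_newer(self, key, seq_no, value):
        """Apply the write only if seq_no is greater than the stored SeqNo."""
        current = self.rows.get(key)
        if current is not None and current[0] >= seq_no:
            return False  # stale or duplicate message: drop it
        self.rows[key] = (seq_no, value)
        return True


store = VersionedStore()
# Messages arrive out of order: SeqNo 2 is consumed before SeqNo 1.
results = [
    store.put_if_newer("order-1001", 2, {"state": "PAID"}),
    store.put_if_newer("order-1001", 1, {"state": "CREATED"}),
]
```

The late message with SeqNo 1 is rejected, so the store keeps the state written by SeqNo 2 regardless of consumption order.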

For HBase, the built-in cell timestamp controls the version of each qualifier. Passing the sequential SeqNo as the timestamp guarantees field-level eventual consistency. For ES, the SeqNo can be supplied as an external version number, which acts as the optimistic lock.
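To illustrate what field-level versioning buys, here is a small simulation of timestamped cells. It is a toy model under stated assumptions: `CellStore` is a hypothetical class, and unlike real HBase it keeps only the newest version of each cell rather than a version history.

```python
class CellStore:
    """Toy model of HBase cells: each (row, qualifier) keeps its own timestamp."""

    def __init__(self):
        self.cells = {}  # (row_key, qualifier) -> (timestamp, value)

    def put(self, row_key, columns, timestamp):
        """Write each column with an explicit timestamp; the newest timestamp wins."""
        for qualifier, value in columns.items():
            current = self.cells.get((row_key, qualifier))
            if current is None or timestamp > current[0]:
                self.cells[(row_key, qualifier)] = (timestamp, value)

    def get(self, row_key):
        """Return the latest value of every qualifier in the row."""
        return {q: v for (r, q), (_, v) in self.cells.items() if r == row_key}


hbase = CellStore()
# SeqNo 2 (an update of one field) is consumed before SeqNo 1 (the insert).
hbase.put("order-1001", {"state": "PAID"}, timestamp=2)
hbase.put("order-1001", {"state": "CREATED", "buyer": "alice"}, timestamp=1)
row = hbase.get("order-1001")
```

Because the comparison happens per qualifier, the late insert still contributes the `buyer` field while the newer `state` value is preserved. A whole-document version check, such as ES external versioning, would instead reject the late message entirely.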

2. Multi-Table Synchronization

Multi-table sync adds complexity. When two tables produce binlog entries with different SeqNos, network issues can cause the messages to be consumed out of order. HBase handles this automatically through its per-column version numbers. For ES, the solution is ordered consumption from the MQ: messages are processed one at a time, and each must be acknowledged before the next is taken, which guarantees sequential execution.
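The one-at-a-time, acknowledge-before-advance behavior can be sketched as follows. This is an assumption-laden illustration: `consume_in_order`, `index_into_es`, the retry limit, and the message layout are hypothetical, and a plain dict stands in for the ES document.

```python
from collections import deque


def consume_in_order(queue, handler, max_attempts=3):
    """Process messages strictly one at a time; advance only after an ack."""
    applied = []
    while queue:
        message = queue[0]  # peek: the message stays queued until acknowledged
        for _ in range(max_attempts):
            if handler(message):  # handler returns True to acknowledge
                break
        else:
            # The head message never succeeded, so everything behind it is blocked.
            raise RuntimeError(f"message {message['seq_no']} blocked the queue")
        queue.popleft()
        applied.append(message["seq_no"])
    return applied


es_doc = {}
failures = {2: 1}  # hypothetical: SeqNo 2 fails once before succeeding


def index_into_es(message):
    """Merge the message fields into the toy ES document, failing as configured."""
    if failures.get(message["seq_no"], 0) > 0:
        failures[message["seq_no"]] -= 1
        return False  # no ack, so the same message is retried
    es_doc.update(message["fields"])
    return True


messages = deque([
    {"seq_no": 1, "table": "order", "fields": {"state": "CREATED"}},
    {"seq_no": 2, "table": "order_item", "fields": {"item_count": 3}},
    {"seq_no": 3, "table": "order", "fields": {"state": "PAID"}},
])
order_applied = consume_in_order(messages, index_into_es)
```

The failed attempt on SeqNo 2 holds back SeqNo 3 until the retry succeeds, which preserves ordering at the cost of throughput.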

3. Configuration-Based Synchronization

Inspired by Logstash, the sync pipeline is abstracted into input, filter, and output components. Users configure synchronization tasks through a UI without writing sync code, and complex business logic can be expressed in Groovy.
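The input, filter, and output split might look roughly like the sketch below. The configuration shape, the `run_pipeline` function, and the sample filters are assumptions made for illustration; Python lambdas stand in for the Groovy snippets mentioned above, and lists stand in for the ES and HBase outputs.

```python
def run_pipeline(config, events):
    """Pass each input event through the filters, then hand it to every output."""
    for event in events:
        for apply_filter in config["filters"]:
            event = apply_filter(event)
            if event is None:  # a filter may drop the event
                break
        else:
            # Runs only when no filter dropped the event.
            for write in config["outputs"]:
                write(event)


es_sink, hbase_sink = [], []
config = {
    "filters": [
        # Keep only changes from the order table.
        lambda e: e if e["table"] == "order" else None,
        # Normalize the state field.
        lambda e: {**e, "state": e["state"].upper()},
    ],
    "outputs": [es_sink.append, hbase_sink.append],
}
run_pipeline(config, [
    {"table": "order", "order_id": 1001, "state": "paid"},
    {"table": "audit_log", "order_id": 1001, "state": "n/a"},
])
```

Keeping filters and outputs as data means a new sync task is a new configuration rather than a new program.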

4. Performance and Solutions

Ordered queues create bottlenecks: throughput is limited by the partition count, and any message that fails causes a backlog to build up behind it. YouZan's solution uses HBase as an intermediate layer: HBase's field-level versioning keeps each field correctly ordered, while the version field lets ES obtain the correct external version number. This removes the need for ordered queues while maintaining high throughput.
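One way to picture the intermediate layer is sketched below. The article does not say how the version field is maintained, so this sketch assumes a per-row counter that is incremented on every write. `sync_message` and the three dicts are hypothetical stand-ins for HBase cells, the version field, and an ES index with external versioning. The sketch is single-threaded, so a real system would need the counter increment to be atomic.

```python
cells = {}         # (row, qualifier) -> (timestamp, value), toy HBase cells
row_versions = {}  # row -> counter, stands in for the "version" field
es_index = {}      # row -> (external_version, document), toy ES index


def sync_message(row, fields, seq_no):
    """Merge one message into the toy HBase row, then re-index the row in ES."""
    # 1. Write each field with SeqNo as its timestamp; older cells never win.
    for qualifier, value in fields.items():
        current = cells.get((row, qualifier))
        if current is None or seq_no > current[0]:
            cells[(row, qualifier)] = (seq_no, value)
    # 2. Bump the row's version counter (assumed atomic in a real store).
    row_versions[row] = row_versions.get(row, 0) + 1
    version = row_versions[row]
    # 3. Read back the merged row and index it with the external version.
    snapshot = {q: v for (r, q), (_, v) in cells.items() if r == row}
    stored = es_index.get(row)
    if stored is None or version > stored[0]:
        es_index[row] = (version, snapshot)


# Messages from two tables are consumed in arbitrary order.
sync_message("order-1001", {"state": "PAID"}, seq_no=3)
sync_message("order-1001", {"item_count": 3}, seq_no=2)
sync_message("order-1001", {"state": "CREATED"}, seq_no=1)
final_version, final_doc = es_index["order-1001"]
```

ES always receives a full row that HBase has already merged field by field, so the consumers no longer need to read the queue in order.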

5. Data Consistency Assurance

For data verification, the system leverages the backlog behavior of ordered queues: when the first message is delayed by 10 minutes, every subsequent message is delayed as well, so each message can be compared against the latest state of the data.
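A delayed verifier along these lines could be sketched as follows. The function name, the message fields (`key`, `produced_at`), and the dicts standing in for MySQL and ES are assumptions for illustration. The current time is passed in explicitly so the example runs without waiting.

```python
from collections import deque

DELAY_SECONDS = 600  # the 10-minute delay described above


def verify_due_messages(queue, now, source_rows, target_rows):
    """Check messages at least DELAY_SECONDS old; stop at the first newer one."""
    mismatches = []
    while queue and now - queue[0]["produced_at"] >= DELAY_SECONDS:
        message = queue.popleft()
        key = message["key"]
        # Compare current states rather than the message payload: after the
        # delay, both sides should have converged.
        if source_rows.get(key) != target_rows.get(key):
            mismatches.append(key)
    return mismatches


queue = deque([
    {"key": "order-1001", "produced_at": 0},
    {"key": "order-1002", "produced_at": 30},
    {"key": "order-1003", "produced_at": 500},
])
mysql_rows = {
    "order-1001": {"state": "PAID"},
    "order-1002": {"state": "CLOSED"},
    "order-1003": {"state": "CREATED"},
}
es_rows = {
    "order-1001": {"state": "PAID"},
    "order-1002": {"state": "PAID"},  # the sync missed the CLOSED update
}
mismatches = verify_due_messages(queue, now=630, source_rows=mysql_rows, target_rows=es_rows)
```

Because the queue is ordered, checking stops at the first message that is still too recent, and everything behind it waits as well.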

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact admin@besthub.dev and we will review it promptly.
