Order Data Synchronization Architecture at YouZan: From MySQL to ES and HBase
YouZan’s order data synchronization moves changes from MySQL through Canal-parsed binlogs into a message queue. Sequential SeqNo-based optimistic locking and HBase’s column-version timestamps keep both single-table and multi-table updates correctly ordered, and a Logstash-style configurable pipeline feeds ES for search and HBase for detail queries. Using HBase as an intermediate layer removes the ordered-queue bottleneck while keeping data consistent at high throughput.
This article discusses the data synchronization architecture at YouZan, a SaaS e-commerce platform serving millions of merchants. As requirements for order search and order detail queries grew, YouZan adopted an ES+HBase architecture: ES serves search queries and HBase serves detail queries.
1. Single Table Synchronization
Canal parses the MySQL binlog and publishes each change to a message queue (MQ), from which a sync system consumes and processes the data. In the single-table case, the key challenge is message ordering: changes to the same row can reach the consumer out of order. Because the binlog is parsed in order, each SQL execution result can be assigned a sequential SeqNo, and the NoSQL store then applies a change only if its SeqNo is newer than the one already stored (optimistic locking). This solves the ordering problem.
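The ordering scheme above can be sketched in a few lines. This is a minimal, in-memory sketch under stated assumptions: `BinlogEvent`, `SeqMessage`, `assign_seq_nos`, and `VersionedStore` are hypothetical names, not Canal or YouZan APIs, and the store stands in for whichever NoSQL target is being written. The acceptance rule (write only if the incoming SeqNo is greater than the stored one) mirrors how Elasticsearch treats writes with `version_type=external`.

```python
from dataclasses import dataclass, field
from itertools import count
from typing import Dict, Iterable, List, Tuple


@dataclass
class BinlogEvent:
    # Hypothetical, simplified shape of one Canal-parsed row change.
    table: str
    pk: str
    row: Dict[str, object]


@dataclass
class SeqMessage:
    seq_no: int
    event: BinlogEvent


def assign_seq_nos(events: Iterable[BinlogEvent], start: int = 1) -> List[SeqMessage]:
    # The binlog is parsed in commit order, so a monotonic counter
    # gives every change a SeqNo that reflects that order.
    counter = count(start)
    return [SeqMessage(next(counter), e) for e in events]


@dataclass
class VersionedStore:
    # Toy NoSQL store: key -> (SeqNo of the last accepted write, row).
    docs: Dict[str, Tuple[int, Dict[str, object]]] = field(default_factory=dict)

    def apply(self, msg: SeqMessage) -> bool:
        key = f"{msg.event.table}:{msg.event.pk}"
        current = self.docs.get(key)
        # Optimistic lock: reject any write that is not newer than what is stored.
        if current is not None and msg.seq_no <= current[0]:
            return False
        self.docs[key] = (msg.seq_no, dict(msg.event.row))
        return True


# Two updates to the same order, delivered by the MQ in reverse order.
msgs = assign_seq_nos([
    BinlogEvent("order", "E1", {"status": "CREATED"}),
    BinlogEvent("order", "E1", {"status": "PAID"}),
])
store = VersionedStore()
results = [store.apply(m) for m in reversed(msgs)]
print(results, store.docs["order:E1"])  # [True, False] (2, {'status': 'PAID'})
```

The stale SeqNo 1 update is rejected even though it arrives last, so the store converges on the state produced by the newest change.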
For HBase sync, the built-in cell timestamp versions each qualifier. Passing the sequential SeqNo as the timestamp means a read returns, for every field, the value written by the newest SeqNo, which guarantees field-level eventual consistency. For ES sync, the SeqNo can serve as an external version number, so ES's own optimistic locking rejects stale writes.
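To make the HBase side concrete, the sketch below models a single row in memory, relying only on HBase's documented behavior that a read returns the cell with the highest timestamp for each qualifier. `FakeHBaseRow` is a hypothetical stand-in, not the HBase client API, and unlike real HBase it never prunes old versions (a real column family keeps only its configured maximum number of versions).

```python
from collections import defaultdict
from typing import Dict, Optional, Tuple


class FakeHBaseRow:
    """Toy model of one HBase row: each qualifier stores cells keyed by
    timestamp, and a read returns the cell with the highest timestamp."""

    def __init__(self) -> None:
        self.cells: Dict[str, Dict[int, object]] = defaultdict(dict)

    def put(self, qualifier: str, value: object, ts: int) -> None:
        # The SeqNo is passed as the explicit cell timestamp.
        self.cells[qualifier][ts] = value

    def get(self, qualifier: str) -> Optional[Tuple[int, object]]:
        versions = self.cells.get(qualifier)
        if not versions:
            return None
        ts = max(versions)
        return ts, versions[ts]

    def latest_row(self) -> Dict[str, object]:
        # Each qualifier is resolved independently: newest timestamp wins.
        return {q: versions[max(versions)] for q, versions in self.cells.items()}


row = FakeHBaseRow()
row.put("status", "SHIPPED", ts=8)            # SeqNo 8 touched only status
row.put("status", "PAID", ts=7)               # SeqNo 7 arrives late...
row.put("buyer_memo", "leave at door", ts=7)  # ...and also carried buyer_memo
print(row.latest_row())  # {'status': 'SHIPPED', 'buyer_memo': 'leave at door'}
```

The late SeqNo 7 message cannot overwrite the newer status, yet its `buyer_memo` still lands. That per-field resolution is the field-level eventual consistency described above.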
2. Multi-Table Synchronization
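Multi-table sync introduces additional complexity. Changes to two tables produce binlog events with different SeqNos, and network issues can cause them to be consumed out of order. HBase handles this automatically, because every column carries its own version. ES is harder: its version applies to the whole document, so when two tables update different fields of the same document, a change with a higher SeqNo from one table would cause a lower-numbered but still valid change from the other to be rejected. For ES, the solution is therefore ordered consumption through the MQ: messages are processed one at a time, and each is acknowledged before the next is delivered, which guarantees sequential execution.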
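A minimal sketch of ordered, one-at-a-time consumption for the ES path follows. `Message`, `OrderedPartition`, and `merge_into_es` are hypothetical; the sketch assumes that every change for a given order is routed to the same partition (for example by hashing the order ID) and that the producer writes those changes in SeqNo order. The `for ... else` branch shows the cost discussed under Performance: when the head keeps failing, nothing behind it is processed.

```python
from collections import deque
from dataclasses import dataclass
from typing import Callable, Deque, Dict, List


@dataclass
class Message:
    seq_no: int
    order_id: str
    fields: Dict[str, object]  # partial update for the order document


class OrderedPartition:
    """Toy ordered MQ partition: the next message is delivered only after
    the current head has been acknowledged."""

    def __init__(self, messages: List[Message]) -> None:
        self.queue: Deque[Message] = deque(messages)  # kept in produce order

    def consume(self, handler: Callable[[Message], bool], max_attempts: int = 3) -> None:
        while self.queue:
            head = self.queue[0]
            for _ in range(max_attempts):
                if handler(head):
                    self.queue.popleft()  # ack releases the next message
                    break
            else:
                return  # head keeps failing: everything behind it stays blocked


es_docs: Dict[str, Dict[str, object]] = {}


def merge_into_es(msg: Message) -> bool:
    # Partial updates can be merged blindly because this handler sees
    # a partition's messages strictly one at a time, in order.
    es_docs.setdefault(msg.order_id, {}).update(msg.fields)
    return True


partition = OrderedPartition([
    Message(5, "E1", {"status": "PAID"}),            # from the order table
    Message(6, "E1", {"item_title": "Coffee mug"}),  # from the order-item table
])
partition.consume(merge_into_es)
print(es_docs)  # {'E1': {'status': 'PAID', 'item_title': 'Coffee mug'}}
```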
3. Configuration-Based Synchronization
Inspired by Logstash, the sync pipeline is abstracted into input, filter, and output components. Users configure synchronization tasks through a UI without writing code, and complex business logic can be expressed in Groovy scripts.
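The input/filter/output abstraction can be sketched as a small registry of component factories driven by a config dictionary, roughly the kind of structure a configuration UI would produce. All component names here (`static`, `rename`, `script`, `memory`) are hypothetical, and the `script` filter takes a Python callable only as a stand-in for the Groovy scripts the article mentions; a real system would load script source from the task configuration.

```python
from typing import Callable, Dict, Iterable, List

Record = Dict[str, object]

# Hypothetical component registries, keyed by the "type" used in a task config.
INPUTS: Dict[str, Callable[[dict], Iterable[Record]]] = {}
FILTERS: Dict[str, Callable[[dict], Callable[[Record], Record]]] = {}
OUTPUTS: Dict[str, Callable[[dict], Callable[[Record], None]]] = {}


def register(registry: dict, name: str) -> Callable:
    def decorator(factory: Callable) -> Callable:
        registry[name] = factory
        return factory
    return decorator


@register(INPUTS, "static")
def static_input(cfg: dict) -> Iterable[Record]:
    return cfg["records"]  # stand-in for an MQ/binlog consumer


@register(FILTERS, "rename")
def rename_filter(cfg: dict) -> Callable[[Record], Record]:
    mapping = cfg["mapping"]
    return lambda r: {mapping.get(k, k): v for k, v in r.items()}


@register(FILTERS, "script")
def script_filter(cfg: dict) -> Callable[[Record], Record]:
    return cfg["fn"]  # stand-in for a user-supplied Groovy script


@register(OUTPUTS, "memory")
def memory_output(cfg: dict) -> Callable[[Record], None]:
    return cfg["sink"].append  # stand-in for an ES or HBase writer


def run_pipeline(config: dict) -> None:
    # Build each stage from its config, then stream records input -> filters -> output.
    source = INPUTS[config["input"]["type"]](config["input"])
    filters = [FILTERS[f["type"]](f) for f in config.get("filters", [])]
    emit = OUTPUTS[config["output"]["type"]](config["output"])
    for record in source:
        for apply_filter in filters:
            record = apply_filter(record)
        emit(record)


sink: List[Record] = []
run_pipeline({
    "input": {"type": "static", "records": [{"tid": "E1", "pay_amt": 1200}]},
    "filters": [
        {"type": "rename", "mapping": {"tid": "order_no"}},
        {"type": "script", "fn": lambda r: {**r, "pay_yuan": r["pay_amt"] / 100}},
    ],
    "output": {"type": "memory", "sink": sink},
})
print(sink)  # [{'order_no': 'E1', 'pay_amt': 1200, 'pay_yuan': 12.0}]
```

Keeping components behind a registry means a new source or sink is added once in code and then reused by any number of UI-configured tasks.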
4. Performance and Solutions
Ordered queues create bottlenecks: throughput is capped by the number of partitions, and a single failed message blocks every message behind it, causing a backlog. YouZan's solution uses HBase as an intermediate layer. HBase's field-level versioning keeps each field correctly ordered, while a version field stored in HBase gives ES a correct external version number. This removes the need for ordered queues while maintaining high throughput.
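One way to read the HBase-as-intermediate-layer design is sketched below, under an explicit assumption: the article does not say how the version field is maintained, so here it is a per-row counter incremented after every write (HBase does offer an atomic `Increment` operation). Each change is first written to HBase with its SeqNo as the cell timestamp; the merged row and its version are then read back and indexed into ES as a whole document, with that version as the external version. `FakeHBaseTable`, `FakeExternalVersionIndex`, and `sync` are hypothetical in-memory stand-ins.

```python
from collections import defaultdict
from typing import Dict, Tuple

Row = Dict[str, object]


class FakeHBaseTable:
    """Toy HBase table: per-qualifier cells keyed by SeqNo timestamp, plus a
    per-row counter standing in for an atomically incremented version column."""

    def __init__(self) -> None:
        self.cells: Dict[str, Dict[str, Dict[int, object]]] = defaultdict(lambda: defaultdict(dict))
        self.version: Dict[str, int] = defaultdict(int)

    def put(self, row_key: str, fields: Row, seq_no: int) -> None:
        for qualifier, value in fields.items():
            self.cells[row_key][qualifier][seq_no] = value

    def increment_version(self, row_key: str) -> int:
        self.version[row_key] += 1
        return self.version[row_key]

    def get_row(self, row_key: str) -> Tuple[int, Row]:
        # Newest cell per qualifier, returned together with the row's version.
        row = self.cells[row_key]
        return self.version[row_key], {q: v[max(v)] for q, v in row.items()}


class FakeExternalVersionIndex:
    """Toy ES index following version_type=external: a write is accepted
    only if its version is greater than the stored one."""

    def __init__(self) -> None:
        self.docs: Dict[str, Tuple[int, Row]] = {}

    def index(self, doc_id: str, version: int, body: Row) -> bool:
        current = self.docs.get(doc_id)
        if current is not None and version <= current[0]:
            return False  # a newer snapshot is already indexed
        self.docs[doc_id] = (version, body)
        return True


def sync(hbase: FakeHBaseTable, es: FakeExternalVersionIndex,
         order_id: str, seq_no: int, fields: Row) -> bool:
    hbase.put(order_id, fields, seq_no)           # field order via SeqNo timestamps
    hbase.increment_version(order_id)             # bump the row version after the put
    version, snapshot = hbase.get_row(order_id)   # read back the merged row
    return es.index(order_id, version, snapshot)  # full document, external version


hbase, es = FakeHBaseTable(), FakeExternalVersionIndex()
# Changes from two tables, consumed out of order with no ordered queue.
sync(hbase, es, "E1", 6, {"item_title": "Coffee mug"})
sync(hbase, es, "E1", 5, {"status": "PAID"})
sync(hbase, es, "E1", 4, {"status": "CREATED"})  # stale status arrives last
print(es.docs["E1"])  # (3, {'item_title': 'Coffee mug', 'status': 'PAID'})
```

A counter is used here instead of the maximum SeqNo on purpose. A late message with a lower SeqNo would not raise the maximum, so ES would reject the snapshot carrying its fields. With the counter, each write's own read-back has a higher version than any snapshot taken before its increment, so the highest-versioned document in ES reflects every field regardless of consumption order.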
5. Data Consistency Assurance
For data verification, the system turns the backlog behavior of ordered queues to its advantage: holding the first verification message for 10 minutes makes every subsequent message wait as well, so each comparison is made against the latest state of the data rather than a mid-sync snapshot.
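The delayed-verification approach can be sketched with a simulated clock. `CheckTask`, `run_verifier`, and `DELAY_SECONDS` are hypothetical names; only the 10-minute delay comes from the article, and the two dictionaries stand in for MySQL and the sync target. Because the loop stops at the first task that is not yet due, holding the head automatically holds every task behind it.

```python
from collections import deque
from dataclasses import dataclass
from typing import Callable, Deque, List

DELAY_SECONDS = 10 * 60  # the 10-minute hold from the article


@dataclass
class CheckTask:
    order_id: str
    enqueued_at: float  # seconds, on the same clock as `now`


def run_verifier(
    queue: Deque[CheckTask],
    now: float,
    read_source: Callable[[str], object],
    read_target: Callable[[str], object],
) -> List[str]:
    """Drain every task that has waited DELAY_SECONDS and return mismatching IDs.
    The head is held until it is due; because the queue is ordered, the tasks
    behind it wait as well."""
    mismatches: List[str] = []
    while queue and now - queue[0].enqueued_at >= DELAY_SECONDS:
        task = queue.popleft()
        if read_source(task.order_id) != read_target(task.order_id):
            mismatches.append(task.order_id)
    return mismatches


mysql = {"E1": {"status": "PAID"}, "E2": {"status": "PAID"}}
target = {"E1": {"status": "PAID"}, "E2": {"status": "CREATED"}}
tasks = deque([CheckTask("E1", 0.0), CheckTask("E2", 30.0)])
print(run_verifier(tasks, 300.0, mysql.get, target.get))  # [] (head not due yet)
print(run_verifier(tasks, 610.0, mysql.get, target.get))  # [] (E1 matches, E2 not due)
print(run_verifier(tasks, 630.0, mysql.get, target.get))  # ['E2']
```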
Youzan Coder
Official Youzan tech channel, delivering technical insights and occasional daily updates from the Youzan tech team.
