How Youzan Scaled Order Management: Sharding, Elasticsearch, and HBase

The article details Youzan's three‑stage evolution of its order management system—from database sharding to Elasticsearch‑based cross‑table search and finally HBase for fast detail assembly—while addressing data sync, real‑time consistency, and idempotency strategies.

Youzan Coder

Dec 7, 2018

First Stage: Sharding (Database Partitioning)

As order volume grew, a single database table could no longer handle the load. The solution was to split data by buyer ID and shop ID, creating multiple databases and tables. This partitioning dramatically increased the system's capacity for handling business traffic.
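The routing idea can be sketched as follows. This is a minimal illustration, not Youzan's actual scheme: the article does not give shard counts, naming, or the routing function, so `DB_COUNT`, `TABLES_PER_DB`, the name format, and the simple modulo rule are all assumptions. It also assumes each order is reachable through both a buyer‑keyed and a shop‑keyed layout.

```python
# Hypothetical shard layout: these counts and names are assumptions for
# illustration only; the article does not specify them.
DB_COUNT = 4
TABLES_PER_DB = 8


def route(shard_key: int, prefix: str) -> str:
    """Map a numeric sharding key to a physical 'database.table' name."""
    slot = shard_key % (DB_COUNT * TABLES_PER_DB)
    db_index = slot // TABLES_PER_DB      # which database holds the slot
    table_index = slot % TABLES_PER_DB    # which table inside that database
    return f"{prefix}_db_{db_index}.{prefix}_order_{table_index}"


def buyer_route(buyer_id: int) -> str:
    """Locate the shard that serves buyer-side queries."""
    return route(buyer_id, "buyer")


def shop_route(shop_id: int) -> str:
    """Locate the shard that serves shop-side queries."""
    return route(shop_id, "shop")
```

A query that carries the buyer ID or shop ID touches exactly one table under this layout. A query that carries neither has to fan out across shards, which is the problem the next stage addresses.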

Second Stage: Introducing Elasticsearch for Cross‑Table Search

Increasing search dimensions caused frequent slow queries across tables, prompting the adoption of Elasticsearch. Order main tables and auxiliary tables were indexed into a unified ES index, enabling fast cross‑table searches.

Key considerations:

Single‑type vs. multi‑type indices: A multi‑type index mirrors the relational tables, which simplifies field‑to‑column mapping but incurs extra aggregation overhead for multi‑table queries. A single‑type index returns all data in one request at the cost of more complex synchronization. The team chose the single‑type approach.

Index field count control: Only fields with strong search requirements were indexed, which keeps the index files from growing too large and preserves query performance.
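Both considerations can be shown in one small sketch: rows from the main table and the auxiliary tables are flattened into a single document, and only whitelisted fields are kept. The field names in `SEARCHABLE_FIELDS` and the function name are hypothetical; the article does not list the fields that were actually indexed.

```python
from __future__ import annotations

# Hypothetical whitelist of fields with real search demand.
SEARCHABLE_FIELDS = {
    "order_id", "buyer_id", "shop_id", "state", "created_at", "express_no",
}


def build_search_document(main_row: dict, auxiliary_rows: list[dict]) -> dict:
    """Flatten one order's rows into a single document for a single-type index."""
    merged = dict(main_row)
    for row in auxiliary_rows:
        # Later rows overwrite earlier ones when column names collide.
        merged.update(row)
    # Drop every field that is not on the whitelist to keep the index small.
    return {key: value for key, value in merged.items() if key in SEARCHABLE_FIELDS}
```

The sync cost mentioned above shows up here: a change to any auxiliary table means the whole document for that order has to be rebuilt or partially updated.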

Third Stage: Adding HBase for Efficient Detail Assembly

Although search speed improved, assembling full order details remained slow because order IDs retrieved from ES required additional lookups in many extension tables. To solve this, HBase was introduced as a column‑oriented, scalable storage layer.

All core order information and essential extensions are bulk‑loaded into HBase (historical data) and incrementally synced via messaging (new data). Using the order ID as the row key allows a single request to fetch both basic and extended details.
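The row layout can be pictured with a small in‑memory stand‑in for a wide‑column table. This is not the HBase client API. `InMemoryWideTable`, `to_columns`, and the `base` and `ext` column family names are all hypothetical and only illustrate why one row‑key lookup returns both basic and extended details.

```python
class InMemoryWideTable:
    """Toy stand-in for a wide-column table keyed by row key."""

    def __init__(self) -> None:
        self._rows: dict = {}

    def put(self, row_key: str, columns: dict) -> None:
        """Merge the given 'family:qualifier' columns into the row."""
        self._rows.setdefault(row_key, {}).update(columns)

    def get(self, row_key: str) -> dict:
        """Return a copy of every column stored under the row key."""
        return dict(self._rows.get(row_key, {}))


def to_columns(family: str, record: dict) -> dict:
    """Prefix each field with its column family, e.g. 'base:state'."""
    return {f"{family}:{key}": str(value) for key, value in record.items()}
```

Writing the core order under one family and each extension under another, all with the order ID as the row key, means a single `get` replaces the per‑table lookups described above.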

The final architecture became a three‑layer stack: the DB handles writes, Elasticsearch parses search requests and returns basic order data, and HBase assembles detailed order data while also serving export and analytics needs.

Data Synchronization and Consistency

Real‑time, consistent data sync between DB, ES, and HBase is critical. The pipeline listens to binlog events, pushes them to a message queue, and then processes them to update ES and HBase. Monitoring timestamps at each stage (binlog → MQ → processing → ES/HBase) provides a metric for latency.
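The latency metric can be derived from the per‑stage timestamps roughly as follows. The stage names and the dictionary shape are assumptions for illustration; the article only states that a timestamp is recorded at each stage.

```python
# Hypothetical stage names, in pipeline order.
STAGES = ["binlog", "mq", "processing", "store"]


def stage_latencies(timestamps_ms: dict) -> dict:
    """Return the delay between consecutive stages and the end-to-end delay."""
    latencies = {}
    for earlier, later in zip(STAGES, STAGES[1:]):
        latencies[f"{earlier}->{later}"] = timestamps_ms[later] - timestamps_ms[earlier]
    latencies["end_to_end"] = timestamps_ms[STAGES[-1]] - timestamps_ms[STAGES[0]]
    return latencies
```

Breaking the delay down per hop shows whether a lag comes from the binlog listener, the queue, or the write into ES or HBase. The timestamps come from different machines, so the numbers are only as accurate as the clock synchronization between them.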

Idempotency safeguards include:

Optimistic lock fields (e.g., order state progression).

Version columns incremented on each update.

Custom Snowflake‑style IDs combining timestamps and binlog offsets.

Ordered message consumption using modulo hashing on a business key (order ID) to ensure the same order’s events are processed by the same consumer.

These measures ensure reliable, ordered updates and have been validated in production.
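Two of these safeguards, version checks and keyed routing, can be sketched together. This is a minimal illustration under stated assumptions: the event shape, `apply_if_newer`, `consumer_for`, and the use of CRC32 as the hash are all hypothetical, and the plain dictionary stands in for ES or HBase.

```python
import zlib


def consumer_for(order_id: str, consumer_count: int) -> int:
    """Pick a consumer index so that all events of one order go to the same consumer."""
    # CRC32 is stable across processes, unlike Python's built-in hash() for strings.
    return zlib.crc32(order_id.encode("utf-8")) % consumer_count


def apply_if_newer(store: dict, event: dict) -> bool:
    """Apply the event only if its version is newer than the stored one."""
    current = store.get(event["order_id"])
    if current is not None and current["version"] >= event["version"]:
        return False  # duplicate or stale delivery: skip it, the store is unchanged
    store[event["order_id"]] = event
    return True
```

Replaying the same message twice leaves the store unchanged, and an out‑of‑order older version cannot overwrite a newer one. Keyed routing reduces how often that second case occurs in the first place.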
