How JD.com’s Order Fulfillment Scales: Data Heterogeneity & Complexity‑Driven Architecture

This talk explains JD.com’s order‑fulfillment workflow, the underlying storage stack of Redis, MySQL and Elasticsearch, the challenges of high‑traffic alert‑sound queries, the adoption of Canal for data heterogeneity, and a 4R‑based complexity‑oriented design methodology for building resilient backend systems.

dbaplus Community
dbaplus Community
dbaplus Community
How JD.com’s Order Fulfillment Scales: Data Heterogeneity & Complexity‑Driven Architecture

1. Order Fulfillment Business Background

JD.com’s delivery platform processes orders through payment, merchant dispatch, confirmation, printing, picking, logistics, delivery and final receipt. During peak sales, merchant interfaces become bottlenecks, requiring an alert‑sound feature that continuously queries the underlying storage to notify merchants of pending orders.

2. Underlying Storage Responsibilities

The system uses a layered storage design:

Redis : Stores batch‑task metadata for workers. It leverages ZSET to query recent tasks by timestamp, but does not handle complex queries because its query capabilities are limited compared to Elasticsearch.

MySQL : Persists order data with strong transactional guarantees, serving as the source of truth for both hot (active) and cold (historical) orders.

Elasticsearch : Handles the majority of query traffic, supporting complex filters and distributed scaling. Three ES clusters are deployed: a hot cluster for real‑time queries, a full cluster for less‑frequent access, and a dedicated “Remind” cluster for the alert‑sound use case, isolating its heavy query load.

Hot and cold data are separated at the database level (business DB vs. history DB) to keep single‑table sizes in the tens of millions and avoid excessive sharding.

3. Write‑Complexity Challenge

Introducing a separate alert‑sound ES cluster creates a write path that must update three data stores (Redis, MySQL, ES) for each order, increasing development and operational complexity. To mitigate this, a data‑heterogeneity middleware (Canal) is used to synchronize writes efficiently.

4. Canal Overview

Canal captures MySQL binlog changes via Zookeeper‑discovered positions, parses them, filters unwanted events, and stores the result in a Store. The data is then sent by workers to downstream systems such as Kafka or directly to Elasticsearch. The workflow consists of three steps:

Load & Store : Deployer reads binlog positions, parses events, and writes filtered messages to the Store.

Send & Ack : Workers pull from the Store and push to the target (e.g., Kafka), acknowledging successful delivery.

Update MetaInfo : Canal asynchronously updates the stored binlog filename and position to ensure continuity.

Canal requires idempotent, retry‑able, and delayed processing semantics.

5. Alert‑Sound Implementation with Canal

Two Deployer instances provide high availability for data transfer. Orders are streamed to Kafka, buffered, and then consumed by an adapter cluster that writes to the dedicated alert‑sound ES cluster. Ordering is preserved using an order‑ID hash to guarantee partition‑level ordering.

Kafka acts as a buffer to smooth bursty write traffic; direct writes would risk overwhelming the target ES cluster during large promotions.

6. Operational Issues & High Availability

Issue 1 – Kafka Unavailability : A network failure caused Kafka to stop, leading to an 8‑hour delay in alert‑sound data and a temporary switch back to the hot ES cluster. After restoring Kafka, missing data was replayed by resetting Zookeeper journal positions.

Issue 2 – Deployer Failure : One Deployer node failed, but HA logic in Zookeeper automatically promoted the standby, allowing continuous consumption without data loss.

Canal provides two HA mechanisms: Deployer HA via Zookeeper temporary nodes and retry logic, and MySQL HA via a dedicated detection thread that works only in GTID mode.

7. Complexity‑Driven Architecture Methodology (4R Model)

The presenter proposes a 4R view—Rank, Role, Relation, Rule—to describe system components (Parser, Sink, Store) and guide design. Complexity is split into technical and business dimensions; open‑source tools (Dubbo, Spring, Canal) address technical complexity, while business‑driven complexity remains.

The design process follows eight steps:

Requirement gathering.

Judgment of data volume, peak load, HA needs.

Complexity identification (business vs. data‑volume).

Architectural alternatives (e.g., ES hot/cold split, data heterogeneity).

Trade‑off analysis based on simplicity, suitability, evolution.

Apply the 4R view to draft a layered architecture.

Implement the design.

Iterate based on feedback.

Examples of the 4R‑based design are shown, with some details omitted for confidentiality.

8. Q&A Highlights

Mapping fields from order tables to ES is configured in the Adapter.

Canal encountered a “Column not match” exception, resolved via its TSDB.

Complexity methodology is applied by decomposing requirements with the 4R lens.

Canal may exhibit latency due to binlog volume and network conditions.

Kafka‑based pipelines introduce buffering but can cause data loss if the broker is down.

New alert‑sound ES cluster isolates query load, avoiding the need to scale the hot cluster.

Typical query peak for the new cluster is 2,000‑4,000 QPS.

Redundancy scales from machine‑level to multi‑region, increasing HA at the cost of complexity and expense.

Original Source

Signed-in readers can open the original source through BestHub's protected redirect.

Sign in to view source
Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactadmin@besthub.devand we will review it promptly.

Backend ArchitectureElasticsearchhigh availabilityorder fulfillmentCanaldata heterogeneity
dbaplus Community
Written by

dbaplus Community

Enterprise-level professional community for Database, BigData, and AIOps. Daily original articles, weekly online tech talks, monthly offline salons, and quarterly XCOPS&DAMS conferences—delivered by industry experts.

0 followers
Reader feedback

How this landed with the community

Sign in to like

Rate this article

Was this worth your time?

Sign in to rate
Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.