Implementing Data Heterogeneity for JD Daojia Order Fulfillment: Architecture, Canal Integration, and Lessons Learned
This article examines JD Daojia's order fulfillment system, detailing the challenges of high‑volume prompt‑sound queries, the division of responsibilities among Redis, MySQL, and Elasticsearch, the adoption of Canal for asynchronous data replication, deployment practices with Kafka and Zookeeper, and the key operational lessons learned.
Prompt Sound Business Background
The order fulfillment system of JD Daojia involves multiple parties (users, merchants, logistics) and a series of steps from payment to delivery. Merchants rely on a prompt-sound feature that audibly alerts them to new orders, but the high-volume queries that drive it cause Elasticsearch CPU spikes and service degradation during peak periods.
Underlying Data Source Responsibilities
Different storage components serve distinct roles:
Redis: stores and queries batch tasks, using a Zset (sorted set) for recent-task retrieval; not used for complex queries.
MySQL: persists order data, split into hot (active orders) and cold (historical orders) databases, with master-slave replication for read scaling.
Elasticsearch: handles the majority of query load, with three clusters (HOT, FULL, and a dedicated Remind cluster) to isolate prompt-sound traffic.
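The Zset pattern above can be illustrated with a small in-memory stand-in for Redis's ZADD / ZRANGEBYSCORE: the score is the task's creation timestamp and the member is the task id, so fetching "recent tasks" is a range query over scores. This is a hedged sketch of the pattern only; the class and method names mimic Redis commands but are not the article's actual code.

```python
import bisect

class RecentTaskIndex:
    """In-memory stand-in for a Redis sorted set (ZADD / ZRANGEBYSCORE):
    score = task creation timestamp, member = task id."""

    def __init__(self):
        self._entries = []  # kept sorted as (score, task_id) tuples

    def zadd(self, score, task_id):
        # Drop any stale entry for the same member, then insert in score order.
        self._entries = [(s, t) for s, t in self._entries if t != task_id]
        bisect.insort(self._entries, (score, task_id))

    def zrangebyscore(self, min_score, max_score):
        lo = bisect.bisect_left(self._entries, (min_score, ""))
        hi = bisect.bisect_right(self._entries, (max_score, "\uffff"))
        return [t for _, t in self._entries[lo:hi]]

# Example: index three batch tasks, then fetch those from the last 5 minutes.
idx = RecentTaskIndex()
now = 1_700_000_000  # fixed timestamp for reproducibility
idx.zadd(now - 600, "task-old")
idx.zadd(now - 120, "task-a")
idx.zadd(now - 30, "task-b")
recent = idx.zrangebyscore(now - 300, now)
print(recent)  # → ['task-a', 'task-b']
```

With real Redis the same shape is `ZADD tasks <timestamp> <task_id>` plus `ZRANGEBYSCORE tasks <min> <max>`, which is why Zset fits recent-task retrieval but not complex queries.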
Data Write Complexity Issue
Standing up the dedicated prompt-sound ES cluster added yet another write target for every order change, increasing write complexity and prompting an evaluation of heterogeneous-data middleware. Selection criteria included community activity, availability, and product maturity, and Canal emerged as the preferred tool.
Canal Overview and Practice
Canal captures MySQL binlog changes, filters events, and forwards them to a store, which then pushes data to downstream systems such as Kafka. The workflow consists of three steps: Load&Store (binlog extraction), Send&Ack (delivery to Kafka), and Update MetaInfo (synchronizing offsets in Zookeeper).
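The three-step loop can be sketched as a toy pipeline: buffered events are delivered to a sink, and the consumed offset (the "meta info" that would live in Zookeeper) advances only after an ack, so an unacked event is resent on the next pass. All names here are illustrative, not Canal's real API.

```python
class MiniPipeline:
    """Toy model of the three-step loop: Load&Store buffers binlog events,
    Send&Ack delivers them to a sink (Kafka in the article), and
    Update MetaInfo advances the checkpoint only after a successful ack."""

    def __init__(self, sink):
        self.store = []   # buffered binlog events (Load&Store)
        self.offset = 0   # persisted meta info (would live in Zookeeper)
        self.sink = sink  # downstream delivery, e.g. a Kafka producer

    def load(self, events):
        self.store.extend(events)

    def send_and_ack(self):
        # Deliver everything past the checkpoint; advance the offset per ack.
        for event in self.store[self.offset:]:
            if not self.sink(event):
                break            # no ack: offset frozen, event resent later
            self.offset += 1     # Update MetaInfo only after the ack

delivered = []
fail_once = {"e3"}

def sink(event):
    if event in fail_once:
        fail_once.discard(event)  # reject this event on its first attempt
        return False
    delivered.append(event)
    return True

p = MiniPipeline(sink)
p.load(["e1", "e2", "e3", "e4"])
p.send_and_ack()            # stops at e3, checkpoint frozen at offset 2
p.send_and_ack()            # retry resumes from the checkpoint, no loss
print(p.offset, delivered)  # → 4 ['e1', 'e2', 'e3', 'e4']
```

The key property this models is that a crash or failed delivery never advances the checkpoint, which is what makes replay from Zookeeper-stored offsets possible later in the article.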
Canal High Availability
Deployer HA relies on Zookeeper ephemeral nodes and retry mechanisms.
MySQL HA requires GTID mode to ensure consistent binlog positions across master and slave.
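Concretely, GTID mode is a MySQL server setting; a configuration along these lines (MySQL 5.7+) gives binlog consumers like Canal transaction identifiers that stay consistent across a master-slave failover. The values shown are illustrative, not the article's actual deployment.

```ini
# my.cnf fragment — settings binlog-based replication typically relies on
[mysqld]
server_id                = 1          # must be unique per instance
log_bin                  = mysql-bin  # binlog must be enabled
binlog_format            = ROW        # Canal parses row-based events
gtid_mode                = ON         # tag each transaction with a GTID
enforce_gtid_consistency = ON         # reject statements unsafe under GTID
```

With GTIDs, a consumer can resume by transaction identity rather than by file-and-position coordinates, which differ between master and slave.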
Deployment Practice
Two Deployer instances provide HA for data transfer, while Kafka buffers binlog events before they are consumed by adapters and written to the Remind ES cluster. Order IDs are hashed to maintain ordering within Kafka partitions, and Zookeeper stores Canal metadata for persistence.
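The order-ID hashing above can be sketched in a few lines: a stable hash of the order id picks the partition, so every binlog event for the same order lands on the same Kafka partition and is consumed in order. The partition count and function name are illustrative; `crc32` stands in for whatever hash the real producer uses (note that Python's built-in `hash()` is salted per process and would not be stable).

```python
import zlib

NUM_PARTITIONS = 8  # illustrative partition count for the binlog topic

def partition_for(order_id: str, num_partitions: int = NUM_PARTITIONS) -> int:
    """Map an order id to a Kafka partition with a stable hash, so all
    events for one order share a partition and keep their order."""
    return zlib.crc32(order_id.encode("utf-8")) % num_partitions

# Repeated calls for the same order always agree on the partition.
p1 = partition_for("order-10001")
print(p1 == partition_for("order-10001"))  # → True
```

Kafka only guarantees ordering within a single partition, which is why keying by order id (rather than round-robin) matters for fulfillment-state updates.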
Practical Issues & Summary
Issue 1 – Kafka Unavailability: caused ES data gaps and delayed order fulfillment; resolved by restoring Kafka and replaying missing data from Zookeeper checkpoints.
Issue 2 – Deployer Failure: automatic failover to standby Deployer prevented service interruption, highlighting the need for multi‑machine, multi‑region deployment for true HA.
Key takeaways: monitoring and alerting are essential for distributed pipelines; a fallback write path is necessary when exhaustive error scenarios cannot be enumerated; and redundancy at both machine and service levels is critical for high availability.
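The "fallback write path" takeaway can be sketched as a degrade-on-failure wrapper: if the async pipeline (Kafka) is unavailable, fall back to a direct synchronous write so no data is lost. The function names are illustrative placeholders, not the article's codebase.

```python
def write_with_fallback(event, send_to_kafka, write_directly):
    """Try the async pipeline first; on broker failure, degrade to a
    direct write (e.g. straight to ES) instead of dropping the event."""
    try:
        send_to_kafka(event)
        return "async"
    except ConnectionError:
        write_directly(event)
        return "fallback"

# Simulate a broker outage: the Kafka path raises, the fallback catches it.
es_store = []

def kafka_down(event):
    raise ConnectionError("broker unavailable")

path = write_with_fallback({"order_id": 1}, kafka_down, es_store.append)
print(path, es_store)  # → fallback [{'order_id': 1}]
```

The fallback path trades throughput for durability, which is acceptable precisely because it only activates when the primary path is already down.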
Dada Group Technology
Sharing insights and experiences from Dada Group's R&D department on product refinement and technology advancement, connecting with fellow geeks to exchange ideas and grow together.