Preparing JD's CDP Platform for Double 11: Challenges, Capacity Planning, and Lessons Learned
This article recounts the author's experience preparing JD's Customer Data Platform (CDP) for the Double 11 shopping festival, detailing the platform's capabilities, business scenarios, capacity planning, stability and performance challenges, disaster‑recovery measures, and personal reflections on the intensive technical effort involved.
The author, a recent JD graduate, shares a personal account of leading the Double 11 preparation for the CDP platform, a core user‑centred system that supports data fusion, tagging, crowd analysis, and real‑time marketing across multiple business units.
Key platform capabilities include handling over 100 billion calls per day, serving the payment, finance, and wealth-management businesses, and providing a unified data service for precision marketing.
Business scenarios span golden-link (core transaction path) services, marketing recommendation engines, decision support for finance and payment, and real-time decisioning for wealth-management products.
To organize the preparation, the team followed a structured roadmap: system inventory, capacity planning, disaster‑recovery registration, degradation plans, stress testing, and real‑time monitoring.
Key milestones were laid out from the kickoff meeting onward, with a focus on ensuring system stability and performance under peak traffic of up to 980k TPS.
Challenges centered on maintaining stability (quick disaster recovery, traffic throttling) and performance (meeting TP999 < 50 ms under near‑million TPS load).
Capacity planning involved single‑machine stress tests to determine maximum throughput and aggregating upstream traffic estimates to calculate required resource expansion.
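To make that arithmetic concrete, here is a minimal capacity-estimation sketch in Java. The 980k TPS aggregate peak matches the figure cited above, but the per-upstream breakdown, single-instance throughput, and redundancy factor are illustrative assumptions, not the team's actual numbers.

```java
// Minimal capacity-estimation sketch. The 980k TPS peak comes from the article;
// the per-instance throughput, redundancy factor, and upstream figures below are
// illustrative assumptions, not JD's actual numbers.
import java.util.Map;

public class CapacityEstimator {

    /** Instances needed = ceil(peakTps * redundancyFactor / perInstanceTps). */
    static int requiredInstances(long peakTps, double redundancyFactor, long perInstanceTps) {
        return (int) Math.ceil(peakTps * redundancyFactor / (double) perInstanceTps);
    }

    public static void main(String[] args) {
        // Hypothetical upstream traffic estimates (TPS) gathered from each caller.
        Map<String, Long> upstreamEstimates = Map.of(
                "marketing-engine", 420_000L,
                "payment-decision", 310_000L,
                "wealth-realtime", 250_000L);

        long aggregatePeak = upstreamEstimates.values().stream()
                .mapToLong(Long::longValue).sum();

        long perInstanceTps = 8_000L;  // assumed max TPS from a single-machine stress test
        double redundancy = 1.5;       // assumed headroom for failover and traffic skew

        System.out.printf("Aggregate peak: %,d TPS -> %d instances%n",
                aggregatePeak, requiredInstances(aggregatePeak, redundancy, perInstanceTps));
    }
}
```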
Disaster‑recovery plans included one‑click failover for critical nodes and clear operation manuals for rapid response.
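As a rough illustration of what a "one-click" failover switch can look like, the sketch below routes reads between a primary and a backup data link behind a shared interface; the class and method names are hypothetical, since the article does not detail JD's actual mechanism.

```java
// Illustrative one-click failover switch, assuming a primary and backup data link
// behind a shared interface. Names are hypothetical, not JD's real implementation.
import java.util.concurrent.atomic.AtomicReference;

public class FailoverSwitch {

    public interface DataLink {
        String query(String userId);
    }

    private final DataLink primary;
    private final DataLink backup;
    private final AtomicReference<DataLink> active = new AtomicReference<>();

    public FailoverSwitch(DataLink primary, DataLink backup) {
        this.primary = primary;
        this.backup = backup;
        this.active.set(primary);
    }

    /** Operator-triggered switch; documented in the runbook so on-call can act fast. */
    public void failoverToBackup() {
        active.set(backup);
    }

    public void restorePrimary() {
        active.set(primary);
    }

    public String query(String userId) {
        return active.get().query(userId);
    }
}
```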
Degradation strategies temporarily paused non‑critical processing (e.g., crowd tagging, non‑essential MQ jobs) to free resources for core services during spikes.
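One common way to implement such a degradation switch is to wrap non-critical work behind a toggle so it can be paused during spikes, as in the Java sketch below; the flag source and task names are assumptions for illustration only.

```java
// Illustrative degradation toggle: when the switch is on, non-critical work
// (e.g. crowd-tagging MQ consumption) is skipped so resources go to core reads.
// The flag source and job names are assumptions, not the article's actual config.
import java.util.concurrent.atomic.AtomicBoolean;

public class DegradationGuard {

    // In practice this would be driven by a config center; here it is a local flag.
    private static final AtomicBoolean DEGRADED = new AtomicBoolean(false);

    public static void enterDegradedMode() { DEGRADED.set(true); }
    public static void exitDegradedMode()  { DEGRADED.set(false); }

    /** Wrap non-critical tasks so they are silently dropped during traffic spikes. */
    public static void runIfNotDegraded(String taskName, Runnable task) {
        if (DEGRADED.get()) {
            System.out.println("Degraded mode: skipping " + taskName);
            return;
        }
        task.run();
    }

    public static void main(String[] args) {
        runIfNotDegraded("crowd-tagging", () -> System.out.println("tagging batch processed"));
        enterDegradedMode();
        runIfNotDegraded("crowd-tagging", () -> System.out.println("tagging batch processed"));
    }
}
```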
Cluster‑wide stress tests were conducted with coordinated degradation drills to verify switch‑over mechanisms.
Real‑time monitoring was enhanced by re‑configuring alerts (phone, internal apps) and assigning dedicated on‑call engineers for each critical service.
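A toy alert rule keyed to the TP999 target mentioned earlier might look like the following sketch; the thresholds, metric names, and notification channels are assumptions, since the article does not describe JD's internal alerting stack.

```java
// Toy alert-rule sketch tied to the TP999 < 50 ms target cited above.
// Thresholds and channel names are illustrative assumptions.
public class AlertRules {

    record Metrics(double tp999Millis, double errorRate, long currentTps) {}

    static void evaluate(Metrics m) {
        if (m.tp999Millis() >= 50) {
            notify("PHONE", "TP999 breach: " + m.tp999Millis() + " ms");
        }
        if (m.errorRate() >= 0.001) {
            notify("IM", "Error rate elevated: " + m.errorRate());
        }
    }

    static void notify(String channel, String message) {
        // Placeholder for the real paging / internal-app integration.
        System.out.printf("[%s] %s%n", channel, message);
    }

    public static void main(String[] args) {
        evaluate(new Metrics(62.0, 0.0004, 910_000));
    }
}
```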
During the event, a critical incident on November 4th triggered an immediate failover to a backup link, restoring service within two minutes and demonstrating the effectiveness of the prepared degradation and disaster‑recovery procedures.
The author reflects on the intense two‑month preparation, the technical growth gained, and the sense of achievement when the system sustained high concurrency without major issues.