How Alibaba’s Real‑Time Big Data Platform Powers Seamless Double‑11 Operations
This article explains how Alibaba built a real‑time big‑data operations platform, covering pre‑event preparation, full‑link diagnostics, automated load testing, and comprehensive monitoring, to keep latency low and throughput high during the massive Double‑11 shopping festival.
Real‑Time Computing Business Promotion Strategy
Alibaba’s Double‑11 2018 achieved a total GMV of 213.5 billion yuan, with the Blink real‑time log system handling peaks of 1.7 billion events per second and keeping the first‑screen GMV display under three seconds, thanks to a meticulously prepared big‑data real‑time operations platform.
Three‑Phase Promotion Assurance Process
Pre‑event: set assurance goals, optimize resources (downgrade plans, self‑service job registration), and conduct comprehensive inspections.
Preparation: run full‑link diagnosis and stress tests, build monitoring dashboards, and automate risk‑response loops.
Event: watch the monitoring screens, execute risk plans, and maintain on‑call duty.
Promotion Assurance Platform
The real‑time intelligent operations platform provides SREs, developers, and users with multi‑level services such as operation support, business assistance, tool integration, and promotion assurance. During Double‑11 it offered full‑link diagnosis, one‑click load testing, automated risk‑plan execution, and GMV monitoring dashboards.
Typical Real‑Time Data Flow
Data generation → data channel collection → first‑layer stream computation → intermediate results written back to channel → downstream stream computation → final result table → front‑end consumption.
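The flow above can be illustrated with a minimal in-memory sketch. It is not Blink code: the event shape `(category, amount)`, the function names, and the batch size are all assumptions made for illustration, and Python generators stand in for the data channel.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, Tuple

Order = Tuple[str, float]  # hypothetical event shape: (category, amount)


def first_layer(orders: Iterable[Order], batch_size: int) -> Iterator[Dict[str, float]]:
    """First-layer computation: pre-aggregate amounts per category in small batches."""
    batch: Dict[str, float] = defaultdict(float)
    count = 0
    for category, amount in orders:
        batch[category] += amount
        count += 1
        if count == batch_size:
            yield dict(batch)  # intermediate result "written back to the channel"
            batch, count = defaultdict(float), 0
    if count:
        yield dict(batch)  # flush the last partial batch


def downstream(partials: Iterable[Dict[str, float]]) -> Iterator[float]:
    """Downstream computation: fold partial aggregates into a running total."""
    total = 0.0
    for partial in partials:
        total += sum(partial.values())
        yield total  # each value is a new row for the final result table


orders = [("phones", 100.0), ("books", 20.0), ("phones", 50.0),
          ("toys", 30.0), ("books", 10.0)]
result_table = list(downstream(first_layer(orders, batch_size=2)))
print(result_table)  # [120.0, 200.0, 210.0]
```

Splitting the work into two layers keeps the downstream stage cheap: it only sees pre-aggregated batches, not every raw event.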
Full‑Link Diagnosis
Given the massive scale of Blink jobs (thousands of containers across thousands of machines, dozens of metrics per subtask, and numerous system‑level indicators), the platform provides a one‑click diagnosis that isolates abnormal nodes and metrics, helping users quickly locate job failures, resource shortages, or failover causes.
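The article does not say how the platform decides that a node or metric is abnormal. One common approach, shown here purely as an assumption, is to compare each node against its peers for the same metric using a robust outlier score based on the median and the median absolute deviation (MAD). The node names, metric names, and threshold are hypothetical.

```python
from statistics import median
from typing import Dict, List, Tuple


def find_abnormal(metrics: Dict[str, Dict[str, float]],
                  threshold: float = 3.5) -> List[Tuple[str, str, float]]:
    """metrics maps node -> {metric name -> value}; returns (node, metric, value) outliers."""
    findings: List[Tuple[str, str, float]] = []
    metric_names = sorted({name for values in metrics.values() for name in values})
    for name in metric_names:
        samples = {node: values[name] for node, values in metrics.items() if name in values}
        center = median(samples.values())
        mad = median(abs(v - center) for v in samples.values())
        if mad == 0:
            # More than half the nodes report the same value; this sketch
            # skips the metric instead of dividing by zero.
            continue
        for node, value in sorted(samples.items()):
            score = 0.6745 * abs(value - center) / mad  # modified z-score
            if score > threshold:
                findings.append((node, name, value))
    return findings


sample = {
    "container-1": {"delay_ms": 120.0, "cpu_pct": 55.0},
    "container-2": {"delay_ms": 130.0, "cpu_pct": 58.0},
    "container-3": {"delay_ms": 125.0, "cpu_pct": 52.0},
    "container-4": {"delay_ms": 9000.0, "cpu_pct": 57.0},
    "container-5": {"delay_ms": 118.0, "cpu_pct": 54.0},
}
abnormal = find_abnormal(sample)
print(abnormal)  # [('container-4', 'delay_ms', 9000.0)]
```

Median-based scores are used instead of mean and standard deviation because a single extreme node would otherwise inflate the baseline it is being compared against.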
Load‑Testing Challenges and Platform
Traditional load testing required manual cloning of shadow jobs, complex pressure generation, and iterative tuning. The platform automates shadow‑job cloning, pressure level selection, data injection, real‑time monitoring, and one‑click synchronization of successful configurations back to production, reducing preparation time from weeks to hours.
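The automated loop described above (clone a shadow job, step through pressure levels, tune, then sync back) might look roughly like the following sketch. Every name here is hypothetical: `JobConfig`, the `_shadow` naming scheme, the doubling strategy, and the `measure_tps` callback are assumptions, and the throughput model in the example is a stand-in for real monitoring data.

```python
from dataclasses import dataclass, replace
from typing import Callable, Optional, Sequence


@dataclass(frozen=True)
class JobConfig:
    name: str
    parallelism: int
    source_topic: str
    sink_table: str


def clone_shadow(job: JobConfig) -> JobConfig:
    """Clone a production job so that it reads and writes isolated shadow resources."""
    return replace(job, name=f"{job.name}_shadow",
                   source_topic=f"{job.source_topic}_shadow",
                   sink_table=f"{job.sink_table}_shadow")


def tune(shadow: JobConfig, target_tps: float, levels: Sequence[float],
         measure_tps: Callable[[JobConfig], float],
         max_parallelism: int) -> Optional[JobConfig]:
    """Step through pressure levels; double parallelism whenever the shadow falls behind."""
    current = shadow
    for level in levels:
        required = target_tps * level
        while measure_tps(current) < required:
            if current.parallelism * 2 > max_parallelism:
                return None  # cannot reach this level within the resource cap
            current = replace(current, parallelism=current.parallelism * 2)
    return current


def sync_to_production(job: JobConfig, tuned: JobConfig) -> JobConfig:
    """Copy only the tuned setting back; production topics and tables stay untouched."""
    return replace(job, parallelism=tuned.parallelism)


production = JobConfig("gmv_agg", parallelism=4, source_topic="orders", sink_table="gmv")
shadow = clone_shadow(production)
# Stand-in throughput model: each unit of parallelism sustains 1,000 events/s.
tuned = tune(shadow, target_tps=10_000, levels=(0.5, 1.0, 1.2),
             measure_tps=lambda cfg: cfg.parallelism * 1_000, max_parallelism=64)
if tuned is not None:
    production = sync_to_production(production, tuned)
print(production.parallelism)  # 16
```

Keeping the shadow job on separate topics and tables is what lets the test run at full pressure without polluting production results.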
Real‑Time Monitoring Service
The platform aggregates metrics from Blink, Yarn, and underlying machines to build a multi‑dimensional monitoring dashboard. Job tags enable flexible grouping, filtering, and real‑time visualization, supporting custom dashboards for diverse business needs.
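Tag-based grouping and filtering can be sketched in a few lines. The record layout (`name`, `tags`, `metrics`) and the tag keys are assumptions for illustration; the article does not describe the platform's actual data model.

```python
from collections import defaultdict
from typing import Dict, List


def filter_jobs(jobs: List[dict], **required_tags: str) -> List[dict]:
    """Keep only jobs whose tags match every requested key/value pair."""
    return [job for job in jobs
            if all(job["tags"].get(key) == value for key, value in required_tags.items())]


def group_sum(jobs: List[dict], tag: str, metric: str) -> Dict[str, float]:
    """Sum one metric across jobs, grouped by the value of one tag."""
    totals: Dict[str, float] = defaultdict(float)
    for job in jobs:
        group = job["tags"].get(tag, "untagged")
        totals[group] += job["metrics"].get(metric, 0.0)
    return dict(totals)


jobs = [
    {"name": "gmv_total", "tags": {"team": "trade", "priority": "p0"},
     "metrics": {"tps": 500000.0}},
    {"name": "gmv_by_category", "tags": {"team": "trade", "priority": "p1"},
     "metrics": {"tps": 300000.0}},
    {"name": "search_clicks", "tags": {"team": "search", "priority": "p1"},
     "metrics": {"tps": 200000.0}},
]
tps_by_team = group_sum(jobs, tag="team", metric="tps")
p0_names = [job["name"] for job in filter_jobs(jobs, priority="p0")]
print(tps_by_team)  # {'trade': 800000.0, 'search': 200000.0}
print(p0_names)     # ['gmv_total']
```

Because tags are free-form key/value pairs, a new dashboard dimension only needs a new tag, not a schema change.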
GMV Media Screen Assurance
GMV calculation requires sub‑second latency across a long chain of systems. The platform isolates GMV workloads into dedicated Yarn partitions, implements active‑passive disaster recovery, and provides a dedicated GMV monitoring dashboard to track end‑to‑end latency and hotspot machines.
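As a rough illustration of end-to-end latency tracking with an active-passive fallback, the sketch below breaks one traced record into per-stage latencies and chooses which link should feed the screen. The stage names, the timestamps, the 1,000 ms budget, and the switching rule are all assumptions; the article does not describe the actual failover criteria.

```python
from typing import Dict, List, Tuple

# Stage names loosely follow the data flow described earlier; they are hypothetical.
STAGES = ["generated", "collected", "first_layer", "downstream", "result_table"]


def stage_latencies(timestamps_ms: Dict[str, int]) -> List[Tuple[str, int]]:
    """Latency of each hop, computed from the time the record reached each stage."""
    return [(f"{a}->{b}", timestamps_ms[b] - timestamps_ms[a])
            for a, b in zip(STAGES, STAGES[1:])]


def end_to_end(timestamps_ms: Dict[str, int]) -> int:
    """Total latency from generation to the final result table."""
    return timestamps_ms[STAGES[-1]] - timestamps_ms[STAGES[0]]


def pick_link(active_ms: int, standby_ms: int, budget_ms: int) -> str:
    """Stay on the active link unless it is over budget and the standby is within budget."""
    if active_ms > budget_ms and standby_ms <= budget_ms:
        return "standby"
    return "active"


trace = {"generated": 0, "collected": 150, "first_layer": 400,
         "downstream": 1500, "result_table": 1700}
hops = stage_latencies(trace)
slowest = max(hops, key=lambda hop: hop[1])
print(end_to_end(trace))  # 1700
print(slowest)            # ('first_layer->downstream', 1100)
print(pick_link(active_ms=end_to_end(trace), standby_ms=600, budget_ms=1000))  # standby
```

Per-hop latencies matter as much as the total: they point to the stage, and from there the machines, where the delay accumulates.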
Conclusion
Alibaba’s 2018 Double‑11 demonstrated that a well‑engineered real‑time big‑data operations platform—combining pre‑event preparation, automated diagnostics, scalable load testing, and comprehensive monitoring—can deliver silk‑smooth performance under extreme traffic, while leveraging AIOps techniques such as failover clustering, TPS anomaly detection, and self‑healing mechanisms.
