
How Alibaba’s Real‑Time Big Data Platform Powers Seamless Double‑11 Operations

This article explains how Alibaba built a real‑time big‑data operations platform, covering pre‑event preparation, full‑link diagnostics, automated load testing, and comprehensive monitoring, to ensure ultra‑low latency and high throughput during the massive Double‑11 shopping festival.

Alibaba Cloud Big Data AI Platform

Real‑Time Computing Business Promotion Strategy

Alibaba’s Double‑11 2018 achieved a total GMV of 213.5 billion yuan. The Blink real‑time log system handled peaks of 1.7 billion events per second and kept the first‑screen GMV display under three seconds. Both results depended on a meticulously prepared real‑time big‑data operations platform.

Three‑Phase Promotion Assurance Process

Pre‑event: set assurance goals, optimize resources (downgrade plans, self‑service job registration), and conduct comprehensive inspections.

Preparation: run full‑link diagnosis and load tests, build monitoring dashboards, and automate risk‑response loops.

Event: watch the monitoring screens, execute risk plans, and maintain on‑call duty.

Promotion Assurance Platform

The real‑time intelligent operations platform provides SREs, developers, and users with multi‑level services such as operation support, business assistance, tool integration, and promotion assurance. During Double‑11 it offered full‑link diagnosis, one‑click load testing, automated risk‑plan execution, and GMV monitoring dashboards.

Typical Real‑Time Data Flow

Data generation → data channel collection → first‑layer stream computation → intermediate results written back to channel → downstream stream computation → final result table → front‑end consumption.
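
The flow above can be sketched as a toy, in-memory pipeline. This is only an illustration under stated assumptions: the event fields (order_id, category, amount), the function names, and the use of a deque as the "channel" are all hypothetical, and a real deployment would use a message queue and Blink jobs instead.

```python
from collections import defaultdict, deque

# Hypothetical order events; the field names are illustrative, not from the source.
events = [
    {"order_id": 1, "category": "phones", "amount": 100.0},
    {"order_id": 2, "category": "books", "amount": 20.0},
    {"order_id": 3, "category": "phones", "amount": 50.0},
]


def first_layer(source, channel):
    """First-layer stream computation: pre-aggregate per category and
    write the intermediate results back to the channel."""
    partial = defaultdict(float)
    for event in source:
        partial[event["category"]] += event["amount"]
    for category, subtotal in partial.items():
        channel.append({"category": category, "subtotal": subtotal})


def downstream(channel):
    """Downstream stream computation: fold the intermediate results
    into the final result table."""
    result_table = {"gmv": 0.0}
    while channel:
        record = channel.popleft()
        result_table["gmv"] += record["subtotal"]
    return result_table


channel = deque()  # stands in for the data channel (a message queue in practice)
first_layer(events, channel)
result_table = downstream(channel)
print(result_table)  # a front end would read this table
```

Writing intermediate results back to the channel lets several downstream jobs reuse the same first-layer output, at the cost of one more hop on the latency path.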

Full‑Link Diagnosis

Given the massive scale of Blink jobs (thousands of containers across thousands of machines, dozens of metrics per subtask, and numerous system‑level indicators), the platform provides a one‑click diagnosis that isolates abnormal nodes and metrics, helping users quickly locate job failures, resource shortages, or failover causes.
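
The source does not describe how the diagnosis picks out abnormal nodes. One plausible approach, shown here purely as an assumption, is to compare each subtask against its peers on every metric and flag robust outliers using the median absolute deviation (modified z-score). The metric names, subtask ids, and the 3.5 threshold are hypothetical.

```python
from statistics import median


def abnormal_subtasks(metrics, threshold=3.5):
    """Flag subtasks whose value on a metric deviates strongly from peers.

    metrics: {metric_name: {subtask_id: value}}
    Returns {metric_name: [subtask ids]} for metrics that have outliers.
    """
    findings = {}
    for name, by_subtask in metrics.items():
        values = list(by_subtask.values())
        med = median(values)
        mad = median(abs(v - med) for v in values)
        if mad == 0:
            # All peers are (nearly) identical; this sketch skips such metrics.
            continue
        outliers = sorted(
            sid
            for sid, value in by_subtask.items()
            if 0.6745 * abs(value - med) / mad > threshold  # modified z-score
        )
        if outliers:
            findings[name] = outliers
    return findings


# Hypothetical per-subtask metrics for one job.
metrics = {
    "input_delay_ms": {
        "subtask-0": 120, "subtask-1": 130, "subtask-2": 125,
        "subtask-3": 4000, "subtask-4": 118,
    },
    "cpu_util": {
        "subtask-0": 0.61, "subtask-1": 0.60, "subtask-2": 0.62,
        "subtask-3": 0.63, "subtask-4": 0.59,
    },
}
findings = abnormal_subtasks(metrics)
print(findings)
```

A median-based score is used instead of mean and standard deviation because a single very slow subtask would otherwise inflate the spread and hide itself.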

Load‑Testing Challenges and Platform

Traditional load testing required manual cloning of shadow jobs, complex pressure generation, and iterative tuning. The platform automates shadow‑job cloning, pressure level selection, data injection, real‑time monitoring, and one‑click synchronization of successful configurations back to production, reducing preparation time from weeks to hours.
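
The automated loop described above might look roughly like the following sketch. Everything here is an assumption for illustration: the job dictionary layout, the "blackhole" sink name, the toy latency model, and the policy of doubling parallelism after a failed round are not taken from the source.

```python
import copy


def clone_shadow_job(job):
    """Clone a production job config into an isolated shadow job."""
    shadow = copy.deepcopy(job)
    shadow["name"] = job["name"] + "_shadow"
    shadow["sink"] = "blackhole"  # shadow output must never reach production tables
    return shadow


def simulated_latency_ms(job, tps):
    """Toy model (assumption): latency grows once load exceeds capacity."""
    capacity = job["parallelism"] * 10_000
    return 200 if tps <= capacity else 200 * tps / capacity


def run_load_test(job, target_tps, max_latency_ms,
                  levels=(0.5, 1.0, 1.2), max_rounds=5,
                  measure=simulated_latency_ms):
    """Step through pressure levels; scale the shadow job until all pass."""
    shadow = clone_shadow_job(job)
    for _ in range(max_rounds):
        if all(measure(shadow, target_tps * level) <= max_latency_ms
               for level in levels):
            return shadow
        shadow["parallelism"] *= 2  # naive tuning step, for illustration only
    return None  # no passing configuration found within max_rounds


def sync_to_production(job, shadow):
    """Copy only the tuned settings back; name and sink stay untouched."""
    tuned = copy.deepcopy(job)
    tuned["parallelism"] = shadow["parallelism"]
    return tuned


job = {"name": "gmv_agg", "parallelism": 8, "sink": "result_table"}
shadow = run_load_test(job, target_tps=300_000, max_latency_ms=500)
tuned = sync_to_production(job, shadow)
print(shadow["name"], tuned["parallelism"])
```

In a real setup the `measure` callback would inject recorded or synthetic traffic and read latency from monitoring rather than from a formula.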

Real‑Time Monitoring Service

The platform aggregates metrics from Blink, Yarn, and underlying machines to build a multi‑dimensional monitoring dashboard. Job tags enable flexible grouping, filtering, and real‑time visualization, supporting custom dashboards for diverse business needs.
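
As a rough illustration of tag-based grouping and filtering, the sketch below builds dashboard panels from tagged job records. The tag keys (biz, priority), the metric fields, and the sample values are hypothetical; the source does not specify the platform's actual tag schema.

```python
from collections import defaultdict

# Hypothetical job records; tag keys and metric fields are assumptions.
jobs = [
    {"name": "gmv_agg", "tags": {"biz": "gmv", "priority": "p0"},
     "tps": 120_000, "delay_s": 0.4},
    {"name": "gmv_detail", "tags": {"biz": "gmv", "priority": "p0"},
     "tps": 80_000, "delay_s": 0.9},
    {"name": "search_log", "tags": {"biz": "search", "priority": "p1"},
     "tps": 300_000, "delay_s": 2.5},
]


def filter_jobs(jobs, **wanted):
    """Keep jobs whose tags match every requested key/value pair."""
    return [
        job for job in jobs
        if all(job["tags"].get(key) == value for key, value in wanted.items())
    ]


def group_by_tag(jobs, tag):
    """Build one dashboard panel per distinct value of the given tag."""
    panels = defaultdict(lambda: {"jobs": 0, "total_tps": 0, "max_delay_s": 0.0})
    for job in jobs:
        panel = panels[job["tags"].get(tag, "untagged")]
        panel["jobs"] += 1
        panel["total_tps"] += job["tps"]
        panel["max_delay_s"] = max(panel["max_delay_s"], job["delay_s"])
    return dict(panels)


panels = group_by_tag(jobs, "biz")
p1_jobs = [job["name"] for job in filter_jobs(jobs, priority="p1")]
print(panels["gmv"], p1_jobs)
```

Because panels are derived from tags rather than fixed job lists, a newly registered job appears on the right dashboard as soon as it carries the matching tag.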

GMV Media Screen Assurance

GMV calculation requires sub‑second latency across a long chain of systems. The platform isolates GMV workloads into dedicated Yarn partitions, implements active‑passive disaster recovery, and provides a dedicated GMV monitoring dashboard to track end‑to‑end latency and hotspot machines.
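
The source does not say how the switch between the active and passive links is decided. The sketch below shows one simple possibility, assuming each link reports a health flag and the event time of its newest processed record; the class, field names, and the 3.0-second limit used in the example are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class LinkStatus:
    name: str
    healthy: bool               # e.g. job running and not in failover
    newest_event_time_s: float  # event time of the newest record in the result table


def latency_s(link, now_s):
    """End-to-end latency: how far the result table lags behind 'now'."""
    return now_s - link.newest_event_time_s


def choose_serving_link(active, standby, now_s, max_latency_s):
    """Prefer the active link; use the standby when the active one is
    unhealthy or lags beyond the allowed latency."""
    for link in (active, standby):
        if link.healthy and latency_s(link, now_s) <= max_latency_s:
            return link.name
    # Neither link meets the target: serve the least stale healthy link.
    healthy = [link for link in (active, standby) if link.healthy]
    if healthy:
        return min(healthy, key=lambda link: latency_s(link, now_s)).name
    return active.name  # nothing healthy; keep the default and alert a human


now_s = 1_000.0
active = LinkStatus("active", healthy=True, newest_event_time_s=now_s - 5.0)
standby = LinkStatus("standby", healthy=True, newest_event_time_s=now_s - 0.8)
serving = choose_serving_link(active, standby, now_s, max_latency_s=3.0)
print(serving)
```

Checking the active link first avoids flapping between links when both are within the latency budget.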

Conclusion

Alibaba’s 2018 Double‑11 showed that a well‑engineered real‑time big‑data operations platform, combining pre‑event preparation, automated diagnostics, scalable load testing, and comprehensive monitoring, can keep performance smooth under extreme traffic. The platform also applies AIOps techniques such as failover clustering, TPS anomaly detection, and self‑healing mechanisms.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact admin@besthub.dev and we will review it promptly.

