
How Alibaba’s Real‑Time Big Data Engine Powered a Record‑Breaking Double 11

This article explains how Alibaba built a massive real‑time computing platform using Flink and its Blink extensions, detailing the challenges of ultra‑low latency, exactly‑once guarantees, and high throughput, and showing how these technologies powered the record‑breaking Double 11 shopping festival.

Alibaba Cloud Developer

Real‑time Computing at Alibaba

Alibaba has grown from an e‑commerce platform into a massive ecosystem with billions of users and exabytes of stored data, with petabytes more generated every day. Its real‑time computing platform handles a routine peak of about 100 million events per second, and reached 470 million events per second during Double 11.

Key Application Scenarios

1. Double 11 GMV Dashboard – a real‑time screen aggregates transaction data with sub‑second latency, requiring per‑second processing, exactly‑once guarantees, and high availability.

2. Real‑time Machine Learning – features and models are updated continuously to adapt to rapidly changing data, especially during promotional events.

3. Real‑time A/B Testing – multiple model variants generate metrics that are aggregated in real time, demanding efficient resource usage.
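
The dashboard scenario reduces to grouping payment events by second and keeping a running total. The following is a minimal Python sketch under stated assumptions: the `(order_id, epoch_millis, amount_cents)` event shape and the function name are hypothetical, and the order‑ID deduplication only illustrates why a replayed event must not be counted twice. It is not how Flink or Blink implement exactly‑once.

```python
from collections import defaultdict


def aggregate_gmv(events):
    """Return (per_second_totals, running_total) for a stream of payments.

    Each event is a hypothetical (order_id, epoch_millis, amount_cents) tuple.
    """
    seen_orders = set()              # order IDs that were already counted
    per_second = defaultdict(int)    # second bucket -> GMV in cents
    total = 0
    for order_id, ts_ms, amount in events:
        if order_id in seen_orders:  # a replayed event must not count twice
            continue
        seen_orders.add(order_id)
        per_second[ts_ms // 1000] += amount
        total += amount
    return dict(per_second), total


events = [
    ("o1", 1_000, 500),
    ("o2", 1_400, 250),
    ("o2", 1_400, 250),  # duplicate delivery after a simulated retry
    ("o3", 2_100, 100),
]
per_second, total = aggregate_gmv(events)
print(per_second, total)
```

Amounts are kept as integer cents so that the totals stay exact regardless of how many events are summed.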

Why Flink?

After evaluating many frameworks, Alibaba chose Apache Flink for its stateful processing, its checkpointing mechanism based on the Chandy‑Lamport distributed snapshot algorithm (which enables exactly‑once state semantics), and its combination of low latency and high throughput.
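
The link between snapshots and exactly‑once semantics can be sketched in a few lines. The example below is a deliberately simplified, single‑operator illustration and not Flink's implementation: a real job passes checkpoint barriers through a graph of parallel operators, whereas here the "barrier" is just a point at which the source offset and the operator state are saved together. The function name and the `fail_at` parameter are hypothetical.

```python
import copy


def run_job(records, checkpoint_every, fail_at=None):
    """Sum values per key, checkpointing every `checkpoint_every` records.

    If `fail_at` is set, the job crashes once just before processing the
    record at that offset, restores the latest checkpoint, and replays the
    source from the saved offset.
    """
    state = {}             # operator state: key -> running sum
    checkpoint = (0, {})   # (source offset, state snapshot)
    offset = 0
    failed = False
    while offset < len(records):
        if fail_at is not None and offset == fail_at and not failed:
            failed = True
            # Recovery: rewind the source and restore state from the snapshot.
            offset, snapshot = checkpoint
            state = copy.deepcopy(snapshot)
            continue
        key, value = records[offset]
        state[key] = state.get(key, 0) + value
        offset += 1
        if offset % checkpoint_every == 0:
            # "Barrier": snapshot the offset and the state at the same point.
            checkpoint = (offset, copy.deepcopy(state))
    return state


records = [("a", 1), ("b", 2), ("a", 3), ("b", 4), ("a", 5), ("c", 6)]
clean = run_job(records, checkpoint_every=3)
recovered = run_job(records, checkpoint_every=3, fail_at=5)
print(clean == recovered)
```

Because the offset and the state are captured at the same logical point, records processed after the last checkpoint are replayed against state that has not yet seen them, so each record affects the final sums exactly once.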

Blink Runtime Optimizations

Blink extends the Flink runtime with four major improvements:

1. A dedicated JobMaster per job, so that a single master shared by all jobs no longer becomes a bottleneck.

2. Isolated TaskManagers per job for better fault isolation.

3. A ResourceManager that dynamically adjusts resources.

4. Support for YARN, Mesos and standalone deployments.

Incremental Checkpoint

Instead of full state snapshots, Blink stores only state changes at each checkpoint, reducing checkpoint time to seconds and minimizing fail‑over latency.
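
As a rough illustration of the idea, the sketch below persists only the keys that changed since the previous checkpoint and rebuilds the full state by replaying the deltas in order. The class and method names are hypothetical, key deletions are not handled, and per‑key change tracking in memory is used purely for illustration; a production state backend would track changes differently.

```python
class IncrementalStateStore:
    """Keyed state that checkpoints only keys changed since the last checkpoint."""

    def __init__(self):
        self.state = {}
        self.dirty = set()   # keys modified since the last checkpoint
        self.deltas = []     # persisted checkpoints, each a dict of changes

    def put(self, key, value):
        self.state[key] = value
        self.dirty.add(key)

    def checkpoint(self):
        """Persist only the changed keys and return how many were written."""
        delta = {key: self.state[key] for key in self.dirty}
        self.deltas.append(delta)
        self.dirty.clear()
        return len(delta)

    @staticmethod
    def restore(deltas):
        """Rebuild the full state by applying the deltas oldest first."""
        state = {}
        for delta in deltas:
            state.update(delta)
        return state


store = IncrementalStateStore()
for i in range(1000):
    store.put(f"user-{i}", 0)
first_size = store.checkpoint()    # initial checkpoint covers every key

for i in (3, 7, 42):
    store.put(f"user-{i}", 1)
second_size = store.checkpoint()   # only the three updated keys are written

restored = IncrementalStateStore.restore(store.deltas)
print(first_size, second_size, restored == store.state)
```

The second checkpoint is three entries instead of a thousand, which is the effect the paragraph above describes: checkpoint cost follows the rate of change rather than the total state size.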

Asynchronous I/O

Async‑IO allows multiple concurrent reads from external storage, dramatically increasing CPU utilization and throughput compared with synchronous I/O.
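
The difference can be sketched with Python's `asyncio`, assuming a hypothetical `lookup` coroutine that stands in for a read against external storage. The `capacity` parameter, which caps the number of in‑flight requests, is likewise an assumption made for illustration and does not reproduce Flink's Async I/O API.

```python
import asyncio


async def lookup(key, delay=0.01):
    """Hypothetical external lookup that takes `delay` seconds."""
    await asyncio.sleep(delay)
    return key, f"value-for-{key}"


async def enrich_sequential(keys):
    # Synchronous style: each request waits for the previous one to finish.
    return [await lookup(key) for key in keys]


async def enrich_async(keys, capacity=10):
    # Asynchronous style: up to `capacity` requests are in flight at once.
    semaphore = asyncio.Semaphore(capacity)
    in_flight = 0
    max_in_flight = 0

    async def bounded(key):
        nonlocal in_flight, max_in_flight
        async with semaphore:
            in_flight += 1
            max_in_flight = max(max_in_flight, in_flight)
            try:
                return await lookup(key)
            finally:
                in_flight -= 1

    # gather() returns results in input order, whatever the completion order.
    results = await asyncio.gather(*(bounded(key) for key in keys))
    return list(results), max_in_flight


keys = list(range(20))
sequential = asyncio.run(enrich_sequential(keys))
concurrent, peak = asyncio.run(enrich_async(keys, capacity=10))
print(sequential == concurrent, peak)
```

Both variants return the same enriched records in the same order. The sequential version spends roughly the sum of all lookup delays waiting, while the concurrent version overlaps them, which is where the throughput gain comes from.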

Flink SQL Core Features

Flink SQL provides DML, DDL, UDF/UDTF/UDAF, joins (including lookup and snapshot joins), retraction handling, window aggregations (tumble, sliding, session), and extensive query optimizations such as micro‑batching, push‑down, and Top‑N optimizations.
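
Retraction handling is probably the least intuitive item in that list, so a small sketch may help. Consider a two‑level aggregation: first count orders per user, then count how many users have each order count. When a user's count changes, the downstream aggregate must withdraw the old contribution before adding the new one. The Python below is an illustrative model with hypothetical names, not Flink SQL's internal implementation.

```python
from collections import defaultdict


def users_per_order_count(order_user_ids):
    """Return {order_count: number_of_users} using retract/accumulate steps."""
    orders_by_user = defaultdict(int)   # first aggregation: user -> order count
    users_by_count = defaultdict(int)   # second aggregation: order count -> users
    for user in order_user_ids:
        old = orders_by_user[user]
        new = old + 1
        orders_by_user[user] = new
        if old > 0:
            users_by_count[old] -= 1    # retract the outdated upstream result
            if users_by_count[old] == 0:
                del users_by_count[old]
        users_by_count[new] += 1        # accumulate the updated result
    return dict(users_by_count)


result = users_per_order_count(["alice", "bob", "alice", "carol", "alice"])
print(result)
```

Without the retract step, alice would be counted once under each of the order counts 1, 2 and 3, even though she belongs only under 3.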

Platforms Built on the Stack

1. Alibaba Cloud Stream Compute – an end‑to‑end platform where users write SQL, debug, and deploy jobs on Alibaba Cloud clusters. It powered most real‑time jobs during Double 11.

2. Porsche Real‑time Machine Learning Platform – a visual IDE that translates DAGs into SQL, runs on Blink, and integrates with HBase and TensorFlow for algorithmic development.

Double 11 Impact

The real‑time stack enabled Alibaba to handle tens of millions of concurrent transactions, display 100 billion records in 3 minutes, and improve recommendation accuracy, contributing to a GMV of 168.2 billion RMB (roughly 25.3 billion USD).

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact admin@besthub.dev and we will review it promptly.
