
How Alibaba Tackles Real-Time Stream and Graph Computing at Scale

In his ASPLOS keynote, Alibaba’s Vice President Zhou Jingren detailed the company’s large‑scale stream and graph computing platforms, highlighting fault‑tolerance innovations, real‑time data challenges, and upcoming advances in graph analytics and massive machine‑learning workloads.

Alibaba Cloud Developer

Recently, the top-tier architecture conference ASPLOS was held in China for the first time. Alibaba Cloud's Vice President and Chief Scientist Zhou Jingren delivered a keynote that introduced Alibaba's cloud big data and AI computing platform and its range of products and services, and announced a future focus on graph computing and large-scale machine learning.

Zhou Jingren speaking at ASPLOS

With the proliferation of IoT sensors, mobile apps, and online services, massive data streams are continuously generated, making real‑time analytics increasingly critical for timely business decisions and dynamic service optimization.

Supporting such workloads requires massive, 24/7 stream‑processing clusters that deliver high throughput and low latency while handling hardware failures, network anomalies, and fluctuating input rates.

Alibaba’s big data platform handled nearly 100 million log events per second during the 2016 Double‑11 shopping festival, processing 100 PB of data within six hours, demonstrating extensive experience in large‑scale stream processing.
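To put the data-volume figure in perspective, a back-of-the-envelope calculation gives the average rate it implies. This is only a sketch: it assumes decimal petabytes (10^15 bytes) and an even spread over the full six hours, neither of which the keynote summary specifies, and the variable names are illustrative.

```python
# Assumption: 1 PB = 10**15 bytes (decimal units); the source does not state the convention.
PB = 10**15
TB = 10**12

total_bytes = 100 * PB          # volume quoted for Double-11 2016
window_seconds = 6 * 60 * 60    # "within six hours", taken here as the full window

# Average rate needed to move the whole volume inside the window.
avg_bytes_per_second = total_bytes / window_seconds
avg_tb_per_second = avg_bytes_per_second / TB

print(f"average throughput: {avg_tb_per_second:.2f} TB/s")
```

Under these assumptions the platform sustained roughly 4.6 TB per second on average. The events-per-second figure is a separate measurement, so the two numbers should not be divided into each other to estimate an event size.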

Alibaba’s Stream Computing Breakthrough

Zhou illustrated key fault-tolerance techniques in Alibaba's system design. When a compute node fails, both the continuity of the data stream and the state held by upstream and downstream operators are affected, which makes automatic recovery a much harder problem than in offline batch processing, where a failed task can simply be rerun.

Existing stream systems often rely on a single fault‑tolerance strategy such as replay, global snapshots, or mini‑batches. Real‑world large‑scale applications, however, combine multiple components with differing throughput and latency requirements, demanding a mix of strategies.
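As a rough illustration of two of the strategies named above, the sketch below combines a periodic snapshot with replay for a single counting operator. It is not Alibaba's implementation: the class `CountingOperator`, the function `run_with_failure`, and the in-memory list standing in for a durable input log are all hypothetical, and a real global snapshot would have to coordinate many operators rather than one.

```python
from collections import Counter


class CountingOperator:
    """Hypothetical stateful operator: counts occurrences of each event."""

    def __init__(self):
        self.counts = Counter()
        self.offset = 0  # index of the next event to consume from the log

    def process(self, event):
        self.counts[event] += 1
        self.offset += 1

    def snapshot(self):
        # Copy the state so later updates do not mutate the snapshot.
        return Counter(self.counts), self.offset

    @classmethod
    def restore(cls, snap):
        op = cls()
        op.counts = Counter(snap[0])
        op.offset = snap[1]
        return op


def run_with_failure(log, checkpoint_every, fail_at):
    """Process `log`, simulating one crash just before offset `fail_at`."""
    op = CountingOperator()
    snap = op.snapshot()
    failed = False
    while op.offset < len(log):
        if not failed and op.offset == fail_at:
            failed = True
            # In-memory state is lost; restore the last snapshot, then the
            # loop replays the log from the snapshot's offset.
            op = CountingOperator.restore(snap)
            continue
        op.process(log[op.offset])
        if op.offset % checkpoint_every == 0:
            snap = op.snapshot()
    return op.counts


log = ["click", "view", "click", "buy", "view", "click", "view", "buy"]
recovered = run_with_failure(log, checkpoint_every=3, fail_at=5)
print(recovered == Counter(log))
```

The trade-off the paragraph describes is visible even here: frequent snapshots shorten the replay after a crash but add steady-state overhead, while rare snapshots do the opposite, which is one reason a single policy rarely suits every component.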

Alibaba introduced a virtual‑pipeline abstraction that decouples fault‑tolerance design from correctness analysis and system implementation, reducing complexity and allowing flexible composition of multiple strategies to handle diverse failure scenarios.

Three Challenges of Graph Computing

Graph computing is a key focus at Alibaba, enabling modeling of e‑commerce platforms, user products, and Alipay accounts as nodes for rich analytical scenarios such as search recommendation, anti‑fraud, and knowledge graphs.

Both industry and academia face open problems here: production graphs contain billions of nodes and edges that change rapidly, so a system must support real-time concurrent updates alongside complex graph analysis. The three main challenges are:

Challenge 1: Graph visualization – effectively presenting graph features and information for human interaction, reasoning, and decision‑making.

Challenge 2: Pattern matching – defining and identifying core patterns in massive graphs for use cases like fraud detection, risk control, and ID mapping.

Challenge 3: Handling rapidly changing graphs – computing over graphs whose nodes and edges are dynamically updated.
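Challenges 2 and 3 interact: a pattern has to stay up to date as edges arrive and disappear. The sketch below shows the general idea with a deliberately simple stand-in pattern, the triangle, whose count is maintained incrementally instead of being recomputed from scratch. The class `DynamicGraph` and its methods are hypothetical and are not taken from the keynote; production fraud patterns would be far richer, and a real system would also need concurrency control and distribution.

```python
from collections import defaultdict


class DynamicGraph:
    """Hypothetical undirected graph that keeps a running triangle count."""

    def __init__(self):
        self.adj = defaultdict(set)
        self.triangles = 0

    def add_edge(self, u, v):
        if u == v or v in self.adj[u]:
            return  # ignore self-loops and duplicate edges
        # Every common neighbour of u and v closes one new triangle.
        self.triangles += len(self.adj[u] & self.adj[v])
        self.adj[u].add(v)
        self.adj[v].add(u)

    def remove_edge(self, u, v):
        if v not in self.adj[u]:
            return  # edge not present
        self.adj[u].discard(v)
        self.adj[v].discard(u)
        # Every common neighbour of u and v formed a triangle with this edge.
        self.triangles -= len(self.adj[u] & self.adj[v])


g = DynamicGraph()
for edge in [(1, 2), (2, 3), (1, 3), (3, 4), (1, 4)]:
    g.add_edge(*edge)
count_after_inserts = g.triangles   # triangles 1-2-3 and 1-3-4

g.remove_edge(1, 3)                 # both triangles shared this edge
count_after_removal = g.triangles
print(count_after_inserts, count_after_removal)
```

Each update touches only the neighbourhoods of the two endpoints, so its cost depends on local degree and not on total graph size, which is the property that makes incremental maintenance attractive for rapidly changing graphs.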

Integrating graph computing with machine learning to leverage online user behavior for improved recommendation and search is another active research direction.

Zhou emphasized that Alibaba’s advantage in large‑scale machine learning stems from efficient utilization of billions of data samples and features, supported by heterogeneous CPU, GPU, and FPGA platforms that optimize training and inference for diverse business needs.

Alibaba is collaborating with leading universities to build joint platforms for graph computing and massive machine learning, aligning with the company’s “NASA” strategic plan.
