Big Data 16 min read

How Real‑Time Big Data Stream Computing Powers Double 11 E‑Commerce Success

The article explains how NetEase’s real‑time big‑data stream computing platform, Sloth, handles massive, continuously generated data during China’s Double 11 shopping festival, covering use cases, architectural shifts from batch to incremental processing, technical challenges, and the role of stream‑SQL for easier development.

ITPUB
ITPUB
ITPUB
How Real‑Time Big Data Stream Computing Powers Double 11 E‑Commerce Success

Background and Motivation

During the annual Double 11 shopping festival, e‑commerce platforms must process tens of millions of events per second and generate sales, transaction, and payment metrics in real time. The sheer volume of data makes batch‑oriented “store‑then‑compute” pipelines too slow, driving the need for a stream‑processing architecture that can ingest, compute, and output results with sub‑second latency.

Continuous Data Streams

In modern applications data is generated continuously: user clicks, GPS updates, sensor logs, media uploads, and IoT telemetry. Once captured, these events form a data stream that can be processed immediately to support use cases such as weather forecasting, personalized recommendations, and real‑time monitoring.

NetEase Real‑Time Use Cases

Live sales dashboard – During Double 11 and other sales events, NetEase’s “YouShu” big‑screen visualizes total sales, category ratios, order trends, and geographic distribution of active users. Each order updates the dashboard instantly, helping market operators make timely decisions.

Live sales dashboard
Live sales dashboard

Financial risk control – Stream processing applies risk‑scoring models to massive user‑behavior logs, detects anomalies, assigns risk levels, and automatically triggers alerts and workflow changes.

Real‑time recommendation – Click‑stream data continuously updates user interest profiles, enabling personalized news, music, or book recommendations that adapt as user behavior evolves.

From “Store‑Then‑Compute” to “Compute‑While‑Store”

Traditional batch systems first persist data and later run queries, which is infeasible for high‑velocity streams. NetEase adopts an incremental model: incoming events participate in computation immediately, updating aggregates without re‑processing the entire history. Only the incremental results are persisted, reducing storage pressure and latency.

Core Technical Challenges

Low latency & high throughput – The platform must ingest millions of events per second, distribute them across hundreds of nodes, and emit results with minimal delay, especially during peak moments such as midnight orders on Double 11.

Accuracy of incremental computation – New data must be merged correctly with existing aggregates, and in some cases prior contributions must be retracted to keep results exact.

Fault tolerance – Node failures must be detected and recovered quickly without breaking the incremental computation chain, preserving both real‑time guarantees and result correctness.

Sloth Stream Computing Platform

Sloth is NetEase’s self‑built, multi‑tenant stream processing platform. It abstracts low‑level distributed programming behind a SQL‑like language. Users write declarative queries; Sloth translates them into a distributed execution plan that handles data partitioning, parallelism, state management, and checkpointing automatically.

Key capabilities:

SQL‑style semantics enable migration of existing batch jobs to real‑time streams without learning new APIs.

Incremental execution engine supports windowing, triggers, and stateful operators.

Pluggable connectors integrate with various storage back‑ends (relational databases, message queues, offline stores).

Multi‑tenant isolation and resource quotas ensure stable operation for many concurrent teams.

Implementation Highlights

Sloth’s runtime distributes operators across a cluster of worker nodes. Each operator maintains local state (e.g., counters, sketches) that is periodically checkpointed to durable storage, enabling exactly‑once processing semantics. Fault recovery replays only the uncommitted portion of the stream, avoiding full recomputation.

Windowed aggregations are expressed in the SQL layer, for example:

SELECT TUMBLE_START(event_time, INTERVAL '5' MINUTE) AS window_start,
       SUM(amount) AS total_sales
FROM orders
GROUP BY TUMBLE(event_time, INTERVAL '5' MINUTE);

This query produces a continuously updating table where each 5‑minute window is emitted as soon as its data is complete, satisfying both low‑latency and incremental correctness requirements.

Future Directions

Beyond e‑commerce dashboards, real‑time stream computing is expanding to image, audio, IoT, and online machine‑learning workloads. Emerging stream‑SQL engines are adding richer window definitions, trigger policies, and automatic query optimization (e.g., operator re‑ordering, state size reduction). These advances will broaden adoption across industries that require instant insight from ever‑growing data streams.

Future of stream computing
Future of stream computing
Original Source

Signed-in readers can open the original source through BestHub's protected redirect.

Sign in to view source
Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactadmin@besthub.devand we will review it promptly.

Distributed Systemse‑commercestream processingSQLReal‑Time Computing
ITPUB
Written by

ITPUB

Official ITPUB account sharing technical insights, community news, and exciting events.

0 followers
Reader feedback

How this landed with the community

Sign in to like

Rate this article

Was this worth your time?

Sign in to rate
Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.