How Real‑Time Big Data Stream Computing Powers Double 11 E‑Commerce Success
The article explains how NetEase’s real‑time big‑data stream computing platform, Sloth, handles massive, continuously generated data during China’s Double 11 shopping festival, covering use cases, architectural shifts from batch to incremental processing, technical challenges, and the role of stream‑SQL for easier development.
Background and Motivation
During the annual Double 11 shopping festival, e‑commerce platforms must process tens of millions of events per second and generate sales, transaction, and payment metrics in real time. The sheer volume of data makes batch‑oriented “store‑then‑compute” pipelines too slow, driving the need for a stream‑processing architecture that can ingest, compute, and output results with sub‑second latency.
Continuous Data Streams
In modern applications data is generated continuously: user clicks, GPS updates, sensor logs, media uploads, and IoT telemetry. Once captured, these events form a data stream that can be processed immediately to support use cases such as weather forecasting, personalized recommendations, and real‑time monitoring.
NetEase Real‑Time Use Cases
Live sales dashboard – During Double 11 and other sales events, NetEase’s “YouShu” big‑screen visualizes total sales, category ratios, order trends, and geographic distribution of active users. Each order updates the dashboard instantly, helping market operators make timely decisions.
Financial risk control – Stream processing applies risk‑scoring models to massive user‑behavior logs, detects anomalies, assigns risk levels, and automatically triggers alerts and workflow changes.
Real‑time recommendation – Click‑stream data continuously updates user interest profiles, enabling personalized news, music, or book recommendations that adapt as user behavior evolves.
From “Store‑Then‑Compute” to “Compute‑While‑Store”
Traditional batch systems first persist data and later run queries, which is infeasible for high‑velocity streams. NetEase adopts an incremental model: incoming events participate in computation immediately, updating aggregates without re‑processing the entire history. Only the incremental results are persisted, reducing storage pressure and latency.
Core Technical Challenges
Low latency & high throughput – The platform must ingest millions of events per second, distribute them across hundreds of nodes, and emit results with minimal delay, especially during peak moments such as midnight orders on Double 11.
Accuracy of incremental computation – New data must be merged correctly with existing aggregates, and in some cases prior contributions must be retracted to keep results exact.
Fault tolerance – Node failures must be detected and recovered quickly without breaking the incremental computation chain, preserving both real‑time guarantees and result correctness.
Sloth Stream Computing Platform
Sloth is NetEase’s self‑built, multi‑tenant stream processing platform. It abstracts low‑level distributed programming behind a SQL‑like language. Users write declarative queries; Sloth translates them into a distributed execution plan that handles data partitioning, parallelism, state management, and checkpointing automatically.
Key capabilities:
SQL‑style semantics enable migration of existing batch jobs to real‑time streams without learning new APIs.
Incremental execution engine supports windowing, triggers, and stateful operators.
Pluggable connectors integrate with various storage back‑ends (relational databases, message queues, offline stores).
Multi‑tenant isolation and resource quotas ensure stable operation for many concurrent teams.
Implementation Highlights
Sloth’s runtime distributes operators across a cluster of worker nodes. Each operator maintains local state (e.g., counters, sketches) that is periodically checkpointed to durable storage, enabling exactly‑once processing semantics. Fault recovery replays only the uncommitted portion of the stream, avoiding full recomputation.
Windowed aggregations are expressed in the SQL layer, for example:
SELECT TUMBLE_START(event_time, INTERVAL '5' MINUTE) AS window_start,
SUM(amount) AS total_sales
FROM orders
GROUP BY TUMBLE(event_time, INTERVAL '5' MINUTE);This query produces a continuously updating table where each 5‑minute window is emitted as soon as its data is complete, satisfying both low‑latency and incremental correctness requirements.
Future Directions
Beyond e‑commerce dashboards, real‑time stream computing is expanding to image, audio, IoT, and online machine‑learning workloads. Emerging stream‑SQL engines are adding richer window definitions, trigger policies, and automatic query optimization (e.g., operator re‑ordering, state size reduction). These advances will broaden adoption across industries that require instant insight from ever‑growing data streams.
Signed-in readers can open the original source through BestHub's protected redirect.
This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.
ITPUB
Official ITPUB account sharing technical insights, community news, and exciting events.
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.
