
Designing a Real-Time Stream Computing Platform for E‑commerce Peak Traffic at Yihaodian

The article describes how Yihaodian built a low‑latency, highly available, and easily scalable streaming computation platform using Storm, Kafka, Linux containers and a custom CGroup management framework to handle massive e‑commerce traffic spikes and real‑time analytics.

Architecture Digest

During major e‑commerce events such as JD 618, Yihaodian 711 and Double 11, even a few minutes of failure under a traffic surge can cause significant revenue loss. Systems that face peak traffic therefore need low latency, high availability, and easy scalability.

While Hadoop excels at batch processing, many e‑commerce services (search, recommendation, monitoring, anti‑fraud) require real‑time processing at sub‑second latency, prompting Yihaodian to adopt the Storm stream‑processing framework.

Yihaodian Stream Computing Solution

Yihaodian selected Storm as the core of its distributed stream‑computing platform. The overall data flow is illustrated in Figure 1.

Figure 1 Yihaodian Distributed Stream Computing System

Yihaodian developed a custom data‑recording component called Tracker, which works with Flume to collect website logs efficiently and support horizontal scaling. Kafka is used as a front‑end message buffer to minimize data loss and satisfy various parallelism and ordering requirements. Storm‑processed results are either persisted or discarded according to business needs.
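
The ordering requirement mentioned above is usually met in Kafka by keying messages: Kafka preserves order only within a partition, so records that must stay in order are sent with the same key and land in the same partition. The sketch below imitates that routing in plain Python. The `partition_for` and `route` helpers, the event names, and the use of CRC32 are assumptions for illustration only (Kafka's own default partitioner uses a different hash), and nothing here is taken from Yihaodian's code.

```python
import zlib
from collections import defaultdict


def partition_for(key: str, num_partitions: int) -> int:
    """Map a key to a partition index; the same key always maps to the same index.

    CRC32 is a stand-in hash chosen for this sketch, not Kafka's partitioner.
    """
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def route(events, num_partitions):
    """Append each (key, payload) event to its partition in arrival order."""
    partitions = defaultdict(list)
    for key, payload in events:
        partitions[partition_for(key, num_partitions)].append((key, payload))
    return partitions


# Hypothetical click-stream events keyed by user ID.
events = [
    ("user-1", "view"),
    ("user-2", "view"),
    ("user-1", "add_to_cart"),
    ("user-1", "order"),
]
partitions = route(events, num_partitions=4)
```

Raising the partition count increases the parallelism available to downstream consumers, while per-key ordering is kept because one key never spans two partitions.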

To further improve stability, Yihaodian leverages Linux container technology (cgroups) to isolate CPU, memory, block‑device I/O and network resources at the process level, preventing any single business process from monopolizing system resources and causing node slowdown or failure.
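
As a rough illustration of process-level isolation, the sketch below follows the cgroup v1 filesystem convention: a group is a directory under a controller, limits are written to control files such as `cpu.shares` and `memory.limit_in_bytes`, and a process joins the group when its PID is written to `cgroup.procs`. The helper names and the group name `topology-a` are hypothetical, and the example writes into a temporary directory rather than `/sys/fs/cgroup` so it can run without root privileges. It is not Yihaodian's implementation.

```python
import os
import tempfile


def create_cgroup(root, controller, name, settings):
    """Create <root>/<controller>/<name> and write each control file in settings."""
    path = os.path.join(root, controller, name)
    os.makedirs(path, exist_ok=True)
    for filename, value in settings.items():
        with open(os.path.join(path, filename), "w") as f:
            f.write(str(value))
    return path


def assign_process(cgroup_path, pid):
    """Record a PID in the group's cgroup.procs file."""
    with open(os.path.join(cgroup_path, "cgroup.procs"), "a") as f:
        f.write(f"{pid}\n")


# A temporary directory stands in for the real cgroup mount point.
root = tempfile.mkdtemp()
cpu_group = create_cgroup(root, "cpu", "topology-a", {"cpu.shares": 512})
mem_group = create_cgroup(
    root, "memory", "topology-a", {"memory.limit_in_bytes": 2 * 1024**3}
)
assign_process(cpu_group, os.getpid())
```

On a real node the same writes would target the mounted cgroup hierarchy, where the kernel creates the control files itself when the group directory is made.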

Because Storm does not natively support cgroup isolation, Yihaodian designed its own CGroup resource‑management framework, shown in Figure 2, to enforce per‑Topology resource limits.

Figure 2 Resource Management Framework on Yihaodian Storm Cluster

Simplified User Experience

Users need to perform only three operations: create or delete a CGroup (supported types: cpu, memory, cpuset), assign a process to a specific CGroup, and set the Topology priority via the Storm UI (the priority determines how large a share of resources the process group can obtain). These operations are executed through a custom client command, ycgClient, while a daemon, ycgManager, automatically manages process‑level priorities on each node.
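
The article does not say how a Topology priority is translated into a resource amount. One plausible reading, sketched below purely as an assumption, is that priority scales a relative CPU weight such as the cgroup `cpu.shares` value, so that under contention each group receives CPU time in proportion to its weight. The mapping `1024 * priority`, the function names, and the topology names are all hypothetical.

```python
def shares_for_priority(priority: int, base: int = 1024) -> int:
    """Hypothetical mapping from a Topology priority to a relative CPU weight."""
    if priority < 1:
        raise ValueError("priority must be a positive integer")
    return base * priority


def cpu_fractions(priorities: dict) -> dict:
    """Fraction of CPU each topology would receive when all are competing."""
    shares = {name: shares_for_priority(p) for name, p in priorities.items()}
    total = sum(shares.values())
    return {name: s / total for name, s in shares.items()}


# Two hypothetical topologies on one node with priorities 3 and 1.
fractions = cpu_fractions({"recommendation": 3, "monitoring": 1})
```

Relative weights only matter when the CPU is contended; an otherwise idle node lets a low-priority group use spare capacity.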

Automation Reduces Manual Management Cost

Topology priority information is stored in a ZooKeeper cluster.

The resource‑management framework adapts to dynamic addition of heterogeneous machines.

Cluster configuration (logical CPU count, memory, swap size) is automatically stored in ZooKeeper, simplifying overall cluster management.
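
A minimal sketch of what such a per-node record might look like follows. It parses text in the Linux `/proc/meminfo` format for total memory and swap, combines it with a logical CPU count, and serializes the result as JSON for storage under a ZooKeeper path. The znode layout `/ycg/nodes/<host>`, the field names, the host name, and the sample values are assumptions made for illustration; the source does not describe the actual schema, and the ZooKeeper write itself is omitted.

```python
import json

# Sample text in /proc/meminfo format; the values are made up for this sketch.
SAMPLE_MEMINFO = """MemTotal:       16384000 kB
MemFree:         1234567 kB
SwapTotal:       2097148 kB
"""


def parse_meminfo(text):
    """Return a dict of field name to integer value from /proc/meminfo-style text."""
    values = {}
    for line in text.splitlines():
        name, _, rest = line.partition(":")
        fields = rest.split()
        if fields:
            values[name.strip()] = int(fields[0])
    return values


def node_config(hostname, logical_cpus, meminfo_text):
    """Build the configuration record for one node."""
    info = parse_meminfo(meminfo_text)
    return {
        "host": hostname,
        "logical_cpus": logical_cpus,
        "memory_kb": info["MemTotal"],
        "swap_kb": info["SwapTotal"],
    }


def znode_for(hostname):
    """Hypothetical ZooKeeper path under which a node's record would be stored."""
    return f"/ycg/nodes/{hostname}"


config = node_config("worker-01", 8, SAMPLE_MEMINFO)
payload = json.dumps(config, sort_keys=True).encode("utf-8")
```

On a live node the inputs could come from `os.cpu_count()` and the contents of `/proc/meminfo`, which is one way heterogeneous machines could register their own capacity when they join the cluster.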

Worker/Executor‑Level Resource Limiting

The framework can limit resources at the Worker and Executor level, a capability that is not available in the standard Storm‑on‑YARN implementation.

Modular Design for Easy Deployment and Rollback

The framework is independent of the underlying compute engine, allowing straightforward installation, deployment, and rollback, and can be adapted to other platforms with minor modifications.

To address shortcomings of the original Storm UI (command‑line‑only submission, lack of operation logs, missing permission control), Yihaodian re‑implemented the UI in a non‑Clojure language and added user management, permission settings, and operation logging. Finer‑grained real‑time monitoring is planned.

Since early 2014, Yihaodian’s Storm platform has processed over 200 million log entries per day, supporting real‑time traffic monitoring, personalized recommendation, BI, security‑log analysis, fraud detection, and order monitoring with sub‑second response times.

Conclusion

As a rapidly growing mid‑size e‑commerce company, Yihaodian recognizes increasing pressure on its real‑time computing platform. Ongoing efforts focus on performance and reliability improvements, continuous feedback collection, and further research and development to meet future peak‑traffic challenges.
