
Real-Time Big Data Processing with Storm and Kafka on Alibaba Cloud

This article explains how to build a large‑scale, real‑time vehicle monitoring system using Apache Storm and Kafka on Alibaba Cloud, covering the challenges of big‑data ingestion, system architecture, deployment steps, performance testing, and practical lessons learned.

Architect

Dec 30, 2015


In the era of big data, the rapid growth and unstructured nature of data make traditional tools insufficient for timely collection, management, and processing. The article introduces a solution that leverages Apache Storm and Apache Kafka to construct a real‑time message distribution and stream processing system, demonstrated through a vehicle status monitoring use case deployed on Alibaba Cloud.

Storm is described as an open‑source distributed real‑time computation system that can process millions of tuples per second, scales horizontally, tolerates faults, and guarantees message processing. Kafka is presented as a high‑throughput, low‑latency distributed messaging platform that supports publish/subscribe, persistent message storage, and load balancing.
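To make the publish/subscribe and persistence points concrete, here is a toy, in-memory sketch of a Kafka-like partitioned log. It is not the Kafka client API: the class names `PartitionedLog` and `Subscriber` are hypothetical, and `zlib.crc32` is only a stand-in for the hash that Kafka's own partitioner uses. It shows two properties the paragraph relies on: messages with the same key land in the same partition, and because messages are retained rather than deleted on read, independent subscribers can each consume the whole stream by tracking their own offsets.

```python
import zlib
from collections import defaultdict


class PartitionedLog:
    """Toy, in-memory stand-in for a Kafka topic (hypothetical, not the Kafka API).

    Messages are appended to one of several partitions and kept after being
    read, so every subscriber can consume the full stream at its own pace.
    """

    def __init__(self, num_partitions):
        self.partitions = [[] for _ in range(num_partitions)]

    def publish(self, key, value):
        # Same key -> same partition, which preserves per-key ordering.
        # crc32 is used because Python's built-in hash() of str is salted per process.
        p = zlib.crc32(key.encode("utf-8")) % len(self.partitions)
        self.partitions[p].append((key, value))
        return p, len(self.partitions[p]) - 1  # (partition, offset)

    def read(self, partition, offset, max_messages=100):
        # Reading does not remove anything; the caller decides where to resume.
        return self.partitions[partition][offset:offset + max_messages]


class Subscriber:
    """Tracks its own per-partition offsets, loosely like a Kafka consumer group."""

    def __init__(self, log):
        self.log = log
        self.offsets = defaultdict(int)

    def poll(self):
        batch = []
        for p in range(len(self.log.partitions)):
            msgs = self.log.read(p, self.offsets[p])
            self.offsets[p] += len(msgs)
            batch.extend(msgs)
        return batch


if __name__ == "__main__":
    log = PartitionedLog(num_partitions=5)  # five partitions, as in the article's test
    for i in range(10):
        log.publish(f"car-{i % 3}", f"status {i}")
    dashboard, archiver = Subscriber(log), Subscriber(log)
    print(len(dashboard.poll()), len(archiver.poll()))  # both see all 10 messages
```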

The proposed architecture places Kafka brokers and Storm workers on separate virtual machines so that each tier can scale horizontally and remain available. Ten servers are allocated in all: two Kafka broker servers, two Storm spout servers, two Storm bolt servers, two Redis cache servers, and two web servers, with each role duplicated for failover.
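A minimal sketch of this role layout, assuming a simple active/standby reading of "duplicated for failover". The host names and the `pick_active` and `total_hosts` helpers are hypothetical; in practice Kafka and Storm rely on their own replication and rescheduling mechanisms rather than a lookup table like this. The point it illustrates is that with two hosts per role, any single host can fail without taking a role offline.

```python
# Hypothetical host names; the article only specifies two servers per role.
CLUSTER = {
    "kafka-broker": ["kafka-1", "kafka-2"],
    "storm-spout": ["spout-1", "spout-2"],
    "storm-bolt": ["bolt-1", "bolt-2"],
    "redis-cache": ["redis-1", "redis-2"],
    "web": ["web-1", "web-2"],
}


def total_hosts(cluster):
    """Count every server in the layout."""
    return sum(len(hosts) for hosts in cluster.values())


def pick_active(cluster, healthy):
    """Return the first healthy host for each role; raise if a role has none left."""
    active = {}
    for role, hosts in cluster.items():
        candidates = [h for h in hosts if h in healthy]
        if not candidates:
            raise RuntimeError(f"no healthy host left for role {role!r}")
        active[role] = candidates[0]
    return active


if __name__ == "__main__":
    everyone = {h for hosts in CLUSTER.values() for h in hosts}
    # Simulate losing one Kafka broker: the standby takes over that role.
    print(pick_active(CLUSTER, everyone - {"kafka-1"})["kafka-broker"])
```

Losing both hosts of one role (for example both Redis servers) makes `pick_active` raise, which is the failure mode the duplication is meant to make unlikely rather than impossible.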

Deployment on Alibaba Cloud is streamlined with cloud images and snapshots. After the required software (Git, libzmq, Java, G++, Maven, Leiningen, etc.) is installed on each server, custom images are created from those servers, so additional instances can be launched from an image without repeating the installation, which enables elastic scaling.

Implementation details include a Kafka producer that simulates vehicle telemetry; a Storm topology (KafkaCarTopology) made up of a KafkaSpout, a ParserCarDataBolt, and a RedisCarBolt; and a Node.js front end that reads the processed data from Redis and pushes it to the browser over socket.io, where it is plotted on a Bing map. The original article includes code for the producer, the topology definition, and the deployment scripts.
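The data path described above can be sketched in a single Python process. This is not Storm code: real spouts and bolts are Java classes run by Storm, while here a plain loop plays the spout and a dict stands in for Redis. The CSV message format (`car_id,timestamp,lat,lon,speed`), the `CarStatus` fields, and the coordinate ranges are assumptions for illustration; only the bolt names come from the article.

```python
import random
import time
from dataclasses import dataclass


@dataclass
class CarStatus:
    car_id: str
    timestamp: float
    lat: float
    lon: float
    speed_kmh: float


def produce_telemetry(num_cars, num_messages, seed=0):
    """Hypothetical producer: yields CSV lines 'car_id,ts,lat,lon,speed'."""
    rng = random.Random(seed)
    for i in range(num_messages):
        car = f"car-{i % num_cars}"
        lat = 30.0 + rng.uniform(-0.5, 0.5)   # made-up coordinate range
        lon = 120.0 + rng.uniform(-0.5, 0.5)
        speed = rng.uniform(0, 120)
        yield f"{car},{time.time():.3f},{lat:.6f},{lon:.6f},{speed:.1f}"


class ParserCarDataBolt:
    """Turns a raw message into a CarStatus; returns None for malformed input."""

    def process(self, raw):
        parts = raw.split(",")
        if len(parts) != 5:
            return None
        try:
            return CarStatus(parts[0], float(parts[1]), float(parts[2]),
                             float(parts[3]), float(parts[4]))
        except ValueError:
            return None


class RedisCarBolt:
    """Keeps the most recently processed status per car (a dict stands in for Redis)."""

    def __init__(self):
        self.store = {}

    def process(self, status):
        self.store[status.car_id] = status


def run_topology(messages):
    """Spout -> parser bolt -> storage bolt, all in one process."""
    parser, sink = ParserCarDataBolt(), RedisCarBolt()
    for raw in messages:  # the "spout": emits one raw message at a time
        status = parser.process(raw)
        if status is not None:
            sink.process(status)
    return sink.store


if __name__ == "__main__":
    for car_id, status in sorted(run_topology(produce_telemetry(3, 30)).items()):
        print(car_id, f"{status.speed_kmh:.1f} km/h")
```

Keeping only the latest status per car matches what a map view needs (one marker per vehicle). A deployment along the article's lines would write to Redis instead (for example, one hash per car) and let the Node.js server push updates to browsers.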

Performance testing with one topic, five partitions, three workers, and multiple client threads showed an average throughput of about 160,000 messages per second at roughly 30% CPU usage on a 2‑core, 2 GB ECS instance. The analysis identifies disk I/O on the virtual machines as the bottleneck: with the CPU far from saturated, the limit lies in I/O rather than compute, which is why the article recommends scaling out with more small instances (spreading disk I/O across machines) rather than scaling up.
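The reported figures can be broken down with simple arithmetic. The sketch below assumes load is spread perfectly evenly across partitions and workers, which real Kafka and Storm deployments only approximate, so the results are rough averages rather than measurements: about 32,000 messages per second per partition and about 53,000 per worker. The `instances_for` helper is hypothetical and assumes throughput grows linearly as instances are added, an idealization.

```python
import math

MEASURED_RATE = 160_000  # messages/s, average reported in the article
PARTITIONS = 5
WORKERS = 3


def even_share(total_rate, units):
    """Rate each unit handles if load is spread perfectly evenly (an idealization)."""
    return total_rate / units


def instances_for(target_rate, rate_per_instance):
    """Instances needed if throughput scales linearly when scaling out (assumption)."""
    return math.ceil(target_rate / rate_per_instance)


if __name__ == "__main__":
    print(f"per partition: {even_share(MEASURED_RATE, PARTITIONS):,.0f} msg/s")
    print(f"per worker:    {even_share(MEASURED_RATE, WORKERS):,.0f} msg/s")
```

For example, if one small instance were measured at 40,000 messages per second (a made-up figure), reaching 160,000 would take `instances_for(160_000, 40_000)`, i.e. 4 instances.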

The conclusion highlights that Storm and Kafka together enable real‑time big‑data processing, and that cloud‑based image deployment dramatically reduces setup time, allowing rapid development cycles for high‑performance, high‑throughput streaming applications.


Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact us and we will review it promptly.


Written by

Architect

Professional architect sharing high‑quality architecture insights. Topics include high‑availability, high‑performance, and high‑stability architectures, big data, machine learning, Java, system and distributed architecture, AI, and practical large‑scale architecture case studies. Aimed at architects who enjoy sharing ideas and learning.
