Big Data 14 min read

Understanding Storm: A Distributed Real-Time Computation System

The article explains the need for low‑latency, high‑performance, distributed real‑time processing, outlines the challenges such systems must address, and introduces Storm as a Hadoop‑like framework for stream processing, detailing its architecture, fault‑tolerance mechanisms, transactional topology, and large‑scale deployment at Taobao.

Art of Distributed System Architecture Design
Art of Distributed System Architecture Design
Art of Distributed System Architecture Design
Understanding Storm: A Distributed Real-Time Computation System

With the rapid growth of information, users now expect instant access to up‑to‑date data; batch‑oriented systems like Hadoop cannot meet this demand, leading to the need for a real‑time computation platform.

The goal is to build a real‑time computing system that satisfies five key requirements: low latency, high performance, distributed execution, scalability, and fault tolerance.

A simple approach is to combine a message queue with worker processes distributed across machines, while also ensuring easy application development, no message loss, and strict message ordering.

Storm is introduced as a solution: a distributed real‑time computation system analogous to Hadoop for batch processing, providing simple primitives for stream processing.

Typical Storm use cases include continuous stream data processing and distributed RPC services, where low latency is critical.

Basic concepts

Hadoop

Storm

System role

JobTracker

Nimbus

TaskTracker

Supervisor

Child

Worker

Application name

Job

Topology

Component interface

Mapper/Reducer

Spout/Bolt

Key Storm components:

Nimbus : responsible for resource allocation and task scheduling.

Supervisor : launches and stops Worker processes assigned by Nimbus.

Worker : runs the actual processing logic of Spouts and Bolts.

Task : a thread executing a Spout or Bolt; after Storm 0.8, multiple tasks may share a physical thread called an executor.

Topology : a real‑time application composed of interconnected Spouts and Bolts.

Spout : the source of data streams, actively emitting tuples via the nextTuple() method.

Bolt : processes incoming tuples, performing filtering, aggregation, database writes, etc., via the execute(Tuple input) method.

Tuple : the basic unit of message passing, essentially a list of values.

Stream : a continuous flow of tuples.

Storm supports various stream groupings (shuffle, fields, hash, all, global, none, direct, localOrShuffle) to control how tuples are partitioned among bolts.

Record‑level fault tolerance

Storm assigns a 64‑bit message ID (root ID) to each source tuple and tracks all downstream tuple IDs using an acker component. By XOR‑ing the IDs, the acker can determine when a message has been fully processed; if the XOR result returns to zero, the message is considered complete.

Fault tolerance diagram
Fault tolerance diagram

Although a rare collision of tuple IDs could cause a false‑positive completion, the probability is negligible in practice.

Transactional topology (Trident)

Storm 0.7 introduced transactional topologies to guarantee exactly‑once processing for strict use cases; this feature has been refined in Storm 0.8 as Trident, offering a more convenient API.

Storm at Taobao

Taobao uses Storm extensively for real‑time log processing, risk control, and recommendation, handling millions to billions of messages daily (TB‑scale data). The typical pipeline reads logs from Kafka‑like MetaQ or HBase‑based timetunnel, processes them with Storm, and writes results to distributed storage for downstream services.

Future outlook

Storm continues to evolve with features such as stateful processing, Trident, custom schedulers, and integration with projects like Mesos, Kryo, Disruptor, and Curator, aiming toward a robust 1.0 release.

Original Source

Signed-in readers can open the original source through BestHub's protected redirect.

Sign in to view source
Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactadmin@besthub.devand we will review it promptly.

distributed systemsBig Datareal-time processingstream processingfault toleranceStorm
Art of Distributed System Architecture Design
Written by

Art of Distributed System Architecture Design

Introductions to large-scale distributed system architectures; insights and knowledge sharing on large-scale internet system architecture; front-end web architecture overviews; practical tips and experiences with PHP, JavaScript, Erlang, C/C++ and other languages in large-scale internet system development.

0 followers
Reader feedback

How this landed with the community

Sign in to like

Rate this article

Was this worth your time?

Sign in to rate
Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.