Mastering Storm Topology: Architecture, Concurrency, and Scaling Strategies

This guide explains Storm's architecture, the roles of spouts and bolts, the hierarchy of nodes, workers, executors, and tasks, and shows how to configure concurrency by adding workers, adjusting executor and task counts, and handling special cases for optimal stream processing performance.

Java Backend Technology

Sep 5, 2017

1. Storm Architecture Overview

In the previous article we built a Storm cluster and demonstrated Java code. In Storm, you first design a real-time computation graph called a topology, which is then submitted to the cluster. The cluster's master node (Nimbus) distributes the code and assigns tasks to worker nodes for execution.

A topology consists of two component types: spouts and bolts. A spout emits data streams as tuples, while a bolt processes, filters, or transforms those tuples and can emit new tuples to other bolts. Each tuple is an immutable, ordered list of values whose field names are declared by the component that emits it.

In Storm, a task is an instance of a spout or bolt running on a cluster node. The topology hierarchy from low to high is: task (spout/bolt instance), executor (thread), worker (JVM), node (server).

The components are defined as follows:

Nodes (servers): Physical or virtual machines that host parts of the topology. A cluster may have one or many nodes.

Workers (JVMs): Independent JVM processes running on a node. Each node can host multiple workers, and a topology can be distributed across several workers.

Executors (threads): Java threads inside a worker JVM. Multiple tasks can share an executor, but by default Storm assigns one task per executor.

Tasks (spout/bolt instances): The actual spout or bolt objects whose nextTuple() or execute() methods are invoked by executors.
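The relationship between tasks and executors can be made concrete with a small model. The sketch below uses only the JDK; the class TaskLayout and its round-robin split are illustrative assumptions and not Storm's scheduler, whose actual assignment of task ids to executors may differ.

```java
import java.util.ArrayList;
import java.util.List;

final class TaskLayout {

    /** Splits taskCount task ids across executorCount executor threads as evenly as possible. */
    static List<List<Integer>> tasksPerExecutor(int taskCount, int executorCount) {
        if (executorCount < 1 || taskCount < executorCount) {
            throw new IllegalArgumentException("need 1 <= executors <= tasks");
        }
        List<List<Integer>> executors = new ArrayList<>();
        for (int e = 0; e < executorCount; e++) {
            executors.add(new ArrayList<>());
        }
        // Hand out task ids round-robin so no executor gets more than one extra task.
        for (int task = 0; task < taskCount; task++) {
            executors.get(task % executorCount).add(task);
        }
        return executors;
    }

    public static void main(String[] args) {
        // Default case: one task per executor.
        System.out.println(tasksPerExecutor(2, 2)); // [[0], [1]]
        // Four tasks sharing two executor threads.
        System.out.println(tasksPerExecutor(4, 2)); // [[0, 2], [1, 3]]
    }
}
```

In the second call each thread serves two spout or bolt instances in turn, which is the "multiple tasks share an executor" case described above.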

2. Default Concurrency Mechanism

The example topology (Code A) contains a RandomNameSpout and two bolts: UpperBolt and AppendBolt. By default, Storm assigns a parallelism of 1 to each component (one executor running one task) unless configured otherwise.

Assuming a single node with one worker and one executor per task, the execution flow is shown in Figure A, where concurrency exists only at the thread level: each task runs in a separate thread within the same JVM.
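A minimal sketch of what a topology like Code A could look like with default parallelism. It assumes Storm 1.x or later (the org.apache.storm packages; older releases used backtype.storm) with storm-core on the classpath. Only the three class names come from the article; the component ids, field names, sample names, and the "_suffix" value are illustrative assumptions.

```java
import java.util.Map;
import java.util.Random;

import org.apache.storm.Config;
import org.apache.storm.LocalCluster;
import org.apache.storm.spout.SpoutOutputCollector;
import org.apache.storm.task.TopologyContext;
import org.apache.storm.topology.BasicOutputCollector;
import org.apache.storm.topology.OutputFieldsDeclarer;
import org.apache.storm.topology.TopologyBuilder;
import org.apache.storm.topology.base.BaseBasicBolt;
import org.apache.storm.topology.base.BaseRichSpout;
import org.apache.storm.tuple.Fields;
import org.apache.storm.tuple.Tuple;
import org.apache.storm.tuple.Values;
import org.apache.storm.utils.Utils;

public class NameTopology {

    /** Spout: emits one random name roughly every 100 ms (sample data is made up). */
    public static class RandomNameSpout extends BaseRichSpout {
        private static final String[] NAMES = {"alice", "bob", "carol"};
        private SpoutOutputCollector collector;
        private Random random;

        @Override
        @SuppressWarnings("rawtypes")
        public void open(Map conf, TopologyContext context, SpoutOutputCollector collector) {
            this.collector = collector;
            this.random = new Random();
        }

        @Override
        public void nextTuple() {
            Utils.sleep(100);
            collector.emit(new Values(NAMES[random.nextInt(NAMES.length)]));
        }

        @Override
        public void declareOutputFields(OutputFieldsDeclarer declarer) {
            declarer.declare(new Fields("name"));
        }
    }

    /** Bolt: upper-cases each incoming name and emits it downstream. */
    public static class UpperBolt extends BaseBasicBolt {
        @Override
        public void execute(Tuple input, BasicOutputCollector collector) {
            collector.emit(new Values(input.getStringByField("name").toUpperCase()));
        }

        @Override
        public void declareOutputFields(OutputFieldsDeclarer declarer) {
            declarer.declare(new Fields("upperName"));
        }
    }

    /** Terminal bolt: appends a suffix and prints the result; emits nothing. */
    public static class AppendBolt extends BaseBasicBolt {
        @Override
        public void execute(Tuple input, BasicOutputCollector collector) {
            System.out.println(input.getStringByField("upperName") + "_suffix");
        }

        @Override
        public void declareOutputFields(OutputFieldsDeclarer declarer) {
            // No output stream is declared because nothing is emitted.
        }
    }

    public static void main(String[] args) throws Exception {
        TopologyBuilder builder = new TopologyBuilder();
        // No parallelism hints: each component gets one executor and one task.
        builder.setSpout("name-spout", new RandomNameSpout());
        builder.setBolt("upper-bolt", new UpperBolt()).shuffleGrouping("name-spout");
        builder.setBolt("append-bolt", new AppendBolt()).shuffleGrouping("upper-bolt");

        LocalCluster cluster = new LocalCluster();
        cluster.submitTopology("name-topology", new Config(), builder.createTopology());
        Utils.sleep(10_000); // let the topology run for ten seconds
        cluster.killTopology("name-topology");
        cluster.shutdown();
    }
}
```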

3. Adding Workers to Increase Parallelism

Increasing the number of workers is the simplest way to boost a topology's processing capacity. Setting the worker count in the topology configuration (Config.setNumWorkers()) allocates two workers to the topology instead of the default one.
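As a sketch of that configuration change, assuming the org.apache.storm package layout of Storm 1.x or later (the wrapper class TwoWorkerConfig is a hypothetical name used only for illustration):

```java
import org.apache.storm.Config;

public class TwoWorkerConfig {

    /** Builds a topology configuration that requests two worker JVMs. */
    public static Config twoWorkers() {
        Config conf = new Config();
        conf.setNumWorkers(2); // without this call the topology runs in a single worker
        return conf;
    }
}
```

The returned Config is what you would pass as the configuration argument when submitting the topology, for example to StormSubmitter.submitTopology().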

Figure B illustrates the new layout with two workers.

4. Configuring Executors and Tasks

Storm's concurrency API lets you set the number of executors per component (the parallelism hint) and the total number of tasks per component (setNumTasks()). For example, to run RandomNameSpout with two tasks, each assigned its own executor thread, pass a parallelism hint of 2 when registering the spout.
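A minimal sketch of that change, assuming Storm 1.x or later. The class name, the component ids, and the idea of passing the components in as parameters are illustrative choices so that the fragment compiles on its own.

```java
import org.apache.storm.generated.StormTopology;
import org.apache.storm.topology.IBasicBolt;
import org.apache.storm.topology.IRichSpout;
import org.apache.storm.topology.TopologyBuilder;

public class TwoSpoutExecutors {

    /** Wires the three components, giving only the spout a parallelism hint of 2. */
    public static StormTopology build(IRichSpout nameSpout, IBasicBolt upperBolt, IBasicBolt appendBolt) {
        TopologyBuilder builder = new TopologyBuilder();
        // Third argument = parallelism hint: two executors, and by default two tasks.
        builder.setSpout("name-spout", nameSpout, 2);
        // The bolts keep the default of one executor and one task each.
        builder.setBolt("upper-bolt", upperBolt).shuffleGrouping("name-spout");
        builder.setBolt("append-bolt", appendBolt).shuffleGrouping("upper-bolt");
        return builder.createTopology();
    }
}
```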

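The API comment for setSpout() describes the third parameter as the spout's parallelism: here two tasks, each running in its own executor thread. Strictly speaking, the value is a parallelism hint that sets the number of executors; the task count defaults to the same number unless setNumTasks() overrides it.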

Figure C shows the resulting topology with two executors handling the two tasks.

5. Further Scaling: Multiple Executors and Tasks

By raising the worker count together with the executor and task counts of individual components, you can achieve a richer parallelism configuration.
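One way such a combined setup could look, as a sketch assuming Storm 1.x or later. The specific numbers (two workers, two executors per component, four tasks for the first bolt) are hypothetical and chosen only to show the API calls; they are not taken from the original listing.

```java
import org.apache.storm.Config;
import org.apache.storm.generated.StormTopology;
import org.apache.storm.topology.IBasicBolt;
import org.apache.storm.topology.IRichSpout;
import org.apache.storm.topology.TopologyBuilder;

public class ScaledTopology {

    /** Requests two worker JVMs for the topology. */
    public static Config config() {
        Config conf = new Config();
        conf.setNumWorkers(2);
        return conf;
    }

    /** Sets executor and task counts per component. */
    public static StormTopology build(IRichSpout nameSpout, IBasicBolt upperBolt, IBasicBolt appendBolt) {
        TopologyBuilder builder = new TopologyBuilder();
        // Spout: 2 executors, 2 tasks (task count defaults to the hint).
        builder.setSpout("name-spout", nameSpout, 2);
        // First bolt: 2 executors sharing 4 tasks, so each thread serves 2 instances.
        builder.setBolt("upper-bolt", upperBolt, 2)
               .setNumTasks(4)
               .shuffleGrouping("name-spout");
        // Second bolt: 2 executors, 2 tasks.
        builder.setBolt("append-bolt", appendBolt, 2)
               .shuffleGrouping("upper-bolt");
        return builder.createTopology();
    }
}
```

With these numbers the topology has 6 executors and 8 tasks in total; if they are spread evenly, each of the two workers runs 3 executors and 4 tasks.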

Figure D visualizes the resulting topology running on two workers, with the tasks shared evenly between them.

Note that simply adding workers in local mode does not significantly improve performance: local mode simulates the whole cluster inside a single JVM, so all workers contend for the same resources. Effective scaling therefore also requires adjusting the executor and task counts.

6. Special Cases

In some scenarios you may need a bolt that aggregates results in a single‑threaded manner. Such a bolt must be configured with one task and one executor; parallelizing it would produce incorrect results. The following diagram (Figure E) illustrates this limitation.

Code example for the single‑threaded bolt:
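A minimal sketch of such a wiring, assuming Storm 1.x or later. The class name, the component ids, and the aggregateBolt parameter are hypothetical; the upstream parallelism of 2 is likewise only an example.

```java
import org.apache.storm.generated.StormTopology;
import org.apache.storm.topology.IBasicBolt;
import org.apache.storm.topology.IRichSpout;
import org.apache.storm.topology.TopologyBuilder;

public class SingleThreadedAggregation {

    /** Parallel upstream components feeding one single-threaded aggregating bolt. */
    public static StormTopology build(IRichSpout nameSpout, IBasicBolt upperBolt, IBasicBolt aggregateBolt) {
        TopologyBuilder builder = new TopologyBuilder();
        builder.setSpout("name-spout", nameSpout, 2);
        builder.setBolt("upper-bolt", upperBolt, 2).shuffleGrouping("name-spout");
        // Exactly one executor and one task: every tuple reaches the same bolt instance.
        builder.setBolt("aggregate-bolt", aggregateBolt, 1)
               .setNumTasks(1)
               .shuffleGrouping("upper-bolt");
        return builder.createTopology();
    }
}
```

Because the aggregating bolt has a single task, the choice of stream grouping does not affect where tuples land; all of them are delivered to that one instance.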

Storm provides straightforward APIs for concurrency control, but you must choose executor and task numbers that match your business logic to avoid erroneous outcomes.
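To see why a parallel aggregator can give wrong answers, consider the following JDK-only simulation. It is a simplified model rather than Storm code: CountingTask stands in for one task of a hypothetical counting bolt, and round-robin routing stands in for a shuffle grouping.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class AggregationDemo {

    /** Stand-in for one task of a counting bolt: it only sees the tuples routed to it. */
    static final class CountingTask {
        final Map<String, Integer> counts = new HashMap<>();

        void execute(String name) {
            counts.merge(name, 1, Integer::sum);
        }
    }

    /** Routes the names round-robin to taskCount independent tasks and returns them. */
    static CountingTask[] run(List<String> names, int taskCount) {
        CountingTask[] tasks = new CountingTask[taskCount];
        for (int i = 0; i < taskCount; i++) {
            tasks[i] = new CountingTask();
        }
        for (int i = 0; i < names.size(); i++) {
            tasks[i % taskCount].execute(names.get(i));
        }
        return tasks;
    }

    public static void main(String[] args) {
        List<String> names = Arrays.asList("ALICE", "BOB", "ALICE", "ALICE");

        // One task sees every tuple, so its count is the true total.
        System.out.println(run(names, 1)[0].counts.get("ALICE")); // 3

        // Two tasks each hold only a partial count; neither reports the true total.
        CountingTask[] two = run(names, 2);
        System.out.println(two[0].counts.get("ALICE")); // 2
        System.out.println(two[1].counts.get("ALICE")); // 1
    }
}
```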
