
Heron vs. Storm: Architecture, Performance Evaluation, and Design Lessons

The article provides a comprehensive overview of Twitter's Heron stream processing system, comparing its architecture, design principles, back‑pressure mechanisms, resource utilization, and performance test results with Storm/JStorm, and concludes with practical insights for large‑scale deployments.

Architecture Digest

On June 1, 2015 Twitter announced its Heron system, claiming it surpasses Storm in many aspects. The author read the paper, took personal notes, and shares a summary of Heron's design, performance, and suitability for large clusters.

Final Summary: Heron is best suited for clusters with more than 1,000 machines, offering superior stability but only comparable or slightly weaker performance; it can share resources with other frameworks but may waste resources at the topology level.

Current Situation: Existing production topologies run on Heron, processing tens of terabytes and billions of messages daily.

Why Redesign Heron:

Debug‑ability is poor because multiple tasks share a single process.

A more advanced resource pool management system is needed, similar to YARN/Mesos or Twitter's Aurora.

Storm's design has several shortcomings: a multi‑level executor threading model, weak task isolation, interleaved logs from tasks sharing a process, and no back‑pressure mechanism.

Heron Design Principles:

Compatibility with the old Storm API.

Support for at‑most‑once and at‑least‑once delivery guarantees.

Architecture Overview:

Heron uses Aurora as a generic service scheduler to implement its Topology Scheduler, which can be ported to YARN, Mesos, or EC2. The first container runs the Topology Manager (TM); every other container hosts a Stream Manager (SM), a Metrics Manager (MM), and multiple Heron Instances (HIs). A container here is analogous to a Docker container: a single machine can host several of them, isolated from one another via cgroups.

Topology Manager: Acts as the unique contact for the topology lifecycle, similar to YARN's Application Master, and can operate in active‑standby mode using ZooKeeper.

Stream Manager (SM): Serves as a hub for tuple traffic; all Heron Instances connect to the SM for send/receive. Local fast channels are used when HI communication stays within the same container.
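The hub topology is what keeps connection counts manageable: if every instance connected directly to every other, the link count would grow quadratically in the number of instances, whereas with one SM per container only the SMs form a full mesh. A back-of-the-envelope sketch (the instance and container counts are illustrative, not from the article):

```python
def direct_connections(n_instances: int) -> int:
    # Full mesh between instances: n * (n - 1) / 2 links.
    return n_instances * (n_instances - 1) // 2

def sm_hub_connections(n_instances: int, n_containers: int) -> int:
    # Each instance connects only to its local Stream Manager,
    # and the Stream Managers form a full mesh among themselves.
    return n_instances + n_containers * (n_containers - 1) // 2

# Example: 1,000 instances spread over 50 containers.
print(direct_connections(1000))      # 499500 links without SMs
print(sm_hub_connections(1000, 50))  # 2225 links with SM hubs
```

The two orders of magnitude saved here is one reason the SM can double as the enforcement point for back-pressure.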

Back‑pressure Mechanism: When downstream processing slows, back‑pressure signals upstream components to reduce speed, preventing buffer overflow and data loss. Heron implements spout‑level back‑pressure for quick response and precise identification of slow HIs.
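Spout-level back-pressure can be sketched as a high/low watermark on the SM's buffer: when the buffer crosses the high watermark the spouts are clamped, and once it drains below the low watermark they resume. The watermark values and tick model below are illustrative, not Heron's actual configuration:

```python
from collections import deque

HIGH_WATERMARK = 8   # clamp spouts when the buffer exceeds this
LOW_WATERMARK = 2    # resume spouts once the buffer drains below this

def run(ticks: int, emit_per_tick: int, drain_per_tick: int) -> int:
    buffer = deque()
    spout_active = True
    for _ in range(ticks):
        if spout_active:
            buffer.extend(range(emit_per_tick))          # spout emits
        for _ in range(min(drain_per_tick, len(buffer))):
            buffer.popleft()                             # downstream drains
        if spout_active and len(buffer) > HIGH_WATERMARK:
            spout_active = False   # back-pressure: stop the spouts
        elif not spout_active and len(buffer) < LOW_WATERMARK:
            spout_active = True    # pressure relieved: resume emission
    return len(buffer)

# Downstream drains slower than the spout emits, yet the buffer
# never grows past the high watermark plus one emission burst.
assert run(1000, emit_per_tick=3, drain_per_tick=2) <= HIGH_WATERMARK + 3
```

Because the clamp happens at the spouts rather than hop by hop, the slow instance is identified directly instead of pressure propagating stage by stage through the topology.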

Heron Instance (HI):

One task per process.

Inter‑process communication uses Protocol Buffers.

Each HI has a gateway thread for external communication and an execution thread for processing.

Data‑in/data‑out queues have dynamic sizing to avoid GC spikes.
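The dynamic queue sizing above can be sketched as a bounded queue that halves its capacity while the consumer lags and grows it back once it catches up, so less in-flight data sits on the heap. The thresholds and scaling factors here are illustrative assumptions, not Heron's actual values:

```python
class AdaptiveQueue:
    """Bounded queue whose capacity shrinks while the consumer lags,
    keeping object lifetimes short to avoid GC spikes (a sketch)."""

    def __init__(self, capacity: int, floor: int, ceiling: int):
        self.capacity, self.floor, self.ceiling = capacity, floor, ceiling
        self.items = []

    def offer(self, item) -> bool:
        if len(self.items) >= self.capacity:
            return False          # caller retries / back-pressure kicks in
        self.items.append(item)
        return True

    def adjust(self) -> None:
        if len(self.items) > self.capacity // 2:
            # Consumer lagging: halve capacity so less data accumulates.
            self.capacity = max(self.floor, self.capacity // 2)
        else:
            # Consumer keeping up: grow capacity back toward the ceiling.
            self.capacity = min(self.ceiling, self.capacity * 2)

q = AdaptiveQueue(capacity=64, floor=8, ceiling=64)
for i in range(40):
    q.offer(i)
q.adjust()
print(q.capacity)  # 32: backlog above half capacity, so it halves
```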

Metrics Manager (MM): Collects system and user metrics from SMs and forwards them to monitoring systems and the TM.

Workflow:

Job submission – Aurora allocates resources and schedules containers.

TM starts in a container and registers with ZooKeeper.

SMs discover TM via ZooKeeper and send heartbeats.

TM creates a physical execution plan and stores it in ZooKeeper.

SMs download the plan, start HIs, and the topology begins data flow.
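The ordering of the steps above matters: SMs cannot discover the TM before it registers, and the physical plan cannot be built before the SMs have checked in. A toy walkthrough using a dict as a stand-in for ZooKeeper (all names are illustrative):

```python
zk = {}  # in-memory stand-in for ZooKeeper

def tm_start():
    zk["tm_address"] = "container-0:9000"          # step 2: TM registers

def sm_start(name: str):
    assert "tm_address" in zk                      # step 3: SM discovers TM
    zk.setdefault("sms", []).append(name)          # ...and heartbeats to it

def tm_make_plan():
    assert zk.get("sms")                           # all SMs have checked in
    zk["physical_plan"] = {sm: ["HI-a", "HI-b"]    # step 4: plan stored
                           for sm in zk["sms"]}

def sm_launch_instances(name: str):
    return zk["physical_plan"][name]               # step 5: download, start HIs

tm_start()
sm_start("sm-1"); sm_start("sm-2")
tm_make_plan()
print(sm_launch_instances("sm-1"))  # ['HI-a', 'HI-b']
```

Keeping the plan in ZooKeeper rather than in the TM's memory is also what makes the failure cases below cheap to handle: a restarted TM or SM simply re-reads it.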

Failure Cases:

If TM fails, it is restarted and re‑downloads the plan; standby TM can be promoted.

If an SM fails, a new SM is started and re‑joins the topology.

If an HI fails, it is restarted and resumes execution.

External Systems: Heron Tracker watches ZooKeeper for new TMs, gathers topology metadata, provides a REST API, and balances load across instances.

Performance Testing: The author conducted three rounds of tests on a 10‑node cluster (8 cores/128 GB each) using FastWordCount and WordCount topologies. Initial runs showed severe back‑pressure; after patches and tuning container task counts, throughput reached ~200 k QPS, still below expectations.

Analysis:

Heron's claim of 10× Storm performance is based on comparison with Storm 0.8.2, an outdated version; modern Storm 1.0.2 already matches or exceeds that.

Two major performance drawbacks: loss of graph‑level optimizations (no intra‑process communication) and each task running in its own process, forcing network serialization.

Stream Manager becomes a bottleneck (≈500 k QPS per container).

Resource usage is higher per container compared to Storm/JStorm workers.
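The Stream Manager ceiling above puts a rough upper bound on aggregate topology throughput. A sketch under a simplified cost model that I am assuming here (a tuple crossing containers passes through two SMs, a local tuple through one):

```python
SM_LIMIT_QPS = 500_000  # per-container Stream Manager ceiling from the tests

def topology_ceiling(containers: int, inter_container_fraction: float) -> int:
    # Average SM traversals per tuple under the assumed cost model:
    # 2 for cross-container tuples, 1 for container-local ones.
    per_tuple_sm_work = inter_container_fraction * 2 + (1 - inter_container_fraction)
    return int(containers * SM_LIMIT_QPS / per_tuple_sm_work)

# 10 containers where 90% of tuples cross container boundaries:
print(topology_ceiling(10, 0.9))  # ~2.6M tuples/s aggregate, at best
```

Under this model, adding containers raises the ceiling linearly but never removes the per-hop serialization cost, which is consistent with the article's point that every tuple is forced through network serialization.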

Resource Utilization: While Heron can precisely allocate resources per task, the per‑task process overhead leads to higher overall consumption, especially in large clusters.

Conclusion: Heron excels in stability and debug‑ability for very large deployments, but its performance and resource efficiency are comparable to or slightly worse than modern Storm/JStorm implementations.

Tags: Performance, architecture, big data, stream processing, storm, backpressure, Heron
Written by

Architecture Digest

Focusing on Java backend development, covering application architecture from top-tier internet companies (high availability, high performance, high stability), big data, machine learning, Java architecture, and other popular fields.
