
Why Flink Outperforms Storm: Deep Dive into Stream Processing Performance

Drawing on data-transmission and reliability metrics, this article compares Apache Storm and Apache Flink for stream processing: it presents the benchmark design, test environment, and results for both synthetic and Kafka data, and offers practical recommendations (operator chaining, object reuse, and checkpoint strategies) for maximizing Flink performance.

360 Zhihui Cloud Developer

Introduction

This article, derived from a talk at the Flink China meetup, compares the performance of Apache Storm and Apache Flink for stream processing, focusing on data transmission and reliability.

Performance Test Case Design

We surveyed existing big‑data platform benchmarks (e.g., Yahoo Streaming‑benchmarks, Intel HiBench) and identified two key metrics: throughput (data processed per unit time) and latency (time to process a single record). The test cases emphasize data transmission time, assuming sufficient resources to avoid queuing.

Computation Model

A simple stream model is used: a Source emits data to downstream Tasks, which process and output results. To isolate transmission performance, no computation logic is added inside the Tasks.
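The benchmark topology can be sketched in plain Java (names are illustrative, not Flink API): a Source emits records to a downstream Task whose processing logic is an identity pass-through, so any measured cost comes from transmission alone.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Minimal model of the benchmark topology: a Source emits records to a
// downstream Task with no computation logic, isolating transmission cost.
public class PipelineModel {
    // Downstream task: no computation, just forwards the record.
    static final class PassThroughTask {
        private final Consumer<Long> output;
        PassThroughTask(Consumer<Long> output) { this.output = output; }
        void process(Long record) { output.accept(record); } // identity
    }

    static List<Long> run(int count) {
        List<Long> sink = new ArrayList<>();
        PassThroughTask task = new PassThroughTask(sink::add);
        for (long i = 0; i < count; i++) {
            task.process(i); // Source emits directly to the Task
        }
        return sink;
    }

    public static void main(String[] args) {
        System.out.println(run(5)); // [0, 1, 2, 3, 4]
    }
}
```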

Data Sources

Two sources are considered: (1) in‑process synthetic data generation, and (2) data read from Kafka, reflecting the company’s real‑world workload.

Transmission Methods

Two transmission modes are evaluated:

Inter‑process transmission: data is serialized, sent over the network, and deserialized. Flink uses Netty; Storm originally used ZeroMQ and now also uses Netty.

Intra‑process transmission: operators run in the same process; Flink chains operators and passes objects by reference, while Storm uses a shared queue between threads.
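The two intra-process handoff styles can be contrasted in a plain Java sketch (not Storm or Flink internals): a shared queue between producer and consumer threads, Storm-style, versus a direct method call that hands the record straight to the downstream operator, as Flink's chaining does.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.atomic.AtomicLong;

// Storm-style queue handoff vs. Flink-style direct invocation.
public class HandoffDemo {

    // Storm-style: records cross a thread boundary through a shared queue.
    static long queueHandoff(int records) {
        BlockingQueue<Long> queue = new ArrayBlockingQueue<>(1024);
        AtomicLong received = new AtomicLong();
        Thread consumer = new Thread(() -> {
            try {
                for (int i = 0; i < records; i++) {
                    queue.take();
                    received.incrementAndGet();
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        consumer.start();
        try {
            for (long i = 0; i < records; i++) queue.put(i);
            consumer.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return received.get();
    }

    // Flink-style chaining: the upstream operator invokes the downstream
    // operator directly; no queue, no serialization.
    static long directCall(int records) {
        long received = 0;
        for (long i = 0; i < records; i++) {
            received++; // downstream "operator" body
        }
        return received;
    }

    public static void main(String[] args) {
        System.out.println(queueHandoff(10_000)); // 10000
        System.out.println(directCall(10_000));   // 10000
    }
}
```

Both paths deliver every record; the difference is the per-record cost of the thread-boundary crossing versus a plain method call.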

Reliability Mechanisms

Storm relies on an ACK mechanism; Flink uses checkpointing based on the Chandy‑Lamport algorithm. Both systems were tested with at‑least‑once semantics and without reliability guarantees.
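A sketch of how at-least-once checkpointing is typically enabled in Flink's DataStream API; the 30-second interval here is illustrative, not a value from the benchmark.

```java
import org.apache.flink.streaming.api.CheckpointingMode;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;

StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
// Checkpoint every 30 s with at-least-once semantics, matching the
// guarantee level used in the benchmark (interval is illustrative).
env.enableCheckpointing(30_000, CheckpointingMode.AT_LEAST_ONCE);
```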

Test Environment

The hardware and software versions of the test cluster are shown in the following diagram:

Results – Synthetic Data

With in‑process transmission, Flink achieves 3.8× the throughput of inter‑process transmission. Enabling checkpointing has little impact on throughput. The best configuration, in‑process transmission without reliability guarantees, reaches up to 20.6 million events/s; enabling object reuse raises this to 40.9 million events/s.

Flink Data Transmission Model

Operator chaining places multiple operators in the same Task, allowing records to be handed to the next operator directly (by default via a deep copy, or by reference when object reuse is enabled) instead of being serialized and sent over the network.

Operator Chaining

Chaining can be disabled globally with env.disableOperatorChaining(). When chaining is enabled, the Source and map functions run in the same Task; otherwise they run in separate Tasks, incurring serialization and network transfer.
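Besides the global switch, Flink also allows per-operator control over chaining. A sketch, where env, stream, MyMapper, and MyFilter are hypothetical names:

```java
// Global switch: disable chaining for the whole job.
env.disableOperatorChaining();

// Per-operator control on an illustrative pipeline:
stream
    .map(new MyMapper()).startNewChain()       // start a fresh chain here
    .filter(new MyFilter()).disableChaining(); // never chain this operator
```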

Conditions for Chaining

Operators can be chained only if:

The downstream operator accepts a single upstream stream (no union).

Upstream and downstream parallelism are equal.

Both belong to the same resource group (the default group is named default).

Chaining is not disabled in the execution environment.

Data is forwarded without rebalance, keyBy, broadcast, etc.

Object Reuse

Calling env.getConfig().enableObjectReuse() removes deep‑copy steps, allowing the Source to pass the same object to the map function. This boosts throughput but requires that downstream functions do not modify the shared object.
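The hazard can be shown in plain Java (not Flink API): with object reuse, upstream and downstream hold the same instance, so a downstream mutation is visible to the source.

```java
// Why object reuse requires downstream functions not to mutate records:
// with reuse, upstream and downstream share one instance.
public class ObjectReuseDemo {
    static final class Record {
        long value;
        Record(long value) { this.value = value; }
        Record copy() { return new Record(value); } // deep copy (default path)
    }

    // Without reuse: downstream gets a private copy; mutation is safe.
    static long withoutReuse(Record source) {
        Record downstream = source.copy();
        downstream.value += 1; // downstream mutates its own copy
        return source.value;   // source unaffected
    }

    // With reuse: downstream receives the same instance by reference;
    // its mutation is visible to the source.
    static long withReuse(Record source) {
        Record downstream = source; // no copy
        downstream.value += 1;
        return source.value; // source sees the mutation
    }

    public static void main(String[] args) {
        System.out.println(withoutReuse(new Record(42))); // 42
        System.out.println(withReuse(new Record(42)));    // 43
    }
}
```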

Reliability Impact

Enabling reliability has a minimal effect on Flink’s throughput but dramatically reduces Storm’s performance. With reliability, Flink’s single‑thread throughput is ~15× Storm’s; without reliability, Flink is ~66× faster.

Checkpoint vs. ACK Overhead

Storm’s ACK mechanism generates three acknowledgment messages per data record in a simple topology, while Flink’s checkpointing sends a single control message per checkpoint interval, resulting in far lower overhead.
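A back-of-the-envelope comparison makes the gap concrete: ack traffic scales with record volume, while barrier traffic scales only with wall-clock time. The rate and interval figures below are illustrative assumptions, not benchmark measurements.

```java
// Control-message counts implied by the text: ~3 acks per record for
// Storm in a simple topology, vs. one checkpoint barrier per interval
// for Flink regardless of record volume.
public class ControlOverhead {
    static long stormAckMessages(long records) {
        return 3 * records; // ~3 ack messages per record
    }

    static long flinkBarriers(long records, long recordsPerSecond,
                              long checkpointIntervalSeconds) {
        long runtimeSeconds = records / recordsPerSecond;
        return Math.max(1, runtimeSeconds / checkpointIntervalSeconds);
    }

    public static void main(String[] args) {
        long records = 1_000_000_000L; // 1 billion records (illustrative)
        long rate = 10_000_000L;       // 10 M records/s (illustrative)
        long interval = 30;            // checkpoint every 30 s (illustrative)
        System.out.println(stormAckMessages(records));          // 3000000000
        System.out.println(flinkBarriers(records, rate, interval)); // 3
    }
}
```

Three billion control messages versus three: the per-record nature of acks is what drags Storm's throughput down once reliability is enabled.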

Results – Kafka Source

When consuming from Kafka, Flink’s bottleneck shifts to the upstream data‑to‑downstream transmission, while Storm’s bottleneck is downstream deserialization. Increasing Kafka partitions and source parallelism improves Flink’s throughput, but Storm remains limited by its ACK overhead.
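A sketch of a Kafka source setup where source parallelism is matched to the partition count; the topic name, broker address, group id, and the kafkaPartitionCount variable are illustrative assumptions, not benchmark values.

```java
import java.util.Properties;
import org.apache.flink.api.common.serialization.SimpleStringSchema;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaConsumer;

Properties props = new Properties();
props.setProperty("bootstrap.servers", "broker:9092"); // illustrative
props.setProperty("group.id", "benchmark");            // illustrative

DataStream<String> stream = env
    .addSource(new FlinkKafkaConsumer<>("events", new SimpleStringSchema(), props))
    // Match source parallelism to the Kafka partition count so every
    // source subtask reads at least one partition.
    .setParallelism(kafkaPartitionCount);
```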

Conclusion

The analysis shows that, for stream processing workloads, Flink significantly outperforms Storm in both throughput and latency, especially when using in‑process transmission, operator chaining, and object reuse. Future work will involve building an intelligent analysis platform to automatically profile jobs, identify bottlenecks, and suggest optimizations.

Tags: Big Data, Flink, Stream Processing, Performance Testing, Storm, Operator Chaining
Written by

360 Zhihui Cloud Developer

360 Zhihui Cloud is an enterprise open service platform that aims to "aggregate data value and empower an intelligent future," leveraging 360's extensive product and technology resources to deliver platform services to customers.
