Performance Comparison of Apache Storm and Apache Flink from Data Transmission and Reliability Perspectives
This article presents a detailed performance benchmark comparing Apache Storm and Apache Flink for stream processing. It focuses on data transmission methods, reliability mechanisms, operator chaining, and both self‑generated and Kafka‑sourced workloads, and offers practical optimization recommendations based on the results.
The article, authored by Zhang Xinyu, a lead of the 360 Big Data Computing Platform, summarizes a talk given at the Flink China Meetup and originally published on the High‑Availability Architecture public account.
It compares the streaming performance of Apache Storm and Apache Flink, two open‑source distributed computation engines, by designing benchmark cases that measure throughput and latency, the two common metrics used in big‑data platform evaluations.
Test cases include self‑generated data streams and Kafka‑sourced streams; the benchmark isolates data transmission cost by omitting any user‑defined computation logic in the Task stage, allowing the evaluation to focus on inter‑process versus intra‑process data transfer.
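The inter‑process versus intra‑process distinction the benchmark isolates can be mimicked with a toy model in plain Java (illustrative only, not Flink or Storm internals): the intra‑process path is a direct method call, while the simulated inter‑process path pays serialization, a queue hand‑off to another thread, and deserialization per record.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// Toy model of the two transfer paths (not engine internals):
// "intra-process" = the upstream operator calls the downstream one directly;
// "inter-process" = each record pays serialize -> queue hand-off -> deserialize.
public class TransferDemo {

    static byte[] serialize(Long v) throws IOException {
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        try (ObjectOutputStream oos = new ObjectOutputStream(bos)) {
            oos.writeObject(v);
        }
        return bos.toByteArray();
    }

    static Long deserialize(byte[] bytes) throws IOException, ClassNotFoundException {
        try (ObjectInputStream ois = new ObjectInputStream(new ByteArrayInputStream(bytes))) {
            return (Long) ois.readObject();
        }
    }

    /** Runs both paths over n records; returns true if they deliver the same sum. */
    public static boolean run(int n) throws Exception {
        // Intra-process path: plain method calls, no per-record ser/de cost.
        long local = 0;
        for (long i = 0; i < n; i++) local += i;

        // Simulated inter-process path: ser/de plus a blocking-queue hand-off.
        BlockingQueue<byte[]> channel = new ArrayBlockingQueue<>(1024);
        final long[] remote = {0};
        Thread consumer = new Thread(() -> {
            try {
                for (int i = 0; i < n; i++) remote[0] += deserialize(channel.take());
            } catch (Exception e) {
                throw new RuntimeException(e);
            }
        });
        consumer.start();
        for (long i = 0; i < n; i++) channel.put(serialize(i));
        consumer.join();

        return local == remote[0];
    }

    public static void main(String[] args) throws Exception {
        System.out.println(run(10_000)); // prints "true": both paths deliver the same data
    }
}
```

Timing the two loops shows why the benchmark's no‑op Task design matters: with no user logic, almost all of the inter‑process cost is serialization and the hand‑off itself.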
Inter‑process transmission involves serialization, network transfer, and deserialization (Netty in Flink, ZeroMQ/Netty in Storm), while intra‑process transmission uses direct method calls after a shallow copy; reliability is ensured by Storm’s ACK mechanism and Flink’s checkpointing (derived from the Chandy‑Lamport algorithm), with both at‑least‑once and exactly‑once semantics examined.
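The two reliability mechanisms are enabled very differently. In Flink, checkpointing is a single environment setting; the sketch below uses names from the Flink DataStream API, with an illustrative interval (the article does not specify one). It is a configuration fragment, not a runnable program.

```java
// Sketch: enable Flink checkpointing (Flink DataStream API;
// the 5 s interval here is illustrative, not taken from the benchmark).
StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
env.enableCheckpointing(5000, CheckpointingMode.EXACTLY_ONCE); // or AT_LEAST_ONCE

// Storm's at-least-once path instead relies on anchoring and acking
// every tuple inside each bolt:
//   collector.emit(inputTuple, new Values(...));  // anchor to the input tuple
//   collector.ack(inputTuple);                    // acknowledge once processed
```

This asymmetry is why the article can toggle Flink's reliability on and off cheaply, while Storm's ACK mechanism adds per‑tuple bookkeeping on the data path.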
The testing environment and software versions are documented (including screenshots) to aid reproducibility of the results.
Results for self‑generated data show that intra‑process transmission is about 3.8× faster than inter‑process, and enabling Flink’s checkpointing has little impact on throughput; chaining operators (placing them in the same Task) yields the highest performance, with Flink achieving up to 20.6 M events/s (15× Storm) and 40.9 M events/s when object reuse is enabled.
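The object‑reuse gain mentioned above comes from a single `ExecutionConfig` switch. A configuration sketch (Flink DataStream API), with the usual caveat that it is only safe if no user function caches or mutates records after emitting them:

```java
StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
// Skip the defensive per-record copy between chained operators.
// Safe only if no operator holds onto or mutates a record after emitting it.
env.getConfig().enableObjectReuse();
```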
Operator chaining requires compatible downstream data types, identical parallelism, shared resource groups, no disabling of chaining in the execution environment, and forward partitioning (no rebalance, keyBy, broadcast, etc.).
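Under those conditions a simple source → map → sink pipeline chains into one Task by default, and chaining can be controlled explicitly. A sketch using the Flink DataStream API, where `MySource` and `MySink` are hypothetical placeholders:

```java
// Chained by default: same parallelism, forward partitioning, chaining enabled.
env.addSource(new MySource())
   .map(String::toUpperCase)    // runs in the same Task as the source
   .addSink(new MySink());

// Any repartitioning breaks the chain:
env.addSource(new MySource())
   .keyBy(s -> s)               // keyBy forces a shuffle between operators
   .map(String::toUpperCase);   // runs in a separate Task

// Chaining can also be disabled deliberately:
env.disableOperatorChaining();                               // for the whole job
env.addSource(new MySource()).map(s -> s).disableChaining(); // for one operator
```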
When consuming Kafka data, Flink’s bottleneck shifts to the source side, while Storm’s bottleneck appears in downstream deserialization; nevertheless, Flink still outperforms Storm by large margins.
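For the Kafka‑sourced cases, the Flink side reads through the Kafka connector. A sketch matching the connector API of the article's era (`FlinkKafkaConsumer` from the `flink-connector-kafka` artifact, before the newer `KafkaSource`); the topic name and bootstrap address are placeholders:

```java
Properties props = new Properties();
props.setProperty("bootstrap.servers", "kafka:9092"); // placeholder address
props.setProperty("group.id", "benchmark");           // placeholder group id

DataStream<String> stream = env.addSource(
        new FlinkKafkaConsumer<>("benchmark-topic", new SimpleStringSchema(), props));
```

With a no‑op pipeline downstream, throughput here is bounded by how fast the source operator can poll and deserialize Kafka records, which is the bottleneck shift the article describes.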
The article concludes that while the current analysis isolates transmission and reliability, real‑world workloads will also involve CPU and memory constraints, and future work will build an intelligent analysis platform to automatically diagnose job bottlenecks and suggest optimizations.
360 Tech Engineering