Designing Nexmark: A Standard Benchmark for Stream Processing Performance
This article examines the challenges of existing stream‑processing benchmarks, introduces the open‑source Nexmark framework designed for reproducible, comprehensive performance testing, describes its metrics, query set, workload configurability, and presents experimental results on Flink, highlighting its role in advancing big‑data stream benchmarking.
Background
As real‑time data becomes critical for fine‑grained operations, technologies such as Apache Flink, Apache Spark, Kafka ksqlDB and Apache Storm have shaped the stream‑processing landscape. Selecting an appropriate engine requires not only functional comparison but also performance evaluation, for which benchmarking is essential.
Problems with Existing Stream Benchmarks
There is no industry‑standard benchmark for stream processing. The most cited effort, Yahoo Streaming Benchmarks, simulates a simple ad‑click scenario using external services like Kafka and Redis, which become bottlenecks and obscure true engine performance. Moreover, it only covers a basic “Word Count” job, failing to represent complex, real‑world workloads.
Key shortcomings include lack of reproducibility, insufficient coverage of realistic query workloads, inability to scale data volume and distribution, and missing unified performance metrics.
Design Goals for a Standard Benchmark
Reproducibility : Benchmark results must be repeatable without proprietary hardware or hidden services.
Realistic Business Scenarios : The benchmark should reflect industry‑level query mixes, similar to TPC‑H/TPC‑DS for databases.
Configurable Load : Ability to vary data volume, distribution, and skew.
Unified Metrics : Clear definitions for throughput, latency, and resource usage.
Nexmark Benchmark Framework
The Nexmark framework extends the NEXMark research paper and Apache Beam Nexmark Suite. It removes external source and sink dependencies, using an in‑memory datagen source and a null sink so that the engine alone determines performance.
Running all queries is as simple as:
nexmark/bin/run_query.sh allMetrics
Throughput : Measured as records per second generated by the datagen source, e.g., using <source_operator_name>.numRecordsOutPerSecond from Flink REST API.
Latency : Ideally computed as output_system_time - max(ingest_time), though current implementation does not yet expose this metric.
CPU : CPU usage per core is derived from /proc/<pid>/stat, allowing calculation of throughput per CPU core.
Queries and Schema
Nexmark models an online auction system with three streams: Person (users), Auction (items), and Bid (bids). Sixteen ANSI‑SQL queries exercise joins, window aggregations, and other typical stream operations. Currently only Flink SQL is fully supported.
Configurable Workload
The framework allows tuning of datagen throughput, data‑size ratios among streams, average record size, and skew parameters via source DDL.
Experimental Results
Benchmarks were executed on three Alibaba Cloud ecs.i2g.2xlarge instances (8 vCores, 32 GB RAM, 800 GB SSD, 2 Gbps inter‑node bandwidth) using Flink 1.11 in standalone mode with checkpointing and RocksDB state backend. The datagen source produced 10 million events per second with a Bid:Auction:Person ratio of 92 %:6 %:2 %.
Each query ran a 3‑minute warm‑up followed by a 3‑minute measurement period. Results (throughput, CPU, etc.) are shown below:
Conclusion
The Nexmark framework provides a reproducible, extensible benchmark suite for stream processing, currently focused on Flink. It helps drive standardization, supports regression testing, and can compare internal versus open‑source engine versions. Future work includes adding latency metrics and support for additional engines such as Spark Structured Streaming, ksqlDB, and Flink DataStream.
References: Pete Tucker and Kristin Tufte. “NEXMark – A Benchmark for Queries over Data Streams”. June 2010. Jeyhun Karimov and Tilmann Rabl. “Benchmarking Distributed Stream Data Processing Systems”. arXiv:1802.08496v2, Jun 2019. Yangjun Wang. “Stream Processing Systems Benchmark: StreamBench”. May 2016. https://github.com/yahoo/streaming-benchmarks https://www.ververica.com/blog/extending-the-yahoo-streaming-benchmark https://beam.apache.org/documentation/sdks/java/testing/nexmark/
Signed-in readers can open the original source through BestHub's protected redirect.
This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.
Alibaba Cloud Developer
Alibaba's official tech channel, featuring all of its technology innovations.
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.
