
Designing Nexmark: A Standard Benchmark for Stream Processing Performance

This article examines the shortcomings of existing stream‑processing benchmarks, introduces the open‑source Nexmark framework for reproducible, comprehensive performance testing, describes its metrics, query set, and configurable workload, and presents experimental results on Flink.

Alibaba Cloud Developer

Sep 15, 2020


Background

As real‑time data becomes critical for fine‑grained operations, technologies such as Apache Flink, Apache Spark, Kafka ksqlDB and Apache Storm have shaped the stream‑processing landscape. Selecting an appropriate engine requires not only functional comparison but also performance evaluation, for which benchmarking is essential.

Problems with Existing Stream Benchmarks

There is no industry‑standard benchmark for stream processing. The most cited effort, Yahoo Streaming Benchmarks, simulates a simple ad‑click scenario but depends on external services such as Kafka and Redis, which become bottlenecks and obscure the engine's true performance. Moreover, it covers only a single simple job, comparable to “Word Count”, and so fails to represent complex, real‑world workloads.

Key shortcomings include lack of reproducibility, insufficient coverage of realistic query workloads, inability to scale data volume and distribution, and missing unified performance metrics.

Design Goals for a Standard Benchmark

Reproducibility: Benchmark results must be repeatable without proprietary hardware or hidden services.

Realistic Business Scenarios: The benchmark should reflect industry‑level query mixes, similar to TPC‑H/TPC‑DS for databases.

Configurable Load: Ability to vary data volume, distribution, and skew.

Unified Metrics: Clear definitions for throughput, latency, and resource usage.

Nexmark Benchmark Framework

The Nexmark framework extends the NEXMark research paper and Apache Beam Nexmark Suite. It removes external source and sink dependencies, using an in‑memory datagen source and a null sink so that the engine alone determines performance.

Running all queries is as simple as:

nexmark/bin/run_query.sh all

Metrics

Throughput: Measured as the number of records per second emitted by the datagen source, e.g., read from <source_operator_name>.numRecordsOutPerSecond via the Flink REST API.

Latency: Ideally computed as output_system_time - max(ingest_time), though the current implementation does not yet expose this metric.

CPU: CPU usage, expressed in cores consumed, is derived from /proc/<pid>/stat, which allows throughput per CPU core to be calculated.
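The three metrics above can be sketched in a few lines of Python. This is a minimal illustration, not the framework's own code: the function names are hypothetical, the tick rate of 100 per second is an assumed default (on Linux it can be read with os.sysconf("SC_CLK_TCK")), and the latency helper only restates the formula given above.

```python
from typing import Iterable


def cpu_ticks(stat_line: str) -> int:
    """Return utime + stime (in clock ticks) from one /proc/<pid>/stat line."""
    # The command name (field 2) is wrapped in parentheses and may contain
    # spaces, so split on the last ')' before tokenizing the remainder.
    fields = stat_line.rsplit(")", 1)[1].split()
    # fields[0] is field 3 (state), so utime (field 14) and stime (field 15)
    # sit at indexes 11 and 12.
    return int(fields[11]) + int(fields[12])


def cores_used(ticks_start: int, ticks_end: int, elapsed_seconds: float,
               ticks_per_second: int = 100) -> float:
    """Average number of CPU cores consumed between two samples."""
    # ticks_per_second = 100 is an assumption; check SC_CLK_TCK on the host.
    return (ticks_end - ticks_start) / (ticks_per_second * elapsed_seconds)


def throughput_per_core(records_per_second: float, cores: float) -> float:
    """Normalize source throughput by the CPU cores actually consumed."""
    return records_per_second / cores


def latency_ms(output_system_time_ms: int,
               ingest_times_ms: Iterable[int]) -> int:
    """output_system_time - max(ingest_time) for the inputs of one result."""
    return output_system_time_ms - max(ingest_times_ms)
```

Taking the maximum ingest time means a result's latency is counted from the last input record that contributed to it, so time spent waiting for a window to fill is not charged to the engine.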

Queries and Schema

Nexmark models an online auction system with three streams: Person (users), Auction (items), and Bid (bids). Sixteen ANSI‑SQL queries exercise joins, window aggregations, and other typical stream operations. Currently only Flink SQL is fully supported.
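To make the data model concrete, here is a small Python sketch of the three streams and a tumbling‑window aggregation over bids. The field names are a simplified assumption for illustration, not the benchmark's full schema, and the function is not one of the sixteen queries; it only shows the kind of window aggregation they exercise.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Dict, Iterable, Tuple


@dataclass(frozen=True)
class Person:
    id: int
    name: str


@dataclass(frozen=True)
class Auction:
    id: int
    seller: int  # refers to Person.id


@dataclass(frozen=True)
class Bid:
    auction: int  # refers to Auction.id
    bidder: int   # refers to Person.id
    price: int
    timestamp_ms: int


def bids_per_auction_by_window(
    bids: Iterable[Bid], window_ms: int
) -> Dict[Tuple[int, int], int]:
    """Count bids per (window start, auction id) using tumbling windows."""
    counts: Counter = Counter()
    for bid in bids:
        # Align each timestamp down to the start of its window.
        window_start = bid.timestamp_ms - bid.timestamp_ms % window_ms
        counts[(window_start, bid.auction)] += 1
    return dict(counts)
```

A streaming engine would evaluate the same logic incrementally and keep the per‑window counts in state, which is where the state backend and checkpointing costs come from.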

Configurable Workload

The framework allows tuning of datagen throughput, data‑size ratios among streams, average record size, and skew parameters via source DDL.
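As a rough illustration of a configurable stream ratio, the sketch below assigns an event type to each event ID according to fixed proportions, e.g. 92% bids, 6% auctions, and 2% persons. The modulo scheme and the function name are assumptions made for this example; the article does not describe how the datagen source implements the split.

```python
from collections import Counter


def event_type(event_id: int, person: int = 2, auction: int = 6,
               bid: int = 92) -> str:
    """Map an event ID to a stream so the streams follow the given ratio."""
    total = person + auction + bid
    slot = event_id % total  # position inside one repeating cycle of events
    if slot < person:
        return "person"
    if slot < person + auction:
        return "auction"
    return "bid"


def stream_mix(num_events: int) -> dict:
    """Count how many of the first num_events events fall in each stream."""
    return dict(Counter(event_type(i) for i in range(num_events)))
```

Because the assignment is a pure function of the event ID, two runs with the same parameters produce exactly the same mix, which supports the reproducibility goal.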

Experimental Results

Benchmarks were executed on three Alibaba Cloud ecs.i2g.2xlarge instances (8 vCores, 32 GB RAM, 800 GB SSD, 2 Gbps inter‑node bandwidth) using Flink 1.11 in standalone mode, with checkpointing enabled and the RocksDB state backend. The datagen source produced 10 million events per second with a Bid:Auction:Person ratio of 92%:6%:2%.

Each query ran a 3‑minute warm‑up followed by a 3‑minute measurement period. The per‑query results (throughput, CPU, etc.) were presented as a figure in the original article.
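Separately from the reported results, the warm‑up and measurement windows can be illustrated with a small sketch. It assumes throughput was sampled periodically as (seconds since job start, records per second) pairs; the function name and sample layout are hypothetical.

```python
from typing import Iterable, Tuple


def measured_average(samples: Iterable[Tuple[float, float]],
                     warmup_s: float = 180.0,
                     measure_s: float = 180.0) -> float:
    """Average throughput over the measurement window, skipping warm-up."""
    # Keep only samples taken at or after the warm-up ends and before the
    # measurement window closes.
    window = [value for t, value in samples
              if warmup_s <= t < warmup_s + measure_s]
    if not window:
        raise ValueError("no samples fall inside the measurement window")
    return sum(window) / len(window)
```

Discarding the warm‑up samples keeps start‑up effects such as JIT compilation and cache filling out of the reported average.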

Conclusion

The Nexmark framework provides a reproducible, extensible benchmark suite for stream processing, currently focused on Flink. It helps drive standardization, supports regression testing, and can compare internal versus open‑source engine versions. Future work includes adding latency metrics and support for additional engines such as Spark Structured Streaming, ksqlDB, and Flink DataStream.

References:

Pete Tucker and Kristin Tufte. “NEXMark – A Benchmark for Queries over Data Streams”. June 2010.

Jeyhun Karimov and Tilmann Rabl. “Benchmarking Distributed Stream Data Processing Systems”. arXiv:1802.08496v2, Jun 2019.

Yangjun Wang. “Stream Processing Systems Benchmark: StreamBench”. May 2016.

https://github.com/yahoo/streaming-benchmarks

https://www.ververica.com/blog/extending-the-yahoo-streaming-benchmark

https://beam.apache.org/documentation/sdks/java/testing/nexmark/
