Why RocketMQ-Streams Delivers High‑Performance, Low‑Resource Stream Computing
RocketMQ-Streams targets workloads with massive data volumes, high filtering rates, and lightweight windowed computation. Its high‑performance design runs on as little as 1 CPU core and 1 GB of RAM, delivers 2‑5× speedups over traditional big‑data engines in those scenarios, and supports Flink‑compatible SQL, UDFs, and cloud‑native deployment.
RocketMQ-Streams is designed for scenarios with massive data volumes, high filtering rates, and lightweight windowed calculations. It can run on a single core with 1 GB memory, achieving 2‑5× performance improvements over other big‑data platforms.
Key Features
Computation Model: Supports exactly‑once semantics, flexible windows (tumbling, sliding, session), dual‑stream joins, high throughput, and low latency, all on minimal resources (1 core, 1 GB).
SQL Engine: Compatible with Flink SQL and with Flink UDF/UDTF/UDAF extensions. SQL jobs can be hot‑upgraded by submitting new SQL through the SDK.
ETL Engine: Built‑in grok and regex parsing, combined with SQL‑based extraction and transformation.
Development SDK: Unified source and sink abstractions allow code reuse across different inputs and outputs.
Design Goals and Architecture
Minimal dependencies, simple deployment (1 Core, 1 GB per instance) with easy scalability.
Exactly‑once processing, flexible windows, dual‑stream joins, high throughput, low latency.
Cost‑controlled implementation achieving high performance with low resource usage.
Flink‑compatible SQL and UDF/UDTF for user‑friendly adoption.
The system uses a shared‑nothing distributed architecture, leveraging RocketMQ for load balancing, fault tolerance, and sharding‑based shuffle. State is stored in remote storage for fast startup without waiting for local recovery.
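To make the sharding‑based shuffle concrete, here is a minimal sketch of how a grouping key could be mapped to a shuffle queue so that every record with the same key reaches the same queue, and therefore the same instance. The class `ShuffleRouter` and its method names are hypothetical and are not part of the RocketMQ-Streams API; the actual routing logic may differ.

```java
import java.util.List;

// Hypothetical illustration only: not RocketMQ-Streams code.
class ShuffleRouter {
    private final int queueCount;

    ShuffleRouter(int queueCount) {
        if (queueCount <= 0) {
            throw new IllegalArgumentException("queueCount must be positive");
        }
        this.queueCount = queueCount;
    }

    // Map a grouping key to a queue index in [0, queueCount).
    // floorMod keeps the result non-negative even for negative hash codes.
    int queueFor(String key) {
        return Math.floorMod(key.hashCode(), queueCount);
    }

    public static void main(String[] args) {
        ShuffleRouter router = new ShuffleRouter(4);
        for (String key : List.of("host-a", "host-b", "host-a")) {
            System.out.println(key + " -> queue " + router.queueFor(key));
        }
    }
}
```

Because the mapping depends only on the key and the queue count, no instance needs to share state with another to decide where a record belongs, which fits the shared‑nothing design described above.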
Operators and Execution Model
Operators include source, sink, window, join, split, and generic transformations. The SDK provides a StreamBuilder to create a DataStreamSource, then apply operations such as map, window, join, and toPrint. Tasks can be started synchronously or asynchronously, and multiple instances can run concurrently, each consuming a partition of RocketMQ data.
Exactly‑Once Guarantees
Sources emit checkpoint messages before committing offsets, ensuring at‑least‑once delivery.
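Message headers carry the queue ID and offset; each component records the maximum offset it has processed per queue and discards any replayed message at or below that offset, which turns at‑least‑once delivery into exactly‑once processing.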
Memory protection limits cache size and triggers flushes when thresholds are exceeded.
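The offset‑based deduplication described above can be sketched as follows. This is an illustration under two assumptions: offsets within a single queue arrive in increasing order, and the class `OffsetDeduplicator` is a hypothetical name rather than a RocketMQ-Streams type.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration only: not RocketMQ-Streams code.
class OffsetDeduplicator {
    // Highest offset already processed, keyed by queue ID.
    private final Map<String, Long> maxProcessedOffset = new HashMap<>();

    // Returns true if the message is new and should be processed,
    // false if it is a replay at or below the recorded offset.
    boolean shouldProcess(String queueId, long offset) {
        Long max = maxProcessedOffset.get(queueId);
        if (max != null && offset <= max) {
            return false;
        }
        maxProcessedOffset.put(queueId, offset);
        return true;
    }

    public static void main(String[] args) {
        OffsetDeduplicator dedup = new OffsetDeduplicator();
        long[] offsets = {1, 2, 1, 3}; // offset 1 is replayed after a restart
        for (long offset : offsets) {
            System.out.println("offset " + offset + " process=" + dedup.shouldProcess("queue-0", offset));
        }
    }
}
```

After a failure, the source replays messages from the last checkpoint; any message whose offset is already covered is dropped, so downstream state is updated only once per message.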
Windowing Capabilities
Supports tumbling, sliding, and session windows with event‑time and processing‑time semantics.
High‑performance mode skips remote storage during shard switches (possible data loss); high‑reliability mode uses remote storage.
Fast startup via asynchronous remote state recovery; scaling achieved through queue‑based load balancing.
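As an illustration of the window types listed above, the sketch below shows one common way to assign a timestamp to tumbling and sliding windows. It is generic window arithmetic, not the engine's internal implementation, and `WindowAssigner` is a hypothetical name. Timestamps and sizes are in milliseconds, and the code assumes the slide is positive and no larger than the window size.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only: not RocketMQ-Streams code.
class WindowAssigner {
    // Start of the single tumbling window [start, start + size) containing ts.
    static long tumblingStart(long ts, long size) {
        return ts - Math.floorMod(ts, size);
    }

    // Starts of every sliding window [start, start + size) containing ts,
    // newest first. Window starts are aligned to multiples of slide.
    static List<Long> slidingStarts(long ts, long size, long slide) {
        List<Long> starts = new ArrayList<>();
        long lastStart = ts - Math.floorMod(ts, slide);
        for (long start = lastStart; start > ts - size; start -= slide) {
            starts.add(start);
        }
        return starts;
    }

    public static void main(String[] args) {
        long ts = 125_000L;
        System.out.println("tumbling start: " + tumblingStart(ts, 60_000L));
        System.out.println("sliding starts: " + slidingStarts(ts, 60_000L, 30_000L));
    }
}
```

A tumbling window is the special case of a sliding window whose slide equals its size, so each event belongs to exactly one window. Session windows are not shown because their boundaries depend on gaps between events rather than on fixed arithmetic.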
Cloud Security Use Case
In proprietary cloud environments, traditional big‑data clusters are too resource‑intensive for intrusion detection. RocketMQ-Streams applies the filtering rules upfront, then performs heavy statistical and join operations only on the reduced dataset, achieving >5× performance while using 1/70 of the memory and 1/6 of the CPU of the public‑cloud equivalents.
Handles 100% of proprietary‑cloud security rules (regex, join, aggregation).
Supports tens of millions of intelligence records with only 330 MB memory for 10 GB data.
Hot‑publishable SQL and engine enable rapid rule deployment.
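The filter‑first approach in this use case can be sketched as below: a cheap regex rule discards most records, and the aggregation runs only on the survivors. The rule, the log format, and the class `FilterFirstPipeline` are all hypothetical and chosen only for illustration; real security rules would be far more varied.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration only: not RocketMQ-Streams code.
class FilterFirstPipeline {
    // Hypothetical rule: keep only log lines that report a failed login.
    private static final Pattern RULE = Pattern.compile("login failed for user=(\\w+)");

    // Count failed logins per user, touching only lines that pass the filter.
    static Map<String, Integer> failedLoginsPerUser(List<String> lines) {
        Map<String, Integer> counts = new HashMap<>();
        for (String line : lines) {
            Matcher m = RULE.matcher(line);
            if (!m.find()) {
                continue; // filtered out before any stateful work happens
            }
            counts.merge(m.group(1), 1, Integer::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> lines = List.of(
            "GET /index.html 200",
            "login failed for user=alice",
            "login ok for user=bob",
            "login failed for user=alice");
        System.out.println(failedLoginsPerUser(lines));
    }
}
```

When the filter rejects the large majority of input, the state kept for aggregation stays small, which is the property the memory and CPU figures above rely on.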
Resources
Download the latest release: https://github.com/apache/rocketmq-streams/releases/tag/rocketmq-streams-1.0.0-preview
Alibaba Cloud Native
We publish cloud-native tech news, curate in-depth content, host regular events and live streams, and share Alibaba product and user case studies. Join us to explore and share the cloud-native insights you need.
