Overview of Open-Source Real-Time Stream Processing Systems
This article gives a concise overview of several real‑time stream processing and log collection platforms (S4, Storm, StreamBase, HStreaming, Esper/NEsper, Kafka, Scribe, and Flume), most of them open source, noting each one's main features, implementation language, and project link for further reference.
S4
S4 (Simple Scalable Streaming System) is an open‑source stream computing platform released by Yahoo, offering a generic, distributed, highly scalable, partition‑tolerant, and pluggable environment for continuous data processing, with Java as the development language. Project link: http://incubator.apache.org/s4/ (note: S4 0.5.0 adds TCP connections and state recovery).
Storm
Storm, open‑sourced by Twitter, is a distributed real‑time computation system that enables developers to reliably process unbounded streams via a simple API, supporting Java and Clojure (other languages can interact via stdin/stdout using a JSON protocol). Typical use cases include real‑time analytics, online machine learning, continuous computation, distributed RPC, and ETL. Project link: http://storm-project.net .
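The stdin/stdout mechanism can be pictured with a small sketch. It assumes each message is a JSON object followed by a line containing only "end", which is how Storm's multi-language protocol frames messages; the helper names (read_messages, write_message, split_sentence_bolt) are hypothetical, and the real protocol also includes a handshake and further message types that are not shown here.

```python
import io
import json


def read_messages(stream):
    """Yield JSON objects from a text stream; each one ends with an 'end' line."""
    buffer = []
    for line in stream:
        line = line.rstrip("\n")
        if line == "end":
            if buffer:
                yield json.loads("\n".join(buffer))
            buffer = []
        else:
            buffer.append(line)


def write_message(stream, message):
    """Write one JSON message followed by the 'end' delimiter line."""
    stream.write(json.dumps(message) + "\nend\n")


def split_sentence_bolt(stdin, stdout):
    """Illustrative bolt: emit one tuple per word, then acknowledge the input."""
    for msg in read_messages(stdin):
        for word in msg["tuple"][0].split():
            write_message(
                stdout,
                {"command": "emit", "anchors": [msg["id"]], "tuple": [word]},
            )
        write_message(stdout, {"command": "ack", "id": msg["id"]})
```

In a real deployment the two streams would be sys.stdin and sys.stdout of a subprocess started by a Storm worker; passing them in as parameters keeps the sketch easy to exercise with in-memory streams.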
StreamBase
StreamBase is a commercial complex event processing (CEP) and event‑stream platform that also offers a free Developer Edition; development is done in Java. Project link: http://www.streambase.com .
HStreaming
Built on Hadoop, HStreaming tightly integrates with the Hadoop ecosystem to provide real‑time stream computing services, allowing users to analyze and process big data within the same environment; development language is Java. Project link: http://www.hstreaming.com .
Esper & NEsper
Esper (Java) and NEsper (.NET) are CEP platforms that simplify the development and deployment of applications handling large volumes of historical or real‑time messages and events. Project link: http://esper.codehaus.org .
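To make the CEP idea concrete, here is a minimal sketch of one typical pattern: a running average over the last few events, with an alert when it crosses a threshold. It is plain Python, not Esper's query language or API, and the class and function names are hypothetical.

```python
from collections import deque


class LengthWindowAverage:
    """Keep the most recent `size` values and report their average."""

    def __init__(self, size):
        self.window = deque(maxlen=size)  # old values fall out automatically

    def on_event(self, value):
        self.window.append(value)
        return sum(self.window) / len(self.window)


def detect_spikes(values, size, threshold):
    """Return (event index, window average) wherever the average exceeds threshold."""
    aggregator = LengthWindowAverage(size)
    alerts = []
    for index, value in enumerate(values):
        average = aggregator.on_event(value)
        if average > threshold:
            alerts.append((index, average))
    return alerts
```

A CEP engine lets this kind of rule be declared as a continuous query rather than hand-coded, and evaluates it as each event arrives.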
Kafka
Kafka, open‑sourced by LinkedIn in December 2010, is a high‑throughput, publish‑subscribe distributed messaging system primarily used for handling active streaming data, written in Scala. Project link: http://incubator.apache.org/kafka .
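The publish-subscribe model can be sketched in a few lines. This in-memory toy assumes an append-only list per topic and a read position per consumer group, which mirrors the general shape of Kafka's design but is not its client API; the Broker class and its methods are hypothetical.

```python
from collections import defaultdict


class Broker:
    """Toy single-process broker: one append-only log per topic."""

    def __init__(self):
        self.topics = defaultdict(list)   # topic -> list of messages
        self.offsets = defaultdict(int)   # (group, topic) -> next offset to read

    def publish(self, topic, message):
        """Append a message and return its offset in the topic log."""
        self.topics[topic].append(message)
        return len(self.topics[topic]) - 1

    def poll(self, group, topic, max_messages=10):
        """Return unread messages for this group and advance its offset."""
        start = self.offsets[(group, topic)]
        batch = self.topics[topic][start:start + max_messages]
        self.offsets[(group, topic)] = start + len(batch)
        return batch
```

Because each group tracks its own position, several independent subscribers can read the same stream at their own pace without the publisher knowing about them.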
Scribe
Scribe is Facebook’s open‑source log collection system. It is written in C++ and, because it is built on Thrift, accepts logs from clients written in many languages. It aggregates logs from many sources into central storage (for example NFS or a distributed file system) for centralized analysis, and is often paired with Hadoop for downstream processing. Project link: http://github.com/facebook/scribe .
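The aggregation flow can be sketched as follows. The sketch assumes each log entry carries a category and a message, and that a local agent batches entries before forwarding them to a central store; the class names are hypothetical, and a real client would make a Thrift call where this code makes a direct method call.

```python
from collections import defaultdict
from typing import NamedTuple


class LogEntry(NamedTuple):
    category: str
    message: str


class CentralStore:
    """Stands in for central storage: one list of messages per category."""

    def __init__(self):
        self.files = defaultdict(list)

    def log(self, entries):
        for entry in entries:
            self.files[entry.category].append(entry.message)


class LocalAgent:
    """Buffers entries on the originating host and forwards them in batches."""

    def __init__(self, central):
        self.central = central
        self.buffer = []

    def log(self, category, message):
        self.buffer.append(LogEntry(category, message))

    def flush(self):
        self.central.log(self.buffer)
        self.buffer = []
```

Grouping by category at the central store is what lets a downstream job pick up, say, all web-access logs from every host in one place.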
Flume
Flume, provided by Cloudera, is a distributed, reliable, highly available log collection system for gathering, aggregating, and moving large volumes of log data, implemented in Java. It allows custom data sources and sinks, and can perform simple processing before delivering data to various destinations. Project link: http://incubator.apache.org/flume .
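The source, processing, and sink stages can be pictured as a chain of small steps. The sketch below is plain Python rather than Flume's Java API or configuration format, and every name in it is hypothetical; events are modeled as dictionaries with headers and a body.

```python
def line_source(lines):
    """Custom source: turn raw log lines into events."""
    for line in lines:
        yield {"headers": {}, "body": line.rstrip("\n")}


def drop_debug(events):
    """Simple processing step: filter out DEBUG lines."""
    for event in events:
        if not event["body"].startswith("DEBUG"):
            yield event


def add_host(events, host):
    """Simple processing step: tag each event with its originating host."""
    for event in events:
        event["headers"]["host"] = host
        yield event


class ListSink:
    """Custom sink: collects delivered events in memory."""

    def __init__(self):
        self.delivered = []

    def drain(self, events):
        for event in events:
            self.delivered.append(event)


def run_pipeline(lines, host, sink):
    """Wire source -> filters -> sink, pulling events through lazily."""
    sink.drain(add_host(drop_debug(line_source(lines)), host))
```

Swapping ListSink for a class that writes to a file or a network endpoint changes the destination without touching the source or the processing steps.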
Art of Distributed System Architecture Design
Introductions to large-scale distributed system architectures; insights and knowledge sharing on large-scale internet system architecture; front-end web architecture overviews; practical tips and experiences with PHP, JavaScript, Erlang, C/C++ and other languages in large-scale internet system development.
