Overview of Open-Source Real-Time Stream Processing Systems
This article gives a concise overview of several real‑time stream processing and log‑collection systems, most of them open source: S4, Storm, StreamBase, HStreaming, Esper/NEsper, Kafka, Scribe, and Flume. For each system it lists the primary features, the implementation language, and a project link as a starting point for further technical research.
S4
S4 (Simple Scalable Streaming System) is an open‑source stream‑processing platform released by Yahoo. It is a general‑purpose, distributed, scalable, partially fault‑tolerant, pluggable platform that lets developers build applications for processing continuous, unbounded streams of data in Java.
Project link: http://incubator.apache.org/s4/ (Note: S4 0.5.0 adds TCP connectivity and state recovery features).
Storm
Storm, open‑sourced by Twitter, is a distributed real‑time computation system. Its simple API lets developers reliably process unbounded streams for use cases such as real‑time analytics, online machine learning, continuous computation, distributed RPC, and ETL. Storm itself is implemented in Clojure and Java; components written in other languages communicate with it over stdin/stdout using a JSON‑based protocol.
Project link: http://storm-project.net
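The stdin/stdout interaction described above can be pictured as a loop that reads JSON messages from an input stream and writes JSON commands to an output stream. The following is a minimal sketch in Python of a component that splits sentences into words. The framing (one JSON document per line), the field names, and the function name `run_component` are simplifying assumptions made for illustration; they are not Storm's exact wire format or API.

```python
import io
import json


def run_component(inp, out):
    """Read one JSON message per line from `inp`; write one JSON command per line to `out`.

    Assumed (hypothetical) input shape: {"id": "...", "tuple": ["some sentence"]}.
    """
    for line in inp:
        line = line.strip()
        if not line:
            continue  # ignore blank lines between messages
        msg = json.loads(line)
        sentence = msg["tuple"][0]
        for word in sentence.split():
            # Each emitted word carries the id of the message it was derived from.
            emit = {"command": "emit", "anchors": [msg["id"]], "tuple": [word]}
            out.write(json.dumps(emit) + "\n")
        # Acknowledge the input message once all of its words have been emitted.
        out.write(json.dumps({"command": "ack", "id": msg["id"]}) + "\n")
        out.flush()


# Demo with in-memory streams standing in for stdin and stdout.
demo_in = io.StringIO('{"id": "1", "tuple": ["hello stream world"]}\n')
demo_out = io.StringIO()
run_component(demo_in, demo_out)
print(demo_out.getvalue(), end="")
```

In a real deployment the two streams would be the process's own `sys.stdin` and `sys.stdout`, which is what allows the host system to drive a component written in any language.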
StreamBase
StreamBase is a Complex Event Processing (CEP) and event‑stream platform. Although it is commercial software, a Developer Edition is available, and applications are written in Java.
Project link: http://www.streambase.com
HStreaming
HStreaming is built on Hadoop and tightly integrates with the Hadoop ecosystem to provide real‑time stream processing services, enabling users to analyze and process big data within the same environment. Development language is Java.
Project link: http://www.hstreaming.com
Esper & NEsper
Esper (Java) and NEsper (.NET) are CEP platforms that allow developers to quickly develop and deploy applications handling large volumes of messages and events, both historical and real‑time.
Project link: http://esper.codehaus.org
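To give a feel for what a CEP rule does, here is a minimal sketch in plain Python that keeps a sliding time window over an event stream and flags the moments when the windowed average crosses a threshold. It is not Esper's API or query language; the class `SlidingAverage`, the function `detect_spikes`, and the `(timestamp, value)` event shape are all hypothetical names chosen for this illustration.

```python
from collections import deque


class SlidingAverage:
    """Keeps events from the last `window_seconds` and reports their average value."""

    def __init__(self, window_seconds):
        self.window_seconds = window_seconds
        self.events = deque()  # (timestamp, value) pairs, oldest first
        self.total = 0.0

    def add(self, timestamp, value):
        self.events.append((timestamp, value))
        self.total += value
        # Evict events that have fallen out of the time window.
        while self.events[0][0] <= timestamp - self.window_seconds:
            _, old_value = self.events.popleft()
            self.total -= old_value
        return self.total / len(self.events)


def detect_spikes(events, window_seconds, threshold):
    """Return the timestamps at which the windowed average exceeds `threshold`."""
    window = SlidingAverage(window_seconds)
    return [ts for ts, value in events if window.add(ts, value) > threshold]


readings = [(0, 10), (1, 10), (2, 40), (3, 40), (20, 5)]
print(detect_spikes(readings, window_seconds=10, threshold=20))
```

A CEP engine generalizes this idea: rules are declared once and evaluated continuously as events arrive, instead of being run as queries over stored data.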
Kafka
Kafka, open‑sourced by LinkedIn in December 2010, is a high‑throughput, distributed publish‑subscribe messaging system, primarily used for handling high‑volume activity stream data such as user actions on a website. It is written in Scala.
Project link: http://incubator.apache.org/kafka
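The publish‑subscribe model can be illustrated with a toy in‑memory topic in which publishers append messages to a log and each subscriber reads forward from its own position. This is a conceptual sketch only and does not use Kafka's client API; the class `Topic`, its methods, and the choice to start new subscribers at the end of the log are assumptions of this example.

```python
class Topic:
    """An in-memory, append-only message log that subscribers read at their own pace."""

    def __init__(self):
        self.log = []      # messages in publish order
        self.offsets = {}  # subscriber name -> index of the next unread message

    def publish(self, message):
        self.log.append(message)

    def subscribe(self, name):
        # New subscribers start at the current end of the log (an assumption of this sketch).
        self.offsets.setdefault(name, len(self.log))

    def poll(self, name, max_messages=10):
        """Return up to `max_messages` unread messages and advance the subscriber's position."""
        start = self.offsets[name]
        batch = self.log[start:start + max_messages]
        self.offsets[name] = start + len(batch)
        return batch


topic = Topic()
topic.subscribe("analytics")
topic.publish("page_view:/home")
topic.publish("click:/signup")
print(topic.poll("analytics"))
```

Because each subscriber tracks its own position, a slow consumer does not hold back a fast one, and publishers never need to know who is reading.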
Scribe
Scribe is Facebook’s open‑source log‑collection system, written in C++. Because it is built on Thrift, clients can be written in many languages. It aggregates logs from various sources into a central storage system (e.g., NFS or a distributed file system) for centralized analysis. It is frequently paired with Hadoop: Scribe pushes logs to HDFS, and Hadoop processes them with MapReduce jobs.
Project link: http://github.com/facebook/scribe
Flume
Flume, provided by Cloudera, is a distributed, reliable, highly available log‑collection system for gathering, aggregating, and moving large volumes of log data. It is written in Java and allows custom data sources and sinks, as well as simple data transformations before delivery.
Project link: http://incubator.apache.org/flume
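The source, transformation, and sink roles described above can be sketched as a small pipeline. This is a plain Python illustration of the idea, not Flume's configuration format or plugin API; `run_pipeline`, `strip_whitespace`, and `drop_debug` are hypothetical names, and the convention that a transform returns `None` to drop a record is an assumption of this sketch.

```python
def run_pipeline(source, transforms, sink):
    """Pull records from `source`, apply each transform in order, and pass results to `sink`.

    A transform returns the (possibly modified) record, or None to drop it.
    Returns the number of records delivered to the sink.
    """
    delivered = 0
    for record in source:
        for transform in transforms:
            record = transform(record)
            if record is None:
                break  # record was dropped; skip the remaining transforms
        if record is not None:
            sink(record)
            delivered += 1
    return delivered


def strip_whitespace(line):
    return line.strip()


def drop_debug(line):
    # Filter out noisy debug lines before delivery.
    return None if " DEBUG " in line else line


log_lines = ["  12:00 INFO started \n", "12:01 DEBUG cache miss\n", "12:02 ERROR disk full\n"]
collected = []
count = run_pipeline(log_lines, [strip_whitespace, drop_debug], collected.append)
print(count, collected)
```

Here the source is a list and the sink is `collected.append`, but any iterable and any callable would do, which mirrors the idea of swapping in custom sources and sinks.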