Big Data 11 min read

Understanding Stream Processing, Event Sourcing, and Complex Event Processing

The article explains the fundamentals of stream processing, event sourcing, and complex event processing, comparing raw event storage with aggregated results, illustrating architectures with Kafka, Samza, and other frameworks, and highlighting benefits such as scalability, flexibility, and decoupling for modern data‑driven systems.

Art of Distributed System Architecture Design

Apr 15, 2015

Understanding Stream Processing, Event Sourcing, and Complex Event Processing

Different terms such as stream processing, event sourcing, CQRS, and Complex Event Processing (CEP) describe similar concepts of handling immutable events; this article follows Martin Kleppmann’s perspective to deepen understanding for better system design.

Stream processing originated from LinkedIn’s large‑scale data systems and is realized in open‑source projects Apache Kafka and Apache Samza; the article uses Google Analytics page‑view events as an illustrative example.

Each page‑view event is a simple immutable fact that records what happened, and these events can be stored in various ways.

Option (a) stores every incoming event in a large database, data warehouse, or Hadoop cluster, allowing queries that scan all events or large subsets for dynamic aggregation.

Option (b) stores aggregated results, such as incrementing a counter for each event or keeping multiple counters in an OLAP cube, enabling fast reads of pre‑computed values without scanning full event logs.

Storing raw events (option a) maximizes analytical flexibility, useful for offline tasks like training recommendation systems, whereas aggregated storage (option b) is advantageous for real‑time decisions such as rate‑limiting, where maintaining per‑IP counters is more efficient.

For simple aggregated updates, counters can be kept in caches like memcached or Redis with atomic increments; more complex setups may introduce an event stream, message queue, or event log, where the stream events share the same structure as the original PageViewEvent.

This architecture allows multiple consumers to process the same event data for different tasks, providing flexibility and easy scalability.

Event sourcing, a concept from domain‑driven design, focuses on how data is stored in databases; using an e‑commerce shopping‑cart example, the article shows that instead of destructive UPDATE statements, each change should be recorded as an immutable event (e.g., AddToCart, UpdateCartQuantity).

Both stream processing and event sourcing share two storage strategies: (a) raw immutable events for ideal write performance (append‑only) and (b) aggregated results for ideal read performance.

The article notes that many databases already implement an immutable write‑ahead log, which is essentially an event stream, with implementations varying across PostgreSQL, InnoDB, Oracle MVCC, CouchDB, Datomic, and LMDB.

At the application level, Martin demonstrates using Apache Kafka as a durable publish‑subscribe message broker and Apache Samza as the processing engine; other popular stream‑processing frameworks such as Storm and Spark Streaming are also mentioned.

Higher‑level stream languages like CEP enable writing queries or rules that continuously match patterns in the event stream, useful for fraud detection or business‑process monitoring.

Related concepts include full‑text search on streams, actor frameworks (Akka, Orleans, Erlang OTP) that are built on immutable event streams, reactive programming models, and change‑data‑capture (CDC) that extracts insert, update, and delete operations into an event stream.

Loose coupling – separate read and write models reduce inter‑component dependencies.

Read/write performance – append‑only writes are fast, while pre‑aggregated reads are fast.

Scalability – the simple abstraction of event streams enables parallel processing across machines.

Flexibility – immutable raw events simplify schema migrations; transformed caches can be rebuilt for new UI needs.

Error handling – immutable events can be replayed to recover from failures.

Original Source

Signed-in readers can open the original source through BestHub's protected redirect.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.

Big Data stream processing Event Sourcing Apache Kafka Event-Driven Architecture Apache Samza

Written by

Art of Distributed System Architecture Design

Introductions to large-scale distributed system architectures; insights and knowledge sharing on large-scale internet system architecture; front-end web architecture overviews; practical tips and experiences with PHP, JavaScript, Erlang, C/C++ and other languages in large-scale internet system development.

0 followers

Reader feedback

How this landed with the community

Rate this article

Was this worth your time?

Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.