Event Sourcing, CQRS, and Stream Processing with Apache Kafka
Event sourcing models state changes as immutable logs, and when combined with CQRS and Kafka Streams, it enables scalable, fault‑tolerant architectures where write and read paths are decoupled, supporting local or external state stores, interactive queries, and zero‑downtime upgrades.
Event sourcing is becoming a popular application‑architecture pattern. It models state changes as an immutable sequence of events (a "log") rather than mutating state directly; the events are stored in an immutable log and the current state is derived by reacting to those events. This article explores how stream processing—especially Kafka Streams—helps realize event sourcing and CQRS in practice.
Consider a hypothetical Facebook‑like social network. When a user updates their profile, the profile database is updated, and multiple downstream applications (search indexing, news‑feed generation, data‑warehouse ETL, etc.) need to be notified of the change.
Architecture based on event sourcing
In this model, profile‑update events are written to a central log (e.g., a Kafka topic). All applications that need to react to profile updates simply subscribe to that topic and build their own materialized views—caching, Elasticsearch indexing, in‑memory aggregates, etc. The profile service itself also consumes the same topic to persist the update to its database.
Event Sourcing: Some Trade‑offs
Event sourcing offers many advantages: a complete audit log of every state change, easier troubleshooting, built‑in audit/compliance, support for resilient applications (rollback by replaying the log), independent scaling of writes and reads, and loose coupling between services, which eases migration to micro‑service architectures. Most importantly, it enables forward‑compatible designs where new services can be added later to consume the same events.
Event sourcing supports building forward‑compatible architectures, allowing future applications to process the same events and create new materialized views.
However, there are drawbacks: a steeper learning curve for the new programming model, and more effort required to query the event log because the desired state must be materialized from events.
Kafka as the Pillar of Event Sourcing
Kafka provides a high‑performance, low‑latency, scalable, and durable log that many companies use in production. Its characteristics make it a natural backbone for storing events and driving event‑sourced applications.
Event Sourcing and CQRS
CQRS (Command‑Query Responsibility Segregation) is often paired with event sourcing. Commands update state, while queries read state without modifying it. CQRS separates write‑side concerns (business logic) from read‑side concerns (query performance, different materialized views).
Refactoring an application with event sourcing and CQRS
The write side writes events to a Kafka topic; an event handler (a Kafka Streams topology) consumes those events, transforms them, and writes materialized views to a read store. The read side then queries that store.
CQRS and the Kafka Streams API
Kafka Streams enables building the event‑handler component inside any standard Java application. Below is a code snippet that counts words using Kafka Streams.
KStreamBuilder builder = new KStreamBuilder();
KStream<String, String> textLines = builder.stream(stringSerde, stringSerde,"TextLinesTopic");
Pattern pattern = Pattern.compile("\\W+", Pattern.UNICODE_CHARACTER_CLASS);
KStream<String, Long> wordCounts = textLines
.flatMapValues(value -> Arrays.asList(pattern.split(value.toLowerCase())))
.map((key, word) -> new KeyValue<>(word, word))
.countByKey("Counts")
.toStream();
wordCounts.to(stringSerde, longSerde, "WordsWithCountsTopic");
KafkaStreams streams = new KafkaStreams(builder, streamsConfiguration);
streams.start();Two modeling options exist for the output of a Kafka Streams topology:
Approach 1: Model Application State in an External Data Store
Here the Kafka Streams topology writes its results to external storage (e.g., a relational database). The write side is modeled with Kafka Streams, while the read side uses the trusted external store.
Approach 2: Model Application State as Local State in Kafka Streams
Kafka Streams also provides built‑in local, partitioned, persistent state stores (e.g., RocksDB or in‑memory hash maps). Each instance of the application hosts a subset of the state store partitions, and Kafka Streams records all updates to a highly available Kafka topic, enabling fault‑tolerant recovery.
In practice, Kafka Streams uses Kafka as the commit log for its embedded local database, mirroring the design of traditional databases where the transaction log is the source of truth.
Interactive Queries in Kafka Streams
Future Kafka releases will allow the embedded state store to be queried directly (formerly called Queryable State). This feature makes Kafka Streams well‑suited for CQRS: the write side builds the state store, and the read side uses the StateStore API’s get() method to serve queries.
Interactive queries are optional; some applications may still prefer an external database for reads.
Cons : introduces stateful instances that require careful management; moves state away from a trusted external store.
Pros : fewer moving parts (only the app and Kafka cluster); lower latency access to local state; better isolation; flexible optimization for query patterns.
Kafka for Event Sourcing and CQRS: The Big Winner
The combination of event sourcing, CQRS, and Kafka Streams yields several benefits: zero‑downtime upgrades become simpler because each version can maintain its own state store and replay from the log if needed, and the architecture leverages Kafka’s performance, scalability, security, and reliability.
Putting It Together: Retail Inventory Application
Imagine a retail inventory service where shipments and sales are represented as events on Kafka topics. A Kafka Streams topology joins the Shipments and Sales topics to maintain an InventoryTable that reflects the current stock per store and item.
Multiple instances of the inventory service each host a shard of the InventoryTable . When a client queries the inventory, the request is routed to the instance that holds the relevant partition using the metadataForKey() API; if the local instance does not own the partition, the request is forwarded to the correct host.
For more details on interactive queries, consult the Kafka Streams documentation and the Capital One demo that showcases a REST‑based, event‑sourced, CQRS application built with Kafka Streams.
Sometimes storing state externally makes sense (e.g., when you need a trusted database). The choice depends on trade‑offs such as latency, operational complexity, and query patterns.
Conclusion
Event sourcing provides an immutable log of state changes, simplifying recovery and audit. CQRS turns those events into queryable views. Kafka Streams offers declarative, scalable processing and interactive query capabilities, enabling robust, forward‑compatible, and easily upgradable architectures on top of Apache Kafka.
Architects Research Society
A daily treasure trove for architects, expanding your view and depth. We share enterprise, business, application, data, technology, and security architecture, discuss frameworks, planning, governance, standards, and implementation, and explore emerging styles such as microservices, event‑driven, micro‑frontend, big data, data warehousing, IoT, and AI architecture.
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.