Event Sourcing, CQRS, and Stream Processing with Apache Kafka
This article explains how event sourcing models state changes as immutable logs, discusses the trade‑offs of the pattern, and shows how Apache Kafka and Kafka Streams enable CQRS and interactive queries for building scalable, resilient backend applications.
What Is Event Sourcing?
Event sourcing is an increasingly popular architectural pattern that models every state change as an immutable event stored in a log rather than updating the state directly.
For example, in a Facebook‑like social network, a profile update would generate an event that multiple downstream services (search indexing, news feed, data‑warehouse ETL) can consume.
Figure: Event‑sourcing architecture
Profile updates are written to a central log such as a Kafka topic; any service that needs to react simply subscribes to the topic and builds its own materialized view (e.g., a cache, an Elasticsearch index, or an in‑memory aggregation).
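The flow above can be illustrated with a minimal in‑memory sketch, with a plain list standing in for the Kafka topic and a map for one consumer's materialized view (all names here are illustrative, not Kafka APIs):

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class EventLogSketch {
    // One profile-update event; immutable once appended to the log.
    record ProfileUpdated(String userId, String field, String value) {}

    // Replays the full log to build a "search index" view:
    // the latest value per (user, field) pair.
    static Map<String, String> buildSearchIndex(List<ProfileUpdated> log) {
        Map<String, String> index = new HashMap<>();
        for (ProfileUpdated e : log) {
            index.put(e.userId() + ":" + e.field(), e.value());
        }
        return index;
    }

    public static void main(String[] args) {
        List<ProfileUpdated> log = new ArrayList<>(); // stands in for a Kafka topic
        log.add(new ProfileUpdated("u1", "city", "Berlin"));
        log.add(new ProfileUpdated("u1", "city", "Paris"));
        log.add(new ProfileUpdated("u2", "name", "Ada"));
        System.out.println(buildSearchIndex(log)); // the later "Paris" update wins
    }
}
```

Because the log is replayable, a brand-new consumer (say, a data‑warehouse ETL job added months later) can build its own view from the very same events.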
Event Sourcing: Some Trade‑offs
Advantages include a complete audit trail of every change (valuable both for debugging and for compliance), resilience (state can be rebuilt by replaying the log), independent scaling of the write and read paths, and loose coupling that eases migration to a micro‑service architecture.
Event sourcing supports building forward‑compatible applications, allowing new services to consume the same events without breaking existing ones.
Drawbacks are a steeper learning curve, the need for a new programming model, and additional query work because the raw event log must be transformed into queryable state.
This is only a brief introduction; a complete treatment of event sourcing is beyond the scope of this article.
Kafka as the Pillar of Event Sourcing
Kafka provides a high‑performance, low‑latency, durable log that many companies use at scale, making it a natural backbone for storing events in an event‑sourced system.
Event Sourcing and CQRS
CQRS (Command Query Responsibility Segregation) pairs naturally with event sourcing. Commands update state by writing events; queries read materialized views built from those events, enabling optimized read paths and clear separation of concerns.
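A hedged sketch of that split, again using plain Java collections in place of Kafka topics and materialized views (the class and method names are made up for illustration):

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class CqrsSketch {
    record FundsDeposited(String account, long amount) {}

    final List<FundsDeposited> eventLog = new ArrayList<>(); // write side: append-only log
    final Map<String, Long> balanceView = new HashMap<>();   // read side: materialized view

    // Command: validates input and records an event; it never mutates the view directly.
    void deposit(String account, long amount) {
        if (amount <= 0) throw new IllegalArgumentException("amount must be positive");
        FundsDeposited e = new FundsDeposited(account, amount);
        eventLog.add(e);
        apply(e); // in Kafka terms: the view is updated by consuming the event
    }

    // Event handler: folds one event into the read-side view.
    void apply(FundsDeposited e) {
        balanceView.merge(e.account(), e.amount(), Long::sum);
    }

    // Query: reads only the materialized view, never the raw log.
    long balance(String account) {
        return balanceView.getOrDefault(account, 0L);
    }
}
```

The read path can now be optimized (indexed, cached, denormalized) without touching the write path, which is the core benefit CQRS promises.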
Figure: Using event sourcing and CQRS to refactor an application
CQRS and Kafka Streams API
Kafka Streams lets you embed a stream‑processing library inside any Java application to model event handlers and build materialized views.
Example code for a word‑count topology:
// stringSerde, longSerde, and streamsConfiguration are assumed to be defined elsewhere.
KStreamBuilder builder = new KStreamBuilder();
KStream<String, String> textLines =
    builder.stream(stringSerde, stringSerde, "TextLinesTopic");
Pattern pattern = Pattern.compile("\\W+", Pattern.UNICODE_CHARACTER_CLASS);
KStream<String, Long> wordCounts = textLines
    .flatMapValues(value -> Arrays.asList(pattern.split(value.toLowerCase())))
    .map((key, word) -> new KeyValue<>(word, word))
    .countByKey("Counts")
    .toStream();
wordCounts.to(stringSerde, longSerde, "WordsWithCountsTopic");
KafkaStreams streams = new KafkaStreams(builder, streamsConfiguration);
streams.start();
Two options exist for persisting the results of a Kafka Streams topology.
Approach 1: Model Application State in an External Data Store
In this model the Streams topology writes its output to external systems such as relational databases, allowing the state to be queried outside the Streams application.
Approach 2: Model Application State as Local State in Kafka Streams
Kafka Streams also provides built‑in, partitioned, persistent local state (e.g., RocksDB or in‑memory hash maps). Each instance hosts a shard of the state store, and updates are replicated to a changelog topic for fault tolerance.
Kafka Streams uses a Kafka topic as the commit log for the embedded local store, just as a traditional database uses a transaction log.
Local state offers lower latency, fewer moving parts, better isolation, and the ability to tailor the store to the application's query patterns.
Applications using Kafka Streams do not need to worry about fault‑tolerance, availability, or scaling of the state store—it is handled transparently.
The embedded, partitioned, persistent state is exposed to the application via the KTable abstraction.
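The changelog mechanism can be sketched in plain Java: every update to the local store is also appended to a changelog (a compacted Kafka topic in the real system), so a restarted or relocated instance can rebuild its shard of the state by replaying it. All names here are illustrative, not Kafka Streams APIs:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class ChangelogStoreSketch {
    record Change(String key, long value) {}

    final Map<String, Long> localStore = new HashMap<>(); // e.g. RocksDB in Kafka Streams
    final List<Change> changelog;                         // stands in for the changelog topic

    ChangelogStoreSketch(List<Change> changelog) {
        this.changelog = changelog;
    }

    // Every write goes to the local store AND is replicated to the changelog.
    void put(String key, long value) {
        localStore.put(key, value);
        changelog.add(new Change(key, value));
    }

    // Fault tolerance: a fresh instance restores its shard by replaying the changelog.
    static ChangelogStoreSketch restore(List<Change> changelog) {
        ChangelogStoreSketch s = new ChangelogStoreSketch(new ArrayList<>(changelog));
        for (Change c : changelog) {
            s.localStore.put(c.key(), c.value()); // last write per key wins
        }
        return s;
    }
}
```

This is the sense in which the changelog topic acts as the store's commit log: the store itself is disposable and can always be reconstructed from it.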
Interactive Queries in Kafka Streams
Interactive Queries (introduced in Kafka 0.10.1, originally proposed under the name Queryable State) allow the embedded state store to be queried directly, making it a natural fit for CQRS‑style architectures.
Read‑side services obtain a read‑only handle on the state store through the KafkaStreams store() API and serve queries against the materialized view with lookups such as get().
Interactive Queries in Kafka Streams: Trade‑offs
Using interactive queries is optional; some applications may still prefer an external database for durability or operational reasons.
Pros of local state:
Fewer moving parts – only the application and Kafka cluster.
Low‑latency access because the state is local (in‑memory or SSD).
Better isolation – a misbehaving application cannot overwhelm a shared central store.
Flexibility to optimize the store for the specific query pattern.
Cons:
The application becomes stateful and must be deployed and managed with more care.
State moves out of familiar, battle‑tested external databases, which some operations teams may be reluctant to accept.
Using Kafka for Event Sourcing and CQRS: The Big Winner
Combining event sourcing with CQRS on Kafka simplifies non‑stop upgrades: new instances can be deployed with a different application ID, process the same log, and traffic can be shifted gradually without risking state corruption.
Putting It Together: Retail Inventory Application
An example shows how a retail inventory service can ingest Shipments and Sales events from Kafka topics, join them in a Streams topology, and maintain an InventoryTable that reflects the current stock per store and item.
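The join logic can be sketched in plain Java: shipments add to stock and sales subtract from it, keyed by (store, item). In the real application this would be a Kafka Streams topology producing the InventoryTable as a KTable; the event shapes below are illustrative:

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class InventorySketch {
    record Shipment(String store, String item, long qty) {}
    record Sale(String store, String item, long qty) {}

    // Folds both event streams into a stock-level table keyed by "store/item".
    static Map<String, Long> inventory(List<Shipment> shipments, List<Sale> sales) {
        Map<String, Long> table = new HashMap<>();
        for (Shipment s : shipments) {
            table.merge(s.store() + "/" + s.item(), s.qty(), Long::sum);
        }
        for (Sale s : sales) {
            table.merge(s.store() + "/" + s.item(), -s.qty(), Long::sum);
        }
        return table;
    }
}
```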
When a client queries the inventory, the request is routed to the instance that holds the relevant partition using the metadataForKey() API; if the local instance does not own the partition, the request is forwarded to the correct host.
Interactive queries can serve the current inventory directly from the local state, or the result can be written to an external database for downstream consumption.
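The routing step can be sketched as follows: the instance that receives a query computes the partition for the key the same way the producer did, looks up which host owns that partition (the information metadataForKey() returns in Kafka Streams), and forwards the request if the partition is not local. The partitioner here is a simplified stand‑in for Kafka's default one, and the host names are made up:

```java
import java.util.List;

public class QueryRoutingSketch {
    // Simplified stand-in for Kafka's default (murmur2-based) partitioner:
    // the same key always maps to the same partition.
    static int partitionFor(String key, int numPartitions) {
        return Math.floorMod(key.hashCode(), numPartitions);
    }

    // One host per partition for simplicity; returns the instance that
    // holds the shard of state for this key.
    static String ownerOf(String key, List<String> hosts) {
        return hosts.get(partitionFor(key, hosts.size()));
    }
}
```

A query for "store1/item42" arriving at any instance is answered locally if that instance is the owner, and proxied to the owning host otherwise.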
Concluding Thoughts
Event sourcing provides a lossless log of all state changes, making recovery simple and efficient; CQRS turns those raw events into queryable views; and Kafka Streams supplies the declarative, scalable processing and query layer needed to build robust, forward‑compatible backend systems.