
Understanding Kafka: From Message Engine to Distributed Stream Processing Platform

This article traces Kafka's evolution: the introduction of Kafka Streams, the shift to a full distributed stream processing platform, practical learning paths, tips for reading the source code, common errors, and the major new features in Kafka 3.0.

Big Data Technology & Architecture

Strictly speaking, this article was not written today; it is compiled from several earlier posts that discuss Kafka's evolution.

Kafka 3.0, released on September 21, 2021, further strengthened the effort to make Kafka a distributed stream processing platform. The shift itself started much earlier: once Kafka stopped limiting itself to being a message engine, the Kafka community introduced the stream processing component Kafka Streams in version 0.10.0.0, marking Kafka's transformation from a messaging system into a distributed stream processing platform.

Kafka is not only a messaging engine but also a distributed stream processing platform.

In certain scenarios, you can use Kafka Streams in place of engines such as Flink or Spark and get your data processing done more simply.
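To make this concrete, here is a minimal plain-Java sketch of the kind of stateful aggregation a Kafka Streams application performs: each incoming line of text is split into words, re-keyed by word, and counted into a continuously updated table. It uses only the standard library so it runs without a broker; in Kafka Streams itself the same pipeline would be written with StreamsBuilder, KStream#flatMapValues, groupBy and count, with the counts kept in a fault-tolerant state store. The names WordCountSketch, process and snapshot are hypothetical and only mimic those semantics.

```java
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

public class WordCountSketch {
    // Stands in for the state store behind a KTable<String, Long>:
    // word -> running count, updated as each record arrives.
    private final Map<String, Long> counts = new LinkedHashMap<>();

    // Processes one input record (a line of text), like one record read from an input topic.
    // Returns only the (word, count) pairs that changed, similar to the updates a KTable emits.
    public Map<String, Long> process(String line) {
        Map<String, Long> updates = new LinkedHashMap<>();
        for (String word : line.toLowerCase(Locale.ROOT).split("\\W+")) {
            if (word.isEmpty()) {
                continue; // split() yields an empty leading token when the line starts with a separator
            }
            long updated = counts.merge(word, 1L, Long::sum);
            updates.put(word, updated);
        }
        return updates;
    }

    // Returns a copy of the current table contents.
    public Map<String, Long> snapshot() {
        return new LinkedHashMap<>(counts);
    }

    public static void main(String[] args) {
        WordCountSketch app = new WordCountSketch();
        List<String> input = Arrays.asList("Kafka Streams", "streams of events", "Kafka");
        for (String line : input) {
            System.out.println(line + " -> " + app.process(line));
        }
        System.out.println("final table: " + app.snapshot());
    }
}
```

Returning only the changed counts from process mirrors the idea that a streaming table is updated incrementally, record by record, rather than recomputed in batches.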

So, what should we focus on when learning Kafka?

What are we actually learning when we study Kafka?

The article "What Are We Actually Learning When We Study Kafka?" summarizes the overall learning method and path, covering background, core concepts, core principles, source code reading, and practical applications. Readers should choose the part that best matches their own situation.

This part places particular emphasis on the Kafka Streams module. The arrival of Kafka Streams changed Kafka's positioning from a distributed, partitioned, replicated commit-log service to a complete distributed messaging engine plus stream processing engine.
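The "partitioned commit log" half of that description can be illustrated with a small in-memory model: each partition is an append-only list, a record's offset is its position in that list, and records with the same key always land in the same partition, which is what gives Kafka per-key ordering. This is a hypothetical sketch, not Kafka code: the partitioner uses String.hashCode where Kafka's default partitioner applies murmur2 to the serialized key, and replication across brokers is left out entirely.

```java
import java.util.ArrayList;
import java.util.List;

public class PartitionedLogSketch {
    // One append-only list per partition; a record's offset is its index in that list.
    private final List<List<String>> partitions = new ArrayList<>();

    public PartitionedLogSketch(int numPartitions) {
        for (int i = 0; i < numPartitions; i++) {
            partitions.add(new ArrayList<>());
        }
    }

    // Hypothetical partitioner: the same key always maps to the same partition,
    // which preserves per-key ordering. (Kafka's default uses murmur2 on the key bytes.)
    public int partitionFor(String key) {
        return Math.floorMod(key.hashCode(), partitions.size());
    }

    // Appends a record to the end of its partition and returns the assigned offset.
    public long append(String key, String value) {
        List<String> log = partitions.get(partitionFor(key));
        log.add(key + "=" + value);
        return log.size() - 1;
    }

    // Reads everything from a partition starting at an offset, as a consumer would.
    // The sketch assumes offsets fit in an int.
    public List<String> read(int partition, long fromOffset) {
        List<String> log = partitions.get(partition);
        return new ArrayList<>(log.subList((int) fromOffset, log.size()));
    }

    public static void main(String[] args) {
        PartitionedLogSketch log = new PartitionedLogSketch(3);
        long first = log.append("user-1", "login");
        long second = log.append("user-1", "logout");
        int p = log.partitionFor("user-1");
        System.out.println("partition " + p + ", offsets " + first + " and " + second);
        System.out.println("replay from offset 0: " + log.read(p, 0));
    }
}
```

Because reads never remove anything, any consumer can replay a partition from an earlier offset, which is the property stream processing builds on.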

Some Tips for Reading Kafka Source Code

If you already have a basic understanding of Kafka and have used it in simple applications, reading the source code is an indispensable next step.

Kafka's codebase exceeds 500,000 lines, far too much to read in full, so focus on the most important parts instead. The article "Some Tips for Reading Kafka Source Code" provides a fairly complete outline for exploring the source.

Figure: Kafka source code outline (for reference).

Kafka Common Error Collection

This section points to a collection of common Kafka errors for reference:

30 Common Kafka Errors Collection

Kafka 3.0 Arrives

The last part highlights important updates in Kafka 3.0.

Apache Kafka 3.0 is a major release that introduces many new features, breaking API changes, and improvements to KRaft, the built-in consensus mechanism that will replace Apache ZooKeeper.

Although KRaft is not yet recommended for production use in 3.0, its metadata handling and APIs received numerous enhancements; the most notable are support for exactly-once semantics and for partition reassignment. Users are encouraged to explore KRaft's new capabilities in a development environment.
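For trying KRaft in a development environment, a single node can act as both broker and controller. The sketch below collects a few of the settings involved (process.roles, node.id, controller.quorum.voters, listeners, controller.listener.names) in a java.util.Properties object, the same key=value format a server.properties file uses, and adds a hypothetical sanity check that a node acting as a controller lists its own node.id among the voters. Host names and ports are placeholder assumptions; consult the KRaft sample configuration shipped with your Kafka version for the full set, and remember that the storage directory must be formatted with kafka-storage.sh before the first start.

```java
import java.util.Arrays;
import java.util.Properties;

public class KRaftConfigSketch {
    // A few settings for a single-node, combined broker+controller KRaft dev setup.
    public static Properties devConfig() {
        Properties props = new Properties();
        props.setProperty("process.roles", "broker,controller");
        props.setProperty("node.id", "1");
        props.setProperty("controller.quorum.voters", "1@localhost:9093");
        props.setProperty("listeners", "PLAINTEXT://localhost:9092,CONTROLLER://localhost:9093");
        props.setProperty("controller.listener.names", "CONTROLLER");
        return props;
    }

    // Hypothetical check: a node whose roles include "controller" must appear in
    // controller.quorum.voters, whose entries have the form id@host:port.
    public static boolean controllerIsVoter(Properties props) {
        boolean isController = Arrays.asList(props.getProperty("process.roles", "").split(","))
                .contains("controller");
        if (!isController) {
            return true; // a pure broker does not need to be a voter
        }
        String nodeId = props.getProperty("node.id");
        return Arrays.stream(props.getProperty("controller.quorum.voters", "").split(","))
                .map(voter -> voter.split("@")[0].trim())
                .anyMatch(id -> id.equals(nodeId));
    }

    public static void main(String[] args) {
        Properties props = devConfig();
        System.out.println("controller is voter: " + controllerIsVoter(props));
    }
}
```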

Starting with Kafka 3.0, producers default to the strongest delivery guarantees (acks=all, enable.idempotence=true), providing ordering and durability out of the box.
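The sketch below spells those settings out for a producer. Kafka 3.0 clients already apply them by default, but setting them explicitly documents the intent and keeps the same behaviour on pre-3.0 clients, whose defaults were acks=1 and enable.idempotence=false. With the kafka-clients library on the classpath, the returned Properties would be passed to new KafkaProducer<>(props), and the string keys could be replaced by constants such as ProducerConfig.ACKS_CONFIG. The broker address and the String serializers are assumptions for illustration.

```java
import java.util.Properties;

public class ProducerDefaultsSketch {
    // Producer settings matching the Kafka 3.0 defaults, made explicit.
    public static Properties producerProps(String bootstrapServers) {
        Properties props = new Properties();
        props.setProperty("bootstrap.servers", bootstrapServers);
        // Wait until all in-sync replicas have the write before acknowledging it (durability).
        props.setProperty("acks", "all");
        // Let the broker discard duplicate retried batches, avoiding duplicates and reordering.
        props.setProperty("enable.idempotence", "true");
        props.setProperty("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.setProperty("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        return props;
    }

    public static void main(String[] args) {
        Properties props = producerProps("localhost:9092");
        props.forEach((key, value) -> System.out.println(key + "=" + value));
    }
}
```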

The release also includes improvements to restarting Kafka Connect tasks, better timestamp synchronization in Kafka Streams, and more flexible configuration options for MirrorMaker 2.
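As an example of the Connect change: Kafka 3.0 (KIP-745) lets a single REST call restart a connector together with its tasks, optionally only the failed ones, via POST /connectors/{name}/restart with the includeTasks and onlyFailed query parameters. The sketch builds that request with the JDK's java.net.http client (Java 11+) but does not send it; the worker URL http://localhost:8083 and the connector name my-connector are placeholders, and the name is assumed to be URL-safe.

```java
import java.net.URI;
import java.net.http.HttpRequest;

public class ConnectRestartSketch {
    // Builds (but does not send) the request that restarts a connector and its tasks.
    // onlyFailed=true restricts the restart to connector/task instances that have failed.
    public static HttpRequest restartRequest(String connectUrl, String connectorName, boolean onlyFailed) {
        URI uri = URI.create(connectUrl + "/connectors/" + connectorName
                + "/restart?includeTasks=true&onlyFailed=" + onlyFailed);
        return HttpRequest.newBuilder(uri)
                .POST(HttpRequest.BodyPublishers.noBody())
                .build();
    }

    public static void main(String[] args) {
        HttpRequest request = restartRequest("http://localhost:8083", "my-connector", true);
        System.out.println(request.method() + " " + request.uri());
        // To send it: HttpClient.newHttpClient().send(request, HttpResponse.BodyHandlers.ofString())
    }
}
```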

Additionally, support for Java 8 is deprecated in Kafka 3.0 and scheduled for removal in Kafka 4.0, so plan an upgrade of the JDK.
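A quick way to check which Java version a service runs on before upgrading is to read the java.specification.version system property, which Java 8 reports as "1.8" and later versions report as a plain major number such as "11" or "17". In this sketch the threshold of 11 is an assumed policy rather than an official requirement; check the upgrade notes of the Kafka version you are targeting.

```java
public class JdkVersionCheck {
    // Java 8 and earlier report "1.x" (e.g. "1.8"); Java 9+ report just the major number.
    public static int majorVersion(String specVersion) {
        return specVersion.startsWith("1.")
                ? Integer.parseInt(specVersion.substring(2))
                : Integer.parseInt(specVersion);
    }

    // Assumed policy: flag anything older than Java 11 for an upgrade.
    public static boolean needsUpgrade(int major) {
        return major < 11;
    }

    public static void main(String[] args) {
        int major = majorVersion(System.getProperty("java.specification.version"));
        System.out.println("Java " + major + (needsUpgrade(major) ? ": plan a JDK upgrade" : ": OK"));
    }
}
```

The property is read rather than calling Runtime.version(), because that method does not exist on Java 8, the very version the check is meant to catch.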

For full details, see the official Apache Kafka 3.0 release notes.

---

Hi, I am Wang Zhiwu, an author of in-depth original content in the big-data field. I have worked on backend architecture, data middleware, data platforms and architecture, and algorithm engineering. I write about the latest developments in real-time big data, technical improvement, personal growth, and career advancement. Feel free to follow.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact admin@besthub.dev and we will review it promptly.

Tags: Distributed Systems, Big Data, stream processing, Kafka, kafka streams, source code reading