Understanding Kafka Messages, Topics, Partitions, and Consumers
This article explains Kafka's core concepts: messages as byte arrays, optional keys for partition control, topic and partition organization, producer and consumer roles, offsets, consumer groups, and broker clusters. It is a concise technical overview for developers learning Kafka.
The unit of data in Kafka is called a message, comparable to a row or record in a relational database. To Kafka, a message is just a byte array; it does not interpret the content.
A message may carry an optional piece of metadata called a key. The key is also a byte array that Kafka does not interpret. It is loosely comparable to a primary key, although keys need not be unique; its main use is to control which partition a message is written to.
The simplest scheme is to compute a deterministic hash of the key and take it modulo the number of partitions to select the partition. Messages with the same key then always go to the same partition, as long as the number of partitions does not change.
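The hash-and-mod scheme described above can be sketched in a few lines. This is only an illustration: `choose_partition` is a hypothetical helper, and CRC32 is used as a stand-in hash because it is deterministic across runs. Kafka's own default partitioner in the Java client uses a different hash function (murmur2).

```python
import zlib


def choose_partition(key: bytes, num_partitions: int) -> int:
    """Map a key to a partition index using hash(key) mod num_partitions."""
    if num_partitions <= 0:
        raise ValueError("num_partitions must be positive")
    # CRC32 is an assumption for this sketch, not Kafka's actual hash.
    # Python's built-in hash() is avoided because it is salted per process.
    return zlib.crc32(key) % num_partitions


# The same key maps to the same partition on every call.
p1 = choose_partition(b"user-42", 6)
p2 = choose_partition(b"user-42", 6)
```

Note that the mapping depends on `num_partitions`, so adding partitions to a topic can move a key to a different partition.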
Because a message is just a byte array, developers define a message schema to serialize and deserialize it. Common simple schemas include JSON and XML.
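A consistent data format matters in Kafka because it decouples producers from consumers: each side can be developed and deployed independently as long as both agree on the schema. Without that agreement, a format change on one side breaks the other.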
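As a rough sketch of what "defining a schema" means in practice, the snippet below turns a dictionary into the byte array Kafka would store and back again, using JSON. The `serialize` and `deserialize` helpers and the field names are hypothetical and only show the round trip; they are not part of any Kafka client API.

```python
import json


def serialize(event: dict) -> bytes:
    """Encode an event as UTF-8 JSON bytes, the form a producer would send."""
    return json.dumps(event, sort_keys=True).encode("utf-8")


def deserialize(payload: bytes) -> dict:
    """Decode UTF-8 JSON bytes back into an event, as a consumer would."""
    return json.loads(payload.decode("utf-8"))


event = {"user_id": 42, "action": "login"}
payload = serialize(event)       # opaque bytes as far as Kafka is concerned
restored = deserialize(payload)  # only the consumer gives them meaning again
```

Both sides must agree on the encoding and the field names; Kafka itself only ever sees `payload`.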
Messages are categorized by topics, which are similar to tables in a database or folders in a filesystem. A topic can be split into multiple partitions, each of which is a single commit log. Messages are appended to the end of a partition and read in the order they were written (FIFO).
Since a topic may have several partitions, global ordering cannot be guaranteed, but ordering is preserved within a single partition.
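The following toy model illustrates why ordering holds per partition but not across a topic. It is a sketch under stated assumptions: the `Topic` class is hypothetical, partitions are plain in-memory lists, and CRC32 stands in for the real partitioning hash.

```python
import zlib


class Topic:
    """A toy topic: each partition is an append-only list."""

    def __init__(self, name: str, num_partitions: int):
        self.name = name
        self.partitions = [[] for _ in range(num_partitions)]

    def append(self, key: bytes, value: bytes):
        """Append value to the partition chosen by the key.

        Returns (partition index, position of the value within that partition).
        """
        partition = zlib.crc32(key) % len(self.partitions)
        log = self.partitions[partition]
        log.append(value)
        return partition, len(log) - 1


topic = Topic("orders", 3)
# Three messages with the same key land in one partition, in send order.
positions = [topic.append(b"customer-a", f"a{i}".encode()) for i in range(3)]
# A message with another key may land in a different partition.
topic.append(b"customer-b", b"b0")
partition_a = positions[0][0]
```

Within `partition_a` the three `customer-a` messages keep their relative order, but nothing relates their order to messages stored in other partitions.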
Kafka clients come in two basic types: producers and consumers. Producers create messages and publish them to a specific topic. By default a producer spreads messages across all partitions of the topic; when a message has to land on a particular partition, the producer can direct it there with a key and a partitioner.
Consumers subscribe to one or more topics and read messages in order, using offsets to track what has been consumed.
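An offset is a monotonically increasing integer that Kafka assigns to each message as it is appended to a partition; it is unique within that partition. A consumer stores the offset of the next message to read for each partition, so it can resume where it left off after a restart.
Multiple consumers can form a consumer group, in which each partition is read by only one consumer in the group. The members of the group therefore divide the partitions of a topic among themselves.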
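Resuming from a stored offset can be illustrated with a small in-memory model. This is a sketch only: `SimpleConsumer` is a hypothetical class, the partition is a Python list whose indexes play the role of offsets, and the "stored" offset is just a variable rather than durable storage.

```python
class SimpleConsumer:
    """Reads a partition (modelled as a list) starting at a given offset."""

    def __init__(self, log, start_offset=0):
        self.log = log
        self.position = start_offset  # offset of the next message to read

    def poll(self, max_messages=2):
        """Return up to max_messages messages and advance the position."""
        batch = self.log[self.position:self.position + max_messages]
        self.position += len(batch)
        return batch


log = ["m0", "m1", "m2", "m3", "m4"]

first = SimpleConsumer(log)
batch1 = first.poll()
committed = first.position  # the next offset, saved before shutting down

# A restarted consumer picks up at the saved offset instead of offset 0.
restarted = SimpleConsumer(log, start_offset=committed)
batch2 = restarted.poll()
```

Because the saved value is the next offset rather than the last one read, the restarted consumer neither skips nor repeats a message in this model.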
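The one-consumer-per-partition rule can be sketched as a simple round-robin assignment. The `assign_partitions` function is a hypothetical illustration and not the assignment strategy of any particular Kafka client; it only shows that every partition ends up with exactly one owner in the group.

```python
def assign_partitions(partitions, consumers):
    """Give each partition to exactly one consumer, round-robin."""
    if not consumers:
        raise ValueError("a consumer group needs at least one member")
    assignment = {consumer: [] for consumer in consumers}
    for i, partition in enumerate(sorted(partitions)):
        owner = consumers[i % len(consumers)]
        assignment[owner].append(partition)
    return assignment


# Four partitions shared by a group of three consumers.
assignment = assign_partitions(range(4), ["c1", "c2", "c3"])
owned = sorted(p for parts in assignment.values() for p in parts)
```

A consumer may own several partitions, but no partition appears under two consumers. If the group has more members than the topic has partitions, the extra members receive nothing in this model.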
A single Kafka server is called a broker. A broker receives messages from producers, persists them to disk, and serves them to consumers. Depending on the hardware, a single broker can handle thousands of partitions and millions of messages per second.
Brokers form a cluster in which one broker acts as the cluster controller. The controller assigns partitions to brokers and monitors brokers for failures. Each partition is owned by one broker, the leader of that partition; other brokers can hold follower replicas of the same partition.
Conclusion: This is a personal technical note on Kafka basics; corrections are welcome.
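The leader and follower roles of a partition can be pictured with a deliberately simplified model. The `pick_leader` function below is hypothetical: it takes the first live broker in the replica list as leader, whereas real Kafka chooses leaders from the in-sync replicas through the controller.

```python
def pick_leader(replicas, alive_brokers):
    """Return the first replica whose broker is alive, or None if none is."""
    for broker_id in replicas:
        if broker_id in alive_brokers:
            return broker_id
    return None


# Partition replicated on brokers 1, 2 and 3.
replicas = [1, 2, 3]
leader_before = pick_leader(replicas, alive_brokers={1, 2, 3})
# Broker 1 fails; one of the followers takes over as leader.
leader_after = pick_leader(replicas, alive_brokers={2, 3})
```

The point of the sketch is only that a partition has one leader at a time and that a follower replica can take over when the leader's broker goes away.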
Architecture Digest
Focusing on Java backend development, covering application architecture from top-tier internet companies (high availability, high performance, high stability), big data, machine learning, Java architecture, and other popular fields.