Kafka Architecture and Core Concepts: Producers, Brokers, and Consumers
This article explains Kafka's fundamental architecture, including the roles of producers, brokers, and consumers, key concepts such as topics, partitions, replicas, ISR, and controller, as well as detailed mechanisms of producer client structure, interceptors, serializers, partitioners, and consumer group rebalancing strategies.
Kafka is a distributed streaming platform whose architecture consists of multiple producers, brokers, and consumers, coordinated by a ZooKeeper cluster. Topics are logical containers that are split into partitions, each replicated across brokers for fault tolerance. Replicas follow a leader‑follower model, and the set of in‑sync replicas (ISR) represents the replicas that are sufficiently up‑to‑date.
Two important offsets are the High Watermark (HW), which marks the highest offset that all ISR replicas have persisted, and the Log End Offset (LEO), the offset of the next message to be written.
Producer Client Structure
The producer runs two threads: the main thread creates ProducerRecord objects and buffers them in a RecordAccumulator, while a dedicated Sender thread pulls batches from the accumulator and sends them to the appropriate broker leader.
Before sending, the producer may invoke interceptors (implementing org.apache.kafka.clients.producer.ProducerInterceptor) via onSend, onAcknowledgement, and close methods. Interceptors can filter or modify records and perform lightweight statistics.
public ProducerRecord<K, V> onSend(ProducerRecord<K, V> record);
public void onAcknowledgement(RecordMetadata metadata, Exception exception);
public void close();Serializers (implementing org.apache.kafka.common.serialization.Serializer) convert objects to byte arrays. The required methods are configure, serialize, and close. An example StringSerializer is shown below.
public class StringSerializer implements Serializer<String> {
private String encoding = "UTF8";
@Override
public void configure(Map<String, ?> configs, boolean isKey) { /* ... */ }
@Override
public byte[] serialize(String topic, String data) { /* ... */ }
@Override
public void close() { /* nothing to do */ }
}The default partitioner (
org.apache.kafka.clients.producer.internals.DefaultPartitioner) implements the Partitioner interface with partition and close methods, assigning partitions based on key hash or round‑robin when the key is null.
public int partition(String topic, Object key, byte[] keyBytes,
Object value, byte[] valueBytes, Cluster cluster);
public void close();Broker Request Processing
Each broker runs a SocketServer that accepts client connections. An Acceptor thread distributes inbound requests to a pool of network threads (default 3). Network threads enqueue requests for an I/O thread pool (default 8) which performs the actual logic, such as writing produce requests to the log or fetching messages.
The Purgatory component holds delayed requests, e.g., a produce request with acks=all waits until all ISR replicas acknowledge.
Controller
One broker is elected as the controller by creating the /controller znode in ZooKeeper. The controller manages cluster metadata, topic creation, partition reassignment, preferred leader election, and broker membership changes. Its epoch ( controller_epoch) increments on each election to guarantee uniqueness.
Consumer Groups
Consumers belong to a consumer group; each partition is consumed by only one member of the group. Group coordination is handled by the GroupCoordinator, which performs rebalancing when membership, subscription, or partition counts change.
Kafka provides several partition assignment strategies:
RangeAssignor : divides partitions evenly based on sorted consumer IDs.
RoundRobinAssignor : assigns partitions in a round‑robin fashion across consumers.
StickyAssignor : aims for balanced distribution while preserving previous assignments to reduce movement.
Rebalancing consists of a JoinGroup phase (members send subscriptions), leader election, strategy selection, and a SyncGroup phase where the leader distributes the final assignment.
Key consumer configuration parameters affecting rebalancing are session.timeout.ms (default 10 s), heartbeat.interval.ms, and max.poll.interval.ms (default 5 min). Tuning these values can prevent unnecessary rebalances caused by missed heartbeats or long processing times.
The consumer group state machine includes states Empty, Dead, PreparingRebalance, CompletingRebalance, and Stable, with transitions driven by member joins/leaves and coordinator actions.
Overall, the article provides a comprehensive overview of Kafka's core components, data flow, and client‑side mechanisms essential for building reliable streaming applications.
Signed-in readers can open the original source through BestHub's protected redirect.
This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.
Big Data Technology & Architecture
Wang Zhiwu, a big data expert, dedicated to sharing big data technology.
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.
