
Understanding Kafka: Core Concepts, Architecture, and Reliability Explained

This article provides a comprehensive overview of Kafka: its overall architecture; key components such as brokers, producers, consumers, topics, partitions, replicas, and ZooKeeper; logical and physical storage mechanisms; producer and consumer workflows and their configuration parameters; partition assignment strategies and rebalancing; and the replication model that ensures data reliability.

Sanyou's Java Diary

Kafka Overall Architecture

Kafka decouples systems, smooths traffic spikes, and enables asynchronous communication, making it ideal for activity tracking, messaging, metrics, logging, and stream processing.

Kafka overall structure

Key Components

Broker : A Kafka instance; multiple brokers form a cluster.

Producer : Writes messages to brokers.

Consumer : Reads messages from brokers.

Consumer Group : A set of consumers that jointly consume a topic; each partition is assigned to at most one consumer in the group.

ZooKeeper : Manages cluster metadata and controller election (replaced by the built‑in KRaft consensus layer in newer Kafka versions).

Topic : Logical categorization of messages.

Partition : Subdivision of a topic for scalability and fault tolerance.

Replica : Copies of a partition for durability.

Leader and Follower : Leader handles reads/writes; followers replicate the leader.

Offset : Unique position of a message within a partition.

Logical Storage Model

Kafka stores data as an append‑only log, improving write performance. Each partition consists of multiple log segments, enabling efficient cleanup. Offsets guarantee ordering within a partition but not across partitions.

Kafka logical storage
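The model above can be sketched with a toy append‑only log in plain Java (an illustration, not Kafka's actual storage code): each partition is an independent list, and a record's offset is simply the position at which it is appended.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy model of per-partition append-only logs: offsets are assigned
// sequentially within a partition, never across partitions.
public class ToyLog {
    private final Map<Integer, List<String>> partitions = new HashMap<>();

    // Append a record to a partition and return its offset.
    public long append(int partition, String record) {
        List<String> log = partitions.computeIfAbsent(partition, p -> new ArrayList<>());
        log.add(record);       // records are only ever appended
        return log.size() - 1; // offset = position within this partition
    }

    public static void main(String[] args) {
        ToyLog log = new ToyLog();
        System.out.println(log.append(0, "a")); // offset 0 in partition 0
        System.out.println(log.append(0, "b")); // offset 1 in partition 0
        System.out.println(log.append(1, "c")); // offset 0 in partition 1: an independent sequence
    }
}
```

Note how partition 1 starts again at offset 0: the two partitions carry no ordering relationship to each other.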

Writing Data (Producer Flow)

The producer workflow includes creating a record, applying interceptors, serialization, partition selection, batching in RecordAccumulator , sending requests, handling back‑pressure, and cleaning up resources. Important parameters include buffer.memory , batch.size , linger.ms , and max.block.ms .

Producer workflow
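The batching‑related parameters are plain producer configuration entries. A minimal sketch using java.util.Properties — the broker address and all values here are illustrative, not recommendations:

```java
import java.util.Properties;

public class ProducerConfigSketch {
    public static Properties batchingConfig() {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // assumed broker address
        props.put("buffer.memory", "33554432"); // 32 MB RecordAccumulator buffer
        props.put("batch.size", "16384");       // max bytes per per-partition batch
        props.put("linger.ms", "5");            // wait up to 5 ms to fill a batch
        props.put("max.block.ms", "60000");     // how long send() may block when the buffer is full
        return props;
    }

    public static void main(String[] args) {
        System.out.println(batchingConfig());
    }
}
```

Raising linger.ms trades a little latency for larger batches and better throughput; max.block.ms is the back‑pressure knob mentioned above.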

Send Modes

Fire‑and‑forget (lowest latency, lowest reliability).

Sync (wait for broker acknowledgment, highest reliability).

Async (callback after acknowledgment).
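The three modes differ only in how the caller treats the future returned by send(). A sketch using CompletableFuture as a stand‑in for the producer's Future<RecordMetadata> — stubSend is a hypothetical stub, not a Kafka API:

```java
import java.util.concurrent.CompletableFuture;

public class SendModes {
    // Hypothetical stand-in for producer.send(record): completes with an "offset".
    static CompletableFuture<Long> stubSend(String record) {
        return CompletableFuture.completedFuture((long) record.length());
    }

    public static void main(String[] args) throws Exception {
        // 1. Fire-and-forget: ignore the future; fastest, no delivery guarantee.
        stubSend("msg-1");

        // 2. Sync: block on the future; an exception here means the send failed.
        long offset = stubSend("msg-2").get();
        System.out.println("sync offset: " + offset);

        // 3. Async: register a callback invoked once the broker acknowledges.
        stubSend("msg-3").whenComplete((off, err) -> {
            if (err != null) System.err.println("send failed: " + err);
            else System.out.println("async offset: " + off);
        });
    }
}
```

The async style keeps the producer thread busy batching while still surfacing per‑record failures in the callback.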

Acknowledgment Settings (acks)

acks=0 : No acknowledgment; lowest latency, but messages can be lost silently.

acks=1 (default before Kafka 3.0): Leader acknowledgment only; data is lost if the leader fails before followers replicate.

acks=all or -1 (default since Kafka 3.0): All in‑sync replicas must acknowledge; strongest durability.
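As a sketch, a reliability‑oriented producer configuration might combine acks=all with retries; the values are illustrative, and since acks=all is only as strong as the current ISR, it is usually paired with a topic‑level min.insync.replicas:

```java
import java.util.Properties;

public class AcksConfigSketch {
    public static Properties reliableConfig() {
        Properties props = new Properties();
        props.put("acks", "all");  // every in-sync replica must acknowledge
        props.put("retries", "3"); // retry transient failures (illustrative value)
        // acks=all only guarantees as much as the ISR contains: set the
        // topic-level min.insync.replicas (e.g. 2) so the leader rejects
        // writes when too few replicas are in sync.
        return props;
    }

    public static void main(String[] args) {
        System.out.println(reliableConfig().getProperty("acks"));
    }
}
```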

Reading Data (Consumer Flow)

Consumers use a pull‑based model, repeatedly invoking poll() to fetch records.

```java
while (true) {
    // poll() blocks up to the given timeout waiting for new records.
    ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(100));
    for (ConsumerRecord<String, String> record : records) {
        // process record
    }
}
```

Offset Commit

After processing, consumers commit the next offset to be read (e.g., 9528 after processing up to 9527 ). Automatic commits can cause duplicate processing (the consumer crashes after processing but before the commit) or data loss (the commit happens before processing finishes).

Offset commit illustration
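The commit‑the‑next‑offset convention can be shown with a toy tracker in plain Java (an illustration, not the consumer API):

```java
import java.util.HashMap;
import java.util.Map;

public class OffsetTracker {
    private final Map<Integer, Long> committed = new HashMap<>();

    // Record that everything up to and including lastProcessed is done:
    // the committed value is the NEXT offset to read, hence the +1.
    public void commit(int partition, long lastProcessed) {
        committed.put(partition, lastProcessed + 1);
    }

    // Where a restarted consumer resumes reading this partition.
    public long resumeFrom(int partition) {
        return committed.getOrDefault(partition, 0L);
    }

    public static void main(String[] args) {
        OffsetTracker tracker = new OffsetTracker();
        tracker.commit(0, 9527);                   // processed up to offset 9527
        System.out.println(tracker.resumeFrom(0)); // prints 9528
    }
}
```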

Partition Assignment Strategies

Range: Assigns contiguous partitions per consumer.

RoundRobin: Distributes partitions evenly in a round‑robin fashion.

Sticky: Tries to keep previous assignments while balancing load.
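Range and round‑robin are easy to sketch for a single topic (toy versions; the real assignors handle multiple topics and, in the sticky case, prior assignments):

```java
import java.util.ArrayList;
import java.util.List;

// Toy single-topic versions of the range and round-robin assignors.
public class Assignors {
    // Range: contiguous chunks; the first (partitions % consumers)
    // consumers each receive one extra partition.
    public static List<List<Integer>> range(int partitions, int consumers) {
        List<List<Integer>> out = new ArrayList<>();
        int base = partitions / consumers, extra = partitions % consumers, p = 0;
        for (int c = 0; c < consumers; c++) {
            int take = base + (c < extra ? 1 : 0);
            List<Integer> mine = new ArrayList<>();
            for (int i = 0; i < take; i++) mine.add(p++);
            out.add(mine);
        }
        return out;
    }

    // Round-robin: deal partitions out one at a time.
    public static List<List<Integer>> roundRobin(int partitions, int consumers) {
        List<List<Integer>> out = new ArrayList<>();
        for (int c = 0; c < consumers; c++) out.add(new ArrayList<>());
        for (int p = 0; p < partitions; p++) out.get(p % consumers).add(p);
        return out;
    }

    public static void main(String[] args) {
        System.out.println(range(7, 3));      // [[0, 1, 2], [3, 4], [5, 6]]
        System.out.println(roundRobin(7, 3)); // [[0, 3, 6], [1, 4], [2, 5]]
    }
}
```

With 7 partitions and 3 consumers, range leaves the first consumer with one extra partition on every topic it owns, which is why round‑robin or sticky is usually preferred when consumers subscribe to many topics.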

Rebalancing

Triggered by consumer joins/leaves, group coordinator changes, or topic/partition count changes. Steps: FindCoordinator → JoinGroup → SyncGroup → Heartbeat.

Rebalancing process

Physical Storage

Log Files and Segments

Data is stored in append‑only log files split into segments; a new segment is rolled when the active one reaches a size limit (1 GB by default). Retention policies include time‑based deletion (default 7 days) and size‑based deletion (disabled by default). Log compaction instead retains only the latest value for each key.

Log segments
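Retention and compaction are controlled per topic. A sketch of the relevant configuration keys as a Properties fragment — the values shown are the commonly cited defaults, and cleanup.policy selects deletion versus compaction:

```java
import java.util.Properties;

public class TopicConfigSketch {
    public static Properties retentionConfig() {
        Properties props = new Properties();
        props.put("retention.ms", "604800000");   // time-based deletion: 7 days
        props.put("segment.bytes", "1073741824"); // roll a new segment at 1 GB
        props.put("cleanup.policy", "delete");    // or "compact" to keep only the
                                                  // latest value per key
        return props;
    }

    public static void main(String[] args) {
        System.out.println(retentionConfig().getProperty("retention.ms"));
    }
}
```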

Indexes

Kafka maintains a sparse offset index and a timestamp index to locate messages quickly without scanning entire logs.

Offset index
Timestamp index
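A sparse index keeps an entry for only a fraction of offsets; a lookup finds the greatest indexed offset at or below the target, then scans the log forward from that file position. A toy sketch using a TreeMap floor lookup:

```java
import java.util.Map;
import java.util.TreeMap;

// Toy sparse offset index: maps a subset of offsets to byte positions.
public class SparseIndex {
    private final TreeMap<Long, Long> entries = new TreeMap<>();

    public void add(long offset, long filePosition) {
        entries.put(offset, filePosition);
    }

    // Greatest indexed offset <= target (O(log n) floor lookup);
    // the consumer then scans the log linearly from that position.
    public long scanStart(long targetOffset) {
        Map.Entry<Long, Long> floor = entries.floorEntry(targetOffset);
        return floor == null ? 0L : floor.getValue();
    }

    public static void main(String[] args) {
        SparseIndex idx = new SparseIndex();
        idx.add(0, 0);       // real Kafka adds roughly one entry per 4 KB of log
        idx.add(100, 4096);
        idx.add(200, 8192);
        System.out.println(idx.scanStart(150)); // prints 4096: start at offset 100's position
    }
}
```

Sparseness keeps the index small enough to memory‑map while still bounding the linear scan to one index interval.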

Zero‑Copy Transfer

Kafka uses zero‑copy to move data from disk to network directly in kernel space, reducing CPU overhead and latency.

Non‑zero‑copy
Zero‑copy
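On the JVM this technique surfaces as FileChannel.transferTo, which Kafka uses to stream segment files to sockets. A self‑contained sketch — the in‑memory sink here merely stands in for a socket channel:

```java
import java.io.ByteArrayOutputStream;
import java.nio.channels.Channels;
import java.nio.channels.FileChannel;
import java.nio.channels.WritableByteChannel;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

public class ZeroCopyDemo {
    // Write a small "segment" file, then transfer it via transferTo.
    public static String transferDemo() throws Exception {
        Path file = Files.createTempFile("segment", ".log");
        Files.write(file, "hello kafka".getBytes());
        ByteArrayOutputStream sink = new ByteArrayOutputStream();
        try (FileChannel src = FileChannel.open(file, StandardOpenOption.READ);
             WritableByteChannel dst = Channels.newChannel(sink)) {
            // transferTo asks the kernel to move bytes directly from the page
            // cache to the target channel -- when the target is a socket, no
            // copy through a user-space buffer is needed.
            src.transferTo(0, src.size(), dst);
        }
        Files.delete(file);
        return sink.toString();
    }

    public static void main(String[] args) throws Exception {
        System.out.println(transferDemo()); // prints "hello kafka"
    }
}
```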

Reliability and Replication

Each partition has a set of replicas (AR). The in‑sync replica set (ISR) contains replicas that have caught up to the leader. Out‑of‑sync replicas (OSR) are lagging. The leader’s Log End Offset (LEO) marks the next write position; the High Watermark (HW) is the smallest LEO among ISR replicas, indicating the offset up to which all ISR replicas have persisted data and can be safely consumed.

Replication diagram
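The high‑watermark rule — HW is the minimum LEO across the ISR — comes down to one line; a toy sketch with hypothetical replica names:

```java
import java.util.Map;

public class HighWatermark {
    // HW = smallest Log End Offset among in-sync replicas: everything below
    // it is persisted on every ISR member and safe to expose to consumers.
    public static long highWatermark(Map<String, Long> isrLeos) {
        return isrLeos.values().stream().mapToLong(Long::longValue).min().orElse(0L);
    }

    public static void main(String[] args) {
        // Leader has written up to 10; followers have replicated up to 8 and 9.
        Map<String, Long> leos = Map.of("leader", 10L, "follower-1", 8L, "follower-2", 9L);
        System.out.println(highWatermark(leos)); // prints 8: consumers may read offsets < 8
    }
}
```

Offsets between HW and the leader's LEO exist only on a subset of replicas and are invisible to consumers until the ISR catches up.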

Leader Epoch

Leader epoch is a monotonically increasing version number for the leader. Followers include the epoch in sync requests, preventing log truncation after leader changes and avoiding data loss.

Leader epoch

This concise guide introduces Kafka’s essential concepts, architecture, storage layers, producer and consumer mechanics, configuration knobs, and reliability guarantees, providing a solid foundation for deeper exploration.

Tags: distributed systems, message queues, Kafka, reliability, data streaming
Written by Sanyou's Java Diary

Passionate about technology, though not great at solving problems; eager to share, never tire of learning!