Deep Dive into Kafka’s High Reliability and High Performance Mechanisms
This article comprehensively explores Kafka’s core architecture, explaining how asynchronous decoupling and traffic shaping are achieved, detailing the roles of producers, brokers, consumers, and ZooKeeper, and analyzing reliability and performance techniques such as ACK policies, replication, idempotent and transactional producers, page‑cache flushing, zero‑copy, compression, batching, and load‑balancing strategies.
Before diving into Kafka’s core concepts, the article asks why one would use Kafka, highlighting two typical scenarios: asynchronous decoupling of producers and consumers, and traffic shaping (peak‑shaving and valley‑filling) in high‑throughput systems such as order processing and flash‑sale services.
Kafka Macro Overview – Kafka consists of Producers, Brokers, Consumers, and ZooKeeper for cluster management. The article describes each component’s responsibilities and introduces key concepts such as Topic, Partition, Segment, and Offset.
High Reliability – Reliability is ensured across three stages: (1) reliable delivery from Producer to Broker, (2) durable persistence on the Broker, and (3) reliable consumption by the Consumer. Three ACK strategies are explained (acks=0, acks=1, acks=-1) along with the recommended configuration for strong durability (acks=-1, min.insync.replicas ≥ 2, unclean.leader.election.enable=false).
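The strong-durability combination above can be sketched as a config fragment. This is an illustrative pairing, not the article's verbatim configuration: `min.insync.replicas=2` assumes the common replication factor of 3, so adjust to your cluster:

```properties
# Broker / topic side: require at least 2 in-sync replicas to acknowledge a write,
# and never elect an out-of-sync replica as leader (avoids silent data loss).
min.insync.replicas=2
unclean.leader.election.enable=false

# Producer side: wait until all in-sync replicas have persisted the message.
acks=-1
```

With this combination a write succeeds only when at least two replicas hold it, so a single broker failure cannot lose acknowledged messages.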
Message Sending Strategies – Kafka supports synchronous and asynchronous sending. The article shows the parameter table for the Async flag and explains how Sarama implements both modes with a main coroutine and a dispatcher coroutine. Asynchronous sending returns immediately after enqueuing the message, while synchronous sending wraps the asynchronous flow with a WaitGroup to provide blocking semantics.
Idempotent and Transactional Producers – Idempotent producers guarantee exactly‑once semantics per partition within a single producer session: each producer is assigned a unique PID, and each message carries a monotonically increasing per‑partition sequence number that the broker uses to reject duplicates. Transactional producers add a TransactionCoordinator to manage multi‑partition atomic writes, requiring a transactional.id on the producer and isolation.level=read_committed on the consumer.
Broker Persistence – To achieve high throughput, brokers write incoming messages to the Linux PageCache and flush to disk asynchronously. The article illustrates the risk of data loss if a broker crashes before flushing and introduces the replica mechanism (Leader‑Follower replication, ISR, OSR) to mitigate single‑node failures.
Replica Mechanism, HW and LEO – The concepts of High Watermark (HW) and Log End Offset (LEO) are defined. The article walks through a complete HW/LEO update cycle, showing how followers fetch from the leader, update their LEO, and how HW is computed as the minimum LEO of in‑sync replicas.
KIP‑101 Data‑Loss and Data‑Corruption – Scenarios where follower logs diverge from the leader are examined, explaining why truncating logs based on HW can cause loss or corruption, and how Leader‑Epoch requests help recover consistent state after failures.
Consumer Offset Commit – Two offset‑commit modes are compared: automatic commits (enable.auto.commit=true) which risk message loss if processing fails, and manual commits (enable.auto.commit=false) which require explicit offset commits to achieve at‑least‑once delivery and may need idempotent processing.
High Performance Techniques – Kafka achieves low latency and high throughput through asynchronous sending, batch sending (controlled by batch.size and linger.ms), compression (configurable algorithms and levels), page‑cache sequential writes, zero‑copy I/O (mmap for writes, sendfile for reads), sparse indexing (offset and timestamp indexes with configurable interval), and a multi‑reactor, multi‑threaded network model (SocketServer and KafkaRequestHandlerPool).
Load Balancing – Producer load balancing relies on partitioners (default hash‑based for keyed messages, round‑robin for null keys) while consumer load balancing distributes partitions among group members using strategies such as Range, RoundRobin, and StickyAssignor.
Cluster Management – Kafka uses ZooKeeper to store metadata about brokers, topics, partitions, and consumer groups, handling leader election, broker registration, and consumer rebalancing.
References to several online articles and documentation are provided at the end of the piece.
