Kafka High‑Reliability Architecture, Storage Mechanisms, and Performance Benchmark
This article explains Kafka's distributed architecture, its topic‑partition storage model, replication and ISR mechanisms, leader election, delivery guarantees, configuration for high reliability, and presents extensive benchmark results showing how replication factor, acks settings, and partition count affect throughput and latency.
1. Overview
Kafka, originally developed at LinkedIn and now an Apache project, is a Scala‑based distributed messaging system known for horizontal scalability and high throughput. Compared with traditional messaging systems, Kafka is designed as a distributed system from the ground up: it provides high publish/subscribe throughput, supports multiple subscribers with automatic consumer rebalancing, and persists messages to disk, serving both batch consumers such as ETL jobs and real‑time applications.
2. Kafka Architecture
A typical Kafka deployment consists of multiple producers, brokers, consumer groups, and a ZooKeeper ensemble. Producers push messages to brokers, consumers pull messages from brokers, and ZooKeeper manages cluster metadata, leader election, and consumer‑group rebalancing.
Broker: A Kafka node that stores messages; one or more brokers form a Kafka cluster.
Topic: Logical category for messages; each message must specify a topic.
Producer: Client that sends messages to a broker.
Consumer: Client that reads messages from a broker.
ConsumerGroup: A set of consumers sharing a subscription; only one consumer in a group processes a given message.
Partition: Physical subdivision of a topic; each partition is an ordered log.
2.1 Topic & Partition
Each topic is split into multiple partitions, which are append‑only log files. Messages are appended to the tail of a partition and identified by a long offset. Proper partitioning distributes load evenly across brokers, improving horizontal scalability.
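The append‑only partition model can be sketched as follows. This is an illustrative simulation, not broker code: a partition is an ordered log where each appended message receives a monotonically increasing long offset.

```python
# Minimal sketch (illustrative only): a partition modeled as an append-only
# log where each message gets a monotonically increasing offset.
class Partition:
    def __init__(self):
        self._log = []          # append-only list of messages

    def append(self, message):
        """Append to the tail and return the assigned offset."""
        self._log.append(message)
        return len(self._log) - 1

    def read(self, offset):
        """Fetch the message stored at a given offset."""
        return self._log[offset]

p = Partition()
assert p.append(b"m0") == 0
assert p.append(b"m1") == 1
assert p.read(1) == b"m1"
```

Because each partition is independently ordered, spreading partitions across brokers is what lets producers and consumers scale horizontally.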
# The default number of log partitions per topic. More partitions allow greater parallelism for consumption, but also increase the number of files across brokers.
num.partitions=3

3. High‑Reliability Storage Analysis
3.1 Kafka File Storage Mechanism
Messages are stored per‑partition in a directory structure under log.dirs. Each partition directory contains multiple segment files; each segment consists of an .index file (offset metadata) and a .log file (the actual messages). A message is located by first finding the segment that covers its offset, then using the index to find its position within that segment's .log file.
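Since each segment file is named after the base offset of its first message, finding the segment for a given offset reduces to a binary search over the sorted base offsets. A minimal sketch, using base offsets like those in the listing below:

```python
import bisect

# Illustrative sketch: segment files are named after their base offset
# (e.g. 00000000000000170410.log). Finding the segment that holds a given
# offset is a binary search over the sorted base offsets.
segment_bases = [0, 170410, 239430]   # base offsets parsed from file names

def segment_for(offset):
    """Return the base offset of the segment containing `offset`."""
    i = bisect.bisect_right(segment_bases, offset) - 1
    return segment_bases[i]

assert segment_for(0) == 0
assert segment_for(170409) == 0       # last message of the first segment
assert segment_for(170410) == 170410  # first message of the second segment
assert segment_for(300000) == 239430  # falls in the active segment
```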
00000000000000000000.index
00000000000000000000.log
00000000000000170410.index
00000000000000170410.log
00000000000000239430.index
00000000000000239430.log

3.2 Replication Principle and Synchronization
Each partition has N replicas (N is the replication factor; the default is 1). One replica is the leader; the others are followers that replicate the leader's log. The leader appends messages and advances its Log End Offset (LEO). The High Watermark (HW) is the smallest LEO among ISR members and marks the point up to which consumers can read.
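The HW rule can be stated in one line of code. An illustrative sketch, not broker internals:

```python
# Sketch of the High Watermark rule described above: HW is the smallest
# Log End Offset (LEO) among ISR members, so consumers never see messages
# that have not yet been replicated to every in-sync replica.
def high_watermark(isr_leos):
    """isr_leos maps each ISR member to its current LEO."""
    return min(isr_leos.values())

leos = {"leader": 120, "follower-1": 118, "follower-2": 115}
assert high_watermark(leos) == 115   # consumers can read up to offset 115
```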
3.3 In‑Sync Replicas (ISR)
ISR is the subset of replicas that are fully caught up with the leader. Parameters such as replica.lag.time.max.ms control when a follower is removed from ISR. The size of ISR influences both reliability and throughput.
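The lag‑based eviction can be sketched as follows. This is a simplified simulation: the real broker tracks catch‑up state per fetch request, and the threshold value here is illustrative.

```python
# Illustrative ISR maintenance: a follower that has not caught up within
# replica.lag.time.max.ms is dropped from the ISR. Timestamps are plain
# milliseconds for the sake of the sketch.
REPLICA_LAG_TIME_MAX_MS = 10_000

def shrink_isr(isr, last_caught_up_ms, now_ms):
    """Keep only replicas whose last caught-up time is within the lag bound."""
    return [r for r in isr
            if now_ms - last_caught_up_ms[r] <= REPLICA_LAG_TIME_MAX_MS]

isr = ["leader", "f1", "f2"]
caught_up = {"leader": 50_000, "f1": 49_500, "f2": 30_000}
# f2 has lagged for 20 s, beyond the 10 s bound, so it is evicted.
assert shrink_isr(isr, caught_up, now_ms=50_000) == ["leader", "f1"]
```

A smaller ISR means fewer replicas must confirm each write (helping throughput under acks=-1) but weakens the durability guarantee, which is the trade‑off noted above.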
3.4 Data Reliability and Persistence Guarantees
Producers control reliability via the request.required.acks setting:
1 (default): Ack after leader writes; data may be lost if leader crashes.
0: No ack; highest throughput, lowest reliability.
-1 (all): Ack after all ISR replicas write; highest reliability.
When acks=-1, min.insync.replicas should be set (e.g., ≥2) to ensure a minimum number of replicas acknowledge writes.
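The interaction of the three ack levels with min.insync.replicas can be sketched as a decision function. Names and return values are illustrative, not broker code; with acks=-1, the real broker rejects the write with a NotEnoughReplicas‑style error when the ISR is too small.

```python
# Sketch of the ack rules described above (illustrative names).
def ack_write(acks, isr_size, min_insync_replicas):
    if acks == 0:
        return "sent"                 # fire-and-forget, no broker ack
    if acks == 1:
        return "acked-by-leader"      # may be lost if the leader crashes
    if acks == -1:
        if isr_size >= min_insync_replicas:
            return "acked-by-isr"     # durable on all in-sync replicas
        return "NotEnoughReplicas"    # write rejected, producer may retry
    raise ValueError(f"unknown acks value: {acks}")

assert ack_write(-1, isr_size=3, min_insync_replicas=2) == "acked-by-isr"
assert ack_write(-1, isr_size=1, min_insync_replicas=2) == "NotEnoughReplicas"
```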
3.5 High Watermark (HW) Details
HW is the minimum LEO among ISR members. It prevents inconsistency after a leader failure: when a new leader is elected from the ISR, the surviving replicas truncate their logs to the HW before fetching from the new leader, so no replica serves messages that the full ISR never confirmed.
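The truncation step during failover can be sketched in a few lines. Illustrative only:

```python
# Sketch of the failover behavior described above: after leader election,
# surviving replicas truncate their logs to the HW before resuming, so no
# replica keeps messages beyond what the whole ISR had confirmed.
def truncate_to_hw(log, hw):
    """Drop any entries at or past the high watermark boundary."""
    return log[:hw]

follower_log = ["m0", "m1", "m2", "m3"]   # m3 was never fully replicated
assert truncate_to_hw(follower_log, hw=3) == ["m0", "m1", "m2"]
```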
3.6 Leader Election
Kafka uses a custom leader election algorithm similar to Microsoft's PacificA. Only replicas in the ISR are eligible to become leader (unless unclean.leader.election.enable=true), balancing availability against consistency.
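The eligibility rule can be sketched as a simple selection function. This is a simplified illustration of the policy, not the actual controller logic:

```python
# Illustrative leader election policy: only ISR members are eligible unless
# unclean.leader.election.enable=true, in which case any live replica may
# win, at the cost of possible data loss.
def elect_leader(live_replicas, isr, unclean_enabled=False):
    candidates = [r for r in live_replicas if r in isr]
    if candidates:
        return candidates[0]
    if unclean_enabled and live_replicas:
        return live_replicas[0]       # may lag behind the failed leader
    return None                       # partition stays offline

assert elect_leader(["f2", "f3"], isr={"f1", "f2"}) == "f2"
assert elect_leader(["f3"], isr={"f1", "f2"}) is None
assert elect_leader(["f3"], isr={"f1", "f2"}, unclean_enabled=True) == "f3"
```

The last case shows the availability/consistency trade‑off: with unclean election enabled the partition stays writable, but the out‑of‑sync leader may be missing confirmed messages.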
3.7 Producer Sending Modes
The producer.type parameter selects sync (default) or async sending. Async mode batches messages for higher throughput but increases the risk of data loss. Related async parameters include queue.buffering.max.ms, queue.buffering.max.messages, queue.enqueue.timeout.ms, and batch.num.messages.
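The batching behavior can be sketched as a buffer that flushes on either a size or a time trigger, mirroring batch.num.messages and queue.buffering.max.ms. A simplified simulation with illustrative parameter values:

```python
import time

# Sketch of async batching: a buffered producer flushes when either
# batch.num.messages is reached or queue.buffering.max.ms has elapsed.
class AsyncBuffer:
    def __init__(self, batch_num_messages=3, buffering_max_ms=500):
        self.batch_num_messages = batch_num_messages
        self.buffering_max_ms = buffering_max_ms
        self.queue = []
        self.last_flush = time.monotonic()

    def send(self, msg):
        """Enqueue a message; return the flushed batch if a trigger fires."""
        self.queue.append(msg)
        elapsed_ms = (time.monotonic() - self.last_flush) * 1000
        if (len(self.queue) >= self.batch_num_messages
                or elapsed_ms >= self.buffering_max_ms):
            return self.flush()
        return None

    def flush(self):
        batch, self.queue = self.queue, []
        self.last_flush = time.monotonic()
        return batch

buf = AsyncBuffer(batch_num_messages=3)
assert buf.send("a") is None
assert buf.send("b") is None
assert buf.send("c") == ["a", "b", "c"]   # size trigger fires
```

Anything still sitting in the queue when the producer process dies is lost, which is exactly the data‑loss risk the text attributes to async mode.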
4. High‑Reliability Usage Analysis
4.1 Delivery Guarantees
Kafka can provide three guarantees:
At‑most‑once: messages may be lost but never duplicated.
At‑least‑once: messages are never lost but may be duplicated.
Exactly‑once: each message is processed exactly once (requires additional deduplication logic).
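The first two guarantees fall out of where the consumer commits its offset relative to processing. A hedged sketch of that reasoning (a simulation, not consumer‑client code):

```python
# Sketch: committing the offset before processing gives at-most-once
# (a crash after the commit loses the message); processing before
# committing gives at-least-once (a crash re-delivers it).
def consume(messages, commit_first, crash_after_commit=False):
    processed, committed = [], 0
    for msg in messages:
        if commit_first:
            committed += 1
            if crash_after_commit:
                return processed, committed   # message lost forever
            processed.append(msg)
        else:
            processed.append(msg)
            committed += 1                    # crash here => redelivery
    return processed, committed

# At-most-once: crash between commit and processing loses the message.
assert consume(["m0"], commit_first=True, crash_after_commit=True) == ([], 1)
# At-least-once: the message is processed before its offset is committed.
assert consume(["m0"], commit_first=False) == (["m0"], 1)
```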
4.2 Message Deduplication
Kafka itself does not provide built‑in deduplication. Applications can use globally unique identifiers (GUID) or external systems like Redis to achieve idempotence.
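Such application‑level deduplication can be sketched with a seen‑set keyed by GUID. In production the set would live in an external store like Redis; here it is an in‑memory set for illustration:

```python
# Sketch of application-level deduplication: each message carries a GUID,
# and a seen-set makes the handler idempotent under at-least-once delivery.
seen_guids = set()
results = []

def handle(msg):
    """Process a message once; silently skip redeliveries."""
    if msg["guid"] in seen_guids:
        return False                  # duplicate, skip
    seen_guids.add(msg["guid"])
    results.append(msg["payload"])
    return True

assert handle({"guid": "g1", "payload": "order-1"}) is True
assert handle({"guid": "g1", "payload": "order-1"}) is False  # redelivery
assert results == ["order-1"]
```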
4.3 High‑Reliability Configuration Recommendations
Topic: replication.factor ≥ 3, 2 ≤ min.insync.replicas ≤ replication.factor
Broker: set unclean.leader.election.enable=false
Producer: request.required.acks=-1 and producer.type=sync
5. Benchmark
5.1 Test Environment
Four Kafka brokers (24 CPU cores, 62 GB RAM, 4 Gbps network, 1089 GB disk) running Kafka 0.10.1.0 with JVM options -Xmx8G -Xms8G …. Client machine: 24 CPU cores, 3 GB RAM, 1 Gbps network.
5.2 Scenario 1 – Impact of Replicas, min.insync.replicas, and acks on TPS
Single producer, sync mode, 1 KB messages, 12 partitions; varying replicas (1/2/4), min.insync.replicas (1/2/4), and acks (-1/1/0). Results show:
TPS order: acks=0 > acks=1 > acks=-1.
More replicas reduce TPS.
min.insync.replicas does not affect TPS when acks=-1.
5.3 Scenario 2 – Fixed 1 Partition, Vary Replicas and min.insync.replicas
With acks=-1 , increasing replicas lowers TPS slightly; min.insync.replicas has negligible impact.
5.4 Scenario 3 – Fixed 1 Partition, Vary Replicas and acks
TPS decreases as replicas increase and as acks moves from 0 to 1 to -1.
5.5 Scenario 4 – Varying Partition Count
With 2 replicas, min.insync.replicas=2, acks=-1, increasing partitions improves TPS up to a point, after which TPS plateaus or slightly drops.
5.6 Scenario 5 – Broker Failures
Killing two brokers while producing with acks=-1 and min.insync.replicas=2 reduces TPS, but production continues; killing three brokers halts production until enough brokers recover. A high retries setting leads to duplicate persisted messages.
5.7 Scenario 6 – Latency Measurements
With 12 partitions, 4 replicas, and acks=-1, average producer latency ≈ 1.7 ms (max ≈ 157 ms); average consumer latency ≈ 1.6 ms (max ≈ 288 ms).
5.8 Summary of Findings
When acks=-1, TPS is limited by the number of ISR replicas; more replicas → lower TPS.
acks=0 yields the highest TPS, followed by acks=1, then acks=-1.
min.insync.replicas does not affect TPS.
Increasing partition count improves TPS up to a saturation point.
With acks=-1 and min.insync.replicas ≥ 1, all successfully acknowledged messages are safely persisted.
Author Information
The article is authored by the VMS (Message Middleware) team of Vipshop, part of the infrastructure department, focusing on research of messaging middleware such as RabbitMQ and Kafka.