Understanding Kafka: Use Cases, Reliability, Storage, Replication, Consumer Assignment, Transactions, and Exactly-Once Semantics
This article explains why Kafka is used (buffering, decoupling, redundancy, robustness, and asynchronous communication), then covers the ack reliability levels, storage design, replica synchronization, ISR handling, consumer partition assignment strategies, transaction support, exactly-once semantics, and why Kafka does not separate reads from writes.
Why use Kafka?
Buffering and peak‑shaving: Kafka can absorb bursty upstream traffic and let downstream services process at their own pace.
Decoupling and scalability: It acts as an interface layer that separates business logic from data flow, enabling easy expansion.
Redundancy: A single producer can publish to a topic that many independent services consume.
Robustness: Messages can accumulate in Kafka, so a temporary consumer failure does not affect the main workflow.
Asynchronous communication: Producers can fire‑and‑forget messages, letting consumers handle them later.
How does Kafka guarantee data reliability?
For each send, the leader of the target partition returns an acknowledgement (ack) once the record has been written to the extent the producer's acks setting requires. The producer treats the send as complete only after it receives the ack; if no ack arrives, it retries.
Ack response levels
0: Producer does not wait for any ack – lowest latency but possible data loss.
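1: The leader appends the record to its log and acknowledges without waiting for followers – data can be lost if the leader fails before the followers have copied the record.
-1 (all): The leader acknowledges only after every replica in the in-sync replica set (ISR) has the record – nothing is lost while one ISR member survives, but if the leader fails after the followers have synced and before the ack reaches the producer, the producer's retry creates a duplicate.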
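The three levels can be sketched as a toy model. This is not broker code: the function name and the leader_written and isr_followers_written flags are hypothetical and only stand in for the broker state that each level waits on.

```python
def send_is_complete(acks, leader_written, isr_followers_written):
    """Toy model: does the producer consider this send finished?

    acks: 0, 1, or -1 / "all" (the values of the producer's acks setting).
    leader_written: True if the leader appended the record to its log.
    isr_followers_written: one boolean per follower currently in the ISR.
    """
    if acks == 0:
        # The producer does not wait for the broker at all.
        return True
    if acks == 1:
        # Only the leader's append matters; followers may still be behind.
        return leader_written
    if acks in (-1, "all"):
        # Leader and every in-sync follower must have the record.
        return leader_written and all(isr_followers_written)
    raise ValueError(f"unsupported acks value: {acks!r}")


# A record the leader has, but one in-sync follower has not copied yet:
print(send_is_complete(1, True, [True, False]))
print(send_is_complete(-1, True, [True, False]))
```

With acks=1 the send above already counts as complete, which is exactly the window in which a leader failure loses the record.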
Is Kafka data stored on disk or in memory, and why is it fast?
Kafka uses disk storage.
Speed comes from sequential writes, which avoid costly random I/O on mechanical drives.
Memory-mapped files let the OS map a file directly into memory and flush changes to disk in the background. Kafka memory-maps its index files; the log data itself is read and written through the OS page cache.
Kafka splits large partition files into small segments, builds sparse index files, and maps index metadata into memory, enabling fast look‑ups and low‑overhead scans.
Replica synchronization strategy
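Kafka acknowledges a write only after every replica in the ISR has it, rather than after a majority of all replicas. Tolerating n node failures therefore takes n+1 replicas instead of the 2n+1 that a majority quorum needs. The cost is somewhat higher latency, because the slowest in-sync follower sets the pace.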
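The replica arithmetic is easy to check with a short sketch. The function names are made up for illustration; the formulas are the n+1 and 2n+1 counts stated above.

```python
def replicas_needed_all_synced(n_failures):
    """Wait for every in-sync replica: one survivor is enough."""
    return n_failures + 1


def replicas_needed_majority(n_failures):
    """Majority quorum: survivors must still outnumber the failed nodes."""
    return 2 * n_failures + 1


for n in (1, 2, 3):
    print(n, replicas_needed_all_synced(n), replicas_needed_majority(n))
```

For two tolerated failures this is 3 replicas against 5, which is a large difference when every replica holds a full copy of the partition.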
In‑Sync Replica (ISR) handling
The ISR is the set of replicas (the leader plus its followers) that are fully caught up with the leader. If a follower has not caught up within replica.lag.time.max.ms, it is removed from the ISR, and it rejoins once it has caught up again. After a leader failure, a new leader is elected from the remaining ISR.
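A minimal sketch of this bookkeeping, under stated assumptions: the broker names, the last_caught_up_ms map and both function names are hypothetical, the 10,000 ms limit is just an example value for replica.lag.time.max.ms, and picking the first survivor as the new leader is a simplification of the real election.

```python
def shrink_isr(isr, leader, last_caught_up_ms, now_ms, max_lag_ms):
    """Drop followers that have not caught up within max_lag_ms."""
    return [
        replica
        for replica in isr
        if replica == leader or now_ms - last_caught_up_ms[replica] <= max_lag_ms
    ]


def elect_leader(isr, failed_leader):
    """Pick a new leader from the in-sync replicas that are still alive."""
    survivors = [replica for replica in isr if replica != failed_leader]
    if not survivors:
        raise RuntimeError("no in-sync replica left to elect")
    return survivors[0]  # simplification: first survivor wins


isr = ["b1", "b2", "b3"]                    # b1 is the leader
last_caught_up_ms = {"b2": 9_500, "b3": 0}  # when each follower last caught up
isr = shrink_isr(isr, "b1", last_caught_up_ms, now_ms=12_000, max_lag_ms=10_000)
print(isr)                      # b3 lags 12,000 ms and is dropped
print(elect_leader(isr, "b1"))  # the leader fails; b2 takes over
```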
LEO and HW
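LEO (Log End Offset) marks the end of a replica's log: the offset that the next record appended to that replica will receive. HW (High Watermark) is the smallest LEO among the ISR members. Consumers can only read records below the HW, because only those are present on every in-sync replica.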
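The relationship between the two can be sketched in a few lines. The replica names and the leo_by_replica map are hypothetical; the sketch assumes the convention in which LEO is the offset the next appended record will get.

```python
def high_watermark(leo_by_replica, isr):
    """HW is the smallest LEO among the replicas currently in the ISR."""
    return min(leo_by_replica[replica] for replica in isr)


leo_by_replica = {"leader": 10, "f1": 8, "f2": 5}

# f2 has fallen out of the ISR, so it no longer holds the HW back.
hw = high_watermark(leo_by_replica, isr=["leader", "f1"])
print(hw)

# Offsets a consumer may read: everything below the HW.
print(list(range(hw)))
```

Offsets 8 and 9 exist on the leader but stay invisible to consumers until f1 has copied them.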
Consumer partition assignment strategies
Range: Sort partitions and consumers, then allocate contiguous blocks; extra partitions go to the first consumers.
RoundRobin: Sort all topic-partitions and consumers, then assign in a round-robin fashion. The result is only even when all consumers have identical subscription sets (and, in the legacy consumer, the same num.streams).
StickyAssignor (since 0.11): Tries to keep previous assignments (stickiness) while balancing load; if conflict arises, balance takes priority.
Examples illustrate how each strategy distributes partitions across consumers and how they behave during a rebalance when a consumer leaves.
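A minimal sketch of the Range and RoundRobin rules as described above. It is a simplified model, not the client's assignor code: the function names and the consumer and topic names are made up, range_assign handles the partitions of a single topic, and round_robin_assign assumes every consumer subscribes to the same topics.

```python
from itertools import cycle


def range_assign(partitions, consumers):
    """Contiguous blocks per consumer; the first consumers absorb the remainder."""
    partitions, consumers = sorted(partitions), sorted(consumers)
    per_consumer, extra = divmod(len(partitions), len(consumers))
    assignment, start = {}, 0
    for i, consumer in enumerate(consumers):
        size = per_consumer + (1 if i < extra else 0)
        assignment[consumer] = partitions[start:start + size]
        start += size
    return assignment


def round_robin_assign(topic_partitions, consumers):
    """Deal the sorted topic-partitions out to the sorted consumers in turn."""
    assignment = {consumer: [] for consumer in sorted(consumers)}
    for tp, consumer in zip(sorted(topic_partitions), cycle(sorted(consumers))):
        assignment[consumer].append(tp)
    return assignment


# 7 partitions of one topic over 3 consumers: c0 gets the extra partition.
print(range_assign(list(range(7)), ["c0", "c1", "c2"]))

# 5 partitions across two topics over 2 consumers.
tps = [("t0", 0), ("t0", 1), ("t0", 2), ("t1", 0), ("t1", 1)]
print(round_robin_assign(tps, ["c0", "c1"]))
```

Because Range is applied topic by topic, the first consumers collect the extra partition of every topic, which is why it can end up less balanced than RoundRobin when many topics are subscribed.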
How Kafka implements transactions
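From version 0.11, Kafka supports transactions. The application supplies a globally unique Transaction ID (transactional.id), which is bound to the producer ID (PID), so a restarted producer that uses the same Transaction ID can be recognised and its unfinished transaction resolved. The Transaction Coordinator tracks transaction state in an internal topic (__transaction_state), enabling exactly-once semantics across partitions and sessions.
Producer transactions rely on the coordinator. On the consumer side the guarantees are weaker: a consumer can seek to arbitrary offsets, and the segments of different partitions are deleted at different times, so there is no guarantee that a consumer reads every message of a committed transaction.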
Exactly‑Once semantics
ACK=-1 gives At‑Least‑Once (no data loss, possible duplicates).
ACK=0 gives At‑Most‑Once (no duplicates, possible loss).
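Enabling idempotence (enable.idempotence=true) on top of At-Least-Once yields Exactly-Once within a single partition and a single producer session.
Idempotent producers attach a sequence number to every record sent to a given (PID, Partition); brokers deduplicate on the triple (PID, Partition, sequence number) and persist each record only once.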
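The deduplication rule can be sketched for one partition as follows. The PartitionLog class and its return strings are hypothetical, and a real broker tracks more state than a single last sequence number per producer; the sketch only shows how a retried send is recognised.

```python
class PartitionLog:
    """Toy model of one partition that deduplicates by (PID, sequence number)."""

    def __init__(self):
        self.records = []
        self.last_seq = {}  # PID -> last sequence number accepted on this partition

    def append(self, pid, seq, value):
        last = self.last_seq.get(pid, -1)
        if seq <= last:
            # A retry of something already persisted: acknowledge, do not store again.
            return "duplicate"
        if seq != last + 1:
            # A gap means an earlier record is missing, so the write is rejected.
            return "out_of_order"
        self.records.append(value)
        self.last_seq[pid] = seq
        return "appended"


log = PartitionLog()
print(log.append(pid=1, seq=0, value="a"))
print(log.append(pid=1, seq=0, value="a"))  # producer retried after a lost ack
print(log.append(pid=1, seq=1, value="b"))
print(log.records)
```

The retried record is acknowledged but stored once, which is what turns At-Least-Once delivery into Exactly-Once for that producer session and partition.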
Why Kafka does not support read‑write separation
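In Kafka's default model, both producers and consumers interact with the leader replica, forming a "write-and-read-from-leader" model. Separate read replicas would introduce consistency windows, because a follower can lag behind the leader, and additional latency due to extra network and disk hops, which is unsuitable for low-latency streaming workloads. (Since version 2.4, KIP-392 lets consumers optionally fetch from a follower, but reading from the leader remains the default.)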
Why Kafka is fast despite using disk
Sequential I/O avoids random‑seek penalties.
Memory-mapped index files let the OS handle paging efficiently.
Segmented storage keeps each file small, and sparse indexes keep the index compact enough to hold in memory, which accelerates look-ups.
Kafka improves query efficiency by segmenting data files and using binary search on offsets; each segment has a matching .index file for fast location.
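The two-step look-up can be sketched with Python's bisect module. This is an illustration under assumptions: the offsets and file positions are invented, and the index here stores absolute offsets for readability. The 20-digit zero-padded file name follows Kafka's segment naming by base offset.

```python
import bisect


def find_segment(base_offsets, target):
    """Binary search for the segment whose base offset is the largest one <= target."""
    i = bisect.bisect_right(base_offsets, target) - 1
    if i < 0:
        raise ValueError("offset is older than the first segment")
    return base_offsets[i]


def find_scan_start(index_entries, target):
    """Binary search the sparse index for the file position to start scanning from."""
    offsets = [offset for offset, _ in index_entries]
    i = bisect.bisect_right(offsets, target) - 1
    return index_entries[i][1] if i >= 0 else 0


base_offsets = [0, 1000, 2000]  # one entry per segment, sorted
base = find_segment(base_offsets, 1500)
print(f"{base:020d}.index")     # index file that belongs to that segment

# Sparse index of the segment: (offset, byte position in the .log file).
index_entries = [(1000, 0), (1200, 4096), (1400, 8192), (1600, 12288)]
print(find_scan_start(index_entries, 1500))
```

From the returned position the broker only has to scan forward a short distance in the log file, which is why a sparse index is enough.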
Big Data Technology & Architecture
Wang Zhiwu, a big data expert, dedicated to sharing big data technology.
