Kafka Interview Questions: High Availability, Reliability, Consistency, Performance, and Usage Rationale
This article explains common Kafka interview questions by analyzing the system's high‑availability design, reliability mechanisms, consistency model, performance tricks such as sequential writes and zero‑copy, and the reasons for using Kafka and message queues, providing both conceptual insight and practical details.
Preface
Before writing this article, the author considered whether an interview-question piece should simply list questions and answers or also analyze the principles behind them. The author chose the latter, because answering a question well requires understanding the mechanism it probes, and will cover each topic across several articles.
Article Overview
How does Kafka ensure high availability, reliability, and consistency?
Why is Kafka so fast?
Will messages be lost or consumed repeatedly?
Why use Kafka and message queues?
Why doesn’t Kafka support read‑write separation?
How does Kafka ensure high availability, reliability, and consistency?
High Availability
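Kafka is a distributed system. In classic deployments, cluster metadata (brokers, topics, partition assignments, and which replica leads each partition) is stored in ZooKeeper, which the controller uses to detect broker failures and move partition leadership elsewhere. Newer versions can replace ZooKeeper with the built-in KRaft controller quorum.
Kafka replicates each partition across several brokers. One replica is the leader and handles reads and writes, while the followers copy its log. If the leader's broker fails, a follower from the in-sync replica set (ISR) is elected as the new leader, so the partition keeps serving reads and writes. Since Kafka 2.4 (KIP-392), consumers can also be configured to fetch from follower replicas, which is mainly used to read from the nearest replica and reduce cross-datacenter traffic.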
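The failover rule can be illustrated with a small model. This is a minimal sketch, assuming a simplified controller that picks the first replica in the partition's assignment order that is both alive and in the ISR; PartitionState and elect_leader are hypothetical names, not Kafka APIs. The allow_unclean flag mimics what the real unclean.leader.election.enable setting permits: electing an out-of-sync replica to regain availability at the risk of losing data.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class PartitionState:
    """Hypothetical model of one partition's replica metadata."""
    assigned_replicas: list[int]  # broker IDs, in assignment order
    isr: set[int]                 # broker IDs currently in sync with the leader


def elect_leader(state: PartitionState, live_brokers: set[int],
                 allow_unclean: bool = False) -> int | None:
    """Return the broker ID of the new leader, or None if none qualifies."""
    # Normal path: first assigned replica that is alive AND in the ISR,
    # so the new leader holds every message committed to the ISR.
    for broker in state.assigned_replicas:
        if broker in live_brokers and broker in state.isr:
            return broker
    # Optional "unclean" path: any live replica, even an out-of-sync one.
    # This restores availability but may lose acknowledged messages.
    if allow_unclean:
        for broker in state.assigned_replicas:
            if broker in live_brokers:
                return broker
    return None  # partition stays offline until an ISR member returns


# Broker 1 (the old leader) has failed; brokers 2 and 3 are still alive.
state = PartitionState(assigned_replicas=[1, 2, 3], isr={1, 2, 3})
print(elect_leader(state, live_brokers={2, 3}))  # 2
```

Returning None rather than guessing reflects the trade-off Kafka exposes: by default a partition stays unavailable rather than risk electing a replica that is missing data.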
Reliability
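On the producer side, reliability means that a message, once acknowledged, is stored in its partition without loss. Kafka achieves this with two settings used together: the producer's acknowledgment setting (request.required.acks in the old producer, acks in the current Java producer) and the broker/topic setting min.insync.replicas.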
request.required.acks can be set to 1, 0, or -1 (all).
When request.required.acks=1, the producer considers the send successful as soon as the leader replica has written the message to its local log; if the leader crashes before the ISR followers have copied it, the message is lost.
When request.required.acks=0, the producer considers the send successful immediately after sending, without waiting for any broker response, so messages can be lost.
When request.required.acks=-1, the send succeeds only after every replica in the ISR has received the message. This alone is not enough: if the ISR has shrunk to just the leader, -1 behaves like 1 and a leader failure still loses data. Setting min.insync.replicas to 2 or more closes this gap: when the ISR is smaller than that value, the broker rejects the write and the producer receives an error instead of a false success.
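The interaction between the acks setting and min.insync.replicas can be summarized as a decision function. This is a toy model, not broker code: send_acknowledged and NotEnoughReplicasError are hypothetical names, and replicas_written is an assumed input counting how many ISR members (leader included) already hold the message. Only the configuration keys in the two dictionaries are real Kafka names.

```python
from __future__ import annotations

# Real configuration names; the values are one common "durable" combination.
PRODUCER_CONFIG = {"acks": "all"}            # old producer: request.required.acks=-1
TOPIC_CONFIG = {"min.insync.replicas": "2"}  # typically paired with replication factor 3


class NotEnoughReplicasError(Exception):
    """Stand-in for the error the broker returns when the ISR is too small."""


def send_acknowledged(acks: str, isr_size: int, min_insync_replicas: int,
                      replicas_written: int) -> bool:
    """Toy model: has the send been acknowledged, given how many replicas hold it?

    replicas_written counts the leader plus any ISR followers that already
    have the message.
    """
    if acks == "0":
        return True  # fire-and-forget: no broker response is awaited
    if acks == "1":
        return replicas_written >= 1  # only the leader's local write counts
    if acks in ("all", "-1"):
        if isr_size < min_insync_replicas:
            raise NotEnoughReplicasError(
                f"ISR size {isr_size} < min.insync.replicas {min_insync_replicas}")
        return replicas_written >= isr_size  # every ISR member must have it
    raise ValueError(f"unsupported acks value: {acks!r}")


print(send_acknowledged("1", isr_size=3, min_insync_replicas=2, replicas_written=1))    # True
print(send_acknowledged("all", isr_size=3, min_insync_replicas=2, replicas_written=2))  # False
```

The last branch is the key point: with acks=all, a shrunken ISR produces an explicit error rather than a silent single-copy write.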
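Even with these settings, duplicates are possible: if the leader stores a message but the acknowledgment never reaches the producer (for example, because the leader crashes or the network fails before replying), the producer retries and the message is written twice. Kafka 0.11 introduced the idempotent producer (enable.idempotence=true): each producer is assigned a producer ID, each batch it sends to a partition carries a sequence number, and the broker discards batches whose sequence number it has already stored.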
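The broker-side deduplication can be sketched as follows. This is a simplified model, assuming the broker remembers only the last sequence number per producer for a single partition (real Kafka tracks the last few batches per producer and partition); PartitionLog is a hypothetical class, and the ValueError stands in for Kafka's OutOfOrderSequenceException.

```python
from __future__ import annotations


class PartitionLog:
    """Toy broker-side log that drops idempotent-producer retries."""

    def __init__(self) -> None:
        self.records: list[str] = []
        self._last_seq: dict[int, int] = {}  # producer_id -> last appended sequence

    def append(self, producer_id: int, sequence: int, value: str) -> bool:
        """Append unless this (producer_id, sequence) was already stored.

        Returns True if the record was appended, False if it was a duplicate.
        """
        last = self._last_seq.get(producer_id, -1)
        if sequence <= last:
            return False  # retry of a stored batch: acknowledge, don't append again
        if sequence != last + 1:
            raise ValueError("out-of-order sequence number")  # a gap means lost data
        self.records.append(value)
        self._last_seq[producer_id] = sequence
        return True


log = PartitionLog()
log.append(producer_id=7, sequence=0, value="order-1")
log.append(producer_id=7, sequence=0, value="order-1")  # retry after a lost ack
print(log.records)  # ['order-1']
```

Deduplicating on (producer ID, sequence) rather than on message content means two genuinely identical messages sent on purpose are both kept, while a retried send is not.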
Consistency
On the consumer side, Kafka guarantees that consumers see the same data no matter which replica of a partition is (or later becomes) the leader, by using the High Watermark (HW).
The HW is the smallest log end offset among the in-sync replicas, so consumers can only read messages that every ISR replica has received. Consumption is therefore limited by the slowest in-sync replica (the "cask effect": a barrel holds only as much as its shortest stave). The broker setting replica.lag.time.max.ms bounds this: a follower that has not caught up with the leader within that time is removed from the ISR, so it can no longer hold the HW back.
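The HW rule and the effect of replica.lag.time.max.ms fit in a few lines. This is a minimal sketch under simplifying assumptions: the leader's view of each replica's log end offset (LEO) and of when each follower last caught up is passed in as plain dictionaries, and high_watermark and shrink_isr are hypothetical helper names.

```python
from __future__ import annotations


def high_watermark(log_end_offsets: dict[int, int], isr: set[int]) -> int:
    """HW = smallest log end offset (LEO) among the in-sync replicas."""
    return min(log_end_offsets[broker] for broker in isr)


def shrink_isr(isr: set[int], leader: int, last_caught_up_ms: dict[int, int],
               now_ms: int, replica_lag_time_max_ms: int) -> set[int]:
    """Drop followers that have not caught up within replica.lag.time.max.ms."""
    return {
        broker for broker in isr
        if broker == leader
        or now_ms - last_caught_up_ms[broker] <= replica_lag_time_max_ms
    }


# Leader 1 and follower 2 hold offsets 0..9; follower 3 only holds 0..3.
leo = {1: 10, 2: 10, 3: 4}
isr = {1, 2, 3}
print(high_watermark(leo, isr))  # 4: consumers may read offsets 0..3 only

# Follower 3 last caught up 40 s ago, beyond a 30 s limit, so it leaves the ISR.
isr = shrink_isr(isr, leader=1,
                 last_caught_up_ms={1: 100_000, 2: 99_000, 3: 60_000},
                 now_ms=100_000, replica_lag_time_max_ms=30_000)
print(sorted(isr), high_watermark(leo, isr))  # [1, 2] 10
```

Lowering the lag limit lets the HW advance sooner when a follower stalls, at the cost of shrinking the ISR (and thus the set of safe leader candidates) more aggressively.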
Why is Kafka so fast?
Kafka achieves high throughput through sequential writes, zero‑copy I/O, and compression.
Sequential Writes
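Kafka appends every message to the end of the current segment file of its partition log. These sequential disk writes avoid the costly seek operations that random writes require, which dramatically improves write throughput.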
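An append-only segment can be sketched as below. This is a toy illustration, assuming length-prefixed records and an in-memory list mapping logical offsets to byte positions; SegmentFile is a hypothetical class, and real Kafka segments use their own record-batch format plus on-disk offset and time indexes.

```python
import os
import struct
import tempfile


class SegmentFile:
    """Minimal append-only log: new records are only ever written at the end."""

    def __init__(self, path: str) -> None:
        self._f = open(path, "ab+")  # append mode: every write lands at end of file
        self._positions: list[int] = []  # logical offset -> byte position (toy index)

    def append(self, payload: bytes) -> int:
        """Write one length-prefixed record and return its logical offset."""
        self._f.seek(0, os.SEEK_END)
        position = self._f.tell()
        self._f.write(struct.pack(">I", len(payload)) + payload)
        self._f.flush()
        self._positions.append(position)
        return len(self._positions) - 1

    def read(self, offset: int) -> bytes:
        """Look up a record's position in the index and read it back."""
        self._f.seek(self._positions[offset])
        (length,) = struct.unpack(">I", self._f.read(4))
        return self._f.read(length)

    def close(self) -> None:
        self._f.close()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        seg = SegmentFile(os.path.join(tmp, "segment.log"))
        for value in (b"order-1", b"order-2", b"order-3"):
            seg.append(value)
        print(seg.read(1))  # b'order-2'
        seg.close()
```

Because nothing is ever updated in place, the disk only ever sees writes at the tail of the file, and reads by offset reduce to an index lookup plus one seek.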
Zero‑Copy
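Zero‑copy avoids copying data between kernel space and user space. When serving log data to consumers, Kafka uses the sendfile system call (through Java's FileChannel.transferTo), so data moves from the page cache to the network socket entirely inside the kernel. Compared with reading the data into an application buffer and then writing it to the socket, this removes the two CPU copies through user space (and the associated context switches), which speeds up transfers.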
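The same system call is reachable from Python's standard library. This sketch uses socket.sendfile, which relies on os.sendfile where the platform supports it and otherwise falls back to an ordinary read/send loop; send_file_zero_copy and roundtrip are hypothetical helper names, and a local socket pair stands in for a consumer connection.

```python
import os
import socket
import tempfile


def send_file_zero_copy(path: str, sock: socket.socket) -> int:
    """Send a whole file over a blocking stream socket and return bytes sent.

    socket.sendfile() uses os.sendfile() where available, so the kernel
    moves bytes from the page cache to the socket without copying them
    into this process; otherwise Python falls back to read/send.
    """
    with open(path, "rb") as f:
        return sock.sendfile(f)


def roundtrip(payload: bytes) -> bytes:
    """Write payload to a temp file, send it over a socket pair, read it back.

    Single-threaded demo: keep the payload small enough to fit in the
    socket buffer, or the sender would block waiting for a reader.
    """
    fd, path = tempfile.mkstemp()
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(payload)
        sender, receiver = socket.socketpair()
        with sender, receiver:
            sent = send_file_zero_copy(path, sender)
            sender.shutdown(socket.SHUT_WR)  # tell the receiver no more data follows
            chunks = []
            while chunk := receiver.recv(4096):
                chunks.append(chunk)
        if sent != len(payload):
            raise RuntimeError(f"sent {sent} of {len(payload)} bytes")
        return b"".join(chunks)
    finally:
        os.remove(path)


data = b"offset=42 value=hello\n" * 20
print(roundtrip(data) == data)  # True
```

Note that the application never holds the file contents in a buffer of its own; it only hands the kernel two file descriptors.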
Compression
Kafka compresses whole message batches with one of several algorithms (gzip, snappy, lz4, and, since Kafka 2.1, zstd), reducing both the data sent over the network and the space used on disk.
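Compressing a batch rather than individual messages pays off because records in a batch tend to repeat the same field names and values. The sketch below measures that effect with gzip, the only one of the listed codecs in Python's standard library; the newline-delimited JSON encoding and the sample events are assumptions for illustration, not Kafka's wire format. In a real producer the codec is chosen with the compression.type setting.

```python
from __future__ import annotations

import gzip
import json


def encode_batch(records: list[dict]) -> bytes:
    """Serialize records as newline-delimited JSON into one batch of bytes."""
    return b"".join(json.dumps(r).encode("utf-8") + b"\n" for r in records)


def gzip_ratio(records: list[dict]) -> float:
    """Raw batch size divided by gzip-compressed batch size."""
    raw = encode_batch(records)
    compressed = gzip.compress(raw)
    if gzip.decompress(compressed) != raw:
        raise RuntimeError("round trip failed")  # compression must be lossless
    return len(raw) / len(compressed)


events = [{"user_id": i, "event": "page_view", "page": "/home"} for i in range(1000)]
print(f"compression ratio: {gzip_ratio(events):.1f}x")
```

Running the same measurement on a batch of one record shows much weaker compression, which is one reason larger producer batches improve throughput.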
Will messages be lost or consumed repeatedly?
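Message loss on the producer side is covered in the Reliability section above.
Duplicates can arise in two places. On the producer side, older versions (e.g., Kafka 0.8) had no idempotence, so producer retries could store the same message twice. On the consumer side, if a consumer processes a message but crashes before committing its offset, the message is delivered again after a restart or rebalance.
A practical defense against duplicate consumption is to record each processed message key in Redis with an expiration time and skip any message whose key has already been seen.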
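The Redis approach boils down to an atomic "set if absent, with expiry" per key; in Redis that is typically the command SET key value NX EX seconds. The sketch below replaces Redis with an in-memory dictionary so it runs on its own; SeenKeys and consume are hypothetical names, the injectable clock exists only to make expiry testable, and expired entries are never purged, which a real store would handle.

```python
from __future__ import annotations

import time
from typing import Callable


class SeenKeys:
    """In-memory stand-in for Redis: remember processed keys for a TTL."""

    def __init__(self, ttl_seconds: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._ttl = ttl_seconds
        self._clock = clock
        self._expiry: dict[str, float] = {}  # key -> time at which it expires

    def first_time(self, key: str) -> bool:
        """Mark key as seen; True only if it was not already seen and unexpired."""
        now = self._clock()
        expiry = self._expiry.get(key)
        if expiry is not None and expiry > now:
            return False  # duplicate within the TTL window
        self._expiry[key] = now + self._ttl
        return True


def consume(messages: list[tuple[str, str]], seen: SeenKeys) -> list[str]:
    """Process (key, value) pairs, skipping keys already handled."""
    processed = []
    for key, value in messages:
        if seen.first_time(key):
            processed.append(value)  # real business logic would run here
    return processed


seen = SeenKeys(ttl_seconds=3600)
print(consume([("order-1", "a"), ("order-1", "a"), ("order-2", "b")], seen))  # ['a', 'b']
```

The TTL should exceed the longest window in which a redelivery can occur. Marking a key before processing, as here, means a crash mid-processing skips that message; marking it afterwards risks a duplicate instead, so the order depends on which failure is more acceptable.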
Why use Kafka and why use a message queue?
Reasons to use a message queue
Message queues provide decoupling, asynchronous processing, and traffic shaping (peak‑shaving).
When multiple downstream systems depend on an upstream service, the upstream publishes each message to the queue once, and adding a new consumer only requires subscribing to the queue, with no change to the upstream service.
A queue also absorbs traffic spikes: when upstream traffic surges, the queue buffers the messages and downstream services consume them at their own pace, so a burst does not overwhelm and crash them.
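Both properties can be seen in a toy single-partition topic that keeps one offset per consumer group. This is an illustrative sketch, not Kafka's API: Topic, publish, and poll are hypothetical names, offsets are committed immediately on read, and new groups start from the beginning of the log (similar to a consumer configured with auto.offset.reset=earliest).

```python
from __future__ import annotations

from collections import defaultdict


class Topic:
    """Toy topic: an append-only list plus one offset per consumer group."""

    def __init__(self) -> None:
        self._log: list[str] = []
        self._offsets: defaultdict[str, int] = defaultdict(int)  # new groups start at 0

    def publish(self, message: str) -> None:
        self._log.append(message)  # the producer knows nothing about consumers

    def poll(self, group: str, max_records: int) -> list[str]:
        """Return up to max_records new messages for this group and advance its offset."""
        start = self._offsets[group]
        batch = self._log[start:start + max_records]
        self._offsets[group] = start + len(batch)  # commit right after reading (simplified)
        return batch


topic = Topic()
for i in range(10):                # a burst of 10 orders arrives at once
    topic.publish(f"order-{i}")

print(topic.poll("billing", 3))    # billing drains the backlog 3 at a time
print(len(topic.poll("analytics", 100)))  # a new group reads everything: 10
```

The producer code never changes when the analytics group appears, and the burst simply waits in the log until billing catches up.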
However, queues add system complexity, increase maintenance cost, and introduce consistency challenges.
Why does Kafka not support read‑write separation?
The discussion focuses on Kafka 0.9. Since Kafka 2.4, consumers can read from follower replicas, but historically read‑write separation was avoided because:
It would increase system design complexity.
Reading from replicas introduces replication lag, causing stale data and consistency issues, especially for latency‑sensitive scenarios.
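Kafka's typical workload is write-heavy, and partition leaders are already spread across brokers, so read load is distributed even without replica reads. For such workloads the cost of read‑write separation outweighs the benefit.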
Conclusion
This article covered several common Kafka interview questions. The next article will analyze delay queues and their practical use cases. Stay tuned.
Full-Stack Internet Architecture
Introducing full-stack Internet architecture technologies centered on Java
