Kafka Interview Questions: High Availability, Reliability, Consistency, Performance, and Usage Rationale

This article explains common Kafka interview questions by analyzing the system's high‑availability design, reliability mechanisms, consistency model, performance tricks such as sequential writes and zero‑copy, and the reasons for using Kafka and message queues, providing both conceptual insight and practical details.

Full-Stack Internet Architecture

Jun 18, 2020

Preface

Before writing this article the author wondered whether interview questions should only list questions and answers or also analyze the underlying principles. After discussion the author decided to adopt the latter approach, believing that understanding both the question and its underlying mechanisms is essential, and will use several articles to elaborate each topic.

Article Overview

How does Kafka guarantee data reliability and consistency?

Why is Kafka so fast?

Will messages be lost or consumed repeatedly?

Why use Kafka and message queues?

Why doesn’t Kafka support read‑write separation?

How does Kafka ensure high availability, reliability, and consistency?

High Availability

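Kafka is a distributed system. In the versions discussed here it relies on ZooKeeper to store cluster metadata and to coordinate controller election, so that information does not depend on any single broker.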

Kafka also replicates every partition across several brokers. If the broker hosting a partition's leader replica fails, one of the in-sync follower replicas is elected as the new leader and read/write service continues. Since Kafka 2.4, follower replicas can also serve read requests.

Reliability

From the producer side, reliability means that a message, once acknowledged, is stored in its partition without loss. Kafka achieves this by combining two settings: the producer setting request.required.acks (named acks in the newer Java producer) and the broker/topic setting min.insync.replicas.

request.required.acks can be set to 1, 0, or -1.

When request.required.acks=1, the producer considers the send successful as soon as the leader replica has written the message to its own log. If the leader crashes before the followers in the ISR (in-sync replica set) have copied the message, it may be lost.

When request.required.acks=0, the producer assumes success immediately after sending, so loss is possible.

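When request.required.acks=-1, the send succeeds only after every replica in the ISR has replicated the message. If the ISR has shrunk to just the leader, however, the leader alone satisfies that condition, and a leader failure can still lose the message. Setting min.insync.replicas to 2 or more closes this gap: when fewer in-sync replicas than that are available, the broker rejects the write and the producer receives an error instead of a false success.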
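The three acknowledgment modes and their interaction with min.insync.replicas can be summarized as a small decision function. This is a simplified model for study purposes, not Kafka's broker code; the function name produce_outcome and its return strings are hypothetical.

```python
def produce_outcome(acks: int, isr_size: int, min_insync_replicas: int) -> str:
    """Simplified model of how a produce request is answered.

    acks: 0, 1, or -1, as described above.
    isr_size: number of in-sync replicas, leader included.
    min_insync_replicas: value of min.insync.replicas for the topic.
    """
    if acks == 0:
        # Fire and forget: the producer does not wait for the broker.
        return "assumed-success"
    if acks == 1:
        # The leader answers after appending to its own log only.
        return "acked-by-leader"
    if acks == -1:
        if isr_size < min_insync_replicas:
            # Too few in-sync replicas: reject the write rather than
            # accept it with weaker durability than requested.
            return "rejected-not-enough-replicas"
        # Every replica currently in the ISR must hold the message.
        return "acked-by-all-isr"
    raise ValueError(f"unsupported acks value: {acks}")
```

For example, produce_outcome(-1, 1, 2) models an ISR that has shrunk to the leader alone: the write is rejected, which is the safe outcome.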

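Even with these settings, duplicates are possible: if the leader crashes after a message has been replicated to only some followers, the producer receives no acknowledgment and retries, so the message may be stored twice. Kafka 0.11 and later offer an idempotent producer, which tags each batch with a producer ID and a per-partition sequence number so that the broker can discard retried writes it has already stored.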
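A minimal sketch of the deduplication idea follows, assuming the broker remembers the last sequence number it accepted from each producer. The class PartitionLog and its method names are hypothetical, and the real protocol works on batches and handles more cases than this toy does.

```python
class PartitionLog:
    """Toy partition that drops retried writes, keyed by producer ID and sequence."""

    def __init__(self):
        self.messages = []        # payloads in the order they were stored
        self.last_sequence = {}   # producer_id -> last accepted sequence number

    def append(self, producer_id: int, sequence: int, payload: str) -> bool:
        """Return True if the payload was stored, False if it was a duplicate."""
        last = self.last_sequence.get(producer_id, -1)
        if sequence <= last:
            # A retry of something already written: acknowledge, do not store.
            return False
        if sequence != last + 1:
            # A gap means an earlier write is missing; refuse to reorder.
            raise ValueError("out-of-order sequence number")
        self.messages.append(payload)
        self.last_sequence[producer_id] = sequence
        return True
```

Sending sequence 0 twice from the same producer stores the payload once; the second call returns False.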

Consistency

From the consumer side, consistency means that a consumer sees the same messages no matter which replica is currently serving as leader. Kafka achieves this with the High Watermark (HW).

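The HW marks the offset up to which messages have been replicated to every in-sync replica, and consumers can only read messages below it. Visible progress is therefore bounded by the slowest replica in the ISR, much as the shortest stave limits how much water a barrel can hold. To keep one slow follower from holding the HW back indefinitely, replica.lag.time.max.ms bounds how long a follower may lag before it is removed from the ISR.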
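The rule can be written down in a few lines, assuming each replica reports its log end offset (the offset of the next message it would write). The function names and broker IDs below are hypothetical.

```python
def high_watermark(log_end_offsets: dict, isr: set) -> int:
    """The HW is the smallest log end offset among the in-sync replicas."""
    return min(log_end_offsets[replica] for replica in isr)


def visible_messages(leader_log: list, hw: int) -> list:
    """Consumers may read only the messages at offsets below the HW."""
    return leader_log[:hw]


# The leader holds 5 messages, one follower 3, another 4.
offsets = {"broker-1": 5, "broker-2": 3, "broker-3": 4}
hw_all = high_watermark(offsets, {"broker-1", "broker-2", "broker-3"})
# If broker-2 falls too far behind and leaves the ISR, the HW can advance.
hw_without_slowest = high_watermark(offsets, {"broker-1", "broker-3"})
```

Here hw_all is 3 and hw_without_slowest is 4, which shows why removing a lagging follower from the ISR lets consumers make progress.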

Why is Kafka so fast?

Kafka achieves high throughput through sequential writes, zero‑copy I/O, and compression.

Sequential Writes

Sequential disk writes avoid costly seek operations required for random writes, dramatically improving write speed.
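As an illustration of the access pattern, here is a sketch of an append-only log file: every write goes to the end of the file, and each record receives the next offset. The class AppendOnlyLog, the length-prefixed record layout, and read_all are assumptions made for this example and do not reflect Kafka's on-disk format.

```python
class AppendOnlyLog:
    """Toy log segment: records are only ever added at the end of the file."""

    def __init__(self, path: str):
        self._file = open(path, "ab")  # append mode: writes land at the end
        self._next_offset = 0

    def append(self, payload: bytes) -> int:
        """Write one length-prefixed record and return its offset."""
        self._file.write(len(payload).to_bytes(4, "big") + payload)
        offset = self._next_offset
        self._next_offset += 1
        return offset

    def close(self) -> None:
        self._file.flush()
        self._file.close()


def read_all(path: str) -> list:
    """Read the records back front to back, which is again sequential I/O."""
    records = []
    with open(path, "rb") as f:
        while True:
            header = f.read(4)
            if len(header) < 4:
                break
            records.append(f.read(int.from_bytes(header, "big")))
    return records
```

Because nothing is ever modified in place, the disk head (or the SSD controller) never has to jump back to an earlier position during writes.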

Zero‑Copy

Zero-copy avoids routing data through user space. With the sendfile system call, data moves from the page cache to the network socket inside the kernel, which removes the two copies between kernel and user space that a conventional read-then-write loop needs and so speeds up transfers.
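Kafka is written in Java and Scala, so the following Python sketch only illustrates the same operating-system facility rather than Kafka's own code. It uses the standard library's socket.sendfile, which delegates to os.sendfile where the platform supports it and falls back to ordinary send calls otherwise. The function name send_file_zero_copy is hypothetical.

```python
import socket


def send_file_zero_copy(path: str, sock: socket.socket) -> int:
    """Send a whole file over a connected stream socket; return bytes sent."""
    with open(path, "rb") as f:
        # No read() into a Python buffer: the kernel moves the bytes from
        # the file to the socket when os.sendfile is available.
        return sock.sendfile(f)
```

The caller never sees the file contents, which is the point: the application only tells the kernel which file to ship to which socket.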

Compression

Kafka supports multiple compression algorithms (gzip, snappy, lz4, etc.) to reduce the amount of data transmitted.
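Compression pays off most when many similar messages are compressed together. The sketch below compares compressing messages one by one with compressing them as a single batch, using gzip from the Python standard library. The function name compressed_sizes and the sample message shape are assumptions for illustration.

```python
import gzip


def compressed_sizes(messages: list) -> tuple:
    """Return (raw, per_message, batched) sizes in bytes for a list of byte strings."""
    raw = sum(len(m) for m in messages)
    # Compressing each message alone pays the gzip overhead every time
    # and cannot exploit repetition across messages.
    per_message = sum(len(gzip.compress(m)) for m in messages)
    # Compressing the whole batch lets repeated field names be encoded once.
    batched = len(gzip.compress(b"".join(messages)))
    return raw, per_message, batched


sample = [
    ('{"user_id": %d, "event": "page_view", "page": "/home"}' % i).encode()
    for i in range(200)
]
raw_size, per_message_size, batched_size = compressed_sizes(sample)
```

For repetitive payloads such as these JSON-like events, the batched size comes out well below both the raw size and the per-message total.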

Will messages be lost or consumed repeatedly?

Message loss has been addressed in the previous sections.

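Before the idempotent producer was available (for example in Kafka 0.8), a retried send could store the same message twice, and consumers would then process it twice. Consumers can also see a message again on their own: if a consumer processes a message but fails before committing its offset, the message is delivered again after a restart or rebalance.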

One practical defense on the consumer side is to record the key of each processed message in Redis with an expiration time and to discard any message whose key has already been seen.
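The logic can be sketched without a Redis server by using an in-memory stand-in. The class SeenKeyStore is hypothetical; its mark_if_new method plays the role that a Redis SET with the NX and EX options would play in a real deployment, and the clock is injectable so the expiry can be exercised deterministically.

```python
import time


class SeenKeyStore:
    """In-memory stand-in for a key store with per-key expiration."""

    def __init__(self, clock=time.monotonic):
        self._expiry = {}    # key -> time at which the key expires
        self._clock = clock

    def mark_if_new(self, key: str, ttl_seconds: float) -> bool:
        """Record the key and return True, or return False if it is still live."""
        now = self._clock()
        expires_at = self._expiry.get(key)
        if expires_at is not None and expires_at > now:
            return False
        self._expiry[key] = now + ttl_seconds
        return True


def consume(messages: list, store: SeenKeyStore, ttl_seconds: float = 3600.0) -> list:
    """Process (key, value) pairs, skipping keys that were seen recently."""
    processed = []
    for key, value in messages:
        if store.mark_if_new(key, ttl_seconds):
            processed.append(value)
    return processed
```

One design caveat: this sketch marks a key before the message is processed, so a crash between the two steps would skip the message on redelivery. Whether to mark before or after processing depends on whether the application tolerates a rare loss better than a rare duplicate.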

Why use Kafka and why use a message queue?

Reasons to use a message queue

Message queues provide decoupling, asynchronous processing, and traffic shaping (peak‑shaving).

When multiple downstream systems depend on an upstream service, adding a new consumer only requires integration with the queue, avoiding direct coupling.

Introducing a queue also isolates downstream services from upstream bursts: if upstream traffic spikes, the queue buffers the messages and downstream services keep consuming at their own pace instead of crashing under the load.
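The buffering effect can be seen in a small tick-based simulation. The function simulate_buffering and its parameters are hypothetical: a burst arrives in one tick, and a consumer with fixed capacity drains the queue over the following ticks.

```python
from collections import deque


def simulate_buffering(arrivals_per_tick: list, consumer_capacity: int) -> tuple:
    """Return (handled_per_tick, backlog_per_tick) for a queue-fronted consumer."""
    queue = deque()
    handled, backlog = [], []
    for tick, arrivals in enumerate(arrivals_per_tick):
        # Upstream pushes everything it has into the queue.
        queue.extend((tick, i) for i in range(arrivals))
        # Downstream takes at most its capacity, never more.
        n = min(consumer_capacity, len(queue))
        for _ in range(n):
            queue.popleft()
        handled.append(n)
        backlog.append(len(queue))
    return handled, backlog


handled, backlog = simulate_buffering([100, 0, 0, 0], consumer_capacity=30)
```

With a burst of 100 messages and a capacity of 30 per tick, the consumer handles 30, 30, 30, and 10 messages while the backlog shrinks from 70 to 0. The peak is spread over time rather than hitting the consumer at once.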

However, queues add system complexity, increase maintenance cost, and introduce consistency challenges.

Why does Kafka not support read‑write separation?

The discussion focuses on Kafka 0.9. In newer versions follower replicas can serve reads, but historically read-write separation was avoided because:

It would increase system design complexity.

Reading from replicas introduces replication lag, causing stale data and consistency issues, especially for latency‑sensitive scenarios.

Read-write separation mainly pays off for read-heavy, write-light workloads, which is not Kafka's typical profile, so for Kafka the cost outweighs the benefit.

Conclusion

This article covered several common Kafka interview questions. The next article will analyze delay queues and their practical use cases. Stay tuned.
