Comprehensive Kafka Interview Questions and Answers

This article compiles essential Kafka interview topics, covering cluster sizing, partition and replica configuration, offset management, topic creation, log structure, election mechanisms, partition assignment strategies, handling data backlog, exactly‑once semantics, idempotence, transactions, and performance tuning with practical command examples.

Big Data Technology Architecture

Jun 3, 2020

1. Basic Assessment

1.1 What is the total disk size of your Kafka cluster, how many machines does it have, how long are logs retained, and what monitoring tools do you use? The answer should demonstrate real‑world deployment knowledge. Disk size is typically derived from the daily data volume and the retention period in days, sized so that the retained data fills no more than about 70% of the disk. The number of brokers is commonly estimated as 2 × (peak production speed × replication factor / 100) + 1. Common monitoring options include custom-built solutions, CDH, KafkaEagle, KafkaMonitor, and KafkaManager.
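
The sizing rules above can be worked through as plain arithmetic. The following is a minimal sketch, not a definitive capacity model: it assumes daily volume is measured in GB and peak production speed in MB/s (the text does not state the units), it assumes replicas are counted in the disk estimate, and the function names are hypothetical.

```python
import math

def disk_size_gb(daily_gb, replication_factor, retention_days, max_disk_usage=0.7):
    """Total disk needed so that retained data fills at most max_disk_usage of it.

    Multiplying by replication_factor is an assumption of this sketch:
    every replica stores its own copy of the data.
    """
    return daily_gb * replication_factor * retention_days / max_disk_usage

def broker_count(peak_mb_per_s, replication_factor):
    """Rule of thumb from the text: 2 * (peak speed * replication factor / 100) + 1."""
    return math.ceil(2 * (peak_mb_per_s * replication_factor / 100) + 1)

# Hypothetical workload: 100 GB/day, 2 replicas, 3 days retention, 50 MB/s peak.
print(round(disk_size_gb(100, 2, 3)), "GB of disk")
print(broker_count(50, 2), "brokers")
```

With these hypothetical inputs the broker estimate comes out at 3 machines; the point of the sketch is to show which inputs an interviewer expects you to know, not to replace a real load test.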

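1.2 What are appropriate numbers for partitions, replicas, and topics? A common guideline is to keep a topic's partition count no higher than the number of broker machines, with 3‑10 partitions per topic being typical; this is a convention, not a limit enforced by Kafka. The replication factor is usually 2‑3 and cannot exceed the number of brokers. The number of topics depends on how many distinct log types you have.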

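1.3 What do HW, LEO, ISR, and AR mean? LEO (Log End Offset) is the offset of the next message to be written to a replica, that is, the replica's last offset plus one. HW (High Watermark) is the smallest LEO among the replicas in the ISR; consumers can only read messages below it. ISR is the set of in‑sync replicas, and AR (Assigned Replicas) is the set of all replicas for a partition.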
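
To make the relationship between these terms concrete, here is a small sketch that computes the HW as the minimum LEO over the in-sync replicas. The broker IDs and offsets are made up for illustration, and the function name is hypothetical.

```python
def high_watermark(leo_by_replica, isr):
    """HW is the smallest LEO among the replicas currently in the ISR."""
    return min(leo_by_replica[replica] for replica in isr)

# Hypothetical partition with three assigned replicas (the AR).
# LEO is the offset of the next message each replica would write.
leo_by_replica = {1: 120, 2: 118, 3: 90}
ar = [1, 2, 3]
isr = [1, 2]          # broker 3 has fallen behind and left the ISR

hw = high_watermark(leo_by_replica, isr)
print(hw)  # 118: consumers can read offsets 0 through 117
```

Note that the lagging replica on broker 3 does not hold the HW back, because it is no longer part of the ISR.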

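1.4 Are Kafka messages ordered and how is ordering achieved? Ordering is guaranteed only within a single partition, because each message in a partition has a monotonically increasing offset. To keep related messages in order, produce them with the same key so that they land in the same partition.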

1.5 Can the number of partitions for a topic be increased or decreased? Partitions can only be increased. Kafka does not support decreasing the count, because the data in the removed partitions would be lost.

Example command to increase partitions:

bin/kafka-topics.sh --zookeeper localhost:2181/kafka --alter --topic topic-config --partitions 3

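1.6 How does Kafka maintain offsets? Before version 0.9, consumer offsets are stored in Zookeeper; from 0.9 onward they are stored in the internal __consumer_offsets topic. A consumer commits the offset of the next message it will read (last processed offset + 1), not the offset of the last processed message itself.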
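
The "offset + 1" rule is easy to get wrong, so here is a minimal in-memory sketch of it. The `committed` dictionary is a hypothetical stand-in for the __consumer_offsets topic, and the function names are illustrative rather than part of any Kafka client API.

```python
# Hypothetical stand-in for __consumer_offsets, keyed by (group, topic, partition).
committed = {}

def commit(group, topic, partition, last_processed_offset):
    """Store the position of the NEXT record to read, not the last one processed."""
    committed[(group, topic, partition)] = last_processed_offset + 1

def resume_position(group, topic, partition):
    """Where a restarted consumer in this group would continue reading."""
    return committed[(group, topic, partition)]

commit("billing", "orders", 0, last_processed_offset=41)
print(resume_position("billing", "orders", 0))  # 42
```

Committing 41 instead of 42 here would cause the record at offset 41 to be processed a second time after a restart.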

1.7 How do you load‑test Kafka? Use the built‑in scripts kafka-consumer-perf-test.sh and kafka-producer-perf-test.sh to identify bottlenecks, typically network I/O.

2. Deeper Assessment

2.1 What logic is executed when creating or deleting a topic? Example creation command:

bin/kafka-topics.sh --zookeeper node:2181 --create \
--replication-factor 3 --partitions 1 --topic csdn

The process involves (1) creating a Zookeeper node under /brokers/topics, (2) triggering the Controller listener, and (3) the Controller updating metadata and completing the topic creation.

2.2 What is the Kafka log directory structure? Each partition is a directory containing segment files (.log, .index, .timeindex). The directory name follows the pattern topic-partitionId (e.g., csdn-0).
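
Segment files are named after the base offset of the segment (the offset of its first message), zero-padded to 20 digits. The sketch below shows how that naming lets a reader locate the segment holding a given offset with a binary search; the helper names and the sample offsets are hypothetical.

```python
import bisect

def segment_file_name(base_offset, suffix=".log"):
    """Segment files are named by their base offset, zero-padded to 20 digits."""
    return f"{base_offset:020d}{suffix}"

def find_segment(base_offsets, target_offset):
    """Return the base offset of the segment containing target_offset.

    base_offsets must be sorted in ascending order.
    """
    index = bisect.bisect_right(base_offsets, target_offset) - 1
    if index < 0:
        raise ValueError("offset is older than the first retained segment")
    return base_offsets[index]

# Hypothetical contents of the csdn-0 directory: three segments.
base_offsets = [0, 170410, 239430]
base = find_segment(base_offsets, 200000)
print(segment_file_name(base))            # the .log file to open
print(segment_file_name(base, ".index"))  # its matching offset index
```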

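2.3 Does Kafka use elections and what strategies are applied? Elections are used for partition leaders (a new leader is chosen from the ISR) and for the Controller (first come, first served: the first broker to create the ephemeral /controller node in Zookeeper becomes the Controller).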

2.4 What is ISR? The ISR (In‑Sync Replicas) set includes the leader and the followers that are keeping up with it. Replicas that lag beyond the configured threshold (replica.lag.time.max.ms) are removed from the ISR and placed into the OSR (out‑of‑sync replicas).

2.5 What are Kafka’s partition assignment strategies? The default is Range, with RoundRobin and Sticky (introduced in 0.11) as alternatives. Range works per topic: it sorts the partitions and the consumers, then gives each consumer a contiguous block of partitions.
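
The Range strategy can be sketched for a single topic as follows. This is an illustration of the idea rather than the client's actual implementation; the function name and consumer IDs are hypothetical.

```python
def range_assign(num_partitions, consumers):
    """Assign partitions 0..num_partitions-1 of one topic in contiguous ranges.

    Consumers are sorted by ID; when the division is uneven, the first
    consumers in sorted order each receive one extra partition.
    """
    members = sorted(consumers)
    per_consumer, extra = divmod(num_partitions, len(members))
    assignment = {}
    start = 0
    for i, member in enumerate(members):
        count = per_consumer + (1 if i < extra else 0)
        assignment[member] = list(range(start, start + count))
        start += count
    return assignment

print(range_assign(7, ["c2", "c0", "c1"]))
# {'c0': [0, 1, 2], 'c1': [3, 4], 'c2': [5, 6]}
```

Because the calculation is repeated independently for every topic, the first consumers in sorted order pick up the extra partition of each topic, which is why Range can become unbalanced when a group subscribes to many topics.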

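2.6 How do you handle a message backlog? If the consumers cannot keep up with the topic, increase the number of partitions and the number of consumers in the group together. If downstream processing is the slow part, increase the number of records fetched per batch so that each processing round handles more data.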
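
Before choosing a remedy it helps to quantify the backlog. Per-partition lag is the log end offset minus the group's committed offset, which is the quantity that kafka-consumer-groups.sh --describe reports in its LAG column. A minimal sketch with made-up offsets and a hypothetical function name:

```python
def consumer_lag(log_end_offsets, committed_offsets):
    """Per-partition lag: log end offset minus the group's committed offset."""
    return {
        partition: log_end_offsets[partition] - committed_offsets[partition]
        for partition in log_end_offsets
    }

# Hypothetical three-partition topic.
log_end_offsets = {0: 1500, 1: 1200, 2: 900}
committed_offsets = {0: 1400, 1: 1200, 2: 100}

lag = consumer_lag(log_end_offsets, committed_offsets)
print(lag)                # {0: 100, 1: 0, 2: 800}
print(sum(lag.values()))  # 900
```

Lag concentrated in one partition, as in partition 2 here, usually points at a slow consumer or a hot key rather than at a shortage of partitions.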

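2.7 How does Kafka achieve exactly‑once semantics? Since version 0.11, enabling idempotence on the producer (enable.idempotence=true) together with acks=-1 combines at‑least‑once delivery with broker‑side deduplication, which gives exactly‑once writes per partition within a single producer session. Exactly‑once across partitions or across sessions additionally requires transactions.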

2.8 What is Kafka idempotence? It guarantees that a producer will not duplicate messages within a single session; it does not extend across restarts or multiple partitions.
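
A simplified model of the deduplication behind idempotence is sketched below. It assumes each producer session is given a producer ID and numbers the messages it sends to a partition with a sequence number; the class and method names are hypothetical, and the real broker logic is more involved.

```python
class PartitionLog:
    """Toy model of one partition that drops retried (duplicate) sends."""

    def __init__(self):
        self.records = []
        self.last_seq = {}  # producer ID -> last sequence number appended

    def append(self, producer_id, seq, value):
        last = self.last_seq.get(producer_id, -1)
        if seq <= last:
            return "duplicate"      # a retry of something already written
        if seq != last + 1:
            raise ValueError("out-of-order sequence number")
        self.records.append(value)
        self.last_seq[producer_id] = seq
        return "appended"

log = PartitionLog()
print(log.append(producer_id=7, seq=0, value="order-1"))  # appended
print(log.append(producer_id=7, seq=0, value="order-1"))  # duplicate (retry)
# After a restart the producer gets a new ID, so the same payload is accepted again.
print(log.append(producer_id=8, seq=0, value="order-1"))  # appended
print(log.records)  # ['order-1', 'order-1']
```

The last call shows why the guarantee holds only within a session: state is tracked per producer ID and per partition, so a restarted producer is indistinguishable from a new one.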

2.9 What do you know about Kafka transactions? Introduced in 0.11, transactions allow atomic writes across partitions. A Transaction Coordinator manages transaction state in an internal topic, enabling either full commit or abort.

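2.10 How does Kafka achieve high read/write throughput? By spreading load across a distributed cluster of partitions, writing to disk sequentially, and using zero‑copy transfer when serving reads. Kafka's design documentation cites roughly 600 MB/s for sequential writes versus roughly 100 KB/s for random writes on a six‑disk 7200 rpm SATA array.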

3. Discussion

3.1 Common broker parameter optimizations

Network and I/O threads:

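# Network threads (default 3); a common starting point is CPU cores + 1.
# The value below assumes an 8-core broker.
num.network.threads=9

# Disk I/O threads (default 8); a common starting point is CPU cores × 2.
# The value below assumes an 8-core broker.
num.io.threads=16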

Log file policies:

# Flush data to disk every second
log.flush.interval.ms=1000

Retention policy:

# Retain logs for three days
log.retention.hours=72

Replica settings:

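# Replication factor for the internal __consumer_offsets topic
# (default.replication.factor sets the default for automatically created topics)
offsets.topic.replication.factor=3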

3.2 Producer optimizations

Buffer memory and compression:

# Buffer memory (32 MB)
buffer.memory=33554432

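# Compression type (none by default; gzip, snappy, and lz4 are also supported)
compression.type=none

Enabling compression (any value other than none) reduces network pressure and improves broker storage efficiency, at the cost of some CPU time.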
