Comprehensive Kafka Study Guide: Core Concepts, Architecture, and Interview Questions
This article compiles essential Kafka fundamentals, architectural details, and a thorough set of interview questions ranging from basic to expert level, providing a concise yet complete resource for developers and engineers who want to master this distributed messaging platform.
Kafka Overview
Kafka is a widely used message-queue middleware whose main uses are decoupling, asynchronous processing, and traffic shaping (rate limiting and peak shaving). Like traditional messaging systems, it also provides redundant storage, buffering, scalability, and recoverability. In addition, it guarantees message ordering within a partition and supports replay consumption (re-reading messages that were already consumed).
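The ordering and replay guarantees above can be illustrated with a toy in-memory model. This is a sketch under stated assumptions, not Kafka's implementation: `ToyTopic`, `ToyConsumer`, and their methods are hypothetical names, and `zlib.crc32` stands in for the murmur2 hash that Kafka's default partitioner applies to keyed records. It shows that records with the same key land in the same partition in send order, and that rewinding a consumer's position replays them.

```python
import zlib


class ToyTopic:
    """In-memory stand-in for a topic: each partition is an append-only list."""

    def __init__(self, num_partitions):
        self.partitions = [[] for _ in range(num_partitions)]

    def partition_for(self, key):
        # Assumption: zlib.crc32 stands in for the murmur2 hash that Kafka's
        # default partitioner applies to the serialized key.
        return zlib.crc32(key.encode("utf-8")) % len(self.partitions)

    def send(self, key, value):
        p = self.partition_for(key)
        self.partitions[p].append((key, value))
        return p, len(self.partitions[p]) - 1  # (partition, offset)

    def fetch(self, partition, offset):
        # Fetching never removes records, so any retained offset can be re-read.
        return self.partitions[partition][offset:]


class ToyConsumer:
    """Reads one partition and tracks its position (the next offset to read)."""

    def __init__(self, topic, partition):
        self.topic, self.partition, self.position = topic, partition, 0

    def poll(self):
        records = self.topic.fetch(self.partition, self.position)
        self.position += len(records)
        return records

    def seek(self, offset):
        # Rewinding the position is what enables replay consumption.
        self.position = offset


topic = ToyTopic(num_partitions=3)
for i in range(3):
    topic.send("order-42", f"order-42:step-{i}")
    topic.send("order-7", f"order-7:step-{i}")

part = topic.partition_for("order-42")
consumer = ToyConsumer(topic, part)
first = consumer.poll()
consumer.seek(0)
replayed = consumer.poll()
print([v for k, v in first if k == "order-42"])  # ['order-42:step-0', 'order-42:step-1', 'order-42:step-2']
print(first == replayed)  # True
```

Because only the key-to-partition mapping is fixed, ordering holds per partition (and therefore per key), not across the whole topic; "order-7" records may or may not share a partition with "order-42", which is why the output is filtered by key.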
Study Resources
Two primary resources are highlighted:
"Kafka必知必会" ("Kafka Must-Knows") – a 34-page illustrated guide that is easy to read.
Kafka interview questions – a collection of common interview topics for operations and development teams.
Book Table of Contents
Why use a messaging system?
Kafka basics
Kafka architecture
Why Kafka performs so well
Kafka data reliability
Interview Questions – Basic
What are Kafka's use cases and typical scenarios?
What do ISR and AR represent, and what do ISR shrinking and expansion mean?
How does Kafka ensure message ordering?
Explain the role and processing order of partitioners, serializers, and interceptors.
Describe the overall structure of the Kafka producer client.
How many threads does the producer client use, and what are they?
What design flaws exist in the older Scala consumer client?
Is the statement "if the number of consumers in a consumer group exceeds the number of topic partitions, some consumers receive no data" correct? What situations cause duplicate consumption?
How to achieve multi‑threaded consumption given that KafkaConsumer is not thread‑safe?
Summarize the relationship between consumers and consumer groups.
What happens internally after a topic is created or deleted with kafka-topics.sh?
Can a topic’s partition count be increased or decreased? If so, how; if not, why?
How to choose an appropriate number of partitions when creating a topic?
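Several of the questions above (idle consumers, the consumer/consumer-group relationship, choosing a partition count) come down to how partitions are divided among a group's members. The sketch below is modeled on the range strategy used by Kafka's RangeAssignor: members are sorted, each gets `num_partitions // n` partitions, and the first `num_partitions % n` members get one extra. The function name `range_assign` is hypothetical and this is an illustration for a single topic, not the client's actual code.

```python
def range_assign(consumers, num_partitions):
    """Range-style assignment of one topic's partitions to a consumer group.

    Each partition goes to exactly one member; members beyond the partition
    count receive an empty list.
    """
    members = sorted(consumers)
    n = len(members)
    per_consumer, extra = divmod(num_partitions, n)
    assignment, start = {}, 0
    for i, member in enumerate(members):
        # The first `extra` members each take one additional partition.
        count = per_consumer + (1 if i < extra else 0)
        assignment[member] = list(range(start, start + count))
        start += count
    return assignment


print(range_assign(["c1", "c2", "c3"], 7))
# {'c1': [0, 1, 2], 'c2': [3, 4], 'c3': [5, 6]}
print(range_assign(["c1", "c2", "c3", "c4", "c5"], 3))
# {'c1': [0], 'c2': [1], 'c3': [2], 'c4': [], 'c5': []}
```

With a strategy like this, consumers beyond the partition count stay idle, which is one reason the partition count effectively caps a group's consumption parallelism.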
Interview Questions – Advanced
What internal topics does Kafka maintain and what are their purposes?
What is a preferred replica and what special role does it play?
Where does partition assignment occur in Kafka and what is the general process?
Describe Kafka’s log directory structure and index files.
How does Kafka locate a message when a specific offset is provided?
How does Kafka locate a message when a specific timestamp is provided?
Explain Log Retention and Log Compaction concepts.
Discuss the underlying storage mechanisms of Kafka.
Explain the principles behind Kafka’s delayed operations.
What is the role of the Kafka controller?
What are the design shortcomings of the older Scala consumer client?
How does consumer rebalancing work (consumer coordinator and group coordinator)?
How do the high watermark (HW) and log end offset (LEO) evolve across replicas?
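For the offset-lookup question above: each partition is stored as segments named after their base (first) offset, and each segment has a sparse offset index mapping relative offsets to physical positions in the .log file, with entries added roughly every `log.index.interval.bytes` of data. A lookup is a floor search for the segment, a floor search in that segment's index, then a short sequential scan. The sketch below models this in memory; `make_segment`, `find_segment`, `index_position`, and `read_at` are hypothetical helpers, and the "one index entry every 3 records, 100 bytes each" layout is an assumption chosen for illustration.

```python
import bisect


def make_segment(base, count, step=100):
    """Build a toy segment: records are (offset, byte_position, value).

    Hypothetical layout: every record is `step` bytes, and the sparse index gets
    one (relative_offset, position) entry every 3 records.
    """
    records = [(base + i, i * step, f"msg-{base + i}") for i in range(count)]
    index = [(i, i * step) for i in range(0, count, 3)]
    return records, index


def find_segment(base_offsets, target):
    """Floor lookup: the segment whose base offset is the largest <= target."""
    i = bisect.bisect_right(base_offsets, target) - 1
    if i < 0:
        raise KeyError(f"offset {target} precedes the earliest segment")
    return base_offsets[i]


def index_position(index, relative):
    """Floor lookup in the sparse index; fall back to the segment start."""
    keys = [rel for rel, _ in index]
    i = bisect.bisect_right(keys, relative) - 1
    return index[i][1] if i >= 0 else 0


def read_at(segments, indexes, target):
    base = find_segment(sorted(segments), target)
    start = index_position(indexes[base], target - base)
    # Skip records before the indexed position, then scan forward sequentially.
    for offset, position, value in segments[base]:
        if position >= start and offset == target:
            return value
    raise KeyError(f"offset {target} not found")


segments, indexes = {}, {}
for base in (0, 6):
    segments[base], indexes[base] = make_segment(base, 6)

print(read_at(segments, indexes, 10))  # msg-10
```

For offset 10, the floor search picks the segment with base offset 6, the index entry for relative offset 3 gives position 300, and the scan stops one record later. A sparse index keeps the index small enough to keep in memory at the cost of that short scan.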
Interview Questions – Expert
How are transactions implemented in Kafka?
What are failed replicas and how are they handled?
What reliability improvements has Kafka made (for example, the HW mechanism and LeaderEpoch)?
Why does Kafka not support read‑write separation?
How is a delayed queue implemented in Kafka?
How to implement dead‑letter and retry queues?
How does Kafka perform message auditing?
How does Kafka trace message flow?
How is consumer lag calculated, and which metrics are most important for monitoring?
What design choices give Kafka its high performance?
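For the consumer-lag question, a per-partition lag is the gap between the offset a consumer could read up to and the consumer's own offset. Our understanding is that the consumer client measures against the high watermark (HW) for `read_uncommitted` and against the last stable offset (LSO) for `read_committed`, and reports aggregates such as `records-lag-max`; the `kafka-consumer-groups.sh` tool instead compares the log end against the group's committed offset. The sketch below assumes those definitions; `PartitionOffsets`, `partition_lag`, `group_lag`, and the clamp at zero are hypothetical choices for illustration, and the offsets are made-up sample values.

```python
from dataclasses import dataclass


@dataclass
class PartitionOffsets:
    high_watermark: int      # HW: consumers can read offsets below this
    last_stable_offset: int  # LSO: upper bound for read_committed consumers
    consumer_offset: int     # committed offset or current position (next offset to read)


def partition_lag(p, isolation_level="read_uncommitted"):
    """Lag of one partition: how many records the consumer is behind."""
    end = p.last_stable_offset if isolation_level == "read_committed" else p.high_watermark
    # Clamp at 0 in case the offsets were sampled at slightly different times.
    return max(end - p.consumer_offset, 0)


def group_lag(partitions, isolation_level="read_uncommitted"):
    """Return (total lag, max per-partition lag) across a group's partitions."""
    lags = [partition_lag(p, isolation_level) for p in partitions.values()]
    return sum(lags), max(lags, default=0)


partitions = {
    0: PartitionOffsets(high_watermark=120, last_stable_offset=110, consumer_offset=100),
    1: PartitionOffsets(high_watermark=50, last_stable_offset=50, consumer_offset=50),
    2: PartitionOffsets(high_watermark=80, last_stable_offset=80, consumer_offset=75),
}
print(group_lag(partitions))                    # (25, 20)
print(group_lag(partitions, "read_committed"))  # (15, 10)
```

Tracking both the total and the maximum is a deliberate choice: the total shows overall backlog, while the maximum exposes a single slow or stuck partition that a healthy-looking total can hide.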
All material is compiled from publicly available sources for free knowledge sharing.
Linux Cloud Computing Practice
Welcome to Linux Cloud Computing Practice. We offer high-quality articles on Linux, cloud computing, DevOps, networking and related topics. Dive in and start your Linux cloud computing journey!
