Kafka Concept Overview
This article provides a comprehensive introduction to Kafka, covering its definition, message‑queue models, architecture components, installation steps, configuration details, producer and consumer mechanisms, reliability guarantees, partition assignment strategies, offset management, and high‑performance read/write techniques.
1. Kafka Concept Overview
1.1 Definition
Kafka is a distributed, publish/subscribe‑based message queue primarily used for real‑time processing in big‑data scenarios.
1.2 Message Queue
1.2.1 Traditional vs. Modern Queue Models
In the traditional synchronous model, downstream steps (e.g., sending an SMS notification) must finish before the user gets a response; with a message queue, the request is acknowledged immediately and the downstream steps are processed asynchronously.
1.2.2 Benefits of Using a Message Queue
A. Decoupling
B. Recoverability
C. Buffering
D. Flexibility & peak handling
E. Asynchronous communication
1.2.3 Queue Patterns
A. Point‑to‑point: each message is consumed by exactly one consumer and removed from the queue afterward.
B. Publish/subscribe: messages are delivered to all subscribed consumers. This model comes in two variants: push, where the broker pushes messages to consumers, and pull, where consumers fetch messages at their own pace. Kafka follows the publish/subscribe model with topics and uses the pull variant.
1.3 Kafka Basic Architecture
The core components are brokers, producers, consumer groups, and ZooKeeper (for coordination).
Producers send messages; brokers store messages and host topics, each divided into partitions that are replicated across brokers.
Consumer groups read messages; within a group, a partition is consumed by at most one consumer at a time, which enables parallel consumption across partitions. Consumers beyond the partition count sit idle, so a group's effective size is at most the number of partitions.
Offsets are stored in ZooKeeper before version 0.9 and in the internal Kafka topic __consumer_offsets from 0.9 onward.
1.4 Kafka Installation
A. Install by extracting the tar package:
tar -zxvf kafka_2.11-2.1.1.tgz -C /usr/local/
B. View the configuration files:
cd /usr/local/kafka/config
ls -l
C. Edit server.properties to set broker.id, data directories, topic deletion policy, log retention time, log segment size, ZooKeeper connection, and default partition count.
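As a sketch, the keys mentioned above look roughly like this in server.properties. The values here are illustrative, not recommendations, and the block writes a local example copy rather than touching a live installation:

```shell
# Illustrative server.properties fragment; values are examples only.
cat > server.properties.example <<'EOF'
# Unique ID for this broker within the cluster
broker.id=0
# Directory where partition data (log segments) is stored
log.dirs=/usr/local/kafka/data
# Allow topics to be deleted
delete.topic.enable=true
# Retain data for 7 days
log.retention.hours=168
# Roll a new segment file at 1 GB
log.segment.bytes=1073741824
# ZooKeeper connection string
zookeeper.connect=localhost:2181
# Default partition count for newly created topics
num.partitions=1
EOF
cat server.properties.example
```

In Java-style properties files, comments must sit on their own lines; a `#` after a value would become part of the value.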
1.5 Starting Kafka
A. Start each broker in the foreground (blocking mode).
B. Recommended: start each broker as a background daemon with the -daemon flag.
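Assuming the install path from section 1.4, the two start modes look roughly like this (a sketch; it requires a configured broker and a running ZooKeeper, so it is not runnable as-is):

```shell
cd /usr/local/kafka
# A. Foreground (blocking) start; logs stream to the terminal:
# bin/kafka-server-start.sh config/server.properties
# B. Recommended: background daemon mode
bin/kafka-server-start.sh -daemon config/server.properties
```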
1.6 Kafka Operations
A. List existing topics (via ZooKeeper).
B. Create a topic with specified partitions and replication factor.
C. Delete a topic.
D. View topic details.
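The operations above map to kafka-topics.sh; in this 2.1.x-era setup the tool talks to ZooKeeper (newer releases use --bootstrap-server instead). A sketch, assuming ZooKeeper at localhost:2181 and an example topic named first (requires a running cluster, so not runnable as-is):

```shell
cd /usr/local/kafka
# A. List existing topics
bin/kafka-topics.sh --zookeeper localhost:2181 --list
# B. Create a topic with 3 partitions and replication factor 2
bin/kafka-topics.sh --zookeeper localhost:2181 --create \
  --topic first --partitions 3 --replication-factor 2
# C. Delete a topic (takes effect only if delete.topic.enable=true)
bin/kafka-topics.sh --zookeeper localhost:2181 --delete --topic first
# D. View topic details (leader, replicas, ISR per partition)
bin/kafka-topics.sh --zookeeper localhost:2181 --describe --topic first
```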
2. Kafka Architecture Deep Dive
Kafka guarantees ordering only within a partition, not across partitions.
2.1 Workflow
Producers write to topics; consumers read from topics. Each partition maintains its own log and its own offset sequence; each consumer tracks the offset of the last message it has processed so it can resume from there after a restart or failure.
2.2 Internals
Each partition is split into segments, each consisting of an index file and a log file. The index maps offsets to physical positions, enabling fast seeks.
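As an illustration of the on-disk layout, a partition directory holds index/log pairs named after the first offset each segment contains (20 digits, zero-padded). The sketch below only recreates the naming scheme with empty placeholder files; real segments contain index and message data:

```shell
# Recreate the naming scheme of a partition directory for topic "first", partition 0.
mkdir -p first-0
# First segment starts at offset 0:
touch first-0/00000000000000000000.index first-0/00000000000000000000.log
# A later segment whose first message has offset 170410:
touch first-0/00000000000000170410.index first-0/00000000000000170410.log
ls first-0
```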
3. Producers and Consumers
3.1 Producers
Partitions improve concurrency. A producer can specify a partition explicitly, let the partition be derived from the message key's hash, or fall back to round-robin distribution when neither is given.
3.2 Reliability (acks)
Three ack levels:
A. acks=0 – fire‑and‑forget; the producer does not wait for any acknowledgment (highest loss risk).
B. acks=1 – the leader acknowledges after writing the message to its own log (loss possible if the leader fails before followers replicate).
C. acks=-1 (all) – the leader acknowledges only after all in‑sync replicas (ISR) have the message (highest durability; duplicates possible if the ack is lost and the producer retries).
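The ack level is a producer-side setting. A sketch of a producer properties fragment, written to a local example file with illustrative values:

```shell
cat > producer.properties.example <<'EOF'
# acks=0: do not wait for any acknowledgment (fastest, may lose data)
# acks=1: wait for the leader only (may lose data if the leader fails)
# acks=all: wait for all in-sync replicas (most durable, may duplicate)
acks=all
# Retry transient send failures; combined with acks=all this favors
# at-least-once delivery, hence the possibility of duplicates.
retries=3
EOF
cat producer.properties.example
```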
3.3 Consumer Consistency (HW)
HW (high watermark) is the smallest LEO (log end offset) among the ISR; consumers can read only up to the HW. After a leader failure, the surviving replicas truncate to the HW, so consumers never see messages that were not fully replicated and the replicas stay consistent.
3.4 Consumers
3.4.1 Consumption Model
Kafka uses pull‑based consumption, allowing consumers to control read speed.
3.4.2 Partition Assignment
Two strategies:
• RoundRobin – spreads all subscribed partitions evenly across the group, but requires every consumer in the group to subscribe to the same set of topics.
• Range – the default; assigns each consumer a contiguous range of partitions per topic, which can leave the first consumers with more partitions when the partition count does not divide evenly.
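The strategy is chosen per consumer via partition.assignment.strategy; RangeAssignor is the default. A sketch of a consumer properties fragment (local example file, illustrative values):

```shell
cat > consumer.properties.example <<'EOF'
# Consumers sharing this ID split the topic's partitions between them
group.id=demo-group
# Switch from the default RangeAssignor to round-robin assignment
partition.assignment.strategy=org.apache.kafka.clients.consumer.RoundRobinAssignor
EOF
cat consumer.properties.example
```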
3.4.3 Offset Management
Offsets are stored either in ZooKeeper (legacy) or in a dedicated Kafka topic.
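In post-0.9 clusters the offsets live in the internal topic __consumer_offsets, and a group's committed positions can be inspected with kafka-consumer-groups.sh. A sketch assuming a hypothetical group named demo-group (requires a running cluster, so not runnable as-is):

```shell
cd /usr/local/kafka
# Show committed offset, log end offset, and lag per partition for the group
bin/kafka-consumer-groups.sh --bootstrap-server localhost:9092 \
  --describe --group demo-group
```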
3.4.4 Consumer Group Example
Changing consumer‑group IDs, starting multiple consumers, and observing how each consumer receives distinct messages within the same group.
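With the console tools this experiment looks roughly like the following (a sketch assuming a topic named first and a broker at localhost:9092; run each consumer in its own terminal):

```shell
cd /usr/local/kafka
# Terminal 1 and Terminal 2: two consumers in the same group;
# each receives messages from a disjoint subset of the topic's partitions.
bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 \
  --topic first --group demo-group
# Terminal 3: produce a few messages and watch how they are split
bin/kafka-console-producer.sh --broker-list localhost:9092 --topic first
```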
4. High‑Performance Read/Write Mechanisms
4.1 Distributed Deployment
Multiple nodes operate in parallel.
4.2 Sequential Disk Writes
Producers append to log files sequentially; on typical spinning disks this achieves throughput on the order of 600 MB/s, versus roughly 100 KB/s for random writes.
4.3 Zero‑Copy
Kafka uses the operating system's zero‑copy mechanism (sendfile): data moves from the page cache to the network socket inside the kernel, avoiding copies through user space and boosting throughput.
5. Role of ZooKeeper in Kafka
ZooKeeper elects a controller broker that manages broker membership, partition‑replica allocation, and leader election.