Introduction to Message Systems and Kafka Architecture
This article explains the purpose of message systems, compares various solutions such as RabbitMQ, Redis, ZeroMQ, ActiveMQ, RocketMQ and Kafka, then details Kafka's design goals, core concepts, architecture, replication, retention policies, zero‑copy transfer, batching, and performance optimizations for high‑throughput distributed messaging.
1 Message System Overview
Why use a message system? It provides decoupling, redundancy, flexibility under load spikes, recoverability, ordering guarantees, and asynchronous communication, so systems can exchange data without knowing about each other.
Common message systems include RabbitMQ (Erlang, AMQP), Redis (lightweight queue), ZeroMQ (library‑level P2P), ActiveMQ (JMS), RocketMQ (Java, pub/sub), and Kafka (high‑performance, distributed, persistent).
Kafka's design goals are high throughput (quoted as up to 1 M messages/s per broker, a figure that depends heavily on message size and hardware), message persistence, and a fully distributed design in which producers, brokers, and consumers all scale horizontally.
2 Kafka Introduction and Architecture
2.1 Kafka Architecture
Kafka consists of producers, a Kafka cluster (brokers), and consumers. The cluster is coordinated by ZooKeeper.
2.2 Core Concepts
(1) Message
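The basic data unit in Kafka is the message (record). On the producer side it is represented by ProducerRecord:
public class ProducerRecord<K, V> {
    private final String topic;
    private final Integer partition;
    private final Headers headers;
    private final K key;
    private final V value;
    private final Long timestamp;
    // ...
}
Unless a partition is set explicitly, the key determines the partition: records with the same key land in the same partition, which preserves their relative order.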
(2) Topic, Partition & Log
A topic is a logical collection of messages and can have multiple partitions. Each partition is an ordered, append‑only log in which every message is identified by its offset. Offsets guarantee ordering within a partition, but not across partitions.
Partitions are the basis for Kafka's horizontal scalability; they are distributed across brokers.
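The offset semantics can be sketched with a small in-memory model. PartitionLogSketch is a hypothetical class used only for illustration; real partitions are segment files on a broker's disk, not Java lists.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical in-memory stand-in for one partition's append-only log. */
final class PartitionLogSketch {
    private final List<String> records = new ArrayList<>();

    /** Appends a record and returns the offset assigned to it. */
    long append(String value) {
        records.add(value);
        return records.size() - 1L;
    }

    /** Reads the record stored at the given offset. */
    String read(long offset) {
        return records.get((int) offset);
    }

    /** Offset that the next appended record will receive. */
    long logEndOffset() {
        return records.size();
    }

    public static void main(String[] args) {
        PartitionLogSketch partition0 = new PartitionLogSketch();
        PartitionLogSketch partition1 = new PartitionLogSketch();
        long a = partition0.append("order-created");
        long b = partition0.append("order-paid");
        long c = partition1.append("user-signed-up");
        // Offsets are scoped to a partition, so both partitions start at 0.
        System.out.println(a + " " + b + " " + c); // prints "0 1 0"
    }
}
```

Because each partition numbers its records independently, comparing offsets from two different partitions says nothing about which record was written first.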
(3) Broker
A broker receives messages from producers, assigns offsets, stores them on disk, and serves consumer requests.
(4) Producer
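Producers send messages to topics. If a record has a key, the partition is chosen from the key's hash, so records with the same key always go to the same partition; records without a key are spread across partitions, for example round‑robin.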
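A minimal sketch of the two selection strategies follows. PartitionChooserSketch is a hypothetical class, and String.hashCode is a stand-in assumption; Kafka's own client hashes the serialized key bytes with a different hash function.

```java
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical partition chooser for illustration; not Kafka's built-in partitioner. */
final class PartitionChooserSketch {
    private final AtomicInteger nextUnkeyed = new AtomicInteger();

    /** Keyed records hash to a fixed partition; unkeyed records rotate round-robin. */
    int choose(String key, int numPartitions) {
        if (key != null) {
            // floorMod keeps the result non-negative even when hashCode() is negative.
            return Math.floorMod(key.hashCode(), numPartitions);
        }
        return Math.floorMod(nextUnkeyed.getAndIncrement(), numPartitions);
    }

    public static void main(String[] args) {
        PartitionChooserSketch chooser = new PartitionChooserSketch();
        // The same key always maps to the same partition.
        System.out.println(chooser.choose("user-42", 6) == chooser.choose("user-42", 6));
        // Unkeyed records rotate through partitions 0, 1, 2, 0.
        for (int i = 0; i < 4; i++) {
            System.out.print(chooser.choose(null, 3) + " ");
        }
        System.out.println();
    }
}
```

Note that the key-to-partition mapping depends on the partition count, so adding partitions to a topic changes where existing keys are routed.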
(5) Consumer
Consumers pull messages from topics and track their own offset per partition.
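The pull model can be illustrated with a consumer that remembers its own position per partition. PullConsumerSketch and its method names are hypothetical, and the partition log is modeled as a plain list rather than a broker fetch.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical consumer that tracks its own read position for each partition. */
final class PullConsumerSketch {
    private final Map<Integer, Long> positions = new HashMap<>();

    /** Pulls up to maxRecords from the partition, starting at this consumer's position. */
    List<String> poll(int partition, List<String> partitionLog, int maxRecords) {
        int from = (int) Math.min(position(partition), partitionLog.size());
        int to = Math.min(from + maxRecords, partitionLog.size());
        positions.put(partition, (long) to); // advance past the records just read
        return new ArrayList<>(partitionLog.subList(from, to));
    }

    /** Next offset this consumer will read from the partition (0 if never polled). */
    long position(int partition) {
        return positions.getOrDefault(partition, 0L);
    }

    public static void main(String[] args) {
        List<String> log = Arrays.asList("a", "b", "c");
        PullConsumerSketch consumer = new PullConsumerSketch();
        System.out.println(consumer.poll(0, log, 2)); // prints [a, b]
        System.out.println(consumer.poll(0, log, 2)); // prints [c]
        System.out.println(consumer.position(0));     // prints 3
    }
}
```

Since the position lives with the consumer, the consumer controls its own pace and can rewind to re-read older records.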
(6) Consumer Group
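Multiple consumers can form a consumer group. Within a group, each partition is consumed by exactly one member, so the members share the work (exclusive, queue‑style consumption) and a failed member's partitions are reassigned to the remaining members. Different groups consume the same topic independently, so every group receives every message (broadcast‑style consumption). Adding members to a group scales consumption horizontally, up to the number of partitions.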
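The one-owner-per-partition rule can be sketched as a simple assignment function. GroupAssignmentSketch is hypothetical, and plain round-robin is an assumption chosen for brevity; Kafka's actual assignment strategies are pluggable and more involved.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical round-robin assignment of partitions to the members of one group. */
final class GroupAssignmentSketch {
    /** Gives every partition 0..numPartitions-1 exactly one owner in the group. */
    static Map<String, List<Integer>> assign(List<String> members, int numPartitions) {
        if (members.isEmpty()) {
            throw new IllegalArgumentException("a group needs at least one member");
        }
        Map<String, List<Integer>> assignment = new LinkedHashMap<>();
        for (String member : members) {
            assignment.put(member, new ArrayList<>());
        }
        for (int partition = 0; partition < numPartitions; partition++) {
            String owner = members.get(partition % members.size());
            assignment.get(owner).add(partition);
        }
        return assignment;
    }

    public static void main(String[] args) {
        // Two members share three partitions.
        System.out.println(assign(Arrays.asList("c1", "c2"), 3)); // prints {c1=[0, 2], c2=[1]}
        // After c2 leaves, re-running the assignment hands everything to c1.
        System.out.println(assign(Arrays.asList("c1"), 3));       // prints {c1=[0, 1, 2]}
    }
}
```

Running the function once per group models broadcast: each group gets its own complete assignment over the same partitions.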
(7) Replication
Each partition has one leader replica and zero or more follower replicas. By default the leader handles all reads and writes; followers replicate the leader's log. If the leader fails, a follower from the in‑sync replica set is elected as the new leader.
(8) Retention & Log Compaction
Kafka deletes old data based on time or size limits, and can compact logs to keep only the latest value for each key.
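Log compaction can be sketched as a pass that keeps, for each key, only the record with the highest offset. CompactionSketch and Rec are hypothetical names; the real log cleaner works on segment files in the background rather than on an in-memory list.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical model of log compaction: keep the latest record for every key. */
final class CompactionSketch {
    static final class Rec {
        final long offset;
        final String key;
        final String value;

        Rec(long offset, String key, String value) {
            this.offset = offset;
            this.key = key;
            this.value = value;
        }
    }

    /** Returns the surviving records in offset order, with their original offsets. */
    static List<Rec> compact(List<Rec> log) {
        Map<String, Long> latestOffset = new HashMap<>();
        for (Rec r : log) {
            latestOffset.put(r.key, r.offset); // later records overwrite earlier ones
        }
        List<Rec> kept = new ArrayList<>();
        for (Rec r : log) {
            if (latestOffset.get(r.key) == r.offset) {
                kept.add(r);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        List<Rec> log = Arrays.asList(
                new Rec(0, "k1", "a"), new Rec(1, "k2", "b"), new Rec(2, "k1", "c"));
        for (Rec r : compact(log)) {
            System.out.println(r.offset + " " + r.key + "=" + r.value);
        }
        // prints "1 k2=b" then "2 k1=c"
    }
}
```

Surviving records keep their original offsets, so a compacted log has gaps in its offset sequence but preserves ordering.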
(9) Cluster & Controller
The controller (a broker elected via ZooKeeper) manages partition and replica state.
(10) ISR (In‑Sync Replica) Set
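The ISR contains the replicas (including the leader) that are caught up with the leader. A follower that lags for too long (controlled by replica.lag.time.max.ms) is removed from the ISR so that it does not delay the acknowledgment of new writes; it can rejoin once it has caught up.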
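A time-based membership check of this kind might look as follows. IsrSketch, its parameter names, and the timestamps are hypothetical; the sketch only assumes that each replica records when it was last fully caught up with the leader.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical ISR filter based on how long ago each replica was last caught up. */
final class IsrSketch {
    /** Returns the IDs (ascending) of replicas caught up within maxLagMs of nowMs. */
    static List<Integer> inSyncReplicas(Map<Integer, Long> lastCaughtUpMs, long nowMs, long maxLagMs) {
        List<Integer> isr = new ArrayList<>();
        // TreeMap gives a deterministic, sorted iteration order over replica IDs.
        for (Map.Entry<Integer, Long> e : new TreeMap<>(lastCaughtUpMs).entrySet()) {
            if (nowMs - e.getValue() <= maxLagMs) {
                isr.add(e.getKey());
            }
        }
        return isr;
    }

    public static void main(String[] args) {
        Map<Integer, Long> lastCaughtUpMs = new HashMap<>();
        lastCaughtUpMs.put(1, 10_000L); // the leader is always caught up with itself
        lastCaughtUpMs.put(2, 9_500L);  // 500 ms behind
        lastCaughtUpMs.put(3, 1_000L);  // 9 s behind
        System.out.println(inSyncReplicas(lastCaughtUpMs, 10_000L, 5_000L)); // prints [1, 2]
    }
}
```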
(11) HW & LEO
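HW (high‑watermark) is the offset up to which all ISR replicas have replicated the log; consumers can only read messages below the HW. LEO (log end offset) is the offset that the next message appended to a replica's log will receive, that is, the last written offset plus one.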
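Treating LEO as the offset the next record would receive, the high-watermark can be sketched as the minimum LEO across the ISR. HighWatermarkSketch is a hypothetical helper, not broker code.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.Collections;

/** Hypothetical helper: the high-watermark is the smallest LEO among ISR replicas. */
final class HighWatermarkSketch {
    static long highWatermark(Collection<Long> isrLogEndOffsets) {
        if (isrLogEndOffsets.isEmpty()) {
            throw new IllegalArgumentException("the ISR always contains at least the leader");
        }
        return Collections.min(isrLogEndOffsets);
    }

    public static void main(String[] args) {
        // Leader LEO is 10; the two in-sync followers have LEO 8 and 9.
        long hw = highWatermark(Arrays.asList(10L, 8L, 9L));
        // Offsets 0..7 exist on every ISR replica and are visible to consumers.
        System.out.println(hw); // prints 8
    }
}
```

Offsets at or above the HW exist on the leader but are not yet on every in-sync replica, which is why consumers are not allowed to read them.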
2.3 ZooKeeper’s Role in Kafka
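ZooKeeper stores broker registrations and topic‑partition metadata, enabling dynamic load balancing and fail‑over. Older consumer clients also kept consumer group membership and offsets in ZooKeeper; since Kafka 0.9, consumers commit offsets to the internal __consumer_offsets topic and group membership is managed by a broker‑side coordinator.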
2.4 Reasons for Kafka’s High Performance
(1) Efficient Disk Usage
Partitions are append‑only logs, avoiding random writes; segments are deleted as whole files, and page cache is heavily utilized. Multiple disks can be configured via log.dirs for parallel I/O.
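Whole-segment deletion can be sketched as a filter over a partition's segment list. SegmentRetentionSketch, Segment, and newestRecordMs are hypothetical names, and the rule that the last (active) segment is never deleted is an assumption of this sketch.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical time-based retention that removes whole segments, never single records. */
final class SegmentRetentionSketch {
    static final class Segment {
        final long baseOffset;      // offset of the first record in the segment
        final long newestRecordMs;  // timestamp of the newest record in the segment

        Segment(long baseOffset, long newestRecordMs) {
            this.baseOffset = baseOffset;
            this.newestRecordMs = newestRecordMs;
        }
    }

    /** Returns the segments whose newest record is older than retentionMs. */
    static List<Segment> expired(List<Segment> segments, long nowMs, long retentionMs) {
        List<Segment> toDelete = new ArrayList<>();
        // The last segment is the active one still being appended to; skip it.
        for (int i = 0; i < segments.size() - 1; i++) {
            Segment s = segments.get(i);
            if (nowMs - s.newestRecordMs > retentionMs) {
                toDelete.add(s);
            }
        }
        return toDelete;
    }

    public static void main(String[] args) {
        List<Segment> segments = Arrays.asList(
                new Segment(0, 1_000L), new Segment(100, 50_000L), new Segment(200, 90_000L));
        for (Segment s : expired(segments, 100_000L, 60_000L)) {
            System.out.println("delete segment starting at offset " + s.baseOffset);
        }
        // prints "delete segment starting at offset 0"
    }
}
```

Deleting a segment is a single file removal, so retention never has to rewrite data that is still live.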
(2) Zero‑Copy Transfer
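Kafka sends log data to consumers with Java NIO's FileChannel.transferTo, which on Linux can be implemented with the sendfile system call. The data moves from the page cache to the socket inside the kernel, without being copied into user‑space buffers. A wrapper method of the following shape makes the call:
// Transfers up to `count` bytes of the file, starting at `position`, straight to the socket.
// socketChannel is a field of the enclosing class.
public long transferFrom(FileChannel fileChannel, long position, long count) throws IOException {
    return fileChannel.transferTo(position, count, socketChannel);
}
(3) Reduced Network Overhead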
Batching combines many records into a single request, decreasing protocol overhead. Compression (e.g., gzip, Snappy) further reduces payload size, and compressed data is stored on disk without decompression.
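The effect of compressing a batch instead of single records can be observed with the JDK's gzip stream. BatchCompressionSketch and the sample records are hypothetical, and joining records with newlines is a stand-in for Kafka's actual batch format.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.zip.GZIPOutputStream;

/** Hypothetical comparison of per-record compression against whole-batch compression. */
final class BatchCompressionSketch {
    static byte[] gzip(byte[] input) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(out)) {
            gz.write(input);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray();
    }

    /** Total bytes when every record is compressed on its own. */
    static int compressedIndividually(List<String> records) {
        int total = 0;
        for (String record : records) {
            total += gzip(record.getBytes(StandardCharsets.UTF_8)).length;
        }
        return total;
    }

    /** Bytes when all records are joined and compressed once. */
    static int compressedAsBatch(List<String> records) {
        String joined = String.join("\n", records);
        return gzip(joined.getBytes(StandardCharsets.UTF_8)).length;
    }

    public static void main(String[] args) {
        List<String> records = new ArrayList<>();
        for (int i = 0; i < 100; i++) {
            records.add("{\"user\":\"u" + i + "\",\"event\":\"click\"}");
        }
        System.out.println("individually: " + compressedIndividually(records) + " bytes");
        System.out.println("as one batch: " + compressedAsBatch(records) + " bytes");
    }
}
```

The batch comes out smaller because the compressor pays its fixed header cost once and can exploit repetition across records, such as shared field names.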
(4) Efficient Serialization
Custom serializers (Avro, Protobuf) produce compact binary formats, improving throughput when combined with compression.
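The size difference between a binary and a textual encoding can be seen with a hand-rolled example. CompactEncodingSketch and its fields are hypothetical, and the fixed-width layout is only a stand-in; it is not the wire format of Avro or Protobuf.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

/** Hypothetical comparison of a fixed-width binary encoding with a JSON string. */
final class CompactEncodingSketch {
    /** Encodes the two fields as 8 + 4 bytes, with no field names on the wire. */
    static byte[] encodeBinary(long userId, int amountCents) {
        ByteBuffer buffer = ByteBuffer.allocate(Long.BYTES + Integer.BYTES);
        buffer.putLong(userId).putInt(amountCents);
        return buffer.array();
    }

    /** Encodes the same fields as JSON text, repeating the field names in every record. */
    static byte[] encodeJson(long userId, int amountCents) {
        String json = "{\"userId\":" + userId + ",\"amountCents\":" + amountCents + "}";
        return json.getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        System.out.println("binary: " + encodeBinary(123456789L, 2599).length + " bytes");
        System.out.println("json:   " + encodeJson(123456789L, 2599).length + " bytes");
    }
}
```

Schema-based formats get their compactness the same way: the schema carries the field names and types, so each record only carries values.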
References: https://www.jianshu.com/p/a036405f989c, https://www.jianshu.com/p/eb75372df00a
