Introduction to Message Systems and Kafka Architecture

This article explains the purpose of message systems, compares various solutions such as RabbitMQ, Redis, ZeroMQ, ActiveMQ, RocketMQ and Kafka, then details Kafka's design goals, core concepts, architecture, replication, retention policies, zero‑copy transfer, batching, and performance optimizations for high‑throughput distributed messaging.

Top Architect

Jan 22, 2021

1 Message System Overview

Why use a message system? It provides decoupling, redundancy, flexibility under load spikes, recoverability, ordering guarantees, and asynchronous communication, allowing systems to exchange data without knowing of each other's existence.

Common message systems include RabbitMQ (Erlang, AMQP), Redis (lightweight queue), ZeroMQ (library‑level P2P), ActiveMQ (JMS), RocketMQ (Java, pub/sub), and Kafka (high‑performance, distributed, persistent).

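Kafka's design goals are high throughput (the source cites up to 1 M messages per second per broker; actual figures depend on hardware, message size, and configuration), message persistence, and a fully distributed design in which producers, brokers, and consumers all scale horizontally.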

2 Kafka Introduction and Architecture

2.1 Kafka Architecture

Kafka consists of producers, a Kafka cluster (brokers), and consumers. The cluster is coordinated by ZooKeeper (not shown in the diagram).

2.2 Core Concepts

(1) Message

The basic data unit in Kafka is the message (record). On the producer side it is represented by ProducerRecord:

    public class ProducerRecord<K, V> {
        private final String topic;
        private final Integer partition;
        private final Headers headers;
        private final K key;
        private final V value;
        private final Long timestamp;
        // ...
    }

The key determines the partition, and therefore which messages are ordered relative to one another.

(2) Topic, Partition & Log

A topic is a logical collection of messages and can have multiple partitions. Each partition is an ordered, append‑only log in which every message is identified by an offset. Offsets guarantee ordering within a partition, but not across partitions.

Partitions are the basis for Kafka's horizontal scalability; they are distributed across brokers.
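The relationship between a partition and its offsets can be illustrated with a minimal in-memory sketch. The class name PartitionLog and its methods are hypothetical and exist only for this illustration; a real broker stores the log in segment files on disk.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical in-memory stand-in for one partition's append-only log. */
final class PartitionLog {
    private final List<String> records = new ArrayList<>();

    /** Appends a record to the end of the log and returns the offset assigned to it. */
    public long append(String value) {
        records.add(value);
        return records.size() - 1L;
    }

    /** Reads the record stored at the given offset. */
    public String read(long offset) {
        return records.get((int) offset);
    }

    /** Offset that the next appended record will receive. */
    public long nextOffset() {
        return records.size();
    }

    public static void main(String[] args) {
        PartitionLog p0 = new PartitionLog();
        PartitionLog p1 = new PartitionLog();
        // Each partition numbers its own records starting from 0,
        // so offsets are only comparable within one partition.
        System.out.println("p0 offset: " + p0.append("a"));
        System.out.println("p1 offset: " + p1.append("b"));
        System.out.println("p0 offset: " + p0.append("c"));
    }
}
```

Because each log hands out offsets independently, offset 0 in one partition says nothing about its position relative to offset 0 in another, which is why ordering holds only within a partition.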

(3) Broker

A broker receives messages from producers, assigns offsets, stores them on disk, and serves consumer requests.

(4) Producer

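Producers send messages to topics. When a message has a key, the partition is chosen from the key's hash, so messages with the same key land in the same partition; without a key, messages are spread across partitions (for example, round‑robin).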
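A rough sketch of this selection logic follows. SketchPartitioner is a hypothetical class, and Arrays.hashCode is used here only as a stand-in hash; it is an assumption of this example, not the hash function Kafka's own partitioner uses.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.concurrent.atomic.AtomicInteger;

/** Hypothetical partition chooser: hash for keyed records, rotation for unkeyed ones. */
final class SketchPartitioner {
    private final AtomicInteger counter = new AtomicInteger();

    public int partitionFor(String key, int numPartitions) {
        if (key == null) {
            // No key: rotate through the partitions in turn.
            return Math.floorMod(counter.getAndIncrement(), numPartitions);
        }
        // Keyed record: the same key bytes always map to the same partition.
        // Stand-in hash (assumption), chosen only because it is in the JDK.
        int hash = Arrays.hashCode(key.getBytes(StandardCharsets.UTF_8));
        return Math.floorMod(hash, numPartitions);
    }

    public static void main(String[] args) {
        SketchPartitioner partitioner = new SketchPartitioner();
        int first = partitioner.partitionFor("order-42", 6);
        int second = partitioner.partitionFor("order-42", 6);
        System.out.println("same key, same partition: " + (first == second));
    }
}
```

Math.floorMod keeps the result non-negative even when the hash is negative, which a plain % would not.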

(5) Consumer

Consumers pull messages from topics and track their own offset per partition.

(6) Consumer Group

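Multiple consumers can form a consumer group. Within a group, each partition is consumed by only one member, so each message is delivered to the group once (exclusive consumption); different groups each receive every message (broadcast consumption). Adding members scales consumption horizontally, and if a member fails its partitions are reassigned to the remaining members.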
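The one-owner-per-partition rule can be sketched with a simple round-robin assignment. GroupAssignmentSketch and its assign method are hypothetical names, and the strategy shown is only one possible way to distribute partitions; it is not presented as Kafka's actual assignor.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical round-robin assignment of partitions to the members of one consumer group. */
final class GroupAssignmentSketch {

    /** Distributes partitions 0..numPartitions-1 over the members; each partition gets exactly one owner. */
    public static Map<String, List<Integer>> assign(List<String> members, int numPartitions) {
        Map<String, List<Integer>> assignment = new LinkedHashMap<>();
        for (String member : members) {
            assignment.put(member, new ArrayList<>());
        }
        if (members.isEmpty()) {
            return assignment;
        }
        for (int partition = 0; partition < numPartitions; partition++) {
            String owner = members.get(partition % members.size());
            assignment.get(owner).add(partition);
        }
        return assignment;
    }

    public static void main(String[] args) {
        // Two members share four partitions.
        System.out.println(assign(List.of("c1", "c2"), 4)); // {c1=[0, 2], c2=[1, 3]}
        // After c2 leaves, re-running the assignment moves its partitions to c1 (fail-over).
        System.out.println(assign(List.of("c1"), 4));       // {c1=[0, 1, 2, 3]}
    }
}
```

Running the same assignment independently for a second group would give that group its own full set of partitions, which is how two groups each see every message.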

(7) Replication

Each partition has one leader replica and multiple follower replicas. The leader handles all reads/writes; followers replicate the leader's log. If the leader fails, an in‑sync follower is elected.

(8) Retention & Log Compaction

Kafka deletes old data based on time or size limits, and can compact logs to keep only the latest value for each key.
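The effect of log compaction can be sketched as follows. CompactionSketch is a hypothetical helper that takes the key of each record, indexed by offset, and returns the offsets that would survive if only the latest record per key were kept. It ignores details such as segment boundaries and delete markers.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical illustration of "keep only the latest value for each key". */
final class CompactionSketch {

    /** keysByOffset.get(i) is the key of the record at offset i; returns surviving offsets in ascending order. */
    public static List<Long> survivingOffsets(List<String> keysByOffset) {
        Map<String, Long> latestOffsetByKey = new HashMap<>();
        for (int offset = 0; offset < keysByOffset.size(); offset++) {
            // A later record with the same key overwrites the earlier entry.
            latestOffsetByKey.put(keysByOffset.get(offset), (long) offset);
        }
        List<Long> survivors = new ArrayList<>(latestOffsetByKey.values());
        Collections.sort(survivors);
        return survivors;
    }

    public static void main(String[] args) {
        // Offsets 0..4 carry keys k1, k2, k1, k3, k2.
        System.out.println(survivingOffsets(List.of("k1", "k2", "k1", "k3", "k2"))); // [2, 3, 4]
    }
}
```

Survivors keep their original offsets, so the compacted log has gaps in its offset sequence but preserves the relative order of the remaining records.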

(9) Cluster & Controller

The controller (a broker elected via ZooKeeper) manages partition and replica state.

(10) ISR (In‑Sync Replica) Set

ISR contains replicas that are up‑to‑date with the leader; lagging replicas are removed from ISR to avoid slowing the cluster.
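One way to picture ISR membership is a time-based lag check, sketched below. IsrSketch, its method, and its parameters are hypothetical; the threshold plays a role similar to the broker setting replica.lag.time.max.ms, but the code is a simplification under that assumption and not the broker's actual logic.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

/** Hypothetical ISR filter: a replica stays in sync if it caught up with the leader recently enough. */
final class IsrSketch {

    /**
     * @param lastCaughtUpMs for each replica, the time (ms) it was last fully caught up with the leader
     * @param nowMs          the current time in ms
     * @param maxLagMs       the longest a replica may go without catching up before it is dropped
     */
    public static Set<String> inSync(Map<String, Long> lastCaughtUpMs, long nowMs, long maxLagMs) {
        Set<String> isr = new TreeSet<>();
        for (Map.Entry<String, Long> entry : lastCaughtUpMs.entrySet()) {
            if (nowMs - entry.getValue() <= maxLagMs) {
                isr.add(entry.getKey());
            }
        }
        return isr;
    }

    public static void main(String[] args) {
        // f1 caught up 1 s ago, f2 caught up 9 s ago; the limit is 5 s, so only f1 remains.
        System.out.println(inSync(Map.of("f1", 9_000L, "f2", 1_000L), 10_000L, 5_000L)); // [f1]
    }
}
```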

(11) HW & LEO

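HW (high watermark) marks the offset up to which all ISR replicas have replicated; consumers can read only messages below the HW. LEO (log end offset) is the offset that the next message appended to a replica's log will receive, that is, one past the last message currently in that log.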
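The relationship between the two can be sketched as a minimum over the ISR. The example assumes that LEO means the offset the next appended record would receive and that every ISR member has an entry in the map; HighWatermarkSketch is a hypothetical name.

```java
import java.util.Map;
import java.util.Set;

/** Hypothetical computation of the high watermark as the smallest LEO among ISR members. */
final class HighWatermarkSketch {

    /** Assumes every replica named in isr has an entry in logEndOffsets. */
    public static long highWatermark(Map<String, Long> logEndOffsets, Set<String> isr) {
        if (isr.isEmpty()) {
            return 0L;
        }
        long hw = Long.MAX_VALUE;
        for (String replica : isr) {
            hw = Math.min(hw, logEndOffsets.get(replica));
        }
        return hw;
    }

    public static void main(String[] args) {
        Map<String, Long> leo = Map.of("leader", 10L, "f1", 8L, "f2", 5L);
        // f2 has fallen out of the ISR, so it no longer holds the watermark back.
        System.out.println(highWatermark(leo, Set.of("leader", "f1")));       // 8
        System.out.println(highWatermark(leo, Set.of("leader", "f1", "f2"))); // 5
    }
}
```

The two calls show why lagging replicas are dropped from the ISR: while f2 is a member, records at offsets 5 to 7 cannot be exposed to consumers even though the leader and f1 already hold them.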

2.3 ZooKeeper’s Role in Kafka

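ZooKeeper stores broker registrations and topic‑partition metadata, enabling dynamic load balancing and fail‑over. In older Kafka versions it also held consumer group membership and offsets; since the consumer introduced in 0.9, committed offsets are stored in the internal __consumer_offsets topic and group membership is managed by a broker‑side group coordinator.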

2.4 Reasons for Kafka’s High Performance

(1) Efficient Disk Usage

Partitions are append‑only logs, avoiding random writes; segments are deleted as whole files, and page cache is heavily utilized. Multiple disks can be configured via log.dirs for parallel I/O.

(2) Zero‑Copy Transfer

Kafka uses Java NIO's FileChannel.transferTo, which on Linux is backed by the sendfile system call, to move data from the page cache to the network socket without copying it through user‑space buffers.

    // socketChannel is the destination channel, assumed to be a field of the enclosing class.
    public long transferFrom(FileChannel fileChannel, long position, long count) throws IOException {
        return fileChannel.transferTo(position, count, socketChannel);
    }
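A self-contained sketch of the same call is shown below. TransferToSketch and its helpers are hypothetical names. To stay runnable without a network, the target here is an in-memory channel, for which the JDK falls back to an ordinary copy; the zero-copy path applies only when the target is a socket channel on an operating system that supports it.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.channels.Channels;
import java.nio.channels.FileChannel;
import java.nio.channels.WritableByteChannel;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

/** Hypothetical demo of FileChannel.transferTo with a stand-in target channel. */
final class TransferToSketch {

    /** Sends the whole file to the target, looping because transferTo may move fewer bytes than requested. */
    public static long sendFile(Path file, WritableByteChannel target) throws IOException {
        try (FileChannel source = FileChannel.open(file, StandardOpenOption.READ)) {
            long position = 0;
            long size = source.size();
            while (position < size) {
                position += source.transferTo(position, size - position, target);
            }
            return position;
        }
    }

    /** Writes the payload to a temporary file, transfers it, and returns what the target received. */
    public static byte[] roundTrip(byte[] payload) {
        try {
            Path tmp = Files.createTempFile("segment", ".log");
            try {
                Files.write(tmp, payload);
                ByteArrayOutputStream sink = new ByteArrayOutputStream();
                sendFile(tmp, Channels.newChannel(sink));
                return sink.toByteArray();
            } finally {
                Files.delete(tmp);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        byte[] received = roundTrip("hello kafka".getBytes(StandardCharsets.UTF_8));
        System.out.println(new String(received, StandardCharsets.UTF_8));
    }
}
```

The loop around transferTo matters in practice: the method returns the number of bytes actually moved, which can be less than the count requested.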

(3) Reduced Network Overhead

Batching combines many records into a single request, decreasing protocol overhead. Compression (e.g., gzip, Snappy) further reduces payload size, and compressed data is stored on disk without decompression.
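The benefit of compressing a whole batch rather than individual records can be illustrated with the JDK's gzip implementation. BatchCompressionSketch is a hypothetical class, and the batch here is a plain concatenation of record bytes, which is an assumption of this sketch and not Kafka's actual batch format.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Collections;
import java.util.List;
import java.util.zip.GZIPOutputStream;

/** Hypothetical comparison of per-record compression against per-batch compression. */
final class BatchCompressionSketch {

    /** Returns the gzip-compressed size of the given bytes. */
    public static int gzipSize(byte[] data) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (GZIPOutputStream gzip = new GZIPOutputStream(out)) {
            gzip.write(data);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.size();
    }

    /** Total bytes when every record is compressed on its own. */
    public static int sizeCompressedIndividually(List<String> records) {
        int total = 0;
        for (String record : records) {
            total += gzipSize(record.getBytes(StandardCharsets.UTF_8));
        }
        return total;
    }

    /** Bytes when the records are concatenated into one batch and compressed once. */
    public static int sizeCompressedAsBatch(List<String> records) {
        ByteArrayOutputStream batch = new ByteArrayOutputStream();
        for (String record : records) {
            byte[] bytes = record.getBytes(StandardCharsets.UTF_8);
            batch.write(bytes, 0, bytes.length);
        }
        return gzipSize(batch.toByteArray());
    }

    public static void main(String[] args) {
        List<String> records = Collections.nCopies(100, "{\"user\":\"u1\",\"event\":\"click\"}");
        System.out.println("individually: " + sizeCompressedIndividually(records) + " bytes");
        System.out.println("as one batch: " + sizeCompressedAsBatch(records) + " bytes");
    }
}
```

Compressing the batch as a unit lets the compressor exploit repetition across records and pays the fixed gzip header and trailer cost only once.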

(4) Efficient Serialization

Custom serializers (Avro, Protobuf) produce compact binary formats, improving throughput when combined with compression.

References: https://www.jianshu.com/p/a036405f989c, https://www.jianshu.com/p/eb75372df00a
