RocketMQ, Kafka, Pulsar: Core Concepts, Architecture & Transactional Messaging
This article provides an overview of the major message-queue middleware systems, including RocketMQ, Kafka, Pulsar, and RabbitMQ. It covers fundamental concepts such as tags, consumer groups, and offsets; architectural components; storage mechanisms; transaction workflows; rebalance strategies; and recent developments, and compares the systems' features and performance characteristics.
Fundamental Concepts
Message-queue systems commonly use three concepts below the topic level to organize messages:
Tag – a sub‑topic that allows different business purposes to share the same Topic while being distinguished by a tag value.
Consumer Group – each group receives a full copy of a topic; consumers inside the same group compete for partitions, and the group maintains its own consumption offset.
Offset – the index of the next message to consume in a queue, analogous to an array index. Offsets are stored per group per queue.
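The per-group, per-queue offset model above can be sketched in a few lines. This is an illustrative toy, not any real client API; `OffsetStore` and its method names are invented for the example:

```python
# Sketch: offsets are tracked per (consumer group, queue), so independent
# groups can consume the same topic at their own pace without interfering.
from collections import defaultdict

class OffsetStore:
    def __init__(self):
        # (group, queue_id) -> index of the next message to consume
        self.offsets = defaultdict(int)

    def next_offset(self, group, queue_id):
        return self.offsets[(group, queue_id)]

    def commit(self, group, queue_id):
        # advance after a successful consumption, like incrementing an array index
        self.offsets[(group, queue_id)] += 1

store = OffsetStore()
store.commit("group-A", 0)
store.commit("group-A", 0)
# group-B keeps its own, independent position on the same queue
assert store.next_offset("group-A", 0) == 2
assert store.next_offset("group-B", 0) == 0
```

Because messages are not deleted on consumption, any number of groups can hold independent positions on the same queue.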
RocketMQ
Architecture
RocketMQ separates the production, storage, and consumption stages. A NameServer cluster provides lightweight service discovery and horizontal scaling, while brokers handle message persistence using NIO, PageCache, sequential I/O, and zero-copy techniques similar to Kafka's. A single broker can sustain tens of thousands of TPS, and clustered deployments are reported to reach trillion-message scale.
Key Features
Tag – enables flexible server‑side filtering and keeps client code clean.
Consumer Group – each group maintains an independent offset; messages are not removed after consumption, allowing multiple groups to read the same data.
Offset – stored per group per queue and incremented after each successful consumption.
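Server-side tag filtering can be sketched as follows. This is a simplified illustration of the idea, not RocketMQ's actual filter implementation; the function names and the `"TagA || TagB"` expression form are modeled on RocketMQ's documented subscription syntax:

```python
# Sketch of server-side tag filtering: the broker matches each message's tag
# against the consumer's subscription expression, so messages the consumer
# did not ask for never reach the client.
def parse_subscription(expr):
    # "*" subscribes to every tag; otherwise "TagA || TagB" becomes a set
    return None if expr.strip() == "*" else {t.strip() for t in expr.split("||")}

def matches(message_tag, subscribed_tags):
    return subscribed_tags is None or message_tag in subscribed_tags

# one Topic shared by several business purposes, distinguished by tag
messages = [("ORDER", "created"), ("PAYMENT", "paid"), ("ORDER", "shipped")]
subscription = parse_subscription("ORDER")
delivered = [body for tag, body in messages if matches(tag, subscription)]
assert delivered == ["created", "shipped"]
```

Filtering on the broker keeps client code clean and avoids transferring messages that would be discarded anyway.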
Transactional Messaging
RocketMQ implements transactional messaging with "half messages" that follow a defined lifecycle:
Producer sends a message to the RocketMQ broker.
The broker persists the message and returns an ACK, marking the message as a half message ("pending delivery") that is invisible to consumers.
Producer executes local transaction logic.
Producer commits or rolls back the transaction; the broker either makes the message deliverable (commit) or discards it (rollback).
If the broker does not receive a final status (e.g., network loss), it periodically checks the producer for the transaction result.
Producer re‑examines the local transaction outcome and resubmits the final status.
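The lifecycle above can be sketched as a small state machine. This is an illustrative model, not the broker's actual code; the class and method names are invented for the example:

```python
# Sketch of the half-message lifecycle: a half message is invisible to
# consumers until the producer commits, and the broker checks back with the
# producer when the final status is lost (e.g., due to a network failure).
COMMIT, ROLLBACK, UNKNOWN = "commit", "rollback", "unknown"

class Broker:
    def __init__(self):
        self.half_messages = {}   # msg_id -> body; persisted but invisible
        self.deliverable = []     # visible to consumers

    def receive_half(self, msg_id, body):
        self.half_messages[msg_id] = body   # persist and ACK, but do not deliver

    def end_transaction(self, msg_id, status):
        body = self.half_messages.pop(msg_id, None)
        if body is not None and status == COMMIT:
            self.deliverable.append(body)   # ROLLBACK simply discards it

    def check_back(self, msg_id, producer_check):
        # no final status arrived: ask the producer for its local result
        self.end_transaction(msg_id, producer_check(msg_id))

broker = Broker()
broker.receive_half("m1", "debit account")
assert broker.deliverable == []             # half message is not yet consumable
broker.check_back("m1", lambda _id: COMMIT) # producer re-examines local outcome
assert broker.deliverable == ["debit account"]
```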
New Developments (RocketMQ 5.0)
RocketMQ 5.0 introduces a unified CommitLog with multiple indexes (time, queue, transaction, KV, batch, logical queue) and supports heterogeneous protocols such as RabbitMQ, Kafka, MQTT, and edge‑computing workloads. Leader election and master‑slave switching are handled by the DLedger Raft implementation.
Kafka
Architecture
Kafka is a distributed system composed of broker nodes, a Zookeeper‑based controller, and client libraries for many languages. Zookeeper stores metadata, performs controller election, and coordinates consumer groups. The controller broker manages cluster membership, topic/partition metadata, and ISR (in‑sync replica) updates.
Storage Model
Each partition is a log file. Kafka maintains auxiliary files for efficient access:
.log – stores raw message data.
.index – a sparse index that maps message offsets to their byte positions in the .log file.
.snapshot – a snapshot of producer state (producer IDs, epochs, and sequence numbers) used to restore idempotent and transactional producers on recovery. Consumer offsets are stored separately, in the internal __consumer_offsets topic.
.timeindex – indexes messages by timestamp (available since 0.10.1) to enable fast time-based lookups.
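Because the .index file is sparse, looking up an offset means binary-searching for the nearest index entry at or below it and then scanning the .log forward. The sketch below is a simplified re-implementation of that idea with toy numbers, not Kafka's actual on-disk format:

```python
import bisect

# Sketch of a sparse index lookup: the index stores (relative offset,
# file position) for every Nth message; find the closest entry at or below
# the target, then scan forward in the log from that position.
index = [(0, 0), (4, 512), (8, 1024)]   # one entry per 4 messages (toy values)

def locate(target_offset):
    offsets = [rel for rel, _pos in index]
    i = bisect.bisect_right(offsets, target_offset) - 1
    rel, pos = index[i]
    return pos, target_offset - rel      # start position + messages to skip

assert locate(5) == (512, 1)  # seek to byte 512, then skip 1 message forward
```

Keeping the index sparse lets it stay small enough to be memory-mapped while still bounding the forward scan to a few messages.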
Consumer Rebalance
Rebalancing is triggered by events such as new consumers joining, consumer crashes, explicit leave‑group requests, controller changes, topic/partition count changes, or cluster scaling. To reduce Zookeeper load, Kafka introduces a GroupCoordinator (Coordinator) mechanism that centralizes rebalance coordination.
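The outcome of a rebalance is a fresh partition assignment. The sketch below re-implements the idea behind Kafka's default "range" assignor in simplified form; it is illustrative only, not the client library's code:

```python
# Sketch of range-style partition assignment after a rebalance: partitions
# are split evenly among the group's consumers, with the first consumers
# taking one extra partition when the count does not divide evenly.
def range_assign(consumers, num_partitions):
    consumers = sorted(consumers)
    per, extra = divmod(num_partitions, len(consumers))
    assignment, start = {}, 0
    for i, consumer in enumerate(consumers):
        n = per + (1 if i < extra else 0)
        assignment[consumer] = list(range(start, start + n))
        start += n
    return assignment

assert range_assign(["c1", "c2"], 5) == {"c1": [0, 1, 2], "c2": [3, 4]}
```

Any membership or partition-count change listed above invalidates the current assignment, which is why the GroupCoordinator recomputes it centrally instead of pushing that work through Zookeeper.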
Pulsar
Architecture
Pulsar clusters consist of a set of brokers, a BookKeeper storage cluster (the bookies), and a ZooKeeper ensemble for coordination. Brokers perform load balancing, forward messages to BookKeeper for durable storage, and are stateless with respect to data.
Storage and Fault Tolerance
Multiple ledgers can be created per topic; each ledger is an append‑only structure written by a single writer and replicated across several Bookies.
Ledger writers are single‑process, eliminating write conflicts and enabling high throughput.
If a Bookie fails, other replicas continue serving reads; a background thread restores missing replicas.
Brokers are stateless, so adding or removing brokers does not affect durability.
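The replication scheme above can be sketched as follows. This is a simplified model of BookKeeper-style entry placement, with invented names and toy values, not BookKeeper's actual placement policy:

```python
# Sketch of BookKeeper-style replication: each ledger entry is written to a
# "write quorum" of bookies, rotated round-robin across the ensemble, so a
# single failed bookie leaves other replicas available for reads.
def place_entry(entry_id, bookies, write_quorum):
    start = entry_id % len(bookies)
    return [bookies[(start + i) % len(bookies)] for i in range(write_quorum)]

ensemble = ["bookie-1", "bookie-2", "bookie-3"]
replicas = place_entry(entry_id=0, bookies=ensemble, write_quorum=2)
assert replicas == ["bookie-1", "bookie-2"]
# if bookie-1 fails, entry 0 can still be read from bookie-2, and a
# background recovery process re-replicates the missing copy elsewhere
```

Since each ledger has a single writer, this placement never has to resolve conflicting writes, which is part of what enables Pulsar's high write throughput.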
Comparison
Benchmark reports indicate that Pulsar’s throughput can be roughly twice that of Kafka, with lower latency and superior I/O isolation. RocketMQ 5.0 aims to unify features of RabbitMQ, Kafka, and MQTT, positioning itself as a cloud‑native solution.
Tencent Cloud Middleware
Official account of Tencent Cloud Middleware. Focuses on microservices, messaging middleware and other cloud‑native technology trends, publishing product updates, case studies, and technical insights. Regularly hosts tech salons to share effective solutions.