Message Queues Unveiled: From Decoupling to Platformization and Core Architectures
This article traces the two‑decade evolution of message queues—from early decoupling solutions like ActiveMQ, through high‑throughput designs such as Kafka, to modern platformized systems like RocketMQ and Pulsar—while explaining fundamental concepts, partitioning, and storage architectures that underpin today’s distributed messaging platforms.
Message Queue Development Timeline
The article, authored by a member of Tencent TDMQ's founding team, reviews the history of message queues from 2003 to the present, highlighting three major stages and the problems each era aimed to solve.
1.1 First Stage: Decoupling
From 2003 to 2010, early queues such as ActiveMQ and RabbitMQ focused on breaking tight coupling between services and enabling asynchronous operations.
1.2 Second Stage: Throughput and Consistency
During the big‑data boom (2010‑2012), the need for higher throughput and stronger consistency led to the creation of Kafka, which excelled at log collection and data pipelines. Later, Alibaba’s e‑commerce demands prompted the development of RocketMQ, which inherited many of Kafka's design ideas while addressing some of its limitations, such as its reliance on ZooKeeper.
1.3 Third Stage: Platformization
Since 2012, cloud computing, Kubernetes, and containerization have driven the platformization of messaging services. Pulsar emerged to meet these new requirements, offering a layered and segmented architecture.
Common Architecture and Basic Concepts
2.1 Topics, Producers, Consumers
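Consider a cafeteria analogy: a topic is a food category, a producer is someone who joins the line for that category and places an order (publishes a message), and a consumer is whoever takes that order off the line and handles it (consumes the message). Topic, producer, and consumer are the core trio of concepts in any message‑queue system.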
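The relationship between the three concepts can be sketched with a toy in-memory broker. This is only an illustration, not how any real system is implemented: `ToyBroker`, `publish`, and `poll` are hypothetical names, and the topic names are invented for the cafeteria analogy.

```python
from collections import defaultdict, deque


class ToyBroker:
    """Hypothetical in-memory broker: one FIFO queue per topic."""

    def __init__(self):
        # topic name -> queue of pending messages
        self._topics = defaultdict(deque)

    def publish(self, topic, message):
        """Producer side: append a message to the end of the topic's queue."""
        self._topics[topic].append(message)

    def poll(self, topic):
        """Consumer side: take the oldest message, or None if the queue is empty."""
        queue = self._topics[topic]
        return queue.popleft() if queue else None


broker = ToyBroker()
broker.publish("noodles", "order-1")   # producer places orders
broker.publish("noodles", "order-2")
broker.publish("rice", "order-3")

first = broker.poll("noodles")         # consumer handles them in arrival order
second = broker.poll("noodles")
empty = broker.poll("noodles")         # nothing left on this topic
```

The producer and consumer never call each other directly; they only share the topic name, which is the decoupling the first stage of message queues was built for.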
2.2 Partitions
Partitions enable horizontal scaling. When a cafeteria expands, multiple service windows (partitions) handle the same dish type, improving write throughput. In Kafka, partitions are the key to its high‑throughput capability.
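A minimal sketch of how messages might be spread over partitions, assuming a simple hash-of-key scheme. `choose_partition` and `PartitionedTopic` are hypothetical names, and CRC32 is used here only as a stand-in for a stable hash; Kafka's own partitioner uses a different hash function.

```python
import zlib


def choose_partition(key: str, num_partitions: int) -> int:
    """Map a key to a partition with a stable hash (CRC32 as a stand-in)."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions


class PartitionedTopic:
    """Hypothetical topic made of several independent append-only lists."""

    def __init__(self, num_partitions):
        self.partitions = [[] for _ in range(num_partitions)]

    def send(self, key, value):
        """Append to the partition chosen by the key; return its index."""
        p = choose_partition(key, len(self.partitions))
        self.partitions[p].append((key, value))
        return p


topic = PartitionedTopic(num_partitions=4)
p1 = topic.send("user-42", "login")
p2 = topic.send("user-42", "checkout")   # same key, same partition
topic.send("user-7", "login")
```

Because each partition is written independently, adding partitions adds write capacity, while messages that share a key still land in one partition and keep their relative order.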
Analysis of Mainstream Message Queue Storage
3.1 Kafka
Kafka has no fixed master or slave nodes; the master‑slave (leader/follower) relationship is defined per partition, so one broker can lead some partitions while following others. Messages are appended sequentially to a partition's log files, which allows fast sequential writes and efficient use of the page cache. Consumers read messages from the log by offset, so consumption is ordered within a partition.
Kafka stores each partition as an append‑only log on disk. Because every write goes to the end of the log, disk access is sequential, which is dramatically faster than random writes.
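The append-and-read-by-offset model described above can be sketched as follows. This is a simplified in-memory illustration, not Kafka's on-disk format; `PartitionLog` and its methods are hypothetical names.

```python
class PartitionLog:
    """Hypothetical append-only log for one partition."""

    def __init__(self):
        self._records = []

    def append(self, record) -> int:
        """Append at the end of the log and return the record's offset."""
        self._records.append(record)
        return len(self._records) - 1

    def read(self, offset, max_records=10):
        """Return up to max_records records starting at the given offset."""
        return self._records[offset:offset + max_records]


log = PartitionLog()
offsets = [log.append(f"event-{i}") for i in range(5)]

# The consumer, not the log, remembers how far it has read.
consumer_offset = 0
batch = log.read(consumer_offset, max_records=3)
consumer_offset += len(batch)
rest = log.read(consumer_offset)

# Reading does not delete anything, so an earlier offset can be replayed.
replay = log.read(0, max_records=2)
```

Keeping the read position on the consumer side is what lets several consumers read the same log independently and lets any of them rewind to an earlier offset.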
3.2 RocketMQ
RocketMQ replaces ZooKeeper with a lightweight NameServer (namesrv) service for metadata management and adopts a multi‑master, multi‑slave node model. Its storage consists of three kinds of files:
CommitLog: a sequential log to which all messages, regardless of topic, are written; each file defaults to 1 GB.
ConsumeQueue: an index kept per queue of a topic; each small, fixed‑size entry records where a message sits in the CommitLog, so consumers can locate messages quickly without scanning the log.
IndexFile: a hash‑based index that supports looking up messages by key or by time range.
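How the three files cooperate can be sketched in memory. This is a deliberately simplified model under stated assumptions: `ToyStore` and its methods are hypothetical, the "files" are plain Python objects, and the real entry layouts, time-range lookup, and file rolling are not modeled.

```python
from collections import defaultdict


class ToyStore:
    """Hypothetical model of a shared commit log plus per-queue indexes."""

    def __init__(self):
        self.commit_log = bytearray()             # all topics share one log
        self.consume_queues = defaultdict(list)   # (topic, queue_id) -> [(offset, size)]
        self.key_index = defaultdict(list)        # message key -> [(offset, size)]

    def put(self, topic, queue_id, body: bytes, key=None) -> int:
        """Append the body to the shared log, then record where it landed."""
        offset = len(self.commit_log)
        self.commit_log += body
        entry = (offset, len(body))
        self.consume_queues[(topic, queue_id)].append(entry)
        if key is not None:
            self.key_index[key].append(entry)
        return offset

    def get(self, topic, queue_id, index) -> bytes:
        """Consumer path: look up the index entry, then read from the log."""
        offset, size = self.consume_queues[(topic, queue_id)][index]
        return bytes(self.commit_log[offset:offset + size])

    def find_by_key(self, key):
        """Query path: fetch every message stored under a key."""
        return [bytes(self.commit_log[o:o + s]) for o, s in self.key_index[key]]


store = ToyStore()
store.put("orders", 0, b"order-A", key="A")
store.put("payments", 0, b"pay-B")
third_offset = store.put("orders", 0, b"order-C", key="C")
```

Messages from different topics are interleaved in one log, so every write stays sequential; the per-queue index is what lets a consumer of `orders` skip over the `payments` bytes in between.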
3.3 Pulsar
Pulsar introduces a layered architecture that separates the stateless broker layer from the storage layer (BookKeeper). Instead of binding a whole partition to a fixed set of nodes, it splits each partition's data into fine‑grained segments that are spread across storage nodes, which provides higher availability and more flexible scaling.
The broker cluster is stateless; all message data resides in BookKeeper, while metadata is stored in ZooKeeper. This design simplifies containerization, scaling, and disaster recovery.
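The idea of segmented storage can be illustrated with a small sketch, assuming a simple round-robin placement. `SegmentedLog`, `segment_size`, and `replicas` are hypothetical names chosen for this example; BookKeeper's actual placement and quorum rules are more involved and are not modeled here.

```python
class SegmentedLog:
    """Hypothetical partition log split into segments placed on storage nodes."""

    def __init__(self, bookies, segment_size=3, replicas=2):
        self.bookies = list(bookies)      # available storage nodes
        self.segment_size = segment_size  # entries per segment before rolling
        self.replicas = replicas          # copies kept of each segment
        self.segments = []                # [{"bookies": [...], "entries": [...]}]
        self._next = 0                    # round-robin cursor over storage nodes

    def _roll_segment(self):
        """Open a new segment on the next group of storage nodes."""
        n = len(self.bookies)
        chosen = [self.bookies[(self._next + i) % n] for i in range(self.replicas)]
        self._next = (self._next + 1) % n
        self.segments.append({"bookies": chosen, "entries": []})

    def append(self, entry):
        """Write to the current segment, rolling to a new one when it is full."""
        if not self.segments or len(self.segments[-1]["entries"]) >= self.segment_size:
            self._roll_segment()
        self.segments[-1]["entries"].append(entry)

    def add_bookie(self, name):
        """New storage nodes become eligible for future segments only."""
        self.bookies.append(name)


log = SegmentedLog(["bookie-1", "bookie-2", "bookie-3"])
for i in range(7):
    log.append(f"msg-{i}")
all_entries = [e for seg in log.segments for e in seg["entries"]]
```

Because only new segments need a placement decision, adding a storage node in this model requires no copying of existing data, which is the scaling property the segmented design is aiming for.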
Summary
Message‑queue technology has continuously evolved to address coupling, throughput, and platformization challenges. Each design—Kafka’s sequential log, RocketMQ’s commit‑log plus index files, and Pulsar’s layered, segmented storage—offers distinct trade‑offs. Selecting the right system depends on specific workload requirements and operational constraints.
Sanyou's Java Diary