
Message Queues Unveiled: From Decoupling to Platformization and Core Architectures

This article traces the two‑decade evolution of message queues—from early decoupling solutions like ActiveMQ, through high‑throughput designs such as Kafka, to modern platformized systems like RocketMQ and Pulsar—while explaining fundamental concepts, partitioning, and storage architectures that underpin today’s distributed messaging platforms.

Sanyou's Java Diary

Message Queue Development Timeline

The article, authored by a member of Tencent TDMQ's founding team, reviews the history of message queues from 2003 to the present, highlighting three major stages and the problems each era aimed to solve.

1.1 First Stage: Decoupling

From 2003 to 2010, early queues such as ActiveMQ and RabbitMQ focused on breaking tight coupling between services and enabling asynchronous operations.

1.2 Second Stage: Throughput and Consistency

During the big-data boom (2010-2012), the need for higher throughput and stronger consistency led to the creation of Kafka, which excelled at log collection and data pipelines. Later, Alibaba's e-commerce workloads prompted the development of RocketMQ, which inherited many of Kafka's design ideas while addressing some of Kafka's limitations, such as its reliance on ZooKeeper.

1.3 Third Stage: Platformization

Since 2012, cloud computing, Kubernetes, and containerization have driven the platformization of messaging services. Pulsar emerged to meet these new requirements, offering a layered and segmented architecture.

Common Architecture and Basic Concepts

2.1 Topics, Producers, Consumers

Using a cafeteria analogy, a topic represents a food category, a producer joins a queue to place an order, and a consumer retrieves the dish. In messaging terms, a producer publishes messages to a topic and a consumer reads messages from it. These three concepts form the core of any message-queue system.
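To make the three roles concrete, here is a minimal in-memory sketch. The class `TinyBroker` and its `publish` and `poll` methods are hypothetical names chosen for illustration; they are not the API of any real message queue, and real brokers add persistence, acknowledgements, and networking.

```python
from collections import defaultdict, deque


class TinyBroker:
    """Hypothetical in-memory broker: one FIFO queue per topic."""

    def __init__(self):
        self._topics = defaultdict(deque)

    def publish(self, topic, message):
        # Producer side: append the message to the topic's queue.
        self._topics[topic].append(message)

    def poll(self, topic):
        # Consumer side: take the oldest message, or None if the topic is empty.
        queue = self._topics[topic]
        return queue.popleft() if queue else None


broker = TinyBroker()
broker.publish("noodles", "order-1")   # producer places an order
broker.publish("noodles", "order-2")
first = broker.poll("noodles")         # consumer picks up the oldest order
```

The producer and consumer never call each other directly; they only share the topic name, which is the decoupling the first stage of message queues was after.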

2.2 Partitions

Partitions enable horizontal scaling. When a cafeteria expands, multiple service windows (partitions) handle the same dish type, improving write throughput. In Kafka, partitions are the key to its high‑throughput capability.
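One common way to spread messages across partitions is to hash a message key, so that messages with the same key always land in the same partition. The sketch below assumes CRC32 as the hash purely for illustration; real clients use their own hash functions, and `choose_partition` is a hypothetical helper.

```python
import zlib


def choose_partition(key: str, num_partitions: int) -> int:
    """Map a message key to a partition index with a stable hash."""
    # zlib.crc32 is deterministic across runs, unlike Python's built-in hash() for str.
    return zlib.crc32(key.encode("utf-8")) % num_partitions


# Four "service windows" for the same topic.
partitions = [[] for _ in range(4)]
for i in range(8):
    key = f"user-{i % 3}"
    partitions[choose_partition(key, 4)].append((key, f"event-{i}"))
```

Because each partition can be written independently, adding partitions raises write throughput, while keyed routing keeps the events of one key in a single partition.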

Analysis of Mainstream Message Queue Storage

3.1 Kafka

Kafka's architecture has no fixed master-slave nodes; the master-slave (leader-follower) relationship exists per partition, so a broker can lead some partitions and follow others. Messages are appended sequentially to log files, allowing fast sequential writes and efficient use of the page cache. Consumers read messages from the log by offset, which preserves ordering within a partition.

Kafka architecture diagram

Kafka stores messages in partitions, each of which is an append-only log kept as segment files on disk. Because new messages are only ever written to the end of the log, disk access stays sequential, which is dramatically faster than random writes.
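The append-only idea can be sketched in a few lines. `PartitionLog` below is a hypothetical in-memory stand-in, not Kafka's storage code: it keeps records in a list instead of segment files, but it shows how offsets are assigned on append and how a consumer reads forward from its own position.

```python
class PartitionLog:
    """Hypothetical append-only log for a single partition."""

    def __init__(self):
        self._records = []

    def append(self, value: bytes) -> int:
        # New records always go to the end; the offset is the position in the log.
        self._records.append(value)
        return len(self._records) - 1

    def read(self, offset: int, max_records: int = 10) -> list:
        # Return (offset, value) pairs starting at the requested offset.
        chunk = self._records[offset:offset + max_records]
        return [(offset + i, value) for i, value in enumerate(chunk)]


log = PartitionLog()
for payload in (b"a", b"b", b"c"):
    log.append(payload)

consumer_offset = 1                  # the consumer tracks its own position
batch = log.read(consumer_offset)
consumer_offset += len(batch)        # advance past what was read
```

Reading never removes data from the log, so several consumers can each keep their own offset and read the same partition independently.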

3.2 RocketMQ

RocketMQ replaces ZooKeeper with a lightweight NameServer (namesrv) service for metadata management and adopts a multi-master, multi-slave node model. Its storage consists of three kinds of files:

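CommitLog: a sequential log to which all messages are written, regardless of topic; each file defaults to 1 GB.

ConsumeQueue: an index kept per topic and queue that stores each message's offset in the CommitLog. Its entries are small and fixed-size, so lookups are fast and the index is cheap to keep in memory.

IndexFile: a hash-based index that supports queries by message key or by time range.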
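The relationship between the CommitLog and the ConsumeQueue can be sketched as follows. `MiniStore` is a hypothetical in-memory model, not RocketMQ code. The 20-byte index entry (8-byte commit-log offset, 4-byte size, 8-byte tag hash) follows the commonly documented ConsumeQueue layout; treat the exact layout as an assumption here. The IndexFile is left out.

```python
import struct
from collections import defaultdict

# One index entry: commit-log offset (8 bytes), message size (4), tag hash (8).
ENTRY = struct.Struct(">qiq")


class MiniStore:
    """Hypothetical model of a shared commit log plus per-queue indexes."""

    def __init__(self):
        self.commit_log = bytearray()                  # all topics share one log
        self.consume_queues = defaultdict(bytearray)   # (topic, queue_id) -> entries

    def put(self, topic, queue_id, body: bytes, tag_hash: int = 0) -> int:
        # Append the body to the shared log, then record where it went.
        offset = len(self.commit_log)
        self.commit_log += body
        self.consume_queues[(topic, queue_id)] += ENTRY.pack(offset, len(body), tag_hash)
        return offset

    def get(self, topic, queue_id, index: int) -> bytes:
        # Fixed-size entries make the n-th message a simple offset calculation.
        queue = self.consume_queues[(topic, queue_id)]
        offset, size, _tag = ENTRY.unpack_from(queue, index * ENTRY.size)
        return bytes(self.commit_log[offset:offset + size])


store = MiniStore()
store.put("orders", 0, b"order-1")
store.put("payments", 0, b"pay-1")
store.put("orders", 0, b"order-2")
second_order = store.get("orders", 0, 1)
```

Writes from every topic stay sequential because they all go to one log, while each consumer only scans the small index for its own queue.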

RocketMQ architecture diagram

3.3 Pulsar

Pulsar introduces a layered architecture that separates the stateless broker layer from the storage layer (BookKeeper). It also splits each partition's data into fine-grained segments instead of storing a whole partition as one unit on a single node, which improves availability and makes scaling more flexible.

Pulsar layered and segmented architecture

The broker cluster is stateless; all data resides in BookKeeper, while metadata is stored in ZooKeeper. This design simplifies containerization, scaling, and disaster recovery.
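A rough sketch of why segments make scaling easier: each segment is placed on its own set of storage nodes (called bookies in BookKeeper), so a newly added node can take new segments without any old data being moved. The round-robin placement, the tiny segment size, and the names `Segment` and `SegmentedTopic` are assumptions made for illustration, not Pulsar's or BookKeeper's actual placement policy.

```python
from dataclasses import dataclass, field


@dataclass
class Segment:
    segment_id: int
    bookies: tuple                      # storage nodes holding replicas of this segment
    entries: list = field(default_factory=list)


class SegmentedTopic:
    """Hypothetical topic log made of segments spread across storage nodes."""

    def __init__(self, bookies, replicas=2, max_entries=2):
        # Assumes replicas <= number of bookies.
        self.bookies = list(bookies)
        self.replicas = replicas
        self.max_entries = max_entries
        self.segments = []
        self._roll()

    def _roll(self):
        # Pick replica nodes round-robin from whatever bookies exist right now.
        start = len(self.segments)
        n = len(self.bookies)
        chosen = tuple(self.bookies[(start + i) % n] for i in range(self.replicas))
        self.segments.append(Segment(len(self.segments), chosen))

    def append(self, entry):
        # Start a new segment once the current one is full.
        if len(self.segments[-1].entries) >= self.max_entries:
            self._roll()
        self.segments[-1].entries.append(entry)


topic = SegmentedTopic(["bookie-1", "bookie-2"])
for i in range(3):
    topic.append(f"m{i}")
topic.bookies.append("bookie-3")        # scale out storage; old segments stay put
for i in range(3, 5):
    topic.append(f"m{i}")
placement = [s.bookies for s in topic.segments]
```

In this sketch the third segment lands on the new node as soon as it is created, while the first two segments remain where they were written; the broker only needs the segment-to-node mapping, which is why it can stay stateless.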

Summary

Message‑queue technology has continuously evolved to address coupling, throughput, and platformization challenges. Each design—Kafka’s sequential log, RocketMQ’s commit‑log plus index files, and Pulsar’s layered, segmented storage—offers distinct trade‑offs. Selecting the right system depends on specific workload requirements and operational constraints.
