An Overview of Apache RocketMQ: Origin, Concept Model, Storage, Deployment, and Best Practices
This article introduces Apache RocketMQ by covering its origin, core concepts such as topics, producers and consumers, storage architecture with CommitLog and ConsumeQueue, deployment components like brokers and name servers, and practical best‑practice guidance for handling duplicates, ordering, and message replay.
Guest introduction: Liu Zhendong, Alibaba middleware technology expert, 2016 Middleware Performance Challenge runner‑up, with extensive experience in distributed system design and optimization, currently leading exploration and innovation for Apache RocketMQ.
The presentation covers RocketMQ’s origin, concept model, storage model, deployment model, and a summary of best practices.
1. Origin of RocketMQ
Like many products, RocketMQ was created to solve a specific problem. In the early days, the system was a monolithic "big stone" that packed every required function into one application. As the business grew and thousands of developers contributed to it, performance bottlenecks emerged, which prompted a decomposition into a distributed architecture.
Messaging is what makes this distributed design work. It decouples services: communication becomes asynchronous, so changes in the lower layers do not affect upper-layer applications. It also provides peak shaving: bursts of requests are buffered and processed at the rate the downstream service can sustain. Because the buffered messages wait in order, RocketMQ acts as a queue engine, so when many applications issue requests at the same moment the requests line up instead of "colliding" at the downstream service.
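To make peak shaving concrete, here is a toy simulation, not RocketMQ code: a queue absorbs a burst of requests, and a downstream service drains it at a fixed capacity per tick. The function name, the tick model, and the capacity value are assumptions chosen for illustration.

```python
from collections import deque


def simulate_peak_shaving(arrivals, capacity):
    """Toy model: a queue sits between bursty producers and a fixed-rate consumer.

    arrivals[t] is the number of requests arriving in tick t; capacity is the
    most the downstream service can handle per tick (hypothetical parameters).
    Returns (processed_per_tick, backlog_per_tick).
    """
    queue = deque()
    processed, backlog = [], []
    t = 0
    # Keep ticking until every arrival has been seen and the queue is drained.
    while t < len(arrivals) or queue:
        if t < len(arrivals):
            queue.extend(range(arrivals[t]))  # buffer the burst instead of rejecting it
        handled = min(capacity, len(queue))   # the service never exceeds its capacity
        for _ in range(handled):
            queue.popleft()
        processed.append(handled)
        backlog.append(len(queue))
        t += 1
    return processed, backlog


processed, backlog = simulate_peak_shaving([10, 0, 0, 0], capacity=3)
print(processed)  # [3, 3, 3, 1]
print(backlog)    # [7, 4, 1, 0]
```

The burst of 10 requests never hits the service all at once; the queue holds the excess, and the peak is spread over four ticks.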
2. Concept Model
In RocketMQ, a Topic is a logical address (a named destination for messages), a Producer sends messages to it, and a Consumer receives them. In production environments, a topic is usually split into several message queues; messages from a single producer may reach many subscribers, and a single consumer group may contain multiple consumers, giving one-to-many and many-to-one relationships.
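The extended model shows two producers, two topics distributed across the cluster (each backed by two physical Message Queues on a broker), and two consumers. Consumers that share the same group ID form one consumer group; in the default clustering mode they split the topic's queues among themselves, so each message is processed by only one member of the group. Different groups subscribe independently, and each group receives every message. RocketMQ also offers a broadcasting mode, in which every consumer in a group receives every message.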
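The group semantics can be sketched as follows. This is a simplified model, not RocketMQ's client code: `allocate_averagely` splits a topic's queues into contiguous blocks, loosely in the spirit of an averaging allocation strategy (the exact algorithm RocketMQ uses may differ), and `receivers_for` shows that every group gets a message while only one member per group does. All names here are hypothetical.

```python
def allocate_averagely(queues, consumers):
    """Split a topic's queues into contiguous blocks, one block per consumer in a group."""
    queues, consumers = sorted(queues), sorted(consumers)
    base, extra = divmod(len(queues), len(consumers))
    allocation, start = {}, 0
    for i, consumer in enumerate(consumers):
        count = base + (1 if i < extra else 0)  # the first `extra` consumers take one more queue
        allocation[consumer] = queues[start:start + count]
        start += count
    return allocation


def receivers_for(queue_id, all_queues, groups):
    """Every group receives the message; inside a group, only the queue's owner does."""
    receivers = {}
    for group_id, members in groups.items():
        for consumer, owned in allocate_averagely(all_queues, members).items():
            if queue_id in owned:
                receivers[group_id] = consumer
    return receivers


queues = ["q0", "q1", "q2", "q3"]
groups = {"group-A": ["c1", "c2"], "group-B": ["c3"]}
print(allocate_averagely(queues, groups["group-A"]))  # {'c1': ['q0', 'q1'], 'c2': ['q2', 'q3']}
print(receivers_for("q2", queues, groups))            # {'group-A': 'c2', 'group-B': 'c3'}
```

A message stored in `q2` is handled once inside `group-A` (by `c2`) and once inside `group-B` (by `c3`). If a group has more consumers than queues, the extra consumers receive an empty allocation and sit idle.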
3. Storage Model
RocketMQ stores messages using a combination of CommitLog and ConsumeQueue. The CommitLog holds the full message body and metadata for all topics on a broker, appended sequentially. Each ConsumeQueue corresponds to one MessageQueue and stores only an index entry per message: its offset in the CommitLog, its size, and its tag hash (used to filter by tag without reading the body). Because every ConsumeQueue entry can be derived from the CommitLog, a lost ConsumeQueue can be rebuilt by rescanning the CommitLog, as long as the CommitLog remains intact.
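A toy version of this layout shows why the ConsumeQueue is recoverable. The sketch packs each index entry into 20 bytes (8-byte CommitLog offset, 4-byte size, 8-byte tag hash), mirroring RocketMQ's fixed-width entries; the CommitLog record format, the use of `zlib.crc32` as the tag hash, and the class and method names are all assumptions for illustration.

```python
import struct
import zlib

# Each ConsumeQueue entry: 8-byte CommitLog offset, 4-byte record size, 8-byte tag hash (20 bytes).
CQ_ENTRY = struct.Struct(">qiq")
LEN_PREFIX = struct.Struct(">i")


class Store:
    """Toy broker store: one shared CommitLog plus one ConsumeQueue per (topic, queue)."""

    def __init__(self):
        self.commitlog = bytearray()
        self.consume_queues = {}  # (topic, queue_id) -> bytearray of CQ_ENTRY records

    def append(self, topic, queue_id, tag, body):
        # Hypothetical record format: length prefix + "topic|queue_id|tag|" + body.
        # Assumes topic and tag contain no "|" characters.
        payload = f"{topic}|{queue_id}|{tag}|".encode() + body
        offset = len(self.commitlog)
        self.commitlog += LEN_PREFIX.pack(len(payload)) + payload
        self._index(topic, queue_id, tag, offset, LEN_PREFIX.size + len(payload))
        return offset

    def _index(self, topic, queue_id, tag, offset, size):
        tag_hash = zlib.crc32(tag.encode())  # stand-in for the tag hash RocketMQ stores
        cq = self.consume_queues.setdefault((topic, queue_id), bytearray())
        cq += CQ_ENTRY.pack(offset, size, tag_hash)  # bytearray += mutates in place

    def read(self, topic, queue_id, index):
        """Return the body of the index-th message in a queue, found via its ConsumeQueue entry."""
        cq = self.consume_queues[(topic, queue_id)]
        offset, size, _ = CQ_ENTRY.unpack_from(cq, index * CQ_ENTRY.size)
        payload = bytes(self.commitlog[offset + LEN_PREFIX.size:offset + size])
        return payload.split(b"|", 3)[3]

    def rebuild_consume_queues(self):
        """Recreate every ConsumeQueue by scanning the CommitLog from the beginning."""
        self.consume_queues = {}
        pos = 0
        while pos < len(self.commitlog):
            (length,) = LEN_PREFIX.unpack_from(self.commitlog, pos)
            start = pos + LEN_PREFIX.size
            topic, queue_id, tag, _ = bytes(self.commitlog[start:start + length]).split(b"|", 3)
            self._index(topic.decode(), int(queue_id), tag.decode(), pos, LEN_PREFIX.size + length)
            pos = start + length


store = Store()
store.append("orders", 0, "paid", b"order-1")
store.append("orders", 1, "paid", b"order-2")
store.append("orders", 0, "refund", b"order-3")
store.consume_queues.clear()       # simulate losing every ConsumeQueue file
store.rebuild_consume_queues()
print(store.read("orders", 0, 1))  # b'order-3'
```

Keeping the index entries small and fixed-width means a consumer can jump straight to the n-th message of a queue with one multiplication, while all writes still go to a single sequential log.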
4. Deployment Model
In a real deployment, a Broker is the data node that stores messages, while a NameServer provides service discovery: each broker registers the topics and queues it hosts with the NameServer. A producer first queries the NameServer for the routing information of a target Topic (which brokers host the topic and which queues exist), then sends the message to the appropriate broker. Consumers follow the same lookup process before pulling messages.
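The lookup flow can be sketched in a few lines. `NameServer`, `Producer`, and their methods are hypothetical stand-ins, not the RocketMQ client API; the round-robin queue selection is an assumption chosen for simplicity. A real client also refreshes its cached routes periodically, which this sketch omits.

```python
import itertools


class NameServer:
    """Toy routing registry: brokers register the queues they host for each topic."""

    def __init__(self):
        self.routes = {}  # topic -> list of (broker_name, queue_id)

    def register_broker(self, broker_name, topic, queue_count):
        self.routes.setdefault(topic, []).extend((broker_name, qid) for qid in range(queue_count))

    def get_route(self, topic):
        if topic not in self.routes:
            raise LookupError(f"no route for topic {topic!r}")
        return list(self.routes[topic])


class Producer:
    """Looks up a topic's route once, then spreads sends across its queues round-robin."""

    def __init__(self, name_server):
        self.name_server = name_server
        self.cursors = {}  # topic -> cycling iterator over (broker, queue) pairs

    def pick_queue(self, topic):
        if topic not in self.cursors:
            # First send to this topic: fetch routing information from the name server.
            self.cursors[topic] = itertools.cycle(self.name_server.get_route(topic))
        return next(self.cursors[topic])


ns = NameServer()
ns.register_broker("broker-a", "orders", 2)
ns.register_broker("broker-b", "orders", 2)
producer = Producer(ns)
print([producer.pick_queue("orders") for _ in range(5)])
# [('broker-a', 0), ('broker-a', 1), ('broker-b', 0), ('broker-b', 1), ('broker-a', 0)]
```

The name server only answers "where does this topic live"; the message itself goes directly from the producer to the chosen broker.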
5. Best‑Practice Summary
The following practical guidelines were distilled from real‑world experience and are also used as interview questions for Alibaba middleware positions:
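Q1: How can duplicate messages be avoided in a distributed messaging system? The root cause is the unreliable network: a producer that never receives an acknowledgment resends the message, and a consumer may be delivered a message it has already processed. The fix belongs on the consumer side: make the business logic idempotent by giving each message a unique identifier and recording that identifier in a deduplication log once processing succeeds. If an incoming message ID already exists in the log, skip it.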
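A minimal sketch of the idempotent consumer, assuming an in-memory set as the deduplication log; `IdempotentConsumer` and `on_message` are hypothetical names, not a RocketMQ API.

```python
class IdempotentConsumer:
    """Skips any message whose ID is already in the deduplication log."""

    def __init__(self, handler):
        self.handler = handler
        self.dedup_log = set()  # stand-in for a durable store, e.g. a table keyed by message ID

    def on_message(self, msg_id, body):
        if msg_id in self.dedup_log:
            return False  # duplicate delivery: already processed, skip
        self.handler(body)
        # Record only after the handler succeeds, so a failed attempt can be retried.
        self.dedup_log.add(msg_id)
        return True


credited = []
consumer = IdempotentConsumer(credited.append)
for msg_id, amount in [("m1", 10), ("m2", 5), ("m1", 10)]:  # "m1" is delivered twice
    consumer.on_message(msg_id, amount)
print(sum(credited))  # 15
```

In a real system the business update and the deduplication record should be committed atomically (for example in one database transaction); otherwise a crash between the two steps would cause the message to be processed again on redelivery.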
Q2: How can message order be preserved while scaling out queues without stopping writes? 1) Scale by doubling the number of queues, so that a key whose hash mapped to old queue i now maps either to the same queue i or to new queue i + n (where n is the old queue count); 2) before scaling, record the maximum offset of each old queue; 3) for each consumer group, finish consuming each old queue up to its recorded offset before reading from the corresponding new queue (reads on the new queue stay disabled until the old data is drained).
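The scaling rule can be sketched as follows, assuming messages are routed by key hash modulo queue count and the queue count doubles from n to 2n. Because h mod 2n is always either h mod n or (h mod n) + n, a key never jumps to an unrelated queue. `OrderedScaleGate` and the other names are hypothetical; one gate would exist per consumer group.

```python
def queue_for(key_hash, queue_count):
    """Hash-mod routing: messages with the same key always land in the same queue."""
    return key_hash % queue_count


def doubling_keeps_mapping(key_hashes, old_count):
    """Check that after doubling, each key stays on queue i or moves to queue i + old_count."""
    new_count = old_count * 2
    for h in key_hashes:
        old_q = queue_for(h, old_count)
        if queue_for(h, new_count) not in (old_q, old_q + old_count):
            return False
    return True


class OrderedScaleGate:
    """Per-consumer-group gate: new queue i + n opens only after old queue i is drained."""

    def __init__(self, old_count, old_max_offsets, consume_offsets):
        self.old_count = old_count
        # Offset one past the last message written to each old queue before scaling.
        self.old_max_offsets = dict(old_max_offsets)
        # Next offset this group will consume from each old queue.
        self.consume_offsets = dict(consume_offsets)

    def ack_old(self, queue_id, offset):
        """Record that the group has consumed `offset` in old queue `queue_id`."""
        self.consume_offsets[queue_id] = max(self.consume_offsets[queue_id], offset + 1)

    def may_read(self, queue_id):
        if queue_id < self.old_count:
            return True  # old queues stay readable
        parent = queue_id - self.old_count
        return self.consume_offsets[parent] >= self.old_max_offsets[parent]


print(doubling_keeps_mapping(range(1000), old_count=4))  # True
gate = OrderedScaleGate(old_count=2, old_max_offsets={0: 3, 1: 0}, consume_offsets={0: 1, 1: 0})
print(gate.may_read(2))  # False: old queue 0 still holds offsets 1 and 2
gate.ack_old(0, 2)
print(gate.may_read(2))  # True
print(gate.may_read(3))  # True: old queue 1 was already empty
```

Writes never stop: new messages for a moved key go straight to queue i + n, but the gate holds them back until every older message for that key, sitting in queue i below the recorded offset, has been consumed.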
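Q3: How can messages be replayed in a distributed messaging system? Reset the consumer group's offset to an earlier position; the broker then re-delivers messages starting from that offset, provided they are still retained on disk. Because offsets are tracked per consumer group, replaying one group does not affect the others.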
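A minimal sketch of replay, assuming a single append-only queue with per-group offsets and a binary search to turn a point in time into an offset. `QueueLog` and its methods are hypothetical names for illustration, not RocketMQ's API.

```python
import bisect


class QueueLog:
    """Append-only message queue with per-group consumer offsets (toy model)."""

    def __init__(self):
        self.messages = []  # (store_timestamp, body), timestamps non-decreasing
        self.offsets = {}   # consumer group -> next offset to deliver

    def append(self, timestamp, body):
        self.messages.append((timestamp, body))

    def poll(self, group, max_count=10):
        start = self.offsets.get(group, 0)
        batch = [body for _, body in self.messages[start:start + max_count]]
        self.offsets[group] = start + len(batch)
        return batch

    def reset_offset(self, group, offset):
        """Rewind (or fast-forward) one group; its later polls re-deliver from here."""
        self.offsets[group] = max(0, min(offset, len(self.messages)))

    def offset_for_time(self, timestamp):
        """First offset whose store timestamp is >= timestamp (binary search)."""
        return bisect.bisect_left([ts for ts, _ in self.messages], timestamp)


log = QueueLog()
for ts, body in [(100, "a"), (200, "b"), (300, "c")]:
    log.append(ts, body)
print(log.poll("billing"))  # ['a', 'b', 'c']
log.reset_offset("billing", log.offset_for_time(200))
print(log.poll("billing"))  # ['b', 'c']
```

Only the `billing` group's offset moves; any other group polling the same queue is unaffected, which is what makes replay safe to run for one downstream system at a time.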
Source: Alibaba Middleware
Architects' Tech Alliance
