Kafka and RocketMQ Architecture: Availability, Reliability, and Design Insights

This article examines the architectures of Kafka and RocketMQ, analyzes their availability and reliability mechanisms, compares their strengths and weaknesses, and proposes hybrid designs and simplified MQ solutions for building robust message‑queue systems.

Architecture Digest

Continuing from the previous discussion on business requirements for message middleware, this article explores the pros and cons of various architectures, focusing on availability and reliability, and presents the author's thoughts on MQ architecture.

Kafka

Kafka's architecture consists of Producers, Consumers, the Kafka cluster (Brokers), and ZooKeeper. ZooKeeper plays the role of a name server: it stores cluster metadata and handles leader election and coordination. The Brokers store the messages.

Availability

Kafka's only external dependency is ZooKeeper, which is itself highly available (a cluster of 2N+1 nodes tolerates N failures), so the dependency is not normally what limits the cluster's availability. Kafka's own availability relies on its replication strategy: each partition is replicated across multiple brokers, with one replica acting as the leader. If the broker hosting the leader fails, a new leader is elected from the remaining replicas and the partition stays operational. The replication factor is configurable and determines the level of fault tolerance.
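
The arithmetic behind "2N+1 nodes tolerate N failures" and the idea of leader failover can be sketched in a few lines. This is a simplified illustration, not Kafka's or ZooKeeper's actual code: the function names are hypothetical, and real Kafka restricts leader election to in-sync replicas rather than any live replica.

```python
from typing import List, Optional, Set


def tolerated_failures(ensemble_size: int) -> int:
    """Failures a majority-quorum ensemble can survive and still form a quorum."""
    if ensemble_size < 1:
        raise ValueError("ensemble_size must be positive")
    quorum = ensemble_size // 2 + 1  # strict majority of the ensemble
    return ensemble_size - quorum


def elect_leader(replicas: List[str], live_brokers: Set[str]) -> Optional[str]:
    """Pick the first assigned replica that is still alive (toy election rule)."""
    for broker in replicas:
        if broker in live_brokers:
            return broker
    return None  # no replica left: the partition is unavailable


if __name__ == "__main__":
    for n in (3, 4, 5):
        print(n, "nodes tolerate", tolerated_failures(n), "failure(s)")
    print(elect_leader(["b1", "b2", "b3"], {"b2", "b3"}))
```

Note that an even-sized ensemble tolerates no more failures than the next smaller odd one, which is why odd sizes are the usual choice.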

Reliability

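Reliability means that messages that have been written are eventually consumed and are not lost. Kafka achieves this by persisting messages to disk (synchronously or asynchronously) and replicating them to other brokers. As long as at least one fully caught-up replica survives, acknowledged data is not lost, barring the failure of an entire data center. How strong the guarantee is depends on configuration: a producer that waits for acknowledgement from all in-sync replicas (acks=all) is protected against the loss of a single broker, whereas a producer that does not wait can lose messages the leader had not yet replicated.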
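
A toy in-memory model makes the "at least one replica remains" argument concrete. It assumes every append reaches all live replicas before it returns (roughly the behaviour of waiting for all in-sync replicas); the class and method names are made up for illustration and do not correspond to any Kafka API.

```python
class ReplicatedPartition:
    """Toy model of one partition whose log is copied to several brokers."""

    def __init__(self, brokers):
        # Each broker holds its own full copy of the partition log.
        self.logs = {broker: [] for broker in brokers}

    def append(self, message):
        """Write to every live replica; return the number of copies made."""
        for log in self.logs.values():
            log.append(message)
        return len(self.logs)

    def fail(self, broker):
        """Simulate losing a broker together with its local log."""
        self.logs.pop(broker, None)

    def read_all(self):
        """Read from any surviving replica; raise if none is left."""
        if not self.logs:
            raise RuntimeError("all replicas lost")
        return list(next(iter(self.logs.values())))


if __name__ == "__main__":
    partition = ReplicatedPartition(["b1", "b2", "b3"])
    partition.append("order-created")
    partition.fail("b1")
    partition.fail("b2")
    print(partition.read_all())  # the message survives on b3
```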

Evaluation

Advantages: external coordination is offloaded to ZooKeeper, simplifying broker logic; high machine utilization due to mutual backup among brokers.

Disadvantages: introduces an external dependency (ZooKeeper) and adds operational complexity; implementing mutual backup is more complex than a simple master‑slave model.

RocketMQ

RocketMQ consists of Producer, Consumer, NameServer, and Broker. Unlike Kafka, RocketMQ implements its own cluster‑mode NameServer, which is essentially stateless and can be deployed in multiple instances without affecting availability.

Availability

RocketMQ's NameServer is highly available by design. Brokers can be deployed as multiple masters; if a master fails, other masters continue serving writes, though some data on the failed broker may become unavailable. RocketMQ uses a master‑slave model to mitigate this, allowing read requests to be redirected to the slave after a master failure.
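
The routing consequences of this model can be sketched as follows. This is an illustrative simplification under assumed names (BrokerGroup, writable_groups, read_target); it does not reproduce the RocketMQ client's actual routing logic.

```python
class BrokerGroup:
    """One master with one slave, as in a typical master-slave pair."""

    def __init__(self, name):
        self.name = name
        self.master_up = True
        self.slave_up = True


def writable_groups(groups):
    """Writes go only to masters, so a group is writable while its master is up."""
    return [g.name for g in groups if g.master_up]


def read_target(group):
    """Reads prefer the master and fall back to the slave after a master failure."""
    if group.master_up:
        return "master"
    if group.slave_up:
        return "slave"
    return None  # data in this group is unavailable until a node recovers


if __name__ == "__main__":
    a, b = BrokerGroup("broker-a"), BrokerGroup("broker-b")
    a.master_up = False
    print(writable_groups([a, b]))  # only broker-b still accepts writes
    print(read_target(a))           # existing data on broker-a is read from its slave
```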

Reliability

RocketMQ supports both synchronous and asynchronous disk flush. With synchronous flush, a message is safely on disk before the producer receives an acknowledgement. The write flow is: write to page cache → trigger flush thread → flush thread writes to disk → wake up front-end thread to return success. Synchronous flush provides stronger durability than asynchronous flush, at the cost of higher latency.
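
The four-step synchronous flush flow can be mimicked with a background thread and an event. This is a minimal sketch of the pattern only, assuming a single append-only file and a hypothetical SyncFlushLog class; RocketMQ's real commit log (written in Java, with batching and memory-mapped files) is considerably more involved.

```python
import os
import queue
import threading


class SyncFlushLog:
    """Append-only log that acknowledges a write only after fsync completes."""

    def __init__(self, path):
        # buffering=0 sends each write straight to the OS, i.e. the page cache.
        self._file = open(path, "ab", buffering=0)
        self._requests = queue.Queue()
        self._flusher = threading.Thread(target=self._flush_loop, daemon=True)
        self._flusher.start()

    def _flush_loop(self):
        while True:
            done = self._requests.get()
            if done is None:  # shutdown signal from close()
                break
            os.fsync(self._file.fileno())  # step 3: force page cache to disk
            done.set()                     # step 4: wake the waiting caller

    def append(self, payload, timeout=5.0):
        """Return True once the payload is flushed, False on timeout."""
        self._file.write(payload + b"\n")  # step 1: write to page cache
        done = threading.Event()
        self._requests.put(done)           # step 2: trigger the flush thread
        return done.wait(timeout)

    def close(self):
        self._requests.put(None)
        self._flusher.join()
        self._file.close()
```

An asynchronous variant would return from append right after step 1 and let the flush thread sync on its own schedule, which is where the latency saving and the durability risk both come from.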

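RocketMQ also offers a synchronous double-write mode, in which the master acknowledges the producer only after the message has also been written to the slave. This closes the window in which a message exists only on the master because of master-slave replication delay.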
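
The difference between asynchronous replication and synchronous double-write shows up when the master is lost before the slave has caught up. The model below is a deliberately small illustration with hypothetical names, not RocketMQ's replication protocol.

```python
class MasterSlavePair:
    """Toy master-slave pair with either synchronous or asynchronous copying."""

    def __init__(self, sync_replication):
        self.sync_replication = sync_replication
        self.master_log = []
        self.slave_log = []
        self._pending = []  # acknowledged messages not yet copied to the slave

    def send(self, message):
        """Store a message and acknowledge the producer."""
        self.master_log.append(message)
        if self.sync_replication:
            self.slave_log.append(message)  # copy before acknowledging
        else:
            self._pending.append(message)   # copy later, acknowledge now
        return "ack"

    def replicate_pending(self):
        """Background catch-up step used in asynchronous mode."""
        self.slave_log.extend(self._pending)
        self._pending.clear()

    def lose_master(self):
        """Master is lost for good; return what survives on the slave."""
        self.master_log = []
        self._pending = []
        return list(self.slave_log)


if __name__ == "__main__":
    for sync in (False, True):
        pair = MasterSlavePair(sync_replication=sync)
        pair.send("m1")
        print("sync" if sync else "async", "->", pair.lose_master())
```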

Evaluation

Advantages: no external dependencies like ZooKeeper, simplifying operations and improving availability.

Disadvantages: the master-slave architecture leads to lower machine utilization, because many slaves sit largely idle; and since most deployments pair each master with a single slave, reliability is limited for high-throughput scenarios.

Other MQ Architectures

The author explores hybrid designs that combine Kafka's mutual backup with RocketMQ's lightweight NameServer, as well as architectures that remove the NameServer entirely, using Gossip protocols for metadata replication and consensus algorithms (Raft, Paxos) for leader election.
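
To give a feel for the NameServer-free variant, here is a minimal sketch of gossip-style metadata replication, assuming each metadata entry carries a version number and the higher version wins. The function names and the push-pull exchange are illustrative choices, not a description of any particular system, and leader election via Raft or Paxos is out of scope here.

```python
import random


def merge(local, remote):
    """Copy into local every entry that is missing or older than remote's."""
    for key, (version, value) in remote.items():
        if key not in local or local[key][0] < version:
            local[key] = (version, value)


def gossip_round(nodes, rng):
    """Each node exchanges metadata with one randomly chosen peer (push-pull)."""
    for name, state in nodes.items():
        peer = rng.choice([n for n in nodes if n != name])
        merge(nodes[peer], state)  # push our entries to the peer
        merge(state, nodes[peer])  # pull the peer's entries back


if __name__ == "__main__":
    rng = random.Random(0)
    nodes = {
        "n1": {"topic-a": (2, "broker-1")},
        "n2": {"topic-a": (1, "broker-9"), "topic-b": (1, "broker-2")},
    }
    gossip_round(nodes, rng)
    print(nodes["n1"] == nodes["n2"])
```

With more than two nodes, a single round does not guarantee convergence; repeated rounds spread each update to all nodes with high probability.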

The proposed simplified MQ design consists of a single-node NameServer (discoverable via DNS), master-slave broker clusters, and metadata that is stored on the brokers and aggregated by the NameServer. The aim is high availability and reliability with a straightforward implementation.
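
One way to read "metadata stored on brokers and aggregated by the NameServer" is that the NameServer holds only a derived view that brokers can rebuild by re-registering. The sketch below illustrates that reading; the NameServer class, its methods, and the addresses are hypothetical, and the source does not specify the registration protocol.

```python
class NameServer:
    """Aggregates topic routes reported by brokers; holds no durable state."""

    def __init__(self):
        self.routes = {}  # topic -> set of broker addresses

    def register(self, broker_addr, topics):
        """Replace whatever this broker reported earlier with its current topics."""
        for addrs in self.routes.values():
            addrs.discard(broker_addr)
        for topic in topics:
            self.routes.setdefault(topic, set()).add(broker_addr)

    def lookup(self, topic):
        """Return the brokers currently serving a topic, in stable order."""
        return sorted(self.routes.get(topic, set()))


if __name__ == "__main__":
    # Brokers are the source of truth for their own topic lists.
    broker_topics = {
        "10.0.0.1:10911": ["orders", "payments"],
        "10.0.0.2:10911": ["orders"],
    }
    ns = NameServer()  # a freshly restarted NameServer starts empty
    for addr, topics in broker_topics.items():
        ns.register(addr, topics)
    print(ns.lookup("orders"))
```

Because the view is rebuilt from broker reports, replacing the single NameServer node loses no metadata, only the time it takes brokers to register again.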

Conclusion

The article introduced Kafka and RocketMQ architectures, discussed their availability and reliability, compared their strengths and weaknesses, and offered the author's own MQ design considerations for future articles.

Written by

Architecture Digest