Kafka and RocketMQ Architecture: Availability, Reliability, and Design Insights
This article examines the architectures of Kafka and RocketMQ, analyzes their availability and reliability mechanisms, compares their strengths and weaknesses, and proposes hybrid designs and simplified MQ solutions for building robust message‑queue systems.
Continuing from the previous discussion on business requirements for message middleware, this article explores the pros and cons of various architectures, focusing on availability and reliability, and presents the author's thoughts on MQ architecture.
Kafka
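Kafka's system architecture includes Producers, Consumers, the Kafka broker cluster, and ZooKeeper. ZooKeeper plays a role comparable to a NameServer: it stores cluster metadata and handles coordination, including the election of the controller broker, which in turn assigns partition leaders. Brokers store the messages.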
Availability
Kafka depends only on ZooKeeper, which is itself highly available (a cluster of 2N+1 nodes tolerates N failures), so the external dependency does not reduce the cluster's availability. Kafka's own availability relies on its replication strategy: each partition is replicated across multiple brokers, and one replica acts as the leader. If the broker hosting the leader fails, one of the surviving replicas is elected as the new leader, keeping the partition operational. The replication factor is configurable and determines the level of fault tolerance.
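The two availability rules above can be sketched in a few lines. This is a toy model, not Kafka's actual election logic: the Partition class, its method names, and the "first surviving replica wins" rule are assumptions made only for illustration.

```python
from typing import List, Optional


def tolerated_failures(ensemble_size: int) -> int:
    """Node failures a majority-quorum ensemble survives (2N+1 nodes -> N)."""
    return (ensemble_size - 1) // 2


class Partition:
    """Hypothetical model of one replicated partition."""

    def __init__(self, replicas: List[str]) -> None:
        self.replicas = list(replicas)          # brokers hosting a replica
        self.alive = set(replicas)              # brokers currently reachable
        self.leader: Optional[str] = replicas[0]

    def fail(self, broker: str) -> None:
        """Mark a broker as failed; elect a new leader if it was leading."""
        self.alive.discard(broker)
        if broker == self.leader:
            survivors = [b for b in self.replicas if b in self.alive]
            # Assumption: the first surviving replica takes over.
            # None means no replica is left and the partition is offline.
            self.leader = survivors[0] if survivors else None
```

With a replication factor of 3, the partition stays available through two broker failures and goes offline only when the third replica is lost.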
Reliability
Reliability means that a message, once written successfully, is eventually consumed and never lost. Kafka ensures this by persisting messages to disk (synchronously or asynchronously) and replicating them to other brokers. As long as at least one replica that holds the acknowledged messages survives, no data is lost; only a failure that takes out every replica, such as the loss of an entire data center, causes data loss.
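Whether a surviving replica actually holds a message depends on when the producer was acknowledged. The sketch below contrasts acknowledging after the leader write alone with acknowledging after all replicas have the record. ToyPartition and its wait_for_all flag are hypothetical names for this illustration; they are not Kafka APIs.

```python
from typing import List


class ToyPartition:
    """Hypothetical replicated log; index 0 of `logs` is the leader."""

    def __init__(self, replica_count: int) -> None:
        self.logs: List[List[str]] = [[] for _ in range(replica_count)]

    def replicate(self) -> None:
        """Copy records the followers are missing from the leader."""
        leader = self.logs[0]
        for follower in self.logs[1:]:
            follower.extend(leader[len(follower):])

    def produce(self, record: str, wait_for_all: bool) -> bool:
        """Append to the leader; optionally replicate before acknowledging."""
        self.logs[0].append(record)
        if wait_for_all:
            self.replicate()
        return True  # acknowledgement sent to the producer

    def fail_leader(self) -> List[str]:
        """Drop the leader and return the log of the replica that takes over."""
        if len(self.logs) < 2:
            raise RuntimeError("no replica left: acknowledged data is lost")
        self.logs.pop(0)
        return self.logs[0]
```

A record acknowledged with wait_for_all=False exists only on the leader until the next replication pass, so a leader failure in that window loses it even though other replicas survive.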
Evaluation
Advantages: external coordination is offloaded to ZooKeeper, simplifying broker logic; high machine utilization due to mutual backup among brokers.
Disadvantages: introduces an external dependency (ZooKeeper) and adds operational complexity; implementing mutual backup is more complex than a simple master‑slave model.
RocketMQ
RocketMQ consists of Producer, Consumer, NameServer, and Broker. Unlike Kafka, RocketMQ implements its own cluster‑mode NameServer, which is essentially stateless and can be deployed in multiple instances without affecting availability.
Availability
RocketMQ's NameServer is highly available by design. Brokers can be deployed as multiple masters; if a master fails, other masters continue serving writes, though some data on the failed broker may become unavailable. RocketMQ uses a master‑slave model to mitigate this, allowing read requests to be redirected to the slave after a master failure.
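The failover behaviour described above can be sketched as a routing rule: writes go only to groups whose master is up, while reads fall back to the slave. BrokerGroup, writable_groups, and read_target are hypothetical names for this sketch, not RocketMQ client APIs.

```python
from typing import List, Optional


class BrokerGroup:
    """Hypothetical master-slave pair that owns part of a topic's data."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.master_up = True
        self.slave_up = True


def writable_groups(groups: List[BrokerGroup]) -> List[str]:
    """Producers can only write to groups whose master is alive."""
    return [g.name for g in groups if g.master_up]


def read_target(group: BrokerGroup) -> Optional[str]:
    """Consumers read from the master, or from the slave after a failure."""
    if group.master_up:
        return group.name + "-master"
    if group.slave_up:
        return group.name + "-slave"
    return None  # data held by this group is temporarily unavailable
```

When the master of one group fails, new writes shift to the remaining masters, and messages already stored in the failed group stay readable through its slave.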
Reliability
RocketMQ can persist messages with synchronous disk flush (flushDiskType=SYNC_FLUSH in the broker configuration; the default is asynchronous flush), which ensures that a message is on disk before the producer is acknowledged. The write flow is: write to page cache → trigger flush thread → flush thread writes to disk → wake up front‑end thread to return success. Synchronous flush provides stronger durability than asynchronous flush, at the cost of higher latency.
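The four steps of the synchronous flush flow can be sketched with a writer thread and a dedicated flush thread. This is a minimal illustration under stated assumptions, not RocketMQ's implementation: SyncFlushLog and its methods are hypothetical, and os.fsync stands in for the broker's flush to disk.

```python
import os
import queue
import threading


class SyncFlushLog:
    """Hypothetical append-only log that acknowledges only after fsync."""

    def __init__(self, path: str) -> None:
        self._fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_APPEND, 0o644)
        self._requests: queue.Queue = queue.Queue()
        self._flusher = threading.Thread(target=self._flush_loop, daemon=True)
        self._flusher.start()

    def _flush_loop(self) -> None:
        while True:
            done = self._requests.get()
            if done is None:          # shutdown signal from close()
                break
            os.fsync(self._fd)        # step 3: force page cache to disk
            done.set()                # step 4: wake the waiting writer

    def append(self, payload: bytes, timeout: float = 5.0) -> bool:
        os.write(self._fd, payload)   # step 1: data lands in the page cache
        done = threading.Event()
        self._requests.put(done)      # step 2: trigger the flush thread
        return done.wait(timeout)     # True only once the flush completed

    def close(self) -> None:
        self._requests.put(None)
        self._flusher.join()
        os.close(self._fd)
```

Because the flush request is queued after the write, the fsync that serves it always covers that write. An asynchronous variant would return right after step 1, which is faster but leaves a window in which a crash loses acknowledged messages.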
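RocketMQ also offers a synchronous double‑write mode (the SYNC_MASTER broker role), in which the master acknowledges the producer only after the slave has also received the message. This closes the window in which a master failure would lose messages that had not yet been replicated to the slave.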
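A minimal sketch of the double‑write rule follows: success is reported only when both copies exist. The SendResult values and the Slave and SyncMaster classes are hypothetical names chosen for this illustration and do not mirror RocketMQ's classes.

```python
from enum import Enum
from typing import List


class SendResult(Enum):
    OK = "ok"
    SLAVE_UNAVAILABLE = "slave_unavailable"


class Slave:
    """Hypothetical slave broker that stores replicated records."""

    def __init__(self) -> None:
        self.log: List[str] = []
        self.online = True

    def replicate(self, record: str) -> bool:
        if not self.online:
            return False
        self.log.append(record)
        return True


class SyncMaster:
    """Hypothetical master that reports success only after the slave has a copy."""

    def __init__(self, slave: Slave) -> None:
        self.log: List[str] = []
        self.slave = slave

    def put(self, record: str) -> SendResult:
        self.log.append(record)
        if self.slave.replicate(record):
            return SendResult.OK
        # The record exists on the master only; the producer is told so
        # and can decide whether to retry or alert.
        return SendResult.SLAVE_UNAVAILABLE
```

Every record that returned OK is present on both brokers, so losing the master alone cannot lose an acknowledged message.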
Evaluation
Advantages: no external dependencies like ZooKeeper, simplifying operations and improving availability.
Disadvantages: the master‑slave architecture leads to lower machine utilization, because slaves sit mostly idle while their master is healthy; most deployments also run only a single slave per master, which limits reliability in high‑throughput scenarios.
Other MQ Architectures
The author explores hybrid designs that combine Kafka's mutual backup with RocketMQ's lightweight NameServer, as well as architectures that remove the NameServer entirely, using Gossip protocols for metadata replication and consensus algorithms (Raft, Paxos) for leader election.
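To make the Gossip idea concrete, the sketch below shows one common way nodes can converge on shared metadata without a NameServer: each entry carries a version, and on every exchange both peers keep the newer version. The data layout and function names are assumptions for illustration; the article does not specify a particular protocol.

```python
from typing import Dict, Tuple

# Assumed layout: key -> (version, value); a higher version is newer.
Metadata = Dict[str, Tuple[int, str]]


def merge_metadata(local: Metadata, remote: Metadata) -> Metadata:
    """Return the union of both views, keeping the newer version per key."""
    merged = dict(local)
    for key, (version, value) in remote.items():
        if key not in merged or version > merged[key][0]:
            merged[key] = (version, value)
    return merged


def gossip_exchange(a: Metadata, b: Metadata) -> Tuple[Metadata, Metadata]:
    """One push-pull round: afterwards both peers hold the same view."""
    merged = merge_metadata(a, b)
    return dict(merged), dict(merged)
```

Repeating such exchanges between random pairs of brokers spreads an update through the cluster. Gossip alone gives eventual consistency, which is why the text pairs it with a consensus algorithm for leader election.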
The proposed simplified MQ design consists of a single‑node NameServer (discoverable via DNS), master‑slave broker clusters, and metadata that is stored on the brokers and aggregated by the NameServer. The design aims for high availability and reliability while keeping the implementation straightforward.
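One way to read "metadata stored on brokers and aggregated by the NameServer" is that brokers periodically report their topics and the NameServer keeps only an in-memory view that can be rebuilt from those reports. The sketch below follows that reading; the class, the TTL-based expiry, and the broker addresses are assumptions, not part of the original design.

```python
import time
from typing import Callable, Dict, Iterable, List, Set, Tuple


class NameServer:
    """Hypothetical NameServer that aggregates routes from broker heartbeats."""

    def __init__(self, ttl_seconds: float = 30.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._ttl = ttl_seconds
        self._clock = clock
        # broker address -> (topics it serves, time of last heartbeat)
        self._routes: Dict[str, Tuple[Set[str], float]] = {}

    def register(self, broker_addr: str, topics: Iterable[str]) -> None:
        """Called by a broker on each heartbeat with its own metadata."""
        self._routes[broker_addr] = (set(topics), self._clock())

    def brokers_for(self, topic: str) -> List[str]:
        """Brokers serving the topic whose heartbeat has not expired."""
        now = self._clock()
        return sorted(
            addr for addr, (topics, seen) in self._routes.items()
            if topic in topics and now - seen <= self._ttl
        )
```

Since the brokers remain the source of truth, a restarted NameServer repopulates its view from the next round of heartbeats, which keeps the single node simple to operate.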
Conclusion
The article introduced Kafka and RocketMQ architectures, discussed their availability and reliability, compared their strengths and weaknesses, and offered the author's own MQ design considerations for future articles.
Architecture Digest
Focusing on Java backend development, covering application architecture from top-tier internet companies (high availability, high performance, high stability), big data, machine learning, Java architecture, and other popular fields.
