Design and Deployment of an Online Messaging System Using RocketMQ at Kuaishou

This article explains why Kuaishou needed a dedicated online messaging system, how RocketMQ was chosen and integrated, the deployment strategies, client SDK design, load‑balancing and disaster‑recovery mechanisms, diverse message features, distributed reconciliation monitoring, and performance‑tuning parameters.

Big Data Technology & Architecture
Big Data Technology & Architecture
Big Data Technology & Architecture
Design and Deployment of an Online Messaging System Using RocketMQ at Kuaishou

Author Huang Li, with over 10 years of software development and architecture experience, currently leads the construction of an online messaging system at Kuaishou.

Why build an online messaging system – Although Kuaishou already uses Kafka extensively, certain scenarios such as selective retry without blocking other messages, delayed delivery, transactional consistency between database operations and message sending, and per‑message query for troubleshooting are not well supported by Kafka. To address these needs, a dedicated online‑oriented messaging system was required as a complement to Kafka.

After evaluating several middleware options, RocketMQ was selected because its feature set matched the business requirements closely and it offers a simple deployment model with wide adoption.

Deployment Mode and Implementation Strategy

There are two typical ways to adopt an open‑source component within an existing ecosystem:

Deeply modify the upstream code to add custom features, which makes future upgrades difficult.

Keep the upstream version unchanged (or with minimal incompatible changes) and wrap it externally to provide the required custom functionality.

Kuaishou chose the second approach. Initially using RocketMQ 4.5.2, the team later upgraded to 4.7 when the community reduced synchronous replication latency, benefiting from the new version without heavy customisation.

When deploying the cluster, several decisions were made:

Small cluster (better isolation, no cross‑IDC deployment) rather than a large cluster.

SSD storage and synchronous replication with asynchronous flushing.

Client SDK Encapsulation Strategy

Because the upstream RocketMQ code is not heavily modified, a thin SDK was built to expose only the essential API to downstream services. The SDK requires only a globally unique Topic and a Group; environment details such as NameServer addresses are resolved internally based on the topic.

The SDK architecture consists of three layers: a generic upper layer, a middle layer shared across implementations, and a lower layer that interacts with the specific MQ (RocketMQ in this case). This design allows swapping the underlying middleware without changing client code.

The SDK also supports hot‑configuration changes (e.g., routing policy, thread count, timeout) without restarting the client, and Maven‑based forced updates ensure that services always use the latest version.

Cluster Load Balancing & Disaster Recovery

Each Topic is replicated across two availability zones, and producers/consumers connect to at least two independent clusters. If one zone fails, traffic automatically switches to the other cluster.

A custom failover component was developed to provide:

Millions of OPS

Flexible weight adjustment

Health checks and event notifications

Concurrency control (throttling slow servers)

Resource prioritisation (e.g., local‑datacenter preference)

Automatic priority management

Incremental hot‑updates

The component is open‑source at https://github.com/PhantomThief/simple-failover-java and has no third‑party dependencies.

Diverse Message Features

Delayed Messages

RocketMQ’s built‑in delayed messages support only a few fixed levels, so a separate Delay Server was built to schedule arbitrary delays. The delay logic is implemented by switching topics and storing the delayed payload back into RocketMQ, allowing existing send APIs to remain unchanged.

Transactional Messages

Since RocketMQ 4.3, transactional messages are supported, ensuring that local DB transactions and message sends succeed or fail together. Recommendations include using version 4.6.1 or later, allocating sufficient thread pools (transaction thread pool at least four times the send thread pool), enabling useReentrantLockWhenPutMessage, and extending the default transaction timeout from 6 seconds to about 1 minute.

Producers should also enable retryAnotherBrokerWhenNotStoreOK when multiple brokers with a slave replica are used.

Distributed Reconciliation Monitoring

A monitoring program creates a dedicated topic on each broker and periodically sends a small number of messages to verify successful delivery. The program records outcomes such as success, flush timeout, slave timeout, slave unavailable, and error codes, without affecting business logic. Consumers ACK messages via a TCP side‑channel, and the producer aggregates ACK statistics for distributed reconciliation.

The monitoring data can also be used for performance testing and fault‑injection drills, with optional sampling to limit memory pressure.

Performance Optimisation

Default broker parameters are not optimal for Kuaishou’s SSD, synchronous replication, and asynchronous flushing setup. The following key parameters are recommended:

Parameter

Default

Description

flushCommitLogTimed

False

Should be true for asynchronous flushing to avoid excessive disk writes.

deleteWhen

04

Time of day to delete expired files; keep low‑traffic periods for deletion.

sendMessageThreadPoolNums

1

Number of threads handling message production; 2‑4 is usually optimal.

useReentrantLockWhenPutMessage

False

Set to true under high load to avoid CPU waste from spin‑locks.

sendThreadPoolQueueCapacity

10000

Queue size for pending send tasks; increase for high TPS workloads.

brokerFastFailureEnable

True

Enables fast‑failure (rate‑limiting) on the broker side.

waitTimeMillsInSendQueue

200

Timeout for waiting in the send queue; increase to ~1000 ms for stability.

osPageCacheBusyTimeOutMills

1000

Page‑cache timeout; raise on machines with large memory.

Conclusion

By adopting a simple, near‑zero‑dependency deployment model, Kuaishou achieved low‑cost small‑cluster deployments, easy upgrades, unified SDK entry points, multi‑datacenter active‑active availability, and leveraged RocketMQ features such as transactional and delayed messages to meet diverse business needs. Automatic distributed reconciliation ensures correctness of each broker and SDK, while the shared performance‑tuning guidelines provide a solid baseline for future scaling.

Original Source

Signed-in readers can open the original source through BestHub's protected redirect.

Sign in to view source
Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactadmin@besthub.devand we will review it promptly.

Backend Architectureperformance tuningRocketMQMessaging Systemtransactional messages
Big Data Technology & Architecture
Written by

Big Data Technology & Architecture

Wang Zhiwu, a big data expert, dedicated to sharing big data technology.

0 followers
Reader feedback

How this landed with the community

Sign in to like

Rate this article

Was this worth your time?

Sign in to rate
Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.