Design and Deployment of an Online Messaging System Using RocketMQ at Kuaishou
This article explains why Kuaishou needed a dedicated online messaging system, how RocketMQ was chosen and integrated, the deployment strategies, client SDK design, load‑balancing and disaster‑recovery mechanisms, diverse message features, distributed reconciliation monitoring, and performance‑tuning parameters.
Author Huang Li, with over 10 years of software development and architecture experience, currently leads the construction of an online messaging system at Kuaishou.
Why build an online messaging system – Although Kuaishou already uses Kafka extensively, certain scenarios such as selective retry without blocking other messages, delayed delivery, transactional consistency between database operations and message sending, and per‑message query for troubleshooting are not well supported by Kafka. To address these needs, a dedicated online‑oriented messaging system was required as a complement to Kafka.
After evaluating several middleware options, RocketMQ was selected because its feature set matched the business requirements closely and it offers a simple deployment model with wide adoption.
Deployment Mode and Implementation Strategy
There are two typical ways to adopt an open‑source component within an existing ecosystem:
Deeply modify the upstream code to add custom features, which makes future upgrades difficult.
Keep the upstream version unchanged (or with minimal incompatible changes) and wrap it externally to provide the required custom functionality.
Kuaishou chose the second approach. Initially using RocketMQ 4.5.2, the team later upgraded to 4.7 when the community reduced synchronous replication latency, benefiting from the new version without heavy customisation.
When deploying the cluster, several decisions were made:
Small cluster (better isolation, no cross‑IDC deployment) rather than a large cluster.
SSD storage and synchronous replication with asynchronous flushing.
Client SDK Encapsulation Strategy
Because the upstream RocketMQ code is not heavily modified, a thin SDK was built to expose only the essential API to downstream services. The SDK requires only a globally unique Topic and a Group; environment details such as NameServer addresses are resolved internally based on the topic.
The SDK architecture consists of three layers: a generic upper layer, a middle layer shared across implementations, and a lower layer that interacts with the specific MQ (RocketMQ in this case). This design allows swapping the underlying middleware without changing client code.
The SDK also supports hot‑configuration changes (e.g., routing policy, thread count, timeout) without restarting the client, and Maven‑based forced updates ensure that services always use the latest version.
Cluster Load Balancing & Disaster Recovery
Each Topic is replicated across two availability zones, and producers/consumers connect to at least two independent clusters. If one zone fails, traffic automatically switches to the other cluster.
A custom failover component was developed to provide:
Millions of OPS
Flexible weight adjustment
Health checks and event notifications
Concurrency control (throttling slow servers)
Resource prioritisation (e.g., local‑datacenter preference)
Automatic priority management
Incremental hot‑updates
The component is open‑source at https://github.com/PhantomThief/simple-failover-java and has no third‑party dependencies.
Diverse Message Features
Delayed Messages
RocketMQ’s built‑in delayed messages support only a few fixed levels, so a separate Delay Server was built to schedule arbitrary delays. The delay logic is implemented by switching topics and storing the delayed payload back into RocketMQ, allowing existing send APIs to remain unchanged.
Transactional Messages
Since RocketMQ 4.3, transactional messages are supported, ensuring that local DB transactions and message sends succeed or fail together. Recommendations include using version 4.6.1 or later, allocating sufficient thread pools (transaction thread pool at least four times the send thread pool), enabling useReentrantLockWhenPutMessage, and extending the default transaction timeout from 6 seconds to about 1 minute.
Producers should also enable retryAnotherBrokerWhenNotStoreOK when multiple brokers with a slave replica are used.
Distributed Reconciliation Monitoring
A monitoring program creates a dedicated topic on each broker and periodically sends a small number of messages to verify successful delivery. The program records outcomes such as success, flush timeout, slave timeout, slave unavailable, and error codes, without affecting business logic. Consumers ACK messages via a TCP side‑channel, and the producer aggregates ACK statistics for distributed reconciliation.
The monitoring data can also be used for performance testing and fault‑injection drills, with optional sampling to limit memory pressure.
Performance Optimisation
Default broker parameters are not optimal for Kuaishou’s SSD, synchronous replication, and asynchronous flushing setup. The following key parameters are recommended:
Parameter
Default
Description
flushCommitLogTimed
False
Should be true for asynchronous flushing to avoid excessive disk writes.
deleteWhen
04
Time of day to delete expired files; keep low‑traffic periods for deletion.
sendMessageThreadPoolNums
1
Number of threads handling message production; 2‑4 is usually optimal.
useReentrantLockWhenPutMessage
False
Set to true under high load to avoid CPU waste from spin‑locks.
sendThreadPoolQueueCapacity
10000
Queue size for pending send tasks; increase for high TPS workloads.
brokerFastFailureEnable
True
Enables fast‑failure (rate‑limiting) on the broker side.
waitTimeMillsInSendQueue
200
Timeout for waiting in the send queue; increase to ~1000 ms for stability.
osPageCacheBusyTimeOutMills
1000
Page‑cache timeout; raise on machines with large memory.
Conclusion
By adopting a simple, near‑zero‑dependency deployment model, Kuaishou achieved low‑cost small‑cluster deployments, easy upgrades, unified SDK entry points, multi‑datacenter active‑active availability, and leveraged RocketMQ features such as transactional and delayed messages to meet diverse business needs. Automatic distributed reconciliation ensures correctness of each broker and SDK, while the shared performance‑tuning guidelines provide a solid baseline for future scaling.
Signed-in readers can open the original source through BestHub's protected redirect.
This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.
Big Data Technology & Architecture
Wang Zhiwu, a big data expert, dedicated to sharing big data technology.
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.
