How Yanxuan’s Unified Message Center Scales with RocketMQ, Kafka, and K8s
This article details Yanxuan's evolution from a chaotic, multi‑queue setup to a unified, cloud‑native message center built on RocketMQ and Kafka, describing current services, scheduling mechanisms, publish‑subscribe implementation, and future plans for platformization and Kubernetes‑based resource management.
Background
Yanxuan's message center, a core foundational service, has evolved continuously. As user and business volume grew, services moved from monolithic Tomcat clusters to distributed architectures and then to a large-scale cloud deployment built on the Qingzhou platform, with inter-service communication increasingly relying on message queues.
Initially, no dedicated team maintained the message‑queue services, leading to chaotic usage: teams independently built and managed RabbitMQ, Kafka, or Redis list queues, handling production, consumption, retries, and monitoring themselves. This caused resource waste, management confusion, and maintenance difficulties. As traffic increased, RabbitMQ could no longer handle the load, prompting a gradual shift to RocketMQ, though the overall disorder persisted.
To address these issues, the team created a unified message center service. It wraps existing queues, satisfies diverse asynchronous messaging needs, and centralizes middleware management around RocketMQ and Kafka, offering unified monitoring and management. Business teams can either use the encapsulated message system or directly access the underlying middleware.
Current Status
[Figure: current service dependency diagram]
The message center now provides instant messaging, scheduled messaging, and publish‑subscribe capabilities. Businesses request access via the development workbench, but management and control granularity remains coarse, prompting a future consolidation into a unified access interface.
1. Instant Push Service (logistics‑uas)
logistics‑uas is the unified asynchronous request service, offering email, SMS, app push, and rate‑limiting for specific message gateways.
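Gateway rate limiting of this kind is commonly implemented as a token bucket. The sketch below is a minimal illustration under that assumption, not logistics‑uas code; the class name, rate, and capacity are made up.

```python
import time

class TokenBucket:
    """Token-bucket limiter: refill `rate` tokens per second, up to `capacity`."""
    def __init__(self, rate: float, capacity: int):
        self.rate = rate
        self.capacity = capacity
        self.tokens = float(capacity)
        self.last = time.monotonic()

    def allow(self) -> bool:
        now = time.monotonic()
        # Refill proportionally to elapsed time, capped at capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False

# Example: a hypothetical SMS gateway allowed 2 requests/sec with a burst of 5.
bucket = TokenBucket(rate=2.0, capacity=5)
results = [bucket.allow() for _ in range(7)]  # the burst exceeds capacity
```

A per-gateway bucket like this lets the service absorb short bursts while keeping the sustained rate below what the downstream gateway tolerates.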
2. Heterogeneous Scheduling Service (yanxuan‑mhs)
yanxuan‑mhs uses MongoDB to temporarily store scheduled messages. When a business pushes a message with a scheduled time, it persists the message, registers a task with logistics‑dsc, and later logistics‑dsc triggers yanxuan‑mhs to retrieve the message and push it via logistics‑uas.
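The store-then-trigger flow can be sketched with an in-memory priority queue standing in for the MongoDB store; class and field names here are hypothetical, not yanxuan‑mhs internals.

```python
import heapq
import itertools

class ScheduledStore:
    """In-memory stand-in for the MongoDB store of scheduled messages."""
    def __init__(self):
        self._heap = []                 # (due_time, seq, message), ordered by due time
        self._seq = itertools.count()   # tie-breaker so dicts are never compared

    def persist(self, message: dict, due_time: float) -> None:
        # Step 1: persist the message together with its scheduled delivery time.
        heapq.heappush(self._heap, (due_time, next(self._seq), message))

    def due_messages(self, now: float) -> list:
        # Step 2: when the scheduler (logistics-dsc) fires, pull everything due
        # so it can be pushed out via logistics-uas.
        due = []
        while self._heap and self._heap[0][0] <= now:
            due.append(heapq.heappop(self._heap)[2])
        return due

store = ScheduledStore()
store.persist({"to": "user-1", "body": "order shipped"}, due_time=100.0)
store.persist({"to": "user-2", "body": "coupon expiring"}, due_time=200.0)
ready = store.due_messages(now=150.0)  # only the first message is due yet
```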
3. Scheduling Backend (logistics‑dsc)
logistics‑dsc relies on ZooKeeper and MySQL for storage; ZooKeeper also serves as a distributed coordinator and lock manager. The scheduling framework is Quartz.
System startup: each scheduler registers itself in ZooKeeper, sets listeners for node changes, and loads existing tasks into Quartz.
Task submission: incoming tasks are written to ZooKeeper and backed up in MySQL; all nodes listen for new tasks and load them into their local Quartz instances.
Task execution: when the scheduled time arrives, tasks are triggered. For tasks configured to run on a single node, schedulers compete for a distributed lock so that only the winner executes the callback, ensuring each trigger runs exactly once.
Task allocation: tasks follow a primary‑backup model (one primary, two backups). Load is balanced by weight, and ZooKeeper re‑allocates roles when nodes become unavailable.
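The single-execution step above can be illustrated with a single-process simulation: a local lock stands in for the ZooKeeper distributed lock and a set stands in for the task-state record, so only one competing node runs the callback. All names are illustrative.

```python
import threading

zk_lock = threading.Lock()   # stand-in for the ZooKeeper distributed lock
completed = set()            # stand-in for the persisted task-state record
executions = []

def trigger(node_id: str, task_id: str) -> None:
    """Every scheduler node fires at the scheduled time; the lock plus the
    completion record guarantee the callback runs exactly once per trigger."""
    with zk_lock:
        if task_id in completed:
            return                              # another node already ran it
        executions.append((task_id, node_id))   # the actual task callback
        completed.add(task_id)

# Three nodes race to execute the same trigger.
threads = [threading.Thread(target=trigger, args=(f"node-{i}", "task-42"))
           for i in range(3)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

In the real system the "completed" check is what makes losing nodes safe: a node that acquires the lock late sees the task already executed and skips the callback.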
4. Publish‑Subscribe Service (yanxuan‑mps)
Businesses apply for a topic; after approval, the system creates records in the database and Kafka cluster. Publishers send messages via the mps API; the system wraps messages with metadata and writes them to Kafka. Consumers poll messages, and a push component forwards them to subscribed URLs. Failed pushes are retried via a dedicated retry queue.
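The push-and-retry path might look like the following in-process sketch. `deliver` stands in for the HTTP push to a subscriber URL, and the deque stands in for the dedicated retry queue; Kafka itself and the real endpoints are not modeled, and the URL and limit values are made up.

```python
from collections import deque

MAX_ATTEMPTS = 3  # illustrative cap on delivery attempts per subscriber

def deliver(url: str, message: dict) -> bool:
    """Stand-in for the HTTP push; pretend 'flaky' endpoints are down."""
    return not url.startswith("https://flaky")

def push_with_retry(subscribers, message):
    retry_queue = deque()          # the dedicated retry queue
    delivered, dead = [], []
    for url in subscribers:
        if deliver(url, message):
            delivered.append(url)
        else:
            retry_queue.append((url, 1))        # one failed attempt so far
    while retry_queue:
        url, attempts = retry_queue.popleft()
        if deliver(url, message):
            delivered.append(url)
        else:
            attempts += 1
            if attempts < MAX_ATTEMPTS:
                retry_queue.append((url, attempts))
            else:
                dead.append(url)   # exhausted; would be surfaced to the owner

    return delivered, dead

ok, failed = push_with_retry(
    ["https://shop.example/hook", "https://flaky.example/hook"],
    {"event": "order.created"},
)
```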
Future Planning for a Unified Message System
The current system’s granularity is coarse; future work will aggregate services behind producer and consumer proxy layers, integrate with CMDB, and provide unified rate limiting, authentication, compression, encryption, orchestration, and routing.
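One piece of the planned producer/consumer proxy layers, metadata wrapping plus compression, can be sketched as follows. The function names and envelope fields are illustrative assumptions, not the actual design.

```python
import gzip
import json

def wrap(topic: str, payload: dict) -> bytes:
    """Producer-proxy sketch: attach metadata, serialize, and compress
    before handing the message to the underlying middleware."""
    envelope = {
        "topic": topic,
        "meta": {"schema": 1, "producer": "demo-app"},  # hypothetical fields
        "payload": payload,
    }
    return gzip.compress(json.dumps(envelope).encode("utf-8"))

def unwrap(raw: bytes) -> dict:
    """Consumer-proxy counterpart: decompress and deserialize."""
    return json.loads(gzip.decompress(raw).decode("utf-8"))

raw = wrap("order-events", {"orderId": 42})
msg = unwrap(raw)
```

Centralizing steps like these in proxies is what lets rate limiting, authentication, and encryption be added uniformly without touching each business service.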
Since the message center predates the CMDB, each sub‑service has its own authentication and owner management. The upcoming redesign will centralize these responsibilities.
Containerization and Platformization of Middleware
Currently, RocketMQ clusters run on VMs or physical machines, which limits scalability. The plan is to build a PaaS platform on Kubernetes, leveraging its dynamic scaling to manage resources and offer RocketMQ as a service to the entire group.
1. Current Situation
RocketMQ is directly exposed to businesses but lacks platform‑level monitoring and scalability.
A joint project with Hangzhou Research and NetEase Music aims to deliver a Kubernetes‑based RocketMQ PaaS in 2020.
2. Planned RocketMQ on K8s
2.1 Resource‑Level Management via K8s
Dynamic creation/deletion of NameServer clusters.
Dynamic creation/deletion of Broker clusters.
High‑availability maintenance for Broker clusters.
Configuration changes through controlled stop‑write‑modify‑restart cycles.
After the PaaS is built, topics become custom resources with full lifecycle management (create, modify, delete).
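Managing topics as custom resources implies a manifest along these lines; the API group, kind, and spec fields below are hypothetical, sketched only to show the shape such a resource might take, not the project's actual schema.

```yaml
# Hypothetical custom resource for a RocketMQ topic under the planned PaaS.
apiVersion: mq.yanxuan.example/v1alpha1
kind: RocketMQTopic
metadata:
  name: order-events
  namespace: tenant-trade
spec:
  brokerCluster: broker-cluster-a   # which Broker cluster hosts the topic
  queueNum: 8                       # number of message queues
  perm: read-write                  # topic permission
```

With topics declared this way, create/modify/delete become ordinary Kubernetes operations handled by a controller rather than manual broker administration.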
3. Permission Isolation
Support admin and tenant roles, enforcing resource‑level access control through Kubernetes RBAC on the custom resources.
Admins see all instances; tenants see only their own. Admins can directly create, delete, modify resources; tenants submit requests for approval.
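The tenant-side restriction could be expressed with a namespaced Role like the one below; the API group and resource name are illustrative placeholders matching no real schema, and the admin's cluster-wide binding is omitted.

```yaml
# Illustrative Kubernetes RBAC: a tenant may only view topic resources
# in its own namespace; mutations go through an approval workflow instead.
apiVersion: rbac.authorization.k8s.io/v1
kind: Role
metadata:
  name: tenant-topic-viewer
  namespace: tenant-trade
rules:
  - apiGroups: ["mq.yanxuan.example"]
    resources: ["rocketmqtopics"]
    verbs: ["get", "list", "watch"]
```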
Conclusion
Yanxuan's message center has continuously supported business systems and is evolving toward a more platform‑oriented, cloud‑native architecture. Leveraging Kubernetes and PaaS concepts, the message ecosystem aims to become more systematic, scalable, and secure.
Yanxuan Tech Team
NetEase Yanxuan Tech Team shares e-commerce tech insights from NetEase Yanxuan's technology and product teams.
