How Meizu Scaled Real‑Time Push to 600 K Messages/min: Architecture, Pitfalls & Solutions
This article describes Meizu's real-time push system, which handles 25 million online users and 50 billion daily page views. It covers the four-layer backend architecture, the main challenges (power consumption, mobile network instability, and massive numbers of connections), and the monitoring and gray-release strategies used to keep the system reliable and fast.
System Introduction
The system serves roughly 25 million online users, processes about 50 billion page views per day, and can push up to 600 000 messages per minute (10 000 per second).
System Architecture Design
The architecture is logically divided into four layers. The bottom layer is the access layer for Meizu phones. The second layer is the message distribution service, which handles upstream routing and downstream delivery using a user routing table. The third layer manages subscription information. The fourth layer stores offline and subscription messages.
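The role of the user routing table in the distribution layer can be illustrated with a minimal sketch. The class name `Router`, its methods, and the node name `access-1` are hypothetical, and the in-memory `offline` dictionary only stands in for the storage layer; the source does not describe the real data structures.

```python
from dataclasses import dataclass, field


@dataclass
class Router:
    """Toy user routing table: user id -> access node holding the connection."""
    routes: dict = field(default_factory=dict)
    offline: dict = field(default_factory=dict)  # stand-in for the storage layer

    def connect(self, user_id: str, node: str) -> list:
        """Record where the user is connected and return queued offline messages."""
        self.routes[user_id] = node
        return self.offline.pop(user_id, [])

    def disconnect(self, user_id: str) -> None:
        """Forget the route when the long-lived connection drops."""
        self.routes.pop(user_id, None)

    def deliver(self, user_id: str, message: str):
        """Return the access node to send to, or None after queuing offline."""
        node = self.routes.get(user_id)
        if node is None:
            self.offline.setdefault(user_id, []).append(message)
        return node
```

In this sketch a message for a disconnected user is queued, and the queue is handed back when the user reconnects, which mirrors the split between the distribution and storage layers described above.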
Independent service clusters and demo environments allow separate deployment and scaling, and a push platform offers business‑facing APIs.
Pitfalls & Insights
Mobile power consumption: The two main concerns on the device are data traffic and battery drain. Traditional protocols such as XMPP and SIP are heavy and consume excessive bandwidth. Meizu created a lightweight protocol, IDG, that encodes and decodes roughly ten times faster and reduces traffic by 50-70%.
To lower battery usage, the client uses adaptive heartbeat intervals of 3-10 minutes, and the IDG protocol's smart heartbeat reduces power draw further.
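One common way to adapt a heartbeat interval is to lengthen it while heartbeats keep succeeding and shorten it after a failure. The sketch below assumes that policy; only the 3-10 minute bounds come from the text, while the 60-second step and the function name `next_interval` are illustrative assumptions, not the IDG protocol's actual algorithm.

```python
MIN_INTERVAL_S = 180   # 3 minutes, lower bound from the text
MAX_INTERVAL_S = 600   # 10 minutes, upper bound from the text
STEP_S = 60            # assumed adjustment step, not from the source


def next_interval(current_s: int, heartbeat_ok: bool) -> int:
    """Lengthen the interval after a successful heartbeat, shorten it after a failure."""
    if heartbeat_ok:
        return min(current_s + STEP_S, MAX_INTERVAL_S)
    return max(current_s - STEP_S, MIN_INTERVAL_S)
```

A longer interval wakes the radio less often, which saves power, but it risks the connection being dropped silently by intermediate network equipment; backing off after a failure is the hedge against that.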
Mobile network instability: Unstable connections cause duplicate messages, so Meizu introduced sequence-based messaging. To select the optimal server, the client uses an IP list that does not depend on DNS, and falls back to IPs pre-embedded in the client when DNS fails.
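Sequence-based deduplication can be sketched as follows, assuming each message carries a monotonically increasing sequence number and arrives in order on a given connection. The class `SequenceReceiver` and its fields are hypothetical; the source does not specify the wire format or the acknowledgement scheme.

```python
class SequenceReceiver:
    """Client-side receiver that drops messages it has already seen."""

    def __init__(self):
        self.last_seq = 0      # highest sequence number delivered so far
        self.delivered = []    # payloads handed to the application

    def on_message(self, seq: int, payload: str) -> int:
        """Deliver the payload only if seq is new; return the seq to acknowledge."""
        if seq > self.last_seq:
            self.delivered.append(payload)
            self.last_seq = seq
        return self.last_seq
```

If the server resends message 1 after a reconnect because it never saw the acknowledgement, the receiver acknowledges it again without delivering it twice.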
Massive connections: The target of 4 million concurrent long-lived connections per machine is met with a C++ implementation that uses multiple processes with epoll, memory pools, and TCMalloc. Kernel tuning (CPU affinity for NIC interrupts, and a TCP RTO raised to about 3 seconds) improves performance further.
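The readiness-driven loop at the heart of an epoll server can be shown in a few lines. The production system is written in C++; this Python sketch only illustrates the idea, using the standard `selectors` module (whose default selector is epoll-based on Linux). The function name `serve_once` and the echo behavior are placeholders for real push handling.

```python
import selectors
import socket


def serve_once(sel: selectors.BaseSelector, timeout: float = 1.0) -> int:
    """Run one pass of the event loop and return the number of sockets handled."""
    handled = 0
    for key, _events in sel.select(timeout):
        conn = key.fileobj
        data = conn.recv(4096)
        if data:
            conn.sendall(data)      # echo back, standing in for real push handling
        else:
            sel.unregister(conn)    # empty read: the peer closed the connection
            conn.close()
        handled += 1
    return handled
```

Because the loop only touches sockets that are ready, one process can hold a very large number of mostly idle connections; running several such processes, as the text describes, spreads the work across CPU cores.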
Load balancing: Instead of relying on a single LVS, Meizu has each client select a server IP based on latency probes, which distributes load across servers and mitigates cross-carrier latency.
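Client-side selection by latency probe might look like the sketch below. It assumes the probe is a timed TCP connect and that the client carries a fallback list of pre-embedded addresses; the function names `probe` and `pick_server` and the `probe_fn` parameter are hypothetical, and the source does not say how the real client measures latency.

```python
import socket
import time


def probe(host: str, port: int, timeout: float = 1.0):
    """Return the TCP connect time in seconds, or None if the server is unreachable."""
    start = time.monotonic()
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return time.monotonic() - start
    except OSError:
        return None


def pick_server(candidates, fallback, probe_fn=probe):
    """Pick the lowest-latency reachable (host, port); try the fallback list if none respond."""
    for group in (candidates, fallback):
        timed = [(probe_fn(host, port), (host, port)) for host, port in group]
        reachable = [(t, addr) for t, addr in timed if t is not None]
        if reachable:
            return min(reachable, key=lambda item: item[0])[1]
    return None
```

Since every client picks its own server, no central balancer sits in the data path, and a client on one carrier naturally gravitates toward servers that are fast from that carrier's network.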
System Monitoring and Gray Release
Strict monitoring covers error counts, inbound/outbound queue sizes, request rates, interface latency, and service availability, with alerts for abnormal metrics.
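A threshold check over metrics of this kind can be sketched as below. The metric names and limits in `THRESHOLDS` are made-up placeholders, and `check` is a hypothetical helper; the source does not give Meizu's actual metrics schema or alert thresholds.

```python
THRESHOLDS = {  # hypothetical limits, for illustration only
    "error_count": 100,
    "inbound_queue": 10_000,
    "outbound_queue": 10_000,
    "latency_ms": 200,
}


def check(metrics: dict, thresholds: dict = THRESHOLDS) -> list:
    """Return the sorted names of metrics whose value exceeds its threshold."""
    return sorted(
        name for name, limit in thresholds.items()
        if metrics.get(name, 0) > limit
    )
```

A scheduler would call `check` on each sample and raise an alert for every name returned; growing queue sizes in particular tend to signal trouble before error counts do.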
Gray release enables seamless, user-transparent deployments: a new version is released to one node and observed, and after validation it is rolled out gradually to more nodes. This reduces the need for night-time releases and improves stability.
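The release-observe-expand cycle can be sketched as a staged rollout. This version assumes batches that start at one node and double in size, and it takes `deploy` and `healthy` as caller-supplied callbacks; the batch schedule and all names are illustrative assumptions, not Meizu's tooling.

```python
def gray_release(nodes, deploy, healthy):
    """Deploy to one node, then to batches that double in size; stop on failure.

    Returns (released_nodes, ok).
    """
    released = []
    batch = 1
    i = 0
    while i < len(nodes):
        group = nodes[i:i + batch]
        for node in group:
            deploy(node)
        released.extend(group)
        if not all(healthy(node) for node in group):
            return released, False   # halt the rollout; rollback is out of scope here
        i += len(group)
        batch *= 2
    return released, True
```

With seven nodes the batches are 1, 2, and 4 nodes, so a bad build is caught while it affects only a small part of the fleet.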
21CTO
