How Meizu Scales Real‑Time Push to 6 Million Messages/min: Architecture, Pitfalls & Solutions
The article details Meizu’s real‑time push system that supports 25 million online users and 6 million messages per minute, describing its four‑layer architecture, power‑saving strategies, network‑instability fixes, massive‑connection handling, monitoring practices, and gray‑release deployment techniques.
This article is based on a talk by Meizu architect Yu Xiaobo at the Meizu Tech Open Day, sharing the real‑time message push architecture, challenges, and lessons learned.
System Overview
The system serves about 25 million online users, handles roughly 50 billion page views per day, and can push up to 6 million messages per minute.
The architecture is divided into four layers: device access; message distribution, which covers upstream routing and downstream delivery and maintains a user routing table; subscription information; and storage, which holds offline messages and subscription messages.
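As a rough illustration of how a user routing table can tie the distribution and storage layers together, here is a minimal sketch. The class name `Router`, its methods, and the in-memory dictionaries are hypothetical and are not taken from the talk; a production system would keep this state in a distributed store.

```python
class Router:
    """Toy user routing table: maps each online user to an access node."""

    def __init__(self):
        self.routes = {}   # user_id -> id of the access node holding the connection
        self.offline = {}  # user_id -> messages stored while the user is offline

    def connect(self, user_id, node_id):
        # Record where the user is connected and hand back any stored messages.
        self.routes[user_id] = node_id
        return self.offline.pop(user_id, [])

    def disconnect(self, user_id):
        self.routes.pop(user_id, None)

    def deliver(self, user_id, message):
        # Downstream delivery: look up the access node, or store the message offline.
        node_id = self.routes.get(user_id)
        if node_id is None:
            self.offline.setdefault(user_id, []).append(message)
        return node_id
```

A message for an offline user lands in offline storage and is returned when that user next connects.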
Pitfalls & Insights
Mobile power consumption
On a mobile device, push costs the user in two ways: data traffic and battery drain. Traditional protocols such as XMPP and SIP are heavyweight and waste bandwidth, so Meizu created a lightweight protocol, IDG, that reduces traffic by 50‑70% and saves battery.
To reduce battery drain further, the client uses an adaptive ("smart") heartbeat whose interval varies between 3 and 10 minutes. Together, the IDG protocol and smart heartbeats lower both traffic and power usage.
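The talk summary does not say how the interval is adapted, so the following is only a sketch of one common approach: lengthen the interval after each successful heartbeat and shorten it after a timeout. The 3 and 10 minute bounds come from the text; the step size, the halving rule, and the class name `AdaptiveHeartbeat` are assumptions.

```python
from dataclasses import dataclass

MIN_INTERVAL_S = 3 * 60   # lower bound from the article: 3 minutes
MAX_INTERVAL_S = 10 * 60  # upper bound from the article: 10 minutes
STEP_S = 60               # assumed step size; not stated in the source


@dataclass
class AdaptiveHeartbeat:
    """Probe for the longest heartbeat interval the network tolerates."""

    interval_s: int = MIN_INTERVAL_S

    def on_success(self) -> int:
        # The connection survived this interval, so try a longer one next time.
        self.interval_s = min(self.interval_s + STEP_S, MAX_INTERVAL_S)
        return self.interval_s

    def on_timeout(self) -> int:
        # The connection was lost: halve the interval (assumed policy),
        # never going below the minimum.
        self.interval_s = max(self.interval_s // 2, MIN_INTERVAL_S)
        return self.interval_s
```

Longer intervals mean fewer radio wake-ups, which is where the battery saving comes from; the timeout path keeps the connection from being dropped repeatedly on networks that expire idle connections quickly.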
For messages that are not latency‑critical (e.g., upgrades), delayed push is employed: the server holds the message and pushes it only when the device is awake, which it detects from heartbeat responses. This avoids waking the device just to deliver a low-priority message.
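A minimal server-side sketch of delayed push, assuming the server treats an incoming heartbeat as the signal that the device is awake. The class `DelayedPushQueue` and the `urgent` flag are hypothetical names used for illustration.

```python
from collections import defaultdict, deque


class DelayedPushQueue:
    """Hold non-urgent messages until the device shows it is awake."""

    def __init__(self):
        self._pending = defaultdict(deque)  # device_id -> queued messages

    def submit(self, device_id, message, urgent=False):
        """Return the messages that should be pushed right now."""
        if urgent:
            return [message]
        # Not latency-critical: park it until the next heartbeat.
        self._pending[device_id].append(message)
        return []

    def on_heartbeat(self, device_id):
        """The device is awake: flush everything queued for it, in order."""
        return list(self._pending.pop(device_id, deque()))
```

Urgent messages bypass the queue entirely, while deferred ones ride along with a wake-up the device was going to make anyway.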
Mobile network instability
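Unstable mobile networks cause duplicate messages. The solution is a sequence‑number based exchange: the server first sends the client a notification, and the client then requests messages, passing the latest sequence number it has received. The server returns only newer messages, so a repeated notification or a retry does not deliver the same message twice.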
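The notify-then-pull exchange can be sketched as follows. This is an in-process illustration with hypothetical `Mailbox` and `Client` classes; the actual wire format of the IDG protocol is not described in the source.

```python
class Mailbox:
    """Server side: per-user message log with strictly increasing sequence numbers."""

    def __init__(self):
        self._messages = []  # list of (seq, payload)
        self._next_seq = 1

    def append(self, payload):
        seq = self._next_seq
        self._messages.append((seq, payload))
        self._next_seq += 1
        return seq  # a real server would now send a small notification

    def fetch_after(self, last_seq):
        # Return only messages the client has not yet seen.
        return [(s, p) for s, p in self._messages if s > last_seq]


class Client:
    """Client side: remembers the latest sequence number it has received."""

    def __init__(self):
        self.last_seq = 0
        self.delivered = []

    def on_notify(self, mailbox):
        # Pull everything newer than last_seq; a duplicate notify fetches nothing.
        for seq, payload in mailbox.fetch_after(self.last_seq):
            self.delivered.append(payload)
            self.last_seq = seq
```

Because the client, not the server, drives delivery, a notification that arrives twice over a flaky link is harmless: the second pull simply returns an empty list.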
DNS issues are mitigated by embedding a list of IP addresses and falling back to direct IP connections when DNS fails.
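A client-side sketch of the DNS fallback, using Python's standard `socket.getaddrinfo`. The addresses in `FALLBACK_IPS` are placeholders from the documentation-reserved 203.0.113.0/24 range, and the injectable `resolver` parameter is an assumption added to make the function easy to test.

```python
import socket

# Placeholder addresses (reserved for documentation); a real client would
# ship the service's actual IP list.
FALLBACK_IPS = ["203.0.113.10", "203.0.113.11"]


def resolve_with_fallback(hostname, port, fallback_ips=FALLBACK_IPS,
                          resolver=socket.getaddrinfo):
    """Resolve hostname via DNS; on failure, return the embedded IP list."""
    try:
        infos = resolver(hostname, port, type=socket.SOCK_STREAM)
        # Each entry is (family, type, proto, canonname, sockaddr);
        # sockaddr[0] is the IP address. dict.fromkeys removes duplicates
        # while keeping order.
        addresses = list(dict.fromkeys(info[4][0] for info in infos))
        if addresses:
            return addresses
    except OSError:
        # socket.gaierror is a subclass of OSError: DNS failed or timed out.
        pass
    return list(fallback_ips)
```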
Massive connections
The design target is 4 million concurrent long‑lived connections per machine, built on C++, a multi‑process model, epoll, memory pools, and TCMalloc. Kernel tuning (binding NIC interrupts to separate CPUs and raising the TCP RTO to about 3 s) improves performance further.
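Meizu's server is written in C++, so the Python below is not their implementation. It only illustrates the readiness-based event loop that epoll enables, in which one process services many connections without a thread per connection. It uses the standard `selectors` module, which picks epoll on Linux; the function names are hypothetical.

```python
import selectors
import socket


def echo(conn):
    """Handler invoked when conn is readable: echo the bytes back."""
    data = conn.recv(4096)
    if data:
        conn.sendall(data)


def serve_once(selector, timeout=1.0):
    """Run one iteration of the event loop; return how many sockets were ready."""
    events = selector.select(timeout)
    for key, _mask in events:
        handler = key.data       # the callback stored at registration time
        handler(key.fileobj)     # only ready sockets are touched
    return len(events)
```

The cost of each loop iteration depends on the number of ready sockets rather than the total number registered, which is what makes very large idle connection counts feasible.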
Load balancing does not depend on a single LVS node. Instead, the client receives a sorted list of server IPs, probes several of them, and connects to the one that responds fastest. On the server side, nodes delay their responses once load crosses a threshold, so heavily loaded nodes are selected less often.
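The interplay between client-side probing and server-side delayed responses can be sketched as below. The thresholds, delay values, and function names are assumptions for illustration; the source gives no concrete numbers. The `probe` callable stands in for a real network round trip.

```python
def response_delay_ms(load, thresholds=((0.9, 200), (0.7, 50))):
    """Server side: the busier the node, the longer it waits before answering.

    thresholds is a sequence of (load_limit, delay_ms), highest limit first.
    The values here are assumed, not taken from the source.
    """
    for limit, delay in thresholds:
        if load >= limit:
            return delay
    return 0


def pick_server(candidates, probe, max_probes=3):
    """Client side: probe the first few IPs and return the fastest responder.

    probe(ip) returns the observed latency in ms, or None if unreachable.
    """
    timings = []
    for ip in candidates[:max_probes]:
        latency = probe(ip)
        if latency is not None:
            timings.append((latency, ip))
    if not timings:
        return None
    return min(timings)[1]
```

A real client would likely issue the probes concurrently; they are sequential here to keep the sketch short. Since the delay is folded into the measured latency, the client needs no explicit knowledge of server load.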
Monitoring & Gray‑Release
Monitoring
Each service node is monitored for error count, inbound/outbound queue depth, request rate, interface latency, and availability, with alerts to detect early signs of overload.
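A minimal sketch of a per-node check over the metrics listed above. Every limit here is an invented placeholder, since the source does not give concrete thresholds, and the metric names are hypothetical.

```python
# Assumed limits for illustration only; the source gives no concrete values.
UPPER_LIMITS = {
    "error_count": 100,
    "inbound_queue_depth": 10_000,
    "outbound_queue_depth": 10_000,
    "requests_per_s": 50_000,
    "latency_ms": 500,
}
MIN_AVAILABILITY = 0.999  # assumed lower bound


def check_node(metrics):
    """Return the names of the metrics that breach their limits."""
    alerts = [name for name, limit in UPPER_LIMITS.items()
              if metrics.get(name, 0) > limit]
    if metrics.get("availability", 1.0) < MIN_AVAILABILITY:
        alerts.append("availability")
    return alerts
```

Queue depth is often the earliest of these signals, because a queue starts to grow before latency or error counts visibly degrade.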
Gray release
Gray release enables user‑transparent deployments and smooth traffic migration; after a node passes a stability window, traffic is gradually expanded to more nodes.
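The staged expansion can be sketched as a loop over rollout percentages. The stage sizes, the `is_stable` callback, and the function name are assumptions; in practice the stability check would wait out the stability window and consult the monitoring metrics.

```python
def gray_release(nodes, is_stable, stages=(5, 25, 100)):
    """Upgrade nodes in growing batches; halt if any upgraded node is unstable.

    stages are cumulative percentages of the fleet (assumed values).
    Returns (upgraded_nodes, completed).
    """
    upgraded = []
    for pct in stages:
        # Ceiling division in integers, with at least one node per stage.
        target = max(1, (len(nodes) * pct + 99) // 100)
        upgraded.extend(nodes[len(upgraded):target])
        # Every upgraded node must pass the check before expanding further.
        if not all(is_stable(node) for node in upgraded):
            return upgraded, False
    return upgraded, True
```

Stopping at the first unstable stage limits the impact of a bad build to the nodes upgraded so far.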
