How Meizu Scaled Real‑Time Push to 600 K Messages/min: Architecture, Pitfalls & Solutions

This article describes Meizu's real-time push system, which serves 25 million online users and 50 billion page views per day. It covers the four-layer backend architecture; the challenges of power consumption, mobile network instability, and massive connection counts; and the monitoring and gray-release strategies used to keep the system reliable and fast.

21CTO

System Introduction

The system serves roughly 25 million online users, processes about 50 billion page views per day, and can push up to 600 000 messages per minute.

System Architecture Design

The architecture is logically divided into four layers. The bottom layer is the access layer, which Meizu phones connect to. The second layer is the message distribution service, which handles upstream routing and downstream delivery using a user routing table. The third layer manages subscription information. The fourth layer stores offline messages and subscription messages.
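As a rough illustration of the second and fourth layers working together, the sketch below models a user routing table and the fallback to offline storage. The class name `RoutingTable`, the `deliver` helper, and the node names are hypothetical; the article does not describe the actual data structures.

```python
from typing import Dict, List, Optional


class RoutingTable:
    """Maps a user id to the access node holding that user's connection (hypothetical)."""

    def __init__(self) -> None:
        self._routes: Dict[str, str] = {}

    def register(self, user_id: str, node: str) -> None:
        # Called when a phone connects (or reconnects) to an access node.
        self._routes[user_id] = node

    def unregister(self, user_id: str, node: str) -> None:
        # Remove the route only if it still points at this node, so a stale
        # disconnect cannot erase a newer connection on another node.
        if self._routes.get(user_id) == node:
            del self._routes[user_id]

    def lookup(self, user_id: str) -> Optional[str]:
        return self._routes.get(user_id)


def deliver(table: RoutingTable, offline_store: Dict[str, List[str]],
            user_id: str, message: str) -> str:
    """Route a downstream message, or park it in offline storage."""
    node = table.lookup(user_id)
    if node is None:
        offline_store.setdefault(user_id, []).append(message)
        return "stored_offline"
    return "forward_to:" + node
```

A real deployment would share this table across the distribution cluster; the in-memory dictionary here only shows the lookup-then-fallback flow.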

Each service runs as an independent cluster with its own demo environment, so services can be deployed and scaled separately. On top of these, a push platform offers business-facing APIs.

Pitfalls & Insights

Mobile power consumption: The two main costs on the device are data traffic and battery drain. Traditional protocols such as XMPP and SIP are heavyweight and consume excessive bandwidth, so Meizu created a lightweight protocol, IDG, which encodes and decodes roughly ten times faster and reduces traffic by 50-70%.
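The article does not publish the IDG wire format, so the following is only a sketch of why a compact binary frame tends to be smaller than a text stanza. The 8-byte header layout (version, type, sequence, payload length) and the XML stanza used for comparison are assumptions made up for illustration.

```python
import struct

# Hypothetical header: version (1 byte), message type (1 byte),
# sequence number (4 bytes), payload length (2 bytes), network byte order.
HEADER = struct.Struct("!BBIH")


def encode_frame(msg_type: int, seq: int, payload: bytes, version: int = 1) -> bytes:
    if len(payload) > 0xFFFF:
        raise ValueError("payload too large for a 2-byte length field")
    return HEADER.pack(version, msg_type, seq, len(payload)) + payload


def decode_frame(frame: bytes):
    version, msg_type, seq, length = HEADER.unpack_from(frame)
    payload = frame[HEADER.size:HEADER.size + length]
    if len(payload) != length:
        raise ValueError("truncated frame")
    return version, msg_type, seq, payload


payload = b"new firmware available"
binary_frame = encode_frame(msg_type=2, seq=1001, payload=payload)
# An XMPP-like text stanza carrying the same payload, for size comparison only.
text_stanza = b"<message id='1001' type='chat'><body>" + payload + b"</body></message>"
print(len(binary_frame), len(text_stanza))
```

The fixed header also makes decoding a single `unpack` call rather than an XML parse, which is the kind of saving the article attributes to IDG.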

To lower battery usage, the client uses an adaptive heartbeat whose interval varies between 3 and 10 minutes; the IDG protocol's smart heartbeat reduces power draw further.
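One plausible way to adapt the interval within the 3 to 10 minute range is to lengthen it after a run of acknowledged heartbeats and shorten it after a timeout. The sketch below follows that idea; the step size, the success threshold, and the class name are assumptions, not Meizu's published algorithm.

```python
MIN_INTERVAL_S = 3 * 60    # lower bound from the article: 3 minutes
MAX_INTERVAL_S = 10 * 60   # upper bound from the article: 10 minutes


class AdaptiveHeartbeat:
    """Grow the interval while the link is healthy, shrink it on timeouts."""

    def __init__(self, step_s: int = 60, successes_to_grow: int = 3) -> None:
        self.interval_s = MIN_INTERVAL_S
        self.step_s = step_s                        # assumed step size
        self.successes_to_grow = successes_to_grow  # assumed threshold
        self._streak = 0

    def on_ack(self) -> int:
        # A heartbeat was acknowledged; after enough in a row, wait longer.
        self._streak += 1
        if self._streak >= self.successes_to_grow:
            self.interval_s = min(MAX_INTERVAL_S, self.interval_s + self.step_s)
            self._streak = 0
        return self.interval_s

    def on_timeout(self) -> int:
        # The heartbeat was lost (for example, a NAT mapping expired); back off.
        self._streak = 0
        self.interval_s = max(MIN_INTERVAL_S, self.interval_s - self.step_s)
        return self.interval_s
```

Longer intervals mean fewer radio wake-ups, which is where the battery saving comes from; the lower bound keeps the connection alive on networks that drop idle mappings quickly.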

Mobile network instability: Unstable connections lead to duplicate messages, so Meizu introduced sequence-based messaging. To reduce the dependence on DNS, clients select the optimal server from an IP list and fall back to pre-embedded IPs when DNS fails.
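A minimal sketch of sequence-based deduplication, assuming each message carries a monotonically increasing sequence number and the client reports the last one it processed when it reconnects. The names `SequenceReceiver` and `messages_to_resend` are hypothetical.

```python
from typing import List, Tuple


class SequenceReceiver:
    """Client side: drop any message whose sequence number was already seen."""

    def __init__(self) -> None:
        self.last_seq = 0
        self.delivered: List[str] = []

    def on_message(self, seq: int, payload: str) -> bool:
        if seq <= self.last_seq:
            return False  # duplicate caused by a retransmit after reconnect
        self.last_seq = seq
        self.delivered.append(payload)
        return True


def messages_to_resend(outbox: List[Tuple[int, str]], acked_seq: int) -> List[Tuple[int, str]]:
    """Server side: after a reconnect, resend only what the client has not acknowledged."""
    return [(seq, payload) for seq, payload in outbox if seq > acked_seq]
```

This sketch assumes in-order delivery on a single connection, so it does not handle gaps; a production design would also need to decide what happens when a sequence number is skipped.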

Massive connections: The goal is 4 million concurrent long-lived connections per machine. Meizu achieves this with a C++ implementation built on a multi-process model with epoll, memory pools, and TCMalloc. Kernel tuning, such as binding NIC interrupts to specific CPUs and raising the TCP RTO to about 3 seconds, improves performance further.
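Meizu's server is written in C++, so the following Python sketch only illustrates the readiness-based event loop that epoll enables: one loop watches many sockets and touches only those with pending data. It uses the standard `selectors` module, which picks epoll on Linux; the helper names are hypothetical.

```python
import selectors
import socket
from typing import List, Tuple


def watch(sel: selectors.BaseSelector, conn: socket.socket) -> None:
    """Register a connection for read-readiness notifications."""
    conn.setblocking(False)
    sel.register(conn, selectors.EVENT_READ)


def poll_once(sel: selectors.BaseSelector, timeout: float = 1.0) -> List[Tuple[socket.socket, bytes]]:
    """Handle every socket that is ready right now; idle sockets cost nothing."""
    received = []
    for key, _events in sel.select(timeout):
        conn = key.fileobj
        try:
            data = conn.recv(4096)
        except OSError:
            data = b""
        if data:
            received.append((conn, data))
        else:
            # Empty read or error: the peer is gone, so stop watching it.
            sel.unregister(conn)
            conn.close()
    return received


sel = selectors.DefaultSelector()  # epoll on Linux, kqueue on BSD/macOS
```

In a multi-process design, each worker process would run its own loop like this over its share of the connections.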

Load balancing: Instead of relying on a single LVS, Meizu has the client choose a server IP based on latency probes. This distributes load across servers and mitigates cross-carrier latency.
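A sketch of client-side selection by latency probe, assuming the probe is simply the time taken to open a TCP connection. The function names, the port, and the addresses (taken from the documentation-only ranges) are illustrative; the article does not say how Meizu measures latency.

```python
import socket
import time
from typing import Callable, Iterable, Optional


def tcp_connect_latency(ip: str, port: int, timeout: float = 2.0) -> Optional[float]:
    """Return the seconds needed to open a TCP connection, or None on failure."""
    start = time.monotonic()
    try:
        with socket.create_connection((ip, port), timeout=timeout):
            return time.monotonic() - start
    except OSError:
        return None


def pick_server(candidates: Iterable[str], port: int = 443,
                probe: Callable[[str, int], Optional[float]] = tcp_connect_latency) -> Optional[str]:
    """Probe every candidate and return the reachable IP with the lowest latency."""
    best_ip, best_latency = None, None
    for ip in candidates:
        latency = probe(ip, port)
        if latency is None:
            continue  # unreachable from this network, skip it
        if best_latency is None or latency < best_latency:
            best_ip, best_latency = ip, latency
    return best_ip


# Usage with a stubbed probe, so the example runs without network access.
fake_latencies = {"192.0.2.10": 0.120, "198.51.100.7": 0.035, "203.0.113.5": None}
chosen = pick_server(fake_latencies, probe=lambda ip, port: fake_latencies[ip])
```

Because each client measures from its own network, a user on one carrier naturally lands on a server that is close to that carrier, which is how this approach reduces cross-carrier latency.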

System Monitoring and Gray Release

Strict monitoring covers error counts, inbound/outbound queue sizes, request rates, interface latency, and service availability, with alerts for abnormal metrics.
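The kind of threshold check behind such alerts can be sketched as follows. The metric names and limits are invented for illustration, and a missing metric is treated as an alert on the assumption that silence from a service is itself abnormal.

```python
from typing import Dict, List, Tuple

# A rule is ("max", limit) for metrics that must stay below a ceiling,
# or ("min", limit) for metrics that must stay above a floor.
Rule = Tuple[str, float]


def check_metrics(metrics: Dict[str, float], rules: Dict[str, Rule]) -> List[str]:
    """Return one alert string per metric that is missing or out of bounds."""
    alerts = []
    for name, (kind, limit) in rules.items():
        value = metrics.get(name)
        if value is None:
            alerts.append(f"{name}: no data reported")
        elif kind == "max" and value > limit:
            alerts.append(f"{name}: {value} is above {limit}")
        elif kind == "min" and value < limit:
            alerts.append(f"{name}: {value} is below {limit}")
    return alerts


# Hypothetical rules covering the metric families the article lists.
rules = {
    "error_count": ("max", 100),
    "outbound_queue_size": ("max", 10000),
    "interface_latency_ms": ("max", 200),
    "availability": ("min", 0.999),
}
```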

Gray release enables seamless, user-transparent deployments: a new version is released to a single node, observed, and then gradually rolled out to more nodes once it has been validated. This reduces the need for night-time releases and improves stability.
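The release-observe-expand cycle can be sketched as a staged rollout that starts with one node and stops at the first unhealthy batch. The cumulative fractions and the `deploy` and `is_healthy` callbacks are assumptions; the article does not give batch sizes or validation criteria.

```python
import math
from typing import Callable, List, Sequence, Tuple


def rollout_batches(nodes: Sequence[str],
                    fractions: Sequence[float] = (0.1, 0.5, 1.0)) -> List[List[str]]:
    """Split nodes into a single-node first batch followed by growing batches.

    Each fraction is the cumulative share of nodes released after that stage.
    """
    if not nodes:
        return []
    batches = [list(nodes[:1])]  # one node first, as the article describes
    done = 1
    for fraction in fractions:
        target = max(done, math.ceil(len(nodes) * fraction))
        if target > done:
            batches.append(list(nodes[done:target]))
            done = target
    return batches


def gray_release(nodes: Sequence[str],
                 deploy: Callable[[str], None],
                 is_healthy: Callable[[str], bool]) -> Tuple[bool, List[str]]:
    """Deploy batch by batch; halt as soon as a batch fails validation."""
    released: List[str] = []
    for batch in rollout_batches(nodes):
        for node in batch:
            deploy(node)
            released.append(node)
        if not all(is_healthy(node) for node in batch):
            return False, released  # stop here; remaining nodes keep the old version
    return True, released
```

With ten nodes and the default fractions, the batches contain one, four, and five nodes, so a bad build is caught while most of the fleet still runs the previous version.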

Tags: Monitoring, backend architecture, gray-release, real-time push, Large Scale Messaging