How Meizu Scales Real‑Time Push to 6 Million Messages per Minute: Architecture, Pitfalls & Solutions
The article describes Meizu's large‑scale real‑time push system, which handles 25 million online users and 6 million messages per minute. It explains the four‑layer architecture and how the team tackled phone power consumption, mobile network instability, massive connection counts, monitoring, and gray‑release deployment.
This article is based on a talk by Meizu architect Yu Xiaobo at the Meizu Technology Open Day, sharing the real‑time push architecture, pitfalls, and lessons.
System Overview
The system serves about 25 million online users, generates roughly 50 billion page views per day, and can push up to 6 million messages per minute.
System architecture
Logically, the system is divided into four layers. The bottom layer provides device access for Meizu phones. The second layer is the message distribution service, which handles upstream message routing and maintains the downstream routing table for users. The third layer manages subscription information. The fourth layer stores offline messages and subscription messages.
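As a rough illustration of the downstream routing table in the second layer, the sketch below maps a device to the access node that currently holds its connection. The class, method, and parameter names (`RoutingTable`, `device_id`, `node`) are hypothetical and not taken from the talk.

```python
from typing import Dict, Optional


class RoutingTable:
    """Hypothetical downstream routing table: device id -> access node."""

    def __init__(self) -> None:
        self._routes: Dict[str, str] = {}

    def on_connect(self, device_id: str, node: str) -> None:
        # A reconnect simply overwrites the previous node.
        self._routes[device_id] = node

    def on_disconnect(self, device_id: str, node: str) -> None:
        # Only drop the route if it still points at the node reporting the
        # disconnect; the device may already have reconnected elsewhere.
        if self._routes.get(device_id) == node:
            del self._routes[device_id]

    def lookup(self, device_id: str) -> Optional[str]:
        # None means the device has no live connection; in that case the
        # message would presumably be kept by the offline storage layer.
        return self._routes.get(device_id)
```

The guard in `on_disconnect` is a design choice of this sketch: on unstable mobile networks a late disconnect event from an old node should not erase the route to the new one.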
Pitfalls & Insights
Phone power consumption
The power cost on the phone has two components: network traffic and battery drain. Traditional protocols such as XMPP and SIP are heavyweight, so Meizu designed its own lightweight IDG protocol, which reduces traffic by 50‑70 % and makes encoding and decoding about ten times faster.
To lower battery usage, the client keeps the connection alive with periodic heartbeat packets sent every 3‑10 minutes, with the interval chosen by an intelligent heartbeat mechanism rather than fixed.
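The article does not say how the intelligent heartbeat chooses its interval. One common approach, shown here purely as an assumption, is to lengthen the interval while heartbeats keep succeeding and shorten it after a failure, staying within the 3‑10 minute range. The one-minute step and the function name `next_interval` are hypothetical.

```python
MIN_INTERVAL_S = 3 * 60    # lower bound mentioned in the article
MAX_INTERVAL_S = 10 * 60   # upper bound mentioned in the article
STEP_S = 60                # assumed adjustment step, not from the article


def next_interval(current_s: int, heartbeat_ok: bool) -> int:
    """Return the next heartbeat interval in seconds."""
    if heartbeat_ok:
        # The connection survived this interval, so try a longer one.
        return min(current_s + STEP_S, MAX_INTERVAL_S)
    # The connection was lost (for example a NAT timeout), so back off
    # to a shorter interval.
    return max(current_s - STEP_S, MIN_INTERVAL_S)
```

A longer interval means fewer radio wake-ups, which is where the battery saving would come from.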
They also implement delayed push for non‑time‑critical messages, sending them only when the phone is awake, which further saves power.
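A minimal sketch of delayed push, assuming the server knows whether the phone is awake (for example because the client reports it) and that each message carries an urgency flag. `DelayedPushQueue`, `urgent`, and `phone_awake` are hypothetical names introduced for illustration.

```python
from collections import deque
from typing import Deque, List


class DelayedPushQueue:
    """Holds non-urgent messages until the phone is awake."""

    def __init__(self) -> None:
        self._pending: Deque[str] = deque()

    def push(self, message: str, urgent: bool, phone_awake: bool) -> List[str]:
        """Return the messages that should be delivered right now."""
        if urgent or phone_awake:
            # The radio is active (or about to be woken) anyway, so deliver
            # everything that was held back together with this message.
            return self.flush() + [message]
        self._pending.append(message)
        return []

    def flush(self) -> List[str]:
        """Drain and return all held messages in arrival order."""
        out = list(self._pending)
        self._pending.clear()
        return out
```

Piggybacking held messages on an urgent one is a choice made in this sketch; the article only states that non-time-critical messages wait until the phone is awake.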
Mobile network issues
Unstable mobile networks cause duplicate messages. The team solves this with a sequence‑number‑based exchange: the server first notifies the client that new messages exist, and the client then requests them, sending the latest sequence number it has received so that only newer messages are returned.
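The notify-then-pull exchange can be sketched as follows. This is an in-memory illustration under assumed names (`MessageStore`, `Client`, `fetch_after`), not Meizu's actual protocol; it only shows why a repeated notification does not produce a duplicate message.

```python
from typing import Dict, List, Tuple


class MessageStore:
    """Hypothetical server-side store with per-user sequence numbers."""

    def __init__(self) -> None:
        self._log: Dict[str, List[Tuple[int, str]]] = {}

    def append(self, user: str, body: str) -> int:
        log = self._log.setdefault(user, [])
        seq = log[-1][0] + 1 if log else 1
        log.append((seq, body))
        return seq

    def fetch_after(self, user: str, last_seq: int) -> List[Tuple[int, str]]:
        # Return only messages newer than what the client already has.
        return [(s, b) for s, b in self._log.get(user, []) if s > last_seq]


class Client:
    def __init__(self, user: str) -> None:
        self.user = user
        self.last_seq = 0
        self.inbox: List[str] = []

    def on_notify(self, store: MessageStore) -> None:
        # The notification carries no payload; the client pulls by sequence.
        for seq, body in store.fetch_after(self.user, self.last_seq):
            self.inbox.append(body)
            self.last_seq = seq
```

If a notification is retransmitted after a network glitch, the second pull asks for messages after the same sequence number and gets nothing back.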
DNS problems are mitigated by embedding a list of IPs and falling back to direct IP connections when DNS fails.
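A minimal sketch of the DNS fallback, assuming the client ships with a built-in address list. The addresses below are placeholders from the documentation range 192.0.2.0/24, and `resolve_push_hosts` and its `resolver` parameter are hypothetical names.

```python
import socket
from typing import Callable, List, Sequence

# Placeholder addresses (documentation range), not real push servers.
EMBEDDED_IPS = ("192.0.2.10", "192.0.2.11")


def system_resolver(hostname: str) -> List[str]:
    # gethostbyname_ex returns (hostname, aliases, ip_addresses).
    return socket.gethostbyname_ex(hostname)[2]


def resolve_push_hosts(
    hostname: str,
    resolver: Callable[[str], List[str]] = system_resolver,
    fallback: Sequence[str] = EMBEDDED_IPS,
) -> List[str]:
    """Resolve via DNS, falling back to the embedded IP list on failure."""
    try:
        ips = resolver(hostname)
        if ips:
            return ips
    except OSError:
        # socket.gaierror and socket.herror are subclasses of OSError.
        pass
    return list(fallback)
```

The resolver is injectable so the fallback path can be exercised without a real DNS outage.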
Massive connections
Targeting 4 million long‑lived connections per machine, the team uses C++ with a multi‑process + epoll model, memory pools (tcmalloc), and kernel parameter tuning, and binds NIC interrupts to specific CPUs to balance the interrupt load. They also increase the TCP RTO to about 3 seconds.
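The production system is written in C++ and its code is not shown in the article. As a language-neutral illustration of the epoll idea, the Python sketch below uses the standard `selectors` module (which uses epoll on Linux) to let one loop serve many connections, touching only the sockets that are ready. The function name `poll_once` is hypothetical.

```python
import selectors
import socket
from typing import Dict


def poll_once(sel: selectors.BaseSelector, timeout: float = 1.0) -> Dict[int, bytes]:
    """Run one event-loop iteration; return the data read, keyed by fd."""
    received: Dict[int, bytes] = {}
    for key, _events in sel.select(timeout):
        conn = key.fileobj
        data = conn.recv(4096)
        if data:
            received[key.fd] = data
        else:
            # An empty read means the peer closed the connection.
            sel.unregister(conn)
            conn.close()
    return received


if __name__ == "__main__":
    sel = selectors.DefaultSelector()
    client, server_side = socket.socketpair()
    server_side.setblocking(False)
    sel.register(server_side, selectors.EVENT_READ)
    client.sendall(b"heartbeat")
    print(poll_once(sel))
```

Idle connections cost no CPU in this model, which is what makes millions of mostly quiet long-lived connections per machine plausible.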
Load balancing does not rely on LVS. Instead, clients receive a sorted IP list, probe several servers, and connect to the one that responds fastest; each server delays its response according to its current load, so busier servers are less likely to be chosen.
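The probing scheme can be sketched in two small functions. The linear delay formula, the 200 ms ceiling, and the names `response_delay_ms` and `pick_server` are assumptions made for illustration; the article does not give the actual delay policy.

```python
from typing import Dict


def response_delay_ms(active_connections: int, capacity: int,
                      max_delay_ms: int = 200) -> float:
    """Server side: delay the probe reply in proportion to current load."""
    load = min(active_connections / capacity, 1.0)
    return load * max_delay_ms


def pick_server(probe_rtt_ms: Dict[str, float]) -> str:
    """Client side: choose the server whose probe reply arrived first."""
    return min(probe_rtt_ms, key=probe_rtt_ms.get)
```

For example, with equal network latency, a server holding 3.5 million of its 4 million connections answers later than one holding 1 million, so the client picks the lighter one.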
Monitoring and Gray Release
System monitoring
Each micro‑service is monitored with metrics such as error count, inbound/outbound queue length, request rate, interface latency, and service availability, with alerts to detect potential failures early.
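A minimal sketch of threshold-based alerting over the metrics listed above. The field names and every threshold value are assumptions chosen for illustration, not figures from the talk.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ServiceMetrics:
    error_count: int
    inbound_queue_len: int
    outbound_queue_len: int
    requests_per_s: float
    latency_ms: float
    availability: float  # fraction of successful health checks, 0.0 to 1.0


# Assumed thresholds, for illustration only.
MAX_ERRORS = 100
MAX_QUEUE_LEN = 10_000
MAX_LATENCY_MS = 500.0
MIN_AVAILABILITY = 0.999


def check(m: ServiceMetrics) -> List[str]:
    """Return one alert string per threshold that is violated."""
    alerts: List[str] = []
    if m.error_count > MAX_ERRORS:
        alerts.append("error count too high")
    if max(m.inbound_queue_len, m.outbound_queue_len) > MAX_QUEUE_LEN:
        alerts.append("queue backlog")
    if m.latency_ms > MAX_LATENCY_MS:
        alerts.append("interface latency too high")
    if m.availability < MIN_AVAILABILITY:
        alerts.append("availability below target")
    return alerts
```

A growing queue length is often the earliest of these signals, since it shows a service falling behind before errors or timeouts appear.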
Gray release
Gray release enables seamless, user‑transparent deployments. A new version is first deployed to one node; after that node passes a short observation period, the release is gradually expanded to more nodes, which reduces risk and avoids night‑time deployments.
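The staged expansion can be sketched as a loop over growing fractions of the fleet. The stage fractions, the `deploy` and `healthy` callbacks, and the function name `gray_release` are hypothetical; a real rollout would also wait for the observation period between stages.

```python
from typing import Callable, Sequence


def gray_release(
    nodes: Sequence[str],
    deploy: Callable[[str], None],
    healthy: Callable[[str], bool],
    stages: Sequence[float] = (0.01, 0.1, 0.5, 1.0),
) -> bool:
    """Deploy in growing batches; stop at the first unhealthy stage."""
    done = 0
    for fraction in stages:
        target = max(1, int(len(nodes) * fraction))
        for node in nodes[done:target]:
            deploy(node)
        # An observation period would go here before checking health.
        if not all(healthy(n) for n in nodes[:target]):
            return False  # halt the rollout; remaining nodes are untouched
        done = target
    return True
```

With ten nodes and the default stages, the batches are one node, then four more, then the remaining five.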
Author: Yu Xiaobo, Meizu backend engineer focusing on high concurrency and distributed solutions.
Source: http://blog.csdn.net/guolong1983811/article/details/50421901
