
Design and Lessons Learned from Meizu's Real-Time Message Push System

The article details Meizu's large‑scale real‑time push architecture, covering system scale, four‑layer design, power‑consumption optimizations, network reliability challenges, massive connection handling, load‑balancing strategies, strict monitoring, and gray‑release practices to ensure high performance and stability.

Qunar Tech Salon

Mar 9, 2016


This article is based on a presentation by Meizu architect Yu Xiaobo at the Meizu Technology Open Day, where he shared the real‑time message push architecture, the pitfalls encountered, and key insights.

The system serves roughly 25 million online users, handles about 5 billion page views per day, and can push up to 600,000 messages per minute, as illustrated by the accompanying trend chart.
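For a sense of scale, the headline figures can be converted to per-second rates. This is plain arithmetic on the numbers above; the daily page-view figure is an average over 24 hours, so real peaks would be higher. The helper name `per_second` is illustrative only.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400

def per_second(count: float, window_s: float) -> float:
    """Average rate over a time window."""
    return count / window_s

peak_push = per_second(600_000, 60)                      # push peak: 10,000 messages/s
avg_views = per_second(5_000_000_000, SECONDS_PER_DAY)   # daily average: about 57,870/s

print(f"{peak_push:,.0f} msg/s, {avg_views:,.0f} views/s")
```

Running it prints `10,000 msg/s, 57,870 views/s`.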

The logical architecture has four layers. The bottom layer is the access layer that Meizu phones connect to. The second layer is the message distribution service, which routes upstream messages and delivers downstream messages by looking up each recipient in a user routing table. The third layer manages subscription information, and the fourth layer stores offline messages and subscription data. The original article includes a diagram of this architecture.
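The role of the user routing table in the distribution layer can be sketched as follows. This is a minimal illustration under assumptions, not Meizu's implementation: the class `Dispatcher`, its methods, and the idea of flushing stored messages on reconnect are hypothetical, and the in-memory `offline` dict merely stands in for the storage layer.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

@dataclass
class Dispatcher:
    # Hypothetical routing table: user_id -> id of the access node holding the connection.
    routes: Dict[str, str] = field(default_factory=dict)
    # Hypothetical stand-in for the storage layer's offline message store.
    offline: Dict[str, List[str]] = field(default_factory=dict)

    def on_connect(self, user_id: str, node_id: str) -> List[str]:
        """Record the user's access node and hand back messages stored while offline."""
        self.routes[user_id] = node_id
        return self.offline.pop(user_id, [])

    def on_disconnect(self, user_id: str) -> None:
        self.routes.pop(user_id, None)

    def deliver(self, user_id: str, message: str) -> Optional[str]:
        """Return the access node to forward to, or None if the message was stored offline."""
        node = self.routes.get(user_id)
        if node is None:
            self.offline.setdefault(user_id, []).append(message)
        return node
```

A downstream message for an offline user is parked in storage; once the user connects through an access node, later messages are routed straight to that node.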

Power-consumption issues: The two main concerns are data traffic and battery usage. Traditional protocols such as XMPP and SIP are heavyweight and consume a lot of bandwidth, so Meizu designed its own lightweight protocol, IDG, which cuts encoding/decoding time by a factor of ten and reduces traffic by 50–70%. Combined with intelligent heartbeat intervals, this also lowers battery drain.
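The source does not say how the intelligent heartbeat chooses its interval. One common approach, shown here purely as an assumption, is to probe progressively longer intervals while the connection survives and to step back once a NAT or carrier gateway drops an idle connection. Fewer heartbeats mean fewer radio wake-ups and therefore less battery drain. The class name and default values are hypothetical.

```python
class AdaptiveHeartbeat:
    """Hypothetical adaptive heartbeat: lengthen the interval while the connection
    survives, fall back one step when it is dropped, then stop probing."""

    def __init__(self, min_s: int = 60, max_s: int = 600, step_s: int = 60):
        self.min_s, self.max_s, self.step_s = min_s, max_s, step_s
        self.interval = min_s
        self.stable = False  # becomes True once a safe ceiling has been found

    def on_success(self) -> int:
        # Heartbeat was answered: try a longer interval unless already settled.
        if not self.stable:
            self.interval = min(self.interval + self.step_s, self.max_s)
        return self.interval

    def on_timeout(self) -> int:
        # Connection died at this interval: step back and keep the last safe value.
        self.interval = max(self.interval - self.step_s, self.min_s)
        self.stable = True
        return self.interval
```

A real client would also reset `stable` when the network type changes (for example Wi-Fi to cellular), since each network can have a different idle timeout.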

Network latency and duplicate messages: Mobile networks are unstable, which causes high latency, and when an ACK is lost the message is resent, producing duplicates. Meizu's solution combines sequence-number-based interaction, a notification-first approach, and a fallback mechanism that switches between DNS-resolved IPs and a list of IPs pre-embedded in the client.
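One plausible reading of "sequence numbers plus notification first" is sketched below, under assumptions: the server sends only a small notification carrying the latest sequence number, the client then pulls everything after the last sequence number it holds, and anything at or below that number is discarded as a duplicate. The classes `Inbox` and `Client` are hypothetical and not taken from the talk.

```python
from typing import List, Tuple

class Inbox:
    """Hypothetical server-side store: one user's messages with increasing sequence numbers."""
    def __init__(self) -> None:
        self.messages: List[Tuple[int, str]] = []

    def append(self, text: str) -> int:
        seq = len(self.messages) + 1
        self.messages.append((seq, text))
        return seq  # this value travels in the lightweight notification

    def pull(self, after_seq: int) -> List[Tuple[int, str]]:
        # The client's pull doubles as an ACK for everything up to after_seq.
        return [m for m in self.messages if m[0] > after_seq]

class Client:
    def __init__(self) -> None:
        self.last_seq = 0
        self.received: List[str] = []

    def on_notify(self, inbox: Inbox, latest_seq: int) -> None:
        if latest_seq <= self.last_seq:
            return  # already up to date: a repeated notification is harmless
        for seq, text in inbox.pull(self.last_seq):
            if seq > self.last_seq:  # never deliver the same sequence number twice
                self.received.append(text)
                self.last_seq = seq
```

Because state is carried by the client's `last_seq` rather than by per-message ACKs, a lost ACK or a repeated notification costs at most one redundant pull, never a duplicate message shown to the user.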

Massive connection handling: To support up to 4 million long-lived connections per server, Meizu implemented the service in C++ using a multi-process model with epoll, added a memory pool to avoid fragmentation, and used Google's tcmalloc allocator. Kernel tuning included binding NIC interrupts to less-loaded CPUs and raising the TCP retransmission timeout (RTO) from 200 ms to about 3 seconds.
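The core of the epoll approach is that one thread waits on many non-blocking sockets at once and only touches the ones that are ready. Meizu's service is C++; the sketch below uses Python's `selectors` module instead, whose `DefaultSelector` uses epoll on Linux. In a multi-process design, each worker process would run a loop like this over its own share of connections. The function names are hypothetical, and echoing the data back is only a stand-in for real protocol handling.

```python
import selectors
import socket

def make_selector() -> selectors.BaseSelector:
    # DefaultSelector picks the most efficient mechanism available (epoll on Linux).
    return selectors.DefaultSelector()

def add_connection(sel: selectors.BaseSelector, conn: socket.socket) -> None:
    # Non-blocking sockets let a single thread multiplex many connections.
    conn.setblocking(False)
    sel.register(conn, selectors.EVENT_READ)

def poll_once(sel: selectors.BaseSelector, timeout: float = 0.1) -> int:
    """Handle every connection that is ready to read; return how many were handled."""
    handled = 0
    for key, _events in sel.select(timeout):
        conn = key.fileobj
        data = conn.recv(4096)
        if data:
            # Echo as a placeholder. A production server would queue outgoing data
            # and wait for EVENT_WRITE instead of calling sendall() here.
            conn.sendall(data)
        else:
            # An empty read means the peer closed the connection.
            sel.unregister(conn)
            conn.close()
        handled += 1
    return handled
```

The per-connection cost is just a registered file descriptor and its buffers, which is why allocator behavior (memory pools, tcmalloc) matters so much at millions of connections.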

Load balancing: Instead of relying on a single LVS (Linux Virtual Server) front end, the client probes several server IPs and connects to whichever responds fastest, using response speed as a proxy for load. When a server's load exceeds a threshold, it deliberately delays its responses slightly so that clients pick other servers instead.
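The interplay between the client's "fastest responder wins" rule and the server's deliberate delay can be simulated in a few lines. This is a sketch under assumptions: the talk gives no concrete threshold or delay, so `LOAD_THRESHOLD` and `PENALTY_MS` are hypothetical values, and probing is modeled as a computed latency rather than real concurrent connection attempts.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical tuning values; the source does not specify them.
LOAD_THRESHOLD = 0.8   # fraction of capacity above which a server slows its replies
PENALTY_MS = 50.0      # deliberate extra delay added by an overloaded server

@dataclass
class Server:
    ip: str
    rtt_ms: float  # network round-trip time as seen by this client
    load: float    # fraction of capacity in use, 0.0 to 1.0

def probe_latency(s: Server) -> float:
    """Latency the client observes; overloaded servers add a deliberate delay."""
    delay = PENALTY_MS if s.load > LOAD_THRESHOLD else 0.0
    return s.rtt_ms + delay

def pick_server(servers: List[Server]) -> str:
    """Client side: probe every candidate IP and keep the fastest responder."""
    return min(servers, key=probe_latency).ip
```

The nice property of this design is that servers never need to publish their load: an overloaded node simply looks slower, and clients drift away from it on their own.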

System monitoring and gray release: Every node reports a strict set of monitoring metrics: error count, inbound and outbound queue depth, request rate, interface latency, and service availability. Gray release makes deployments transparent to users: a new version is first rolled out to a few nodes and observed, then gradually expanded to the whole cluster.
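A staged gray release gated on node metrics might look like the sketch below. The talk names the metrics but not the limits or the tooling, so `MAX_ERRORS`, `MAX_LATENCY_MS`, the stage sizes, and the `deploy`/`observe` callbacks are all hypothetical placeholders for whatever deployment system and metric store are actually used.

```python
from typing import Callable, Dict, List

# Hypothetical health limits for two of the monitored metrics.
MAX_ERRORS = 10
MAX_LATENCY_MS = 200.0

def healthy(metrics: Dict[str, float]) -> bool:
    return metrics["errors"] <= MAX_ERRORS and metrics["latency_ms"] <= MAX_LATENCY_MS

def gray_release(nodes: List[str], stages: List[int],
                 deploy: Callable[[str], None],
                 observe: Callable[[str], Dict[str, float]]) -> List[str]:
    """Upgrade nodes in growing batches (stages are cumulative node counts) and
    stop as soon as any upgraded node looks unhealthy. Returns the upgraded nodes."""
    upgraded: List[str] = []
    for target in stages:  # e.g. [1, 3, len(nodes)]
        for node in nodes[len(upgraded):target]:
            deploy(node)
            upgraded.append(node)
        if not all(healthy(observe(n)) for n in upgraded):
            break  # halt the rollout; the returned nodes are the rollback candidates
    return upgraded
```

With stages `[1, 3, 5]` on five nodes, a fault that shows up on the third node stops the rollout after three upgrades, so the other two nodes keep serving users on the old version.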

Source: TOP100SUMMIT public account.


Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact us and we will review it promptly.


Written by

Qunar Tech Salon

Qunar Tech Salon is a learning and exchange platform for Qunar engineers and industry peers. We share cutting-edge technology trends and topics, providing a free platform for mid-to-senior technical professionals to exchange and learn.
