Inside Meizu’s Real‑Time Push System: Architecture, Challenges & Solutions

This article presents a detailed walkthrough of Meizu’s real‑time push platform, covering its four‑layer architecture, high‑concurrency design, micro‑service RPC framework, power‑saving strategies, duplicate‑message handling, DNS reliability, load‑balancing tactics, monitoring setup, and gray‑release deployment.

ITPUB

System Overview

The Meizu push platform provides services such as system and app upgrades, device locating, contact sync, app store, online music, reading, and game center for Meizu users. It handles roughly 25 million concurrent online users and 5 billion daily page views, and can push up to 6 million messages per minute with current resources.

Four‑Layer Architecture

The system is divided into four logical layers:

Access Layer : Provides TCP long‑connection and HTTP services for clients.

Message Distribution Layer : Distributes upstream business messages to services and routes downstream push messages to the appropriate access servers using a routing table that stores connection information.

Business Logic Layer : Handles various business logic processing.

Storage Layer : Stores offline and subscription messages.
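
As a rough illustration of the routing table mentioned for the message distribution layer, the sketch below maps a device ID to the access server that currently holds its long connection. The class name, method names, and addresses are hypothetical; the article does not describe the actual data structure.

```python
from typing import Dict, Optional


class RoutingTable:
    """Hypothetical in-memory map: device ID -> access server holding its connection."""

    def __init__(self) -> None:
        self._routes: Dict[str, str] = {}

    def register(self, device_id: str, access_server: str) -> None:
        # Called when a device establishes a long connection.
        self._routes[device_id] = access_server

    def unregister(self, device_id: str, access_server: str) -> None:
        # Only remove the entry if it still points at the server reporting the
        # disconnect, so a stale disconnect cannot erase a newer connection.
        if self._routes.get(device_id) == access_server:
            del self._routes[device_id]

    def route(self, device_id: str) -> Optional[str]:
        # None means no known connection; a caller might then store the
        # message offline instead (an assumption, not taken from the article).
        return self._routes.get(device_id)


table = RoutingTable()
table.register("device-42", "192.0.2.10")
print(table.route("device-42"))
```

In a real deployment the table would have to be shared across distribution nodes; a plain dictionary is used here only to show the lookup.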

Two independent platforms handle monitoring and service management. Each service is a small, stateless, high‑concurrency component that can be deployed independently, with latency requirements under 1 ms.

Key Technical Challenges and Solutions

1. Micro‑service RPC Framework (kiev)

Initially, the in‑house RPC framework kiev used synchronous calls. These were simple, but as user volume grew they could no longer meet performance demands and led to multithreading problems. The framework was first refactored to use asynchronous calls. Later, a coroutine‑based version was created that hooks system I/O calls (e.g., send, recv), achieving asynchronous performance while preserving a synchronous programming model.
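
The idea of writing synchronous-looking code that runs asynchronously can be illustrated with Python's asyncio. This is only an analogy: kiev hooks system I/O calls, whereas asyncio relies on explicit await points. The function names and delays below are made up for the example.

```python
import asyncio
import time


async def fake_rpc(name: str, delay: float) -> str:
    # Stand-in for a network round trip; awaiting yields to other coroutines
    # instead of blocking a thread.
    await asyncio.sleep(delay)
    return f"{name}:ok"


async def handle_request(request_id: int) -> list:
    # Reads like sequential, synchronous code ...
    profile = await fake_rpc(f"profile-{request_id}", 0.05)
    tokens = await fake_rpc(f"tokens-{request_id}", 0.05)
    return [profile, tokens]


async def main() -> list:
    # ... yet many requests interleave on a single thread.
    return await asyncio.gather(*(handle_request(i) for i in range(100)))


if __name__ == "__main__":
    start = time.perf_counter()
    results = asyncio.run(main())
    elapsed = time.perf_counter() - start
    print(len(results), "requests in", round(elapsed, 2), "s")
```

Each request still waits for its two calls in order, but the 100 requests overlap, so the total time stays close to that of a single request rather than growing with the request count.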

2. Mobile Power Consumption

a) Traffic Savings : Instead of text‑heavy XMPP or SIP, a custom binary protocol was adopted, offering more than ten‑fold faster encoding/decoding and reducing network traffic by 50‑70 %.
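
To give a feel for why a binary protocol is more compact than a text-based one, here is a minimal sketch using Python's struct module. The frame layout (version, message type, sequence number, payload length) is an assumption made for illustration and is not Meizu's actual wire format; the XML-like message is likewise invented for comparison.

```python
import struct

# Hypothetical frame layout (not Meizu's actual wire format):
# version: uint8, msg_type: uint8, seq: uint32, payload_len: uint16, then payload.
# "!" selects network byte order with no padding, so the header is 8 bytes.
HEADER = struct.Struct("!BBIH")


def encode(version: int, msg_type: int, seq: int, payload: bytes) -> bytes:
    return HEADER.pack(version, msg_type, seq, len(payload)) + payload


def decode(frame: bytes):
    version, msg_type, seq, length = HEADER.unpack_from(frame)
    payload = frame[HEADER.size:HEADER.size + length]
    if len(payload) != length:
        raise ValueError("truncated frame")
    return version, msg_type, seq, payload


binary_frame = encode(1, 2, 1024, b"hi")
text_frame = b"<message version='1' type='2' seq='1024'><body>hi</body></message>"
print(len(binary_frame), len(text_frame))
```

The size gap in this toy case comes from replacing tag and attribute names with fixed-width fields; the article's 50‑70 % figure refers to Meizu's own protocol and is not reproduced by this sketch.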

b) Battery Savings : Because TCP long connections require periodic heartbeats, Meizu implemented an adaptive heartbeat strategy that adjusts the interval based on current network conditions, avoiding unnecessary wake‑ups.
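
The article does not spell out the adaptive heartbeat algorithm. One plausible policy, sketched here purely as an assumption, is to lengthen the interval after several successful heartbeats and shorten it after a timeout. All parameter names and default values are hypothetical.

```python
class AdaptiveHeartbeat:
    """Hypothetical policy: probe upward for the longest interval the network tolerates."""

    def __init__(self, min_s: int = 60, max_s: int = 600, step_s: int = 60,
                 successes_to_grow: int = 3) -> None:
        self.min_s = min_s
        self.max_s = max_s
        self.step_s = step_s
        self.successes_to_grow = successes_to_grow
        self.interval_s = min_s
        self._streak = 0

    def on_ack(self) -> None:
        # Heartbeat answered: after a few stable rounds, try a longer interval.
        self._streak += 1
        if self._streak >= self.successes_to_grow:
            self.interval_s = min(self.interval_s + self.step_s, self.max_s)
            self._streak = 0

    def on_timeout(self) -> None:
        # Heartbeat went unanswered: step the interval back down and start over.
        self.interval_s = max(self.interval_s - self.step_s, self.min_s)
        self._streak = 0


hb = AdaptiveHeartbeat()
for _ in range(3):
    hb.on_ack()
print(hb.interval_s)
```

A longer interval means fewer radio wake-ups, which is where the battery saving would come from; the lower bound keeps the connection from being dropped silently on networks with short idle timeouts.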

3. Duplicate Message Problem

Unstable mobile networks cause acknowledgments to be lost, which leads to retransmissions and duplicate deliveries. The solution replaces server‑side stateful tracking with a client‑driven pull model: the server only notifies the client that new messages exist, and the client then pulls them using the sequence number of the last message it received. Because the client only ever requests messages newer than that sequence number, duplicates are eliminated and the services stay stateless.
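
The pull model can be sketched in a few lines. The names MessageStore, Client, append, pull, and on_notify are hypothetical, and the in-memory log stands in for whatever storage the real system uses.

```python
from typing import Dict, List, Tuple


class MessageStore:
    """Hypothetical per-device message log with increasing sequence numbers."""

    def __init__(self) -> None:
        self._log: Dict[str, List[Tuple[int, str]]] = {}

    def append(self, device_id: str, body: str) -> int:
        log = self._log.setdefault(device_id, [])
        seq = log[-1][0] + 1 if log else 1
        log.append((seq, body))
        return seq  # a server would now send a lightweight "new message" notice

    def pull(self, device_id: str, last_seq: int) -> List[Tuple[int, str]]:
        # No per-client delivery state is kept: the result depends only on
        # the sequence number the client supplies.
        return [(s, b) for s, b in self._log.get(device_id, []) if s > last_seq]


class Client:
    def __init__(self, device_id: str) -> None:
        self.device_id = device_id
        self.last_seq = 0
        self.received: List[str] = []

    def on_notify(self, store: MessageStore) -> None:
        # Pull everything newer than the last message this client has seen.
        for seq, body in store.pull(self.device_id, self.last_seq):
            self.received.append(body)
            self.last_seq = seq


store = MessageStore()
client = Client("device-42")
store.append("device-42", "upgrade available")
client.on_notify(store)
client.on_notify(store)  # a repeated notice pulls nothing new
print(client.received)
```

A repeated or retransmitted notice is harmless here, because a second pull with the same last_seq returns nothing the client has already stored.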

4. DNS Reliability

Carrier DNS services are often unreliable or hijacked. Meizu uses an all‑IP access method where the client fetches a pre‑sorted list of IPs via HTTP, selects the best one based on latency, and falls back to embedded IPs if DNS fails.
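
The fallback behaviour could look roughly like the following. The function name, the injectable fetch_ip_list callable, and the addresses (taken from documentation-only ranges) are assumptions made for this sketch; the article does not show the client code.

```python
from typing import Callable, List

# Hypothetical embedded addresses from a documentation-only range (RFC 5737).
EMBEDDED_IPS = ["192.0.2.10", "192.0.2.11"]


def candidate_ips(fetch_ip_list: Callable[[], List[str]]) -> List[str]:
    """Return the server-provided IP list, or the embedded list if the fetch fails."""
    try:
        ips = fetch_ip_list()
    except OSError:
        # DNS resolution errors, refused connections, and socket timeouts are
        # all raised as OSError subclasses in Python.
        return list(EMBEDDED_IPS)
    # An empty answer is treated like a failed fetch.
    return ips if ips else list(EMBEDDED_IPS)


def broken_fetch() -> List[str]:
    raise OSError("name resolution failed")


print(candidate_ips(lambda: ["198.51.100.7", "198.51.100.8"]))
print(candidate_ips(broken_fetch))
```

Passing the fetch function in as a parameter keeps the fallback logic testable without a network; in a real client it would wrap the HTTP request for the pre‑sorted list.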

5. Massive Connection Load Balancing

Each access server can handle about 4 million long connections. Traditional LVS load balancers are unsuitable due to single‑point‑of‑failure concerns. Meizu’s approach sorts IPs by current load, and the client performs a “horse‑race” probe to multiple IPs, selecting the fastest responsive server, which also mitigates cross‑carrier latency issues.
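
A client-side race of this kind might be sketched as follows: probe every candidate concurrently and keep the first one that answers. The function names, the port, and the timeouts are hypothetical, and the thread-pool approach is just one way to run the probes in parallel.

```python
import socket
import time
from concurrent.futures import ThreadPoolExecutor, as_completed
from concurrent.futures import TimeoutError as FuturesTimeout
from typing import Callable, List, Optional


def tcp_probe(ip: str, port: int = 443, timeout_s: float = 2.0) -> None:
    # Succeeds if a TCP connection can be opened; raises OSError otherwise.
    with socket.create_connection((ip, port), timeout=timeout_s):
        pass


def race(ips: List[str], probe: Callable[[str], None],
         timeout_s: float = 3.0) -> Optional[str]:
    """Probe all IPs concurrently; return the first one whose probe succeeds."""
    if not ips:
        return None
    pool = ThreadPoolExecutor(max_workers=len(ips))
    try:
        futures = {pool.submit(probe, ip): ip for ip in ips}
        try:
            for fut in as_completed(futures, timeout=timeout_s):
                if fut.exception() is None:
                    return futures[fut]
        except FuturesTimeout:
            pass
        return None
    finally:
        # Do not wait for slower probes; they finish in the background.
        pool.shutdown(wait=False)


def fake_probe(ip: str) -> None:
    # Simulated latencies so the example runs without a network.
    delays = {"192.0.2.10": 0.2, "192.0.2.11": 0.01, "192.0.2.12": 0.1}
    time.sleep(delays[ip])


print(race(["192.0.2.10", "192.0.2.11", "192.0.2.12"], fake_probe))
```

Because the winner is chosen by measured response time from the device's own network, a server on a slow cross-carrier path naturally loses the race even if it ranks well by load.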

Monitoring and Gray Release

The platform consists of many independent services; a failure in one does not affect the whole system, but accumulated issues can cause outages. Therefore, a strict monitoring system with strong metrics for each service is deployed (see diagram). Gray release is used to reduce deployment risk: only a subset of users receives new code, and rollout proceeds gradually based on health checks, eliminating the need for overnight releases.
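
One common way to pick the subset of users for a gray release is deterministic hashing into percentage buckets. The article does not say how Meizu selects users, so the function below is an assumption shown only to make the idea concrete.

```python
import hashlib


def in_gray_group(user_id: str, rollout_percent: int) -> bool:
    """Place a user in one of 100 stable buckets; buckets below the percentage get new code."""
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:4], "big") % 100
    return bucket < rollout_percent


# Raising the percentage only adds users; nobody already in the group drops out.
for percent in (1, 10, 50, 100):
    count = sum(in_gray_group(f"user-{i}", percent) for i in range(10000))
    print(percent, count)
```

Since a user's bucket never changes, the rollout can be widened step by step after each health check without flipping users back and forth between versions.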

Q&A Highlights

Gray release is a smooth, incremental deployment approach that sits between no rollout and a full rollout.

Duplicate‑message handling cannot rely on the client detecting acknowledgment failures, because the client does not know whether its ACK was lost.

Push latency is typically under 300 ms with an effective delivery rate close to 100 % thanks to offline storage and retry mechanisms.

Mobile push must minimize both bandwidth and battery consumption, unlike PC push.

Offline messages expire after about seven days.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact admin@besthub.dev and we will review it promptly.
