Building a Scalable Push Notification System for 5 Billion Daily PVs

This article details the design, performance challenges, and optimization techniques of a high‑throughput push notification platform that serves 25 million concurrent users, processes 5 billion daily page views, and delivers up to 6 million messages per minute.


System Overview

The platform supports roughly 25 million concurrent online users and handles about 5 billion page views per day, with a push throughput of up to 6 million messages per minute.

System data overview

Architecture Design

The system is logically divided into four layers:

Device Access Layer: Provides connectivity for Meizu phones.

Message Distribution Service: Routes upstream messages and delivers downstream messages using a user routing table.

Subscription Information Layer: Manages subscription data.

Storage Layer: Stores offline messages and subscription data.

Service management and business monitoring are separate components. A dedicated push platform exposes APIs to business teams, and each service runs in an independent cluster that can be deployed and scaled on its own.

Push system architecture

Mobile Power‑Consumption Issues

Two resources dominate on the handset: data traffic and battery. Traditional protocols such as XMPP and SIP are feature-rich but heavy, with extensive specifications and many unnecessary tags, leading to high bandwidth consumption.

To address this, the team created a lightweight custom protocol (IDG) whose encoding/decoding is roughly ten times faster and which cuts traffic by 50‑70 %.

Battery drain is addressed through periodic heartbeat packets (e.g., every 3, 5, or 10 minutes) that keep the long-lived connection alive while waking the radio as rarely as possible. The IDG protocol also supports intelligent heartbeats that adapt the interval, further lowering power draw.
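The article does not spell out how the intelligent heartbeat picks its interval. A minimal sketch, assuming a simple probe-longer-on-success, reset-on-failure policy between the 3- and 10-minute bounds mentioned above (the class name and policy are illustrative):

```cpp
#include <algorithm>

// Sketch of an adaptive ("intelligent") heartbeat scheduler. Assumes an
// additive-increase / reset-on-failure policy; the real IDG algorithm
// is not described in the article.
class HeartbeatScheduler {
public:
    // Intervals in minutes; bounds follow the 3/5/10-minute examples.
    explicit HeartbeatScheduler(int min_iv = 3, int max_iv = 10)
        : min_iv_(min_iv), max_iv_(max_iv), current_(min_iv) {}

    int current_interval() const { return current_; }

    // Heartbeat acknowledged: the NAT timeout is at least `current_`,
    // so probe a slightly longer interval next time.
    void on_success() { current_ = std::min(current_ + 1, max_iv_); }

    // Heartbeat timed out: the connection was likely dropped by a NAT,
    // so fall back to the conservative minimum.
    void on_failure() { current_ = min_iv_; }

private:
    int min_iv_, max_iv_, current_;
};
```

The longer the interval that still keeps NATs from dropping the connection, the fewer radio wake-ups per hour, which is where the power saving comes from.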

For messages that do not require strict real‑time delivery (e.g., upgrade notifications), the system employs delayed push: the server waits until the client is awake (detected via heartbeat) before sending the message, reducing unnecessary wake‑ups.
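The delayed-push idea can be sketched as a per-device holding queue that is flushed only when that device's heartbeat arrives, so the radio is already up and no extra wake-up is spent. Class and method names here are illustrative, not from the article:

```cpp
#include <map>
#include <queue>
#include <string>
#include <vector>

// Sketch of delayed push: non-real-time messages (e.g. upgrade notices)
// are parked per device and handed to the sender only when that device's
// heartbeat shows it is already awake.
class DelayedPushQueue {
public:
    void enqueue(const std::string& device_id, const std::string& msg) {
        pending_[device_id].push(msg);
    }

    // Called when a heartbeat from `device_id` arrives: flushing now costs
    // no extra wake-up. Returns the messages to send, in enqueue order.
    std::vector<std::string> on_heartbeat(const std::string& device_id) {
        std::vector<std::string> out;
        auto it = pending_.find(device_id);
        if (it == pending_.end()) return out;
        while (!it->second.empty()) {
            out.push_back(it->second.front());
            it->second.pop();
        }
        pending_.erase(it);
        return out;
    }

private:
    std::map<std::string, std::queue<std::string>> pending_;
};
```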

Power consumption diagram

Mobile Network Challenges

Unstable, high‑latency mobile networks cause duplicate message delivery. The system solves this with sequence‑number‑based interactions: the server first sends a lightweight notification carrying its latest sequence number; the client then fetches only the messages newer than the last sequence it has applied, so duplicate deliveries are discarded.
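The client side of this exchange can be sketched as a tracker over the last applied sequence number (names are illustrative; the wire format is not described in the article):

```cpp
#include <cstdint>
#include <vector>

// Sketch of sequence-number-based deduplication on the client. The
// server's notification carries only its latest sequence number; the
// client computes the range it is missing and drops anything it has
// already applied.
class SequenceTracker {
public:
    // Sequence numbers in (last_applied, server_latest] still need to
    // be fetched; the result is empty when the client is up to date.
    std::vector<uint64_t> missing_range(uint64_t server_latest) const {
        std::vector<uint64_t> need;
        for (uint64_t s = last_applied_ + 1; s <= server_latest; ++s)
            need.push_back(s);
        return need;
    }

    // Apply a fetched message; duplicates (seq <= last_applied) are
    // rejected, which is what suppresses double delivery.
    bool apply(uint64_t seq) {
        if (seq <= last_applied_) return false;  // duplicate delivery
        last_applied_ = seq;
        return true;
    }

private:
    uint64_t last_applied_ = 0;
};
```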

DNS failures are mitigated by maintaining a pre‑embedded list of IP addresses. Clients first try HTTP to obtain the IP list; if DNS resolution fails, they fall back to the embedded IPs.
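The fallback decision itself is simple enough to sketch. Here the result of the HTTP fetch is passed in already resolved; an empty result stands in for a DNS or HTTP failure (function and parameter names are illustrative):

```cpp
#include <string>
#include <vector>

// Sketch of the IP-list fallback: prefer the list fetched over HTTP,
// and fall back to IPs shipped inside the app when DNS resolution or
// the fetch itself fails (modeled here as an empty fetch result).
std::vector<std::string> resolve_push_servers(
    const std::vector<std::string>& fetched,     // result of the HTTP fetch
    const std::vector<std::string>& embedded) {  // pre-embedded fallback IPs
    return fetched.empty() ? embedded : fetched;
}
```

The embedded list trades freshness for availability: it may lag behind the real server fleet, but it keeps clients connectable when name resolution is broken.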

Network flow diagram

Handling Massive Connections

The goal is 4 million concurrent long‑lived connections per machine. The implementation uses C++ for performance, a multi‑process architecture built on epoll, a memory pool to avoid fragmentation, and Google’s tcmalloc for efficient allocation.
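The core of each worker process is an epoll event loop. A minimal Linux sketch is below; an eventfd stands in for a client socket so the loop can be exercised without the network, and the helper names are illustrative:

```cpp
#include <sys/epoll.h>
#include <sys/eventfd.h>
#include <unistd.h>
#include <cstdint>

// Wait for one ready descriptor; returns its fd, or -1 on timeout.
// A real worker would dispatch on the event type (read/close/error).
int wait_one_event(int epfd, int timeout_ms) {
    epoll_event ev{};
    int n = epoll_wait(epfd, &ev, 1, timeout_ms);
    return n == 1 ? ev.data.fd : -1;
}

// Build a demo loop: one epoll instance (one per worker process in the
// article's design) watching a single eventfd as a stand-in connection.
int make_demo_loop(int& efd_out) {
    int epfd = epoll_create1(0);
    int efd = eventfd(0, EFD_NONBLOCK);
    epoll_event ev{};
    ev.events = EPOLLIN;       // readable: "data arrived from the client"
    ev.data.fd = efd;
    epoll_ctl(epfd, EPOLL_CTL_ADD, efd, &ev);
    efd_out = efd;
    return epfd;
}
```

epoll's cost scales with the number of *ready* descriptors rather than the number of registered ones, which is what makes millions of mostly idle long-lived connections per machine feasible.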

Kernel tuning addresses CPU load imbalance caused by NIC interrupts: interrupts are bound to less‑loaded CPUs to balance the load. The minimum TCP retransmission timeout (RTO) is raised from the default 200 ms to about 3 seconds to avoid spurious retransmissions on high‑latency mobile links.

Connection handling diagram

Load balancing is performed on the client side: after receiving an IP list sorted by current load, the client probes several IPs and selects the one with the fastest response. The server also applies a delayed‑response strategy when its load exceeds a threshold (e.g., adding 50 ms of delay per 100 k extra connections) to prevent overload.
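Both halves of this loop can be sketched. The probe RTTs are passed in pre-measured, and the 3-million-connection threshold is an assumed placeholder (the article gives only the 50 ms per 100 k rule, not the threshold itself):

```cpp
#include <limits>
#include <string>
#include <vector>

struct Probe {
    std::string ip;
    int rtt_ms;  // round-trip time of the probe; < 0 means the probe failed
};

// Client side: keep the candidate that answered fastest. A real client
// would measure rtt_ms with a short connect/echo round trip.
std::string pick_fastest(const std::vector<Probe>& probes) {
    std::string best;
    int best_rtt = std::numeric_limits<int>::max();
    for (const auto& p : probes) {
        if (p.rtt_ms >= 0 && p.rtt_ms < best_rtt) {
            best_rtt = p.rtt_ms;
            best = p.ip;
        }
    }
    return best;  // empty string when every probe failed
}

// Server side: the delayed-response policy, 50 ms of extra delay per
// 100 k connections above the threshold. The threshold value here is
// an assumption for illustration.
int response_delay_ms(long connections, long threshold = 3'000'000) {
    if (connections <= threshold) return 0;
    return static_cast<int>((connections - threshold) / 100'000) * 50;
}
```

The delayed response steers new clients away organically: an overloaded server simply starts looking slow in the probe step, so clients prefer its lighter peers.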

Load balancing flow

System Monitoring

A comprehensive monitoring suite tracks critical metrics for each service node:

Error count – triggers alerts when a node generates many errors.

Send/receive queue lengths – indicate potential overload.

Request volume – helps forecast capacity needs.

Interface latency – normal calls should be <1 ms; higher values signal issues.

Service availability – immediate alerts on downtime.
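The per-node checks above can be sketched as a single health evaluation. Only the <1 ms latency figure comes from the article; the other thresholds are illustrative assumptions:

```cpp
#include <string>
#include <vector>

// Metrics the article lists for each service node.
struct NodeMetrics {
    long error_count;
    long send_queue_len, recv_queue_len;
    double avg_latency_ms;
    bool reachable;
};

// Sketch of the alerting pass over one node's metrics.
std::vector<std::string> collect_alerts(const NodeMetrics& m) {
    std::vector<std::string> alerts;
    if (!m.reachable)               alerts.push_back("node down");
    if (m.error_count > 100)        alerts.push_back("error spike");    // assumed threshold
    if (m.send_queue_len > 10000 ||
        m.recv_queue_len > 10000)   alerts.push_back("queue backlog");  // assumed threshold
    if (m.avg_latency_ms >= 1.0)    alerts.push_back("slow interface"); // article: normal < 1 ms
    return alerts;
}
```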

Monitoring dashboard

Gray Release Strategy

Gray (canary) releases allow user‑transparent, gradual rollouts. The process:

Deploy to a single node and monitor for a few days.

If stable, expand to a few more nodes (partial traffic).

After confirming stability, roll out to all nodes.

This approach reduces the need for late‑night releases and improves developer experience.
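The staged rollout above needs a stable traffic split so the same user always lands on the same side while the canary share grows. A minimal sketch, assuming hash-based routing (the article does not specify the mechanism):

```cpp
#include <cstdint>
#include <functional>
#include <string>

// Sketch of a gray-release traffic split: a stable hash of the user id
// sends a fixed percentage of users to canary nodes. Raising
// canary_percent from a single-node share toward 100 implements the
// staged rollout; the hash choice is illustrative.
bool routes_to_canary(const std::string& user_id, int canary_percent) {
    uint64_t h = std::hash<std::string>{}(user_id);
    return static_cast<int>(h % 100) < canary_percent;
}
```

Hashing the user id (rather than picking randomly per request) keeps each user's experience consistent across requests during the rollout, which is what makes the release transparent to users.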

Gray release flow
Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact admin@besthub.dev and we will review it promptly.

Tags: Backend, Monitoring, push notifications, Load Balancing, gray-release, scalable architecture
Written by

Big Data and Microservices

Focused on big data architecture, AI applications, and cloud‑native microservice practices, we dissect the business logic and implementation paths behind cutting‑edge technologies. No obscure theory—only battle‑tested methodologies: from data platform construction to AI engineering deployment, and from distributed system design to enterprise digital transformation.
