Evolution and High-Availability Design of the HaoDF Offline Message Push System
This article describes how the HaoDF offline push service grew from a simple PHP notification tool into a robust, highly available microservice platform. The team redesigned the architecture, integrated vendor push channels, added message-queue reliability, and built comprehensive monitoring, observability, and a fault-diagnosis platform to guarantee delivery rates and operational stability.
Background
With the rapid development of the mobile Internet, most apps use push notifications to actively deliver personalized information to users. HaoDF's push service, which supports doctor-patient communication and subscription notifications, must guarantee high delivery rates and timeliness; this requirement prompted a complete overhaul of the offline message push system.
System High‑Availability Construction
1. Service Prototype – Push Tool
Initially the push function was a simple notification tool built with PHP, designed only to remind users of new messages. Over time, rising user expectations and complaints exposed many shortcomings: no retry mechanism, cumbersome certificate management, poor Android channel coverage, and no monitoring.
2. Service Evolution – Push System
2.1 Requirement Analysis
The redesign aimed to treat push as a full‑featured system rather than a mere tool, separating core functions, user‑experience features, and operational services.
2.2 Technical Selection
After evaluating third‑party SDKs (Jiguang, Getui, Umeng, Baidu) and vendor‑native channels (Mi Push, Huawei Push, FlyMe Push), the team chose to directly integrate vendor services for stability and data security, using third‑party services only as a fallback.
2.3 High‑Availability Channel Optimization
To avoid single points of failure, link-backup strategies were introduced: multiple outbound APNs links for iOS, and a fallback through Umeng on Android when a vendor API fails.
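The link-backup idea above amounts to a priority-ordered failover loop. The sketch below illustrates it in Python; the `PushChannel` interface and channel names are hypothetical stand-ins, not HaoDF's actual API:

```python
# Sketch of channel failover: try the vendor-native channel first, then
# fall back to a third-party channel (e.g. Umeng) if the vendor link fails.
# The PushChannel class and its send() signature are illustrative only.

class PushChannel:
    def __init__(self, name, healthy=True):
        self.name = name
        self.healthy = healthy

    def send(self, device_token, payload):
        if not self.healthy:
            raise ConnectionError(f"{self.name} unavailable")
        return f"sent via {self.name}"

def push_with_fallback(channels, device_token, payload):
    """Try each channel in priority order; return on first success."""
    last_error = None
    for channel in channels:
        try:
            return channel.send(device_token, payload)
        except ConnectionError as e:
            last_error = e  # remember the failure and try the next link
    raise RuntimeError(f"all channels failed: {last_error}")

# Vendor channel first, third-party fallback second.
android_channels = [
    PushChannel("MiPush", healthy=False),  # simulate a vendor outage
    PushChannel("Umeng"),
]
result = push_with_fallback(android_channels, "token-123", {"title": "hi"})
```

In a real deployment the health of each link would be tracked by circuit-breaker state or probe results rather than a static flag.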
2.4 Guaranteeing No Message Loss
A message‑queue layer (with retry and compensation mechanisms) was added to ensure that transient network or machine failures do not cause message loss.
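The retry-and-compensation mechanism can be sketched as retries with exponential backoff plus a dead-letter queue for later replay. All names and parameters below are illustrative assumptions, not the production design:

```python
import time

# Sketch: retry a failed send with exponential backoff; messages that
# exhaust their retries land in a compensation (dead-letter) queue so a
# background job can replay them. Parameters are illustrative.

def send_with_retry(send_fn, message, max_retries=3, base_delay=0.01,
                    dead_letter_queue=None):
    delay = base_delay
    for attempt in range(1, max_retries + 1):
        try:
            return send_fn(message)
        except ConnectionError:
            if attempt == max_retries:
                break
            time.sleep(delay)  # back off before the next attempt
            delay *= 2         # exponential backoff
    if dead_letter_queue is not None:
        dead_letter_queue.append(message)  # compensation path
    return None

# Demo: a flaky sender that succeeds on the third attempt.
attempts = {"n": 0}
def flaky_send(message):
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise ConnectionError("transient network failure")
    return "delivered"

dlq = []
result = send_with_retry(flaky_send, {"id": 1}, dead_letter_queue=dlq)
```

A real broker (e.g. RabbitMQ or Kafka) provides the durability; the backoff and dead-letter logic here only illustrate the compensation flow.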
2.5 Other Optimizations
Switched APNs authentication to token-based (.p8) credentials, which renew automatically instead of expiring like certificates.
Adopted the open-source Pushy library to improve iOS delivery stability.
Customized high‑priority channels for key business pushes.
Upgraded SDKs to increase payload limits.
Supported single‑ and batch‑push modes.
Unified push implementations across doctor and patient apps.
Implemented end‑to‑end message lifecycle tracking and click analytics.
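End-to-end lifecycle tracking, the last item above, means recording every state a message passes through so delivery and click rates can be computed per message. A minimal sketch, with state names that are my assumption rather than HaoDF's schema:

```python
from enum import Enum

# Sketch of end-to-end message lifecycle tracking. State names are
# illustrative; a real system would persist each transition with a
# timestamp to feed delivery-rate and click-rate analytics.

class PushState(Enum):
    CREATED = "created"
    ENQUEUED = "enqueued"
    SENT = "sent"            # handed to the vendor channel
    DELIVERED = "delivered"  # vendor delivery receipt received
    CLICKED = "clicked"      # user tapped the notification

VALID_TRANSITIONS = {
    PushState.CREATED: {PushState.ENQUEUED},
    PushState.ENQUEUED: {PushState.SENT},
    PushState.SENT: {PushState.DELIVERED},
    PushState.DELIVERED: {PushState.CLICKED},
    PushState.CLICKED: set(),
}

class MessageLifecycle:
    def __init__(self, message_id):
        self.message_id = message_id
        self.history = [PushState.CREATED]

    def advance(self, new_state):
        current = self.history[-1]
        if new_state not in VALID_TRANSITIONS[current]:
            raise ValueError(f"illegal transition {current} -> {new_state}")
        self.history.append(new_state)

msg = MessageLifecycle("msg-42")
for state in (PushState.ENQUEUED, PushState.SENT, PushState.DELIVERED):
    msg.advance(state)
```

Rejecting illegal transitions at write time keeps the tracked funnel (sent vs. delivered vs. clicked) internally consistent.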
System Stability Operations
3.1 Monitoring & Alerting
Built a monitoring system based on Google SRE principles, focusing on message failure rate and delivery latency, with alert rules and on‑call escalation.
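An alert on message failure rate can be as simple as a sliding-window check. The window size and threshold below are illustrative placeholders, not the values HaoDF actually uses:

```python
from collections import deque

# Sketch of an SRE-style alert check: track the failure rate of recent
# sends in a sliding window and alert when it crosses a threshold.
# window_size and alert_threshold are illustrative values.

class FailureRateMonitor:
    def __init__(self, window_size=100, alert_threshold=0.05):
        self.results = deque(maxlen=window_size)  # True = success
        self.alert_threshold = alert_threshold

    def record(self, success):
        self.results.append(success)

    def failure_rate(self):
        if not self.results:
            return 0.0
        failures = sum(1 for ok in self.results if not ok)
        return failures / len(self.results)

    def should_alert(self):
        return self.failure_rate() > self.alert_threshold

monitor = FailureRateMonitor(window_size=100, alert_threshold=0.05)
for i in range(100):
    monitor.record(i % 10 != 0)  # simulate 10% of sends failing
```

In production the same rule would typically live in the alerting system (e.g. a Prometheus alert over a rate of failure counters) rather than in application code.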
3.2 Observability
Collected metrics, traces, and logs using Prometheus, Grafana, and ClickHouse, providing dashboards for overall health, device-type analysis, long-term trends, risk assessment, and anomaly detection.
3.3 Fault Diagnosis Platform
Developed a one‑click diagnosis portal for operations staff to view recent push stats, device status, notification‑switch state, send test messages, and trace message paths, dramatically reducing troubleshooting time.
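A one-click diagnosis essentially aggregates the per-user checks listed above into a single report. The check names and data sources in this sketch are hypothetical, chosen only to illustrate the aggregation:

```python
# Sketch of a "one-click" diagnosis that gathers a user's push status
# into one report. The registries and field names are illustrative.

def diagnose(user_id, device_registry, switch_store, recent_stats):
    """Aggregate per-user push diagnostics into one report dict."""
    device = device_registry.get(user_id)
    report = {
        "user_id": user_id,
        "device_registered": device is not None,
        "device_type": device.get("type") if device else None,
        "notifications_enabled": switch_store.get(user_id, False),
        "recent_pushes": recent_stats.get(user_id, []),
    }
    # One summary flag lets operations staff spot problems at a glance.
    report["likely_ok"] = (
        report["device_registered"] and report["notifications_enabled"]
    )
    return report

devices = {"u1": {"type": "ios", "token": "abc"}}
switches = {"u1": True}
stats = {"u1": [{"msg": "m1", "status": "delivered"}]}
report = diagnose("u1", devices, switches, stats)
```

Bundling the checks behind one call is what turns a multi-system investigation into a single portal lookup.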
Summary
The push system has evolved from a basic PHP tool to a mature micro‑service architecture that satisfies both technical and operational requirements, delivering reliable notifications while minimizing user disturbance.
Future Plans
Future work will focus on simplifying push strategies, enhancing interactive UI, and adding conversion‑rate analytics to turn notifications into measurable business value.
HaoDF Tech Team
HaoDF online tech practice and sharing. Join us to discuss and help create quality healthcare through technology.