
Architecture and Key Technologies of a Scalable Message Push Platform

The document outlines the design, key components, data flow, and operational strategies of a large‑scale message push platform, detailing its architecture, request handling, long‑connection management, retry mechanisms, data statistics, monitoring, and future expansion plans.

JD Retail Technology

Background

Every app or business needs to push information to user devices. Rather than each team integrating push channels on its own, a shared middle-platform service can provide reliable, stable push capabilities to many internal applications at once; this need led to the creation of the message push platform.

Push Platform Architecture

Terminology

- dt: deviceToken, the unique identifier of a device.
- appId: the application code assigned to each app registered on the push platform.
- token: the secret key associated with each application.
- msgId: the unique identifier the platform generates for each push request.

Platform‑side Services

- send‑pservice: web portal for creating pushes, viewing data, etc.
- send‑api: external API service (JSF and HTTP) providing push, unbind, and deregister functions.
- send‑worker: pulls scheduled pushes and sends them at the appointed time.

Channel‑side Services

- habitat: device database operations (bind, unbind, report, deregister).
- mutate: receives push requests, handles pin‑dt mapping and channel adaptation.
- report: receives first‑login device reports and stores the dt.
- channel: sends the final push request to each push channel.
- slark: receipt service that processes asynchronous feedback from vendor channels.

Other Components

- Self‑built channel: the platform’s own push channel, maintaining long‑lived connections to devices.
- Vendor channel: external, system‑level push services provided by device manufacturers.
- DataAnalysis: statistical service that aggregates push flow data.

Business users can push either via the web portal or by calling the send‑api. Push types include broadcast and targeted user pushes, selectable during creation.

Key Data Preparation

At the application level, each app receives an appId and token for authentication. At the device level, the platform stores the dt‑pin‑appId mapping for message assembly and device management.
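As an illustration of the device‑level data model, the dt‑pin‑appId mapping could be sketched as a small registry with bind/unbind operations. This is a minimal in‑memory stand‑in; the real platform persists this mapping in the habitat device database, and all names here are illustrative.

```python
from collections import defaultdict

class DeviceRegistry:
    """Illustrative in-memory store for the dt-pin-appId mapping.
    The real platform persists this in a device database (habitat)."""

    def __init__(self):
        # (appId, pin) -> set of dt; one user may own several devices
        self._pin_to_dts = defaultdict(set)
        # (appId, dt) -> pin, for reverse lookups and unbinding
        self._dt_to_pin = {}

    def bind(self, app_id, pin, dt):
        # Re-binding a dt to a new pin replaces the old mapping
        old_pin = self._dt_to_pin.get((app_id, dt))
        if old_pin is not None:
            self._pin_to_dts[(app_id, old_pin)].discard(dt)
        self._dt_to_pin[(app_id, dt)] = pin
        self._pin_to_dts[(app_id, pin)].add(dt)

    def unbind(self, app_id, dt):
        pin = self._dt_to_pin.pop((app_id, dt), None)
        if pin is not None:
            self._pin_to_dts[(app_id, pin)].discard(dt)

    def dts_for_pin(self, app_id, pin):
        # Resolve a user (pin) to all of their registered devices
        return set(self._pin_to_dts[(app_id, pin)])
```

Keeping both directions of the mapping makes pin‑to‑dt resolution (needed when pushing to a user) and dt‑to‑pin cleanup (needed when unbinding or deregistering) cheap.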

Key Technical Steps

1. Push Request Processing

Users can create pushes via the portal or invoke the send‑api (the latter is more common). The API supports HTTP and JSF, handling both real‑time and scheduled pushes. Immediate pushes are sent to the channel side instantly; scheduled pushes are stored in a Redis sorted set, and send‑worker pulls and dispatches them at the appropriate time.

To improve performance, after basic parameter validation the request handling is performed asynchronously using a thread pool, reducing response latency under high concurrency.
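The scheduled‑push flow can be sketched as follows. The platform uses a Redis sorted set with the scheduled send time as the score; the stand‑in below uses a Python heap with the same semantics (enqueue, then pop everything whose time has arrived), since a self‑contained sketch cannot assume a live Redis. In production this would be a ZADD on enqueue and a periodic ZRANGEBYSCORE‑and‑remove loop inside send‑worker.

```python
import heapq

class ScheduledPushQueue:
    """Stand-in for the Redis sorted set used for scheduled pushes
    (score = scheduled send time)."""

    def __init__(self):
        self._heap = []  # (send_at, msg_id, payload), ordered by send_at

    def enqueue(self, send_at, msg_id, payload):
        # Equivalent to ZADD key send_at member
        heapq.heappush(self._heap, (send_at, msg_id, payload))

    def pull_due(self, now):
        """Pop every push whose scheduled time has arrived,
        in chronological order (what send-worker does on each tick)."""
        due = []
        while self._heap and self._heap[0][0] <= now:
            due.append(heapq.heappop(self._heap))
        return due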

2. Vendor Channel Request

The channel side extracts dt values, combines them with the message payload, and forwards the request to the vendor channel. When a pin is provided, the system resolves it to the corresponding dt set. For large‑scale broadcasts, dt values are batched to keep request sizes manageable.

In high‑concurrency scenarios, vendor channels may rate‑limit requests. The platform implements a configurable retry strategy, allowing dynamic adjustment of retry count and intervals to balance success rate and resource consumption.
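The batching and retry behavior described above might look like the sketch below. The source only says that retry count and intervals are configurable; exponential backoff is one common choice for the interval schedule, not necessarily the platform's actual policy, and the batch size of 1000 is illustrative.

```python
def batch_dts(dts, batch_size=1000):
    """Split a large dt list into vendor-sized request batches
    (batch size is an illustrative value)."""
    return [dts[i:i + batch_size] for i in range(0, len(dts), batch_size)]

def send_with_retry(request_fn, payload, max_retries=3,
                    base_delay=0.5, sleep=lambda s: None):
    """Call the vendor channel, retrying on failure (e.g. rate limiting).
    Retry count and delays are parameters, mirroring the platform's
    dynamically configurable retry strategy."""
    for attempt in range(max_retries + 1):
        if request_fn(payload):
            return True
        if attempt < max_retries:
            # Exponential backoff between attempts (illustrative schedule)
            sleep(base_delay * (2 ** attempt))
    return False
```

Injecting the `sleep` function keeps the retry logic testable and lets the interval schedule be swapped out by configuration.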

3. Long‑Connection Establishment and Maintenance

The self‑built channel provides a TCP+TLS long‑connection service for devices that cannot use vendor channels.

Establishment

When a client connects, the server returns heartbeat parameters. While idle, the client sends periodic heartbeats to keep the connection alive; both the idle timeout and the heartbeat interval are configurable.

Maintenance

The server tracks client contexts; if no data or heartbeat is received within the idle period, the connection is considered closed and resources are released. If a client reconnects while an old connection still exists, the server drops the old one to ensure a single active connection per device.
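The two maintenance rules above (idle reaping and single active connection per device) can be sketched as a small connection registry. This is a schematic stand‑in, with the clock injected so timeouts are deterministic; the actual server would hook equivalent logic into its network framework's idle‑detection events.

```python
import time

class ConnectionRegistry:
    """Tracks one active connection per device (dt). A connection that
    sends no data or heartbeat within idle_timeout seconds is reaped;
    a reconnect evicts any lingering old connection."""

    def __init__(self, idle_timeout=300, clock=time.monotonic):
        self.idle_timeout = idle_timeout
        self.clock = clock
        self._conns = {}  # dt -> (conn, last_seen)

    def register(self, dt, conn):
        old = self._conns.get(dt)
        if old is not None:
            old[0].close()  # drop the stale connection: one per device
        self._conns[dt] = (conn, self.clock())

    def heartbeat(self, dt):
        # Any data or heartbeat from the client refreshes last_seen
        if dt in self._conns:
            conn, _ = self._conns[dt]
            self._conns[dt] = (conn, self.clock())

    def reap_idle(self):
        now = self.clock()
        for dt, (conn, last) in list(self._conns.items()):
            if now - last > self.idle_timeout:
                conn.close()  # considered dead; release resources
                del self._conns[dt]
```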

Using Long Connections for Push

When the channel service receives a final push request, it packages the payload with the dt and routes it to the appropriate self‑built channel instance. If the device is offline, the push is not delivered.
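The delivery decision can be reduced to a simple lookup against live connections, with the offline case dropped as the text describes. The connection map and `send()` method here are illustrative placeholders for the channel instance's actual session handling.

```python
def push_to_device(connections, dt, payload):
    """connections: dt -> live connection object with a send() method.
    Returns True if the push was handed to a live connection; if the
    device is offline, the push is simply not delivered."""
    conn = connections.get(dt)
    if conn is None:
        return False  # device offline: not delivered
    conn.send(payload)
    return True
```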

Security of Self‑Built Long Connections

Security is ensured by using TLS‑encrypted server domains and a custom codec for data encoding/decoding between server and client SDKs.

4. Push Receipt Handling

The slark service normalizes receipt messages from various vendor and self‑built channels (Huawei, Meizu, Xiaomi, OPPO, etc.). It processes return codes to clean invalid dt entries and records client‑open events for analytics.
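Normalized receipt handling could be sketched like this. The real per‑vendor return‑code tables are proprietary and differ per channel, so the mapping here is an abstract parameter rather than actual Huawei/Xiaomi/OPPO codes; the callback and field names are likewise illustrative.

```python
def process_receipt(receipt, invalid_codes, device_cleaner, open_events):
    """Normalize one vendor receipt in a slark-style service.
    `invalid_codes` maps channel -> set of return codes meaning the dt
    is no longer valid (real code tables are vendor-specific).
    `device_cleaner` is called to purge an invalid dt; client open
    events are recorded for analytics."""
    channel = receipt["channel"]
    if receipt["code"] in invalid_codes.get(channel, set()):
        device_cleaner(receipt["dt"])  # clean the invalid dt entry
    if receipt.get("event") == "open":
        open_events.append((receipt["msgId"], receipt["dt"]))
```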

Platform Data Statistics

1. Message‑Level Data

Each push generates a msgId that travels through the entire pipeline, enabling end‑to‑end traceability for troubleshooting and statistical analysis. A message passes through four statuses: processing, sending, delivery, and opening. The processing and sending statuses are emitted by channel‑side nodes via Kafka, while delivery and open events are reported by client SDKs and fed back through slark.

The platform provides multi‑dimensional statistics: per‑device, per‑user push records, aggregate counts for group and broadcast pushes, and metrics such as reach and open rates.

Additionally, the platform shows channel‑level push status and recent trend charts for comprehensive monitoring.

2. Device‑Level Data

The platform records detailed dt‑pin mappings and aggregates statistics such as total devices per app, daily new devices, cleaned devices, and offline devices, as well as breakdowns by Android brand and iOS model.

3. Implementation of Statistics

Statistics are computed using Flink jobs that process the full message flow, leveraging msgId, message type, push status, and device type fields. Device‑level data is persisted in Elasticsearch, enabling pin‑based queries of push outcomes.
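A plain‑Python stand‑in for the core of that aggregation logic is shown below: group message‑flow events by msgId, count statuses, and derive rates. The real jobs run in Flink over streams, and the exact rate definitions are not given in the source, so reach rate = delivered/sent and open rate = opened/delivered are illustrative formulas.

```python
from collections import Counter

def aggregate_statuses(events):
    """events: iterable of (msg_id, status) records from the message
    flow, where status is one of 'processing', 'sending', 'delivery',
    'opening'. Returns per-message status counts plus reach and open
    rates (rate formulas are illustrative)."""
    counts = {}
    for msg_id, status in events:
        counts.setdefault(msg_id, Counter())[status] += 1
    report = {}
    for msg_id, c in counts.items():
        sent, delivered, opened = c["sending"], c["delivery"], c["opening"]
        report[msg_id] = {
            "counts": dict(c),
            "reach_rate": delivered / sent if sent else 0.0,
            "open_rate": opened / delivered if delivered else 0.0,
        }
    return report
```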

Monitoring and Alerting

The platform embeds extensive monitoring points for both internal system health and user‑facing services. Interface call volume, performance, and availability trigger alerts to application owners. Cluster nodes are monitored for memory and CPU usage. Daily certificate checks alert owners of impending expirations.

Future Plans

Current services cover JD Finance, JD Mai, JD Xi, with the next goal to support JD.com’s main site. Ongoing work includes performance optimization, stability enhancements, and expanding the platform to external customers through cloud‑native, componentized deployments.

Written by

JD Retail Technology

Official platform of JD Retail Technology, delivering insightful R&D news and a deep look into the lives and work of technologists.
