How Meituan’s Phoenix SDK Enables Client‑Side CDN Disaster Recovery

This article explains Meituan's Phoenix solution that moves CDN disaster recovery to the client side, detailing its goals, architecture, dynamic calculation service, monitoring platform, implementation for web and native apps, and the measurable improvements in availability and operational efficiency.

21CTO

1. Introduction

CDN has become essential Internet infrastructure, and because so many services rely on it, its stability directly affects business availability. At Meituan, CDN disaster recovery has traditionally been handled by the SRE team, and no client‑side solution existed.

2. Background

CDN accelerates static resources such as JS, CSS, images, video, and audio, so a CDN failure can cause blank pages (white screens), broken layouts, and images that fail to load. Monitoring CDN from the SRE side is difficult because edge nodes are widely distributed, and low‑traffic or regional issues are often hidden in aggregated dashboards.

3. Goals and Scenarios

3.1 Core Goals

Client‑side CDN domain auto‑switch: Detect CDN problems instantly on the client and retry with alternative domains without manual intervention.

Domain isolation: Ensure equivalent CDN domains are isolated by region while providing the same service.

Precise CDN monitoring: Build fine‑grained monitoring per project to reduce alert latency and adjust disaster‑recovery strategies dynamically.

Continuous hot‑standby: Keep each CDN domain warm to avoid back‑origin traffic spikes during switches.

3.2 Applicable Scenarios

All client‑side contexts that depend on CDN (Web, SSR Web, and Native) can benefit from this approach.

4. Phoenix Solution

The Phoenix client‑side CDN disaster‑recovery scheme consists of five parts: a client‑side SDK, a dynamic calculation service, a monitoring platform, CDN services with isolated domains, and a configuration platform.

4.1 Overall Design

The SDK senses resource loading results, performs automatic CDN domain switching, and reports metrics. The dynamic calculation service periodically polls SDK reports, evaluates domain availability per city, project, and time slice, and reorders domains to direct traffic to the most reliable CDN. The monitoring platform visualizes CDN health at project, region, and ISP levels.

4.2 Disaster‑Recovery Flow

When a resource fails to load from the primary CDN domain, the SDK retries it against a list of backup domains, in order, until one succeeds or the list is exhausted. This reduces reliance on manual switches by SRE.
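The flow above can be sketched in a few lines. This is a minimal, language‑neutral illustration written in Python, not the Phoenix SDK itself: the names `swap_host` and `load_with_failover`, the injected `fetch` callable, and the rule that only 2xx responses count as success are all assumptions made for the example.

```python
from typing import Callable, List, Optional, Sequence, Tuple
from urllib.parse import urlsplit, urlunsplit


def swap_host(url: str, host: str) -> str:
    """Return url with its host replaced, keeping scheme, path and query."""
    parts = urlsplit(url)
    return urlunsplit((parts.scheme, host, parts.path, parts.query, parts.fragment))


def load_with_failover(
    url: str,
    backup_domains: Sequence[str],
    fetch: Callable[[str], Tuple[int, bytes]],
) -> Tuple[Optional[bytes], List[Tuple[str, int]]]:
    """Try the primary URL, then the same path on each backup domain in order.

    `fetch` is a hypothetical callable returning (status, body). The function
    returns (body or None, attempts), where attempts is a list of
    (url, status) pairs that an SDK could report as metrics.
    """
    attempts: List[Tuple[str, int]] = []
    candidates = [url] + [swap_host(url, d) for d in backup_domains]
    for candidate in candidates:
        try:
            status, body = fetch(candidate)
        except OSError:
            status, body = 0, b""  # treat a network-level failure as status 0
        attempts.append((candidate, status))
        if 200 <= status < 300:  # assumption: only 2xx counts as success
            return body, attempts
    return None, attempts  # every domain failed
```

Passing `fetch` in as a parameter keeps the retry logic independent of the transport, so the same loop could sit behind a browser request, a native HTTP client, or a test stub.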

4.3 Implementation Details

4.3.1 Client‑Side SDK (Web)

Static resources are loaded via XHR instead of traditional tags, allowing status‑code based success detection. Webpack extracts synchronous resources and loads them through a custom PhoenixLoader, while asynchronous resources are intercepted and re‑routed similarly. The SDK is packaged as a Webpack plugin to ensure broad compatibility, ease of integration, stability, and low intrusiveness.
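To illustrate why status codes matter here: a plain script or link tag only signals that loading failed, whereas an XHR exposes the HTTP status (and reports 0 when the request fails at the network level). The sketch below, in Python for brevity, shows one way a loader could turn that status into a reportable result. The `LoadResult` type, the `classify` function, and the reason labels are hypothetical and do not come from the Phoenix SDK.

```python
from dataclasses import dataclass
from urllib.parse import urlsplit


@dataclass(frozen=True)
class LoadResult:
    url: str
    domain: str   # host the resource was requested from
    status: int   # HTTP status; 0 means the request never completed
    ok: bool
    reason: str


def classify(url: str, status: int) -> LoadResult:
    """Map an HTTP status to a success flag and a coarse failure reason."""
    domain = urlsplit(url).hostname or ""
    if 200 <= status < 300:
        ok, reason = True, "success"
    elif status == 0:
        ok, reason = False, "network_error"
    elif 400 <= status < 500:
        ok, reason = False, "client_error"
    elif status >= 500:
        ok, reason = False, "server_error"
    else:
        ok, reason = False, "unexpected_status"
    return LoadResult(url, domain, status, ok, reason)
```

A record like this carries enough detail (domain, status, reason) both to decide whether to retry on another domain and to feed per‑domain availability statistics.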

4.3.2 Dynamic Calculation Service

The service links domain pools with project AppKeys, aggregates loading results within five‑minute windows, and computes availability per city and province. It then redistributes traffic based on success‑rate differentials (e.g., transferring a portion of traffic from lower‑success domains to higher‑success ones) to achieve a smooth, optimal domain ordering.
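As a rough illustration of this calculation, the Python sketch below aggregates reports into one five‑minute window, computes a success rate per city and domain, and then shifts part of each weaker domain's traffic to the best one. The report tuple layout, the function names, and the specific transfer rule (move `step` times the success‑rate gap) are assumptions for the example; the article does not give the actual formula.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

WINDOW_SECONDS = 300  # five-minute aggregation window, as described above

# Hypothetical report shape: (timestamp_seconds, city, domain, ok)
Report = Tuple[int, str, str, bool]


def success_rates(reports: Iterable[Report], window_start: int) -> Dict[str, Dict[str, float]]:
    """Success rate per city and domain for the window starting at window_start."""
    counts = defaultdict(lambda: [0, 0])  # (city, domain) -> [ok, total]
    for ts, city, domain, ok in reports:
        if window_start <= ts < window_start + WINDOW_SECONDS:
            bucket = counts[(city, domain)]
            bucket[0] += int(ok)
            bucket[1] += 1
    rates: Dict[str, Dict[str, float]] = {}
    for (city, domain), (ok_count, total) in counts.items():
        rates.setdefault(city, {})[domain] = ok_count / total
    return rates


def rebalance(weights: Dict[str, float], rates: Dict[str, float], step: float = 0.5) -> Dict[str, float]:
    """Shift traffic from lower-success domains to the best one (assumed rule).

    Each weaker domain gives up weight * step * (best_rate - its_rate), so a
    small gap moves little traffic and the total weight is preserved.
    """
    measured = [d for d in weights if d in rates]
    new_weights = dict(weights)
    if not measured:
        return new_weights
    best = max(measured, key=lambda d: rates[d])
    for domain in measured:
        if domain == best:
            continue
        moved = new_weights[domain] * step * (rates[best] - rates[domain])
        new_weights[domain] -= moved
        new_weights[best] += moved
    return new_weights


def order_domains(weights: Dict[str, float]) -> List[str]:
    """Domain list for one city, highest traffic weight first."""
    return sorted(weights, key=lambda d: weights[d], reverse=True)
```

Moving only a fraction of the traffic per window, rather than switching everything at once, is what makes the reordering gradual: a domain that recovers regains weight in later windows instead of being dropped outright.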

4.3.3 Monitoring

Metrics are collected by project, app, resource, and domain, forming a CDN‑availability dashboard that provides minute‑level alerts and detailed diagnostics (region, ISP, response code). This granularity enables faster detection of localized CDN issues compared to traditional SRE dashboards.
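A minute‑level alert of this kind could be computed roughly as follows. The sketch groups samples by minute, project, domain, region, and ISP, and flags any group whose success rate falls below a threshold. The `Sample` fields, the 99% threshold, and the minimum sample count are illustrative assumptions, not values taken from Meituan's platform.

```python
from collections import defaultdict
from typing import Iterable, List, NamedTuple, Tuple


class Sample(NamedTuple):
    ts: int       # Unix time in seconds
    project: str
    domain: str
    region: str
    isp: str
    status: int   # HTTP status, 0 for a network-level failure


# (minute, project, domain, region, isp)
Key = Tuple[int, str, str, str, str]


def minute_alerts(
    samples: Iterable[Sample],
    threshold: float = 0.99,   # assumed alert threshold
    min_samples: int = 20,     # assumed floor to avoid alerting on noise
) -> List[Tuple[Key, float]]:
    """Return (key, success_rate) for every per-minute group below threshold."""
    buckets = defaultdict(lambda: [0, 0])  # key -> [ok, total]
    for s in samples:
        key = (s.ts // 60, s.project, s.domain, s.region, s.isp)
        bucket = buckets[key]
        bucket[0] += int(200 <= s.status < 300)
        bucket[1] += 1
    alerts = []
    for key, (ok_count, total) in sorted(buckets.items()):
        rate = ok_count / total
        if total >= min_samples and rate < threshold:
            alerts.append((key, rate))
    return alerts
```

Because the region and ISP are part of the grouping key, a failure confined to one ISP in one city produces its own alert instead of being averaged away in a global success rate.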

4.3.4 CDN Service Enhancements

CDN services now support domain isolation and provide equivalent domains (e.g., cdn1.meituan.net and cdn2.meituan.net) that return identical content, ensuring that client‑side switches remain effective without causing back‑origin overload.
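The article does not say how each equivalent domain is kept warm. One plausible approach, shown here purely as an assumption, is to order the domain list by weighted random sampling with a minimum weight, so that even a low‑ranked domain keeps receiving a small share of first requests and its caches stay populated. The function name `pick_order` and the 5% floor are hypothetical.

```python
import random
from typing import Dict, List, Optional


def pick_order(
    weights: Dict[str, float],
    floor: float = 0.05,                  # assumed minimum weight per domain
    rng: Optional[random.Random] = None,
) -> List[str]:
    """Weighted random ordering of domains, without replacement.

    Every domain's weight is raised to at least `floor`, so no domain is
    ever starved of traffic completely.
    """
    rng = rng or random.Random()
    remaining = {d: max(w, floor) for d, w in weights.items()}
    order: List[str] = []
    while remaining:
        domains = list(remaining)
        chosen = rng.choices(domains, weights=[remaining[d] for d in domains], k=1)[0]
        order.append(chosen)
        del remaining[chosen]
    return order
```

With weights such as `{"cdn1.meituan.net": 0.9, "cdn2.meituan.net": 0.1}`, most clients would try cdn1 first, while a steady minority would start on cdn2, so a later switch to cdn2 would not hit cold caches.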

5. Results and Outlook

After one year, Phoenix handles over 30 million disaster‑recovery requests per day and has kept more than 350,000 users from being affected by CDN failures across Meituan’s food‑delivery, travel, and other businesses. It is integrated into over 200 projects, including the Meituan and Dianping apps. The solution provides minute‑level, project‑specific alerts, dramatically reducing manual SRE intervention and improving overall CDN availability.

Future work includes expanding SDK compatibility with more front‑end frameworks, open‑sourcing the dynamic calculation service, and improving resource verification, intelligent switching, and performance.
