How Meituan’s Phoenix SDK Enables Automatic Client‑Side CDN Failover

Meituan’s Phoenix solution equips web and native clients with an automatic CDN failover SDK, dynamic domain selection, and fine‑grained monitoring, dramatically improving resource loading success rates, reducing SRE workload, and ensuring high availability across millions of daily users.

ITPUB

Introduction

CDNs have become critical infrastructure for modern web services, yet CDN outages still cause image‑loading failures, blank white screens, and broken page layouts. CDN disaster recovery has traditionally been handled by SRE teams, who often cannot react quickly enough to localized failures. Meituan’s Phoenix project moves part of the disaster‑recovery logic to the client side.

Background

Meituan’s food‑delivery platform serves billions of requests daily, with more than 70% of static assets (JS, CSS, images, video) hosted on CDN. CDN incidents—such as edge‑node failures or domain bans—directly impact user experience and generate heavy SRE workload. Existing SRE‑centric monitoring aggregates data at a coarse granularity, masking small‑scale or regional issues.

Goals and Applicable Scenarios

Client‑side CDN domain auto‑switch : Detect CDN failure instantly on the client and retry with an alternate domain without manual intervention.

Domain isolation : Provide equivalent domains that can be switched transparently while preserving service semantics.

Fine‑grained CDN monitoring : Track CDN availability per project, region, and time slice, enabling dynamic adjustment of disaster‑recovery policies.

Continuous domain warm‑up : Keep backup domains pre‑warmed (already cached) so that shifting traffic to them does not cause a spike in back‑to‑origin requests.

The solution targets any client‑side scenario that relies on CDN—Web, SSR Web, and Native applications.

Phoenix Architecture Overview

The Phoenix system consists of five major components:

Client‑side Disaster‑Recovery SDK : Senses resource load results, performs automatic CDN retries, and reports metrics.

Dynamic Calculation Service : Periodically evaluates CDN health per city, project, and time window, and reorders domain lists accordingly.

Disaster‑Recovery Monitoring Platform : Provides project‑level and global dashboards, minute‑level alerts, and detailed diagnostic data.

CDN Service : Supplies equivalent domains and enforces domain isolation at the network layer.

Disaster‑Recovery Configuration Platform : Manages domain configurations, reporting policies, and manual traffic interventions.

Figure 1: Phoenix overall architecture

Client‑Side SDK Implementation

Web Side

The SDK focuses on JS, CSS, and image assets. Instead of traditional tag‑based loading, it fetches these assets with XHR requests managed by PhoenixLoader through a custom Webpack plugin. Because an XHR exposes the HTTP status code, the SDK can tell when a load has failed and trigger a retry.

Design considerations include:

Generality : Must work across Meituan’s diverse front‑end stacks.

Ease of use : Low integration cost to encourage adoption.

Stability : Remain functional even when CDN availability fluctuates.

Non‑intrusiveness : Plug‑and‑play without breaking existing business logic.

For asynchronous chunks, the plugin rewrites Webpack’s async loading mechanism so that PhoenixLoader handles the request and fallback logic.

CSS assets are handled the same way, by overriding the async loader of mini-css-extract-plugin.
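The retry loop at the heart of this flow can be sketched as follows. This is a minimal TypeScript sketch and not PhoenixLoader's actual code. The Fetcher type stands in for an XHR wrapper so the logic can run outside a browser. The names loadWithFailover, Attempt, and LoadResult, along with the example domains, are hypothetical. The loop tries each domain in order and treats any non‑2xx status or network error as a failure. It also records every attempt so the outcome could be reported as a metric.

```typescript
// Hypothetical sketch of PhoenixLoader-style failover; not Meituan's code.
// Fetcher abstracts the XHR call so the logic runs outside a browser.
type Fetcher = (url: string) => Promise<{ status: number; body: string }>;

interface Attempt {
  url: string;
  status: number | null; // null means a network error (no HTTP response)
}

interface LoadResult {
  body: string;
  attempts: Attempt[]; // every try, in order, for metrics reporting
}

// Only 2xx counts as success; anything else moves on to the next domain.
function isSuccess(status: number): boolean {
  return status >= 200 && status < 300;
}

async function loadWithFailover(
  path: string, // e.g. "/static/app.js" (must start with "/")
  domains: string[], // ordered: primary first, then backups
  fetcher: Fetcher,
): Promise<LoadResult> {
  const attempts: Attempt[] = [];
  for (const domain of domains) {
    const url = `https://${domain}${path}`;
    try {
      const res = await fetcher(url);
      attempts.push({ url, status: res.status });
      if (isSuccess(res.status)) {
        return { body: res.body, attempts };
      }
    } catch {
      attempts.push({ url, status: null });
    }
  }
  // Every domain failed: surface the attempt log alongside the error.
  throw Object.assign(new Error(`all ${domains.length} CDN domains failed for ${path}`), { attempts });
}
```

Because the transport is injectable, the same loop could sit behind a patched async‑chunk or CSS loader without changing how business code imports its modules.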

Figure 3: Web SDK loading flow

Native Side

Native clients (Android/iOS) mainly load images, audio/video, and bundled resources. The SDK intercepts failed requests and transparently re‑issues them to a backup CDN domain. The business layer never sees the retry: the caller observes a single request and receives the final response once a retry succeeds.
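This "one request in, one response out" behavior resembles an HTTP interceptor. The sketch below models it in TypeScript so that it matches the other examples. A real Android integration would more likely use an OkHttp Interceptor, and iOS would use NSURLProtocol. The names withCdnFailover and withHost, the 5xx‑only retry policy, and the host names are all assumptions. Only the host is swapped, so the backup request keeps the original path and query string. That is what lets equivalent domains stand in for one another.

```typescript
// Hypothetical TypeScript model of a native interceptor (e.g. an OkHttp
// Interceptor on Android); names and retry policy are assumptions.
interface CdnRequest {
  url: string;
}

interface CdnResponse {
  status: number;
  body: string;
}

type Transport = (req: CdnRequest) => Promise<CdnResponse>;

// Swap only the host so scheme, path and query stay identical on the backup domain.
function withHost(url: string, host: string): string {
  const u = new URL(url);
  u.host = host;
  return u.toString();
}

// Assumed policy: network errors and 5xx responses trigger failover.
const shouldFailOver = (res: CdnResponse): boolean => res.status >= 500;

// Wrap a transport: the caller makes one call and gets one response, while
// failed attempts are re-issued against backup hosts behind the scenes.
function withCdnFailover(transport: Transport, backupHosts: string[]): Transport {
  return async (req) => {
    const candidates = [req.url, ...backupHosts.map((h) => withHost(req.url, h))];
    let lastResponse: CdnResponse | null = null;
    let lastError: unknown = null;
    for (const url of candidates) {
      try {
        const res = await transport({ ...req, url });
        if (!shouldFailOver(res)) return res;
        lastResponse = res;
      } catch (err) {
        lastError = err;
      }
    }
    // Nothing succeeded: return the last HTTP response if any, else rethrow.
    if (lastResponse !== null) return lastResponse;
    throw lastError;
  };
}
```

Wrapping the transport, rather than changing call sites, reflects the "simple integration and clean removal" goal: removing the wrapper restores the original behavior.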

Key design points:

Convenience : Simple integration and clean removal.

Compatibility : Works with Retrofit, OkHttp3, URLConnection on Android and with NSURLProtocol on iOS.

Extensibility : Allows custom monitoring data to be reported.

Figure 8: Native SDK adaptor architecture

Dynamic Calculation Service

Every five minutes, the service uses recent load‑result reports to compute per‑region availability for the domain pool linked to each project’s AppKey. It then reorders domains by success rate and traffic share. A transfer‑baseline algorithm shifts traffic toward higher‑success domains while preserving a small random portion for lower‑success ones.
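The article does not give the transfer‑baseline formula, so what follows is only a hypothetical stand‑in that shows the general shape. The highest‑success domain acts as the baseline. Every other domain gives up part of its share in proportion to its success‑rate gap. A floor keeps each backup domain carrying some traffic so that it stays warm. The parameters gapScale and floorShare are invented for illustration, and the output is not calibrated to reproduce the published example figures. The pickDomain helper, also hypothetical, shows how a client could turn shares into a weighted random choice per request.

```typescript
// Hypothetical stand-in for the "transfer-baseline" rebalancing; the real
// formula is not published in the article. Shares are fractions summing to 1.
interface DomainStat {
  domain: string;
  successRate: number; // e.g. 0.99
  share: number; // current traffic share, e.g. 0.9
}

function rebalance(
  stats: DomainStat[],
  gapScale = 0.05, // assumed: a 5-point success gap makes all movable traffic shift
  floorShare = 0.01, // assumed: each domain keeps at least 1% so it stays warm
): DomainStat[] {
  if (stats.length === 0) return [];
  // Highest success rate first; that domain is the baseline that receives traffic.
  const sorted = [...stats].sort((a, b) => b.successRate - a.successRate);
  const baseline = sorted[0];
  let moved = 0;
  const result = sorted.map((s) => {
    if (s === baseline) return { ...s };
    const gap = baseline.successRate - s.successRate;
    const transfer = s.share * Math.min(1, gap / gapScale);
    // Never push a domain below the floor (or raise one already under it).
    const newShare = Math.max(Math.min(s.share, floorShare), s.share - transfer);
    moved += s.share - newShare;
    return { ...s, share: newShare };
  });
  result[0].share += moved;
  return result;
}

// Weighted random choice for one request; `r` is uniform in [0, 1).
function pickDomain(stats: DomainStat[], r: number = Math.random()): string {
  let cumulative = 0;
  for (const s of stats) {
    cumulative += s.share;
    if (r < cumulative) return s.domain;
  }
  return stats[stats.length - 1].domain; // guards against floating-point shortfall
}
```

With the default parameters and the A/B/C success rates and traffic shares from the article, A's share rises above 90% while B and C shrink but stay above the 1% floor. Other parameter choices would give different splits.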

For example, domains A (99% success rate, 90% of traffic), B (98%, 6%), and C (97.8%, 4%) are rebalanced after calculation to traffic shares of A 95%, B 3.5%, and C 1.5%.
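One way to read this example is through the traffic‑weighted first‑attempt success rate S = Σ w_i p_i, where w_i is domain i's traffic share and p_i is its success rate. As a simplifying assumption, each domain's success rate is taken to stay the same when its traffic changes:

```latex
\begin{aligned}
S_{\text{before}} &= 0.90 \times 0.99 + 0.06 \times 0.98 + 0.04 \times 0.978 = 0.891 + 0.0588 + 0.03912 = 0.98892 \\
S_{\text{after}}  &= 0.95 \times 0.99 + 0.035 \times 0.98 + 0.015 \times 0.978 = 0.9405 + 0.0343 + 0.01467 = 0.98947
\end{aligned}
```

On these numbers the first‑attempt gain is small, about 0.055 percentage points. The remaining failures are still subject to the client‑side retry.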

Figure 13: Dynamic calculation workflow

Monitoring Platform

Metrics are reported per project, app, resource, and domain. The platform aggregates data into a minute‑level CDN availability dashboard, exposing dimensions such as region, ISP, network condition, and HTTP status code. Alerts trigger within minutes of an anomaly, offering richer context than traditional SRE dashboards.
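Below is a minimal sketch of the minute‑level aggregation step. It assumes a hypothetical report format with a timestamp, project, region, domain, and success flag. Phoenix's real schema, including dimensions such as ISP, network condition, and HTTP status, is not reproduced. Reports are bucketed per minute and per (project, region, domain), and availability is computed for each bucket. A bucket below an assumed 99% threshold raises an alert only if it holds enough samples to be meaningful.

```typescript
// Hypothetical report shape; Phoenix's real schema (ISP, network type,
// HTTP status, ...) is richer than this.
interface LoadReport {
  timestampMs: number;
  project: string;
  region: string;
  domain: string;
  success: boolean;
}

interface BucketStat {
  key: string; // "minute|project|region|domain"
  total: number;
  ok: number;
  availability: number;
}

// Group reports into one-minute buckets per (project, region, domain).
function aggregateByMinute(reports: LoadReport[]): BucketStat[] {
  const buckets = new Map<string, { total: number; ok: number }>();
  for (const r of reports) {
    const minute = Math.floor(r.timestampMs / 60_000);
    const key = `${minute}|${r.project}|${r.region}|${r.domain}`;
    const b = buckets.get(key) ?? { total: 0, ok: 0 };
    b.total += 1;
    if (r.success) b.ok += 1;
    buckets.set(key, b);
  }
  const out: BucketStat[] = [];
  buckets.forEach((b, key) => {
    out.push({ key, total: b.total, ok: b.ok, availability: b.ok / b.total });
  });
  return out;
}

// Alert on buckets below an availability threshold, skipping buckets with too
// few samples to be meaningful (both thresholds are assumptions).
function findAlerts(stats: BucketStat[], minAvailability = 0.99, minSamples = 100): BucketStat[] {
  return stats.filter((s) => s.total >= minSamples && s.availability < minAvailability);
}
```

The sample‑count guard is one simple way to stop fine‑grained buckets, such as a single city in a single minute, from alerting on a handful of failures. A production system would likely tune it per dimension.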

Figure 16: Monitoring data flow

Results

After one year of deployment, Phoenix handles over 30 million daily disaster‑recovery requests, saving more than 350k user sessions. Business success rates for image loading rose from 99.7% to 99.9% on average. In A/B tests, Android and iOS apps that enabled Phoenix showed a noticeable reduction in failure‑induced white‑screens and layout glitches.

Figure 9: Android success‑rate comparison
Figure 10: iOS success‑rate comparison

Conclusion and Outlook

Phoenix has become Meituan’s sole public CDN disaster‑recovery service, supporting over 200 projects across food‑delivery, travel, and e‑commerce lines. It has dramatically reduced manual CDN switch‑overs, lowered SRE operational pressure, and improved end‑user experience. Future work includes expanding SDK coverage to more front‑end frameworks, enhancing resource‑signature verification, and open‑sourcing the dynamic calculation service to encourage broader adoption.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact admin@besthub.dev and we will review it promptly.
