Ctrip's Weak Network Identification Model: Design, Implementation, and Practice
This article details Ctrip's approach to weak network detection, covering background, data collection, processing, dynamic weighting algorithms, result output, deployment effects, and future plans, and provides practical code examples and threshold settings for improving mobile network performance.
Since Ctrip launched its "Wireless Strategy" in 2010, the mobile R&D team has continuously optimized client‑side network performance. After years of improvement, the overall network quality is stable, but isolated weak‑network cases still cause user complaints, prompting the need for a dedicated weak‑network identification model.
The model consists of three stages: data collection, data processing, and result output. The following sections describe each stage in detail.
1. Background
Weak‑network detection is the first step in network optimization. Ctrip aims to accurately recognize weak‑network scenarios to improve user experience and reduce complaints.
2. Technical Solution
2.1 Data Collection
Key network quality indicators include HttpRTT, TransportRTT, Throughput, BandwidthDelayProduct, SignalStrength, and NetworkSuccessRate. After evaluating relevance and feasibility, Ctrip selected HttpRTT, TransportRTT, and NetworkSuccessRate as model inputs.
Data is collected from three channel types: TCP proxy, QUIC proxy, and standard HTTP. The collection rules are:
TransportRTT: channel heartbeat time, TCP connection time, HTTP connection time (excluding TLS handshake for QUIC).
HttpRTT: response header receipt time minus request header send time (or equivalent for custom channels).
NetworkSuccessRate: success flags of TCP/QUIC connections, heartbeat, and complete HTTP responses.
For iOS (NSURLSession), the metrics can be obtained in the URLSession:task:didFinishCollectingMetrics: delegate callback:

```
TransportRTT = connectEnd - connectStart - (secureConnectionEnd - secureConnectionStart);
HttpRTT = responseStart - requestStart;
NetworkSuccessStatus = responseEnd && no transmission error;
```

For Android (OkHttp), the metrics are captured through an EventListener:

```
TransportRTT = connectEnd - connectStart - (secureConnectEnd - secureConnectStart);
HttpRTT = responseHeadersStart - requestHeadersStart;
NetworkSuccessStatus = responseBodyEnd && no transmission error;
```

The collected data is wrapped in a C++ struct for injection into the model:
```cpp
typedef enum : int64_t {
    NQEMetricsSourceTypeInvalid     = 0,
    NQEMetricsSourceTypeTcpConnect  = 1 << 0,
    NQEMetricsSourceTypeQuicConnect = 1 << 1,
    NQEMetricsSourceTypeHttpRequest = 1 << 2,
    NQEMetricsSourceTypeQuicRequest = 1 << 3,
    NQEMetricsSourceTypeHeartBeat   = 1 << 4,
    ...
} NQEMetricsSourceType;

struct NQEMetrics {
    NQEMetricsSourceType source;   // bitmask of sources
    bool   isSuccessed;            // success flag for success-rate calculation
    double httpRTTInSec;           // optional
    double transportRTTInSec;      // optional
    double occurrenceTimeInSec;    // timestamp
};
```

2.2 Data Processing
Collected metrics are placed into a sliding‑window queue. Filtering rules discard invalid entries, enforce a minimum RTT of 10 ms and a maximum of 5 min, and require at least five samples for a stable window.
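The filtering rules above can be sketched as follows; the struct and function names here are illustrative, not Ctrip's actual implementation, but the bounds (10 ms minimum, 5 min maximum, at least five samples) come from the rules just described:

```cpp
#include <cmath>
#include <vector>

struct Sample {
    double rttInSec;            // observed RTT
    double occurrenceTimeInSec; // timestamp of the observation
};

// Bounds taken from the filtering rules above.
const double kMinRTTSec  = 0.010;      // 10 ms lower bound
const double kMaxRTTSec  = 5.0 * 60.0; // 5 min upper bound
const size_t kMinSamples = 5;          // minimum samples for a stable window

// Drop invalid or out-of-range entries; an under-filled window is
// treated as unusable and returned empty.
std::vector<Sample> FilterWindow(const std::vector<Sample>& window) {
    std::vector<Sample> valid;
    for (const auto& s : window) {
        if (!std::isfinite(s.rttInSec)) continue; // invalid entry
        if (s.rttInSec < kMinRTTSec) continue;    // below 10 ms
        if (s.rttInSec > kMaxRTTSec) continue;    // above 5 min
        valid.push_back(s);
    }
    if (valid.size() < kMinSamples) valid.clear();
    return valid;
}
```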
Two dynamic weighting schemes are used to give recent data higher influence:
Half‑life weighting : weight decays by a configurable factor every fixed period (e.g., 0.5 decay every 60 s).
Arctangent weighting : weight = (π/2 − atan(|now − t| × rate)) / (π/2), where rate controls the decay speed.
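The arctangent scheme above translates directly into code; the function name is illustrative, but the formula is the one just given:

```cpp
#include <cmath>

// weight = (pi/2 - atan(|now - t| * rate)) / (pi/2), where `rate`
// controls how fast the weight decays with sample age.
double ArctanWeight(double nowInSec, double sampleTimeInSec, double rate) {
    const double kHalfPi = std::acos(0.0); // pi/2
    double age = std::fabs(nowInSec - sampleTimeInSec);
    return (kHalfPi - std::atan(age * rate)) / kHalfPi;
}
```

A brand-new sample (age 0) gets weight 1.0, and the weight falls smoothly toward 0 as the sample ages, so recent data dominates the aggregate.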
Example half‑life implementation (Google NQE style):
```cpp
double GetWeightMultiplierPerSecond(
    const std::map<std::string, std::string>& params) {
  int half_life_seconds = 60;
  auto it = params.find("HalfLifeSeconds");
  if (it != params.end() &&
      base::StringToInt(it->second, &half_life_seconds) &&
      half_life_seconds >= 1) {
    // A valid custom half-life was parsed into half_life_seconds.
  }
  DCHECK_GT(half_life_seconds, 0);
  return pow(0.5, 1.0 / half_life_seconds);
}
```

A weighted median is used for RTT aggregation to reduce sensitivity to outliers, while a weighted average is used for the success rate:

```cpp
double totalSuccessRate = 0.0;
for (const auto& m : metrics) {
    totalSuccessRate += (m.isSuccessed ? 1 : 0) * m.weight;
}
return totalSuccessRate / totalWeights;
```

A "success-rate trend" metric tracks continuous upward or downward changes to improve real-time responsiveness. The trend is updated as follows:
```cpp
void NQE::_updateSuccessRateTrend() {
    double diff = newRate - oldRate;
    // A jump larger than 1 is impossible for a rate in [0, 1]: reset.
    if (std::fabs(diff) > 1) { _successRateContinuousDiff = 0; return; }
    // Tiny fluctuations accumulate without resetting the trend.
    if (std::fabs(diff) < 0.01) { _successRateContinuousDiff += diff; return; }
    // A larger move extends the trend if it points the same way,
    // otherwise it starts a new trend.
    if ((diff > 0 && _successRateContinuousDiff > 0) ||
        (diff < 0 && _successRateContinuousDiff < 0)) {
        _successRateContinuousDiff += diff;
    } else {
        _successRateContinuousDiff = diff;
    }
}
```

2.3 Result Output
The model outputs a simple enumeration that is easy for developers to interpret:
```cpp
typedef enum : int64_t {
    NetworkQualityTypeUnknown = 0,
    NetworkQualityTypeOffline = 1,
    NetworkQualityTypeBad     = 2,
    NetworkQualityTypeGood    = 3
} NetworkQualityType;
```

Decision rules:
Bad is triggered when either TransportRTT or HttpRTT exceeds its weak‑network threshold, or when NetworkSuccessRate falls below 90 % and SuccessRateTrend is also low.
Good is any state that does not meet the Bad or Offline conditions.
Thresholds (final tuned values achieving >90 % accuracy):
HttpRTT > 1220 ms
TransportRTT > 520 ms
NetworkSuccessRate < 90 %
SuccessRateTrend < 0.2
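Putting the decision rules and tuned thresholds together, the classification step can be sketched as follows; the struct and function names are illustrative, but the thresholds are the tuned values listed above:

```cpp
enum class NetworkQuality { Unknown, Offline, Bad, Good };

// Aggregated statistics for the current sliding window (illustrative).
struct WindowStats {
    bool   offline;           // no connectivity at all
    double httpRTTMs;         // weighted-median HttpRTT, in ms
    double transportRTTMs;    // weighted-median TransportRTT, in ms
    double successRate;       // weighted-average success rate, 0.0 .. 1.0
    double successRateTrend;  // accumulated continuous diff
};

NetworkQuality Classify(const WindowStats& s) {
    if (s.offline) return NetworkQuality::Offline;
    // Bad: either RTT exceeds its weak-network threshold, or the
    // success rate is below 90% while its trend is also low.
    if (s.httpRTTMs > 1220.0 || s.transportRTTMs > 520.0 ||
        (s.successRate < 0.90 && s.successRateTrend < 0.2)) {
        return NetworkQuality::Bad;
    }
    return NetworkQuality::Good;
}
```

Note that a low success rate alone is not enough to flag Bad: the trend condition suppresses false positives from a single transient failure burst.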
The model runs on iOS, Android, and HarmonyOS using a shared C++ core, allowing consistent metrics across platforms.
3. Deployment Effects
Since integration, the model has been deployed at large scale, feeding network‑quality data into Ctrip’s APM platform. It supports offline detection, real‑time quality switches, and has contributed to measurable performance improvements across multiple business lines.
4. Future Outlook
Extend the model to all Ctrip client applications and platforms.
Continuously monitor and prevent model degradation, maintaining high accuracy and low latency.
Publish an internal "Network Performance Whitepaper" covering end‑to‑end latency, success‑rate, and quality baselines.
Focus on weak‑network optimization for overseas markets where network conditions are more challenging.
Source: the High Availability Architecture official account.