Ctrip's Weak Network Identification Model: Design, Implementation, and Practice
This article details Ctrip's approach to weak network detection, covering background, data collection, processing, dynamic weighting algorithms, result output, deployment effects, and future plans, and provides practical code examples and threshold settings for improving mobile network performance.
Since Ctrip launched its "Wireless Strategy" in 2010, the mobile R&D team has continuously optimized client‑side network performance. After years of improvement, the overall network quality is stable, but isolated weak‑network cases still cause user complaints, prompting the need for a dedicated weak‑network identification model.
The model consists of three stages: data collection, data processing, and result output. The following sections describe each stage in detail.
1. Background
Weak‑network detection is the first step in network optimization. Ctrip aims to accurately recognize weak‑network scenarios to improve user experience and reduce complaints.
2. Technical Solution
2.1 Data Collection
Key network quality indicators include HttpRTT, TransportRTT, Throughput, BandwidthDelayProduct, SignalStrength, and NetworkSuccessRate. After evaluating relevance and feasibility, Ctrip selected HttpRTT, TransportRTT, and NetworkSuccessRate as model inputs.
Data is collected from three channel types: TCP proxy, QUIC proxy, and standard HTTP. The collection rules are:
TransportRTT: channel heartbeat time, TCP connection time, HTTP connection time (excluding TLS handshake for QUIC).
HttpRTT: response header receipt time minus request header send time (or equivalent for custom channels).
NetworkSuccessRate: success flags of TCP/QUIC connections, heartbeat, and complete HTTP responses.
For iOS (NSURLSession), the metrics can be obtained in the URLSession:task:didFinishCollectingMetrics: delegate callback:

```
TransportRTT = connectEnd - connectStart - (secureConnectionEnd - secureConnectionStart);
HttpRTT = responseStart - requestStart;
NetworkSuccessStatus = responseEnd && no transmission error;
```

For Android (OkHttp), the metrics are captured through an EventListener:

```
TransportRTT = connectEnd - connectStart - (secureConnectEnd - secureConnectStart);
HttpRTT = responseHeadersStart - requestHeadersStart;
NetworkSuccessStatus = responseBodyEnd && no transmission error;
```

The collected data is wrapped in a C++ struct for injection into the model:
```cpp
typedef enum : int64_t {
    NQEMetricsSourceTypeInvalid     = 0,
    NQEMetricsSourceTypeTcpConnect  = 1 << 0,
    NQEMetricsSourceTypeQuicConnect = 1 << 1,
    NQEMetricsSourceTypeHttpRequest = 1 << 2,
    NQEMetricsSourceTypeQuicRequest = 1 << 3,
    NQEMetricsSourceTypeHeartBeat   = 1 << 4,
    ...
} NQEMetricsSourceType;

struct NQEMetrics {
    NQEMetricsSourceType source;   // bitmask of sources
    bool   isSuccessed;            // success flag for success-rate calculation
    double httpRTTInSec;           // optional
    double transportRTTInSec;      // optional
    double occurrenceTimeInSec;    // timestamp
};
```

2.2 Data Processing
Collected metrics are placed into a sliding‑window queue. Filtering rules discard invalid entries, enforce a minimum RTT of 10 ms and a maximum of 5 min, and require at least five samples for a stable window.
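The filtering rules above can be sketched as follows; the struct and function names here are illustrative, not Ctrip's actual implementation, but the bounds (10 ms minimum, 5 min maximum, at least five samples) come from the rules just described:

```cpp
#include <cmath>
#include <vector>

struct Sample {
    double rttInSec;            // observed RTT
    double occurrenceTimeInSec; // timestamp of the observation
};

// Bounds taken from the filtering rules above.
const double kMinRTTSec  = 0.010;      // 10 ms lower bound
const double kMaxRTTSec  = 5.0 * 60.0; // 5 min upper bound
const size_t kMinSamples = 5;          // minimum samples for a stable window

// Drop invalid or out-of-range entries; an under-filled window is
// treated as unusable and returned empty.
std::vector<Sample> FilterWindow(const std::vector<Sample>& window) {
    std::vector<Sample> valid;
    for (const auto& s : window) {
        if (!std::isfinite(s.rttInSec)) continue; // invalid entry
        if (s.rttInSec < kMinRTTSec) continue;    // below 10 ms
        if (s.rttInSec > kMaxRTTSec) continue;    // above 5 min
        valid.push_back(s);
    }
    if (valid.size() < kMinSamples) valid.clear();
    return valid;
}
```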
Two dynamic weighting schemes are used to give recent data higher influence:
Half‑life weighting : weight decays by a configurable factor every fixed period (e.g., 0.5 decay every 60 s).
Arctangent weighting : weight = (π/2 − atan(|now − t| × rate)) / (π/2), where rate controls the decay speed.
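The arctangent scheme above translates directly into code; the function name is illustrative, but the formula is the one just given:

```cpp
#include <cmath>

// weight = (pi/2 - atan(|now - t| * rate)) / (pi/2), where `rate`
// controls how fast the weight decays with sample age.
double ArctanWeight(double nowInSec, double sampleTimeInSec, double rate) {
    const double kHalfPi = std::acos(0.0); // pi/2
    double age = std::fabs(nowInSec - sampleTimeInSec);
    return (kHalfPi - std::atan(age * rate)) / kHalfPi;
}
```

A brand-new sample (age 0) gets weight 1.0, and the weight falls smoothly toward 0 as the sample ages, so recent data dominates the aggregate.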
Example half‑life implementation (Google NQE style):
```cpp
double GetWeightMultiplierPerSecond(
    const std::map<std::string, std::string>& params) {
  int half_life_seconds = 60;
  auto it = params.find("HalfLifeSeconds");
  if (it != params.end() &&
      base::StringToInt(it->second, &half_life_seconds) &&
      half_life_seconds >= 1) {
    // A valid custom half-life was parsed into half_life_seconds.
  }
  DCHECK_GT(half_life_seconds, 0);
  return pow(0.5, 1.0 / half_life_seconds);
}
```

A weighted median is used for RTT aggregation to reduce sensitivity to outliers, while a weighted average is used for the success rate:

```cpp
double totalSuccessRate = 0.0;
for (const auto& m : metrics) {
    totalSuccessRate += (m.isSuccessed ? 1 : 0) * m.weight;
}
return totalSuccessRate / totalWeights;
```

A "success-rate trend" metric tracks continuous upward or downward changes to improve real-time responsiveness. The trend is updated as follows:
```cpp
void NQE::_updateSuccessRateTrend() {
    double diff = newRate - oldRate;
    // A jump larger than 1 is impossible for a rate in [0, 1]: reset.
    if (std::fabs(diff) > 1) { _successRateContinuousDiff = 0; return; }
    // Tiny fluctuations accumulate without resetting the trend.
    if (std::fabs(diff) < 0.01) { _successRateContinuousDiff += diff; return; }
    // A larger move extends the trend if it points the same way,
    // otherwise it starts a new trend.
    if ((diff > 0 && _successRateContinuousDiff > 0) ||
        (diff < 0 && _successRateContinuousDiff < 0)) {
        _successRateContinuousDiff += diff;
    } else {
        _successRateContinuousDiff = diff;
    }
}
```

2.3 Result Output
The model outputs a simple enumeration that is easy for developers to interpret:
```cpp
typedef enum : int64_t {
    NetworkQualityTypeUnknown = 0,
    NetworkQualityTypeOffline = 1,
    NetworkQualityTypeBad     = 2,
    NetworkQualityTypeGood    = 3
} NetworkQualityType;
```

Decision rules:
Bad is triggered when either TransportRTT or HttpRTT exceeds its weak‑network threshold, or when NetworkSuccessRate falls below 90 % and SuccessRateTrend is also low.
Good is any state that does not meet the Bad or Offline conditions.
Thresholds (final tuned values achieving >90 % accuracy):
HttpRTT > 1220 ms
TransportRTT > 520 ms
NetworkSuccessRate < 90 %
SuccessRateTrend < 0.2
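Putting the decision rules and tuned thresholds together, the classification step can be sketched as follows; the struct and function names are illustrative, but the thresholds are the tuned values listed above:

```cpp
enum class NetworkQuality { Unknown, Offline, Bad, Good };

// Aggregated statistics for the current sliding window (illustrative).
struct WindowStats {
    bool   offline;           // no connectivity at all
    double httpRTTMs;         // weighted-median HttpRTT, in ms
    double transportRTTMs;    // weighted-median TransportRTT, in ms
    double successRate;       // weighted-average success rate, 0.0 .. 1.0
    double successRateTrend;  // accumulated continuous diff
};

NetworkQuality Classify(const WindowStats& s) {
    if (s.offline) return NetworkQuality::Offline;
    // Bad: either RTT exceeds its weak-network threshold, or the
    // success rate is below 90% while its trend is also low.
    if (s.httpRTTMs > 1220.0 || s.transportRTTMs > 520.0 ||
        (s.successRate < 0.90 && s.successRateTrend < 0.2)) {
        return NetworkQuality::Bad;
    }
    return NetworkQuality::Good;
}
```

Note that a low success rate alone is not enough to flag Bad: the trend condition suppresses false positives from a single transient failure burst.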
The model runs on iOS, Android, and HarmonyOS using a shared C++ core, allowing consistent metrics across platforms.
3. Deployment Effects
Since integration, the model has been deployed at large scale, feeding network‑quality data into Ctrip’s APM platform. It supports offline detection, real‑time quality switches, and has contributed to measurable performance improvements across multiple business lines.
4. Future Outlook
Extend the model to all Ctrip client applications and platforms.
Continuously monitor and prevent model degradation, maintaining high accuracy and low latency.
Publish an internal "Network Performance Whitepaper" covering end‑to‑end latency, success‑rate, and quality baselines.
Focus on weak‑network optimization for overseas markets where network conditions are more challenging.
Source: the High Availability Architecture official account.