
Ctrip's Weak Network Detection Model: Design, Implementation, and Evaluation

This article details Ctrip's end‑to‑end weak‑network identification model, covering background, metric selection, data collection on iOS and Android, processing pipelines with dynamic weighting, weighted median calculations, success‑rate trends, threshold tuning, and deployment results across multiple platforms.

Ctrip Technology

Nov 1, 2024


Since Ctrip launched its "Wireless Strategy" in 2010, the mobile R&D team has continuously optimized client‑side network performance, eventually reaching a stable overall level and moving into the "deep water" stage where further improvements are challenging.

Analysis of user complaints revealed many cases of poor network experience despite good aggregate metrics, which were attributed to "weak network" tail data. Building an accurate weak‑network identification model became the first step toward targeted optimization.

Technical Solution

The model consists of three stages: data collection, data processing, and result output.

2.1 Data Collection

Key network quality indicators include HttpRTT, TransportRTT, and NetworkSuccessRate. HttpRTT and TransportRTT are collected from both standard HTTP requests and custom TCP/QUIC proxy channels, while NetworkSuccessRate captures successful connection and request outcomes.

For iOS, NSURLSessionTaskDelegate provides metric information; for Android, OkHttp EventListener records timestamps. The collected data are encapsulated in the following C++ structures:

typedef enum : int64_t {    
    NQEMetricsSourceTypeInvalid = 0,
    NQEMetricsSourceTypeTcpConnect = 1 << 0,
    NQEMetricsSourceTypeQuicConnect = 1 << 1,
    NQEMetricsSourceTypeHttpRequest = 1 << 2,
    NQEMetricsSourceTypeQuicRequest = 1 << 3,
    NQEMetricsSourceTypeHeartBeat = 1 << 4,
    ......
} NQEMetricsSourceType;

struct NQEMetrics {
    NQEMetricsSourceType source;
    bool isSuccessed;
    double httpRTTInSec;
    double transportRTTInSec;
    double occurrenceTimeInSec;
};

2.2 Data Processing

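Collected metrics are filtered first. For example, a sample must carry at least one RTT value, and an RTT is accepted only if it is greater than 10 ms and less than 5 minutes. Samples that pass are stored in a sliding-window queue, which needs at least 5 samples to produce a result and discards samples older than 5 minutes.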
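
The filtering and windowing step can be sketched roughly as follows. This is a minimal illustration, not Ctrip's implementation: the `Sample` type, the `SlidingWindow` class, and the constant names are hypothetical, and the sketch assumes a sample is kept when at least one of its two RTT values falls inside the accepted range.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>

// Hypothetical sample type; field names loosely mirror NQEMetrics.
struct Sample {
    double httpRTTInSec;        // <= 0 means "not measured"
    double transportRTTInSec;   // <= 0 means "not measured"
    double occurrenceTimeInSec; // timestamp of the measurement
};

// Limits taken from the prose: 10 ms < RTT < 5 min, 5 min window, 5 samples.
constexpr double kMinRttSec = 0.010;
constexpr double kMaxRttSec = 300.0;
constexpr double kMaxAgeSec = 300.0;
constexpr std::size_t kMinSamples = 5;

inline bool rttInRange(double rtt) {
    return rtt > kMinRttSec && rtt < kMaxRttSec;
}

// Assumption: a sample is valid if at least one RTT value is plausible.
inline bool isValidSample(const Sample& s) {
    return rttInRange(s.httpRTTInSec) || rttInRange(s.transportRTTInSec);
}

class SlidingWindow {
public:
    // Returns true if the sample passed the filter and was stored.
    bool add(const Sample& s) {
        if (!isValidSample(s)) return false;
        // Samples are assumed to arrive in time order.
        assert(samples_.empty() ||
               s.occurrenceTimeInSec >= samples_.back().occurrenceTimeInSec);
        samples_.push_back(s);
        return true;
    }

    // Drops samples older than kMaxAgeSec relative to nowSec.
    void evictExpired(double nowSec) {
        while (!samples_.empty() &&
               nowSec - samples_.front().occurrenceTimeInSec > kMaxAgeSec) {
            samples_.pop_front();
        }
    }

    // The window is only usable once it holds enough samples.
    bool hasEnoughSamples() const { return samples_.size() >= kMinSamples; }
    std::size_t size() const { return samples_.size(); }

private:
    std::deque<Sample> samples_;
};
```

A caller would typically run `evictExpired` before each evaluation and skip the evaluation while `hasEnoughSamples()` is false.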

Dynamic weighting is applied to give recent data higher influence. Two weighting schemes are supported:

Half‑life weighting: weight decays by half every configurable period (e.g., 60 s). Sample implementation:

double GetWeightMultiplierPerSecond(const std::map<std::string, std::string>& params) {
  // Default half-life in seconds; can be overridden through "HalfLifeSeconds".
  int half_life_seconds = 60;
  int32_t variations_value = 0;
  auto it = params.find("HalfLifeSeconds");
  if (it != params.end() && base::StringToInt(it->second, &variations_value) && variations_value >= 1) {
    half_life_seconds = variations_value;
  }
  DCHECK_GT(half_life_seconds, 0);
  // Per-second multiplier m chosen so that m^half_life_seconds == 0.5.
  return pow(0.5, 1.0 / half_life_seconds);
}

void ObservationBuffer::ComputeWeightedObservations(const base::TimeTicks& begin_timestamp,
    int32_t current_signal_strength,
    std::vector<WeightedObservation>* weighted_observations,
    double* total_weight) const {
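  // Excerpt: `now` and `observation` are defined in the elided part of the function.
  base::TimeDelta time_since_sample_taken = now - observation.timestamp();
  // The weight shrinks exponentially with the age of the sample.
  double time_weight = pow(weight_multiplier_per_second_, time_since_sample_taken.InSeconds());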
  …
}
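
To make the decay concrete, here is a small self-contained sketch of the same idea without the Chromium-style `base::` helpers. The function names are hypothetical. A sample that is t seconds old gets weight 0.5^(t / T), where T is the half-life.

```cpp
#include <cassert>
#include <cmath>

// Per-second multiplier m such that m^halfLifeSeconds == 0.5.
inline double weightMultiplierPerSecond(int halfLifeSeconds) {
    assert(halfLifeSeconds > 0);
    return std::pow(0.5, 1.0 / halfLifeSeconds);
}

// Weight of a sample that is ageSeconds old.
inline double halfLifeWeight(double ageSeconds, int halfLifeSeconds) {
    assert(ageSeconds >= 0.0);
    return std::pow(weightMultiplierPerSecond(halfLifeSeconds), ageSeconds);
}
```

With a 60 s half-life, a fresh sample has weight 1, a one-minute-old sample about 0.5, and a two-minute-old sample about 0.25.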

Arctangent weighting: weight = 1 − atan(delta × rate) / (π/2), where delta is the age of the sample and rate controls the decay speed. Sample implementation:

static double _nqe_getWeight(double targetTime) {
    double interval = now - targetTime;
    double rate = 20.0 / 1; // smaller rate → slower decay
    return 1.0 - atan(interval * rate) / M_PI_2;
}
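
A standalone variant of the arctangent scheme, with the age and rate passed in explicitly, may make its shape easier to see. This is only a sketch: the function name and parameters are hypothetical, and π/2 is computed with `std::acos(0.0)` so the code does not depend on the non-standard `M_PI_2` macro.

```cpp
#include <cassert>
#include <cmath>

// Weight of a sample that is ageSeconds old; a larger rate decays faster.
inline double arctanWeight(double ageSeconds, double rate) {
    assert(ageSeconds >= 0.0 && rate > 0.0);
    const double kHalfPi = std::acos(0.0);  // pi / 2
    return 1.0 - std::atan(ageSeconds * rate) / kHalfPi;
}
```

The weight is 1 for a fresh sample and exactly 0.5 when age × rate = 1, so 1 / rate plays a role similar to a half-life. Unlike the half-life scheme, the tail decays roughly like 2 / (π × age × rate) rather than exponentially, so old samples keep a small but non-negligible influence.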

Weighted median is used instead of weighted average to reduce the impact of outliers. The algorithm sorts observations by value and accumulates weights until reaching half of the total weight.

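std::optional<int32_t> ObservationBuffer::GetPercentile(
    base::TimeTicks begin_timestamp,
    int32_t current_signal_strength,
    int percentile,
    size_t* observations_count) const {
  // Setup not shown in the original excerpt: collect the weighted observations
  // (which must be sorted by value in ascending order) and their total weight.
  std::vector<WeightedObservation> weighted_observations;
  double total_weight = 0.0;
  ComputeWeightedObservations(begin_timestamp, current_signal_strength,
                              &weighted_observations, &total_weight);
  if (weighted_observations.empty())
    return std::nullopt;

  // Walk the sorted observations until the requested share of weight is covered.
  double desired_weight = percentile / 100.0 * total_weight;
  double cumulative_weight_seen_so_far = 0.0;
  for (const auto& weighted_observation : weighted_observations) {
    cumulative_weight_seen_so_far += weighted_observation.weight;
    if (cumulative_weight_seen_so_far >= desired_weight)
      return weighted_observation.value;
  }
  return weighted_observations.back().value;
}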
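
The same algorithm can be written as a self-contained function for experimentation. The type and function names below are hypothetical, and the sketch sorts its own copy of the input rather than assuming pre-sorted data.

```cpp
#include <algorithm>
#include <cassert>
#include <optional>
#include <vector>

struct WeightedValue {
    double value;   // e.g. an RTT in milliseconds
    double weight;  // e.g. a time-decay weight
};

// Returns the smallest value at which the cumulative weight reaches the
// requested percentile of the total weight (percentile 50 = weighted median).
inline std::optional<double> weightedPercentile(std::vector<WeightedValue> obs,
                                                int percentile) {
    assert(percentile >= 0 && percentile <= 100);
    if (obs.empty()) return std::nullopt;

    std::sort(obs.begin(), obs.end(),
              [](const WeightedValue& a, const WeightedValue& b) {
                  return a.value < b.value;
              });

    double total = 0.0;
    for (const auto& o : obs) total += o.weight;

    const double desired = percentile / 100.0 * total;
    double seen = 0.0;
    for (const auto& o : obs) {
        seen += o.weight;
        if (seen >= desired) return o.value;
    }
    return obs.back().value;
}
```

For the observations (100 ms, weight 1), (200 ms, weight 1), and (5000 ms, weight 0.1), the weighted median is 200 ms, while the weighted average would be about 381 ms, which shows how a single stale outlier distorts the average but not the median.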

The success rate is computed as the weighted sum of the success flags divided by the total weight. The result is treated as invalid when the window holds fewer samples than the configured minimum.

extern double _calculateSuccessRateByWeight(const vector<CTNQEMetrics>& metrics, uint64_t types, const shared_ptr<NQEConfig> config) {
    uint64_t totalValidCount = 0;
    double totalWeights = 0.0;
    double totalSuccessRate = 0.0;
    for (const auto& m : metrics) {
        if ((m.source & types) == 0) continue;
        totalValidCount++;
        totalWeights += m.weight;
        totalSuccessRate += (m.isSuccessed ? 1 : 0) * m.weight;
    }
    if (totalValidCount < config->minValidWindowSize || totalWeights <= 0.0) return NQE_INVALID_RATE_VALUE;
    return totalSuccessRate / totalWeights;
}
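
A small worked example shows how the weighting changes the result. The sketch below is an illustration under assumptions, not Ctrip's code: the `Outcome` type and `weightedSuccessRate` function are hypothetical, and the weights come from the half-life scheme with the half-life passed in by the caller.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <optional>
#include <vector>

struct Outcome {
    bool success;
    double ageSeconds;  // how long ago the request finished
};

// Weighted success rate in [0, 1], or nullopt if the window is too small.
inline std::optional<double> weightedSuccessRate(
        const std::vector<Outcome>& outcomes,
        std::size_t minWindowSize,
        double halfLifeSeconds) {
    assert(halfLifeSeconds > 0.0);
    if (outcomes.size() < minWindowSize) return std::nullopt;

    double totalWeight = 0.0;
    double successWeight = 0.0;
    for (const auto& o : outcomes) {
        // Half-life decay: weight halves every halfLifeSeconds.
        const double w = std::pow(0.5, o.ageSeconds / halfLifeSeconds);
        totalWeight += w;
        if (o.success) successWeight += w;
    }
    if (totalWeight <= 0.0) return std::nullopt;
    return successWeight / totalWeight;
}
```

With a 60 s half-life, two failures that are 120 s and 60 s old (weights 0.25 and 0.5) followed by three fresh successes (weight 1 each) give 3 / 3.75 = 0.8, whereas the unweighted rate would be 3 / 5 = 0.6.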

A "success‑rate trend" metric (range –1 to +1) is introduced to accelerate recovery detection when the network becomes healthy again.

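void NQE::_updateSuccessRateTrend() {
    // newRate / oldRate: the latest and the previous weighted success rate.
    auto diff = newRate - oldRate;
    // Two valid rates in [0, 1] cannot differ by more than 1, so reset the trend.
    if (std::fabs(diff) > 1) { _successRateContinuousDiff = 0; return; }
    // Tiny changes are treated as noise and simply accumulated.
    if (std::fabs(diff) < 0.01) { _successRateContinuousDiff += diff; return; }
    // Keep accumulating while the direction is unchanged; otherwise restart.
    if (diff > 0 && _successRateContinuousDiff > 0) _successRateContinuousDiff += diff;
    else if (diff < 0 && _successRateContinuousDiff < 0) _successRateContinuousDiff += diff;
    else _successRateContinuousDiff = diff;
}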
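
The behaviour of the trend is easier to follow on a concrete sequence. The class below is a hypothetical, self-contained restatement of the same rules. It assumes every input is a valid rate in [0, 1], so the reset branch for invalid values is replaced by an assertion.

```cpp
#include <cassert>
#include <cmath>

class SuccessRateTrend {
public:
    // Feeds the newest success rate and returns the updated trend.
    double update(double newRate) {
        assert(newRate >= 0.0 && newRate <= 1.0);
        if (!hasPrevious_) {  // first sample: nothing to compare against
            previous_ = newRate;
            hasPrevious_ = true;
            return trend_;
        }
        const double diff = newRate - previous_;
        previous_ = newRate;

        const bool sameDirection = (diff > 0.0 && trend_ > 0.0) ||
                                   (diff < 0.0 && trend_ < 0.0);
        if (std::fabs(diff) < 0.01 || sameDirection) {
            trend_ += diff;   // noise or continued movement: accumulate
        } else {
            trend_ = diff;    // direction changed: restart from this step
        }
        return trend_;
    }

    double value() const { return trend_; }

private:
    bool hasPrevious_ = false;
    double previous_ = 0.0;
    double trend_ = 0.0;
};
```

Feeding the rates 0.5, 0.6, 0.8, 0.7 yields trends of roughly 0, 0.1, 0.3, and −0.1: consecutive improvements add up, and a single drop restarts the trend in the negative direction.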

2.3 Result Output

The model outputs a simplified enum: Unknown, Offline, Bad (weak network), and Good. Bad is triggered when either HttpRTT or TransportRTT exceeds configured thresholds, or when both NetworkSuccessRate and SuccessRateTrend fall below their thresholds.

typedef enum : int64_t {
    NetworkQualityTypeUnknown = 0,
    NetworkQualityTypeOffline = 1,
    NetworkQualityTypeBad = 2,
    NetworkQualityTypeGood = 3
} NetworkQualityType;

Thresholds were initially based on Google NQE values (e.g., HttpRTT > 1726 ms, TransportRTT > 1531 ms, NetworkSuccessRate < 90 %). After online tuning, Ctrip settled on HttpRTT > 1220 ms, TransportRTT > 520 ms, NetworkSuccessRate < 90 %, and SuccessRateTrend < 0.2, achieving over 90 % identification accuracy.
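
Putting the output rule and the tuned thresholds together, the decision could look roughly like the sketch below. All type and function names are hypothetical. The article does not describe how Offline and Unknown are determined, so the sketch assumes an externally supplied `online` flag and treats "no estimates available" as Unknown.

```cpp
#include <cassert>
#include <optional>

enum class Quality { Unknown, Offline, Bad, Good };

// Defaults are the tuned values quoted in the article.
struct Thresholds {
    double httpRttMs = 1220.0;
    double transportRttMs = 520.0;
    double successRate = 0.90;
    double successRateTrend = 0.2;
};

struct Estimates {
    bool online = true;                    // assumed to come from the OS
    std::optional<double> httpRttMs;       // weighted median, if available
    std::optional<double> transportRttMs;  // weighted median, if available
    std::optional<double> successRate;     // weighted rate in [0, 1]
    double successRateTrend = 0.0;
};

inline Quality classify(const Estimates& e,
                        const Thresholds& t = Thresholds{}) {
    assert(!e.successRate.has_value() ||
           (*e.successRate >= 0.0 && *e.successRate <= 1.0));
    if (!e.online) return Quality::Offline;
    if (!e.httpRttMs.has_value() && !e.transportRttMs.has_value() &&
        !e.successRate.has_value()) {
        return Quality::Unknown;
    }
    // Either RTT above its threshold is enough to flag a weak network.
    const bool slowHttp =
        e.httpRttMs.has_value() && *e.httpRttMs > t.httpRttMs;
    const bool slowTransport =
        e.transportRttMs.has_value() && *e.transportRttMs > t.transportRttMs;
    // A low success rate only counts while the trend is also below threshold.
    const bool failing = e.successRate.has_value() &&
                         *e.successRate < t.successRate &&
                         e.successRateTrend < t.successRateTrend;
    return (slowHttp || slowTransport || failing) ? Quality::Bad
                                                  : Quality::Good;
}
```

Requiring both a low success rate and a low trend is what lets the model leave the Bad state quickly: a success rate of 0.7 with a trend of 0.3 is classified as Good, because the rate is clearly recovering.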

Deployment and Impact

The C++ implementation enables cross‑platform integration (iOS, Android, Harmony). The model is now deployed in production across Ctrip's apps, feeding network‑quality data into the company's APM system and guiding network‑related optimizations.

Future work includes expanding coverage to all Ctrip terminals, preventing model degradation, publishing an internal network‑performance whitepaper, and focusing on weak‑network optimization for overseas markets.
