How to Build a Passive Weak‑Network Diagnosis System for Mobile Apps
This article details the design and implementation of a passive weak‑network diagnosis framework for a mobile app, covering background, overall architecture, data collection of HttpRTT and throughput, filtering strategies, weighted median calculations, caching, interface design, threshold definitions, performance metrics, and practical application scenarios.
Background
As the user base and business complexity of the app grew, network experience optimization entered a deep‑water stage, especially for users on weak networks. To ensure a smooth experience, a passive weak‑network diagnosis capability was built on top of existing network dashboards and diagnostic tools.
Overall Architecture
The system consists of a collection layer, a strategy layer, an interface layer, and supporting modules for threshold definition and performance metrics.
Collection Layer
HttpRTT Collection
HttpRTT is measured as the time from the first byte of the request header being sent to the first byte of the response header being received. On Android, OkHttp is used with a network listener; on other clients, custom HTTP stack listeners are registered. The measurement excludes server processing time when possible.
Sample Filtering for HttpRTT
Requests to localhost with cached responses are ignored.
PUT requests, which usually carry large payloads, are excluded.
Other outliers are filtered out as needed.
Throughput Collection
Throughput (bits/s) is defined as the amount of data transferred per unit time. Passive collection requires detecting periods when the device is transmitting large amounts of data. The system monitors concurrent HTTP requests and opens a time window when at least five requests are in flight. The window closes as soon as any request finishes, ensuring minimal missed sampling opportunities.
Time Window Selection
The window must capture the peak of data transfer; opening it too early or closing it too late skews the result. The implementation records the total bytes transferred at window start and end, then computes the throughput for that interval.
Dirty Data Filtering
Samples with transferred data < 32 KB are discarded.
Samples with low utilization (e.g., total transferred data < 15 KB) are ignored.
fun isHangingWindow(bitsRx: Long, duration: Long): Boolean {
val kCwndSizeBits = 10 * 1.5 * 1000 * 8
val multiplier = 1
val httpRTT = ??? // calculated by HttpRTT module
val bitsReceivedOverOneHttpRtt = bitsRx * httpRTT / duration
return bitsReceivedOverOneHttpRtt < kCwndSizeBits * multiplier
}Strategy Layer
Median
Using the median rather than the mean reduces the impact of extreme values and skewed distributions.
Time Weight
Samples are weighted by recency: every 60 seconds the weight halves, so newer samples have higher influence.
Signal Strength Weight
Signal strength is divided into four levels; samples whose signal level matches the current level receive higher weight. Currently applied only on Android Wi‑Fi.
Weighted Median
The final HttpRTT and throughput values are computed as a weighted median, combining time and signal weights.
Calculation and Caching
Re‑computing on every API call is too costly (1–10 ms per calculation). The system only recomputes when one of the following occurs:
No previous result.
Network type changes.
Signal strength changes.
More than 10 seconds have passed since the last calculation.
HttpRTT sample count increases by >50%.
Throughput sample count increases by >50%.
Total sample count increases by >20.
When recomputed, the result is cached for subsequent calls.
Interface Layer
The computed HttpRTT and throughput are mapped to five network quality types (e.g., TYPE_2G, TYPE_3G, TYPE_4G, etc.) and exposed via an API. The mapping is based on thresholds such as 272 ms, 511 ms for RTT and 400 kbps, 1600 kbps for throughput.
Threshold Definition
Thresholds were derived from offline testing (simulated weak‑network conditions) and online validation (monitoring real traffic). The weak‑network rate observed was about 0.65% of users, with less than 5% of requests in that group being faster than the 50th percentile.
Performance Metrics
HttpRTT Calculation
nqeHttpRTT is the final estimated RTT after filtering and weighting. It remains stable even when individual request RTTs fluctuate.
Throughput Calculation
Throughput is reported in kbps; conversion to KB/s requires division by 8.
Accuracy Determination
Weak‑network is defined as requests whose total latency exceeds twice the 50th‑percentile network propagation time. Accuracy is measured by comparing the diagnosed type with actual request latency.
Application Scenarios
Parallel HTTP/2 connections to mitigate head‑of‑line blocking on weak networks.
Full‑site acceleration to reduce RTT by connecting to edge nodes.
Adaptive video quality reduction.
Dynamic preload throttling based on network quality.
Summary
By passively listening to all HTTP requests in the app, filtering samples, and applying a weighted‑median strategy that accounts for recency and signal strength, the system reliably estimates HttpRTT and throughput, classifies network quality, and provides actionable insights for optimization across multiple scenarios.
Signed-in readers can open the original source through BestHub's protected redirect.
This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.
Architect
Professional architect sharing high‑quality architecture insights. Topics include high‑availability, high‑performance, high‑stability architectures, big data, machine learning, Java, system and distributed architecture, AI, and practical large‑scale architecture case studies. Open to ideas‑driven architects who enjoy sharing and learning.
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.
