Design and Optimization of iQIYI Mobile APM Network Monitoring System

This article details the background, system design, and successive optimizations of iQIYI's mobile Application Performance Monitoring (APM) network monitoring solution, covering SDK and backend architecture, DNS and weak‑network enhancements, gateway strategies, retry mechanisms, and the resulting significant reduction in error rates.

High Availability Architecture

Enterprises need to monitor the quality and performance of their online services at the code level, which led to the emergence of Application Performance Monitoring (APM) systems. These systems are crucial infrastructure for detecting and resolving production issues.

To keep IT support systems running efficiently, an APM system must provide a robust IT operations management framework that continuously monitors component performance, analyzes anomalies in real time, and helps resolve online application problems quickly. Network quality is one of its fundamental metrics.

Before iQIYI launched its mobile APM, network monitoring covered only backend services. Because device performance and network conditions vary widely across mobile users, the team built a user‑centric, real‑time, multi‑dimensional network monitoring system, with evaluation standards for error rate, hijack rate, and performance.

The system design includes:

Classification of network errors into network‑layer errors, HTTP response errors, and parsing errors.

SDK design that balances sampling impact on device performance with data accuracy.

Backend design that supports real‑time processing with latency on the order of seconds, massive data storage, flexible queries, and multi‑dimensional alerting at minute‑level granularity.
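
As a rough illustration of the first two design points, the Python sketch below maps a finished request onto the three error classes and makes a deterministic per‑device sampling decision. The field names (transport_error, status_code, body_parsed), the 400 status threshold, and the CRC32 bucketing are assumptions made for illustration; the article does not describe how iQIYI's SDK implements either step.

```python
import zlib
from enum import Enum


class ErrorClass(Enum):
    NONE = "none"
    NETWORK = "network_layer"   # no HTTP response arrived (DNS, connect, TLS, timeout)
    HTTP = "http_response"      # a response arrived with an error status
    PARSING = "parsing"         # a successful response whose body could not be decoded


def classify(transport_error, status_code, body_parsed):
    """Map one request outcome to an error class.

    transport_error: exception raised below HTTP, or None (hypothetical field)
    status_code:     HTTP status as int, or None if no response arrived
    body_parsed:     True if the app decoded the response body successfully
    """
    if transport_error is not None or status_code is None:
        return ErrorClass.NETWORK
    if status_code >= 400:      # assumption: only 4xx/5xx count as HTTP errors
        return ErrorClass.HTTP
    if not body_parsed:
        return ErrorClass.PARSING
    return ErrorClass.NONE


def should_sample(device_id, rate_percent):
    """Deterministic sampling: a given device always gets the same decision."""
    bucket = zlib.crc32(device_id.encode("utf-8")) % 100
    return bucket < rate_percent
```

Hashing the device identifier instead of drawing a random number per request keeps a sampled device's data complete, which is one way to trade reporting overhead against accuracy.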

The optimizations include DNS improvements (a three‑layer cache and HTTPDNS), weak‑network modeling and mitigation (Brotli compression, reduced concurrency, priority handling, and smaller payloads), a gateway solution that shares long‑lived connections and avoids DNS hijacking, and a low‑cost “super pipeline” proxy for cross‑region failover.
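
The article does not say what the three cache layers are. The Python sketch below assumes one plausible arrangement: an in‑memory map, a persistent store (stood in for by a second dictionary), and a resolver callable such as an HTTPDNS lookup as the last layer. The class name, the TTL, and the resolver signature are all hypothetical.

```python
import time


class LayeredDnsCache:
    """Look up a host in memory, then in a persistent store, then via a resolver."""

    def __init__(self, resolver, ttl_seconds=60.0, clock=time.time):
        self._resolver = resolver   # layer 3: callable host -> list of IPs (e.g. HTTPDNS)
        self._ttl = ttl_seconds
        self._clock = clock
        self._memory = {}           # layer 1: host -> (expires_at, ips)
        self._persistent = {}       # layer 2: stand-in for an on-disk store

    def resolve(self, host):
        """Return (ips, layer_name) for the first layer holding a fresh entry."""
        now = self._clock()
        entry = self._memory.get(host)
        if entry and entry[0] > now:
            return entry[1], "memory"
        entry = self._persistent.get(host)
        if entry and entry[0] > now:
            self._memory[host] = entry          # promote to the faster layer
            return entry[1], "persistent"
        ips = self._resolver(host)              # cache miss: ask the resolver
        entry = (now + self._ttl, ips)
        self._memory[host] = entry
        self._persistent[host] = entry
        return ips, "resolver"

    def drop_memory(self):
        """Simulate a process restart, which loses the in-memory layer only."""
        self._memory.clear()
```

With this layout, a cold start can still answer from the persistent layer without a network round trip, and the resolver is consulted only when both caches have expired.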

Additional enhancements include a set of retry strategies (original request, HTTPS downgrade, HTTP/2 downgrade, direct IP connection, super pipeline), connection optimizations (TCP racing, TLS 1.3, pre‑connect, connection reuse), and continuous monitoring of error‑rate trends. Error rates have dropped from over 5% to below 0.5% on both Android and iOS.
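
One way to picture the retry strategies is as a ladder that is climbed one rung at a time until a request succeeds. The Python sketch below assumes the strategies are tried in the order the article lists them and that only transport‑level failures (OSError and its subclasses) trigger the next rung; the strategy names and the send callable are hypothetical, and the article does not state the actual ordering or retry conditions.

```python
RETRY_LADDER = (
    "original",
    "https_downgrade",
    "http2_downgrade",
    "ip_direct",
    "super_pipeline",
)


def send_with_retries(send, request, ladder=RETRY_LADDER):
    """Try each strategy once, in order, until one succeeds.

    send: callable (request, strategy) -> response, raising OSError
          (e.g. ConnectionError, TimeoutError) on a transport failure.
    Returns (response, strategy_used, failures), where failures is a list of
    (strategy, exception) pairs for the rungs that failed first.
    """
    failures = []
    for strategy in ladder:
        try:
            response = send(request, strategy)
        except OSError as exc:      # other exception types propagate unchanged
            failures.append((strategy, exc))
            continue
        return response, strategy, failures
    raise RuntimeError(f"all {len(failures)} retry strategies failed")
```

Returning the list of failed rungs lets the caller report which strategy finally worked, which is the kind of data an error‑rate dashboard would need.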

Since deployment, the system has been adopted across iQIYI's apps, significantly reducing network error rates and establishing a comprehensive network performance monitoring framework for future improvements such as full‑link tracing and QUIC integration.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact admin@besthub.dev and we will review it promptly.

Written by

High Availability Architecture
