How Ctrip Boosted Mobile App Network Performance: Real-World Practices and Lessons
Ctrip's wireless development team shares a comprehensive overview of their app's network architecture, common performance pitfalls such as DNS and TCP issues, and a series of practical optimizations—including DNS caching, quality detection, priority handling, retransmission, payload reduction, and protocol upgrades—that dramatically improved service success rates and reduced latency.
Introduction
Ctrip's wireless development director Chen Haoran summarizes practical experiences in optimizing the network performance of the Ctrip app, emphasizing that network service performance is the most critical part of any app optimization effort.
Native Network Services
The core business modules (hotels, flights, train tickets, guides, etc.) use TCP connections rather than typical RESTful HTTP APIs; only a few lightweight services use HTTP.
TCP services employ a mix of long‑connection pools and short connections: long connections are kept in a pool to avoid repeated handshakes, while short connections are closed after each request.
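The difference between the two connection modes can be sketched as follows. This is a minimal illustration, not Ctrip's implementation: `LongConnectionPool`, `short_request`, the `connect` factory, and the `max_idle` limit are all hypothetical names chosen for the example.

```python
from collections import deque


class LongConnectionPool:
    """Keeps idle connections open so later requests skip the TCP handshake."""

    def __init__(self, connect, max_idle=4):
        self._connect = connect      # hypothetical factory returning a connected socket-like object
        self._idle = deque()
        self._max_idle = max_idle    # assumed cap on idle connections

    def acquire(self):
        # Reuse an idle long connection if one exists; otherwise open a new one.
        if self._idle:
            return self._idle.popleft()
        return self._connect()

    def release(self, conn):
        # Return the connection to the pool, or close it if the pool is full.
        if len(self._idle) < self._max_idle:
            self._idle.append(conn)
        else:
            conn.close()


def short_request(connect, payload):
    """Short connection: open, send one request, read one reply, close."""
    conn = connect()
    try:
        conn.sendall(payload)
        return conn.recv(4096)
    finally:
        conn.close()
```

A pooled connection pays the handshake cost once and amortizes it over many requests, while a short connection pays it every time but holds no resources between requests.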
TCP payloads use a custom serialization protocol, whereas HTTP payloads are simple JSON.
Hybrid Network Services
Hybrid modules run in WebViews and issue HTTP requests through the system WebView. A small number of scenarios (e.g., encryption, payment) use a native TCP channel via a hybrid bridge.
All network services—both TCP and HTTP—first connect to an API Gateway. TCP requests are forwarded by a TCP Gateway to backend SOA services; HTTP requests follow a similar path. Gateways also provide flow control and circuit breaking.
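Circuit breaking at a gateway can be pictured with a small sketch. The article does not describe how Ctrip's gateways implement it, so the class below, its thresholds, and the injectable `clock` are assumptions made purely for illustration.

```python
import time


class CircuitBreaker:
    """Stops forwarding to a backend after repeated failures, then retries later."""

    def __init__(self, failure_threshold=3, reset_after_s=30.0, clock=time.monotonic):
        self._threshold = failure_threshold   # assumed value
        self._reset_after = reset_after_s     # assumed cool-down
        self._clock = clock
        self._failures = 0
        self._opened_at = None                # None means the circuit is closed

    def allow(self):
        if self._opened_at is None:
            return True
        # Once the cool-down has elapsed, let a trial request through.
        return self._clock() - self._opened_at >= self._reset_after

    def record(self, success):
        if success:
            self._failures = 0
            self._opened_at = None
        else:
            self._failures += 1
            if self._failures >= self._threshold:
                self._opened_at = self._clock()
```

The gateway would call `allow()` before forwarding a request and `record()` with the outcome afterwards.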
[Figure: deployment architecture diagram]
Typical network service steps include DNS lookup, TCP handshake, TLS handshake (if applicable), and TCP/HTTP request‑response cycles.
Round‑Trip Time (RTT) is a key metric; for example, 4G RTT ≈ 100 ms, 3G RTT ≈ 200 ms. These values set a lower bound for overall service latency.
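To see how RTT bounds latency, the steps listed above can be counted in round trips. The per-step counts below are assumptions for illustration (one round trip for DNS, one for the TCP handshake, two for a full TLS 1.2 handshake, one for the request itself); they are not figures from Ctrip.

```python
# Assumed round trips per step (illustrative, not measured values).
ROUND_TRIPS = {
    "dns_lookup": 1,
    "tcp_handshake": 1,
    "tls_handshake": 2,      # full TLS 1.2 handshake
    "request_response": 1,
}


def latency_floor_ms(rtt_ms, use_tls=True, reuse_connection=False):
    """Lower bound on latency, ignoring server time and transfer time."""
    steps = dict(ROUND_TRIPS)
    if not use_tls:
        steps["tls_handshake"] = 0
    if reuse_connection:
        # A pooled long connection has already paid for DNS and the handshakes.
        steps["dns_lookup"] = 0
        steps["tcp_handshake"] = 0
        steps["tls_handshake"] = 0
    return rtt_ms * sum(steps.values())


print(latency_floor_ms(100))                         # 4G, new TLS connection: 500
print(latency_floor_ms(200, use_tls=False))          # 3G, new plain TCP connection: 600
print(latency_floor_ms(100, reuse_connection=True))  # 4G, pooled connection: 100
```

Under these assumptions, reusing a connection removes most of the fixed cost, which is why the later practices focus on DNS caching and long connections.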
Common Network Performance Problems
DNS Issues – hijacking, failures, or slow resolution (especially on 2G/3G networks) can dramatically increase first‑request latency.
TCP Connection Issues – port blocking, or connection timeouts that are set too long or too short, cause failures or a poor user experience.
Read/Write Timeouts – improper read/write timeout values lead to failures on slow networks.
Payload Size – oversized payloads increase transmission time.
Complex Domestic/International Network Conditions – ISP interconnect limitations and low‑bandwidth overseas links.
Network type distribution for Ctrip app users shows Wi‑Fi above 60 %, the 4G share approaching the 3G share, and a declining 2G share.
Bandwidth and latency are not directly correlated; high bandwidth does not guarantee low latency.
Optimization Practices
Practice 1: DNS Resolution and Caching – Maintain a weighted Server IP list updated from DNS; prioritize the highest‑weight IPs and adjust weights based on connection success.
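A weighted IP list of this kind might look like the sketch below. The class name, the starting weight, and the reward and penalty sizes are hypothetical; the article does not give Ctrip's actual values. The addresses used in any example should come from a documentation range such as 192.0.2.0/24.

```python
class ServerIPList:
    """Tracks a weight per server IP and prefers the highest-weight entry."""

    def __init__(self, ips, initial_weight=10, min_weight=0, max_weight=20):
        self._initial = initial_weight   # assumed starting weight
        self._min = min_weight
        self._max = max_weight
        self._weights = {ip: initial_weight for ip in ips}

    def update_from_dns(self, ips):
        # Add newly resolved IPs; keep the learned weights of known ones.
        for ip in ips:
            self._weights.setdefault(ip, self._initial)

    def best(self):
        # Highest weight wins; ties go to the earliest-added IP.
        return max(self._weights, key=self._weights.get)

    def report(self, ip, success):
        # Assumed policy: small reward on success, larger penalty on failure.
        delta = 1 if success else -5
        weight = self._weights[ip] + delta
        self._weights[ip] = max(self._min, min(self._max, weight))
```

Because connections go to a cached IP, a slow or hijacked DNS lookup no longer sits on the critical path of the first request.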
Practice 2: Network Quality Detection – Adjust timeout parameters and concurrent connection limits based on detected network type (2G/3G/4G/Wi‑Fi).
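One simple way to express this is a lookup table from network type to parameters. Every number below is a placeholder chosen for illustration; the article does not state the timeouts or connection limits Ctrip uses.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NetworkProfile:
    connect_timeout_s: float
    read_timeout_s: float
    max_concurrent: int


# Placeholder values: slower networks get longer timeouts and fewer connections.
PROFILES = {
    "2G": NetworkProfile(20.0, 30.0, 1),
    "3G": NetworkProfile(10.0, 20.0, 2),
    "4G": NetworkProfile(5.0, 10.0, 4),
    "WIFI": NetworkProfile(5.0, 10.0, 6),
}


def profile_for(network_type):
    # Normalize spellings such as "Wi-Fi"; unknown types get the most conservative profile.
    key = network_type.upper().replace("-", "")
    return PROFILES.get(key, PROFILES["2G"])
```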
Practice 3: Service Priority and Dependency – Assign priorities to services; high‑priority services use long connections, low‑priority use short connections. Implement dependency chains so child services are only invoked after parent success.
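The priority and dependency rules can be sketched as a small scheduler. The tuple layout, the `send` callback, and the service names in the usage example are hypothetical, and the sketch runs services sequentially for clarity.

```python
HIGH, LOW = 0, 1  # lower number means higher priority


def run_services(services, send):
    """services: list of (name, priority, parent_or_None).
    send(name, channel) -> bool is a hypothetical transport callback."""
    results = {}
    pending = sorted(services, key=lambda s: s[1])  # high priority first
    progress = True
    while pending and progress:
        progress = False
        for svc in list(pending):
            name, priority, parent = svc
            if parent is not None and parent not in results:
                continue  # parent has not been attempted yet
            pending.remove(svc)
            progress = True
            if parent is not None and not results[parent]:
                results[name] = False  # parent failed, so the child is never sent
                continue
            channel = "long" if priority == HIGH else "short"
            results[name] = send(name, channel)
    return results


# Usage: the detail call is skipped because its parent fails.
calls = []

def fake_send(name, channel):
    calls.append((name, channel))
    return name != "hotel_list"

outcome = run_services(
    [("hotel_detail", LOW, "hotel_list"), ("hotel_list", HIGH, None), ("banner", LOW, None)],
    fake_send,
)
```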
Practice 4: Automatic Retransmission – Retry failed connections, writes, or reads; fallback from long to short connections when needed. Provide a switch to disable retransmission for non‑idempotent operations such as order placement.
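A retry wrapper with an idempotency switch might look like this. The function and parameter names are hypothetical, and the policy of using the long connection first and a short connection for retries is one reading of the description above, not a confirmed detail of Ctrip's client.

```python
def send_with_retry(request, send_long, send_short, idempotent=True, max_attempts=3):
    """send_long / send_short are hypothetical callables that raise OSError on failure."""
    # Non-idempotent requests (for example, placing an order) get exactly one attempt.
    attempts = max_attempts if idempotent else 1
    last_error = None
    for attempt in range(attempts):
        # The first attempt uses the long connection; retries fall back to a short one.
        sender = send_long if attempt == 0 else send_short
        try:
            return sender(request)
        except OSError as exc:
            last_error = exc
    raise last_error
```

Disabling retries for non-idempotent operations avoids submitting the same order twice when a response is lost after the server has already processed the request.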
Practice 5: Reduce Payload Size – Switch TCP payloads to Protocol Buffers with Gzip compression, cutting payload size by 15‑45 % and serialization time by 80‑90 %.
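The compression step alone can be illustrated with the standard library. This sketch uses JSON as a stand-in for the serialized bytes, because Protocol Buffers needs a third-party package and generated message classes; it therefore says nothing about the 15‑45 % figure, which refers to Ctrip's own payloads. The `encode` and `decode` helpers and the sample data are hypothetical.

```python
import gzip
import json


def encode(obj):
    # Serialize (JSON here, standing in for Protocol Buffers), then Gzip the bytes.
    raw = json.dumps(obj, separators=(",", ":")).encode("utf-8")
    return gzip.compress(raw)


def decode(blob):
    # Reverse the two steps: decompress, then deserialize.
    return json.loads(gzip.decompress(blob).decode("utf-8"))


# Repetitive list data, typical of search results, compresses well.
payload = {"hotels": [{"id": i, "name": "Hotel", "city": "Shanghai"} for i in range(200)]}
blob = encode(payload)
raw_size = len(json.dumps(payload).encode("utf-8"))
print(len(blob), "compressed bytes vs", raw_size, "raw bytes")
```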
Practice 6: Overseas Performance – Use CDN acceleration and separate static/dynamic resources to improve hybrid module performance abroad.
Between versions V5.9 and V6.4, core service success rates exceeded 99 %, overall latency dropped by 150‑200 ms, and payload sizes shrank visibly.
Robust logging and monitoring (client instrumentation, server‑side processing, dashboards, alerts) proved essential; Ctrip built a real‑time network monitoring portal on ElasticSearch, feeding data into Hadoop/Hive for KPI analysis across dimensions such as network type, city, and connection mode.
Emerging Protocols
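Google's SPDY, the basis for HTTP/2, offers multiplexing, request prioritization, server push, and header compression. SPDY requires TLS; the HTTP/2 specification does not, although browsers only support HTTP/2 over TLS. Major Chinese apps have begun trials, showing up to 30 % latency reduction.
QUIC, built on UDP, avoids the separate TCP handshake, and a repeat connection to a known server can send data immediately (0‑RTT). It also avoids head‑of‑line blocking between streams, offers congestion control better suited to mobile networks, and supports connection migration, which reduces the impact of a device switching networks.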
These new protocols are expected to further improve mobile network performance.
21CTO
21CTO (21CTO.com) offers developers community, training, and services, making it your go‑to learning and service platform.