Ctrip App Network Service Architecture and Performance Optimization Practices
The article details Ctrip's mobile app network service architecture, explains native TCP and hybrid HTTP communication, and describes common performance issues involving DNS resolution, TCP connections, and payload size. It then presents six optimization practices (DNS caching, network quality detection, priority handling, retransmission, payload reduction, and overseas network improvements) aimed at higher success rates and lower latency.
First, the Ctrip App network service architecture is introduced. Because many business modules cannot be fully implemented with native code, a large portion of channels are built on Hybrid, while core business modules (hotel, flight, train tickets, guides, etc.) use native code.
Native modules communicate via TCP connections, using a long‑connection pool plus short‑connection fallback. Payloads use a custom serialization protocol, while HTTP services use simple JSON.
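The "long-connection pool plus short-connection fallback" idea can be sketched as follows. This is a minimal illustration, not Ctrip's implementation: `PooledClient`, `open_long`, and `open_short` are hypothetical names, and the connection objects are assumed to expose `send` and `close`.

```python
import queue
from typing import Callable, Protocol


class Connection(Protocol):
    def send(self, payload: bytes) -> bytes: ...
    def close(self) -> None: ...


class PooledClient:
    """Prefer a pooled long-lived connection; fall back to a one-off short one."""

    def __init__(self, pool_size: int,
                 open_long: Callable[[], Connection],
                 open_short: Callable[[], Connection]) -> None:
        self._open_short = open_short
        self._idle: "queue.Queue[Connection]" = queue.Queue()
        for _ in range(pool_size):
            self._idle.put(open_long())

    def request(self, payload: bytes) -> bytes:
        try:
            conn = self._idle.get_nowait()        # borrow an idle long connection
        except queue.Empty:
            return self._short_request(payload)   # pool exhausted: use fallback
        try:
            reply = conn.send(payload)
        except OSError:
            # The long connection is broken: drop it (this sketch does not
            # replace it) and resend over a short connection. Resending is
            # only safe for idempotent requests.
            conn.close()
            return self._short_request(payload)
        self._idle.put(conn)                      # return healthy connection
        return reply

    def _short_request(self, payload: bytes) -> bytes:
        conn = self._open_short()
        try:
            return conn.send(payload)
        finally:
            conn.close()                          # short connections are single-use
```

A real client would also refill the pool in the background and bound how long a request may wait for an idle connection; both are omitted here for brevity.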
Hybrid modules run in WebView and use the system's HTTP requests; a few scenarios (encryption, payment) use a native TCP bridge.
All network services, whether TCP or HTTP, first connect to an API Gateway. TCP services go through a TCP Gateway that forwards requests to backend SOA services via HTTP; the HTTP Gateway works similarly. Gateways also handle traffic control and circuit breaking.
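The article does not say how the gateways implement circuit breaking. As a generic illustration of the pattern, the sketch below stops forwarding to a backend after repeated failures and allows a trial request once a cool-down has passed. The class name, thresholds, and injectable `clock` are assumptions made for this example.

```python
import time
from typing import Any, Callable, Optional


class CircuitOpenError(Exception):
    """Raised when a call is rejected because the circuit is open."""


class CircuitBreaker:
    def __init__(self, failure_threshold: int = 3, reset_after_s: float = 30.0,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.failure_threshold = failure_threshold
        self.reset_after_s = reset_after_s
        self._clock = clock
        self._failures = 0
        self._opened_at: Optional[float] = None

    def call(self, backend: Callable[..., Any], *args: Any) -> Any:
        if self._opened_at is not None:
            if self._clock() - self._opened_at < self.reset_after_s:
                raise CircuitOpenError("backend temporarily disabled")
            self._opened_at = None        # half-open: let one trial request through
        try:
            result = backend(*args)
        except Exception:
            self._failures += 1
            if self._failures >= self.failure_threshold:
                self._opened_at = self._clock()   # open (or re-open) the circuit
            raise
        self._failures = 0                # any success closes the circuit fully
        return result
```

Rejecting calls quickly while the circuit is open protects a struggling backend service and lets the app fail fast instead of waiting for a timeout.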
Typical network steps include DNS lookup, TCP handshake, TLS handshake (if any), and TCP/HTTP request‑response. RTT (Round‑Trip Time) is a key performance metric; ideal RTT is ~100 ms on 4G and ~200 ms on 3G.
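To see why RTT dominates, a rough budget can be computed by counting round trips per step. The per-step costs below are simplifying assumptions for illustration, not figures from the article: one round trip each for the DNS lookup, the TCP handshake, and the request itself, and two for a full TLS 1.2 handshake.

```python
# Assumed round trips per step (illustrative simplification).
RTT_COST = {"dns": 1, "tcp_handshake": 1, "tls_handshake": 2, "request": 1}

COLD_HTTPS = ["dns", "tcp_handshake", "tls_handshake", "request"]
COLD_TCP = ["dns", "tcp_handshake", "request"]
WARM_CONNECTION = ["request"]   # an already-open long connection


def estimate_latency_ms(rtt_ms: int, steps: list) -> int:
    """Lower bound on latency: round trips only, no server or transfer time."""
    return rtt_ms * sum(RTT_COST[step] for step in steps)


print(estimate_latency_ms(100, COLD_HTTPS))       # 500
print(estimate_latency_ms(100, WARM_CONNECTION))  # 100
```

Under these assumptions, a cold HTTPS request at a 100 ms RTT spends about five round trips before the response arrives, while a request over an existing connection spends one. That gap is the motivation for keeping long connections open.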
Common performance problems are identified:
DNS issues (hijacking, slow resolution, failures)
TCP connection problems (port blocking, timeout settings)
Read/Write timeout problems
Connection migration when network type changes
Oversized payloads
Complex domestic and overseas network conditions
Optimization practices include:
Optimizing DNS resolution and caching by maintaining a weighted Server IP list that updates dynamically.
Network quality detection to adjust timeout parameters and concurrency based on 2G/3G/4G/Wi‑Fi conditions.
Providing network‑service priority and dependency mechanisms to prefer long connections for high‑priority requests and to cancel dependent services on failure.
Implementing automatic retransmission for failed connections, writes, or reads, with safeguards for non‑idempotent operations.
Reducing data transmission by switching to Protocol Buffers with Gzip compression, and adopting efficient image formats such as WebP.
Improving overseas performance through CDN acceleration and static‑dynamic resource separation.
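Two of the practices above, the weighted Server IP list and retransmission with a safeguard for non-idempotent operations, fit together naturally and can be sketched in one small example. Everything here is hypothetical: the class and function names, the reward and penalty values, and the `SendFailed.sent` flag that the transport is assumed to set when bytes may already have reached the server.

```python
from typing import Callable, Dict, List, Optional


class SendFailed(Exception):
    """Transport error; `sent` is True if data may have reached the server."""

    def __init__(self, sent: bool) -> None:
        super().__init__("send failed")
        self.sent = sent


class ServerList:
    """Weighted server IP list: successes raise a weight, failures lower it."""

    def __init__(self, ips: List[str], initial: int = 10) -> None:
        self.weights: Dict[str, int] = {ip: initial for ip in ips}

    def ranked(self) -> List[str]:
        # Highest weight first; sorted() is stable, so ties keep insertion order.
        return sorted(self.weights, key=lambda ip: -self.weights[ip])

    def report(self, ip: str, ok: bool) -> None:
        delta = 1 if ok else -5           # assumed reward/penalty values
        self.weights[ip] = max(0, self.weights[ip] + delta)


def call_with_retry(servers: ServerList,
                    send: Callable[[str, bytes], bytes],
                    payload: bytes,
                    idempotent: bool,
                    max_attempts: int = 3) -> bytes:
    last_error: Optional[SendFailed] = None
    for ip in servers.ranked()[:max_attempts]:
        try:
            reply = send(ip, payload)
        except SendFailed as err:
            servers.report(ip, ok=False)
            last_error = err
            # Safeguard: never replay a non-idempotent request (for example an
            # order submission) once data may have been written to the server.
            if err.sent and not idempotent:
                raise
            continue
        servers.report(ip, ok=True)
        return reply
    raise last_error if last_error else SendFailed(sent=False)
```

Connection failures are always safe to retry because nothing was sent. Write and read failures are retried only when the caller marks the request as idempotent.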
After applying these measures, the Ctrip App’s network success rate rose above 99 % for core services, average latency dropped 150‑200 ms, and payload size decreased 15‑45 % with serialization time cut by 80‑90 %.
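The compression half of the payload reduction can be illustrated with the standard library alone. The article pairs Gzip with Protocol Buffers. This sketch substitutes JSON for the serialization step because Protocol Buffers needs a schema and a third-party package, so the sizes it prints say nothing about the figures reported above. The sample hotel records are invented for the example.

```python
import gzip
import json
from typing import Any


def encode(obj: Any) -> bytes:
    """Serialize compactly, then Gzip-compress the bytes."""
    raw = json.dumps(obj, separators=(",", ":")).encode("utf-8")
    return gzip.compress(raw)


def decode(blob: bytes) -> Any:
    """Reverse of encode(): decompress, then parse."""
    return json.loads(gzip.decompress(blob).decode("utf-8"))


# Invented sample payload with the repetitive structure typical of list responses.
hotels = [{"hotelId": i, "city": "Shanghai", "currency": "CNY", "price": 300 + i}
          for i in range(50)]

raw_size = len(json.dumps(hotels).encode("utf-8"))
packed = encode(hotels)
print(raw_size, len(packed))   # compressed size is much smaller for this payload
```

Compression helps most on large, repetitive responses. Very small payloads can grow slightly because of the Gzip header, so a size threshold before compressing is a common refinement.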
Comprehensive logging and real‑time monitoring built on ElasticSearch, Hadoop, and Hive enable multi‑dimensional KPI analysis (success rate, latency, connection metrics) across cities, network types, and connection modes.
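The kind of multi-dimensional KPI analysis described here boils down to grouping request logs by a dimension and computing a success rate and average latency per group. The in-memory sketch below shows only that aggregation step. The record fields (`network`, `ok`, `latency_ms`) are assumed for illustration, and the real pipeline runs on ElasticSearch, Hadoop, and Hive rather than in a single process.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def kpi_by(records: Iterable[dict], dimension: str) -> Dict[str, Tuple[float, float]]:
    """Return {group: (success_rate, mean_latency_ms)} for one dimension."""
    totals = defaultdict(lambda: [0, 0, 0.0])   # count, successes, latency sum
    for record in records:
        bucket = totals[record[dimension]]
        bucket[0] += 1
        bucket[1] += 1 if record["ok"] else 0
        bucket[2] += record["latency_ms"]
    return {group: (ok / count, latency / count)
            for group, (count, ok, latency) in totals.items()}


logs = [
    {"network": "4G", "ok": True, "latency_ms": 100},
    {"network": "4G", "ok": True, "latency_ms": 200},
    {"network": "3G", "ok": False, "latency_ms": 400},
    {"network": "3G", "ok": True, "latency_ms": 200},
]
print(kpi_by(logs, "network"))
# {'4G': (1.0, 150.0), '3G': (0.5, 300.0)}
```

The same function works for any other logged dimension, such as city or connection mode, by changing the `dimension` argument.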
Future directions point to newer protocols such as SPDY, HTTP/2, and QUIC, which promise further latency reductions and better handling of connection migration.
This article has been distilled and summarized from source material, then republished for learning and reference.
Art of Distributed System Architecture Design
Introductions to large-scale distributed system architectures; insights and knowledge sharing on large-scale internet system architecture; front-end web architecture overviews; practical tips and experiences with PHP, JavaScript, Erlang, C/C++ and other languages in large-scale internet system development.
