Design and Implementation of Ctrip's Real-Time User Data Collection System
This article details the design, technology selection, architecture, encryption, compression, and performance evaluation of Ctrip's real-time user data collection system, which leverages Java, Netty, Kafka, and Avro to achieve high throughput, low latency, and robust fault tolerance for mobile and web applications.
The author, Wang Xiaobo, a senior engineer at Ctrip's framework R&D department, introduces the real-time user data collection system he designed, which addresses the limitations of traditional PC‑based logging in the era of mobile internet.
The system is built on a Java stack, using Netty (a high‑performance NIO framework) for network communication and Hermes (a Kafka‑based distributed message queue) for storage, providing real‑time, high‑throughput, and universal data collection.
Key components include a client SDK that sends data via HTTP/TCP/UDP to a Mechanic (UBT‑Collector) server, which processes and forwards the data to Hermes/Kafka; monitoring data is stored in HBase and visualized via a Dashboard.
Netty was chosen after evaluating alternatives (Mina, xSocket) for its rich features, performance, extensibility, and ease of use. Netty itself is organized in three layers: Reactor scheduling, the Pipeline chain of handlers, and business logic processing.
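To make the three layers concrete, here is a minimal plain-Java sketch of the pattern. It is not Netty's API and not the collector's actual code: `MiniPipeline`, `addLast`, and `fire` are hypothetical names chosen for illustration. A single-threaded executor stands in for the Reactor event loop, an ordered list of stages stands in for the Pipeline chain, and the last stage plays the role of the business logic.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

// Hypothetical illustration of the Reactor / Pipeline / business-logic split.
final class MiniPipeline {
    // Pipeline layer: ordered stages, each transforming the message for the next one.
    private final List<Function<Object, Object>> stages = new ArrayList<>();
    // Reactor stand-in: one thread that runs every inbound event in order.
    private final ExecutorService eventLoop = Executors.newSingleThreadExecutor();

    MiniPipeline addLast(Function<Object, Object> stage) {
        stages.add(stage);
        return this;
    }

    // Schedules the inbound message on the event loop and runs it through the chain.
    Future<Object> fire(Object inbound) {
        return eventLoop.submit(() -> {
            Object msg = inbound;
            for (Function<Object, Object> stage : stages) {
                msg = stage.apply(msg);
                if (msg == null) {
                    break; // a stage that returns null drops the message
                }
            }
            return msg;
        });
    }

    void shutdown() {
        eventLoop.shutdown();
    }
}
```

A caller would register a decoding stage, a validation stage, and a final business stage in that order, then call `fire` for each inbound payload. Keeping the stages independent of the scheduling thread is what lets handlers be added or reordered without touching the I/O layer.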
Data encryption strategies are discussed, recommending symmetric encryption with keys protected in compiled libraries, server‑side key retrieval via HTTPS, or hybrid public‑key‑symmetric schemes; compression uses GZIP or custom LZ77.
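The compression and symmetric-encryption steps can be sketched with the JDK alone. The article does not name a cipher or the order of the two steps, so the choices here are assumptions: AES-GCM as the symmetric cipher, GZIP applied before encryption (encrypted bytes compress poorly), and a random 12-byte IV prepended to each payload. `PayloadCodec` and its methods are hypothetical names, and key distribution (embedded library, HTTPS retrieval, or public-key wrapping) is left out.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.security.GeneralSecurityException;
import java.security.SecureRandom;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import javax.crypto.spec.GCMParameterSpec;

// Hypothetical sketch: GZIP-compress a payload, then seal it with AES-GCM.
final class PayloadCodec {
    private static final int IV_BYTES = 12;   // assumed IV size for GCM
    private static final int TAG_BITS = 128;  // assumed authentication tag size
    private static final SecureRandom RANDOM = new SecureRandom();

    // Generates a throwaway 128-bit AES key; a real client would obtain its key elsewhere.
    static SecretKey newKey() throws GeneralSecurityException {
        KeyGenerator generator = KeyGenerator.getInstance("AES");
        generator.init(128);
        return generator.generateKey();
    }

    static byte[] gzip(byte[] raw) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(out)) {
            gz.write(raw);
        }
        return out.toByteArray();
    }

    static byte[] gunzip(byte[] compressed) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (GZIPInputStream gz = new GZIPInputStream(new ByteArrayInputStream(compressed))) {
            byte[] buf = new byte[4096];
            int n;
            while ((n = gz.read(buf)) != -1) {
                out.write(buf, 0, n);
            }
        }
        return out.toByteArray();
    }

    // Wire format (an assumption): IV followed by ciphertext and GCM tag.
    static byte[] encode(byte[] raw, SecretKey key) throws IOException, GeneralSecurityException {
        byte[] iv = new byte[IV_BYTES];
        RANDOM.nextBytes(iv);
        Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
        cipher.init(Cipher.ENCRYPT_MODE, key, new GCMParameterSpec(TAG_BITS, iv));
        byte[] sealed = cipher.doFinal(gzip(raw));
        byte[] wire = new byte[IV_BYTES + sealed.length];
        System.arraycopy(iv, 0, wire, 0, IV_BYTES);
        System.arraycopy(sealed, 0, wire, IV_BYTES, sealed.length);
        return wire;
    }

    // Reverses encode: reads the IV, decrypts and verifies, then decompresses.
    static byte[] decode(byte[] wire, SecretKey key) throws IOException, GeneralSecurityException {
        Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
        cipher.init(Cipher.DECRYPT_MODE, key, new GCMParameterSpec(TAG_BITS, wire, 0, IV_BYTES));
        return gunzip(cipher.doFinal(wire, IV_BYTES, wire.length - IV_BYTES));
    }
}
```

In a hybrid scheme, the same `encode` call would run with a per-session key that the client wraps with the server's public key, so only the small key exchange pays the cost of asymmetric cryptography.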
Hermes, a Ctrip‑customized Kafka implementation, handles message persistence, high throughput, and partitioned ordering, while Avro is used for disaster‑recovery storage when the message queue fails, with automatic conversion back to Kafka once restored.
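The disaster-recovery path described above can be outlined as follows. This is a simplified sketch, not the production design: the `Queue` interface is a hypothetical stand-in for a Hermes/Kafka producer, and a length-prefixed local file stands in for the Avro files so that the example needs only the JDK.

```java
import java.io.BufferedInputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Hypothetical sketch: write to the queue, spool to disk on failure, replay later.
final class FailoverWriter {
    /** Stand-in for a message-queue producer; not the Hermes or Kafka API. */
    interface Queue {
        void send(byte[] message) throws IOException;
    }

    private final Queue queue;
    private final Path spool; // local file used while the queue is unavailable

    FailoverWriter(Queue queue, Path spool) {
        this.queue = queue;
        this.spool = spool;
    }

    // Tries the queue first; if the send fails, appends the record to the spool file.
    synchronized void write(byte[] message) throws IOException {
        try {
            queue.send(message);
        } catch (IOException queueDown) {
            try (DataOutputStream out = new DataOutputStream(Files.newOutputStream(
                    spool, StandardOpenOption.CREATE, StandardOpenOption.APPEND))) {
                out.writeInt(message.length);
                out.write(message);
            }
        }
    }

    // Replays spooled records in order and returns how many were sent.
    // The spool is deleted only after every record has been sent, so a failure
    // mid-replay leaves the file in place and already-sent records may be sent again.
    synchronized int replay() throws IOException {
        if (!Files.exists(spool)) {
            return 0;
        }
        int count = 0;
        try (DataInputStream in = new DataInputStream(
                new BufferedInputStream(Files.newInputStream(spool)))) {
            while (true) {
                int length;
                try {
                    length = in.readInt();
                } catch (EOFException endOfSpool) {
                    break;
                }
                byte[] message = new byte[length];
                in.readFully(message);
                queue.send(message);
                count++;
            }
        }
        Files.delete(spool);
        return count;
    }
}
```

Because replay can resend records after a partial failure, this sketch gives at-least-once delivery; downstream consumers would need to tolerate duplicates.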
Feasibility tests compare Netty and Nginx under 5,000 concurrent connections, showing comparable request rates (~46k req/s) and acceptable latency; end‑to‑end processing meets the target of handling ~30k requests per second with 99% of requests completing under 800 ms.
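A target such as "99% of requests under 800 ms" is a percentile check over recorded latencies. The sketch below shows one common way to compute it, the nearest-rank method. The article does not say how its percentiles were measured, so the method and the names `LatencyCheck`, `percentile`, and `meetsTarget` are assumptions for illustration.

```java
import java.util.Arrays;

// Hypothetical sketch: nearest-rank percentile over request latencies in milliseconds.
final class LatencyCheck {
    // Returns the smallest sample such that at least p percent of samples are <= it.
    static long percentile(long[] latenciesMs, double p) {
        if (latenciesMs.length == 0) {
            throw new IllegalArgumentException("no samples");
        }
        long[] sorted = latenciesMs.clone();
        Arrays.sort(sorted);
        int rank = (int) Math.ceil(p * sorted.length / 100.0);
        return sorted[Math.max(rank, 1) - 1];
    }

    // True when the p-th percentile latency is strictly below the limit.
    static boolean meetsTarget(long[] latenciesMs, double p, long limitMs) {
        return percentile(latenciesMs, p) < limitMs;
    }
}
```

Calling `LatencyCheck.meetsTarget(samples, 99.0, 800)` on a window of measured latencies would express the article's target directly; a single slow outlier in a hundred samples does not fail the check, but two do.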
The article also outlines related data analysis products built on the collected data, such as single‑user browsing tracking, page conversion rates, user flow analysis, click heatmaps, data validation tools, and system performance reports, highlighting their value for product optimization and user experience improvement.
Additional resources and references are provided for further reading.
Ctrip Technology
Official Ctrip Technology account, sharing and discussing growth.
