Design and Implementation of Ctrip's Real-Time User Data Collection System

This article details the design, technology selection, architecture, encryption, compression, and performance evaluation of Ctrip's real-time user data collection system, which leverages Java, Netty, Kafka, and Avro to achieve high throughput, low latency, and robust fault tolerance for mobile and web applications.

Ctrip Technology

The author, Wang Xiaobo, a senior engineer in Ctrip's framework R&D department, introduces the real-time user data collection system he designed. It addresses the limitations of traditional PC-based logging in the mobile internet era.

The system is built on a Java stack, using Netty (a high‑performance NIO framework) for network communication and Hermes (a Kafka‑based distributed message queue) for storage, providing real‑time, high‑throughput, and universal data collection.

Key components include a client SDK that sends data via HTTP/TCP/UDP to a Mechanic (UBT‑Collector) server, which processes and forwards the data to Hermes/Kafka; monitoring data is stored in HBase and visualized via a Dashboard.

Netty was chosen over the alternatives evaluated (Mina, xSocket) for its rich features, performance, extensibility, and ease of use. It employs a three-layer architecture: Reactor scheduling, the Pipeline chain, and business logic processing.
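The Pipeline-chain layer can be illustrated without Netty itself. The following is a minimal plain-Java sketch of the idea, in which each handler transforms a message and hands the result to the next one. `MiniPipeline` and its methods are hypothetical names for illustration only and are not Netty's `ChannelPipeline` API; Reactor scheduling (the event loops that feed the chain) is not modeled here.

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

/** Plain-Java illustration of a handler chain; NOT Netty's ChannelPipeline API. */
final class MiniPipeline {
    // Handlers run in insertion order; each one receives the previous handler's output.
    private final List<Function<Object, Object>> handlers = new ArrayList<>();

    MiniPipeline addLast(Function<Object, Object> handler) {
        handlers.add(handler);
        return this;
    }

    /** Pushes one inbound message through every handler and returns the final result. */
    Object fire(Object msg) {
        Object current = msg;
        for (Function<Object, Object> handler : handlers) {
            current = handler.apply(current);
        }
        return current;
    }

    public static void main(String[] args) {
        MiniPipeline pipeline = new MiniPipeline()
            .addLast(raw -> new String((byte[]) raw, StandardCharsets.UTF_8)) // decode bytes
            .addLast(text -> ((String) text).trim())                          // normalize
            .addLast(text -> "stored:" + text);                               // business logic
        System.out.println(pipeline.fire(" page_view ".getBytes(StandardCharsets.UTF_8)));
    }
}
```

Keeping decoding, normalization, and business logic in separate handlers is what makes such a chain easy to extend: a new step (for example decompression) is one more `addLast` call rather than a change to existing handlers.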

Data encryption strategies are discussed, recommending symmetric encryption with keys protected in compiled libraries, server‑side key retrieval via HTTPS, or hybrid public‑key‑symmetric schemes; compression uses GZIP or custom LZ77.
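As a rough illustration of combining compression with symmetric encryption, the sketch below GZIP-compresses a payload and then encrypts it using only the JDK. The choice of AES-GCM, the `IV || ciphertext` layout, and the names `PayloadCodec`, `seal`, and `open` are assumptions made for this example; the article only states that symmetric encryption and GZIP (or a custom LZ77) are used. The all-zero demo key is a placeholder: how the real key is protected or fetched is exactly the problem the strategies above address.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.SecureRandom;
import java.util.Arrays;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;
import javax.crypto.Cipher;
import javax.crypto.spec.GCMParameterSpec;
import javax.crypto.spec.SecretKeySpec;

/** Hypothetical helper: compress with GZIP, then encrypt with AES-GCM (an assumed cipher choice). */
final class PayloadCodec {
    private static final int IV_LEN = 12;    // 96-bit IV, the usual size for GCM
    private static final int TAG_BITS = 128; // authentication tag length
    private static final SecureRandom RANDOM = new SecureRandom();

    static byte[] gzip(byte[] data) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(out)) {
            gz.write(data);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return out.toByteArray(); // read only after the GZIP stream is closed
    }

    static byte[] gunzip(byte[] data) {
        try (GZIPInputStream gz = new GZIPInputStream(new ByteArrayInputStream(data))) {
            return gz.readAllBytes(); // requires Java 9+
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    /** Compresses, then encrypts. Output layout: IV followed by ciphertext and tag. */
    static byte[] seal(byte[] key, byte[] plain) {
        try {
            byte[] iv = new byte[IV_LEN];
            RANDOM.nextBytes(iv); // a fresh IV per message is mandatory for GCM
            Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
            cipher.init(Cipher.ENCRYPT_MODE, new SecretKeySpec(key, "AES"),
                        new GCMParameterSpec(TAG_BITS, iv));
            byte[] ct = cipher.doFinal(gzip(plain));
            byte[] out = Arrays.copyOf(iv, IV_LEN + ct.length);
            System.arraycopy(ct, 0, out, IV_LEN, ct.length);
            return out;
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    /** Reverses seal(): decrypts and verifies the tag, then decompresses. */
    static byte[] open(byte[] key, byte[] sealed) {
        try {
            Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
            cipher.init(Cipher.DECRYPT_MODE, new SecretKeySpec(key, "AES"),
                        new GCMParameterSpec(TAG_BITS, sealed, 0, IV_LEN));
            return gunzip(cipher.doFinal(sealed, IV_LEN, sealed.length - IV_LEN));
        } catch (GeneralSecurityException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        byte[] demoKey = new byte[16]; // placeholder key for the demo only
        byte[] sealed = seal(demoKey, "uid=42&page=home".getBytes(StandardCharsets.UTF_8));
        System.out.println(new String(open(demoKey, sealed), StandardCharsets.UTF_8));
    }
}
```

Compression runs before encryption because well-encrypted data is effectively incompressible; doing it the other way round would save nothing.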

Hermes, Ctrip's customized Kafka-based message queue, handles message persistence, high throughput, and per-partition ordering. Avro serves as disaster-recovery storage: when the message queue fails, data is written to Avro files, and once the queue is restored the files are automatically converted back into Kafka messages.
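The fail-over behavior described above can be sketched with the JDK alone. In this simplified example a length-prefixed local file stands in for the Avro files and a small `Producer` interface stands in for the Hermes/Kafka producer; both, along with the name `FailoverWriter`, are assumptions made for illustration and not the system's actual classes.

```java
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical sketch: spool to a local file while the queue is down, replay afterwards. */
final class FailoverWriter {
    /** Stand-in for the queue producer; returns false when the queue is unavailable. */
    interface Producer { boolean trySend(String message); }

    private final Producer producer;
    private final Path spoolFile;

    FailoverWriter(Producer producer, Path spoolFile) {
        this.producer = producer;
        this.spoolFile = spoolFile;
    }

    /** Sends to the queue, or appends to the spool file if the queue rejects the message. */
    void write(String message) {
        if (!producer.trySend(message)) {
            spool(message);
        }
    }

    /** Replays spooled messages in order; anything still unsent is spooled again. */
    int replay() {
        List<String> pending = readSpool();
        int sent = 0;
        while (sent < pending.size() && producer.trySend(pending.get(sent))) {
            sent++;
        }
        try {
            Files.deleteIfExists(spoolFile);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        for (String unsent : pending.subList(sent, pending.size())) {
            spool(unsent);
        }
        return sent;
    }

    private void spool(String message) {
        try (DataOutputStream out = new DataOutputStream(Files.newOutputStream(
                spoolFile, StandardOpenOption.CREATE, StandardOpenOption.APPEND))) {
            out.writeUTF(message); // length-prefixed record (limited to 65535 encoded bytes)
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    private List<String> readSpool() {
        List<String> records = new ArrayList<>();
        if (!Files.exists(spoolFile)) {
            return records;
        }
        try (DataInputStream in = new DataInputStream(Files.newInputStream(spoolFile))) {
            while (true) {
                records.add(in.readUTF());
            }
        } catch (EOFException endOfSpool) {
            return records; // reached the end of the spool file
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) throws IOException {
        Path spool = Files.createTempFile("ubt-spool", ".bin");
        boolean[] queueUp = {false};
        FailoverWriter writer = new FailoverWriter(m -> queueUp[0], spool);
        writer.write("event-1"); // queue is down, so this is spooled
        queueUp[0] = true;
        System.out.println("replayed " + writer.replay() + " message(s)");
    }
}
```

Note that this sketch does not preserve ordering between spooled messages and messages sent directly after the queue recovers but before `replay()` runs; a production design would need to decide how to handle that window.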

Feasibility tests compare Netty and Nginx under 5,000 concurrent connections, showing comparable request rates (~46k req/s) and acceptable latency; end‑to‑end processing meets the target of handling ~30k requests per second with 99% of requests completing under 800 ms.
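A target such as "99% of requests completing under 800 ms" is a statement about the 99th-percentile latency. The sketch below shows one common way to check it from raw samples, using the nearest-rank percentile definition. The name `LatencyStats` and the choice of nearest-rank are assumptions for illustration; the article does not say how its figures were computed.

```java
import java.util.Arrays;

/** Hypothetical helper for checking a percentile latency target. */
final class LatencyStats {
    /** Nearest-rank percentile: the smallest sample with at least p percent of samples at or below it. */
    static long percentile(long[] samplesMs, double p) {
        if (samplesMs.length == 0) {
            throw new IllegalArgumentException("no samples");
        }
        if (p <= 0 || p > 100) {
            throw new IllegalArgumentException("p must be in (0, 100]");
        }
        long[] sorted = samplesMs.clone(); // leave the caller's array untouched
        Arrays.sort(sorted);
        int rank = (int) Math.ceil(p * sorted.length / 100.0); // 1-based rank
        return sorted[Math.max(rank, 1) - 1];
    }

    /** True when the p-th percentile latency is strictly below limitMs. */
    static boolean meetsTarget(long[] samplesMs, double p, long limitMs) {
        return percentile(samplesMs, p) < limitMs;
    }

    public static void main(String[] args) {
        long[] samples = new long[200];
        Arrays.fill(samples, 100L);
        samples[0] = 1000L; // two slow outliers out of 200 requests (1%)
        samples[1] = 1000L;
        System.out.println("p99 = " + percentile(samples, 99.0) + " ms, target met: "
            + meetsTarget(samples, 99.0, 800L));
    }
}
```

With 200 samples, the 99th-percentile rank is 198, so up to two slow requests can exceed the limit before the target is missed; a third outlier would push the p99 value itself above 800 ms.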

The article also outlines related data analysis products built on the collected data, such as single‑user browsing tracking, page conversion rates, user flow analysis, click heatmaps, data validation tools, and system performance reports, highlighting their value for product optimization and user experience improvement.

Additional resources and references are provided for further reading.
