Ctrip's Real-Time Data Platform: Architecture, Practices, and Lessons Learned
This article describes how Ctrip built a unified real-time data platform. It covers the business motivations, the architectural requirements, technology choices such as Kafka and Storm, the use of Avro schemas, monitoring and alerting, operational lessons, and future explorations such as Streaming CQL and JStorm.
The author, Zhang Yi, heads Ctrip's big data platform. Drawing on more than a decade of experience, he describes how Ctrip's real-time data processing system was built.
Because Ctrip's numerous business units require low‑latency data analysis, the traditional batch model could not meet their needs, prompting the creation of a unified real‑time platform.
The platform must satisfy four core requirements: high stability, complete supporting facilities (testing, deployment, monitoring, alerting), seamless data and application sharing, and timely service response.
After evaluating the options, Ctrip selected Apache Storm as the real-time engine (later also considering Spark Streaming) and standardized on Kafka for messaging. Data schemas are defined in Avro and published to a central portal. The Java classes generated from those schemas are distributed via Maven.
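The following minimal in-memory sketch shows what the central portal's registry role might look like. The article does not describe the portal's API, so `SchemaPortal`, its methods, the `order-events` topic and the `OrderEvent` schema are all hypothetical. Only the schema string follows Avro's JSON schema format. In practice, Java classes would be generated from such schemas with Avro's code generator (for example the `avro-maven-plugin`) and published as a Maven artifact. That step is not shown here.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory stand-in for the central schema portal: each Kafka
// topic maps to an ordered list of Avro schema definitions (JSON strings).
final class SchemaPortal {
    private final Map<String, List<String>> schemasByTopic = new HashMap<>();

    // Publishes a new schema version for a topic and returns its 1-based version number.
    synchronized int publish(String topic, String avroSchemaJson) {
        List<String> versions = schemasByTopic.computeIfAbsent(topic, t -> new ArrayList<>());
        versions.add(avroSchemaJson);
        return versions.size();
    }

    // Returns the schema for a specific version, e.g. the version a producer
    // recorded alongside a message, so a consumer can decode that message.
    synchronized String get(String topic, int version) {
        List<String> versions = schemasByTopic.get(topic);
        if (versions == null || version < 1 || version > versions.size()) {
            throw new IllegalArgumentException("unknown schema " + topic + " v" + version);
        }
        return versions.get(version - 1);
    }

    // Returns the newest version number for a topic, or 0 if none is published.
    synchronized int latestVersion(String topic) {
        List<String> versions = schemasByTopic.get(topic);
        return versions == null ? 0 : versions.size();
    }

    public static void main(String[] args) {
        SchemaPortal portal = new SchemaPortal();
        // An Avro record schema in Avro's JSON format; record and field names are illustrative.
        String orderV1 = "{\"type\":\"record\",\"name\":\"OrderEvent\",\"namespace\":\"com.example.orders\","
                + "\"fields\":[{\"name\":\"orderId\",\"type\":\"long\"},"
                + "{\"name\":\"city\",\"type\":\"string\"}]}";
        int v = portal.publish("order-events", orderV1);
        System.out.println("order-events v" + v + " -> " + portal.get("order-events", v));
    }
}
```

The sketch keeps every version instead of overwriting. This matters because Avro needs the writer's schema to decode a message, so consumers must be able to fetch older versions even after a schema has evolved.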
To improve stability, Ctrip wrapped the Storm APIs, providing custom spouts and bolts for Redis, HBase, and database writes, and moved topology and executor configuration into the portal. Additional metrics (e.g., Kafka latency) supplement Storm's built-in ones. These metrics are sent both to a custom dashboard and to Graphite for alerting, and high-priority alerts trigger phone calls.
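The following rough sketch shows how the latency metric and alerting path might work, as a helper called from a wrapped bolt. The class name, metric path, and thresholds are hypothetical, and the sketch assumes each Kafka message carries a producer timestamp in milliseconds. Only the output line format follows Graphite's plaintext protocol (`<path> <value> <unix-timestamp>`, usually sent to port 2003).

```java
import java.util.Locale;

// Hypothetical helper inside a wrapped bolt: measures how far behind a Kafka
// message is and formats that lag as a Graphite plaintext-protocol line.
final class LatencyReporter {
    enum Alert { NONE, EMAIL, PHONE }

    // Latency = processing time minus the message's producer timestamp (ms), never negative.
    static long kafkaLatencyMs(long messageTimestampMs, long nowMs) {
        return Math.max(0, nowMs - messageTimestampMs);
    }

    // Graphite plaintext protocol: "<metric.path> <value> <unix-seconds>\n".
    static String graphiteLine(String topology, String topic, long latencyMs, long nowMs) {
        String path = String.format(Locale.ROOT, "storm.%s.kafka.%s.latency_ms", topology, topic);
        return path + " " + latencyMs + " " + (nowMs / 1000) + "\n";
    }

    // Hypothetical routing: moderate lag emails the owner, severe lag pages on-call by phone.
    static Alert classify(long latencyMs, long warnMs, long criticalMs) {
        if (latencyMs >= criticalMs) return Alert.PHONE;
        if (latencyMs >= warnMs) return Alert.EMAIL;
        return Alert.NONE;
    }

    public static void main(String[] args) {
        long now = 1_700_000_060_000L;
        long latency = kafkaLatencyMs(1_700_000_000_000L, now); // message is 60 s old
        System.out.print(graphiteLine("order_topology", "order_events", latency, now));
        System.out.println(classify(latency, 10_000, 300_000));
    }
}
```

Because the metric is computed inside the shared wrapper rather than in each user's bolt, every topology reports latency under a consistent path without extra work from its owner.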
Operational experience highlighted several priorities: setting up monitoring and alerting early, writing clear documentation, controlling the pace at which new users are onboarded, and teaching users basic Kafka and Storm concepts.
New explorations include Streaming CQL, a SQL-to-Storm engine that lets a real-time job be written as a single SQL statement. Another is JStorm, a Java-based Storm clone offering better performance and resource isolation.
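The following sketch hand-codes the stages behind an illustrative query to show what "a single SQL statement" replaces. The query is generic SQL, not Streaming CQL's actual dialect, and `SqlPipelineSketch` is hypothetical. In a generated Storm topology, each stage would typically become a bolt. The counting stage would receive tuples grouped by key (fields grouping), so that each city's count lives in a single task.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;
import java.util.function.Predicate;

// Illustrative only: a hand-written equivalent of the generic streaming query
//   SELECT city, COUNT(*) FROM orders WHERE amount > 100 GROUP BY city
// split into the filter -> key extraction -> running-count stages that a
// SQL-to-Storm compiler might turn into separate bolts.
final class SqlPipelineSketch {
    static final class Order {
        final String city;
        final double amount;
        Order(String city, double amount) { this.city = city; this.amount = amount; }
    }

    // WHERE clause and GROUP BY key, as reusable stage functions.
    static final Predicate<Order> WHERE = o -> o.amount > 100;
    static final Function<Order, String> GROUP_KEY = o -> o.city;

    // COUNT(*) state per group, as a fields-grouped counting bolt would hold it.
    private final Map<String, Long> counts = new HashMap<>();

    // Processes one tuple and returns the updated "city,count" row for its group,
    // or null if the WHERE clause filtered the tuple out.
    String onTuple(Order o) {
        if (!WHERE.test(o)) return null;
        String key = GROUP_KEY.apply(o);
        long c = counts.merge(key, 1L, Long::sum);
        return key + "," + c;
    }

    public static void main(String[] args) {
        SqlPipelineSketch q = new SqlPipelineSketch();
        for (Order o : List.of(new Order("Shanghai", 250), new Order("Beijing", 80),
                               new Order("Shanghai", 120))) {
            String row = q.onTuple(o);
            if (row != null) System.out.println(row);
        }
    }
}
```

Even this toy version needs a data class, stage wiring and state management. That boilerplate is what a SQL front end generates on the user's behalf.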
Future directions focus on migrating more workloads to JStorm, evaluating Twitter's Heron, and researching the Dataflow model (Google Dataflow, Apache Beam, Spark Structured Streaming) for next-generation real-time processing.
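At the heart of the Dataflow model is the separation of event time from processing time. The following minimal sketch assigns events to fixed windows by their own timestamps and emits a window only once a watermark passes its end. `EventTimeWindows` is hypothetical and deliberately simplified: real Dataflow, Beam, or Structured Streaming pipelines add triggers, allowed lateness, and accumulation modes, whereas this sketch simply drops late events.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Minimal sketch of Dataflow-style event-time processing: events are assigned
// to fixed (tumbling) windows by event timestamp, and a window's count is
// emitted only when the watermark reaches the window's end.
final class EventTimeWindows {
    private final long sizeMs;
    private final TreeMap<Long, Long> countsByWindowStart = new TreeMap<>();
    private long watermarkMs = Long.MIN_VALUE;

    EventTimeWindows(long sizeMs) { this.sizeMs = sizeMs; }

    // Assigns the event to [start, start + size) based on event time, not arrival time.
    // Returns false (and drops the event) if its window has already closed.
    boolean onEvent(long eventTimeMs) {
        long start = Math.floorDiv(eventTimeMs, sizeMs) * sizeMs;
        if (start + sizeMs <= watermarkMs) return false;
        countsByWindowStart.merge(start, 1L, Long::sum);
        return true;
    }

    // Advances the watermark (an estimate that no earlier events remain) and
    // returns "start:count" for every window whose end is at or before it.
    List<String> advanceWatermark(long newWatermarkMs) {
        watermarkMs = Math.max(watermarkMs, newWatermarkMs);
        List<String> fired = new ArrayList<>();
        while (!countsByWindowStart.isEmpty()
                && countsByWindowStart.firstKey() + sizeMs <= watermarkMs) {
            Map.Entry<Long, Long> e = countsByWindowStart.pollFirstEntry();
            fired.add(e.getKey() + ":" + e.getValue());
        }
        return fired;
    }

    public static void main(String[] args) {
        EventTimeWindows w = new EventTimeWindows(60_000); // 1-minute windows
        w.onEvent(5_000);
        w.onEvent(70_000);
        w.onEvent(30_000); // arrives out of order but still lands in [0, 60000)
        System.out.println(w.advanceWatermark(60_000));
        System.out.println(w.onEvent(10_000)); // late: its window already fired
        System.out.println(w.advanceWatermark(120_000));
    }
}
```

The out-of-order event at 30 s is still counted in the first window, which a purely arrival-ordered (processing-time) count would not guarantee.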
Ctrip Technology
Official Ctrip Technology account, sharing and discussing growth.