
Building a Unified Real-Time Data Platform at Ctrip: Architecture, Practices, and Lessons Learned

This article describes Ctrip's development of a unified real-time data platform, detailing its motivations, architectural choices such as Kafka and Storm, implementation of shared schemas, resource control, monitoring, and operational lessons, as well as experiences with Storm, JStorm, and Streaming CQL.

Qunar Tech Salon

Ctrip operates many business units with diverse and rapidly changing data needs, making traditional batch processing insufficient; a unified real-time data platform was required to provide timely analytics across all units.

Before the platform, each department built its own real-time solutions using various message queues (ActiveMQ, RabbitMQ, Kafka) and processing frameworks (Storm, Spark‑Streaming, custom code), leading to instability, lack of monitoring, and poor data sharing.

The platform was designed to satisfy four key requirements: high stability, a complete set of supporting facilities (testing, deployment, monitoring, alerting), easy sharing of data and application scenarios, and responsive support for developers.

After evaluating options, Kafka was chosen as the standard messaging system, and Storm was selected as the real‑time processing engine for its maturity and stability, though Spark‑Streaming was also considered viable.

The architecture streams logs and business data from servers into Kafka; Storm topologies consume the data, perform calculations, and write results to external storage used by various business lines.
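The flow described above (messages in Kafka, a Storm topology computing over them, results written to external storage) can be illustrated with a minimal in-process sketch. It is not Storm code. Plain Python generators stand in for a spout and two bolts, and a dict stands in for the external store. The log format, the per-minute page count, and every function name here are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, Tuple

# Hypothetical raw log format "<seconds> <page>"; real UBT/Pprobe formats differ.
RAW_LOGS = ["0 /hotel", "30 /flight", "50 /hotel", "70 /hotel"]

def kafka_spout(lines: Iterable[str]) -> Iterator[str]:
    """Stand-in for a spout reading a Kafka topic: emits messages in order."""
    yield from lines

def parse_bolt(messages: Iterable[str]) -> Iterator[Tuple[int, str]]:
    """Parses each message into (start of its minute bucket, page)."""
    for msg in messages:
        ts, page = msg.split()
        yield int(ts) // 60 * 60, page

def count_bolt(events: Iterable[Tuple[int, str]]) -> Iterator[Tuple[Tuple[int, str], int]]:
    """Keeps a running count per (minute, page) and emits each updated count."""
    counts: Dict[Tuple[int, str], int] = defaultdict(int)
    for key in events:
        counts[key] += 1
        yield key, counts[key]

def storage_sink(updates: Iterable[Tuple[Tuple[int, str], int]], store: Dict) -> None:
    """Upserts the latest value per key, standing in for Redis, HBase, or a database."""
    for key, value in updates:
        store[key] = value

def run_pipeline(lines: Iterable[str]) -> Dict[Tuple[int, str], int]:
    """Wires spout -> parse -> count -> sink and returns the final store contents."""
    store: Dict[Tuple[int, str], int] = {}
    storage_sink(count_bolt(parse_bolt(kafka_spout(lines))), store)
    return store
```

For the sample logs, `run_pipeline(RAW_LOGS)` records two `/hotel` requests and one `/flight` request in the first minute, and one `/hotel` request in the second. In a real topology each stage would run as parallel tasks, possibly on different workers.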

Data sharing is achieved by defining schemas with Avro, publishing them on a centralized portal, automatically generating Java classes and JARs, and allowing users to add the generated artifacts as Maven dependencies.
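To make the shared-schema idea concrete, here is a schema written in Avro's JSON record form (held as a Python dict), together with a small hand-rolled check that a record conforms to it. This is a simplified sketch, not the Avro library. It covers only primitive types and unions. The schema name, namespace, and fields are hypothetical. In practice, Java class generation would typically use Avro's own tooling, such as the avro-maven-plugin.

```python
from typing import Any, Dict, List

# Hypothetical shared schema in Avro's JSON record form.
UBT_EVENT_SCHEMA: Dict[str, Any] = {
    "type": "record",
    "name": "UbtEvent",                # hypothetical record name
    "namespace": "com.example.ubt",    # hypothetical namespace
    "fields": [
        {"name": "userId", "type": "string"},
        {"name": "page", "type": "string"},
        {"name": "timestamp", "type": "long"},
        {"name": "durationMs", "type": ["null", "int"], "default": None},
    ],
}

def _matches(avro_type: Any, value: Any) -> bool:
    """Checks one value against a primitive Avro type or a union (list of types)."""
    if isinstance(avro_type, list):  # union: any branch may match
        return any(_matches(t, value) for t in avro_type)
    if avro_type == "null":
        return value is None
    if avro_type == "string":
        return isinstance(value, str)
    if avro_type == "boolean":
        return isinstance(value, bool)
    if avro_type in ("int", "long"):
        bits = 32 if avro_type == "int" else 64
        return (isinstance(value, int) and not isinstance(value, bool)
                and -(2 ** (bits - 1)) <= value < 2 ** (bits - 1))
    if avro_type in ("float", "double"):
        return isinstance(value, (int, float)) and not isinstance(value, bool)
    raise ValueError(f"type not supported by this sketch: {avro_type!r}")

def validate(schema: Dict[str, Any], record: Dict[str, Any]) -> List[str]:
    """Returns a list of problems; an empty list means the record conforms."""
    errors: List[str] = []
    for field in schema["fields"]:
        name = field["name"]
        if name not in record:
            if "default" not in field:
                errors.append(f"missing field: {name}")
            continue
        if not _matches(field["type"], record[name]):
            errors.append(f"bad type for {name}: {record[name]!r}")
    return errors
```

Producers and consumers that depend on the same published schema fail in the same, predictable way when a field is missing or mistyped. That shared failure behavior is what makes a central schema portal useful for cross-team data sharing.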

Storm APIs were wrapped to simplify deserialization for users, and resource control was added by moving topology and executor concurrency settings to the portal, improving platform stability.
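Portal-based resource control can be sketched as a submit-time step in which the portal's values override whatever the user hard-coded. The `PortalQuota` record, the fallback of one executor for unknown components, and the total-executor cap are hypothetical. `topology.workers` and `topology.debug` are real Storm configuration keys.

```python
from dataclasses import dataclass
from typing import Dict, Tuple

@dataclass
class PortalQuota:
    """Hypothetical per-topology record maintained on the portal."""
    workers: int
    parallelism: Dict[str, int]  # component id -> approved executor count
    max_total_executors: int

def resolve_submission(user_conf: Dict[str, object],
                       user_parallelism: Dict[str, int],
                       quota: PortalQuota) -> Tuple[Dict[str, object], Dict[str, int]]:
    """Returns the config and per-component parallelism actually used at submit time."""
    conf = dict(user_conf)
    conf["topology.workers"] = quota.workers  # portal value wins over user code
    # Components unknown to the portal get a single executor (hypothetical policy).
    parallelism = {cid: quota.parallelism.get(cid, 1) for cid in user_parallelism}
    total = sum(parallelism.values())
    if total > quota.max_total_executors:
        raise ValueError(f"{total} executors exceed quota of {quota.max_total_executors}")
    return conf, parallelism
```

Because users can no longer request arbitrary concurrency from their own code, one misconfigured topology is less likely to starve others on a shared cluster. Keeping these values on the portal also lets operators adjust them centrally.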

Initial integrations included the high‑volume UBT user‑behavior stream and Pprobe traffic logs, focused on real‑time analytics and reporting. This experience showed that ingesting large‑scale data early is important for stabilizing the platform.

Operational enhancements included exporting Storm logs to Elasticsearch for searchable dashboards; extending the available metrics (e.g., Kafka latency and consumer lag) through a custom MetricsConsumer that feeds Graphite and a dashboard; and building an alert system with severity‑based notifications (text-to-speech phone calls, emails).
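The sketch below connects consumer-lag monitoring to severity-based alerting. Per-partition lag is taken as the log-end offset minus the consumer's committed offset. Each value is rendered as a line in Graphite's plaintext protocol (`<metric path> <value> <timestamp>`) and mapped to a severity and notification channel. The metric prefix, thresholds, and channel names are hypothetical, and the code does not use Storm's metrics API.

```python
from typing import Dict, List, Tuple

def consumer_lag(log_end_offsets: Dict[int, int],
                 committed_offsets: Dict[int, int]) -> Dict[int, int]:
    """Lag per partition = log-end offset minus committed offset.

    Assumes (hypothetical simplification) that a partition with no commit starts at offset 0.
    """
    return {p: end - committed_offsets.get(p, 0) for p, end in log_end_offsets.items()}

def graphite_lines(prefix: str, lag: Dict[int, int], ts: int) -> List[str]:
    """Formats lag values as Graphite plaintext-protocol lines."""
    return [f"{prefix}.partition{p}.lag {v} {ts}" for p, v in sorted(lag.items())]

# Hypothetical thresholds, checked from most to least severe.
SEVERITY_RULES: List[Tuple[int, str, str]] = [
    (100_000, "critical", "tts_call"),
    (10_000, "warning", "email"),
]

def classify(max_lag: int) -> Tuple[str, str]:
    """Maps the worst partition lag to (severity, notification channel)."""
    for threshold, severity, channel in SEVERITY_RULES:
        if max_lag >= threshold:
            return severity, channel
    return "ok", "none"
```

Sending only the most severe alerts by phone call keeps on-call noise down, while lower-severity problems still leave a record by email.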

Additional connectors (Spouts and Bolts) were provided for Ctrip's Message Queue, Redis, HBase, and relational databases to simplify user development.

The platform supports four major real‑time application categories: data reporting, business monitoring, user‑behavior‑driven marketing, and risk/security use cases, with examples such as website performance monitoring, AB testing alerts, personalized recommendations, and fraud detection.
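As an illustration of the risk/security category, a simple real-time rule might flag a key (say, a user id) that produces too many events within a sliding time window. The rule, its parameters, and the class name are hypothetical. A real fraud-detection system would combine many signals.

```python
from collections import defaultdict, deque
from typing import Deque, Dict

class SlidingWindowRule:
    """Flags a key that emits more than `limit` events within the last `window_s` seconds.

    Hypothetical rule for illustration only.
    """

    def __init__(self, limit: int, window_s: int):
        self.limit = limit
        self.window_s = window_s
        self._events: Dict[str, Deque[int]] = defaultdict(deque)

    def observe(self, key: str, ts: int) -> bool:
        """Records one event and returns True if the key now exceeds the limit."""
        q = self._events[key]
        q.append(ts)
        # Evict events outside (ts - window_s, ts]; assumes per-key timestamps never decrease.
        while q and q[0] <= ts - self.window_s:
            q.popleft()
        return len(q) > self.limit
```

In a topology, this per-key state only works if every event for a given key reaches the same task. Storm's fields grouping on the key field provides that guarantee.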

Common Storm issues encountered included the bugs STORM‑763 and STORM‑643. Best practices shared included using localOrShuffleGrouping and ensuring that Bolt member variables are serializable.
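Storm's documented behavior for localOrShuffleGrouping is as follows. If the target bolt has one or more tasks in the same worker process, tuples are shuffled among only those in-worker tasks. Otherwise the grouping behaves like an ordinary shuffle grouping. The simulation below models just the choice of candidate tasks. The task-to-worker assignment and worker names are hypothetical, and `random.choice` stands in for Storm's internal shuffling.

```python
import random
from typing import Dict, List

def local_or_shuffle_targets(source_worker: str, task_to_worker: Dict[int, str]) -> List[int]:
    """Candidate target tasks: in-worker tasks if any exist, otherwise all tasks."""
    local = sorted(t for t, w in task_to_worker.items() if w == source_worker)
    return local if local else sorted(task_to_worker)

def pick_target(source_worker: str, task_to_worker: Dict[int, str], rng: random.Random) -> int:
    """Chooses one target task among the candidates (stand-in for Storm's shuffle)."""
    return rng.choice(local_or_shuffle_targets(source_worker, task_to_worker))
```

Keeping tuples inside one worker avoids serialization and a network hop, which is the usual reason to prefer this grouping. The trade-off is possible load skew when downstream tasks are spread unevenly across workers.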

Recent work explored Streaming CQL (a SQL‑like engine that compiles queries into Storm topologies) and migration to JStorm (Alibaba's Java reimplementation of Storm) to improve performance, resource isolation, and developer ergonomics, with support added for various data sources and sinks.
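To show what "compiles to Storm topologies" means, here is a toy translation of a tiny SQL-like query into a chain of stream operators, one stage per clause. The WHERE clause becomes a filter stage and the SELECT list becomes a projection stage, much as a CQL engine would map clauses to Bolts. The grammar is hypothetical and far smaller than Streaming CQL's. None of this reflects Streaming CQL's actual internals.

```python
import re
from typing import Callable, Dict, Iterable, Iterator, List

# Toy grammar (hypothetical): SELECT f1, f2 FROM source [WHERE field (>|<|=) integer]
QUERY_RE = re.compile(
    r"^\s*SELECT\s+(?P<cols>[\w\s,]+?)\s+FROM\s+(?P<src>\w+)"
    r"(?:\s+WHERE\s+(?P<field>\w+)\s*(?P<op>[<>=])\s*(?P<value>-?\d+))?\s*$",
    re.IGNORECASE,
)
OPS = {">": lambda a, b: a > b, "<": lambda a, b: a < b, "=": lambda a, b: a == b}

Row = Dict[str, object]
Stage = Callable[[Iterable[Row]], Iterator[Row]]

def compile_query(sql: str) -> List[Stage]:
    """Turns a toy query into an ordered list of stages (filter, then projection)."""
    m = QUERY_RE.match(sql)
    if not m:
        raise ValueError(f"unsupported query: {sql!r}")
    cols = [c.strip() for c in m.group("cols").split(",")]
    stages: List[Stage] = []
    # m.group("src") would select the input stream (e.g. a Kafka topic); unused here.
    if m.group("field"):
        field, op, value = m.group("field"), OPS[m.group("op")], int(m.group("value"))
        stages.append(lambda rows: (r for r in rows if op(r[field], value)))  # filter stage
    stages.append(lambda rows: ({c: r[c] for c in cols} for r in rows))  # projection stage
    return stages

def run(stages: List[Stage], rows: Iterable[Row]) -> List[Row]:
    """Pushes rows through each stage in order and collects the output."""
    for stage in stages:
        rows = stage(rows)
    return list(rows)
```

The appeal for developers is that they write the declarative query while the engine decides how to lay out and wire the stages.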

Future directions include further migration to JStorm, an evaluation of Twitter's Heron, and investigation of dataflow models such as Google Dataflow, Apache Beam, and Spark's Structured Streaming, to combine low latency with strong correctness guarantees.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact admin@besthub.dev and we will review it promptly.

Written by Qunar Tech Salon

Qunar Tech Salon is a learning and exchange platform for Qunar engineers and industry peers. We share cutting-edge technology trends and topics, providing a free platform for mid-to-senior technical professionals to exchange and learn.
