Adopting StarRocks for Ctrip's Large-Scale Hotel Data Platform: Architecture, Performance, and Operations
This article describes how Ctrip's hotel data platform migrated from ClickHouse to StarRocks. It covers the platform's current state and pain points, why StarRocks was selected, StarRocks' architecture, performance benchmarks, data models and ingestion methods, the high‑availability design, and future migration plans.
HData provides data visualization for Ctrip's large‑scale hotel business. It serves about 2,200 unique visitors (UV) and 100,000 page views (PV) per day, and traffic rises 2‑3× during holidays. It stores roughly 700 billion rows (≈8 TB raw, 1.75 TB compressed) and runs more than 2,000 data pipelines a day, which together update about 150 billion rows.
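As a quick back‑of‑the‑envelope check, the figures above imply the storage compression ratio and holiday traffic range computed below. The constants are copied from the text; the variable names are illustrative only.

```python
# Figures quoted above for HData.
RAW_TB = 8.0
COMPRESSED_TB = 1.75
DAILY_PV = 100_000

# Storage compression ratio implied by raw vs compressed size.
compression_ratio = RAW_TB / COMPRESSED_TB

# Holiday traffic at the quoted 2-3x multiplier.
holiday_pv_range = (2 * DAILY_PV, 3 * DAILY_PV)

print(f"compression ratio: {compression_ratio:.2f}x")             # 4.57x
print(f"holiday PV: {holiday_pv_range[0]:,} to {holiday_pv_range[1]:,}")
```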
Since 2018 the platform has relied heavily on ClickHouse, which answers 95% of queries in under a second. ClickHouse, however, struggles with high‑concurrency workloads: during holiday peaks CPU usage can exceed 70%, and a single complex query can saturate a node.
To reduce load, the team introduced proactive and passive caching. The proactive cache is pre‑populated for frequently accessed pages and for users who accessed related data in the past five days. Together, the two caches cut the number of queries reaching ClickHouse, so the cluster does not have to be scaled out indefinitely.
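The proactive pre‑population rule described above can be sketched as a function that picks which cache keys to fill ahead of time. This is a minimal sketch, not the team's implementation. The access‑log shape `(user_id, page_id, access_time)`, the function name `prewarm_keys`, and the assumption that hot pages are cached once under a shared `"*"` key are all hypothetical.

```python
from datetime import datetime, timedelta
from typing import Iterable, Set, Tuple

# Hypothetical access-log entry: (user_id, page_id, access_time).
Access = Tuple[str, str, datetime]
CacheKey = Tuple[str, str]

def prewarm_keys(log: Iterable[Access], hot_pages: Set[str],
                 now: datetime, lookback_days: int = 5) -> Set[CacheKey]:
    """Return the cache keys a proactive cache would fill before users ask.

    - Frequently accessed pages are warmed once, shared by everyone ("*").
    - Pages a user opened within the lookback window are warmed for that user.
    """
    cutoff = now - timedelta(days=lookback_days)
    keys: Set[CacheKey] = {("*", page) for page in hot_pages}
    for user, page, ts in log:
        if ts >= cutoff:
            keys.add((user, page))
    return keys

now = datetime(2024, 1, 10)
log = [
    ("u1", "occupancy", datetime(2024, 1, 8)),   # inside the 5-day window
    ("u2", "revenue", datetime(2024, 1, 1)),     # too old, not warmed
]
print(sorted(prewarm_keys(log, {"home"}, now)))  # [('*', 'home'), ('u1', 'occupancy')]
```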
Caching, however, brought its own problems. Real‑time data can be cached for only 1‑2 minutes at most, the cache hit rate for real‑time data is only about 10%, and writing to both ClickHouse and MySQL increases hardware cost and development effort.
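The short lifetime of real‑time entries is easiest to see with a per‑entry time‑to‑live (TTL) cache. The sketch below is illustrative only: the `TTLCache` class is hypothetical, the 120‑second TTL is the upper end of the 1‑2 minute limit quoted above, and the one‑day TTL for T+1 data is an assumption. A fake clock keeps expiry deterministic.

```python
import time
from typing import Any, Callable, Dict, Hashable, Optional, Tuple

REALTIME_TTL_S = 120        # upper end of the 1-2 minute limit for real-time data
OFFLINE_TTL_S = 24 * 3600   # hypothetical: T+1 data can live until the next daily load

class TTLCache:
    """Minimal cache with a per-entry time-to-live (illustrative only)."""

    def __init__(self, clock: Callable[[], float] = time.monotonic):
        self._clock = clock
        self._store: Dict[Hashable, Tuple[Any, float]] = {}  # key -> (value, expires_at)

    def put(self, key: Hashable, value: Any, ttl_s: float) -> None:
        self._store[key] = (value, self._clock() + ttl_s)

    def get(self, key: Hashable) -> Optional[Any]:
        entry = self._store.get(key)
        if entry is None:
            return None                      # miss: never cached
        value, expires_at = entry
        if self._clock() >= expires_at:
            del self._store[key]             # miss: expired, must re-query the database
            return None
        return value

# Usage with a fake clock so expiry is deterministic.
fake_now = [0.0]
cache = TTLCache(clock=lambda: fake_now[0])
cache.put("realtime:orders", 42, REALTIME_TTL_S)
cache.put("offline:orders", 40, OFFLINE_TTL_S)
fake_now[0] = 121.0                          # just over two minutes later
print(cache.get("realtime:orders"), cache.get("offline:orders"))  # None 40
```

Any request arriving after the two‑minute window falls through to the database, which is why short‑lived real‑time entries rarely get reused.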
Looking for a better‑suited ROLAP engine, the team evaluated several options (Ignite, CrateDB, and Kylin) and chose StarRocks for its sub‑second query latency, strong performance under high concurrency and on multi‑table joins, elastic scaling, hot‑standby architecture, materialized view support, online schema changes, and MySQL‑compatible protocol.
StarRocks has two layers. The Front‑End (FE) layer accepts MySQL‑protocol connections, manages metadata, and plans queries. The Back‑End (BE) layer stores data in columnar format, split into tablets, executes distributed query plans, and uses indexes and predicate push‑down to filter out data before it is read.
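Index‑based filtering with predicate push‑down can be illustrated with a min/max ("zone map") check: a tablet whose value range cannot satisfy the predicate is skipped without reading its rows. This is a simplified sketch of the general technique, not StarRocks' BE code. The `Tablet` class and `scan_range` function are hypothetical, and real engines persist such indexes at write time instead of computing them per query.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

Row = Dict[str, int]

@dataclass
class Tablet:
    """Toy tablet: a list of rows. The zone map is computed on demand for brevity."""
    rows: List[Row]

    def zone_map(self, col: str) -> Tuple[int, int]:
        values = [r[col] for r in self.rows]
        return min(values), max(values)

def scan_range(tablets: List[Tablet], col: str, lo: int, hi: int) -> Tuple[List[Row], int]:
    """Evaluate `lo <= col <= hi`, skipping tablets whose min/max cannot match.

    Returns the matching rows and how many tablets were actually scanned.
    """
    matches: List[Row] = []
    tablets_read = 0
    for tablet in tablets:
        zmin, zmax = tablet.zone_map(col)
        if zmax < lo or zmin > hi:
            continue                          # pruned: no row in this tablet can match
        tablets_read += 1
        matches.extend(r for r in tablet.rows if lo <= r[col] <= hi)
    return matches, tablets_read

tablets = [
    Tablet([{"day": 1}, {"day": 3}]),
    Tablet([{"day": 10}, {"day": 12}]),
]
print(scan_range(tablets, "day", 9, 11))   # ([{'day': 10}], 1)
```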
In performance tests on a six‑node cluster (3 FE + 3 BE), StarRocks was faster than ClickHouse in all three tests: Test 1, 547 ms vs 1,814 ms; Test 2, 126 ms vs 142 ms; Test 3, 387 ms vs 884 ms. The margin varied widely, from a large gap in Test 1 to a small one in Test 2.
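The relative speedups implied by the reported latencies can be computed directly. Only the millisecond figures come from the text; the dictionary layout is illustrative.

```python
# Reported latencies in ms as (StarRocks, ClickHouse) on the 3 FE + 3 BE cluster.
results = {
    "Test 1": (547, 1814),
    "Test 2": (126, 142),
    "Test 3": (387, 884),
}

# Speedup = ClickHouse latency / StarRocks latency.
speedups = {name: ch / sr for name, (sr, ch) in results.items()}

for name, s in speedups.items():
    print(f"{name}: {s:.2f}x")   # Test 1: 3.32x, Test 2: 1.13x, Test 3: 2.28x
```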
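StarRocks supports three data models: Detail (Duplicate Key), which keeps every ingested row and therefore the full history; Aggregate, which merges rows that share the same key and aggregates their metric columns; and Update (Unique Key), in which a newly loaded row replaces the existing row with the same key. It also offers five ingestion methods: BrokerLoad and SparkLoad for large batch loads from external storage, StreamLoad for pushing files or data streams over HTTP, RoutineLoad for continuous consumption from Kafka, and InsertInto for loads expressed in SQL.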
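The visible effect of each data model on repeated keys can be simulated in a few lines. This is a conceptual sketch of the semantics only, not how StarRocks stores or merges data. The function names and the `hotel_id`/`orders` columns are hypothetical, and SUM is just one of the aggregate types the Aggregate model supports (MAX and MIN are others).

```python
from typing import Dict, Iterable, List

Row = Dict[str, int]

def load_detail(rows: Iterable[Row]) -> List[Row]:
    """Detail (Duplicate Key) model: every loaded row is kept, duplicates included."""
    return [dict(r) for r in rows]

def load_aggregate(rows: Iterable[Row], key: str, metric: str) -> List[Row]:
    """Aggregate model: rows sharing `key` collapse into one; `metric` is summed."""
    merged: Dict[int, Row] = {}
    for r in rows:
        k = r[key]
        if k in merged:
            merged[k][metric] += r[metric]
        else:
            merged[k] = dict(r)
    return list(merged.values())

def load_update(rows: Iterable[Row], key: str) -> List[Row]:
    """Update (Unique Key) model: the last-loaded row for a key replaces earlier ones."""
    latest: Dict[int, Row] = {}
    for r in rows:
        latest[r[key]] = dict(r)
    return list(latest.values())

loads = [
    {"hotel_id": 1, "orders": 2},
    {"hotel_id": 2, "orders": 5},
    {"hotel_id": 1, "orders": 3},
]
print(len(load_detail(loads)))                       # 3
print(load_aggregate(loads, "hotel_id", "orders"))   # hotel 1 -> 5 orders
print(load_update(loads, "hotel_id"))                # hotel 1 -> 3 orders
```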
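Real‑time data is ingested with RoutineLoad into tables that use the update model, while T+1 offline data is loaded with StreamLoad into tables that use the detail model. Because the update model keeps whichever row for a key is loaded last, an out‑of‑order Kafka message can overwrite a newer row with stale data. A compensating process therefore sorts the logs by timestamp and reconciles the latest state of each record with the main table.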
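The sort‑then‑compare idea behind the compensating process can be sketched as below. The schema (an `order_id` key and an `event_ts` timestamp on every row), the function name `reconcile`, and the in‑memory dict standing in for the main table are all hypothetical; the sketch returns the rows that would need to be rewritten instead of touching a real table.

```python
from typing import Dict, List

Row = Dict[str, object]

def reconcile(main_table: Dict[int, Row], messages: List[Row],
              key: str = "order_id") -> List[Row]:
    """Return rows whose stored version is older than the newest logged event."""
    latest: Dict[int, Row] = {}
    # After sorting by event time, "last write wins" picks the truly newest event.
    for msg in sorted(messages, key=lambda m: m["event_ts"]):
        latest[msg[key]] = msg
    fixes: List[Row] = []
    for k, row in latest.items():
        current = main_table.get(k)
        if current is None or current["event_ts"] < row["event_ts"]:
            fixes.append(row)                 # stale or missing: needs a rewrite
    return fixes

# The "cancelled" event (ts 20) arrived before "paid" (ts 10), so the update
# model ended up storing the older "paid" row.
arrived = [
    {"order_id": 1, "status": "cancelled", "event_ts": 20},
    {"order_id": 1, "status": "paid", "event_ts": 10},
]
main_table = {1: {"order_id": 1, "status": "paid", "event_ts": 10}}
print(reconcile(main_table, arrived))
```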
For disaster recovery and high availability, FE and BE nodes are deployed across two data centers, with traffic split 50/50 between them. Load‑balancer configuration routes requests dynamically, and every process runs under a process supervisor. Health‑check jobs evict failed FE nodes, while BE failures are handled by StarRocks' built‑in replica balancing. Email and SMS alerts, together with hardware‑metric monitoring, complete the setup.
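The routing decision behind the health checks and the 5:5 split can be expressed as a small function. In the article this is handled by load‑balancer configuration, so the Python below only models the logic under stated assumptions: the host and data‑center names, the `is_alive` probe, and both function names are hypothetical.

```python
import random
from typing import Callable, Dict, List, Tuple

def live_routes(frontends: List[Tuple[str, str]],
                is_alive: Callable[[str], bool]) -> Dict[str, List[str]]:
    """Group FE hosts by data center, dropping any host that fails the health check."""
    routes: Dict[str, List[str]] = {}
    for host, dc in frontends:
        if is_alive(host):
            routes.setdefault(dc, []).append(host)
    return routes

def pick_frontend(routes: Dict[str, List[str]], rng=random) -> str:
    """Choose a data center with equal weight (the 5:5 split), then a live FE in it.

    If every FE in one data center is down, that center has no entry in `routes`,
    so all traffic goes to the other one.
    """
    if not routes:
        raise RuntimeError("no healthy FE nodes")
    dc = rng.choice(sorted(routes))
    return rng.choice(routes[dc])

frontends = [("fe-a1", "dc-a"), ("fe-a2", "dc-a"), ("fe-b1", "dc-b")]
down = {"fe-a2"}                                   # pretend this probe failed
routes = live_routes(frontends, lambda h: h not in down)
print(routes)          # {'dc-a': ['fe-a1'], 'dc-b': ['fe-b1']}
print(pick_frontend(routes, random.Random(7)))
```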
After the migration, 70% of real‑time scenarios run on StarRocks with an average query latency of about 200 ms, and queries slower than 500 ms account for only 1% of the total. The team now maintains a single code and data stack, which reduces both staffing and hardware costs.
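The two headline metrics, average latency and the share of queries over 500 ms, can be computed from a per‑query latency log as shown below. The function name and sample values are hypothetical, and the article does not say how its figures were aggregated.

```python
from typing import Sequence, Tuple

def latency_summary(latencies_ms: Sequence[float],
                    slow_threshold_ms: float = 500) -> Tuple[float, float]:
    """Return (mean latency, fraction of queries slower than the threshold)."""
    if not latencies_ms:
        raise ValueError("need at least one sample")
    mean = sum(latencies_ms) / len(latencies_ms)
    slow = sum(1 for x in latencies_ms if x > slow_threshold_ms)
    return mean, slow / len(latencies_ms)

# Made-up sample for illustration, not data from the article.
print(latency_summary([120, 180, 250, 650]))   # (300.0, 0.25)
```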
Next steps are to migrate all remaining real‑time and offline workloads to StarRocks, improve monitoring, and separate hot and cold data by querying colder data through Hive external tables, further lowering hardware costs.
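One way to picture hot‑cold separation is a router that splits a query's date range between a native StarRocks table (recent, hot data) and a Hive external table (older, cold data). This is a sketch under assumptions: the function name, the part labels, and the retention window `hot_days` are all hypothetical, since the article does not specify where the hot/cold boundary lies.

```python
from datetime import date, timedelta
from typing import List, Tuple

Part = Tuple[str, date, date]

def split_by_temperature(start: date, end: date, today: date,
                         hot_days: int = 90) -> List[Part]:
    """Split the inclusive range [start, end] into a cold part (Hive external table)
    and a hot part (native StarRocks table). `hot_days` is a hypothetical retention."""
    boundary = today - timedelta(days=hot_days)   # earliest date still kept hot
    parts: List[Part] = []
    if start < boundary:
        parts.append(("cold", start, min(end, boundary - timedelta(days=1))))
    if end >= boundary:
        parts.append(("hot", max(start, boundary), end))
    return parts

print(split_by_temperature(date(2024, 1, 1), date(2024, 1, 15),
                           today=date(2024, 1, 20), hot_days=10))
```

A query that spans the boundary becomes two sub‑queries whose results are combined, while queries over recent dates never touch the cold tier.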
Ctrip Technology
Official Ctrip Technology account, sharing and discussing growth.
