Applying ClickHouse for a High‑Performance Hotel Data Intelligence Platform
This article describes how Ctrip Hotel's data intelligence platform leverages ClickHouse to achieve real‑time analytics on billions of daily updates and millions of queries, detailing the system architecture, data ingestion pipelines, monitoring, and operational lessons learned for large‑scale, high‑availability data services.
1. Background
Ctrip Hotel maintains thousands of tables and processes over ten billion data updates daily. The platform must remain highly available for production applications, serve massive query volumes, and deliver sub‑second response times across app and PC clients.
Traditional relational databases, sharding, Elasticsearch, and Redis could not meet the performance and cost requirements, leading the team to explore ClickHouse.
2. ClickHouse Overview
ClickHouse is a column‑oriented, real‑time analytics DBMS that uses vectorized execution and SIMD instructions to process massive data in parallel, offering high compression, fast writes (50‑200 MB/s), and efficient indexing without B‑tree constraints.
However, it lacks transaction support and true delete/update capabilities, and its concurrency is limited (recommended QPS ~100), so it requires careful data modeling and batch operations.
3. ClickHouse Practice in the Hotel Data Intelligence Platform
3.1 Data Update
The pipeline moves data from Hive to ClickHouse via two paths: Hive → MySQL → ClickHouse (using DataX) and Hive → ClickHouse (directly with DataX). Full loads import data into temporary tables, then rename them to swap with production tables, ensuring zero‑downtime.
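The rename-based swap for full loads can be sketched as follows. This is a minimal illustration, not the team's actual tooling: the `_temp` and `_swap` suffixes and the function name are hypothetical, and the sketch only builds the `RENAME TABLE` statements that a loader would send to ClickHouse once the temporary table has been filled.

```python
from __future__ import annotations


def build_swap_statements(table: str) -> list[str]:
    """Build the renames that put a freshly loaded temp table into production.

    Assumption: the full load has already written into `<table>_temp`.
    The `_temp` and `_swap` suffixes are illustrative names only.
    """
    temp = f"{table}_temp"
    parked = f"{table}_swap"
    return [
        # Park the current production table under a spare name.
        f"RENAME TABLE {table} TO {parked}",
        # Promote the freshly loaded table to the production name.
        f"RENAME TABLE {temp} TO {table}",
        # Recycle the old production table as the next load's temp table.
        f"RENAME TABLE {parked} TO {temp}",
    ]


if __name__ == "__main__":
    for statement in build_swap_statements("hotel_orders"):
        print(statement)
```

Renames only touch table metadata, which is why the swap is quick compared with reloading the production table in place.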
Incremental loads originally relied on partition deletion, which caused data inconsistency. The improved method writes the increment to a temporary table, performs a reverse‑write step, and then renames the tables, so updates are seamless to queries.
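A possible shape for the improved incremental flow is sketched below. It rests on an assumption the text does not spell out: that the "reverse-write" step copies the rows older than the increment from the production table back into the temporary table, so the temporary table holds the complete data set before the rename. The table suffixes, the date column, and the cutoff are all hypothetical.

```python
from __future__ import annotations

from datetime import date


def build_incremental_plan(table: str, date_column: str, cutoff: date) -> dict[str, list[str]]:
    """Return the SQL to run before and after the external increment load.

    Assumption: rows with date_column >= cutoff are (re)loaded into the temp
    table by an external tool such as DataX, between the two statement lists.
    """
    temp = f"{table}_temp"
    parked = f"{table}_swap"
    before_load = [f"TRUNCATE TABLE {temp}"]
    after_load = [
        # Assumed "reverse write": copy untouched older rows from production.
        f"INSERT INTO {temp} SELECT * FROM {table} "
        f"WHERE {date_column} < '{cutoff.isoformat()}'",
        # Swap the complete temp table into the production name.
        f"RENAME TABLE {table} TO {parked}",
        f"RENAME TABLE {temp} TO {table}",
        f"RENAME TABLE {parked} TO {temp}",
    ]
    return {"before_load": before_load, "after_load": after_load}


if __name__ == "__main__":
    plan = build_incremental_plan("hotel_orders", "order_date", date(2019, 7, 1))
    for phase, statements in plan.items():
        print(phase)
        for statement in statements:
            print("  " + statement)
```

Under this reading, queries keep hitting the untouched production table until the final renames, which would avoid the window of missing data that partition deletion creates.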
3.2 Monitoring and Alerting of Data Ingestion
All synchronization statements are executed via ClickHouse's RESTful API, allowing QueryID tracking. The system polls query progress and triggers SMS alerts when error frequencies exceed thresholds.
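The tracking and alerting idea can be sketched as below. ClickHouse's HTTP interface accepts a `query_id` URL parameter, and running queries are visible in the `system.processes` table; everything else here is an assumption made for illustration, including the host name, the threshold values, and the `ErrorWindow` helper. The SMS gateway is out of scope, so the sketch only decides when an alert should fire.

```python
from __future__ import annotations

import time
from collections import deque
from urllib.parse import urlencode


def build_query_url(base_url: str, query_id: str) -> str:
    """URL for submitting a statement (as the POST body) under a known query ID."""
    return f"{base_url}/?{urlencode({'query_id': query_id})}"


def progress_sql(query_id: str) -> str:
    """Statement that reads the progress of a running query from system.processes."""
    return (
        "SELECT elapsed, read_rows, total_rows_approx FROM system.processes "
        f"WHERE query_id = '{query_id}'"
    )


class ErrorWindow:
    """Counts errors in a sliding time window (hypothetical alerting helper)."""

    def __init__(self, threshold: int, window_seconds: float) -> None:
        self.threshold = threshold
        self.window_seconds = window_seconds
        self._events: deque[float] = deque()

    def record_error(self, now: float | None = None) -> bool:
        """Record one error; return True when the count exceeds the threshold."""
        now = time.time() if now is None else now
        self._events.append(now)
        # Drop errors that have aged out of the window.
        while self._events and now - self._events[0] > self.window_seconds:
            self._events.popleft()
        return len(self._events) > self.threshold


if __name__ == "__main__":
    print(build_query_url("http://clickhouse.example:8123", "sync-hotel-orders-001"))
    print(progress_sql("sync-hotel-orders-001"))
```

Choosing the query ID on the client side is what makes the later polling possible, since the poller knows which row to look for without parsing server logs.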
3.3 Server Distribution and Operations
The deployment consists of four clusters (domestic, overseas, real‑time, risk control), each with 2‑3 servers in active‑standby mode and load‑balanced query routing. Failover is handled via configuration changes, and virtual clusters can be created to redistribute load during spikes.
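Load-balanced routing with configuration-driven failover might look like the sketch below. It is a simplified stand-in, not the platform's router: the `QueryRouter` class and node names are hypothetical, and marking a node down represents the configuration change an operator would make when a server fails.

```python
from __future__ import annotations


class QueryRouter:
    """Round-robin over a cluster's nodes, skipping nodes marked as down."""

    def __init__(self, nodes: list[str]) -> None:
        self._nodes = list(nodes)
        self._down: set[str] = set()
        self._next = 0

    def mark_down(self, node: str) -> None:
        """Take a node out of rotation (the failover configuration change)."""
        self._down.add(node)

    def mark_up(self, node: str) -> None:
        """Return a recovered node to rotation."""
        self._down.discard(node)

    def pick(self) -> str:
        """Return the next healthy node, or raise if none is left."""
        for _ in range(len(self._nodes)):
            node = self._nodes[self._next % len(self._nodes)]
            self._next += 1
            if node not in self._down:
                return node
        raise RuntimeError("no healthy node in cluster")


if __name__ == "__main__":
    router = QueryRouter(["ch-domestic-1", "ch-domestic-2", "ch-domestic-3"])
    router.mark_down("ch-domestic-2")
    print([router.pick() for _ in range(4)])
```

A virtual cluster for spike handling could be modeled the same way, as a second `QueryRouter` built over a different subset of the same servers.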
Future plans include dispersing cluster nodes across different data centers for disaster recovery and implementing automatic health checks to isolate faulty servers.
4. ClickHouse Exploration
The team documented practical tips. On the server side: disable Linux swap to avoid memory pressure, use stable ClickHouse versions, and monitor CPU usage (keep it below 70%) to prevent query timeouts. For queries: configure join_use_nulls so that unmatched join rows return NULL instead of default values, place the smaller table on the right side of joins, minimize the amount of data being joined, and avoid distributed tables when possible. For writes: batch the inserts, limit the number of partitions each batch touches, and pre‑sort the data.
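The write-side tip (batched inserts, few partitions per batch, pre-sorted rows) can be illustrated with a small helper. This is a sketch under assumptions: the `partition_batches` function, the row layout, and the batch size are hypothetical, and the actual insert call is left out. Each batch it yields belongs to a single partition and is ordered by the sorting key.

```python
from __future__ import annotations

from collections import defaultdict
from typing import Any, Callable, Iterable, Iterator


def partition_batches(
    rows: Iterable[Any],
    partition_of: Callable[[Any], Any],
    sort_key: Callable[[Any], Any],
    max_rows: int,
) -> Iterator[list[Any]]:
    """Yield insert batches that each touch one partition and are pre-sorted."""
    by_partition: dict[Any, list[Any]] = defaultdict(list)
    for row in rows:
        by_partition[partition_of(row)].append(row)
    for partition in sorted(by_partition):
        # Sort within the partition so the server has less merge work to do.
        ordered = sorted(by_partition[partition], key=sort_key)
        for start in range(0, len(ordered), max_rows):
            yield ordered[start:start + max_rows]


if __name__ == "__main__":
    # Rows are (month, hotel_id) pairs; the month plays the partition key.
    sample = [("2019-02", 3), ("2019-01", 2), ("2019-01", 1), ("2019-02", 1)]
    for batch in partition_batches(sample, lambda r: r[0], lambda r: r[1], 1000):
        print(batch)
```

In practice `max_rows` would be large, since ClickHouse favors a small number of big inserts over many small ones.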
5. Conclusion
Since the pilot in July last year, over 80% of the business has migrated to ClickHouse, which now supports more than ten billion daily updates and nearly one million queries while achieving sub‑second response for 98% of app requests and responses under three seconds for PC.
ClickHouse delivers superior query performance and lower cost compared to relational databases, Elasticsearch, and Redis, handling over 4 billion rows on a single node. The team will continue to research newer versions and explore additional open‑source frameworks.
Ctrip Technology
Official Ctrip Technology account, sharing and discussing growth.
