Applying ClickHouse for a High‑Performance Hotel Data Intelligence Platform

This article describes how Ctrip Hotel's data intelligence platform leverages ClickHouse to achieve real‑time analytics on billions of daily updates and millions of queries, detailing the system architecture, data ingestion pipelines, monitoring, and operational lessons learned for large‑scale, high‑availability data services.

Ctrip Technology

1. Background

Ctrip Hotel processes thousands of tables and over ten billion data updates daily, requiring high availability for production applications, massive query volumes, and sub‑second response times across app and PC clients.

Traditional relational databases, sharding, Elasticsearch, and Redis could not meet the performance and cost requirements, leading the team to explore ClickHouse.

2. ClickHouse Overview

ClickHouse is a column-oriented DBMS for real-time analytics. It uses vectorized execution and SIMD instructions to process large volumes of data in parallel, and it offers high compression, fast writes (50-200 MB/s), and a sparse primary index in place of a B-tree.

However, it lacks transaction support and true delete/update capabilities, and it handles only limited concurrency (the recommended rate is about 100 QPS), so it requires careful data modeling and batch operations.

3. ClickHouse Practice in the Hotel Data Intelligence Platform

3.1 Data Update

The pipeline moves data from Hive to ClickHouse along two paths, both using DataX: Hive → MySQL → ClickHouse, and Hive → ClickHouse directly. A full load imports the data into a temporary table and then renames it to swap it with the production table, so the switch involves no downtime.
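The full-load swap can be sketched as a short sequence of SQL statements. This is a minimal illustration, not the team's actual job definition: the `_tmp` and `_old` suffixes and the function name are hypothetical, and the load step is left as a placeholder because the DataX job configuration is not shown in the article.

```python
def full_load_swap_statements(table: str) -> list[str]:
    """Return the SQL steps for a full load that swaps in a freshly loaded table.

    The `_tmp` / `_old` suffixes are an assumed naming convention.
    """
    tmp = f"{table}_tmp"
    old = f"{table}_old"
    return [
        # Start from an empty temporary table with the production schema.
        f"DROP TABLE IF EXISTS {tmp}",
        f"CREATE TABLE {tmp} AS {table}",
        # Placeholder: the bulk load (for example a DataX job) writes here.
        f"-- load the full data set into {tmp}",
        # One RENAME statement moves production aside and promotes the new table.
        f"RENAME TABLE {table} TO {old}, {tmp} TO {table}",
        # The previous copy is no longer needed once the swap has succeeded.
        f"DROP TABLE {old}",
    ]


if __name__ == "__main__":
    for statement in full_load_swap_statements("hotel_orders"):
        print(statement)
```

Queries keep reading the old table until the rename runs, which is why the load itself can take as long as it needs.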

Incremental loads originally worked by deleting partitions, which caused data inconsistency. The improved method writes the increment to a temporary table, runs a reverse-write step, and then renames the tables, so updates are seamless.
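The article does not spell out the reverse-write step. One plausible reading, assumed here, is that rows outside the incremental window are copied from the production table into the temporary table before the swap. The in-memory model below illustrates that reading only; the `day` column, the `_tmp` suffix, and the function name are hypothetical, and in ClickHouse the three steps would correspond to a load, an `INSERT ... SELECT`, and a `RENAME TABLE`.

```python
def incremental_refresh(tables: dict, name: str, increment: list, cutoff: str) -> None:
    """Model an incremental load on tables held as lists of row dicts.

    `cutoff` is an ISO date string; rows with day >= cutoff are replaced
    by the increment, older rows are carried over unchanged.
    """
    tmp = f"{name}_tmp"

    # 1. Write the increment into the temporary table.
    tables[tmp] = list(increment)

    # 2. Reverse write (assumed meaning): copy the rows that the increment
    #    does not cover from production into the temporary table.
    tables[tmp].extend(row for row in tables[name] if row["day"] < cutoff)

    # 3. Swap the names, so readers switch from one complete table to another.
    tables[name], tables[tmp] = tables[tmp], tables[name]


if __name__ == "__main__":
    tables = {"orders": [{"day": "2019-01-01", "v": 1}, {"day": "2019-01-02", "v": 2}]}
    increment = [{"day": "2019-01-02", "v": 20}, {"day": "2019-01-03", "v": 30}]
    incremental_refresh(tables, "orders", increment, cutoff="2019-01-02")
    print(sorted((r["day"], r["v"]) for r in tables["orders"]))
```

Because production is never modified in place, a query sees either the old table or the new one, never a table with a partition missing.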

3.2 Monitoring and Alerting of Data Ingestion

All synchronization statements are executed through ClickHouse's RESTful API, which allows each statement to be tracked by its QueryID. The system polls each query's progress and sends SMS alerts when the error frequency exceeds a threshold.
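A rough sketch of the tracking and alerting logic follows. It assumes the statements go through ClickHouse's HTTP interface with a `query_id` URL parameter and that progress is read from the `system.processes` table. The host, the threshold values, and the `ErrorWindow` class are illustrative assumptions, and the SMS call itself is omitted.

```python
from collections import deque
from urllib.parse import urlencode


def build_query_url(base_url: str, query_id: str) -> str:
    """URL for submitting a statement with a caller-chosen query ID."""
    return f"{base_url}/?{urlencode({'query_id': query_id})}"


def progress_sql(query_id: str) -> str:
    """SQL that reports progress for a running query with the given ID."""
    return (
        "SELECT elapsed, read_rows, total_rows_approx "
        f"FROM system.processes WHERE query_id = '{query_id}'"
    )


class ErrorWindow:
    """Counts errors in a sliding time window (hypothetical helper)."""

    def __init__(self, max_errors: int, window_seconds: float) -> None:
        self.max_errors = max_errors
        self.window_seconds = window_seconds
        self._stamps = deque()

    def record_error(self, now: float) -> None:
        self._stamps.append(now)

    def should_alert(self, now: float) -> bool:
        # Drop errors that have aged out of the window, then compare.
        while self._stamps and now - self._stamps[0] > self.window_seconds:
            self._stamps.popleft()
        return len(self._stamps) > self.max_errors


if __name__ == "__main__":
    print(build_query_url("http://localhost:8123", "sync-hotel-orders-001"))
    print(progress_sql("sync-hotel-orders-001"))
```

Alerting on frequency rather than on every single failure keeps one transient error from paging anyone, while a burst of failures still triggers a message.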

3.3 Server Distribution and Operations

The deployment consists of four clusters (domestic, overseas, real‑time, risk control), each with 2‑3 servers in active‑standby mode and load‑balanced query routing. Failover is handled via configuration changes, and virtual clusters can be created to redistribute load during spikes.
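The routing behavior described above can be approximated with a small round-robin router in which taking a server out of rotation stands in for the configuration change. This is a sketch under that assumption; the class name, method names, and host names are hypothetical, and the article does not describe the actual load balancer.

```python
class ClusterRouter:
    """Round-robin query routing over the servers of one cluster."""

    def __init__(self, hosts: list[str]) -> None:
        self.hosts = list(hosts)
        self.down: set[str] = set()
        self._next = 0

    def mark_down(self, host: str) -> None:
        """Take a server out of rotation (models the configuration change)."""
        self.down.add(host)

    def mark_up(self, host: str) -> None:
        """Put a recovered server back into rotation."""
        self.down.discard(host)

    def pick(self) -> str:
        """Return the next server in rotation, skipping servers marked down."""
        for _ in range(len(self.hosts)):
            host = self.hosts[self._next % len(self.hosts)]
            self._next += 1
            if host not in self.down:
                return host
        raise RuntimeError("no healthy servers in this cluster")


if __name__ == "__main__":
    router = ClusterRouter(["ch-domestic-1", "ch-domestic-2", "ch-domestic-3"])
    router.mark_down("ch-domestic-2")
    print([router.pick() for _ in range(4)])
```

An automatic health check, which the article lists as future work, would call `mark_down` and `mark_up` instead of an operator doing so by hand.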

Future plans include dispersing cluster nodes across different data centers for disaster recovery and implementing automatic health checks to isolate faulty servers.

4. ClickHouse Exploration

The team documented practical tips: disabling Linux swap to avoid memory pressure; setting join_use_nulls so that unmatched cells in a join come back as NULL rather than as type defaults; placing the smaller table on the right side of a join; writing in batches that touch a limited number of partitions and are pre-sorted; minimizing the amount of data that takes part in a join; running stable ClickHouse versions; avoiding distributed tables when possible; and monitoring CPU usage (keeping it below 70%) to prevent query timeouts.
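The batching tip can be illustrated with a helper that groups rows by partition, limits how many partitions a single write touches, and sorts each partition's rows before writing. This is a sketch only: the function name, the row layout, and the limit of two partitions per batch are assumptions, and the right limit depends on the table.

```python
from collections import defaultdict
from typing import Callable


def build_batches(
    rows: list[dict],
    partition_key: Callable[[dict], str],
    sort_key: Callable[[dict], object],
    max_partitions_per_batch: int,
) -> list[list[dict]]:
    """Split rows into write batches that each touch a bounded set of partitions."""
    by_partition = defaultdict(list)
    for row in rows:
        by_partition[partition_key(row)].append(row)

    partitions = sorted(by_partition)
    batches = []
    for start in range(0, len(partitions), max_partitions_per_batch):
        batch = []
        for partition in partitions[start:start + max_partitions_per_batch]:
            # Pre-sort within the partition so the server has less sorting to do.
            batch.extend(sorted(by_partition[partition], key=sort_key))
        batches.append(batch)
    return batches


if __name__ == "__main__":
    rows = [
        {"month": "2019-02", "id": 3},
        {"month": "2019-01", "id": 2},
        {"month": "2019-01", "id": 1},
        {"month": "2019-03", "id": 4},
    ]
    for batch in build_batches(rows, lambda r: r["month"], lambda r: r["id"], 2):
        print(batch)
```

Each returned batch would then be sent as one insert, so a single write never creates parts in more than the chosen number of partitions.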

5. Conclusion

Since the pilot in July last year, over 80% of the business has migrated to ClickHouse. The platform supports more than ten billion daily updates and nearly one million queries, with sub-second response for 98% of app requests and responses under three seconds for PC.

ClickHouse delivers superior query performance and lower cost compared to relational databases, Elasticsearch, and Redis, handling over 4 billion rows on a single node. The team will continue to research newer versions and explore additional open‑source frameworks.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact admin@besthub.dev and we will review it promptly.
