
How ClickHouse Powers Real‑Time Hotel Data Analytics at Ctrip

This article details Ctrip's hotel data platform challenges with billions of daily updates and near‑million queries, evaluates various storage options, explains why ClickHouse was chosen, and describes the full‑load and incremental pipelines, monitoring, server clustering, and practical tips that enable sub‑second query performance at massive scale.

dbaplus Community

Background

Ctrip Hotel processes thousands of tables and over ten billion data updates daily, serving nearly one million query requests that drill from country‑level aggregates down to individual hotel rooms. Ensuring high availability during updates and delivering sub‑second response times on both app and PC required a robust, scalable solution.

Why ClickHouse?

Traditional relational databases could not meet the required latency at this scale, and sharding them introduced excessive cost. Elasticsearch lacked cross‑index joins, Redis could not provide real‑time aggregation, and the team judged other analytical engines (Presto, Greenplum, Kylin) unsuitable for this workload. ClickHouse, a column‑oriented analytical DBMS with vectorized execution and SIMD acceleration, offered high CPU utilization, strong compression, indexes that are not B‑tree based, fast full‑table scans, and write speeds of 50‑200 MB/s. That made it the most appropriate choice despite its limitations.

ClickHouse Characteristics

Columnar storage with vectorized processing for efficient CPU use.

High compression reduces I/O; a single server can process hundreds of millions to over a billion rows per second on simple queries.

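Indexes are not B‑tree based, so queries do not have to follow a leftmost‑prefix rule; a filter on any indexed column can use the index, although filters on the leading key columns prune the most data.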

Fast ingestion (50‑200 MB/s) suitable for massive data updates.

Key drawbacks include the lack of transactions, limited concurrency (roughly 100 QPS under default settings), JOIN syntax and semantics that differ from standard SQL, and the need to insert in batches of 1,000 or more rows to avoid performance penalties.

Data Update Architecture

The pipeline moves data from Hive to ClickHouse via two paths:

Hive → MySQL → ClickHouse (the initial path: DataX loads the Hive data into MySQL, and ClickHouse's native API then imports it from MySQL).

Hive → ClickHouse (directly using DataX once it supported Hive‑to‑ClickHouse imports).

Both paths rely on a custom workflow that keeps ingestion automated and stable while preserving high availability for online services.

Full‑Load Process

Data is first loaded into a temporary table; after completion, a RENAME operation swaps the temporary and production tables, instantly switching reads to the new data.
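
The swap described above can be sketched as a helper that generates the statements in order. This is a minimal sketch under stated assumptions, not Ctrip's actual workflow: the table names, the `_tmp` and `_old` suffixes, and the `INSERT ... SELECT` load step are hypothetical stand-ins (the article loads the temporary table through DataX). Only the `CREATE TABLE ... AS` and multi-pair `RENAME TABLE` forms are ClickHouse syntax.

```python
from typing import List


def full_load_statements(table: str, source: str) -> List[str]:
    """Build the SQL for a full load that swaps tables via RENAME.

    `table` and `source` are hypothetical names used for illustration.
    """
    tmp, old = f"{table}_tmp", f"{table}_old"
    return [
        # Start from an empty temporary table with the production schema.
        f"DROP TABLE IF EXISTS {tmp}",
        f"CREATE TABLE {tmp} AS {table}",
        # Placeholder load step: fill the temporary table with the full data set.
        f"INSERT INTO {tmp} SELECT * FROM {source}",
        # Swap: production becomes the old copy, temporary becomes production.
        f"RENAME TABLE {table} TO {old}, {tmp} TO {table}",
        # Drop the previous copy once the swap has succeeded.
        f"DROP TABLE {old}",
    ]
```

Readers keep querying the production name throughout; only the `RENAME TABLE` step changes which data that name points to.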

Incremental Load Process

Initially, incremental data was loaded by deleting specific partitions and inserting new rows, which caused data inconsistency and unpredictable delete latency. The improved method writes the incremental data to a temporary table, copies into it the production rows that the increment does not replace, and then swaps the tables via RENAME, so readers never see a partially updated table.
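
The improved incremental flow might look like the following. It is a sketch only, and it assumes that the increment replaces whole partitions identified by the values of one column, so the production copy skips exactly those partitions. The table names, the partition column, and the `_tmp` and `_old` suffixes are all hypothetical.

```python
from typing import List, Sequence


def incremental_load_statements(
    table: str,
    increment_source: str,
    partition_col: str,
    partitions: Sequence[str],
) -> List[str]:
    """Build the SQL for an incremental load that ends in a RENAME swap.

    Assumption: the increment replaces the listed partitions in full.
    """
    tmp, old = f"{table}_tmp", f"{table}_old"
    replaced = ", ".join(f"'{p}'" for p in partitions)
    return [
        f"DROP TABLE IF EXISTS {tmp}",
        f"CREATE TABLE {tmp} AS {table}",
        # 1. Write the incremental data to the temporary table.
        f"INSERT INTO {tmp} SELECT * FROM {increment_source}",
        # 2. Copy the production rows that the increment does not replace.
        f"INSERT INTO {tmp} SELECT * FROM {table} "
        f"WHERE {partition_col} NOT IN ({replaced})",
        # 3. Swap the tables, then discard the previous copy.
        f"RENAME TABLE {table} TO {old}, {tmp} TO {table}",
        f"DROP TABLE {old}",
    ]
```

Compared with deleting partitions in place, the cost is one extra copy of the table, which buys a single switch-over point instead of a window of inconsistent data.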

Monitoring and Alerting

Because large data transfers often time out, all synchronization statements are executed through ClickHouse's HTTP (RESTful) interface rather than JDBC. Each statement is tagged with a query ID, and the system polls that ID to track progress. If errors exceed a threshold, an SMS alert is sent to on‑call personnel.
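
A polling loop of this kind could be sketched as below. The `query_id` URL parameter and the `system.processes` table are part of ClickHouse; everything else is an assumption for illustration, including the function names, the base URL, and the `is_running` callback, which stands in for whatever code actually sends the progress query over HTTP.

```python
import time
from typing import Callable
from urllib.parse import urlencode


def submit_url(base_url: str, query_id: str) -> str:
    """URL for POSTing a statement over HTTP with a caller-chosen query ID."""
    return f"{base_url}/?{urlencode({'query_id': query_id})}"


def progress_sql(query_id: str) -> str:
    """Statement that returns a row while the query is still running."""
    return (
        "SELECT elapsed, read_rows FROM system.processes "
        f"WHERE query_id = '{query_id}'"
    )


def poll(
    query_id: str,
    is_running: Callable[[str], bool],
    interval: float = 5.0,
    max_polls: int = 100,
) -> int:
    """Poll until the query is no longer running; return the polls used.

    `is_running` is a hypothetical callback that runs progress_sql()
    against the server and reports whether any row came back.
    """
    for attempt in range(1, max_polls + 1):
        if not is_running(query_id):
            return attempt
        time.sleep(interval)
    raise TimeoutError(f"query {query_id} still running after {max_polls} polls")
```

Passing the status check in as a callback keeps the loop testable without a server, and a caller can count the `TimeoutError` results toward the alert threshold.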

Server Distribution and Operations

Four clusters (domestic, overseas/suppliers, real‑time, risk‑control) each contain 2‑3 nodes with active‑standby failover. Load balancers distribute queries across nodes. If a node fails, administrators remove it from the cluster configuration; virtual clusters can be created to offload heavy query periods.
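
The routing behaviour can be illustrated with a small in-process model. The article does not say which balancing policy is used, so round-robin is an assumption here, and the `NodePool` class and the node names are hypothetical.

```python
from typing import Iterable, List


class NodePool:
    """Round-robin selection over the nodes currently in the cluster config."""

    def __init__(self, nodes: Iterable[str]) -> None:
        self._nodes: List[str] = list(nodes)
        self._next = 0

    def remove(self, node: str) -> None:
        """Take a failed node out of rotation, as an administrator would."""
        self._nodes.remove(node)

    def pick(self) -> str:
        """Return the node that should serve the next query."""
        if not self._nodes:
            raise RuntimeError("no nodes available")
        node = self._nodes[self._next % len(self._nodes)]
        self._next += 1
        return node
```

After `remove()` is called for a failed node, later `pick()` calls rotate over the remaining nodes only.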

Practical Tips and Pitfalls

Disable Linux swap; excessive swapping can cripple query speed.

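Enable join_use_nulls for each account so that unmatched rows in an outer join are filled with NULL, as in standard SQL, rather than with the column type's default value.

Place the smaller table on the right side of a JOIN; with the default hash join, ClickHouse loads the right‑hand table into memory first.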

Batch inserts should contain at least 1,000 rows; small batches degrade performance.

Limit the number of partitions each batch touches, and pre‑sort rows by the table's ORDER BY key before insertion to avoid merge bottlenecks.

Prefer stable releases (e.g., last year’s version) as newer versions may introduce memory leaks or syntax incompatibilities.

Avoid distributed tables when possible; local (physical) tables offer better performance and lower partition overhead.

Monitor CPU usage closely; sustained usage above 70 % often leads to query timeouts.
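
The batching and sorting tips above can be combined in one helper. This is a sketch under assumptions: rows are plain tuples, and the caller supplies a `sort_key` that mirrors the table's partition and ORDER BY columns. The function name and the sample rows are hypothetical.

```python
from itertools import islice
from typing import Any, Callable, Iterable, Iterator, List, Tuple

Row = Tuple[Any, ...]


def sorted_batches(
    rows: Iterable[Row],
    sort_key: Callable[[Row], Any],
    batch_size: int = 1000,
) -> Iterator[List[Row]]:
    """Yield rows in insert-sized batches, sorted by the given key.

    Sorting by (partition, order-by columns) keeps each batch within
    few partitions. The final batch may hold fewer than batch_size rows.
    """
    ordered = iter(sorted(rows, key=sort_key))
    while True:
        batch = list(islice(ordered, batch_size))
        if not batch:
            return
        yield batch


# Usage: rows are (date, hotel_id, value); sort by date, then hotel_id.
rows = [("2020-01-02", 7, 1.0), ("2020-01-01", 3, 2.0), ("2020-01-01", 1, 3.0)]
batches = list(sorted_batches(rows, sort_key=lambda r: (r[0], r[1]), batch_size=2))
```

The helper sorts everything in memory, so for very large loads the same idea would be applied per chunk or pushed into the upstream query.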

Results and Conclusions

Since the pilot in July last year, over 80 % of Ctrip Hotel’s business has moved to ClickHouse. The platform handles more than ten billion daily updates and nearly one million queries, with 98.3 % of app queries returning within one second and 98.5 % of PC queries returning within three seconds. The solution delivers better query performance at lower cost than relational databases, Elasticsearch, or Redis, and comfortably supports over 40 billion rows on a single node.

The team plans to continue deepening ClickHouse research, track upcoming releases, and explore additional open‑source frameworks to further optimize the platform.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact admin@besthub.dev and we will review it promptly.
