Databases 13 min read

How ClickHouse Powers Ctrip’s Hotel Data Platform for Billions of Daily Updates

This article explains how Ctrip’s hotel data intelligence platform handles over ten billion daily data updates and nearly a million queries by adopting ClickHouse, detailing the system's background, the reasons for choosing ClickHouse over other solutions, the data ingestion pipelines, monitoring strategies, operational practices, and performance outcomes.

ITPUB

Jul 2, 2019

How ClickHouse Powers Ctrip’s Hotel Data Platform for Billions of Daily Updates

Background

Ctrip’s hotel business generates thousands of tables and accumulates more than ten billion data updates each day, requiring high availability for production services and supporting nearly a million query requests that drill down from country‑level aggregates to hotel‑room granularity.

Traditional relational databases, sharding, Elasticsearch, Redis, Presto, Greenplum, and Kylin were evaluated but could not meet the latency, cost, or scalability requirements, leading the team to explore ClickHouse.

ClickHouse Overview

ClickHouse is a column‑oriented, real‑time analytical DBMS that leverages vectorized execution and SIMD CPU instructions to process massive data in parallel.

Key advantages include:

Column storage with vector processing for efficient CPU usage.

High compression reducing I/O and supporting tens of billions of rows per second per server.

Non‑B‑tree indexes that do not require the leftmost principle.

Fast ingestion rates of 50‑200 M rows/s, suitable for large‑scale updates.

However, ClickHouse has limitations: no transactions, limited delete/update support, recommended QPS around 100 (configurable), partial SQL compatibility, and the need for bulk writes of at least 1,000 rows to avoid performance penalties.

ClickHouse in the Hotel Data Platform

The platform adopts two main data ingestion paths:

Hive → MySQL → ClickHouse: DataX first loads data into MySQL, then ClickHouse’s native API imports it.

Hive → ClickHouse: DataX directly imports data when supported.

For full data loads, data is first imported into a temporary table, then a RENAME operation swaps the temporary and production tables, ensuring seamless data availability.

Incremental loads originally used partition deletion followed by re‑insertion, which caused data inconsistency and unpredictable deletion latency. The improved approach copies production data into a temporary table, loads incremental data there, and then swaps tables via RENAME, eliminating downtime.

Monitoring and Alerting

All synchronization statements are executed via ClickHouse’s RESTful API, allowing the assignment of a QueryID. The system polls the QueryID to track progress and logs exceptions. If error frequency exceeds thresholds, SMS alerts are sent to on‑call personnel.

Server Distribution and Operations

The deployment consists of four clusters (domestic, overseas/suppliers, real‑time, risk‑control), each with 2‑3 servers configured for master‑slave failover and load balancing. Faulty servers can be removed via configuration changes, and virtual clusters can be created to redistribute load during peak periods.

Future plans include dispersing cluster nodes across different data centers for disaster recovery and implementing automatic health checks to isolate problematic servers.

ClickHouse Usage Exploration

Practical lessons learned include:

Disabling Linux swap memory to avoid performance degradation when memory is exhausted.

Setting join_use_nulls per account to obtain standard NULL semantics for LEFT JOIN.

Placing the smaller table on the right side of JOIN operations for optimal performance.

When batch inserting via JDBC, limit the number of partitions per batch and sort data by ORDER BY to aid background merges.

Prefer aggregating data before JOINs; sometimes GROUP BY then JOIN is faster than JOIN then GROUP BY.

Use a stable, recent ClickHouse version; newer releases may introduce memory leaks or configuration bugs.

Avoid distributed tables when possible; excessive partition counts can fill disks during ingestion.

Monitor CPU usage closely—queries become unstable above 70% CPU.

Performance benchmarks: 60 M rows joined with 10 M rows and aggregated over a month returned in 190 ms; 240 M rows joined with 20 M rows grouped by month took ~390 ms.

Conclusion

Since the pilot in July last year, over 80% of Ctrip’s hotel services have migrated to ClickHouse, handling daily updates of tens of billions of rows and nearly a million queries while maintaining sub‑second response times (98.3% of app queries under 1 s, 98.5% of PC queries under 3 s).

ClickHouse delivers superior query performance and lower cost compared to relational databases, Elasticsearch, and Redis, comfortably supporting over 40 billion rows on a single server.

The team will continue to deepen its ClickHouse research, stay current with releases, and explore additional open‑source frameworks to further optimize the platform.

Original Source

Signed-in readers can open the original source through BestHub's protected redirect.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.

Monitoring Big Data data pipeline real-time analytics ClickHouse

Written by

ITPUB

Official ITPUB account sharing technical insights, community news, and exciting events.

0 followers

Reader feedback

How this landed with the community

Rate this article

Was this worth your time?

Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.