
How Ctrip Cut Query Latency by 85% with StarRocks’ Compute‑Storage Separation

Ctrip migrated its massive User Behavior Tracking system from ClickHouse to a compute‑storage separated StarRocks cluster on Kubernetes, achieving millisecond‑level query latency, halving storage usage, reducing node count, and sustaining millions‑of‑rows‑per‑second write throughput while simplifying scaling and operations.

Ctrip Technology

Background

UBT (User Behavior Tracking) is Ctrip's core data collection and analysis platform. It ingests tens of terabytes of user‑behavior events per day and retains data for up to 30 days (some tables for a year). The system serves Android, iOS, and Node.js SDKs and powers troubleshooting, monitoring, and aggregation analytics.

Problems with ClickHouse

The write path suffered data loss and backlog because historical backfills triggered heavy merge and compression work on large partitions, exhausting CPU and I/O.

Horizontal scaling required costly data migration; ClickHouse’s monolithic compute‑storage design limited elasticity and demanded three‑replica storage, inflating hardware costs.

Queries over large time ranges were slow, with average response times measured in seconds, failing to meet real‑time analysis requirements.

Gohangout, the custom ClickHouse ingestion tool, was single‑process, fragile and hard to configure.

StarRocks Migration

StarRocks originally used an integrated compute‑storage model that delivered sub‑second query latency but suffered from rigid resource utilization and high replication cost. The team adopted StarRocks' newer compute‑storage separation architecture, where stateless compute nodes (CN) query data stored in remote object storage (OSS/S3) and optionally cache hot data locally via DataCache. This design eliminates data movement during scaling, allowing a node to be added or removed in seconds; a typical UBT scaling operation now completes in about 5 seconds.
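
As an illustration of how a shared‑data cluster is pointed at object storage, here is a minimal sketch (not Ctrip's actual configuration): it registers an S3/OSS‑backed storage volume through the FE's MySQL‑protocol endpoint using pymysql. Host names, bucket paths, and credentials are placeholders.

```python
# Minimal sketch: register an object-storage volume for a shared-data cluster.
# Host, bucket, and credentials are placeholders, not Ctrip's real settings.
import pymysql

# The FE exposes a MySQL-protocol endpoint (default port 9030).
conn = pymysql.connect(host="starrocks-fe", port=9030, user="root", password="")

CREATE_VOLUME = """
CREATE STORAGE VOLUME IF NOT EXISTS ubt_volume
TYPE = S3
LOCATIONS = ("s3://ubt-bucket/starrocks/")
PROPERTIES (
    "aws.s3.region"     = "ap-southeast-1",
    "aws.s3.endpoint"   = "https://s3.ap-southeast-1.amazonaws.com",
    "aws.s3.access_key" = "<access-key>",
    "aws.s3.secret_key" = "<secret-key>"
)
"""

with conn.cursor() as cur:
    # Tables created on this volume keep their data files in object storage,
    # so CN nodes stay stateless and can be added or removed without
    # rebalancing any data.
    cur.execute(CREATE_VOLUME)
conn.close()
```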

StarRocks on Kubernetes

The production cluster is managed with the open‑source starrocks‑kubernetes‑operator (https://github.com/StarRocks/starrocks-kubernetes-operator). The team extended the operator by:

Implementing an ElasticVolumeSet controller to replace StatefulSet and provide dynamic disk expansion.

Adding a SET‑based CRD that lets a single StarRocks custom resource define multiple instance specifications, enabling gray‑release, rolling upgrades and A/B testing without downtime.
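
To make the SET idea concrete, here is a hedged sketch of submitting a StarRocksCluster custom resource from Python with the official kubernetes client. The group/kind and the starRocksFeSpec field come from the upstream operator; the cnSets list is hypothetical shorthand for the multi‑instance‑specification extension described above, and the images and replica counts are placeholders.

```python
# Sketch: apply a StarRocksCluster CR with the kubernetes Python client.
# "cnSets" is a hypothetical stand-in for the SET-based extension described in
# the text; it is NOT part of the upstream operator's CRD.
from kubernetes import client, config

config.load_kube_config()
api = client.CustomObjectsApi()

cr = {
    "apiVersion": "starrocks.com/v1",
    "kind": "StarRocksCluster",
    "metadata": {"name": "ubt-starrocks", "namespace": "ubt"},
    "spec": {
        "starRocksFeSpec": {"image": "starrocks/fe-ubuntu:latest", "replicas": 3},
        # Hypothetical SET-style groups: a small canary set runs a new image
        # while the stable set keeps serving, enabling gray release / A/B tests.
        "cnSets": [
            {"name": "stable", "image": "starrocks/cn-ubuntu:latest", "replicas": 36},
            {"name": "canary", "image": "starrocks/cn-ubuntu:next", "replicas": 4},
        ],
    },
}

api.create_namespaced_custom_object(
    group="starrocks.com", version="v1",
    namespace="ubt", plural="starrocksclusters", body=cr,
)
```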

A unified portal wraps common operational tasks into point‑and‑click actions, improving operational efficiency for non‑engineers.

Stability and Tuning

5.1 Storage Design

UBT stores petabytes of data; the largest tables reach hundreds of terabytes and ingest millions of rows per second. Tables use system‑time hourly partitions with a bucket count of 128, which reduces write fragmentation. Compression was switched from LZ4 to ZLIB, saving roughly 30% of storage.
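
A hedged sketch of what such a table definition might look like follows; the column names and database name are illustrative rather than UBT's real schema. It combines hourly expression partitioning, 128 hash buckets, ZLIB compression, and the storage volume and data cache from the shared‑data setup.

```python
# Illustrative DDL for an hourly-partitioned detail table; the schema is invented.
import pymysql

conn = pymysql.connect(host="starrocks-fe", port=9030, user="root", password="")

CREATE_TABLE = """
CREATE TABLE IF NOT EXISTS ubt.events (
    event_time DATETIME NOT NULL,
    uid        VARCHAR(64),
    page_id    VARCHAR(128),
    payload    JSON
)
DUPLICATE KEY(event_time, uid)                 -- sort key doubles as the prefix index
PARTITION BY date_trunc('hour', event_time)    -- system-time hourly partitions
DISTRIBUTED BY HASH(uid) BUCKETS 128           -- bucket count used by UBT
PROPERTIES (
    "storage_volume"   = "ubt_volume",
    "datacache.enable" = "true",
    "compression"      = "ZLIB"                -- switched from the LZ4 default
)
"""

with conn.cursor() as cur:
    cur.execute("CREATE DATABASE IF NOT EXISTS ubt")
    cur.execute(CREATE_TABLE)
conn.close()
```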

5.2 Compaction

Control compaction parameters (file count, thread count).

Leverage hourly partitions and bucket sizing to lower compaction frequency.

Enable MergeCommit to merge small writes into larger versions, cutting file count and I/O.

Monitoring relies on the be_cloud_native_compactions and partitions_meta system tables plus SHOW PROC '/compactions', with Grafana dashboards keeping the compaction score below 100.
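
A small monitoring sketch in the spirit of those dashboards is shown below. It assumes the information_schema.partitions_meta view of recent shared‑data releases, whose MAX_CS column carries the per‑partition compaction score; verify the view and column names against your version before relying on them.

```python
# Sketch: flag partitions whose compaction score exceeds the target ceiling.
# View and column names are assumptions about recent shared-data releases.
import pymysql

THRESHOLD = 100  # the article's target ceiling for the compaction score

conn = pymysql.connect(host="starrocks-fe", port=9030, user="root", password="")

QUERY = """
SELECT DB_NAME, TABLE_NAME, PARTITION_NAME, MAX_CS
FROM information_schema.partitions_meta
WHERE MAX_CS > %s
ORDER BY MAX_CS DESC
LIMIT 20
"""

with conn.cursor() as cur:
    cur.execute(QUERY, (THRESHOLD,))
    for db, table, partition, score in cur.fetchall():
        # In production this would feed Grafana or an alerting pipeline.
        print(f"high compaction score {score}: {db}.{table} partition {partition}")
conn.close()
```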

5.3 Data Backfill

Approximately 100 TB of historical data were migrated from ClickHouse/Hive to StarRocks using Spark Load. Spark cleanses the data and writes it to HDFS, and StarRocks then pulls it from HDFS, bypassing the in‑memory write path and minimizing compaction overhead.
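
For context, a Spark Load job is submitted as a LOAD LABEL statement that points StarRocks at the Spark‑cleansed files on HDFS. The sketch below uses placeholder paths, label, and resource name, and assumes a Spark resource has already been created on the cluster.

```python
# Sketch: kick off a Spark Load backfill from HDFS. Label, paths, and the
# Spark resource name are placeholders; the resource must already exist.
import pymysql

conn = pymysql.connect(host="starrocks-fe", port=9030, user="root", password="")

SPARK_LOAD = """
LOAD LABEL ubt.backfill_2024_01_01
(
    DATA INFILE("hdfs://nameservice1/ubt/cleaned/dt=2024-01-01/*")
    INTO TABLE events
    FORMAT AS "parquet"
)
WITH RESOURCE 'ubt_spark_resource'
(
    "spark.executor.memory" = "8g",
    "spark.executor.instances" = "40"
)
PROPERTIES ("timeout" = "14400")
"""

with conn.cursor() as cur:
    cur.execute(SPARK_LOAD)
    # Progress can then be tracked with: SHOW LOAD WHERE LABEL = "backfill_2024_01_01"
conn.close()
```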

5.4 Real‑time Increment

Real‑time ingestion uses Flink → StarRocks with MergeCommit. Compared with per‑request commits, MergeCommit reduces version count, I/O operations, and latency, sustaining up to 10× higher effective write IOPS with roughly 10× larger I/O per operation.
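
Under the hood the Flink connector writes through Stream Load; the sketch below shows that raw HTTP call with the requests library, sent straight to a CN's HTTP port to sidestep the FE redirect. Hosts, the label, and especially any merge‑commit‑related load parameters are assumptions; check the connector and server documentation for the exact names in your version.

```python
# Sketch of the Stream Load call the Flink connector issues under the hood.
# Host, port, label, and merge-commit parameters are placeholders/assumptions.
import json
import requests

rows = [
    {"event_time": "2024-06-01 10:00:00", "uid": "u_1", "page_id": "home"},
    {"event_time": "2024-06-01 10:00:01", "uid": "u_2", "page_id": "search"},
]

resp = requests.put(
    # Sent directly to a CN/BE HTTP port (default 8040); in production the
    # connector goes through the FE and handles its redirect.
    "http://starrocks-cn-0:8040/api/ubt/events/_stream_load",
    auth=("root", ""),
    headers={
        "label": "ubt-stream-demo-0001",
        "format": "json",
        "strip_outer_array": "true",
        "Expect": "100-continue",
        # Merge-commit behaviour is toggled via additional load parameters in
        # recent releases; the exact names are version-dependent assumptions.
    },
    data=json.dumps(rows),
)
print(resp.json())  # a healthy cluster returns {"Status": "Success", ...}
```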

5.5 Query Optimization

Detail queries apply mandatory partition pruning and prefix indexes. Aggregation queries use hourly partition‑level materialized views (Partition MV) that refresh only changed partitions. An optimization changed the MV refresh algorithm from O(M×N) to O(M+N), dramatically improving FE response when partitions exceed ten thousand.
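
As a sketch of the aggregation side, the statement below creates an asynchronous, hour‑partitioned materialized view over the illustrative events table from earlier, so that only partitions whose base data changed are refreshed. The schema, refresh interval, and properties are illustrative, and the exact partition‑expression syntax varies slightly across StarRocks versions.

```python
# Illustrative hour-partitioned async MV; it reuses the invented ubt.events
# schema from the earlier sketch.
import pymysql

conn = pymysql.connect(host="starrocks-fe", port=9030, user="root", password="")

CREATE_MV = """
CREATE MATERIALIZED VIEW IF NOT EXISTS ubt.events_hourly_agg
PARTITION BY event_hour                          -- aligns with the base table's hourly partitions
DISTRIBUTED BY HASH(page_id) BUCKETS 16
REFRESH ASYNC EVERY (INTERVAL 10 MINUTE)         -- only changed partitions are recomputed
PROPERTIES ("partition_refresh_number" = "2")    -- cap partitions refreshed per run
AS
SELECT
    date_trunc('hour', event_time) AS event_hour,
    page_id,
    count(*)            AS pv,
    count(DISTINCT uid) AS uv
FROM ubt.events
GROUP BY date_trunc('hour', event_time), page_id
"""

with conn.cursor() as cur:
    cur.execute(CREATE_MV)
conn.close()
```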

Results

Storage volume halved due to single‑replica design.

Node count reduced by 10, improving resource utilization.

Average query latency dropped from 1.4 s to 203 ms (≈1/7, an ~85% reduction).

P95 latency fell from 8 s to 800 ms (≈1/10).

Write throughput remains stable at several million rows per second.

The team plans to continue moving other lake‑warehouse workloads to the compute‑storage separated model for greater flexibility and scalability.

Figures: UBT architecture diagram; migration flow diagram; StarRocks compute‑storage separation; compaction monitoring; write throughput chart.
Tags: data migration, performance optimization, big data, Kubernetes, StarRocks, ClickHouse, compute‑storage separation
Written by Ctrip Technology
Official Ctrip Technology account, sharing and discussing growth.