How Tongcheng Travel Scaled Real‑Time Analytics with StarRocks

Tongcheng Travel migrated its multi‑stage OLAP platform from Druid/Kylin and ClickHouse‑Greenplum to a unified StarRocks solution, dramatically improving real‑time query latency, offline report performance, and CDP data processing while reducing operational complexity and enabling cloud‑native deployment.

StarRocks

Jun 2, 2023

Evolution of Tongcheng Travel OLAP Platform

The platform evolved through three stages: (1) Druid + Kylin for offline acceleration, which suffered from limited SQL capabilities, high cube maintenance cost, and poor support for federated queries; (2) a ClickHouse + Greenplum combination, which split real‑time and offline workloads across separate systems and still carried high operational overhead; (3) a unified StarRocks‑based platform, with a single component, lower maintenance, and tight integration with internal data services.

In the original architecture, real‑time data flowed from Kafka through Flink for cleansing and widening before landing in ClickHouse, while offline data was stored in HDFS, processed by Hive + Spark, and finally loaded into Greenplum.

Why StarRocks Was Chosen

The team evaluated StarRocks, ClickHouse, Greenplum, and Presto on data ingestion speed, query performance, memory usage, maintenance cost, and ease of use. StarRocks excelled at multi‑table joins and primary‑key updates, and its simple FE + BE architecture has no external dependencies.

StarRocks in Practice at Tongcheng Travel

1. Real‑time analytics acceleration

Business events are ingested via Kafka or TurboMQ, processed by Flink (or Flink‑SQL), and written directly to StarRocks using the official flink‑starrocks‑connector, which supports exactly‑once semantics. Writing directly to StarRocks removes the widening step (pre‑joining data into wide tables) that the earlier ClickHouse pipeline required, which shortens end‑to‑end latency.
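As a rough illustration of the sink side of this pipeline, the sketch below renders a Flink SQL table definition for the StarRocks connector. The helper name `starrocks_sink_ddl`, the host, database, table, columns, and credentials are all hypothetical, and the ports are the StarRocks defaults rather than anything stated in the article. It is not Tongcheng Travel's actual configuration.

```python
def starrocks_sink_ddl(table, columns, fe_host, database, user, password):
    """Render a Flink SQL CREATE TABLE statement for a StarRocks sink.

    All argument values are illustrative placeholders.
    """
    cols = ",\n  ".join(f"`{name}` {sql_type}" for name, sql_type in columns)
    options = {
        "connector": "starrocks",
        # 9030 and 8030 are the default FE query and HTTP ports (assumed here).
        "jdbc-url": f"jdbc:mysql://{fe_host}:9030",
        "load-url": f"{fe_host}:8030",
        "database-name": database,
        "table-name": table,
        "username": user,
        "password": password,
        # Ask the connector for exactly-once delivery.
        "sink.semantic": "exactly-once",
    }
    with_clause = ",\n  ".join(f"'{k}' = '{v}'" for k, v in options.items())
    return (
        f"CREATE TABLE `{table}_sink` (\n  {cols}\n) "
        f"WITH (\n  {with_clause}\n)"
    )


ddl = starrocks_sink_ddl(
    table="order_events",                      # hypothetical table
    columns=[("order_id", "BIGINT"), ("city", "STRING")],
    fe_host="fe.example.internal",             # hypothetical host
    database="demo",
    user="flink_writer",
    password="change-me",
)
print(ddl)
```

The generated statement would be submitted to Flink SQL, after which an `INSERT INTO order_events_sink SELECT ...` job streams rows into StarRocks.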

2. Offline report query performance

StarRocks is integrated as a data source for the “灵动分析” (roughly, "Agile Analytics") system. Because it supports the MySQL protocol and full SQL, reports can be built directly on StarRocks tables without a separate ETL step. Compared with Presto and Greenplum, queries ran up to roughly 2× faster in several production workloads.

3. CDP system query efficiency

The CDP pipeline relies on large‑scale bitmap‑based user segmentation. The team mapped string user IDs to long‑typed oneId values, built BitmapValue objects in Spark, merged the bitmaps per key, Base64‑encoded the result, and bulk‑loaded it into StarRocks. This reduced the import time for 150 million rows from over 10 minutes to under 10 seconds and cut network traffic by an order of magnitude.
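The steps in this pipeline can be sketched in plain Python to show why pre‑merging shrinks the load: many (tag, user) rows collapse into one encoded bitmap per tag. This is a toy model under stated assumptions. It uses a Python integer as the bitmap and raw bytes as the payload, not Spark or the real BitmapValue serialization, and every function name here is hypothetical.

```python
import base64
from collections import defaultdict


def build_id_dictionary(user_ids):
    """Assign each distinct string ID a dense integer (stand-in for oneId)."""
    mapping = {}
    for uid in user_ids:
        mapping.setdefault(uid, len(mapping))
    return mapping


def merge_bitmaps(rows, id_map):
    """Group (tag, user_id) rows by tag, OR-ing each user's bit into one bitmap."""
    bitmaps = defaultdict(int)
    for tag, uid in rows:
        bitmaps[tag] |= 1 << id_map[uid]
    return dict(bitmaps)


def encode_bitmap(bitmap):
    """Base64-encode the bitmap bytes (NOT the StarRocks BitmapValue format)."""
    n_bytes = max(1, (bitmap.bit_length() + 7) // 8)
    return base64.b64encode(bitmap.to_bytes(n_bytes, "little")).decode("ascii")


# Four raw rows, including one duplicate, across two tags.
rows = [("vip", "u-a"), ("vip", "u-b"), ("app", "u-b"), ("vip", "u-a")]

id_map = build_id_dictionary(uid for _, uid in rows)
bitmaps = merge_bitmaps(rows, id_map)
payload = {tag: encode_bitmap(bm) for tag, bm in bitmaps.items()}

# Only one row per tag is left to send over the network.
print(len(rows), "raw rows ->", len(payload), "bitmap rows")
```

In a real pipeline the encoded column would be decoded on the StarRocks side during load, for example with a function such as `base64_to_bitmap`; the exact load statement depends on the table schema and is not given in the article.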

Complex queries that join wide tables with multiple bitmap tables now finish within 3–10 seconds, even at billion‑row scale.
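To make the segmentation logic concrete, the sketch below combines several audience bitmaps with set operations, again using Python integers as stand‑in bitmaps. The segment names and IDs are invented for illustration. In StarRocks the corresponding operations would be expressed in SQL with bitmap functions such as `bitmap_and`, `bitmap_andnot`, and `bitmap_count`.

```python
def bitmap_of(ids):
    """Build an integer bitmap with one bit set per user ID."""
    bm = 0
    for i in ids:
        bm |= 1 << i
    return bm


def members(bm):
    """List the user IDs whose bits are set."""
    return [i for i in range(bm.bit_length()) if bm >> i & 1]


def cardinality(bm):
    """Count set bits, i.e. the number of users in the segment."""
    return bin(bm).count("1")


# Hypothetical segments keyed by integer user ID.
vip = bitmap_of([1, 2, 3, 5])
booked_flight = bitmap_of([2, 3, 4])
churned = bitmap_of([3])

# VIP users who booked a flight and have not churned:
# intersection of two segments, then difference with a third.
target = (vip & booked_flight) & ~churned

print(members(target), cardinality(target))
```

Because each segment is a single compact value, the engine intersects bitmaps instead of joining row‑level user tables, which is the property the CDP queries rely on.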

Future Plans

Expand StarRocks to replace remaining ClickHouse and Greenplum components across business lines.

Deploy StarRocks on a private Kubernetes cluster to leverage auto‑scaling, fault‑tolerance, and cloud‑native features.

Evaluate the upcoming 3.x storage‑compute separation capability to further reduce offline sync workload without sacrificing query speed.

Maintain active contributions to the open‑source community and share performance findings.

Written by

StarRocks

StarRocks is an open‑source project under the Linux Foundation, focused on building a high‑performance, scalable analytical database that enables enterprises to create an efficient, unified lake‑house paradigm. It is widely used across many industries worldwide, helping numerous companies enhance their data analytics capabilities.
