StarRocks Production Practice at Tongcheng Travel: Architecture, Use Cases, and Technical Evaluation
This article details Tongcheng Travel’s production deployment of the StarRocks OLAP database, covering background, business scenarios, technical evaluation against ClickHouse and Greenplum, implementation with Flink SQL, real‑time analytics, offline reporting, CDP use cases, performance optimizations, and future cloud‑native plans.
Tongcheng Travel, with years of OLAP experience, needed a more suitable component as business and data volume grew. After evaluating several options, StarRocks was selected as the primary OLAP database for real‑time, offline, and CDP applications.
Background & Current Situation
The company previously used Druid, Kylin, ClickHouse, and Greenplum. ClickHouse handled real‑time user behavior and order data, while Greenplum supported internal analytical dashboards. Both faced performance and operational challenges, prompting a search for a better solution.
Business Scenarios
Real‑time data analysis: hot data stored for 30 days to 1 year, requiring high‑performance queries; cold data kept in HDFS.
Flexible reporting (灵动数据报表): DWD/ADS tables are loaded into Greenplum, then visualized via a drag‑and‑drop dashboard.
User profiling & CDP: combine basic info, consumption, and behavior to generate tags, perform crowd selection and analysis. ClickHouse could not meet the 5 s query latency requirement for complex joins.
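Crowd selection in the CDP scenario comes down to set algebra over tags. The following is a minimal Python sketch of that idea, not the production implementation: the tag names, user IDs, and the `select_crowd` helper are all hypothetical and exist only for illustration.

```python
from typing import Dict, Sequence, Set

# Hypothetical tag -> user-ID sets. In the article's setting these would be
# derived from basic info, consumption, and behavior data.
TAGS: Dict[str, Set[int]] = {
    "booked_hotel_30d": {1, 2, 3, 5, 8},
    "high_spender": {2, 3, 4, 8, 9},
    "app_inactive_7d": {3, 9},
}


def select_crowd(include: Sequence[str], exclude: Sequence[str] = ()) -> Set[int]:
    """Return users that carry every tag in `include` and none in `exclude`."""
    if not include:
        return set()
    crowd = set(TAGS[include[0]])      # copy so the tag table is not mutated
    for tag in include[1:]:
        crowd &= TAGS[tag]             # intersection: user must have all tags
    for tag in exclude:
        crowd -= TAGS[tag]             # difference: drop users with excluded tags
    return crowd


crowd = select_crowd(["booked_hotel_30d", "high_spender"], exclude=["app_inactive_7d"])
print(sorted(crowd))
```

In a database, the same intersections become joins or bitmap operations across large tag tables, which is where the join latency mentioned above starts to matter.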
Technical Selection Criteria
StarRocks was compared with ClickHouse and Greenplum on data ingestion speed, query performance, memory usage, maintenance cost, and ease of use. The evaluation showed StarRocks excels in query performance (especially multi‑table joins), has minimal dependencies (only FE and BE processes), supports seamless scaling, and offers MySQL‑compatible queries.
Implementation Details
Real‑time data flows from Kafka/TurboMQ to Flink, where a Flink‑StarRocks connector writes into StarRocks. An example Flink SQL job is shown below:
-- Source: raw order events from Kafka; business fields arrive as strings
CREATE TABLE realtimeorder_kafka (
    createtime BIGINT,
    platid varchar,
    productid varchar,
    orderflag varchar,
    serialid varchar,
    amount varchar
) WITH (
    'connector' = 'kafka',
    'topic' = 'xxxx',
    'properties.bootstrap.servers' = 'ip:port',
    'properties.group.id' = 'group_id',
    'scan.startup.mode' = 'earliest-offset',
    'format' = 'json'
);
-- Sink: StarRocks table written through the Flink connector
CREATE TABLE realtimeorder_jdbc (
    `partition` varchar,
    `hour` int,
    `min` int,
    `productid` int,
    `orderflag` varchar,
    `serialid` varchar,
    `amount` double,
    PRIMARY KEY(`partition`, `serialid`, `productid`) NOT ENFORCED
) WITH (
    'connector' = 'starrocks',
    -- FE address and MySQL-protocol (query) port
    'jdbc-url' = 'jdbc:mysql://sr_ip:sr_port',
    -- FE address and HTTP port, used for Stream Load
    'load-url' = 'sr_ip:sr_port',
    'database-name' = 'db',
    'table-name' = 'table',
    'username' = '****',
    'password' = '****',
    -- Flush a batch after 64,000 buffered rows or every 10 seconds
    'sink.buffer-flush.max-rows' = '64000',
    'sink.buffer-flush.interval-ms' = '10000',
    'sink.properties.format' = 'json',
    'sink.properties.strip_outer_array' = 'true'
);
-- Columns are matched to the sink by position, so serialid must precede amount.
INSERT INTO realtimeorder_jdbc
SELECT * FROM (
    SELECT
        '2022-11-30' AS `partition`,
        HOUR(LOCALTIMESTAMP) AS `hour`,
        MINUTE(LOCALTIMESTAMP) AS `min`,
        CASE WHEN `productid` IS NULL THEN 0
             WHEN IS_DECIMAL(`productid`) THEN CAST(`productid` AS INT)
             ELSE 0 END AS productid,
        CASE WHEN `orderflag` IS NULL THEN '' ELSE orderflag END AS orderflag,
        `serialid`,
        CASE WHEN `amount` IS NULL THEN 0
             WHEN IS_DECIMAL(`amount`) THEN CAST(`amount` AS DOUBLE)
             ELSE 0 END AS amount
    FROM realtimeorder_kafka
-- Note: `partition` is a constant placeholder in this example, so as written
-- the filter below passes no rows; a production job would derive the
-- partition value from the event time.
) WHERE `partition` >= '2023-01-01';
StarRocks also integrates with the offline analytics platform, allowing dashboards to query StarRocks tables directly via standard SQL, yielding up to a 2× performance boost over Greenplum.
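Returning to the Flink SQL job above, its null and type handling can be sketched outside Flink as a plain function. This Python version is only an approximation for illustration: `parse_decimal` and `clean_order` are hypothetical helpers, and `decimal.Decimal` parsing stands in for Flink's `IS_DECIMAL`, whose exact acceptance rules may differ.

```python
from decimal import Decimal, InvalidOperation
from typing import Optional


def parse_decimal(text: Optional[str]) -> Optional[Decimal]:
    """Return the value as a Decimal, or None if it is missing or not numeric."""
    if text is None:
        return None
    try:
        value = Decimal(text.strip())
    except InvalidOperation:
        return None
    # Reject NaN and Infinity, which Decimal parses but which are not usable numbers.
    return value if value.is_finite() else None


def clean_order(raw: dict) -> dict:
    """Apply the same defaults as the CASE expressions: 0 for bad numbers, '' for a null flag."""
    productid = parse_decimal(raw.get("productid"))
    amount = parse_decimal(raw.get("amount"))
    return {
        "productid": int(productid) if productid is not None else 0,
        "orderflag": raw.get("orderflag") or "",
        "serialid": raw.get("serialid"),
        "amount": float(amount) if amount is not None else 0.0,
    }


good = clean_order({"productid": "1024", "orderflag": None, "serialid": "S1", "amount": "99.5"})
bad = clean_order({"productid": "abc", "serialid": "S2", "amount": None})
print(good)
print(bad)
```

Defaulting malformed values to 0 keeps the pipeline flowing but hides bad input, so counting or logging the rejected rows alongside this kind of cleaning is worth considering.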
For CDP, massive user‑profile data (over 150 million rows) is imported using a Bitmap‑based pipeline: string IDs are converted to long, aggregated into BitmapValue objects, encoded as Base64, and loaded into StarRocks, reducing import time from >10 minutes to <10 seconds.
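The shape of that pipeline (string ID to integer, integers to bitmap, bitmap to Base64) can be sketched in a few lines of Python. This is an assumption-laden illustration only: it uses a plain Python integer as a bitset, and its byte layout is not StarRocks' `BitmapValue` serialization format, so the output could not be loaded into StarRocks as-is. All function names and sample IDs are hypothetical.

```python
import base64
from typing import Dict, Iterable, List


def encode_ids(string_ids: Iterable[str], dictionary: Dict[str, int]) -> List[int]:
    """Map each string ID to a dense integer, assigning a new one on first sight."""
    return [dictionary.setdefault(sid, len(dictionary)) for sid in string_ids]


def to_bitmap(int_ids: Iterable[int]) -> int:
    """Aggregate integer IDs into a bitset: bit i set means ID i is present."""
    bitmap = 0
    for i in int_ids:
        bitmap |= 1 << i
    return bitmap


def to_base64(bitmap: int) -> str:
    """Serialize the bitset to little-endian bytes and Base64-encode it (toy format)."""
    size = (bitmap.bit_length() + 7) // 8 or 1   # at least one byte for an empty bitmap
    return base64.b64encode(bitmap.to_bytes(size, "little")).decode("ascii")


def from_base64(text: str) -> int:
    """Inverse of to_base64, standing in for decoding on the database side."""
    return int.from_bytes(base64.b64decode(text), "little")


# Hypothetical tag -> user-ID rows
dictionary: Dict[str, int] = {}
tag_rows = {
    "high_spender": ["u_a", "u_b", "u_c"],
    "booked_hotel": ["u_b", "u_c", "u_d"],
}
encoded = {tag: to_base64(to_bitmap(encode_ids(ids, dictionary)))
           for tag, ids in tag_rows.items()}

# Crowd selection on decoded bitmaps: users carrying both tags
both = from_base64(encoded["high_spender"]) & from_base64(encoded["booked_hotel"])
print(encoded, bin(both).count("1"))
```

The gain comes from shipping one compact value per tag instead of one row per user and tag, which is consistent with the drop in import time reported above.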
Future Plans
Deploy StarRocks on a private K8s cloud to improve scalability, fault tolerance, and reliability.
Expand StarRocks to more business scenarios, replacing other OLAP components to lower maintenance overhead.
Test the upcoming 3.x version with compute‑storage separation for offline reports, aiming to keep query performance while reducing data sync.
Collaborate with the open‑source community, feeding back performance results and contributing demos.
Overall, StarRocks has become the unified OLAP engine for Tongcheng Travel, supporting real‑time analytics, flexible reporting, and large‑scale CDP workloads with high performance and low operational cost.
Tongcheng Travel Technology Center
Pursue excellence, start again with Tongcheng! More technical insights to help you along your journey and make development enjoyable.