
How Ctrip Accelerated Report Queries 10× with StarRocks: A Real‑World Lakehouse Migration

Ctrip migrated its Artnova reporting platform from Hive-based queries to StarRocks in two stages. It first loaded data into StarRocks OLAP tables, then used StarRocks as a lakehouse engine with a Hive catalog, Data Cache and materialized views. Average query latency fell from 20 seconds to 1.5 seconds, queries ran more than 7× faster than on Trino, and complex workloads were accelerated by up to 40×.

Background

Ctrip introduced StarRocks in 2022 and now runs more than ten clusters with over 230 TB of internal tables, handling more than 11 million queries per day across hotel, flight, travel, and other business lines. The internal reporting platform Artnova serves thousands of business users who configure custom SQL reports.

Challenges

SQL complexity and massive base tables – reports often involve multi-table joins, sub-queries and aggregations, with SQL running to hundreds of lines and source tables ranging from hundreds of gigabytes to terabytes.

High concurrency and low latency – at peak hours thousands of complex queries run simultaneously, and dashboards may contain dozens of reports that must return results instantly.

Stage 1 – StarRocks as OLAP

Data was imported from Hive into StarRocks, small tables via Stream Load and large tables via Spark Load, and appropriate indexes were created. This reduced average query time from roughly 20 seconds to 1.5 seconds, a speed-up of more than 10× (20 / 1.5 ≈ 13×) that gave reports near-second response times. However, the approach introduced data-freshness lag, added pipeline maintenance overhead, and could not be scaled to all workloads.
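
As a rough illustration of the small-table path, a Stream Load job is an HTTP PUT against the FE's `_stream_load` endpoint. The sketch below only builds the request and does not send it. The host, database, table, label and credentials are hypothetical, and port 8030 is assumed to be the FE HTTP port. In a real load the FE usually redirects the client to a BE, so the client must follow that redirect with its credentials.

```python
import base64
import urllib.request


def build_stream_load_request(fe_host, database, table, csv_bytes,
                              label, user, password, http_port=8030):
    """Build (but do not send) a Stream Load request for a small CSV payload."""
    url = f"http://{fe_host}:{http_port}/api/{database}/{table}/_stream_load"
    token = base64.b64encode(f"{user}:{password}".encode("utf-8")).decode("ascii")
    headers = {
        "Authorization": f"Basic {token}",
        "Expect": "100-continue",   # expected by the Stream Load endpoint
        "label": label,             # a unique label lets StarRocks reject duplicate loads
        "column_separator": ",",    # the payload is comma-separated
    }
    return urllib.request.Request(url, data=csv_bytes, headers=headers, method="PUT")


# Hypothetical example: a tiny dimension table exported from Hive as CSV.
request = build_stream_load_request(
    fe_host="fe.example.internal",
    database="report_db",
    table="dim_city",
    csv_bytes=b"1,Shanghai\n2,Beijing\n",
    label="dim_city_20240101",
    user="loader",
    password="secret",
)
print(request.get_method(), request.full_url)
```

Every such job has to be scheduled and monitored per table, which is the pipeline overhead that this stage introduced.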

Figure: Performance comparison chart

Stage 2 – StarRocks as Lakehouse

Starting with version 3.0, StarRocks added Hive catalog support and materialized-view capabilities, allowing it to query the data lake directly. With Data Cache enabled, StarRocks was on average 7.4× faster than Trino, and in some scenarios the speed-up reached dozens of times. The new architecture eliminated the data delay, removed the need for bulk data imports, and enabled transparent acceleration of slow SQL via materialized views.
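
The lakehouse setup can be sketched as three SQL statements sent over StarRocks's MySQL-compatible protocol. This is a minimal sketch, not Ctrip's configuration: the catalog name, metastore URI, table and view names are hypothetical, and the exact clauses should be checked against the documentation for the deployed StarRocks version. The helper takes any DB-API cursor, so no specific client library is assumed.

```python
# Hypothetical names: hive_catalog, report_db.orders, mv_daily_amount,
# and the metastore URI are placeholders, not values from the article.

CREATE_HIVE_CATALOG = """
CREATE EXTERNAL CATALOG hive_catalog
PROPERTIES (
    "type" = "hive",
    "hive.metastore.uris" = "thrift://metastore.example.internal:9083"
)
""".strip()

# Direct lake query: the Hive table is read in place, with no import step.
DIRECT_LAKE_QUERY = """
SELECT order_date, SUM(amount) AS total_amount
FROM hive_catalog.report_db.orders
WHERE order_date >= '2024-01-01'
GROUP BY order_date
""".strip()

# Asynchronous materialized view over the same Hive table. It is created in
# an internal StarRocks database and refreshed on a schedule (hourly here).
CREATE_MATERIALIZED_VIEW = """
CREATE MATERIALIZED VIEW mv_daily_amount
DISTRIBUTED BY HASH(order_date)
REFRESH ASYNC EVERY (INTERVAL 1 HOUR)
AS
SELECT order_date, SUM(amount) AS total_amount
FROM hive_catalog.report_db.orders
GROUP BY order_date
""".strip()

SETUP_STATEMENTS = [CREATE_HIVE_CATALOG, CREATE_MATERIALIZED_VIEW]


def run_statements(cursor, statements):
    """Execute each statement in order on a DB-API cursor."""
    for statement in statements:
        cursor.execute(statement)
```

Once the view exists, a report that issues `DIRECT_LAKE_QUERY` does not need to change. Transparent rewrite means the optimizer may answer it from `mv_daily_amount` instead of scanning the Hive table.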

Figure: Lakehouse architecture diagram

Key Optimizations

I/O merge: small I/O requests are merged adaptively, reducing the number of I/O operations.

Late materialization: the reader first reads and filters the predicate columns, then reads the remaining required columns only for the matching rows, cutting total I/O.

Reader optimizations for various file formats.

Metadata cache: caches Hive table statistics and metadata.

Data Cache: on the first query, raw data blocks are cached on BE nodes; subsequent queries read them locally, avoiding repeated HDFS fetches.

Materialized views: refreshed automatically (including partition-aware incremental refresh) and applied through transparent query rewrite; they can cut query time from minutes to seconds, for an overall speed-up of more than 10×.
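
To make the late-materialization idea above concrete, here is a toy in-memory model. It is not StarRocks's reader: the column layout, the function name and the "values read" counter are illustrative assumptions. The point is only that non-predicate columns are touched for matching rows alone.

```python
def scan_with_late_materialization(table, filter_column, predicate, output_columns):
    """Filter on one column first, then fetch other columns only for matches.

    `table` maps column names to equal-length lists (a toy columnar layout).
    Returns the projected columns and the number of values that were read.
    """
    # Step 1: read only the predicate column and record matching row positions.
    matching_rows = [i for i, value in enumerate(table[filter_column]) if predicate(value)]
    values_read = len(table[filter_column])

    # Step 2: read the requested columns only at the matching positions.
    result = {}
    for name in output_columns:
        result[name] = [table[name][i] for i in matching_rows]
        if name != filter_column:
            values_read += len(matching_rows)
    return result, values_read


# Hypothetical hotel-order data with three columns and four rows.
orders = {
    "city": ["SHA", "BJS", "SHA", "CAN"],
    "price": [100, 200, 300, 400],
    "nights": [1, 2, 3, 4],
}
rows, values_read = scan_with_late_materialization(
    orders, "city", lambda city: city == "SHA", ["price", "nights"]
)
print(rows)         # {'price': [100, 300], 'nights': [1, 3]}
print(values_read)  # 8, versus 12 if all three columns were read in full
```

The saving grows with selectivity: the fewer rows pass the filter, the less of the remaining columns has to be fetched from storage.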

Best‑Practice Migration Flow

Ensure SQL compatibility – set sql_dialect='trino' to let StarRocks parse Trino SQL; compatibility exceeds 99% in production tests.

Enable Data Cache on BE nodes, sizing memory and disk appropriately; co‑locate BE nodes with HDFS DataNodes to share storage.

Create suitable materialized views for heavy aggregation or multi‑table join queries; schedule refresh jobs linked to base‑table ingestion.

Use an automated validation service: replay Trino SQL on a test StarRocks cluster daily, compare the results, and gradually switch queries that pass to StarRocks while notifying their owners.
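
The replay-and-compare step of the flow above might look like the following sketch. It is an assumption about how such a check could work, not Ctrip's validation service. The function names are hypothetical and both cursors are assumed to be DB-API cursors. Results are compared as multisets so that row order does not matter, and floats are rounded so that tiny engine-specific differences are not reported as mismatches.

```python
from collections import Counter


def normalize_row(row, float_digits=6):
    """Turn a row into a hashable tuple, rounding floats to a fixed precision."""
    return tuple(round(v, float_digits) if isinstance(v, float) else v for v in row)


def results_match(trino_rows, starrocks_rows, float_digits=6):
    """Compare two result sets as multisets, ignoring row order."""
    left = Counter(normalize_row(row, float_digits) for row in trino_rows)
    right = Counter(normalize_row(row, float_digits) for row in starrocks_rows)
    return left == right


def replay_and_compare(sql, trino_cursor, starrocks_cursor):
    """Run the same Trino SQL on both engines and report whether results agree."""
    # Let StarRocks parse the statement as Trino SQL for this session.
    starrocks_cursor.execute("SET sql_dialect = 'trino'")
    trino_cursor.execute(sql)
    starrocks_cursor.execute(sql)
    return results_match(trino_cursor.fetchall(), starrocks_cursor.fetchall())
```

Rounding is a simplification: two values that straddle a rounding boundary can still be flagged, so a production check would more likely use a relative tolerance. Queries without a deterministic result (for example `LIMIT` without `ORDER BY`) also need special handling before they can be compared.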

Figure: Automated migration architecture

Future Work

Planned improvements include intelligent recommendation of materialized views, index support for MVs (contributed back to the community), and lake‑ETL integration using Iceberg + DBT + StarRocks to further boost data freshness and processing efficiency.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact admin@besthub.dev and we will review it promptly.

Written by

StarRocks
