How StarRocks Boosted Query Speed 3‑10× for a Billion‑Scale Reporting Platform
Facing massive daily query loads, Wanwu Newborn’s Watcher reporting platform migrated from MySQL, Greenplum, and Trino to StarRocks. The move cut compute nodes by half while delivering 3‑10× faster queries, higher success rates, and lower cost, as shown by TPC‑DS and real‑world business query benchmarks.
#01 Analysis and Architecture Choice
Wanwu Newborn’s data platform evolved through MySQL, Greenplum, and Trino to meet growing analytical demands. The Watcher reporting platform serves hundreds of thousands of queries a day across multiple business lines, and the team needed a way to improve query speed without adding hardware.
Two common OLAP approaches were considered: (1) Layered acceleration by loading analysis data into a separate database, which adds storage overhead and maintenance complexity; (2) Direct Hive querying with a high‑performance engine, which reduces storage cost but demands a fast query engine. After evaluating options, the team selected StarRocks for its ability to query Hive external tables with near‑internal‑table performance.
#02 Watcher Platform Business Challenges
Massive data volume: continuous growth of business data requires high‑throughput storage and processing.
Flexible filtering: users need multi‑dimensional slicing, rolling up, and drilling, which creates complex query patterns.
Complex SQL: typical reports involve 5‑6 table joins and generate hundreds of lines of SQL.
High concurrency and low latency: peak QPS exceeds 200, demanding real‑time response.
#03 Why StarRocks?
Trino had improved query latency, but further gains would have required scaling out. StarRocks promised Hive‑external‑table performance close to that of native tables, lower hardware cost, and a simple migration path. Preliminary tests showed StarRocks running several times faster than Trino without any data import.
#04 Benchmark (TPC‑DS)
Version: Trino 403, StarRocks 2.4.1
Method: identical hardware, three full runs, average latency taken; tested with 10, 20, and 30 parallel submissions; no local cache enabled.
Note: the chart shows overall query latency.
StarRocks achieved roughly 4× the overall performance of Trino. Even when the number of BE nodes was reduced to one‑third, StarRocks remained about 1.5× faster than Trino.
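The measurement method described above (three full runs, averaged, at 10, 20, and 30 parallel submissions) can be sketched as a small harness. This is a minimal illustration, not the team's actual tooling: `run_query` is a hypothetical callable standing in for whatever client submits SQL to Trino or StarRocks, and the placeholder query list is an assumption.

```python
import statistics
import time
from concurrent.futures import ThreadPoolExecutor


def timed_run(run_query, queries, parallelism):
    """Submit every query through `parallelism` workers and return the
    wall-clock seconds needed to finish the whole batch."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=parallelism) as pool:
        # list() waits for every submission and re-raises any query error.
        list(pool.map(run_query, queries))
    return time.perf_counter() - start


def benchmark(run_query, queries, parallelism_levels=(10, 20, 30), runs=3):
    """Return {parallelism: mean batch latency in seconds over `runs` full runs}."""
    results = {}
    for parallelism in parallelism_levels:
        latencies = [timed_run(run_query, queries, parallelism) for _ in range(runs)]
        results[parallelism] = statistics.mean(latencies)
    return results


if __name__ == "__main__":
    # Hypothetical stand-in for a real engine call; it only sleeps briefly.
    def fake_query(sql):
        time.sleep(0.001)

    placeholder_queries = [f"-- placeholder for query {i}" for i in range(1, 21)]
    for level, seconds in benchmark(fake_query, placeholder_queries).items():
        print(f"parallelism={level}: mean batch latency {seconds:.3f}s")
```

Running the same harness against each engine on identical hardware, with the same query list, gives per-parallelism averages that can be compared directly.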
#05 Real‑World Query Validation
One hundred representative business SQLs were executed on identical servers. Results:
For very large queries, StarRocks had a higher success rate; Trino failed with out‑of‑memory errors on eight queries that StarRocks completed successfully.
Overall performance: in single‑threaded tests StarRocks was 6.77× faster than Trino; with 10 parallel threads the factor rose to 10.96×; with 20 parallel threads it reached 16.03×.
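The article does not say how the overall speedup factor was computed. One plausible reading, assumed here, is the ratio of total Trino time to total StarRocks time over the queries both engines completed, with failures counted separately in a success rate. The sketch below uses made-up timings purely for illustration; `QueryResult` and both helper functions are hypothetical names.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class QueryResult:
    name: str
    # None means the query failed on that engine (for example, out of memory).
    trino_seconds: Optional[float]
    starrocks_seconds: Optional[float]


def success_rate(results, engine):
    """Fraction of queries that finished on `engine` ('trino' or 'starrocks')."""
    finished = sum(1 for r in results if getattr(r, f"{engine}_seconds") is not None)
    return finished / len(results)


def overall_speedup(results):
    """Total Trino time divided by total StarRocks time, restricted to
    queries that both engines completed (an assumed definition)."""
    both = [
        r for r in results
        if r.trino_seconds is not None and r.starrocks_seconds is not None
    ]
    if not both:
        raise ValueError("no query completed on both engines")
    return sum(r.trino_seconds for r in both) / sum(r.starrocks_seconds for r in both)


if __name__ == "__main__":
    # Illustrative numbers only; they are not taken from the benchmark.
    sample = [
        QueryResult("report_a", 12.0, 2.0),
        QueryResult("report_b", 30.0, 4.0),
        QueryResult("report_c", None, 9.0),  # failed on Trino, finished on StarRocks
    ]
    print(f"speedup: {overall_speedup(sample):.2f}x")
    print(f"Trino success rate: {success_rate(sample, 'trino'):.0%}")
    print(f"StarRocks success rate: {success_rate(sample, 'starrocks'):.0%}")
```

Excluding queries that failed on one engine keeps the speedup ratio from being skewed by missing timings, which is why the success rate is reported alongside it.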
#06 Migration Process
Set the SQL dialect to Trino mode with the statement set sql_dialect='trino'; so that most existing Trino SQL runs unchanged. This simplified the migration and kept Trino available as a fallback engine.
Integrated Apache Ranger for unified data‑access control, enhancing security and compliance while keeping fine‑grained permission management.
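The dialect switch and the Trino fallback described in this section can be sketched as a thin wrapper. This is an illustration under stated assumptions, not the team's implementation: `starrocks_execute` and `trino_execute` are hypothetical callables that take a SQL string and return rows, and catching every exception is a simplification.

```python
def run_with_fallback(sql, starrocks_execute, trino_execute):
    """Try StarRocks in Trino-dialect mode first; fall back to Trino on error.

    Returns a tuple of (engine name, rows). Both executors are hypothetical
    callables that would wrap real client connections in practice.
    """
    try:
        # The session setting named in the article. It has to run on the
        # same session as the query that follows it.
        starrocks_execute("set sql_dialect='trino';")
        return "starrocks", starrocks_execute(sql)
    except Exception:
        # The SQL was never rewritten, so Trino can still run it as-is.
        return "trino", trino_execute(sql)


if __name__ == "__main__":
    def starrocks_stub(sql):
        # Pretend one Trino-specific construct is not yet supported.
        if "unsupported_function" in sql:
            raise RuntimeError("not supported in this sketch")
        return [("ok",)]

    def trino_stub(sql):
        return [("ok from trino",)]

    print(run_with_fallback("select 1", starrocks_stub, trino_stub))
    print(run_with_fallback("select unsupported_function()", starrocks_stub, trino_stub))
```

Because StarRocks speaks the MySQL wire protocol, a standard MySQL client library could back `starrocks_execute`. A production version would likely catch only specific error types so that genuine query bugs are not silently retried on Trino.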
#07 Online Results and Future Outlook
By mid‑August, StarRocks 3.1 was deployed with 20 BE nodes, replacing roughly 40 Trino nodes and handling about 60% of online query traffic. Despite halving compute nodes, 94% of queries showed performance gains, with nearly 80% improving 5‑10×.
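The article does not describe how the share of traffic sent to StarRocks was controlled. One common approach, assumed here for illustration only, is deterministic percentage routing keyed on a query or report identifier, so the same report always lands on the same engine while the percentage is raised gradually. The function name and the `starrocks_percent` parameter are hypothetical.

```python
import hashlib


def choose_engine(query_id, starrocks_percent=60):
    """Map a query identifier to an engine so that roughly
    `starrocks_percent` percent of identifiers go to StarRocks."""
    digest = hashlib.sha256(query_id.encode("utf-8")).digest()
    # Turn the first four hash bytes into a stable bucket from 0 to 99.
    bucket = int.from_bytes(digest[:4], "big") % 100
    return "starrocks" if bucket < starrocks_percent else "trino"


if __name__ == "__main__":
    ids = [f"report-{i}" for i in range(10_000)]
    share = sum(choose_engine(i) == "starrocks" for i in ids) / len(ids)
    print(f"share routed to StarRocks: {share:.1%}")
```

Hashing rather than random sampling keeps routing repeatable, which makes it easier to compare a given report's latency before and after it moves.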
Future plans include migrating all query traffic to StarRocks, adopting compute‑storage separation for elastic scaling, enabling data‑lake writes (Iceberg, plus the upcoming Hive write capability), turning on Data Cache with whitelist/blacklist controls, and integrating the upcoming Apache Ranger plugin for seamless permission management.
StarRocks
StarRocks is an open‑source project under the Linux Foundation, focused on building a high‑performance, scalable analytical database that enables enterprises to create an efficient, unified lake‑house paradigm. It is widely used across many industries worldwide, helping numerous companies enhance their data analytics capabilities.
