How Kingsoft Office Boosted Query Speed 2.3× with StarRocks 3.0

Kingsoft Office migrated its reporting platform from a multi‑engine stack to StarRocks 3.0. The team reports a 48.84% overall performance gain, roughly half the average query latency, lower operational costs, and better resource utilization. Storage‑compute separation and Trino SQL compatibility eased the move.

StarRocks

Aug 1, 2024

Background and Challenges

Kingsoft Office serves nearly 600 million monthly active devices, and its data‑processing demands keep growing. Its reporting platform originally combined several open‑source engines (Tez, Spark, Trino, ClickHouse), which led to SQL compatibility issues between engines, high O&M costs, and difficulty coordinating resources across independent clusters.

Evaluation and Adoption of StarRocks 3.0

After a technical survey, the team selected StarRocks 3.0 for its storage‑compute separation, S3‑compatible object‑storage support, and strong query performance. In benchmarks run with identical resources, StarRocks outperformed Trino, delivering roughly 2.3× Trino's query speed in both the local‑storage and Hive external‑table scenarios.

Test results: overall task performance improved by 48.84%, average query time fell to about half that of the previous solution, and resource utilization during otherwise idle periods increased significantly.
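The speed‑up and latency figures above are related by simple arithmetic. The Python sketch below converts between a speed‑up factor and the fraction of the baseline runtime that remains. Reading the 48.84% figure as a cut in total runtime is an assumption made for illustration, since the article does not define how that metric was measured, and the function names are illustrative only.

```python
def remaining_time_fraction(speedup: float) -> float:
    """Fraction of the baseline runtime left after a given speed-up factor."""
    if speedup <= 0:
        raise ValueError("speedup must be positive")
    return 1.0 / speedup


def speedup_from_reduction(reduction_pct: float) -> float:
    """Speed-up factor implied by cutting runtime by reduction_pct percent."""
    if not 0 <= reduction_pct < 100:
        raise ValueError("reduction_pct must be in [0, 100)")
    return 1.0 / (1.0 - reduction_pct / 100.0)


# Benchmark figure from the article: about 2.3x Trino's query speed.
benchmark_fraction = remaining_time_fraction(2.3)

# ASSUMPTION: the 48.84% gain is read as a reduction in total runtime.
production_speedup = speedup_from_reduction(48.84)

print(f"2.3x speed-up leaves {benchmark_fraction:.1%} of the baseline runtime")
print(f"a 48.84% runtime cut corresponds to a {production_speedup:.2f}x speed-up")
```

Under that reading, a 48.84% runtime cut is just under a 2× speed‑up, which is consistent with the reported halving of average query time, while the 2.3× benchmark figure corresponds to a somewhat larger reduction.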

Architecture Changes

StarRocks 3.0 introduced a compute‑storage separated design, allowing data to reside in object storage (S3, OSS, MinIO) while hot data is cached locally. This aligns with Kingsoft’s cloud strategy and simplifies scaling, as storage nodes no longer need to be expanded together with compute nodes.

The migration also leveraged the sql_dialect='trino' parameter, enabling existing Trino‑specific SQL syntax to run unchanged on StarRocks, thus easing the transition of legacy jobs.
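As a rough illustration of how a migrated job might apply this setting, the Python sketch below prepends the dialect switch to the statements a job sends in one session. Only the `sql_dialect` variable comes from the article; the helper name `with_trino_dialect` and the sample query, table, and column names are hypothetical.

```python
from typing import Iterable, List

# Session variable named in the article; issued before the job's own SQL.
SET_TRINO_DIALECT = "SET sql_dialect = 'trino';"


def with_trino_dialect(statements: Iterable[str]) -> List[str]:
    """Return the statements for one session, with the dialect switch first."""
    # Skip blank entries and normalize each statement to end with one semicolon.
    cleaned = [s.strip().rstrip(";") + ";" for s in statements if s.strip()]
    return [SET_TRINO_DIALECT, *cleaned]


# Hypothetical legacy report query; the table and column names are made up.
legacy_job = [
    "SELECT dt, count(*) AS reports FROM report_db.daily_reports GROUP BY dt"
]

for statement in with_trino_dialect(legacy_job):
    print(statement)
```

In practice the resulting statements would be sent over one connection by whatever MySQL‑protocol client the scheduler already uses, since the setting is assumed here to be applied per session rather than cluster‑wide.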

Operational Benefits

Reduced O&M cost: scaling no longer requires the complex node expansion that ClickHouse’s coupled storage‑compute model demands.

Improved resource elasticity: CN nodes focus on compute while storage is handled separately.

Higher query efficiency: StarRocks’ average query time is roughly half of Trino’s.

Lower task queuing and faster report execution.

Resource savings: StarRocks consumes about 35.8% of cluster resources while handling 70% of compute tasks, compared with Trino’s 64.2% resource share for only 30% of tasks.
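The resource‑share figures in the last item can be turned into a rough efficiency ratio. The Python sketch below divides each engine's share of tasks by its share of cluster resources. It assumes tasks are of comparable size across the two engines, which the article does not state, and the function name is illustrative.

```python
def tasks_per_resource(task_share_pct: float, resource_share_pct: float) -> float:
    """Share of tasks handled per unit share of cluster resources."""
    if resource_share_pct <= 0:
        raise ValueError("resource_share_pct must be positive")
    return task_share_pct / resource_share_pct


# Figures from the article: (task share %, resource share %).
starrocks = tasks_per_resource(70.0, 35.8)
trino = tasks_per_resource(30.0, 64.2)

# ASSUMPTION: tasks are comparable in size, so the ratio is only indicative.
relative = starrocks / trino

print(f"StarRocks: {starrocks:.2f} task-share points per resource-share point")
print(f"Trino:     {trino:.2f} task-share points per resource-share point")
print(f"Relative efficiency: {relative:.1f}x")
```

Under that assumption, StarRocks handles roughly four times as many tasks per unit of cluster resources as Trino in this deployment.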

Practical Deployment

Two clusters were deployed: a Kubernetes‑based StarRocks cluster for data‑lake external queries and a physical‑machine cluster for OLAP workloads. Over 12,000 SQL jobs (≈80% of reporting tasks) were migrated, achieving minute‑level label generation versus hour‑level in the previous Hive‑based system.

Future Plans

Upcoming StarRocks features such as Multi‑Warehouse will enable further isolation of ETL, OLAP, and streaming workloads, improving resource utilization and stability. The team plans to consolidate physical clusters into Kubernetes for easier maintenance and to continue leveraging StarRocks’ storage‑compute separation to lower costs and boost performance.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact us and we will review it promptly.

Written by

StarRocks

StarRocks is an open‑source project under the Linux Foundation, focused on building a high‑performance, scalable analytical database that enables enterprises to create an efficient, unified lake‑house paradigm. It is widely used across many industries worldwide, helping numerous companies enhance their data analytics capabilities.
