
Cutting Costs 40% and Halving Query Latency: Our ClickHouse‑to‑StarRocks Migration

Facing high costs and scaling limits with ClickHouse, we migrated a 4,000‑core, 500 TB OLAP workload to StarRocks, achieving a 40% cost reduction, 50% storage savings, and up to 30× query speedups through storage‑compute separation, materialized‑view rewrites, and extensive performance tuning.

Background

DeWu’s smart‑operation platform relies on OLAP engines for customer service, risk control, supply chain, advertising, and A/B testing. Historically it operated five different engines—Alibaba Cloud Hologres, ADB, ClickHouse, and two self‑hosted ClickHouse/StarRocks clusters—leading to high operational overhead.

ClickHouse Challenges at Scale

Even though ClickHouse offers excellent single‑node performance, its distributed architecture struggled with horizontal scalability, metadata management, data consistency, and join performance. The ClickHouse cluster slated for migration ran on over 4,000 CPU cores, stored more than 500 TB of data, and suffered from:

Opaque materialized‑view rewriting: queries had to target views explicitly, because ClickHouse performs no automatic view substitution.

Lack of a bulk‑load path, making offline imports inefficient.

A vertical‑scaling ceiling, with horizontal resharding requiring roughly a week of downtime.

To mitigate these issues, DeWu built a smaller standby cluster for hot data, but this added >50% extra cost.

Migration to StarRocks

In 2024, DeWu initiated a proof‑of‑concept to replace ClickHouse with a self‑hosted StarRocks cluster that supports storage‑compute separation. The migration involved:

Deploying a 4000‑core StarRocks cluster with compute nodes (CN) and remote object storage (OSS) for data.

Gradually routing DSL‑translated queries to StarRocks via the ONE DSL platform, enabling a gradual (gray‑release) rollout with instant rollback.

Developing a ClickHouse‑to‑StarRocks DDL conversion tool to automate table and materialized‑view definitions.

Key architectural change: data resides in OSS (Distributed Storage) while compute nodes cache only hot data, allowing a single‑replica mode that reduces storage cost by two‑thirds.
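The DDL conversion described above can be illustrated with a sketch of what such a tool might emit. The table, columns, and type mappings below are hypothetical, not taken from DeWu's actual schemas, and the StarRocks clauses are one plausible target shape rather than the tool's exact output:

```sql
-- ClickHouse source (illustrative table; all names are hypothetical):
CREATE TABLE events_local (
    dt      Date,
    user_id UInt64,
    spu_id  UInt64,
    cnt     UInt32
) ENGINE = MergeTree()
PARTITION BY dt
ORDER BY (dt, user_id);

-- A plausible StarRocks equivalent; the sort key becomes DUPLICATE KEY,
-- and replication_num = 1 reflects the single-replica mode the article
-- describes, where durability comes from the OSS object store:
CREATE TABLE events (
    dt      DATE,
    user_id BIGINT,
    spu_id  BIGINT,
    cnt     INT
)
DUPLICATE KEY (dt, user_id)
PARTITION BY date_trunc('day', dt)
DISTRIBUTED BY HASH (user_id)
PROPERTIES ("replication_num" = "1");
```

Note that UInt64 has no exact StarRocks counterpart; mapping it to BIGINT is a common choice when values stay within the signed range.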

Cost and Performance Benefits

After migration:

Overall infrastructure cost for the OLAP engine dropped 40%, and storage cost fell to one‑sixth of the original.

Query latency halved: P95 of the main dashboard fell from 8.5 s to 4.3 s; high‑load queries dropped from 24.38 s to 11.94 s.

Cluster‑wide outage time shrank from 15 minutes to 30 seconds.

Storage‑compute separation also cut offline import time by two‑thirds, contributing to the overall 40%+ cost decrease.

Materialized‑View Enhancements

StarRocks introduced transparent materialized‑view rewriting, eliminating the need for external code to manage view creation and query rewriting. Additional optimizations include:

Restricting queries on large tables to a time range of at most eight days.

Forcing rewrite mode (materialized_view_rewrite_mode='force') to hit views for multiple count‑distinct aggregations.

Automatic recommendation of materialized views based on FE‑captured SQL patterns, with statistics on field usage to suggest optimal sort keys.
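The transparent-rewrite and force-mode points above can be sketched in SQL. The view and table names here are illustrative, not DeWu's actual definitions; only the session variable `materialized_view_rewrite_mode` comes from the article:

```sql
-- Hypothetical asynchronous materialized view pre-aggregating daily
-- distinct users; with transparent rewriting, queries keep targeting
-- the base table and the optimizer substitutes the view:
CREATE MATERIALIZED VIEW mv_daily_uv
REFRESH ASYNC
AS
SELECT dt, spu_id, count(DISTINCT user_id) AS uv
FROM events
GROUP BY dt, spu_id;

-- Force rewrite mode for the session, so queries with multiple
-- count-distinct aggregations hit the view even when the cost model
-- would otherwise decline the rewrite:
SET materialized_view_rewrite_mode = 'force';

SELECT dt, count(DISTINCT user_id)
FROM events
GROUP BY dt;
```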

Performance gains from view‑related fixes:

Improved partition‑field pruning (PR #46786).

Optimized implicit type conversions, delivering 20× speedup (PR #50168).

Enhanced date/datetime handling, achieving 30× speedup (PR #50643).

Reduced RPC calls to StarOS, cutting query time by >3 seconds (PR #46913).

Offline Import Improvements

Instead of ClickHouse’s single‑node insert, StarRocks loads data via OSS broker load, allowing controlled concurrency and reducing cluster resource consumption. Benchmarks show a two‑thirds reduction in import latency.
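The broker‑load flow described above might look roughly like the following. The bucket, path, label, file format, and credential placeholders are all illustrative; only the use of OSS broker load comes from the article:

```sql
-- Illustrative broker load pulling offline data from OSS into StarRocks;
-- concurrency is managed by the load job rather than per-node inserts:
LOAD LABEL olap_znyy.load_events_20240601
(
    DATA INFILE("oss://example-bucket/warehouse/events/dt=2024-06-01/*")
    INTO TABLE events
    FORMAT AS "orc"
)
WITH BROKER
(
    "fs.oss.accessKeyId" = "<access_key_id>",
    "fs.oss.accessKeySecret" = "<access_key_secret>",
    "fs.oss.endpoint" = "<oss_endpoint>"
)
PROPERTIES
(
    "timeout" = "3600"
);
```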

A related subquery rewrite deduplicates the IN‑list before the semi‑join, so fewer rows are fed to the probe side:

brand_id IN (SELECT brand_id FROM olap_znyy.st_itg_spu_info_all WHERE brand_level_name IN ('A级品牌','B级品牌','C级品牌'))
-- rewritten to
brand_id IN (SELECT DISTINCT brand_id FROM olap_znyy.st_itg_spu_info_all WHERE brand_level_name IN ('A级品牌','B级品牌','C级品牌'))

Compatibility Layer

StarRocks added a ClickHouse dialect (sql_dialect=clickhouse) that maps 72 common ClickHouse functions to StarRocks equivalents, preserving query semantics without user changes. It also implements ClickHouse‑style arrayJoin via unnest and supports alias reuse in SELECT clauses.
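A minimal sketch of the dialect and the arrayJoin-to-unnest mapping; the table `t` and column `tags` are hypothetical, and the UNNEST form shown is one rough StarRocks equivalent rather than the dialect layer's literal translation:

```sql
-- Enable the ClickHouse dialect for the session (per the article):
SET sql_dialect = 'clickhouse';

-- ClickHouse-style query: arrayJoin expands each array element into a row.
SELECT id, arrayJoin(tags) FROM t;

-- Roughly what StarRocks executes natively, via its UNNEST table function:
SELECT id, unnest AS tag
FROM t CROSS JOIN UNNEST(tags);
```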

Operational and Observability Enhancements

New APIs were introduced to expose resource watermarks, SQL complexity factors, materialized‑view metadata, and asynchronous query cancellation (custom_query_id). These aid monitoring and troubleshooting at scale.
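Of these, the cancellation hook is the simplest to picture. The ID value below is made up; only the `custom_query_id` session variable itself comes from the article, and the shape of the cancellation API is not specified there:

```sql
-- Tag a long-running query with a caller-chosen ID so that it can later
-- be cancelled asynchronously through the new API the article describes:
SET custom_query_id = 'dashboard-req-42';

-- Any statement issued in this session now carries the custom ID,
-- which monitoring tooling can use to locate and cancel it.
SELECT dt, count(*) FROM events GROUP BY dt;
```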

Results and Conclusion

The migration met all cost and performance targets, providing a scalable foundation for future growth. DeWu contributed 28 PRs (24 merged) to the StarRocks open‑source project, fixing over 40 issues and adding 10+ features, reinforcing the community partnership.

Overall, the case demonstrates that storage‑compute separation, transparent materialized‑view rewriting, and careful query tuning can dramatically reduce both operational expense and latency for large‑scale OLAP workloads.

[Figure: Cost and performance comparison chart]
Written by

StarRocks

StarRocks is an open‑source project under the Linux Foundation, focused on building a high‑performance, scalable analytical database that enables enterprises to create an efficient, unified lake‑house paradigm. It is widely used across many industries worldwide, helping numerous companies enhance their data analytics capabilities.
