How NAVER Boosted Query Performance and Scalability by Migrating from ClickHouse to StarRocks
NAVER migrated its massive analytics platform from ClickHouse to StarRocks, achieving dramatic improvements in multi‑table JOIN performance, real‑time aggregation speed, and horizontal scalability while simplifying data integration across heterogeneous sources on a Kubernetes‑based architecture.
Background
NAVER operates a data platform that stores more than 20 PB of data in an Apache Iceberg lakehouse and serves over 200 services. The analytics system must provide real‑time insights for service performance monitoring and user‑behavior analysis.
Challenges with ClickHouse
Fixed dimensions : ClickHouse could not run complex multi‑table JOINs reliably at NAVER's scale, forcing denormalized tables and limiting flexible analytics.
Scalability : Data must be manually rebalanced across nodes, which becomes time‑consuming and error‑prone as volume grows.
Mutable data handling : The “merge‑on‑read” model degrades performance for real‑time updates and deletions.
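The cost of the denormalized model described above can be illustrated with a small sketch. It uses Python's built‑in sqlite3 module purely as a stand‑in engine, and the table and column names (services, events, events_wide, team, latency_ms) are hypothetical rather than taken from NAVER's schema. Both models answer the same aggregation, but a change to one dimension attribute touches a single row in the normalized model and every copied row in the wide table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Normalized model: facts and dimensions live in separate tables.
    CREATE TABLE services (service_id INTEGER PRIMARY KEY, team TEXT);
    CREATE TABLE events (service_id INTEGER, latency_ms INTEGER);
    INSERT INTO services VALUES (1, 'search'), (2, 'shopping');
    INSERT INTO events VALUES (1, 120), (1, 80), (2, 200);

    -- Denormalized model: the dimension attribute is copied into every fact row.
    CREATE TABLE events_wide (service_id INTEGER, team TEXT, latency_ms INTEGER);
    INSERT INTO events_wide
        SELECT e.service_id, s.team, e.latency_ms
        FROM events AS e JOIN services AS s ON e.service_id = s.service_id;
""")

JOIN_QUERY = """
    SELECT s.team, AVG(e.latency_ms)
    FROM events AS e JOIN services AS s ON e.service_id = s.service_id
    GROUP BY s.team ORDER BY s.team
"""
WIDE_QUERY = """
    SELECT team, AVG(latency_ms) FROM events_wide GROUP BY team ORDER BY team
"""

# Both models return the same per-team averages.
joined = conn.execute(JOIN_QUERY).fetchall()
wide = conn.execute(WIDE_QUERY).fetchall()

# Renaming a team updates one dimension row in the normalized model...
normalized_rows = conn.execute(
    "UPDATE services SET team = 'web-search' WHERE service_id = 1"
).rowcount
# ...but every matching fact row in the denormalized one.
wide_rows = conn.execute(
    "UPDATE events_wide SET team = 'web-search' WHERE service_id = 1"
).rowcount

print(joined, wide, normalized_rows, wide_rows)
```

With only three fact rows the difference is trivial; at lakehouse scale the same rewrite would span every stored copy of the attribute.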
Evaluation of Alternatives
The team benchmarked Trino, Pinot, Druid and StarRocks against four criteria:
Multi‑table JOIN : Ability to execute complex cross‑table queries without denormalization.
Real‑time aggregation : Fast execution of dynamic analytical queries.
Scalability : Seamless horizontal scaling with low operational overhead.
Data update : Support for real‑time data modifications without query slowdown.
Why StarRocks Was Chosen
Out‑of‑the‑box multi‑table support : Native JOIN eliminates the need for denormalized tables.
Federated analysis : Direct integration with Apache Iceberg and other open formats provides a unified query layer.
Superior aggregation performance : Comparable to or better than ClickHouse under dynamic workloads.
Cloud‑native scalability : Decoupled storage‑compute architecture simplifies node expansion and resource management.
Performance Testing
Test Setup
Real queries and datasets were executed on two Kubernetes clusters – a small‑scale cluster and a large‑scale cluster – covering multi‑column GROUP BY, multi‑table JOIN and horizontal scaling scenarios.
Results
StarRocks consistently outperformed ClickHouse in multi‑column aggregation on both cluster sizes.
All JOIN queries completed successfully on StarRocks, while ClickHouse failed four out of five.
Horizontal scaling showed linear performance growth for StarRocks as resources increased.
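One way to check a claim of linear scaling is to compare the measured speedup with the ideal speedup implied by the added resources. The sketch below assumes a simple definition of scaling efficiency (speedup divided by the resource ratio), and the node counts and timings are placeholder values chosen for illustration, not NAVER's measurements.

```python
def scaling_efficiency(base_nodes: int, base_seconds: float,
                       nodes: int, seconds: float) -> float:
    """Return measured speedup divided by ideal speedup (1.0 means linear)."""
    speedup = base_seconds / seconds
    ideal = nodes / base_nodes
    return speedup / ideal

# Placeholder (node count, query seconds) pairs, NOT results from the article.
runs = [(4, 60.0), (8, 30.0), (16, 16.0)]
base_nodes, base_seconds = runs[0]

efficiencies = {
    nodes: scaling_efficiency(base_nodes, base_seconds, nodes, seconds)
    for nodes, seconds in runs
}

for nodes, eff in efficiencies.items():
    print(f"{nodes} nodes: {eff:.2%} of linear")
```

An efficiency that stays close to 1.0 as nodes are added is what "linear performance growth" means in practice.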
Production Deployment
The final architecture runs on Kubernetes with the following pod composition:
5 Front‑End (FE) pods for query parsing, planning and coordination.
80 Back‑End (BE) pods, each provisioned with 48 CPUs, 50 GiB of RAM and 10 TB of SSD storage.
70 stateless Compute Node (CN) pods, each with 48 CPUs and 50 GiB of RAM, for high‑throughput query execution.
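The aggregate capacity implied by these pod counts is simple arithmetic, sketched below. FE pods are left out because the article does not state their resources, and the SSD figure is raw capacity: any replication factor is not given in the source and is not modelled here. The PodGroup class is a hypothetical helper for this calculation only.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class PodGroup:
    name: str
    replicas: int
    cpu: int          # CPUs per pod
    ram_gib: int      # GiB of RAM per pod
    ssd_tb: int = 0   # TB of SSD per pod (CN pods are stateless)

# Figures taken from the pod composition listed above.
groups = [
    PodGroup("BE", replicas=80, cpu=48, ram_gib=50, ssd_tb=10),
    PodGroup("CN", replicas=70, cpu=48, ram_gib=50),
]

totals = {
    g.name: {
        "cpu": g.replicas * g.cpu,
        "ram_gib": g.replicas * g.ram_gib,
        "ssd_tb": g.replicas * g.ssd_tb,
    }
    for g in groups
}
cluster_cpu = sum(t["cpu"] for t in totals.values())
cluster_ram_gib = sum(t["ram_gib"] for t in totals.values())

print(totals)
print(f"BE + CN: {cluster_cpu} CPUs, {cluster_ram_gib} GiB RAM")
```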
Monitoring
StarRocks provides a pre‑configured Grafana dashboard template that monitors cluster health, merge status and FE/BE metrics, enabling proactive maintenance.
Materialized Views
StarRocks’ materialized views automatically rewrite queries and refresh on a schedule. In JOIN‑heavy workloads they delivered up to 6× speedup without requiring manual SQL changes.
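As a rough illustration of a scheduled materialized view over a JOIN, the sketch below renders a StarRocks‑style DDL statement as a string. The view name, the events and services tables, their columns and the one‑hour interval are all hypothetical, and the exact clauses required (for example distribution or partitioning) depend on the StarRocks version, so the output should be checked against the documentation before use. render_async_mv is an illustrative helper, not part of any StarRocks client.

```python
def render_async_mv(name: str, refresh_hours: int, select_sql: str) -> str:
    """Render DDL for an asynchronously refreshed materialized view."""
    if refresh_hours < 1:
        raise ValueError("refresh_hours must be at least 1")
    return (
        f"CREATE MATERIALIZED VIEW {name}\n"
        f"REFRESH ASYNC EVERY (INTERVAL {refresh_hours} HOUR)\n"
        f"AS\n{select_sql.strip()}"
    )

# Hypothetical JOIN-heavy aggregation that analysts would otherwise run directly.
SELECT_SQL = """
SELECT s.team, date_trunc('hour', e.event_time) AS event_hour, COUNT(*) AS events
FROM events AS e
JOIN services AS s ON e.service_id = s.service_id
GROUP BY s.team, date_trunc('hour', e.event_time)
"""

ddl = render_async_mv("mv_team_hourly_events", 1, SELECT_SQL)
print(ddl)
```

Because query rewrite is automatic, analysts would keep issuing the original SELECT against the base tables; the optimizer decides whether the view can answer it.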
Outcomes
Interactive SQL queries replace fixed ClickHouse metrics, giving analysts full flexibility.
Significant performance gains for multi‑table JOIN and multi‑column GROUP BY, even with real‑time updates.
Unified query platform integrates an internal catalog, Apache Iceberg external catalog and Hive legacy catalog.
Linear horizontal scalability on Kubernetes reduces cost while maintaining consistent performance.
Future Plans
Tune performance further, for example by optimizing partition strategies for timestamp‑based queries.
Contribute internal improvements back to the open‑source StarRocks community.
Expand integrations with additional external data sources via Apache Iceberg.
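The partition tuning mentioned in the first plan rests on pruning: a query with a timestamp filter only needs to read the partitions its range overlaps. The sketch below assumes daily partitions and a hypothetical retention of 90 days; neither figure comes from the article, and day_partitions is an illustrative helper rather than a StarRocks API.

```python
from datetime import date, datetime, timedelta

def day_partitions(start: datetime, end: datetime) -> list[date]:
    """Return the daily partitions overlapped by the half-open range [start, end)."""
    if end <= start:
        return []
    # Step back one microsecond so an end at exactly midnight
    # does not pull in the following day's partition.
    last = (end - timedelta(microseconds=1)).date()
    span = (last - start.date()).days
    return [start.date() + timedelta(days=i) for i in range(span + 1)]

RETAINED_PARTITIONS = 90  # hypothetical retention, one partition per day

# A filter such as: event_time >= '2024-01-10 22:00' AND event_time < '2024-01-12 00:00'
touched = day_partitions(datetime(2024, 1, 10, 22, 0), datetime(2024, 1, 12, 0, 0))
scanned_fraction = len(touched) / RETAINED_PARTITIONS

print(touched, f"{scanned_fraction:.1%} of partitions scanned")
```

Coarser partitions (for example monthly) would make the same query read far more data, while much finer ones multiply partition metadata, which is the trade‑off such tuning has to balance.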
StarRocks
StarRocks is an open‑source project under the Linux Foundation. It is a high‑performance, scalable analytical database designed to let enterprises build an efficient, unified lakehouse, and it is used by companies across many industries worldwide.
