Quark’s Data Platform Upgrade with StarRocks: Architecture, Performance, Roadmap
This article details how Quark’s data platform consolidated multiple analytics engines into a unified StarRocks‑based OLAP solution, covering business background, engine selection, architecture redesign, performance tuning, operational practices, and future plans for scalability and reliability.
Business Background
Quark’s data platform needed to satisfy diverse business lines for data viewing, extraction, and usage, resulting in several data products such as QBI dashboards, quality inspection, ad‑hoc analysis, and real‑time marketing. These products depended on multiple compute engines and storage systems to meet different scenarios.
Trino/Presto: Hive‑based analytics for dashboards and ad‑hoc queries.
Impala/Kudu: Real‑time ingestion from business systems, federated analysis of Hive offline and Kudu real‑time data.
Druid: Real‑time Kafka ingestion for dashboards.
ClickHouse: Offline Hive import for user analysis and offline audience scenarios.
The multi‑engine architecture introduced compatibility problems, performance bottlenecks, and high operational costs, prompting Quark to upgrade the compute architecture with a more efficient, unified data processing solution.
Selection and Evaluation
During engine selection, StarRocks, ClickHouse, Trino, and Kylin were evaluated. Considering scenario coverage, query performance, and operational difficulty, StarRocks performed best and became the preferred solution.
Scenario Coverage
StarRocks supports a wide range of use cases, from OLAP multi‑dimensional analysis to real‑time ad‑hoc queries.
Write‑ability Comparison
(1) StarRocks supports Flink/Kafka real‑time ingestion and batch broker load, and features a primary‑key model enabling UPSERT similar to Kudu.
(2) ClickHouse excels at batch loading (e.g., INSERT INTO SELECT) but relies on Kafka engine tables and materialized views for real‑time writes, making the pipeline complex.
(3) Trino is essentially a query layer; write operations depend on underlying storage (HDFS/S3) and cannot independently optimize data layout.
(4) Kylin requires cube construction jobs with hour‑ or day‑level latency and cannot handle real‑time updates.
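The UPSERT behavior mentioned in point (1) can be illustrated with a toy model. This is only a sketch of the semantics (a row with an existing primary key replaces the stored row); it says nothing about how StarRocks implements its primary-key model internally, and the `upsert` helper and `orders` table are hypothetical names chosen for illustration.

```python
from typing import Dict, Iterable, Tuple

Row = Dict[str, object]


def upsert(table: Dict[Tuple, Row], rows: Iterable[Row], key_cols: Tuple[str, ...]) -> None:
    """Apply rows to an in-memory table keyed by primary key.

    A row whose key already exists replaces the stored row;
    otherwise the row is inserted.
    """
    for row in rows:
        key = tuple(row[c] for c in key_cols)
        table[key] = dict(row)


# Hypothetical table keyed on order_id.
orders: Dict[Tuple, Row] = {}
upsert(orders,
       [{"order_id": 1, "status": "created"},
        {"order_id": 2, "status": "created"}],
       ("order_id",))
# A later event for order 1 overwrites the earlier row instead of adding a duplicate.
upsert(orders, [{"order_id": 1, "status": "paid"}], ("order_id",))
```

After both calls the table still holds two rows, and order 1 carries the latest status.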
On operational cost, StarRocks was about 90% compatible with the Trino SQL dialect out of the box, rising to 99% after Quark’s own development work, which greatly reduces migration effort and cost.
StarRocks Overview
StarRocks is a next‑generation, high‑speed, full‑scenario MPP distributed database that supports OLAP multi‑dimensional analysis, custom reports, real‑time analytics, and ad‑hoc queries.
Its architecture consists of frontend (FE) nodes and two kinds of worker nodes: backend (BE) nodes, which store data locally, and compute nodes (CN), which read data from object storage or HDFS. StarRocks does not depend on external components, which simplifies deployment and maintenance. Nodes can be scaled horizontally without service interruption, and metadata replication improves reliability and avoids a single point of failure.
StarRocks MPP Architecture
StarRocks uses an MPP distributed execution framework: a query is split into multiple physical units that run in parallel across nodes, each with dedicated CPU and memory resources, enabling full resource utilization.
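The split-and-parallelize idea can be sketched as a two-phase aggregation: each shard computes a partial result, and a final step merges the partials. The sketch below uses threads in one process as a stand-in for nodes; the function names and the sample shards are assumptions for illustration, not StarRocks internals.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List, Tuple

Row = Tuple[str, int]  # (group key, value)


def local_aggregate(shard: List[Row]) -> Counter:
    """Phase 1: sum values per key within a single shard."""
    partial: Counter = Counter()
    for key, value in shard:
        partial[key] += value
    return partial


def global_merge(partials: List[Counter]) -> Dict[str, int]:
    """Phase 2: combine the per-shard partial sums."""
    total: Counter = Counter()
    for partial in partials:
        total.update(partial)  # Counter.update adds counts per key
    return dict(total)


def run_query(shards: List[List[Row]]) -> Dict[str, int]:
    """Run the local phase on all shards in parallel, then merge."""
    with ThreadPoolExecutor(max_workers=max(1, len(shards))) as pool:
        partials = list(pool.map(local_aggregate, shards))
    return global_merge(partials)


# Hypothetical data: three shards, roughly SELECT region, SUM(v) GROUP BY region.
shards = [[("cn", 3), ("us", 1)], [("cn", 2)], [("us", 4), ("jp", 5)]]
result = run_query(shards)
```

Only the small partial results cross the shard boundary, which is what lets the expensive scan and pre-aggregation work run side by side.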
The query flow includes lexical analysis, syntax parsing, logical plan generation, logical plan rewriting, cost‑based optimization (CBO), and physical plan generation. Physical plans consist of operators such as Scan, Local Aggregate, and DataSink, which are instantiated and scheduled on BE nodes.
Other StarRocks Features
StarRocks includes a new cost‑based optimizer that selects optimal execution plans for complex multi‑table joins, intelligent materialized views that support base‑table updates and synchronized view refresh, and robust fault‑tolerance mechanisms.
Application Practice
StarRocks now serves as the unified OLAP engine, replacing Trino, Presto, Druid, Impala, Kudu, Iceberg, and ClickHouse. This consolidation simplifies operations, reduces costs, and improves overall query performance.
Cluster Status
The StarRocks clusters cover all business lines, with dozens of clusters and hundreds of nodes. Daily PV exceeds one million, with P95 query latency in the sub‑second range for internal tables.
New Architecture
StarRocks acts as the single OLAP engine, handling both batch and real‑time workloads. The architecture improves reliability, reduces operational overhead, and boosts query speed.
Foundational Work
To ensure stability, Quark built observable, high‑availability clusters, integrated StarRocks metrics with the internal monitoring system, and implemented automatic node self‑healing. Partition pruning and materialized cache techniques were applied to optimize query performance.
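Partition pruning, in its simplest form, means skipping partitions whose key range cannot satisfy the query predicate. A minimal sketch, assuming daily range partitions with an inclusive start and exclusive end (the `Partition` class and partition names are hypothetical):

```python
from dataclasses import dataclass
from datetime import date, timedelta
from typing import List


@dataclass(frozen=True)
class Partition:
    name: str
    start: date  # inclusive lower bound
    end: date    # exclusive upper bound


def prune(partitions: List[Partition], lo: date, hi: date) -> List[Partition]:
    """Keep only partitions whose [start, end) overlaps the predicate range [lo, hi)."""
    return [p for p in partitions if p.start < hi and lo < p.end]


# Four hypothetical daily partitions: 2024-01-01 .. 2024-01-04.
first_day = date(2024, 1, 1)
partitions = [
    Partition(f"p{(first_day + timedelta(days=i)):%Y%m%d}",
              first_day + timedelta(days=i),
              first_day + timedelta(days=i + 1))
    for i in range(4)
]

# Predicate: dt >= '2024-01-02' AND dt < '2024-01-04'
kept = prune(partitions, date(2024, 1, 2), date(2024, 1, 4))
kept_names = [p.name for p in kept]
```

Here only two of the four partitions need to be scanned; the rest are eliminated before any data is read.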
High Availability – Disaster Recovery
Two independent clusters (StarRocks1 and StarRocks2) were deployed. A unified query service routes dashboard traffic to StarRocks1 and mail service traffic to StarRocks2, achieving physical isolation. During off‑peak hours, idle CN nodes from StarRocks1 are shut down and resources are re‑allocated to StarRocks2 to handle night‑time workloads.
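The routing rule can be sketched as a small lookup. The workload-to-cluster mapping comes from the description above; the fallback to the other cluster when the primary is unhealthy is an assumption added for illustration, as are the `ROUTES` table and `pick_cluster` function.

```python
from typing import Dict

# Workload type -> primary cluster, as described in the text.
ROUTES: Dict[str, str] = {"dashboard": "StarRocks1", "mail": "StarRocks2"}


def pick_cluster(workload: str, healthy: Dict[str, bool]) -> str:
    """Return the cluster that should serve a workload.

    Assumption: if the primary cluster is unhealthy, traffic falls back
    to any other healthy cluster in the routing table.
    """
    primary = ROUTES[workload]
    if healthy.get(primary, False):
        return primary
    for cluster in ROUTES.values():
        if cluster != primary and healthy.get(cluster, False):
            return cluster
    raise RuntimeError(f"no healthy cluster for workload {workload!r}")


normal = pick_cluster("dashboard", {"StarRocks1": True, "StarRocks2": True})
failover = pick_cluster("dashboard", {"StarRocks1": False, "StarRocks2": True})
```

Keeping the mapping in one place means isolation between workloads is a routing decision rather than something each client has to know about.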
Query Performance Optimization
StarRocks performance was tuned by analyzing SQL parsing, logical plan rewriting, and CBO optimization. Specific improvements included constant folding, partition pruning, and adding missing functions such as jodatime_format to reduce execution time.
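Constant folding evaluates sub-expressions made only of literals once at plan time, so they are not recomputed for every row. A minimal sketch over a toy expression tree (the tuple encoding and the `fold` function are illustrative assumptions, not the StarRocks optimizer):

```python
import operator
from typing import Union

# An expression is an int literal, a column name (str), or a tuple (op, left, right).
Expr = Union[int, str, tuple]

OPS = {"+": operator.add, "-": operator.sub, "*": operator.mul}


def fold(expr: Expr) -> Expr:
    """Recursively replace literal-only sub-expressions with their value."""
    if not isinstance(expr, tuple):
        return expr  # literal or column reference: nothing to fold
    op, left, right = expr
    left, right = fold(left), fold(right)
    if isinstance(left, int) and isinstance(right, int):
        return OPS[op](left, right)
    return (op, left, right)


# Roughly: ts + 60 * (60 * 24), where ts is a column.
folded = fold(("+", "ts", ("*", 60, ("*", 60, 24))))
```

The column reference stays symbolic, while the literal arithmetic collapses to a single constant.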
Application Cases
QBI Dashboard
QBI dashboards provide self‑service reporting for the whole company, handling millions of PV and thousands of UV. Migration to StarRocks reduced P95 query latency from 5.7 s to 2.4 s, cutting response time by more than half (about 58%).
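The latency figures can be checked with a short calculation. The sketch below computes the relative reduction from the two P95 values quoted above and shows one common way to compute a P95 (the nearest-rank method); how Quark's monitoring actually computes its percentiles is not stated, so `p95` is an assumption for illustration.

```python
from typing import Sequence


def relative_reduction(before: float, after: float) -> float:
    """Fraction by which a metric dropped, e.g. 0.5 means it halved."""
    return (before - after) / before


def p95(samples: Sequence[float]) -> float:
    """Nearest-rank 95th percentile: the value at rank ceil(0.95 * n)."""
    if not samples:
        raise ValueError("p95 requires at least one sample")
    ordered = sorted(samples)
    rank = (95 * len(ordered) + 99) // 100  # integer ceil(0.95 * n)
    return ordered[rank - 1]


# P95 latencies quoted in the text, in seconds.
reduction = relative_reduction(5.7, 2.4)
```

With 5.7 s before and 2.4 s after, the reduction works out to roughly 0.58, i.e. the new P95 is about 42% of the old one.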
Fun Analysis Migration
Fun Analysis, a flexible multi‑dimensional tool, migrated ~200 projects by employing dual‑write to StarRocks and Kudu, automated routing, and extensive performance testing to ensure consistency and low latency.
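The dual-write and routing steps can be sketched as follows. In-memory dictionaries stand in for Kudu and StarRocks, and the rule "route to the new engine only when a sample of keys matches" is an assumed policy for illustration; all class and function names here are hypothetical.

```python
from typing import Dict, Iterable, List, Optional


class InMemoryStore:
    """Stand-in for a real storage engine, used only for this sketch."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.rows: Dict[str, dict] = {}

    def write(self, key: str, row: dict) -> None:
        self.rows[key] = dict(row)

    def read(self, key: str) -> Optional[dict]:
        return self.rows.get(key)


def dual_write(stores: Iterable[InMemoryStore], key: str, row: dict) -> None:
    """Write the same row to every store so old and new engines stay in step."""
    for store in stores:
        store.write(key, row)


def mismatched_keys(old: InMemoryStore, new: InMemoryStore, keys: Iterable[str]) -> List[str]:
    """Consistency check: keys whose rows differ between the two stores."""
    return [k for k in keys if old.read(k) != new.read(k)]


def choose_engine(old: InMemoryStore, new: InMemoryStore, keys: Iterable[str]) -> str:
    """Assumed routing policy: use the new engine only if the sample is consistent."""
    return new.name if not mismatched_keys(old, new, list(keys)) else old.name


kudu, starrocks = InMemoryStore("Kudu"), InMemoryStore("StarRocks")
dual_write([kudu, starrocks], "u1", {"clicks": 3})
dual_write([kudu, starrocks], "u2", {"clicks": 7})
consistent_choice = choose_engine(kudu, starrocks, ["u1", "u2"])

# Simulate drift: a row that reached only the old store.
kudu.write("u3", {"clicks": 1})
drifted = mismatched_keys(kudu, starrocks, ["u1", "u2", "u3"])
drift_choice = choose_engine(kudu, starrocks, ["u1", "u2", "u3"])
```

Keeping the old store authoritative until the check passes lets each project be switched over, or rolled back, independently.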
StarRocks Enhancements and Community Contributions
Quark contributed to StarRocks by raising Trino compatibility from 90% to 99%, enhancing the optimizer for syntax pruning, adding custom parameters and metrics, and submitting multiple pull requests to the open‑source community.
Future Planning
Quark plans to deploy StarRocks on a self‑managed Kubernetes cluster to leverage robust infrastructure management, and to further exploit materialized view layering and sync updates for real‑time data warehouse scenarios, enhancing query performance and decision‑making support.
Qunar Tech Salon
Qunar Tech Salon is a learning and exchange platform for Qunar engineers and industry peers. We share cutting-edge technology trends and topics, providing a free platform for mid-to-senior technical professionals to exchange and learn.
