StarRocks in Youzu's Multi-Dimensional Analytics: Architecture, Advantages, and Future Plans

This article presents Youzu Network’s adoption of StarRocks for multi-dimensional analytics, detailing the historical OLAP challenges, StarRocks’ features and advantages, its application scenarios, data modeling choices, ingestion methods, performance benchmarks, and future roadmap for unified analytics.

DataFunSummit

Background: Youzu's previous OLAP stack combined Presto, ClickHouse, Spark Streaming/Flink, HBase, and MySQL. Running that many components drove up maintenance cost, exposed users to inconsistent SQL syntax, and performed poorly on queries that return large result sets.

Requirements: The team needed a single OLAP engine with sub-second write latency, millisecond-level query response, good multi-table join performance, high concurrency, and simple operation and use.

Evaluation and Choice: After comparing ClickHouse, Doris, and StarRocks, the team selected StarRocks for its performance, which rests on MPP execution, columnar storage, a vectorized engine, and a cost-based optimizer (CBO).

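StarRocks Advantages: Fast queries, multiple import methods, simple operation, four table models (detail, aggregate, update, and primary-key; the StarRocks documentation calls the detail and update models Duplicate Key and Unique Key), support for external tables, and easy deployment, since a cluster consists only of FE (frontend) and BE (backend) nodes.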

Application Scenarios: One example is real-time parental monitoring of underage players. Flink processes the Kafka streams and writes the results to StarRocks, and offline data later overwrites records that arrived late. The team chose the primary-key model because the rows are updated frequently.
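
To make the update pattern concrete, here is a minimal in-memory sketch of primary-key semantics: a write whose key already exists replaces the stored row, so the later offline batch overwrites what the real-time path wrote. The `PlaySession` fields, the key `(user_id, session_date)`, and the `PrimaryKeyTable` class are hypothetical and only mimic the behavior; they are not Youzu's schema or StarRocks code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PlaySession:
    # Hypothetical columns, chosen only for illustration.
    user_id: int
    session_date: str
    minutes_played: int
    source: str  # "realtime" or "offline"


class PrimaryKeyTable:
    """Toy stand-in for a primary-key table: at most one row per key."""

    def __init__(self):
        self._rows = {}

    def upsert(self, row):
        # A write with an existing key replaces the old row instead of appending.
        self._rows[(row.user_id, row.session_date)] = row

    def get(self, user_id, session_date):
        return self._rows.get((user_id, session_date))

    def __len__(self):
        return len(self._rows)


table = PrimaryKeyTable()
# Real-time path: partial figures are written as events arrive.
table.upsert(PlaySession(1001, "2024-01-01", 30, "realtime"))
table.upsert(PlaySession(1001, "2024-01-01", 45, "realtime"))
# Offline path: a later batch overwrites the row with the complete figure,
# including events that arrived too late for the stream.
table.upsert(PlaySession(1001, "2024-01-01", 52, "offline"))
```

After the three writes the table still holds a single row for that player and day, carrying the offline value.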

Architecture: Flink reads from Kafka, performs lightweight ETL, and writes to both Hive and StarRocks. StarRocks runs the scheduled, minute-level metric calculations and serves reports directly, with no intermediate MySQL store.
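
The "lightweight ETL, then write to two sinks" step can be sketched in plain Python as follows. This is not Flink code: the two sinks are ordinary lists standing in for Hive and StarRocks, and the message fields (`user_id`, `ts`, `game`) are assumptions made for the example.

```python
import json


def light_etl(raw):
    """Parse one raw message and normalize it; return None if it is unusable."""
    try:
        event = json.loads(raw)
    except json.JSONDecodeError:
        return None
    if not isinstance(event, dict) or "user_id" not in event or "ts" not in event:
        return None
    try:
        user_id = int(event["user_id"])
        ts = int(event["ts"])
    except (TypeError, ValueError):
        return None
    return {
        "user_id": user_id,
        "ts": ts,
        # Truncate to the minute so minute-level metrics can group on it.
        "event_minute": ts // 60 * 60,
        "game": str(event.get("game", "unknown")).lower(),
    }


def fan_out(messages, sinks):
    """Apply the ETL once per message and append the result to every sink."""
    written = 0
    for raw in messages:
        row = light_etl(raw)
        if row is None:
            continue  # drop malformed input instead of failing the whole job
        for sink in sinks:
            sink.append(dict(row))  # each sink gets its own copy
        written += 1
    return written


hive_sink, starrocks_sink = [], []
count = fan_out(
    ['{"user_id": "7", "ts": 125, "game": "GoT"}', "not json", '{"ts": 1}'],
    [hive_sink, starrocks_sink],
)
```

Doing the cleanup once and writing the same rows to both sinks keeps the offline and real-time copies consistent, which matters when offline data is later used to correct the real-time table.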

Data Modeling: Because StarRocks handles multi-table joins efficiently, the team moved from wide tables to star and snowflake schemas. Tables are partitioned by time and bucketed by hash to balance storage and query performance.
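
The following sketch illustrates why time partitioning plus hash bucketing helps queries: each row lands in one (partition, bucket) slot, and a query that filters on the date and the bucketing column needs to read only one slot. The bucket count, the column names, and the use of CRC32 are assumptions for the example; StarRocks uses its own internal hash function and storage layout.

```python
import zlib
from collections import defaultdict

NUM_BUCKETS = 4  # hypothetical bucket count


def bucket_of(user_id):
    # Deterministic illustrative hash; not the function StarRocks uses.
    return zlib.crc32(str(user_id).encode()) % NUM_BUCKETS


def build_layout(rows):
    """Group rows by (time partition, hash bucket)."""
    layout = defaultdict(list)
    for row in rows:
        layout[(row["event_date"], bucket_of(row["user_id"]))].append(row)
    return layout


def scan(layout, event_date, user_id=None):
    """Return matching rows and how many (partition, bucket) slots were read."""
    wanted_bucket = bucket_of(user_id) if user_id is not None else None
    hits, slots_read = [], 0
    for (partition, bucket), rows in layout.items():
        if partition != event_date:
            continue  # partition pruning: other days are never touched
        if wanted_bucket is not None and bucket != wanted_bucket:
            continue  # bucket pruning: other buckets cannot hold this user
        slots_read += 1
        hits.extend(r for r in rows if user_id is None or r["user_id"] == user_id)
    return hits, slots_read


rows = [
    {"event_date": d, "user_id": u}
    for d in ("2024-01-01", "2024-01-02")
    for u in range(1, 9)
]
layout = build_layout(rows)
one_user, slots_for_user = scan(layout, "2024-01-01", user_id=3)
whole_day, slots_for_day = scan(layout, "2024-01-01")
```

A lookup for one user on one day reads a single slot, while a full-day scan reads at most `NUM_BUCKETS` slots and never touches other days.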

Reliability: StarRocks provides exactly-once ingestion through label-based Stream Load: each load carries a label, and a label that has already succeeded is not loaded again. Offline data is loaded through Hive external tables, with cache refresh strategies.
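
The idea behind label-based deduplication can be sketched as below. The sender derives the label deterministically from the batch it is sending, so a retry after a timeout reuses the same label and the target rejects it rather than storing the rows twice. `ToyLoadTarget`, `batch_label`, and the Kafka-offset naming scheme are hypothetical; a real Stream Load is an HTTP request that carries the label in a `label` header.

```python
class LabelAlreadyExists(Exception):
    """Raised by the toy target when a label has already been loaded."""


class ToyLoadTarget:
    """In-memory stand-in for a load endpoint that remembers finished labels."""

    def __init__(self):
        self.rows = []
        self._labels = set()

    def stream_load(self, label, batch):
        if label in self._labels:
            raise LabelAlreadyExists(label)
        self.rows.extend(batch)
        self._labels.add(label)


def batch_label(topic, partition, first_offset, last_offset):
    # Same batch -> same label, so a retry is recognizable as a duplicate.
    return f"{topic}_{partition}_{first_offset}_{last_offset}"


def load_with_retry(target, label, batch):
    """Treat 'label already exists' as success: the batch is already stored."""
    try:
        target.stream_load(label, batch)
        return "loaded"
    except LabelAlreadyExists:
        return "skipped"


target = ToyLoadTarget()
label = batch_label("play_events", 0, 100, 101)
batch = [{"user_id": 1}, {"user_id": 2}]
first = load_with_retry(target, label, batch)
# Simulate a retry after a lost acknowledgement: same batch, same label.
second = load_with_retry(target, label, batch)
```

The retry is skipped, so the target ends up with each row exactly once even though the sender attempted the load twice.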

Future Plans: Migrate the remaining real-time workloads to StarRocks, enhance the Data API services, and improve monitoring of slow queries and system performance.
