
How Songguo Mobility Built a Real‑Time OLAP Platform with StarRocks: From 1.0 to 3.0

Songguo Mobility’s data‑center team migrated from a fragmented Impala‑Kudu‑ClickHouse stack to a unified StarRocks‑based real‑time OLAP architecture, iterating through three versions to solve scalability, latency, and maintenance challenges while supporting minute‑level dashboards for orders and vehicle analytics.

StarRocks

Jul 18, 2022

Background

The original real‑time data warehouse extracted change logs from MySQL using Canal, streamed them to Kafka, and persisted them in Kudu. Batch processing was performed by Spark and queries were served by Impala. This architecture suffered from high maintenance cost, complex component interactions, limited update capabilities, poor multi‑table join performance, and insufficient monitoring.

Real‑time Warehouse 1.0

Pipeline: Canal → Kafka → Kudu. Spark jobs read Kudu for periodic batch processing; Impala provided hour‑level dashboards. The design met basic offline analysis needs but required separate Spark jobs for Kudu reads and could not efficiently support fine‑grained metrics.

Real‑time Warehouse 2.0

To address 1.0 pain points, the stack was changed to:

Data capture unchanged: Canal → Kafka.

ETL: Flink Stream for deserialization and routing, Flink SQL for cleansing and layering.

DIM data stored in MySQL and HBase; ODS/DWD layers kept in Kafka.

Final sink: ClickHouse for external queries.

ClickHouse offered multiple table engines, partition pruning, column‑level expiration, and HTTP/JDBC/ODBC interfaces, but still exhibited weak update handling, poor multi‑table join performance, and high component maintenance cost.

Real‑time Warehouse 3.0 – StarRocks Integration

StarRocks became the core OLAP engine, unifying the real‑time path:

Change logs are extracted by Canal and deserialized by Flink Stream. Flink SQL then enriches the data and writes it to StarRocks, where logical views on wide tables provide flexible multi‑dimensional analytics.

StarRocks satisfied the following technical requirements:

Excellent performance for both wide‑table and multi‑table joins.

Full SQL and SQL‑like syntax support.

Efficient batch and real‑time ingestion (Broker Load & Routine Load).

Robust update, expiration, and schema‑change capabilities (UniqueKey replace_if_not_null for partial updates).

High concurrency and strong fault‑tolerance.

Compatibility with MySQL protocol and easy integration with external tools (HTTP, JDBC, ODBC).
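The partial-update behavior named in the list above can be modeled in a few lines of plain Python. This is only a sketch of the replace_if_not_null semantics (a non-NULL incoming value overwrites the stored one, a NULL leaves it untouched), not StarRocks code; the `order_id`, `status`, `amount`, and `paid_at` columns are hypothetical.

```python
from typing import Any, Dict, Iterable, Optional

Row = Dict[str, Optional[Any]]


def replace_if_not_null(current: Row, incoming: Row) -> Row:
    """Merge one incoming row into the stored row that shares its key."""
    merged = dict(current)
    for column, value in incoming.items():
        # A non-NULL value always wins; a NULL only fills a column
        # that has never been written, so existing data is preserved.
        if value is not None or column not in merged:
            merged[column] = value
    return merged


def apply_changes(key_column: str, changes: Iterable[Row]) -> Dict[Any, Row]:
    """Apply change rows in arrival order, keeping one merged row per key."""
    table: Dict[Any, Row] = {}
    for change in changes:
        key = change[key_column]
        table[key] = replace_if_not_null(table.get(key, {}), change)
    return table


# Two change events for the same (hypothetical) order: the second one
# carries only the columns that changed and leaves the rest as NULL.
changes = [
    {"order_id": 1, "status": "created", "amount": 12.5, "paid_at": None},
    {"order_id": 1, "status": "paid", "amount": None, "paid_at": "2022-07-18 10:00:00"},
]
orders = apply_changes("order_id", changes)
```

After both events, the stored row keeps `amount` from the first event and takes `status` and `paid_at` from the second, which is what lets a change stream carry only the columns that actually changed.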

Performance Benchmark

Tests on identical hardware compared StarRocks 1.16 with ClickHouse 20.8:

Single‑table queries on tables with fewer than 10⁹ rows (SELECT *, COUNT, SUM) showed comparable latency.

Multi‑table join queries: StarRocks was markedly faster than ClickHouse, with the gap reaching nearly 2x in later releases.

Primary‑key update scenarios: ClickHouse’s ReplacingMergeTree could not guarantee accuracy, while StarRocks provided transactional updates with high performance.
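One likely reason for the accuracy problem is that ReplacingMergeTree removes superseded rows only during background merges, so a query that runs before the merge can still see several versions of the same key. The sketch below is a plain-Python model of that effect, not ClickHouse code; the parts and the `(order_id, version, amount)` layout are made up for illustration.

```python
from typing import Dict, List, Tuple

# One stored row: (order_id, version, amount). All values are hypothetical.
VersionedRow = Tuple[int, int, float]


def naive_sum(parts: List[List[VersionedRow]]) -> float:
    """Sum amounts across all parts, as a query sees them before a merge."""
    return sum(amount for part in parts for _, _, amount in part)


def deduplicated_sum(parts: List[List[VersionedRow]]) -> float:
    """Sum amounts after keeping only the highest version of each key."""
    latest: Dict[int, Tuple[int, float]] = {}
    for part in parts:
        for order_id, version, amount in part:
            if order_id not in latest or version > latest[order_id][0]:
                latest[order_id] = (version, amount)
    return sum(amount for _, amount in latest.values())


# Order 1 was inserted with amount 10.0 and later updated to 15.0.
# The update landed in a second part that has not been merged yet.
parts = [
    [(1, 1, 10.0), (2, 1, 20.0)],
    [(1, 2, 15.0)],
]
before_merge = naive_sum(parts)        # counts both versions of order 1
after_merge = deduplicated_sum(parts)  # counts only the latest version
```

Here the pre-merge total is 45.0 while the correct total is 35.0, which is the kind of discrepancy a revenue dashboard cannot tolerate.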

Use Cases

Order Analytics

Historical order data are loaded from Hive into StarRocks via Broker Load. Incremental order changes are streamed through Canal → Flink → Routine Load. Logical views enable hour‑, minute‑, and second‑level dashboards for order volume, revenue, fees, and average order value across regions.
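To make the dashboard metrics concrete, here is a minimal Python sketch of what a minute-level view over the order table would compute: order volume, revenue, and average order value per region and minute. The article does not show the actual view definitions, so the input layout `(region, created_at, amount)` and the region name are assumptions.

```python
from collections import defaultdict
from datetime import datetime
from typing import Dict, Iterable, Tuple

Order = Tuple[str, datetime, float]  # (region, created_at, amount)


def minute_level_metrics(orders: Iterable[Order]) -> Dict[Tuple[str, datetime], dict]:
    """Aggregate orders into one bucket per (region, minute)."""
    buckets = defaultdict(lambda: {"orders": 0, "revenue": 0.0})
    for region, created_at, amount in orders:
        # Truncate the timestamp to the start of its minute.
        minute = created_at.replace(second=0, microsecond=0)
        bucket = buckets[(region, minute)]
        bucket["orders"] += 1
        bucket["revenue"] += amount
    # Derive the average order value once the totals are known.
    return {
        key: {**totals, "avg_order_value": totals["revenue"] / totals["orders"]}
        for key, totals in buckets.items()
    }


sample_orders = [
    ("hangzhou", datetime(2022, 7, 18, 10, 0, 5), 2.0),
    ("hangzhou", datetime(2022, 7, 18, 10, 0, 40), 4.0),
    ("hangzhou", datetime(2022, 7, 18, 10, 1, 0), 3.0),
]
metrics = minute_level_metrics(sample_orders)
```

Changing the truncation step (for example, also zeroing `minute`) yields the hour-level variant, which is why a single detail table can back dashboards at several granularities.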

Vehicle Analytics

A wide table aggregates vehicle deployment, status changes, and usage logs. Flink creates the wide table; Hive provides historical snapshots; Canal streams incremental updates. Views support rapid calculation of available, in‑use, and maintenance‑status vehicles at city, region, and national levels.
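The multi-level counts can be illustrated with a short Python sketch that rolls one wide-table snapshot up to city, region, and national level. The field names (`city`, `region`, `status`) and the status values are hypothetical; the real wide table is not described in that much detail.

```python
from collections import Counter
from typing import Dict, Iterable


def status_rollup(vehicles: Iterable[dict]) -> Dict[str, Counter]:
    """Count vehicles per status at city, region, and national level."""
    counts = {"city": Counter(), "region": Counter(), "national": Counter()}
    for vehicle in vehicles:
        status = vehicle["status"]
        counts["city"][(vehicle["city"], status)] += 1
        counts["region"][(vehicle["region"], status)] += 1
        counts["national"][status] += 1
    return counts


# A tiny hypothetical snapshot of the vehicle wide table.
vehicles = [
    {"city": "hangzhou", "region": "east", "status": "available"},
    {"city": "hangzhou", "region": "east", "status": "in_use"},
    {"city": "nanjing", "region": "east", "status": "available"},
    {"city": "chengdu", "region": "west", "status": "maintenance"},
]
rollup = status_rollup(vehicles)
```

Because every level is derived from the same rows, the city totals always add up to the regional and national totals, which simplifies data validation.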

Both scenarios benefit from a single ETL step (Flink) and a centralized query engine, reducing development effort, simplifying data validation, and accelerating feature rollout.

Operations & Monitoring

A community‑edition StarRocks cluster is deployed with VIP‑based front‑end load balancing. Custom monitoring tracks FE/BE health and Routine Load tasks; Grafana dashboards visualize key metrics.
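A Routine Load check of this kind might look like the sketch below. It assumes the monitoring job has already run `SHOW ROUTINE LOAD` over the MySQL protocol and holds the result as a list of dictionaries with `Name` and `State` fields; how the team's own monitoring is implemented is not described, and the job names are invented.

```python
from typing import Dict, Iterable, List

# States in which a job has stopped consuming without being asked to.
# STOPPED is left out on the assumption that it reflects a deliberate,
# operator-initiated shutdown rather than a failure.
ALERT_STATES = {"PAUSED", "CANCELLED"}


def jobs_needing_attention(rows: Iterable[Dict[str, str]]) -> List[str]:
    """Return the names of Routine Load jobs that should raise an alert."""
    return sorted(row["Name"] for row in rows if row["State"] in ALERT_STATES)


# Hypothetical result rows from SHOW ROUTINE LOAD.
routine_load_rows = [
    {"Name": "orders_job", "State": "RUNNING"},
    {"Name": "vehicle_status_job", "State": "PAUSED"},
    {"Name": "legacy_job", "State": "STOPPED"},
    {"Name": "fees_job", "State": "CANCELLED"},
]
alerts = jobs_needing_attention(routine_load_rows)
```

The resulting list can be pushed to whatever alerting channel is in use, or exported as a gauge for the Grafana dashboards.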

Remaining Challenges

String column length limits restrict very large text fields.

Materialized views cannot handle complex aggregation conditions.

Log format makes error analysis cumbersome.

Dynamic partitioning supports only day/week/month granularity (no yearly partitions).

Future Plans

Migrate additional offline workloads from Hive/Presto to StarRocks.

Consolidate all ClickHouse tasks into StarRocks.

Expand multi‑dimensional analysis and optimize table designs (partition by time, bucket by key columns).

Improve materialized view usage and automate schema changes.

Enhance monitoring and integrate StarRocks more tightly with internal platform tools.

Explore real‑time tagging capabilities within StarRocks.

Table Design Highlights

All large tables are partitioned by a time column and bucketed by the columns most often used in filters and joins.

Detail tables have column‑level expiration to control storage growth.

Updates use UniqueKey with replace_if_not_null; future PrimaryKey partial updates are under evaluation.

Routine Load interval is set to 10‑15 seconds to reduce backend merge frequency.
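As an illustration of the last point, the following Python helper renders a `CREATE ROUTINE LOAD` statement with the batch interval constrained to the 10 to 15 second range mentioned above. It assumes the "interval" corresponds to the `max_batch_interval` job property and that the Kafka messages are JSON; the job, table, topic, broker, and column names are hypothetical, and the statement has not been validated against a specific StarRocks release.

```python
from typing import Sequence


def build_routine_load(
    job: str,
    table: str,
    topic: str,
    brokers: str,
    columns: Sequence[str],
    max_batch_interval: int = 10,
) -> str:
    """Render a CREATE ROUTINE LOAD statement for a Kafka topic."""
    # Enforce the 10-15 second window used in this article's setup.
    if not 10 <= max_batch_interval <= 15:
        raise ValueError("max_batch_interval must be between 10 and 15 seconds")
    column_list = ", ".join(columns)
    return (
        f"CREATE ROUTINE LOAD {job} ON {table}\n"
        f"COLUMNS ({column_list})\n"
        "PROPERTIES (\n"
        '    "format" = "json",\n'
        f'    "max_batch_interval" = "{max_batch_interval}"\n'
        ")\n"
        "FROM KAFKA (\n"
        f'    "kafka_broker_list" = "{brokers}",\n'
        f'    "kafka_topic" = "{topic}"\n'
        ");"
    )


statement = build_routine_load(
    job="ods.orders_job",
    table="orders",
    topic="dwd_orders",
    brokers="kafka-1:9092",
    columns=["order_id", "status", "amount"],
    max_batch_interval=10,
)
```

A longer interval means fewer, larger batches per job, so the backends have fewer small data versions to merge, at the cost of slightly higher end-to-end latency.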


Written by

StarRocks

StarRocks is an open‑source project under the Linux Foundation, focused on building a high‑performance, scalable analytical database that enables enterprises to create an efficient, unified lake‑house paradigm. It is widely used across many industries worldwide, helping numerous companies enhance their data analytics capabilities.
