
How Autohome Achieved Sub‑Second Real‑Time Analytics with StarRocks

Autohome replaced its Flink‑based metric aggregation and Apache Kylin with StarRocks to power sub‑second real‑time OLAP analytics. This article covers the data sources, the pain points of the previous stack, benchmark comparisons against Apache Kylin, ClickHouse, Apache Doris, Presto, and Spark, integration through flink-connector-starrocks and a broker‑load script, the monitoring setup, and lessons learned from large‑scale deployments.

StarRocks

Nov 26, 2021

Real‑time Data Sources

Mobile client behavior logs

Server‑side application logs

Relational databases (MySQL, SQL Server)

Pain Points of Existing Stack

Flink‑based metric aggregation is inflexible: every change in requirements incurs a high development cost.

Apache Kylin provides fast pre‑aggregated queries but cannot efficiently drill down to detailed data.

TiDB lacks pre‑aggregation models, leading to unstable query performance under heavy online aggregation.

Why StarRocks Was Chosen

Supports both detail (row‑level) and pre‑aggregated data models, so one engine can serve flexible query scenarios.

MySQL‑compatible protocol allows use of existing MySQL clients and reduces operational overhead.

Fully vectorized engine delivers high query performance and concurrency.

Simplified architecture eases deployment and maintenance.
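
To make the difference between the detail and pre‑aggregated models concrete, here is a minimal sketch that builds the two kinds of `CREATE TABLE` statements. The table names, columns, and bucket count are hypothetical and are not taken from Autohome's schema. The helper only produces SQL text, which could then be sent through any MySQL client because StarRocks speaks the MySQL protocol.

```python
def create_table_sql(table, key_model, key_cols, value_cols, dist_col, buckets=8):
    """Build a StarRocks CREATE TABLE statement (illustrative only).

    key_model:  "DUPLICATE" keeps every detail row as loaded;
                "AGGREGATE" merges rows that share the same key columns.
    key_cols:   list of (name, type) pairs.
    value_cols: list of (name, type, agg_function) triples; the aggregate
                function is emitted only for the AGGREGATE model.
    """
    if key_model not in ("DUPLICATE", "AGGREGATE"):
        raise ValueError(f"unsupported key model: {key_model}")

    col_lines = [f"  {name} {ctype}" for name, ctype in key_cols]
    for name, ctype, agg in value_cols:
        suffix = f" {agg}" if key_model == "AGGREGATE" else ""
        col_lines.append(f"  {name} {ctype}{suffix}")

    key_list = ", ".join(name for name, _ in key_cols)
    return "\n".join([
        f"CREATE TABLE {table} (",
        ",\n".join(col_lines),
        ")",
        f"{key_model} KEY({key_list})",
        f"DISTRIBUTED BY HASH({dist_col}) BUCKETS {buckets}",
    ])


# Hypothetical detail table: one row per event, suitable for drill-down.
detail_sql = create_table_sql(
    "search_events_detail", "DUPLICATE",
    key_cols=[("event_time", "DATETIME"), ("page", "VARCHAR(64)")],
    value_cols=[("user_id", "BIGINT", None)],
    dist_col="event_time",
)

# Hypothetical pre-aggregated table: per-minute page views and an HLL sketch for UV.
agg_sql = create_table_sql(
    "search_events_agg", "AGGREGATE",
    key_cols=[("event_minute", "DATETIME"), ("page", "VARCHAR(64)")],
    value_cols=[("pv", "BIGINT", "SUM"), ("uv", "HLL", "HLL_UNION")],
    dist_col="page",
)

print(detail_sql)
print(agg_sql)
```

The detail table answers drill‑down queries, while the aggregate table keeps per‑minute rollups small. Having both in one engine is the flexibility this section describes.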

Benchmark Comparisons

vs. Apache Kylin: On a 600 M‑row dataset, StarRocks matches Kylin’s query speed when queries hit a materialized view, and is faster for some queries.

vs. ClickHouse: With 1.2 B rows on four servers, StarRocks and ClickHouse perform similarly on COUNT queries; following the optimisations in version 1.18, StarRocks is 3–4× faster on approximate distinct (HLL) queries.

vs. Apache Doris: On the same 600 M‑row benchmark, StarRocks is 2–7× faster thanks to its vectorized engine.

vs. Presto and Spark (Hive external tables): On a 1 B‑row dataset across eight servers, StarRocks delivers the best performance for both COUNT and COUNT‑DISTINCT queries, with Presto ahead of Spark.

Storage media: SSD clusters are 3–8× faster than HDD clusters when the page cache is cold; performance converges once the page cache is warm.

Integration Practices

Real‑time platform: Integrated via flink-connector-starrocks, so existing StarRocks tables can be mapped as Flink tables with native Flink DDL.

Offline platform: A broker‑load script imports Hive data into StarRocks, with built‑in progress monitoring and automatic retry on failure.

Monitoring: Prometheus and Grafana collect StarRocks metrics, and audit logs are parsed to analyse query performance and success rates. Custom fixes were added to handle the non‑standard format of the FE metrics.
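
The source does not show the broker‑load script itself, so the following is only a sketch of how "submit, monitor progress, retry on failure" might look. The `run_sql` callable is a hypothetical stand‑in for whatever MySQL‑protocol client executes statements and returns rows as dictionaries. The ORC file format, label scheme, attempt count, and polling interval are likewise assumptions.

```python
import time


def broker_load_sql(db, label, hdfs_path, table, broker_name):
    """Build a Broker Load statement for files exported by Hive (ORC assumed)."""
    return (
        f"LOAD LABEL {db}.{label}\n"
        "(\n"
        f'    DATA INFILE("{hdfs_path}")\n'
        f"    INTO TABLE {table}\n"
        '    FORMAT AS "orc"\n'
        ")\n"
        f'WITH BROKER "{broker_name}"'
    )


def load_with_retry(run_sql, db, base_label, hdfs_path, table, broker_name,
                    max_attempts=3, max_polls=360, poll_seconds=10,
                    sleep=time.sleep):
    """Submit a broker load, poll its state, and resubmit if it is cancelled.

    run_sql(sql) is assumed to execute one statement and return a list of
    dict rows. Returns the label of the load job that finished.
    """
    for attempt in range(1, max_attempts + 1):
        # Use a fresh label per attempt so attempts are easy to tell apart.
        label = f"{base_label}_{attempt}"
        run_sql(broker_load_sql(db, label, hdfs_path, table, broker_name))

        for _ in range(max_polls):
            rows = run_sql(f'SHOW LOAD FROM {db} WHERE LABEL = "{label}"')
            state = rows[-1]["State"] if rows else "PENDING"
            if state == "FINISHED":
                return label
            if state == "CANCELLED":
                break  # this attempt failed; fall through to the next one
            sleep(poll_seconds)
        else:
            raise TimeoutError(f"load {label} did not reach a final state")

    raise RuntimeError(f"load failed after {max_attempts} attempts")
```

Injecting `run_sql` and `sleep` keeps the retry logic testable without a cluster. In production the same function would be handed a real client and the default `time.sleep`.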

Application Cases

Recommendation Service Real‑time Monitoring

Aggregates per‑minute, method‑level metrics from multiple subsystems, writing about 200 M rows to StarRocks each day. Query latency at TP95 is about 1 s in the AutoBI dashboard.
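
For readers unfamiliar with the TP95 figure quoted above, it is the latency that 95% of requests stay at or below. A minimal sketch using the nearest‑rank method follows. The sample latencies are invented for illustration and are not Autohome's measurements.

```python
def percentile_nearest_rank(values, pct):
    """Return the pct-th percentile (pct is an integer 1..100) by nearest rank."""
    if not values:
        raise ValueError("values must not be empty")
    if not 1 <= pct <= 100:
        raise ValueError("pct must be between 1 and 100")
    ordered = sorted(values)
    # Integer form of ceil(pct / 100 * n), which avoids floating-point rounding.
    rank = (pct * len(ordered) + 99) // 100
    return ordered[rank - 1]


# Invented query latencies in seconds, e.g. as extracted from an audit log.
latencies = [0.21, 0.35, 0.18, 0.92, 0.40, 0.27, 1.10, 0.33, 0.25, 0.30]
tp95 = percentile_nearest_rank(latencies, 95)
print(f"TP95 = {tp95} s")
```

A dashboard that reports TP95 rather than the mean is therefore describing the slow tail of queries, which is usually what users notice.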

Search Real‑time Effect

Computes exposure, click‑through rate, no‑result rate, and multiple UV metrics (using HLL) on billions of rows per day. Materialized views accelerate queries; upgrading from StarRocks 1.17 to 1.18 reduced response time from >10 s to <4 s.
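
HLL fits pre‑aggregated UV metrics because sketches built for small time buckets can be merged later without recounting raw rows. The toy HyperLogLog below illustrates that property. It is not StarRocks's implementation, and the class name, the precision `p = 12`, and the SHA‑1 hash are assumptions chosen for the sketch.

```python
import hashlib
import math


class HyperLogLog:
    """Toy HyperLogLog with 2**p registers (illustrative, not production code)."""

    def __init__(self, p=12):
        self.p = p
        self.m = 1 << p
        self.registers = [0] * self.m

    def add(self, item):
        # Take a 64-bit hash: the top p bits pick a register, the rest give the rank.
        digest = hashlib.sha1(str(item).encode()).digest()
        h = int.from_bytes(digest[:8], "big")
        idx = h >> (64 - self.p)
        rest = h & ((1 << (64 - self.p)) - 1)
        rank = (64 - self.p) - rest.bit_length() + 1  # leading zeros + 1
        self.registers[idx] = max(self.registers[idx], rank)

    def merge(self, other):
        # The union of two sketches is the register-wise maximum.
        if self.p != other.p:
            raise ValueError("sketches must use the same precision")
        merged = HyperLogLog(self.p)
        merged.registers = [max(a, b) for a, b in zip(self.registers, other.registers)]
        return merged

    def count(self):
        alpha = 0.7213 / (1 + 1.079 / self.m)
        raw = alpha * self.m ** 2 / sum(2.0 ** -r for r in self.registers)
        zeros = self.registers.count(0)
        if raw <= 2.5 * self.m and zeros:
            # Small-range correction: fall back to linear counting.
            return self.m * math.log(self.m / zeros)
        return raw


# Two hypothetical time buckets with overlapping visitors (10,000 distinct in total).
bucket_a, bucket_b = HyperLogLog(), HyperLogLog()
for user_id in range(0, 6000):
    bucket_a.add(user_id)
for user_id in range(4000, 10000):
    bucket_b.add(user_id)

union = bucket_a.merge(bucket_b)
print(f"estimated UV across both buckets: {union.count():.0f}")
```

Because merging only compares fixed‑size register arrays, a materialized view can store one sketch per key and combine them at query time, which is consistent with the speedups reported for HLL queries here.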

Hardware Impact

On a 600 M‑row dataset with two machines, SSD clusters are 3–8× faster than HDD clusters when queries miss the page cache; performance becomes comparable once the page cache is warm.

Summary

StarRocks unifies detailed and pre‑aggregated query models, enabling a single OLAP engine for both real‑time and batch workloads.

In the benchmarks, StarRocks matches Apache Kylin on materialized‑view hits and ClickHouse on COUNT queries, and outperforms ClickHouse on HLL queries as well as Apache Doris, Presto, and Spark.

MySQL compatibility simplifies client usage and operational management.

Integration with Flink (real‑time) and broker‑load (offline) provides end‑to‑end data pipelines.

Monitoring via Prometheus/Grafana and audit‑log analysis ensures visibility into query health.


