Accelerating Zepp Health’s Analytics with StarRocks: An OLAP Case Study
Limited to point lookups and slow queries on HBase, Zepp Health redesigned its event‑tracking data pipeline: data now flows through Kafka, Flink, and Hudi into a StarRocks‑based OLAP layer. The result is an average query latency of about 100 ms, roughly 20 % storage savings, and much faster multi‑dimensional analytics.
Background
Zepp Health (formerly Huami) collects large volumes of event‑tracking data from wearable devices. The platform requires fast, flexible multi‑dimensional metric calculations (e.g., PV, UV) across dimensions such as time, event name, city, and device attributes.
Original Architecture and Pain Points
Data was stored in HBase as key‑value pairs, which supports only point lookups and offers no native aggregation or statistical functions.
Bitmap techniques could not be used, so metrics had to be pre‑computed in Spark/Hive, preventing ad‑hoc set operations.
Long processing chain with Spark + Hive + HBase increased maintenance cost and lacked model abstraction, making business upgrades cumbersome.
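A minimal sketch of why pre‑computed counts block ad‑hoc set operations. The city names and user IDs are invented for illustration, and Python sets stand in for the bitmaps the old pipeline could not use: once only per‑city UV numbers are stored, they cannot be combined into a correct cross‑city UV.

```python
# Hypothetical sample data: users who triggered an event, grouped by city.
visitors_by_city = {
    "Beijing": {"u1", "u2", "u3"},
    "Shanghai": {"u2", "u3", "u4"},
}

# What a pre-computed pipeline keeps: a single UV number per city.
precomputed_uv = {city: len(users) for city, users in visitors_by_city.items()}

# Summing stored counts double-counts users who appear in both cities.
naive_total = sum(precomputed_uv.values())

# With the underlying ID sets available, a union yields the true distinct count.
true_total = len(set().union(*visitors_by_city.values()))

print(naive_total, true_total)
```

The gap between the two totals is the reason each new dimension combination previously needed its own pre‑computation job.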
Evaluation and Selection of StarRocks
Several OLAP engines (Impala, Druid, ClickHouse, StarRocks) were benchmarked. StarRocks was chosen for its high read/write throughput, MySQL‑compatible protocol, active community, and built‑in vectorized execution, which together satisfied the real‑time analytics requirements.
New Architecture Design
The revised pipeline consists of:
Raw event data is ingested through a gateway into Kafka.
Flink reads from Kafka and writes to Apache Hudi using two table types:
Upsert table: handles streams where deduplication is required.
Append table: handles high‑throughput streams that are not deduplication‑sensitive.
Periodically, the Hudi tables are materialized into a DWD (Data Warehouse Detail) layer.
DWD data is transformed into a DWS (Data Warehouse Summary) layer and loaded into StarRocks via Broker Load, which writes rows that contain Bitmap columns for set‑based metrics.
Analysts query the unified analytics platform directly against StarRocks.
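The difference between the Upsert and Append tables can be illustrated with an in‑memory analogy. This is not Hudi's API: the function names, the `event_id` key, and the "last write wins" rule are assumptions made for the sketch (Hudi itself resolves duplicates by record key and a configured ordering field).

```python
from typing import Dict, List

Event = Dict[str, object]

def upsert_write(table: Dict[object, Event], batch: List[Event], key: str = "event_id") -> None:
    """Keep one row per key; a later record replaces an earlier one."""
    for record in batch:
        table[record[key]] = record

def append_write(table: List[Event], batch: List[Event]) -> None:
    """Store every record as-is, with no key lookup and no deduplication."""
    table.extend(batch)

# Hypothetical batch in which event e1 is delivered twice.
batch = [
    {"event_id": "e1", "user": "u1", "ts": 1},
    {"event_id": "e1", "user": "u1", "ts": 2},  # redelivered duplicate
    {"event_id": "e2", "user": "u2", "ts": 3},
]

upsert_table: Dict[object, Event] = {}
append_table: List[Event] = []
upsert_write(upsert_table, batch)
append_write(append_table, batch)

print(len(upsert_table), len(append_table))
```

The upsert path pays for a key lookup on every record, which is why the append path suits high‑throughput streams where occasional duplicates are acceptable.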
Data Flow Details
Convert raw JSON/Proto payloads to a unified schema.
Write the converted records to Hudi:
Use Upsert for streams that may contain duplicate keys (e.g., user‑level events).
Use Append for high‑volume, append‑only streams (e.g., raw sensor logs).
Export the Hudi tables to a StarRocks aggregation model. The model defines Bitmap columns (e.g., bitmap_pv, bitmap_uv) that store distinct identifiers.
Apply custom set operations on the Bitmap fields inside StarRocks to compute metrics such as total page views (PV) and unique visitors (UV) without pre‑aggregation.
End users run self‑service SQL queries on StarRocks to retrieve real‑time analytics results.
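The bitmap‑based flow above can be sketched in a few lines. Python sets stand in for StarRocks Bitmap columns, and the rows, dimension names, and helper functions are hypothetical. In StarRocks the equivalent work would be done in SQL with its bitmap functions (for example `BITMAP_UNION_COUNT` and `BITMAP_AND`), not with this code.

```python
from collections import defaultdict

# Hypothetical detail rows: (date, event_name, city, user_id).
rows = [
    ("2022-01-01", "app_open", "Beijing", 1),
    ("2022-01-01", "app_open", "Beijing", 1),
    ("2022-01-01", "app_open", "Shanghai", 2),
    ("2022-01-01", "sync", "Beijing", 1),
    ("2022-01-02", "app_open", "Beijing", 3),
    ("2022-01-02", "sync", "Shanghai", 2),
]

# Load step: rows sharing a dimension key are merged by unioning their ID sets,
# which mimics an aggregation model with a bitmap-union column.
table = defaultdict(set)
for dt, event, city, user_id in rows:
    table[(dt, event, city)].add(user_id)

def ids_where(predicate):
    """Union the ID sets of every dimension key accepted by the predicate."""
    result = set()
    for key, ids in table.items():
        if predicate(key):
            result |= ids
    return result

# UV for any dimension combination is computed at query time.
uv_app_open = len(ids_where(lambda k: k[1] == "app_open"))
uv_beijing = len(ids_where(lambda k: k[2] == "Beijing"))

# A set operation across metrics: users who both opened the app and synced.
both = ids_where(lambda k: k[1] == "app_open") & ids_where(lambda k: k[1] == "sync")

print(uv_app_open, uv_beijing, sorted(both))
```

Because the stored value is a set of identifiers and not a final number, any filter or grouping chosen at query time still produces a correct distinct count.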
Performance Metrics
StarRocks monitoring shows an average query latency of ~100 ms and a P99 latency of ~250 ms for the typical analytics workload, meeting the platform’s real‑time requirements.
Benefits of the Migration
Efficiency: Query response time dropped from minutes to seconds, including for many large queries.
Flexibility: Arbitrary multi‑dimensional and time‑range metric combinations are supported without pre‑computing redundant statistics.
Storage Savings: StarRocks’ columnar compression reduces storage cost by ~20 % for the same workload.
Simplicity: Operational overhead is lower than with ClickHouse, thanks to native MySQL compatibility and integrated monitoring.
Convenience: Point‑lookup metrics now return in milliseconds, improving the self‑service experience.
Community Contributions
Fixed a bug in StarRocks materialized view creation.
Added support for additional object‑storage types during data import.
Extended configuration options for specialized import scenarios.
Future Work
Current limitation: Bitmap fields cannot be directly imported from Parquet files in heterogeneous storage environments. Zepp Health is collaborating with the StarRocks community to resolve this issue (see https://github.com/StarRocks/starrocks/issues/3279).
Further business units will be integrated into the OLAP platform, extending StarRocks’ role in supporting smart‑wearable health services.
