
Xiaomi’s OLAP Practice with Apache Doris: System Selection, Architecture, and User Behavior Analytics

This article details Xiaomi Group’s adoption of Apache Doris for OLAP, covering the evolution of their system selection, the architecture of their data ecosystem, practical implementations for user behavior analysis, and future plans to enhance performance, stability, and scalability.

DataFunTalk

Xiaomi Group’s OLAP system has evolved from early use of Kylin to a SparkSQL+Kudu+HDFS architecture, and finally to Apache Doris after evaluating ClickHouse. The initial Spark-based solution suffered from high operational cost and limited performance, prompting the search for a more efficient platform.

Xiaomi first introduced Apache Doris in 2019, running a non‑vectorized version, and later upgraded to the vectorized version (1.1.2), which delivered 1‑3× performance improvements. Xiaomi selected Doris for its integrated storage‑compute engine, strong SQL compatibility, simple operations, robust distributed support, and active open‑source community.

Today, Doris powers dozens of core business applications within Xiaomi, including user behavior analysis, A/B testing, user profiling, smart manufacturing, and advertising. The largest Doris cluster comprises 99 BE (backend) nodes and 3 FE (frontend) nodes; it ingests up to 12 billion rows and serves more than 20,000 queries per day.

Within Xiaomi’s BI platform, Doris serves as the primary data source for large‑scale, low‑latency analytics, complementing MySQL, Hive, and Iceberg. The platform offers semantic modeling, automatic query acceleration via materialized views, and unified SQL entry through Apache Kyuubi, while still supporting direct JDBC connections to Doris.

The internal “Data Factory” (数据工场) abstracts storage and compute engines (Doris, Iceberg, Hive, MySQL, Kudu, Spark, Flink, Presto) and provides unified metadata management, permission proxying, job scheduling, and data governance. Metadata views follow the pattern doris. . . . , and permission is handled via user spaces with role‑based access.

For user behavior analysis, Xiaomi built a platform that models events with five dimensions (Who, When, Where, How, What) and supports event, retention, funnel, path, and distribution analyses. Custom aggregation functions were developed on Doris, such as retention_info() and retention_count() for retention analysis, funnel_info() and funnel_count() for funnel analysis, and session_del() and session_count() for path analysis.
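To make the retention analysis concrete, here is a minimal Python sketch of the kind of computation such functions perform. It is not Xiaomi's implementation: the article does not give the signatures or internals of retention_info() and retention_count(), so the event layout, the function name retention_counts, and the sample data below are all hypothetical.

```python
from collections import defaultdict
from datetime import date

# Hypothetical event records: (user_id, event_name, event_date).
EVENTS = [
    ("u1", "app_open", date(2023, 1, 1)),
    ("u2", "app_open", date(2023, 1, 1)),
    ("u3", "app_open", date(2023, 1, 1)),
    ("u1", "purchase", date(2023, 1, 2)),
    ("u2", "purchase", date(2023, 1, 3)),
    ("u1", "purchase", date(2023, 1, 3)),
]


def retention_counts(events, first_event, return_event, start_day, num_days):
    """Return (cohort size, {day offset: retained users}).

    The cohort is every user who performed `first_event` on `start_day`.
    A user is retained on day N if they performed `return_event`
    exactly N days after `start_day`.
    """
    cohort = {u for u, e, d in events if e == first_event and d == start_day}
    returned = defaultdict(set)
    for user, event, day in events:
        offset = (day - start_day).days
        if user in cohort and event == return_event and 1 <= offset <= num_days:
            returned[offset].add(user)  # sets deduplicate repeat visits
    by_day = {n: len(returned[n]) for n in range(1, num_days + 1)}
    return len(cohort), by_day


cohort_size, by_day = retention_counts(
    EVENTS, "app_open", "purchase", date(2023, 1, 1), num_days=3
)
# Retention rate per day; guard against an empty cohort.
rates = {n: (c / cohort_size if cohort_size else 0.0) for n, c in by_day.items()}
print(cohort_size, by_day, rates)
```

In a real deployment this logic would run inside the database as an aggregation over the event table, so that only the per-day counts leave the cluster; the sketch only illustrates the cohort-then-offset idea.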

These functions enable users to generate SQL queries that are executed on Doris, with results visualized in the platform. Examples include event‑based user counts, multi‑day retention rates, conversion rates across funnel steps, and session‑based path statistics.
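As an illustration of how conversion rates across funnel steps can be derived, the following Python sketch counts, for each user, how many ordered steps they completed within a time window. It is a simplified stand-in under stated assumptions, not the actual behavior of funnel_info() or funnel_count(), whose semantics the article does not specify; the event layout, step names, window length, and function names are hypothetical.

```python
from collections import Counter, defaultdict

# Hypothetical events: (user_id, event_name, timestamp_in_seconds).
EVENTS = [
    ("u1", "view", 0), ("u1", "add_to_cart", 60), ("u1", "pay", 120),
    ("u2", "view", 0), ("u2", "add_to_cart", 5000),   # second step outside window
    ("u3", "add_to_cart", 10), ("u3", "view", 20),    # steps in the wrong order
    ("u4", "view", 0), ("u4", "add_to_cart", 30),
]


def deepest_step(user_events, steps, window):
    """Number of funnel steps completed in order, all within `window`
    seconds of some occurrence of the first step."""
    ordered = sorted(user_events, key=lambda ev: ev[1])
    best = 0
    for i, (name, start) in enumerate(ordered):
        if name != steps[0]:
            continue
        reached = 1
        for next_name, ts in ordered[i + 1:]:
            if ts - start > window:
                break  # later events are even further from the start
            if reached < len(steps) and next_name == steps[reached]:
                reached += 1
        best = max(best, reached)
    return best


def funnel_counts(events, steps, window):
    """Return a list where entry k is the number of users reaching step k+1."""
    per_user = defaultdict(list)
    for user, name, ts in events:
        per_user[user].append((name, ts))
    depth = Counter(deepest_step(evs, steps, window) for evs in per_user.values())
    return [sum(c for d, c in depth.items() if d >= k)
            for k in range(1, len(steps) + 1)]


STEPS = ["view", "add_to_cart", "pay"]
counts = funnel_counts(EVENTS, STEPS, window=3600)
# Step-to-step conversion rate; 0.0 when the previous step has no users.
conversion = [counts[k] / counts[k - 1] if counts[k - 1] else 0.0
              for k in range(1, len(counts))]
print(counts, conversion)
```

Matching the earliest qualifying event for each step is sufficient here because the steps are strictly ordered: taking an earlier match never rules out a later one.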

Future plans focus on improving Doris’s resource isolation and stability, expanding the deployment of the vectorized version across all clusters, extending Doris to more core Xiaomi services (e.g., smartphones, automotive), and exploring unified SQL gateways to abstract underlying engines.

Tags: Big Data, Data Warehouse, OLAP, Apache Doris, User Analytics
Written by

DataFunTalk

Dedicated to sharing and discussing big data and AI technology applications, aiming to empower a million data scientists. Regularly hosts live tech talks and curates articles on big data, recommendation/search algorithms, advertising algorithms, NLP, intelligent risk control, autonomous driving, and machine learning/deep learning.
