
Xiaomi Sales Data Warehouse: Architecture, Construction Theory, and Capability Evolution

This article introduces Xiaomi's sales data warehouse practice. It covers the warehouse's development history, positioning, architecture, dimensional modeling, layering theory, and capability building, describes real‑time and batch processing under a Lambda architecture with Iceberg, Flink, and Hologres, and closes with future trends and a Q&A.

DataFunTalk

Introduction – The article presents the practice of Xiaomi's data‑middle‑platform department in building a sales data warehouse, outlining its evolution, positioning, content, role, and scale.

1. Sales Data Warehouse Overview – Describes the warehouse’s development from siloed warehouses before 2019 to a unified platform guided by the ABC (AI, Big data, Cloud) strategy, detailing its data sources (orders, products, stores, after‑sale, logistics, logs) and the dimensions modeled for orders, logistics, and user behavior.

2. Warehouse Construction Theory – Explains business analysis, theme domain definition, fact and dimension table design, dimensional modeling, layer separation (ODS, DWD, DWM, DIM, DM, ADS, TMP), and key modeling principles such as high cohesion, low coupling, public logic sinking, cost‑performance balance, consistency, and data rollback.
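The layer separation and the low‑coupling principle above can be illustrated with a small dependency check. This is a minimal sketch, not Xiaomi's tooling: the upstream‑to‑downstream ordering of the layers, the rule that DIM tables may be read from any layer, the layer‑prefix naming convention, and all table names are assumptions made for illustration. TMP is left out of the ordering, so any table in it is flagged for review.

```python
# Assumed upstream-to-downstream order; the source lists the layers
# but does not state this ordering explicitly.
LAYER_RANK = {"ods": 0, "dwd": 1, "dim": 1, "dwm": 2, "dm": 3, "ads": 4}
SHARED_LAYERS = {"dim"}  # assumption: dimension tables are readable from any layer


def layer_of(table: str) -> str:
    """Return the layer prefix of a table named like 'dwd_order_detail' (hypothetical naming)."""
    return table.split("_", 1)[0].lower()


def find_violations(dependencies: dict[str, list[str]]) -> list[tuple[str, str]]:
    """Return (table, upstream) pairs where a table reads from a higher or unknown layer."""
    violations = []
    for table, upstreams in dependencies.items():
        target = layer_of(table)
        for upstream in upstreams:
            source = layer_of(upstream)
            if source in SHARED_LAYERS:
                continue  # shared dimensions never count as a violation
            if target not in LAYER_RANK or source not in LAYER_RANK:
                violations.append((table, upstream))  # unknown layer: flag for review
            elif LAYER_RANK[source] > LAYER_RANK[target]:
                violations.append((table, upstream))  # reverse dependency
    return violations


# Hypothetical lineage: table -> tables it reads from.
deps = {
    "dwd_order_detail": ["ods_order", "dim_product"],
    "dwm_order_daily": ["dwd_order_detail", "dim_store"],
    "dwd_logistics": ["ads_sales_report"],  # reads from a higher layer
}
violations = find_violations(deps)
print(violations)
```

A check of this kind could run against table lineage before a job is published, so that shared logic is pushed down into lower layers instead of being pulled back from application tables.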

3. Architecture – Xiaomi adopts a Lambda architecture: batch processing with Spark + Hive, stream processing with Flink + Talos, Hologres accelerating queries on the warehouse layers, and integration of offline and real‑time data. The section also discusses challenges such as state expiration in Flink and solutions that use offline streams.
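The general idea of a Lambda architecture, a batch view for closed periods combined with a real‑time view for the current period, can be sketched in a few lines. This is a generic illustration in plain Python rather than the Spark, Flink, or Hologres jobs described in the talk. The cutoff date, the event shape `(day, amount)`, and the function names are all assumptions.

```python
from collections import defaultdict
from datetime import date


def aggregate(events, keep):
    """Sum amounts per day for events whose day passes the `keep` predicate."""
    totals = defaultdict(float)
    for day, amount in events:
        if keep(day):
            totals[day] += amount
    return dict(totals)


def serve(events, cutoff):
    """Merge a batch view (days before cutoff) with a speed view (cutoff onward)."""
    batch = aggregate(events, lambda d: d < cutoff)   # stands in for the offline job
    speed = aggregate(events, lambda d: d >= cutoff)  # stands in for the streaming job
    return {**batch, **speed}  # the two views cover disjoint days


# Hypothetical order events: (order day, order amount).
events = [
    (date(2023, 1, 1), 100.0),
    (date(2023, 1, 1), 50.0),
    (date(2023, 1, 2), 30.0),
]
view = serve(events, cutoff=date(2023, 1, 2))
print(view)
```

Because the two views cover disjoint days, the merge is a plain union. In a real deployment the cutoff would advance each time the batch job finishes a day.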

4. Capability Layer – Highlights unified data architecture, real‑time minute‑level processing on Iceberg, Flink + Talos for second‑level streaming, strict development and quality standards, data security compliance (GDPR, privacy), and the use of a data encyclopedia for metric definitions and governance.

5. Summary and Outlook – Summarizes the achievements of the offline sales warehouse, its extensive usage across the company, and future directions focusing on data value‑creation and real‑time metrics.

Q&A – Provides answers to six questions covering refund handling, permission layers, replacement of Kudu with Hologres, DWD/DWM distinctions, access to lower layers, and storage of dimension metrics.

Tags: Big Data, Flink, Data Warehouse, Hologres, Iceberg, Lambda architecture, sales analytics