Xiaomi Sales Data Warehouse: Architecture, Construction Theory, and Capability Evolution

This article details Xiaomi's sales data warehouse development, covering its history, architecture, dimensional modeling, layer design, streaming‑batch integration, governance, security, and future directions, while also addressing practical Q&A on implementation challenges and best practices.

DataFunSummit

This article introduces Xiaomi's sales data warehouse, outlining its development timeline, positioning, content, role, and scale, and sets the stage for six main discussion points.

1. Sales Data Warehouse Introduction: It describes the evolution from siloed warehouses before 2019 to a unified sales warehouse guided by the ABC strategy, with offline construction completed in 2020, real‑time capabilities added in 2021, and a current architecture handling billions of daily logs across orders, logistics, stores, user behavior, and products.

2. Data Warehouse Construction Theory: The piece explains the process of business analysis, dimensional modeling, and layer design (ODS, DWD, DWM, DIM, DM, ADS, TMP). It emphasizes modeling principles such as high cohesion and low coupling, pushing shared logic down into lower layers, balancing cost and performance, ensuring consistency, and supporting data rollback.
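The layer design can be illustrated with a small dependency check. This is a minimal sketch, not Xiaomi's implementation: the layer names come from the article, but the ordering, the rules for DIM and TMP, the table-name prefix convention, and the table names themselves are all assumptions made for illustration.

```python
# Layer names are from the article; the ranking and rules below are assumptions.
LAYER_RANK = {"ODS": 0, "DWD": 1, "DWM": 2, "DM": 3, "ADS": 4}
KNOWN_LAYERS = set(LAYER_RANK) | {"DIM", "TMP"}


def layer_of(table: str) -> str:
    """Read the layer from a hypothetical '<layer>_<name>' table prefix."""
    layer = table.split("_", 1)[0].upper()
    if layer not in KNOWN_LAYERS:
        raise ValueError(f"unknown layer prefix in table name: {table}")
    return layer


def is_allowed(downstream: str, upstream: str) -> bool:
    """Return True if `downstream` may read from `upstream` under the assumed rules."""
    down, up = layer_of(downstream), layer_of(upstream)
    if up == "TMP":
        # Assumed: only scratch tables may depend on scratch tables.
        return down == "TMP"
    if down == "TMP":
        # Assumed: scratch tables may read from any persistent layer.
        return True
    if up == "DIM":
        # Assumed: shared dimensions are readable from every layer above ODS.
        return down != "ODS"
    if down == "DIM":
        # Assumed: dimensions are built directly from raw ODS data.
        return up == "ODS"
    # Main chain: data flows strictly upward, never sideways or backward.
    return LAYER_RANK[up] < LAYER_RANK[down]


print(is_allowed("dwd_order_detail", "ods_order_log"))  # forward dependency
print(is_allowed("ods_order_log", "dwd_order_detail"))  # reverse dependency
```

A check like this could run in a scheduler or code review hook so that reverse dependencies are rejected before a job is deployed.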

3. Sales Data Warehouse Architecture: Xiaomi adopts a Lambda architecture, using Spark + Hive for batch processing and Flink + Talos for streaming, with Hologres accelerating DW and DM layers. It discusses state expiration issues in real‑time streams, the hybrid offline‑online solution, and the transition to an Iceberg‑based unified batch‑stream approach.
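The serving side of a Lambda architecture can be sketched in a few lines. The example below is a simplified illustration under stated assumptions: the function name, the daily-sales views, and the cutoff rule are hypothetical and are not taken from the article. It shows the general idea that the batch view is authoritative up to the last completed batch run, and the streaming view fills in only the days after it.

```python
from datetime import date


def serve_daily_sales(batch_view, speed_view, batch_cutoff):
    """Merge a batch view and a streaming (speed) view of daily sales.

    Days up to and including `batch_cutoff` come from the batch view;
    later days come from the speed view. All names here are hypothetical.
    """
    merged = {day: total for day, total in batch_view.items() if day <= batch_cutoff}
    for day, total in speed_view.items():
        if day > batch_cutoff:
            merged[day] = total
    return merged


batch_view = {date(2021, 6, 1): 100, date(2021, 6, 2): 200}
# The speed view overlaps the batch view on June 2 with a less accurate figure.
speed_view = {date(2021, 6, 2): 190, date(2021, 6, 3): 50}

result = serve_daily_sales(batch_view, speed_view, batch_cutoff=date(2021, 6, 2))
print(result)
```

Where the two views overlap, the batch figure wins, which is what lets the offline path correct any drift in the real-time path.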

4. Capability Layer: The article highlights a unified data architecture, minute‑level near‑real‑time processing with Iceberg, security governance (compliance, data classification, least‑privilege access), and metric management via a data encyclopedia that links to corporate dashboards.
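Classification combined with least-privilege access can be sketched as a default-deny column filter. Everything in this example is an assumption for illustration: the three sensitivity levels, the table and column names, and the grant format are hypothetical, and the article does not describe how Xiaomi's permission system is implemented.

```python
from enum import IntEnum


class Level(IntEnum):
    """Hypothetical sensitivity levels; higher means more restricted."""
    PUBLIC = 1
    INTERNAL = 2
    SENSITIVE = 3


# Hypothetical column classification for one table.
COLUMN_LEVELS = {
    "dwd_order_detail": {
        "order_id": Level.INTERNAL,
        "amount": Level.INTERNAL,
        "phone": Level.SENSITIVE,
        "channel": Level.PUBLIC,
    },
}


def readable_columns(grants, table):
    """Return the columns a user may read, given per-table grants.

    `grants` maps a table name to the highest Level the user may see.
    A table with no grant is denied entirely (least privilege).
    """
    max_level = grants.get(table)
    if max_level is None:
        return []
    columns = COLUMN_LEVELS.get(table, {})
    return sorted(name for name, level in columns.items() if level <= max_level)


analyst_grants = {"dwd_order_detail": Level.INTERNAL}
print(readable_columns(analyst_grants, "dwd_order_detail"))
print(readable_columns({}, "dwd_order_detail"))
```

The design choice worth noting is that access is opt-in: a missing grant yields no columns, rather than falling back to some default visibility.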

5. Summary and Outlook: It reflects on the successful construction of the offline sales warehouse and its widespread internal adoption, and outlines future directions toward creating more value from data and serving metrics in real time.

6. Q&A Session: Answers cover refund handling via offline correction, data permission layers, replacement of Kudu with Hologres, distinctions between DWD/DWM layers, access permissions for various layers, and storage of dimension metrics.
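The refund answer connects to the state expiration issue mentioned under the architecture section. The simulation below is a minimal sketch under assumptions: the event format, the TTL rule, and both function names are hypothetical, and plain Python stands in for the streaming engine. It shows why a refund that arrives after the order's state has expired is missed by the real-time path, and how an offline recomputation over the full history corrects the figure.

```python
def realtime_net_sales(events, ttl):
    """Streaming-style net sales with per-order state that expires after `ttl`."""
    state = {}  # order_id -> (amount, timestamp of the order event)
    total = 0
    for ts, kind, order_id, amount in events:
        # Evict state older than the TTL, as a streaming engine would.
        state = {k: v for k, v in state.items() if ts - v[1] <= ttl}
        if kind == "order":
            state[order_id] = (amount, ts)
            total += amount
        elif kind == "refund" and order_id in state:
            total -= state[order_id][0]
            del state[order_id]
        # A refund whose order state has expired cannot be matched and is missed.
    return total


def batch_net_sales(events):
    """Offline recomputation over the full event history; no state expires."""
    orders = {oid: amt for _, kind, oid, amt in events if kind == "order"}
    refunded = {oid for _, kind, oid, _ in events if kind == "refund"}
    return sum(amt for oid, amt in orders.items() if oid not in refunded)


# (timestamp, kind, order_id, amount); the refund for order A arrives very late.
events = [
    (0, "order", "A", 100),
    (1, "order", "B", 50),
    (2, "refund", "B", 0),
    (100, "refund", "A", 0),
]

streaming_total = realtime_net_sales(events, ttl=10)
corrected_total = batch_net_sales(events)
print(streaming_total, corrected_total)
```

In this run the streaming total still counts order A because its state expired before the refund arrived, while the offline pass nets it out; overwriting the real-time figure with the offline one is the correction step.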

Big Data, Flink, Streaming, Data Warehouse, Spark, Iceberg, Lambda architecture
Written by

DataFunSummit

Official account of the DataFun community, dedicated to sharing big data and AI industry summit news and speaker talks, with regular downloadable resource packs.
