AutoHome Data Warehouse Architecture and Layered Model Design
This article describes AutoHome's data warehouse architecture: its background and business pain points, its layered model design (RDM, ADM, GDM, FDM, Stage/BDM, DIM, TMP), its advantages in performance, cost, efficiency, and quality, and its application scenarios, including BI, analytics, and decision support.
AutoHome, the world's most visited automotive website, has accumulated extensive industry experience and massive volumes of business data over fifteen years. As the internet has matured, user demands have become more diverse and business scenarios more complex. This prompted a strategic shift toward data-centric operations so that the company can make better use of its data and serve business needs.
The company faces several pain points: massive log data cannot be analyzed effectively; business staff struggle to obtain data for operational support; the same business metric yields different figures depending on who calculates it; integrating data across departments and businesses is difficult; senior leadership needs more precise, multi-granularity data for decision making; and demand for real-time data presentation keeps growing.
To address these issues, AutoHome built an enterprise-level data warehouse on open-source big data technologies. The warehouse unifies offline (batch) and real-time computation, integrates data from source systems, log collection, offline documents, and unstructured data, and provides stable, continuous data ingestion while preserving historical changes.
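A common way to preserve historical changes in a warehouse is a Type 2 slowly changing dimension: when an attribute changes, the old row is closed and a new row is opened, each with its own validity dates. The source does not say how AutoHome implements this, so the following is only a minimal Python sketch under stated assumptions: a hypothetical dealer dimension keyed by a business key with one tracked attribute (`city`), a full source snapshot loaded once per day, inclusive end dates, and 9999-12-31 marking the current version. The names `DimRow`, `apply_scd2`, and `as_of` are illustrative, not part of any real system.

```python
from dataclasses import dataclass, replace
from datetime import date, timedelta
from typing import Optional

# Sentinel end date meaning "this version is still current".
OPEN_END = date(9999, 12, 31)


@dataclass(frozen=True)
class DimRow:
    key: str     # business key, e.g. a hypothetical dealer_id
    city: str    # the one tracked attribute in this sketch
    start: date  # first day this version is valid (inclusive)
    end: date    # last day this version is valid (inclusive); OPEN_END if current


def apply_scd2(history: list[DimRow], snapshot: dict[str, str], load_date: date) -> list[DimRow]:
    """Merge one day's full source snapshot {key: city} into an SCD Type 2 history."""
    result: list[DimRow] = []
    current: dict[str, DimRow] = {}
    for row in history:
        if row.end == OPEN_END:
            current[row.key] = row
        else:
            result.append(row)  # closed versions are never modified
    for key, row in current.items():
        new_city = snapshot.get(key)
        if new_city == row.city:
            result.append(row)  # unchanged: the open version stays open
            continue
        # Changed or deleted at the source: close the old version the day before.
        result.append(replace(row, end=load_date - timedelta(days=1)))
        if new_city is not None:
            result.append(DimRow(key, new_city, load_date, OPEN_END))
    for key, city in snapshot.items():
        if key not in current:
            result.append(DimRow(key, city, load_date, OPEN_END))  # brand-new key
    return result


def as_of(history: list[DimRow], key: str, day: date) -> Optional[str]:
    """Point-in-time lookup: the attribute value that was valid on `day`."""
    for row in history:
        if row.key == key and row.start <= day <= row.end:
            return row.city
    return None


h = apply_scd2([], {"d1": "Beijing", "d2": "Shanghai"}, date(2024, 1, 1))
h = apply_scd2(h, {"d1": "Tianjin", "d2": "Shanghai"}, date(2024, 1, 2))
print(as_of(h, "d1", date(2024, 1, 1)), as_of(h, "d1", date(2024, 1, 2)))  # Beijing Tianjin
```

Because the Beijing row is closed rather than overwritten, a report rerun for 2024-01-01 still attributes d1 to Beijing, which is what "preserving historical changes" buys over a plain overwrite.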
The data warehouse adopts a layered model consisting of RDM (data presentation layer), ADM (aggregated data layer), GDM (general data layer), FDM (basic data layer), Stage/BDM (data access layer), DIM (dimension layer), and TMP (temporary layer). Each layer has a defined role, such as storing data consistent with the source, performing light and heavy aggregation, providing standardized metrics, or supporting temporary processing. The layering delivers benefits in performance (pre-aggregation reduces I/O), cost (reusing shared data reduces storage and compute), efficiency (user-friendly data publishing), and quality (uniform metric definitions minimize inconsistent figures).
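To make the layer responsibilities concrete, here is a minimal in-memory Python sketch of how a single page-view metric might move up through the layers. Everything in it is a hypothetical illustration rather than AutoHome's schema: the log format, the event ids, the series-to-brand dimension, and the function names `to_fdm`, `to_gdm`, `to_adm`, and `to_rdm`. A production warehouse would typically run each step as a SQL job over distributed tables, and TMP is omitted here.

```python
from collections import Counter

# Stage/BDM: raw records kept exactly as ingested.
# Hypothetical log format: event_id,day,user_id,series_id,event
STAGE = [
    "e1,2024-01-01,u1,s1,pv",
    "e1,2024-01-01,u1,s1,pv",   # same event delivered twice by the collector
    "e2,2024-01-01,u2,s2,pv",
    "e3,2024-01-01,u3,s1,pv",
    "e4,2024-01-01,u3,s1,click",
    "garbage line",             # malformed record
]

# DIM: a hypothetical series -> brand dimension shared by the layers above it.
DIM_SERIES = {"s1": "BrandA", "s2": "BrandB"}


def to_fdm(stage_rows):
    """FDM: parse, drop malformed rows, deduplicate by event_id; no business logic."""
    seen, out = set(), []
    for line in stage_rows:
        parts = [p.strip() for p in line.split(",")]
        if len(parts) != 5:
            continue  # reject malformed rows rather than guess at them
        if parts[0] not in seen:
            seen.add(parts[0])
            out.append(tuple(parts))
    return out


def to_gdm(fdm_rows, dim):
    """GDM: conform detail rows to shared dimensions so every metric uses one definition."""
    return [
        {"day": day, "user": user, "brand": dim.get(series, "UNKNOWN"), "event": event}
        for _event_id, day, user, series, event in fdm_rows
    ]


def to_adm(gdm_rows):
    """ADM: light aggregation, page views per (day, brand), computed once and reused."""
    return Counter((r["day"], r["brand"]) for r in gdm_rows if r["event"] == "pv")


def to_rdm(adm, day):
    """RDM: a presentation-ready brand ranking a dashboard can read directly."""
    rows = [(brand, pv) for (d, brand), pv in adm.items() if d == day]
    return sorted(rows, key=lambda item: (-item[1], item[0]))


adm = to_adm(to_gdm(to_fdm(STAGE), DIM_SERIES))
print(to_rdm(adm, "2024-01-01"))  # [('BrandA', 2), ('BrandB', 1)]
```

The ADM result holds one row per (day, brand), so a dashboard reads two rows instead of rescanning every event; that is the I/O saving from pre-aggregation. Because all consumers share the same GDM rows and DIM mapping, they also get the same page-view figure.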
These advantages stem from a hybrid modeling approach that combines Bill Inmon's normalized (third normal form) modeling with Ralph Kimball's bottom-up dimensional modeling; modeling is primarily demand-driven and supplemented by data-driven design. Data accuracy, consistency, and efficient query processing are ensured through slowly changing dimensions, deliberate redundancy, high-concurrency scheduling, partitioning, and bucketing. Additional strengths include a self-built data service toolchain (collection platform, data express, unified scheduling, visualization, metadata management) that streamlines the full data lifecycle, and a real-time productization rate above 90%, achieved through Flink SQL-based development.
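Partitioning and bucketing can be illustrated with a small Python model of how rows are laid out and read. This is an assumption-laden sketch, not AutoHome's configuration: rows are partitioned by a hypothetical `dt` date column using the `dt=<value>` directory naming that Hive uses for partitions, and bucketed by `hash(user_id) mod NUM_BUCKETS`, with `zlib.crc32` standing in for whatever hash function the real engine applies. The helpers `layout`, `scan`, and `bucketed_join` are hypothetical names for illustration.

```python
import zlib
from collections import defaultdict

NUM_BUCKETS = 4  # hypothetical bucket count; both tables must agree on it


def bucket_of(key: str, num_buckets: int = NUM_BUCKETS) -> int:
    # crc32 is deterministic across runs, unlike Python's salted built-in hash(),
    # so the same key always lands in the same bucket in every table.
    return zlib.crc32(key.encode("utf-8")) % num_buckets


def layout(rows):
    """Place (dt, user_id, value) rows into {"dt=<day>": {bucket: [(user_id, value)]}}."""
    files = defaultdict(lambda: defaultdict(list))
    for dt, user_id, value in rows:
        files[f"dt={dt}"][bucket_of(user_id)].append((user_id, value))
    return files


def scan(files, dt):
    """Partition pruning: a filter on dt reads only that one partition."""
    partition = files.get(f"dt={dt}", {})
    return [row for bucket in partition.values() for row in bucket]


def bucketed_join(left, right):
    """Join two bucketed partitions bucket by bucket; equal keys share a bucket number."""
    out = []
    for b in range(NUM_BUCKETS):
        index = defaultdict(list)
        for user_id, value in right.get(b, []):
            index[user_id].append(value)
        for user_id, value in left.get(b, []):
            out.extend((user_id, value, other) for other in index[user_id])
    return out


visits = layout([("2024-01-01", "u1", 3), ("2024-01-01", "u2", 5), ("2024-01-02", "u1", 7)])
brands = layout([("2024-01-01", "u1", "BrandA")])
print(sorted(scan(visits, "2024-01-01")))  # [('u1', 3), ('u2', 5)]
print(bucketed_join(visits["dt=2024-01-01"], brands["dt=2024-01-01"]))  # [('u1', 3, 'BrandA')]
```

A query filtered on one day never touches other days' data. Because both tables bucket the same key with the same function and bucket count, the join only compares rows within matching buckets instead of redistributing every row.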
Application scenarios range from simple data queries and BI products (AutoHome Data Compass, data dashboards, the traffic data platform, the Beidou system) to data modeling, analysis, prediction, and data mining. The warehouse also supports real-time traffic wide-table models, recommendation and search models, and UAS models, and it provides authoritative data for leadership decisions.
In conclusion, enterprise data construction is a long-term endeavor. Many companies have built data warehouses, but most remain at the BI reporting stage. To evolve toward analysis-driven, predictive, and intelligent data operations, companies must solidify their data foundations and increase investment. Backed by a large user base and deep big-data talent, AutoHome will continue advancing its data-construction roadmap under its data-technology strategy.
This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact us and we will review it promptly.
