How to Build a Quality‑Driven Data Warehouse: Architecture, Tools, and Best Practices
This article explains the end‑to‑end process of constructing a quality‑focused data warehouse, covering project planning, requirement definition, technical architecture design, and the selection of key products such as DataHub, Mammut, YouData, Cangjie, and Unified Query.
0. Preface
The first part introduced the basic concepts of data warehouses. This continuation covers how a quality‑centric data warehouse is built and which products are used to build it.
1. Project Planning & Requirement Definition
The DW/BI development lifecycle starts with project planning, where the main tasks are defining project goals and scope.
As the business expands, traditional quality assurance methods struggle to keep up with coverage needs. To measure and improve the effectiveness of quality assurance across development, testing, and project management, the team builds a metric‑ and model‑driven quality visualization system. It covers data domains such as release versions, requirement tasks, test cases, bugs, and git commits.
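To make the idea of metric-driven quality data concrete, here is a minimal sketch of how two simple metrics might be derived from the bug and test case domains. The record types (`Bug`, `CaseResult`), their fields, and the metric names are assumptions chosen for illustration; the article does not describe the actual schemas or metric definitions.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class Bug:
    release: str    # hypothetical field: release version the bug belongs to
    severity: str   # hypothetical field: e.g. "critical", "major", "minor"


@dataclass(frozen=True)
class CaseResult:
    release: str    # hypothetical field: release version the case ran against
    passed: bool    # whether the test case passed


def bugs_per_release(bugs):
    """Count the bugs recorded for each release version."""
    return dict(Counter(b.release for b in bugs))


def pass_rate_per_release(cases):
    """Compute the fraction of passing test cases for each release version."""
    total = Counter(c.release for c in cases)
    passed = Counter(c.release for c in cases if c.passed)
    return {release: passed[release] / total[release] for release in total}


# Made-up sample records, purely for demonstration.
bugs = [Bug("v1.0", "major"), Bug("v1.0", "minor"), Bug("v1.1", "critical")]
cases = [
    CaseResult("v1.0", True),
    CaseResult("v1.0", False),
    CaseResult("v1.1", True),
]

print(bugs_per_release(bugs))
print(pass_rate_per_release(cases))
```

In a real warehouse these aggregations would more likely run as scheduled SQL jobs over warehouse tables; the in-memory version only shows the shape of the calculation.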
2. Technical Architecture Design & Product Selection
The initial focus is on offline data metric production, so the early stage emphasizes offline warehouse structure design and tool selection. The quality warehouse is derived from the existing offline warehouse architecture, forming a separate project that co‑exists with the business data warehouse.
DataHub: Collects data from sources such as MySQL, MongoDB, Kafka, and HBase, synchronizes it to target databases, and provides a unified data format for downstream platforms such as the Mammut big‑data compute platform and streaming platforms. It also supports one‑way and two‑way synchronization among MySQL, Hive, ES, HBase, Redis, MongoDB, Excel, and external HTTP interfaces.
Mammut: A data development platform for developers and platform administrators. It integrates data transmission, ETL, and scheduling, and supports engines such as Hive, Spark, and MapReduce. Quality warehouse developers use it as their primary tool for creating scheduled data tasks and generating the required tables.
YouData: A customized visual BI product based on NetEase YouData, used to create interactive dashboards and various chart types. Quality warehouse developers use it to visualize the metrics produced by Mammut, while report viewers can filter and explore the data that interests them.
Cangjie: The metric management system of the warehouse, which manages atomic metrics, derived metrics, dimensions, derived words, and modifiers. It makes metric definitions and calculation logic explicit, which helps developers avoid duplicate or erroneous implementations and lets report consumers understand how a metric is calculated before basing decisions on it.
Unified Query: Synchronizes tables generated by the data warehouse to multiple database types, so downstream applications can query the metrics they need without worrying about the underlying database. This improves query speed and reduces repetitive configuration work.
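The metric concepts listed under Cangjie (atomic metrics, modifiers, dimensions, derived metrics) can be sketched as a small data model. This is not Cangjie's actual API or storage format, which the article does not describe. The class names, the `to_sql` method, the table name `dwd_bug`, and the column names are all hypothetical. The sketch assumes a derived metric is an atomic metric narrowed by modifiers and grouped by dimensions, and that a registry rejects duplicate names so the same metric is not implemented twice.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AtomicMetric:
    name: str
    expression: str  # aggregation expression, e.g. "COUNT(bug_id)"


@dataclass(frozen=True)
class DerivedMetric:
    name: str
    atomic: AtomicMetric
    modifiers: tuple = ()   # filter conditions that narrow the atomic metric
    dimensions: tuple = ()  # columns the metric is grouped by

    def to_sql(self, table):
        """Render the metric definition as a SQL query over `table`."""
        columns = self.dimensions + (f"{self.atomic.expression} AS {self.name}",)
        sql = f"SELECT {', '.join(columns)} FROM {table}"
        if self.modifiers:
            sql += " WHERE " + " AND ".join(self.modifiers)
        if self.dimensions:
            sql += " GROUP BY " + ", ".join(self.dimensions)
        return sql


class MetricRegistry:
    """Keeps one definition per metric name and rejects duplicates."""

    def __init__(self):
        self._metrics = {}

    def register(self, metric):
        if metric.name in self._metrics:
            raise ValueError(f"metric already defined: {metric.name}")
        self._metrics[metric.name] = metric
        return metric


# Hypothetical metric definitions for illustration only.
bug_count = AtomicMetric("bug_count", "COUNT(bug_id)")
critical_bug_count = DerivedMetric(
    name="critical_bug_count",
    atomic=bug_count,
    modifiers=("severity = 'critical'",),
    dimensions=("release_version",),
)

registry = MetricRegistry()
registry.register(critical_bug_count)
print(critical_bug_count.to_sql("dwd_bug"))
```

Keeping the definition in one place means the calculation logic shown to report consumers and the query that developers run come from the same source, which is the benefit the Cangjie description points to.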
Yanxuan Tech Team
NetEase Yanxuan Tech Team shares e-commerce tech insights and quality finds for mindful living. This is the public portal for NetEase Yanxuan's technology and product teams, featuring weekly tech articles, team activities, and job postings.
