Third‑Generation Metric Platform: Enabling a Light Data Warehouse with NoETL
This article explains how a third‑generation metric platform replaces traditional ETL‑heavy data‑warehouse pipelines with a semantic‑driven NoETL approach, reducing cost, improving quality and efficiency, and delivering automated, self‑service analytics for both IT and business users.
The presentation introduces the concept of a third‑generation metric platform and a "light" data warehouse, highlighting the limitations of conventional ETL‑centric data‑warehouse architectures that rely on extensive wide and summary tables.
It identifies the "original sin" of ETL as high cost, low efficiency, and low quality, caused by denormalized processing, duplicated table development, and slow communication cycles between business and IT teams.
Two remediation strategies are proposed: (1) avoid building wide and summary tables altogether, and (2) replace manual ETL with NoETL by standardizing metric semantics, allowing the system to automate data preparation.
At the application layer, NoETL works by standardizing metric definitions and establishing logical links between fact and dimension tables, so that the platform can generate the necessary wide or summary tables automatically, without manual coding.
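To make the idea concrete, here is a minimal sketch of how declared metrics and logical fact-dimension links could be turned into summary-table SQL. It does not describe the platform's actual implementation: the `Link` and `Metric` classes, the `build_summary_sql` function, and every table and column name are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Link:
    """Logical join between the fact table and one dimension table (hypothetical model)."""
    dimension_table: str
    fact_key: str
    dimension_key: str


@dataclass(frozen=True)
class Metric:
    """A standardized metric: a name plus an aggregate expression over fact columns."""
    name: str
    expression: str  # for example "SUM(f.amount)"


def build_summary_sql(fact_table, links, metrics, group_by):
    """Generate summary-table SQL from declarative definitions instead of hand-written ETL."""
    select_cols = list(group_by) + [f"{m.expression} AS {m.name}" for m in metrics]
    lines = ["SELECT " + ", ".join(select_cols), f"FROM {fact_table} AS f"]
    for link in links:
        dim = link.dimension_table
        # Join a dimension only when a grouping column actually references it.
        if any(col.startswith(dim + ".") for col in group_by):
            lines.append(
                f"LEFT JOIN {dim} ON f.{link.fact_key} = {dim}.{link.dimension_key}"
            )
    if group_by:
        lines.append("GROUP BY " + ", ".join(group_by))
    return "\n".join(lines)


if __name__ == "__main__":
    links = [
        Link("customers", "customer_id", "id"),
        Link("products", "product_id", "id"),
    ]
    metrics = [Metric("total_amount", "SUM(f.amount)")]
    print(build_summary_sql("orders", links, metrics, ["customers.region"]))
```

In this sketch the analyst supplies only the metric and the grouping dimension; the join to `customers` follows from the declared link, and the unused `products` dimension is left out of the query.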
The metric‑semantic layer supports automatic generation of denormalized (wide and summary) tables, standardized metric definitions, complex multi‑level aggregations, and conversion of metric definitions into computation nodes, allowing business analysts to define metrics without writing SQL.
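One possible reading of "converting metric definitions into computation nodes" is sketched below, under stated assumptions. A declarative definition with two aggregation levels (sum per store and day, then average per store) is compiled into an ordered list of nodes and run over in-memory rows. The `AggNode` class, the dictionary layout of the definition, and the `_l0`/`_l1` output naming are all hypothetical; a real platform would compile to SQL for an MPP engine rather than to Python loops.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class AggNode:
    """One aggregation step: group by `keys`, apply `func` to column `value`, emit `output`."""
    keys: tuple
    func: str
    value: str
    output: str


# Supported aggregate functions for this sketch.
AGG_FUNCS = {
    "sum": sum,
    "avg": lambda xs: sum(xs) / len(xs),
    "max": max,
    "count": len,
}


def compile_metric(definition):
    """Turn a declarative metric definition into an ordered chain of computation nodes."""
    nodes = []
    value = definition["measure"]
    for i, level in enumerate(definition["levels"]):
        output = f"{definition['name']}_l{i}"
        nodes.append(AggNode(tuple(level["by"]), level["agg"], value, output))
        value = output  # the next level aggregates the previous level's output
    return nodes


def run_nodes(nodes, rows):
    """Execute the nodes in order; each node consumes the rows the previous one produced."""
    for node in nodes:
        groups = defaultdict(list)
        for row in rows:
            groups[tuple(row[k] for k in node.keys)].append(row[node.value])
        func = AGG_FUNCS[node.func]
        rows = [
            {**dict(zip(node.keys, key)), node.output: func(values)}
            for key, values in groups.items()
        ]
    return rows


if __name__ == "__main__":
    # "Average daily sales per store": sum per (store, day), then average per store.
    definition = {
        "name": "avg_daily_sales",
        "measure": "amount",
        "levels": [
            {"by": ["store", "day"], "agg": "sum"},
            {"by": ["store"], "agg": "avg"},
        ],
    }
    rows = [
        {"store": "A", "day": 1, "amount": 10},
        {"store": "A", "day": 1, "amount": 20},
        {"store": "A", "day": 2, "amount": 30},
        {"store": "B", "day": 1, "amount": 5},
    ]
    print(run_nodes(compile_metric(definition), rows))
```

The analyst writes only the definition dictionary; the ordering of the aggregation steps is derived from it.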
The automation architecture consists of a calculation‑engine layer (MPP engines such as StarRocks and Doris) and a materialization layer that builds and selects materialized views, with automatic query rewriting to ensure performance at scale.
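The selection and rewriting step can be illustrated with a small sketch. It assumes a simple rule that the article does not spell out: a materialized view can answer a query if it contains all requested dimensions and metrics, and among such views the one with the fewest rows is preferred. The `MaterializedView` class, the `select_view` and `route` functions, and the view names are hypothetical. The sketch also assumes additive metrics, such as sums and counts, that can be re-aggregated from a finer-grained view.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class MaterializedView:
    """Metadata about one precomputed view (hypothetical catalog entry)."""
    name: str
    dimensions: frozenset
    metrics: frozenset
    row_count: int


def select_view(views, query_dims, query_metrics):
    """Return the smallest view that covers the query, or None if no view qualifies."""
    candidates = [
        v for v in views
        if set(query_dims) <= v.dimensions and set(query_metrics) <= v.metrics
    ]
    return min(candidates, key=lambda v: v.row_count, default=None)


def route(views, base_table, query_dims, query_metrics):
    """Pick the table a rewritten query should read from: a covering view, else the base table."""
    view = select_view(views, query_dims, query_metrics)
    return view.name if view is not None else base_table


if __name__ == "__main__":
    views = [
        MaterializedView("mv_day_store", frozenset({"day", "store"}), frozenset({"sales"}), 1_000_000),
        MaterializedView("mv_day", frozenset({"day"}), frozenset({"sales"}), 365),
    ]
    print(route(views, "fact_orders", ["day"], ["sales"]))       # served by the smaller view
    print(route(views, "fact_orders", ["customer"], ["sales"]))  # falls back to the base table
```

A production optimizer would weigh more than row counts, for example view freshness and filter predicates, but the covering test above is the core of the routing decision.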
Benefits include consistent metric definitions, reduced development effort and cost, faster iteration, improved data quality, and self‑service analytics; a case study of a securities firm shows unified metric management cutting analysis cycles from weeks to days.
Overall, the third‑generation platform delivers semantic‑driven, automated metric production that bridges the gap between IT and business, enabling efficient, high‑quality, and cost‑effective data analysis.
DataFunTalk
Dedicated to sharing and discussing big data and AI technology applications, aiming to empower a million data scientists. Regularly hosts live tech talks and curates articles on big data, recommendation/search algorithms, advertising algorithms, NLP, intelligent risk control, autonomous driving, and machine learning/deep learning.