Design and Evolution of an R&D Measurement Platform: Architecture, Data Governance, and Interactive Analytics
This article details the purpose, technical evolution, architecture, data‑source unification, dimensional modeling, data‑warehouse layering, SQL‑as‑metric approach, and interactive design of a measurement platform built to improve R&D efficiency through systematic data collection and visualization.
1 What the Measurement Platform Does
The platform systematically collects key R&D data and presents it intuitively, helping users understand delivery value, efficiency, and quality, while providing a reliable observation system to identify problems and support business improvements.
2 Technical Construction History
2.1 Platform V0.1 (2019‑2022)
The platform started in 2019 to collect requirements, bugs, project delays, and release data for efficiency analysis. It received only light maintenance and no continuous updates.
2.2 Platform V0.5 (2022‑2023)
As usage grew, performance bottlenecks emerged: long query times, low metric accuracy, and slow metric production. In 2022 the team decided on a complete reconstruction.
2.2.1 Reconstruction Idea
Data Model and Business Logic Decoupling
The original system tightly coupled data models with business logic, making changes costly. The redesign separates them, records detailed business data, and introduces four table types: detail, dimension, bridge, and summary tables.
Detail tables store raw, granular data (e.g., individual requirements, work hours, bugs).
Dimension tables hold attribute information (project, personnel, module).
Bridge tables manage many‑to‑many relationships.
Summary tables aggregate data for fast queries.
Glue code connects the tables and moves data between them. The V0.5 architecture added data‑dependency management and pre‑computed metrics, but the way metrics were produced stayed unchanged.
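The four table types can be illustrated with a minimal sketch. It uses Python's built-in sqlite3 module purely for demonstration; the table names, columns, and sample rows are hypothetical and are not taken from the platform itself.

```python
import sqlite3


def build_tables(conn):
    # All names below are illustrative assumptions, not the platform's schema.
    conn.executescript("""
        CREATE TABLE dim_project (project_id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE dim_module (module_id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE detail_bug (
            bug_id INTEGER PRIMARY KEY, project_id INTEGER, severity TEXT);
        CREATE TABLE bridge_bug_module (bug_id INTEGER, module_id INTEGER);
        CREATE TABLE summary_bug_by_project (
            project_id INTEGER PRIMARY KEY, bug_count INTEGER);
    """)


def load_sample(conn):
    # Dimension rows: descriptive attributes.
    conn.executemany("INSERT INTO dim_project VALUES (?, ?)",
                     [(1, "checkout"), (2, "search")])
    conn.executemany("INSERT INTO dim_module VALUES (?, ?)",
                     [(10, "api"), (11, "ui")])
    # Detail rows: one row per individual bug.
    conn.executemany("INSERT INTO detail_bug VALUES (?, ?, ?)",
                     [(100, 1, "high"), (101, 1, "low"), (102, 2, "high")])
    # Bridge rows: a bug may touch several modules (many-to-many).
    conn.executemany("INSERT INTO bridge_bug_module VALUES (?, ?)",
                     [(100, 10), (100, 11), (101, 11), (102, 10)])


def refresh_summary(conn):
    # Summary table: pre-aggregated from the detail table for fast reads.
    conn.execute("DELETE FROM summary_bug_by_project")
    conn.execute("""
        INSERT INTO summary_bug_by_project
        SELECT project_id, COUNT(*) FROM detail_bug GROUP BY project_id
    """)


def bugs_per_module(conn):
    # Resolve the many-to-many relationship through the bridge table.
    rows = conn.execute("""
        SELECT m.name, COUNT(*)
        FROM bridge_bug_module b
        JOIN dim_module m ON m.module_id = b.module_id
        GROUP BY m.name
    """).fetchall()
    return dict(rows)
```

In this sketch the summary table is rebuilt from the detail table on demand, so the business logic (how bugs are counted) lives in one query and the stored detail rows stay unchanged when that logic evolves.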
2.2.2 Basic Data Governance
Because most source data were new, they had to be cleaned and processed before use. Project‑process data from TAPD were standardized through two measures: promoting a unified project‑process standard, and embedding that process into the development workflow.
After governance, the platform achieved a "one standard, one platform" goal, improving data uniformity and collection efficiency.
2.2.3 Achieved Effects
Data pre‑computation solved slow queries.
Metric accuracy and traceability improved, but the overall result fell short of expectations; calculation logic remained scattered across the code.
2.3 Platform V1 (2023‑Present)
V0.5 computed data one day after collection (T+1). V1 focuses on data construction to achieve one‑hour data availability, unified data sources, SQL‑as‑metric, and efficient interactive design.
Unified data source to ensure consistency.
SQL‑as‑metric for traceable, maintainable indicators.
Interactive UI with cards, chart linking, auxiliary fields, correlation analysis, and drill‑down.
3 Platform V1 Details
3.1 Technical Architecture
Offline processing runs on Hive, StarRocks, Spark, and two internal platforms, 星河 and 星火.
星河 is a self‑developed PaaS providing data ingestion, development, quality, assets, services, and metric management. 星火 is an intelligent BI platform for low‑code data analysis.
3.2 Unified Data Source (Data‑Warehouse Construction)
3.2.1 Preparation – Bottom‑up Inventory
Identify all existing data sources and fields, map them to metrics, and build a metric‑to‑source matrix.
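A metric-to-source matrix of this kind might be built as in the sketch below. The metric names, the "release_system" source, and the field names are hypothetical placeholders; only TAPD is named in the article.

```python
# Hypothetical inventory: each metric lists the (source, field) pairs it needs.
METRIC_FIELDS = {
    "requirement_lead_time": [
        ("tapd", "requirement.created_at"),
        ("release_system", "release.deployed_at"),
    ],
    "bug_count": [("tapd", "bug.id")],
    "release_frequency": [("release_system", "release.deployed_at")],
}


def build_matrix(metric_fields):
    """Return (sorted source names, {metric: {source: bool}})."""
    sources = sorted(
        {src for fields in metric_fields.values() for src, _ in fields}
    )
    matrix = {
        metric: {s: any(src == s for src, _ in fields) for s in sources}
        for metric, fields in metric_fields.items()
    }
    return sources, matrix


def metrics_using(source, metric_fields):
    """List the metrics that depend on a given source, in sorted order."""
    return sorted(
        metric
        for metric, fields in metric_fields.items()
        if any(src == source for src, _ in fields)
    )
```

Reading the matrix by row shows which sources a metric needs; reading it by column (as in metrics_using) shows which metrics are affected when a source changes.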
3.2.2 Dimensional Modeling
Adopt the four‑step dimensional modeling process to keep the warehouse design consistent: select the business process, declare the grain, identify the dimensions, and identify the facts.
3.2.3 Data‑Warehouse Layers
Four layers: ODS (raw), DW (wide tables), DWS (light aggregation), ADS (OLAP layer). This structure supports efficient querying and analysis.
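The layering can be sketched end to end on a toy data set. The example below uses sqlite3 only as a stand-in for the real warehouse engines, and every table name, column, and row is an assumption made for illustration.

```python
import sqlite3


def build_layers(conn):
    # Each layer is derived only from the layer directly beneath it.
    conn.executescript("""
        -- ODS: raw records, loaded as-is from the source system
        CREATE TABLE ods_bug (bug_id INTEGER, project_id INTEGER, created_date TEXT);
        CREATE TABLE ods_project (project_id INTEGER, project_name TEXT);
        INSERT INTO ods_bug VALUES
            (1, 1, '2024-01-01'), (2, 1, '2024-01-01'),
            (3, 2, '2024-01-01'), (4, 1, '2024-01-02');
        INSERT INTO ods_project VALUES (1, 'checkout'), (2, 'search');

        -- DW: wide table, facts joined with their dimension attributes
        CREATE TABLE dw_bug_wide AS
        SELECT b.bug_id, b.created_date, p.project_id, p.project_name
        FROM ods_bug b
        JOIN ods_project p ON p.project_id = b.project_id;

        -- DWS: light aggregation at (project, day) grain
        CREATE TABLE dws_bug_daily AS
        SELECT project_name, created_date, COUNT(*) AS bug_count
        FROM dw_bug_wide
        GROUP BY project_name, created_date;

        -- ADS: query-ready result served to the OLAP / BI layer
        CREATE TABLE ads_bug_total AS
        SELECT project_name, SUM(bug_count) AS total_bugs
        FROM dws_bug_daily
        GROUP BY project_name;
    """)


def read_ads(conn):
    return conn.execute(
        "SELECT project_name, total_bugs FROM ads_bug_total ORDER BY project_name"
    ).fetchall()
```

Because each layer reads only from the one below it, a change in source format is absorbed at the ODS/DW boundary and does not reach the queries that dashboards run against ADS.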
3.3 SQL‑as‑Metric
Metrics are defined directly by SQL statements, which makes each metric's logic transparent and traceable and supports the one‑hour data delivery target.
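One way to picture SQL-as-metric is a registry in which the SQL text is the only definition of each metric. The sketch below is an assumption about how such a registry could look, not the platform's implementation; the Metric class, the bug table, and the metric names are all hypothetical, and sqlite3 stands in for the real query engine.

```python
import sqlite3
from dataclasses import dataclass


@dataclass(frozen=True)
class Metric:
    name: str
    description: str
    sql: str  # the SQL text is the single source of truth for the metric


# Hypothetical metric definitions; no calculation logic lives outside the SQL.
METRICS = {
    m.name: m
    for m in [
        Metric(
            "bug_count",
            "Number of bugs per project",
            "SELECT project, COUNT(*) FROM bug GROUP BY project ORDER BY project",
        ),
        Metric(
            "avg_fix_hours",
            "Average hours to fix a bug, per project",
            "SELECT project, AVG(fix_hours) FROM bug GROUP BY project ORDER BY project",
        ),
    ]
}


def load_sample(conn):
    conn.execute("CREATE TABLE bug (project TEXT, fix_hours REAL)")
    conn.executemany(
        "INSERT INTO bug VALUES (?, ?)",
        [("checkout", 2.0), ("checkout", 4.0), ("search", 6.0)],
    )


def evaluate(conn, name):
    """Run the registered SQL for a metric and return its rows."""
    return conn.execute(METRICS[name].sql).fetchall()
```

To trace how a number was produced, a reader looks up METRICS[name].sql; changing a metric means editing one SQL string rather than hunting through application code.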
3.4 Efficient Interactive Design
Charts are linked and drillable; a theme‑domain approach groups related data, enabling correlation analysis and one‑click navigation to related charts.
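Drill-down can be thought of as re-aggregating the same rows one level lower in a hierarchy, filtered by what the user has already clicked. The following is a minimal sketch under that assumption; the hierarchy, field names, and sample rows are hypothetical.

```python
from collections import Counter

# Hypothetical drill path, from coarsest to finest level.
HIERARCHY = ["department", "team", "owner"]

SAMPLE_ROWS = [
    {"department": "trade", "team": "checkout", "owner": "ann", "bugs": 3},
    {"department": "trade", "team": "checkout", "owner": "bob", "bugs": 1},
    {"department": "trade", "team": "payment", "owner": "cy", "bugs": 2},
    {"department": "search", "team": "ranking", "owner": "dee", "bugs": 4},
]


def drill(rows, selected):
    """Aggregate bugs at the level just below the selections made so far.

    `selected` holds one chosen value per hierarchy level, starting at the top.
    An empty list returns the top-level view.
    """
    if len(selected) >= len(HIERARCHY):
        raise ValueError("already at the finest level")
    level = HIERARCHY[len(selected)]
    totals = Counter()
    for row in rows:
        # Keep only rows matching every value the user has clicked through.
        if all(row[HIERARCHY[i]] == value for i, value in enumerate(selected)):
            totals[row[level]] += row["bugs"]
    return dict(totals)
```

Calling drill(SAMPLE_ROWS, []) gives the department view, and passing ["trade"] narrows it to the teams inside that department. Chart linking follows the same pattern: a click on one chart supplies the filter values applied to the others.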
4 Summary
By building on a data warehouse and a BI system, the platform improves data accuracy, shortens the time to data availability, and speeds up analysis, freeing the team to focus on metric management and interactive visualization.
5 Future Plans
Improve backend configuration UI to lower the learning curve.
Provide training and guides to reduce user onboarding cost.
Open‑source alternatives such as Superset, Apache DevLake, Druid, ClickHouse, and Pinot are suggested for teams without commercial DW/BI solutions.
Zhuanzhuan Tech
A platform for Zhuanzhuan R&D and industry peers to learn and exchange technology, regularly sharing frontline experience and cutting‑edge topics. We welcome practical discussions and sharing; contact waterystone with any questions.