Designing and Implementing a Comprehensive Data Metric System and Data Warehouse for Online Education
This article explains how to build a systematic data metric framework—including indicator definitions, dimension management, data dictionary, and data warehouse architecture—to turn massive online‑school data into actionable business insights and support efficient decision‑making and BI visualization.
Background
When making product decisions and operating strategies, we need data to support them. Raw data alone provides limited information; we require the insights and correlations that lie behind the data. As the online school business grows, data is generated continuously and at massive scale, so linking data with business and converting it into valuable information is essential.
Business data analysis typically considers dimensions and metrics, which form the foundation of analysis. Structured data can be understood as a multi‑dimensional cube containing dimensions and measures.
We can drill, slice, and dice this cube in various ways, compare multiple measures, and derive business meanings that guide decisions and operational strategies.
In real‑world online‑school operations, a few dimensions are insufficient to fully understand business status; deeper and broader observations are needed, especially to quickly identify causes when metrics deviate. This requires a well‑designed indicator system.
1. What Is an Indicator System?
An indicator system is the core of data‑warehouse modeling and the basis of multi‑dimensional analysis. A reasonable, complete system enables the warehouse to provide data, deliver value, and dramatically improve data‑driven decision efficiency. It organizes various data indicators into a structured hierarchy aligned with business processes and dimensions.
2. Functions of an Indicator System
The system helps monitor business development and operations, guiding decisions and strategies in a fast‑changing market, and provides tools for problem identification, thereby improving overall efficiency.
Systematic indicator dimensions allow multi‑angle business status assessment.
Rapidly locate root causes when product strategies or operations encounter issues.
Efficiently provide data support for operational decisions.
Integrate with physical data for seamless access; this covers the first two analytical layers—"what happened" and "why it happened"—with the next two layers discussed in a later article.
Indicator System Construction
1. Basic Concepts
Data Domain : A relatively independent business scope within a module, abstracted for analysis and maintained long‑term.
Business Process : Concrete business events such as order, payment, refund.
Time Period : Defines the statistical time range (e.g., last 30 days, natural week).
Modifier : Business‑scene qualifiers beyond dimensions (e.g., PC side, APP side).
Atomic Indicator : An indivisible metric tied to a specific business event (e.g., attendance count).
Derived Indicator : Atomic indicator + optional modifiers + time period (e.g., attendance count for Beijing campus in the last day).
Dimension : The environment of a measure, reflecting a business attribute.
Dimension Item : Specific values belonging to a dimension (e.g., Beijing, Shanghai under city dimension).
2. Indicator Specification
Clear classification and naming conventions ensure indicators are self‑explanatory and reduce communication cost. Indicators are categorized as atomic, derived, or composite.
Atomic Indicator = behavior + measure.
Derived Indicator = time period + modifier + atomic indicator.
Composite Indicator : Result of mathematical operations on atomic or derived indicators (e.g., renewal rate = renewal count / enrollment count).
Atomic indicator Chinese name = behavior name + measure name
Atomic indicator English code = behavior code + measure code
Derived indicator English code = atomic indicator + time period + other modifiers
Derived indicator Chinese name = time period + [other modifiers] + atomic indicator
3. Indicator Ownership
Transactional Indicators : Measure business activities (e.g., daily class submission count).
Stock Indicators : Count of entities (e.g., total students, total teachers) with time period usually set to "as of now".
Data Dictionary
Early management using wiki and Excel suffered from lack of standardization, version conflicts, difficult retrieval, inconsistent calculation logic, no lineage analysis, and poor physical data mapping.
1. Data Domain & Business Process Management
Data domains define the scope of business data; each includes Chinese name, English code, and abbreviation for unique identification and downstream warehouse modeling.
Business processes are indivisible events belonging to a data domain, also stored with Chinese name, English code, and abbreviation.
2. Common Term Management
Four term types: Time Period, Modifier, Behavior, Measure. Each term includes type, Chinese name, English code, and description, enabling automatic indicator name generation.
3. Indicator Management Module
Indicators are divided into atomic, derived, and composite. Atomic indicators are built from behavior + measure, with automatic code generation to avoid manual errors.
Derived Indicators add time period and optional modifiers to atomic indicators; codes are auto‑generated as atomic+time+modifiers.
Composite Indicators are calculated via Hive‑style expressions (case when, round, etc.) using atomic or derived indicators and dimensions; they are evaluated by the Rockbase engine without developer intervention.
The module stores both basic business information (name, classification, data domain, process, frequency, unit, type, definition, logic) and technical details (data type, code, source tables, lineage, supported dimensions, related indicators), automatically synchronized from the model management system.
4. Dimension Management Module
Dimensions are either raw or derived. Raw dimensions describe business attributes and can be maintained manually or automatically via linked dimension tables.
Derived dimensions are generated from raw dimensions using aggregation (e.g., city level) or combination (e.g., subject‑version‑difficulty).
5. Work‑Order Management Module
The work‑order system handles online requests, reviews, and approvals for any entity changes in the data dictionary, ensuring standardized and audited updates.
Data Warehouse Construction
After establishing a logical indicator system, we must efficiently compute the metrics and adapt to rapid business changes, which is achieved by building a data warehouse.
Data Warehouse provides a strategic collection of data for enterprise‑wide decision making, supporting analysis, monitoring, and control.
1. Consistent Dimensions
Consistent dimensions serve as the backbone of a bus architecture, enabling shared dimensions across the platform or department.
2. Data Domain Partitioning
Data domains group tightly related topics, facilitating management and application. They are defined after business analysis to cover current and future needs.
3. Building the Bus Matrix
After identifying business processes per data domain, we map them to relevant dimensions to create a bus matrix, guiding analytical dimensions for each process.
4. Layered Architecture
Data layering ensures stability, isolates downstream impact, and prevents overly long pipelines. The proposed structure organizes data into logical layers for controlled, reusable development.
Dependencies must be forward‑only; same‑level dependencies are allowed.
Prefer using common layers to avoid duplication.
5. Export to Data Dictionary
After ETL, each metric’s calculation logic and naming are recorded in the data dictionary to ensure clear communication among analysts, operations, and developers.
6. BI Presentation
With the metric system and data ready, a rich‑chart BI tool is needed for visualization. Existing tools like Tableau lack direct integration with the data dictionary, so we built our own OLAP engine, Rockbase, which will be detailed in a subsequent article.
Achievements
Unified indicator management ensures consistent definitions, calculation logic, and data sources.
Unified dimension management guarantees consistent definitions and values.
Provides unified monitoring and alerting for dimensions and indicators.
Supports product‑level lineage tracking.
Lays the foundation for future governance systems (DQC and Rockbase built around the data dictionary).
Past Highlights
Application of Predictive Models in Teaching Scenarios
Statistical Principles of A/B Testing in Online Education
Application of Sentence‑Pair Text Similarity Algorithm in Online School
Xueersi Online School Tech Team
The Xueersi Online School Tech Team, dedicated to innovating and promoting internet education technology.
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.