Designing a Scalable Financial Data Warehouse: Modeling, Layers, and Quality Control
This article outlines a comprehensive approach to building a financial data warehouse, covering background needs, modeling methodologies, a layered architecture (I, C, S, R), data quality monitoring, metadata management, and detailed naming and coding standards to ensure maintainable, high‑quality data pipelines.
Background
Since 2018, expanding business lines have driven increasing demand for data analysis and applications, making a robust financial data warehouse essential. Key challenges include disorganized data storage, lack of unified data quality definitions, and inconsistent metadata management, which increase development and maintenance costs.
Big Data Modeling Overview
Three common modeling approaches are compared:
ER Model – aligns with enterprise‑level subjects; easy to implement but limited flexibility and suitable mainly for small, simple models.
Dimensional Model – focuses on analytical needs; intuitive but requires extensive preprocessing and can cause data redundancy when business changes.
Data Vault Model – emphasizes integration and historical tracking; flexible but results in many tables and complex joins.
After evaluating actual business requirements, the dimensional model was chosen for its balance of usability and adaptability.
Data Warehouse Architecture
The overall architecture adopts a four‑layer structure, each with specific functions and modeling rules.
Layered Design
1) I (Inbound) Layer
Function: Raw data backup from source systems, no additional logic.
Modeling: Retention period defined by business needs; data ingested via full or incremental ETL; tables are short‑lived.
2) C (Consolidation) Layer – Core Layer
Function: Build technical‑topic models to flexibly support business needs while shielding downstream layers from source changes.
Modeling: Topic‑based partitioning, data cleansing, and dimensional modeling.
This core layer reduces development cost and isolates upstream changes compared with traditional ODS‑DW‑APP three‑tier architecture.
3) S (Subject) Layer
Function: Design analysis‑oriented subject models for detailed queries and metric calculations.
Modeling: Business‑process‑driven, using wide dimension tables or metric tables.
4) R (Report) Layer
Function: Store pre‑aggregated dimensional metrics for fast decision‑making.
Modeling: Decision‑oriented, small data volume, supports quick queries.
Data Quality Monitoring
A monitoring system provides automated validation, real‑time alerts, scoring, and weekly quality reports to ensure correct output.
Metadata Management
A metadata management system reduces development and maintenance costs by enforcing naming conventions, table definitions, and consistency checks.
Naming Mechanism – standardized roots for dimensions and metrics; fields must use these roots, ensuring uniform naming across tables.
Audit Mechanism – new roots or fields undergo multi‑level review.
Responsibility Mechanism – accountability assigned to individuals.
Consistency Assurance – one‑click table creation syncs metadata with physical tables; daily consistency checks trigger alerts.
Naming and Coding Standards
Field Naming
Use snake_case, lowercase letters and digits; no leading digits or double underscores with only digits.
All field names must come from the metadata field‑naming library; reserved words are prohibited.
Names are constructed from approved roots.
Table Naming
Prefix indicates layer (i, c, s, r) and type (t for table, v for view).
Data category: f (fact), d (dimension), s (snapshot).
Format examples: itf_usr_db58_customercrm_customer (I‑layer table) and cvf_ord_order__car (C‑layer view).
Coding Conventions
SELECT fields listed one per line, commas placed before field names.
AS clause mandatory and aligned with its field.
Indentation rules for clauses, CASE‑WHEN alignment, line length ≤250 characters.
After writing, run PL/SQL formatter then fine‑tune.
Summary and Outlook
A reasonable data‑warehouse architecture has been built, enabling rapid data integration and reducing impact of business changes.
Initial business knowledge base established, lowering data‑usage cost.
Metadata management framework created, covering core data‑analysis scenarios.
Future work will focus on:
Improving usability of S and R layers to further reduce user effort.
Enhancing data stability and timeliness by continuously optimizing data jobs.
Signed-in readers can open the original source through BestHub's protected redirect.
This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.
dbaplus Community
Enterprise-level professional community for Database, BigData, and AIOps. Daily original articles, weekly online tech talks, monthly offline salons, and quarterly XCOPS&DAMS conferences—delivered by industry experts.
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.
