How a Hybrid Data Warehouse Transformed Banking Data Services
This article details the 2015 hybrid data‑warehouse design implemented at Guangdong Huaxing Bank, explaining its real‑time, historical, and archival layers, the data‑bus concept, and how mixing in‑memory, relational, and Hadoop technologies addressed modern banking data‑volume, latency, and unstructured‑data challenges.
1. Data Application Development Trends
Traditional data‑warehouse projects in banks use systems such as Teradata or Greenplum to consolidate transaction data for reporting and analysis. The rise of internet finance, however, has brought banks massive volumes of structured and unstructured data, along with demand for faster, more flexible data services.
Huaxing Bank therefore proposed a hybrid data‑warehouse architecture that expands the warehouse into a unified data‑service center, integrating in‑memory databases, relational databases, and Hadoop to meet requirements for low cost, security, agility, and automation.
2. Data Warehouse Capability Design
The hybrid warehouse must support four core capabilities:
Real‑time data sharing: expose up‑to‑date asset‑liability views to channels such as online and mobile banking via APIs.
Batch data acquisition: provide daily bulk data files for reporting systems.
Historical data query: allow access to static data and archived records dating back to the bank’s founding.
Unstructured data handling: store and query files, images, video, and audio.
Additional design requirements include:
A data time range covering real‑time (T‑day) data and up to three years of historical data.
Performance targets: millisecond‑level response for real‑time queries and minute‑level response for historical analysis.
Support for both structured and unstructured data types.
Support for both real‑time message‑based queries and batch file‑based queries.
Strict adherence to data standards to ensure consistency.
3. Overall Architecture of the Hybrid Data Warehouse
The architecture consists of four modules: Real‑time Data Warehouse, Historical Data Warehouse, Archive Data Warehouse, and a Data Bus that provides a unified query interface for all modules.
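To make the Data Bus's role more concrete, the sketch below shows how a unified query interface might decide which module(s) a date-ranged query should reach, using the time ranges given in the design requirements: T-day data in the real-time warehouse, up to three years in the historical warehouse, and anything older in the archive. It is a minimal illustration, not the bank's implementation; the function name `route_query`, the module labels, and the approximation of three years as 1,095 days are all assumptions.

```python
from datetime import date, timedelta

# Assumption: "three years" approximated as 3 * 365 days; the article does
# not specify the exact cutoff rule.
HISTORY_DAYS = 3 * 365


def route_query(start: date, end: date, today: date) -> list[str]:
    """Return the hypothetical module labels a query over [start, end] must touch.

    - today (T day)               -> "realtime"
    - within the last three years -> "historical"
    - older than three years      -> "archive"
    """
    if start > end:
        raise ValueError("start must not be after end")
    archive_cutoff = today - timedelta(days=HISTORY_DAYS)
    targets = []
    # Archive holds everything strictly before the cutoff date.
    if start < archive_cutoff:
        targets.append("archive")
    # Historical holds [archive_cutoff, today - 1 day].
    if start < today and end >= archive_cutoff:
        targets.append("historical")
    # Real-time holds the current business day.
    if end >= today:
        targets.append("realtime")
    return targets


today = date(2016, 6, 30)
print(route_query(date(2012, 1, 1), today, today))  # ['archive', 'historical', 'realtime']
```

A query that spans several tiers fans out to each of them, and the Data Bus would then merge the partial results behind a single API.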
4. Detailed Design and Implementation
4.1 Real‑time Data Warehouse
Uses Redis (a key‑value in‑memory database) for transaction details and lightweight aggregates, with MySQL for end‑of‑day reconciliation. Redis runs in a master‑slave cluster with AOF persistence for crash recovery. The application layer provides data‑service pools, transaction handling, batch processing, SQL adapters, and an admin console.
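AOF persistence works by logging every write and replaying the log after a crash. The toy store below illustrates that idea with a JSON-lines file. The class `AofStore` and its log format are hypothetical; they do not reproduce Redis's actual AOF format (which records commands in the Redis protocol) or its log-rewrite behavior.

```python
import json
import os
import tempfile


class AofStore:
    """Toy key-value store showing append-only-file (AOF) style recovery."""

    def __init__(self, path: str):
        self.path = path
        self.data: dict[str, dict[str, str]] = {}
        self._replay()  # rebuild in-memory state from the log on startup

    def _replay(self) -> None:
        if not os.path.exists(self.path):
            return
        with open(self.path, encoding="utf-8") as f:
            for line in f:
                self._apply(json.loads(line))

    def _apply(self, cmd: dict) -> None:
        if cmd["op"] == "hset":
            self.data.setdefault(cmd["key"], {}).update(cmd["fields"])
        elif cmd["op"] == "del":
            self.data.pop(cmd["key"], None)

    def _log_and_apply(self, cmd: dict) -> None:
        # Append the command durably, then apply it in memory. Syncing every
        # write is roughly analogous to Redis's "appendfsync always" setting.
        with open(self.path, "a", encoding="utf-8") as f:
            f.write(json.dumps(cmd) + "\n")
            f.flush()
            os.fsync(f.fileno())
        self._apply(cmd)

    def hset(self, key: str, fields: dict[str, str]) -> None:
        self._log_and_apply({"op": "hset", "key": key, "fields": fields})

    def delete(self, key: str) -> None:
        self._log_and_apply({"op": "del", "key": key})


path = os.path.join(tempfile.mkdtemp(), "demo.aof")
store = AofStore(path)
store.hset("Detail:Acct:1001", {"balance": "500.00"})
store.hset("Detail:Acct:1002", {"balance": "80.00"})
store.delete("Detail:Acct:1002")

recovered = AofStore(path)  # simulate a restart: state is rebuilt from the log
print(recovered.data)  # {'Detail:Acct:1001': {'balance': '500.00'}}
```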
Key‑value model: each relational row becomes a Redis key such as Detail:Table:PK1||PK2, whose value is a hash of column‑value pairs. Indexes are stored as separate keys: string fields use keys such as Index:Table:PK:Field:Value, while numeric fields are indexed with sorted sets.
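The key layout can be illustrated with a dictionary-based stand-in for Redis. This sketch assumes one reading of the index pattern: the literal `PK` segment marks a set whose members are primary keys, so `Index:Txn:PK:channel:mobile` holds the keys of all rows whose `channel` is `mobile`, and a numeric field such as `amount` is indexed in a sorted set named `Index:Txn:PK:amount`. The class `KvModel`, the table `Txn`, and its columns are hypothetical, and the code handles inserts only (an update would also need to remove stale index entries).

```python
import bisect

SEP = "||"  # separator for composite primary keys, as in Detail:Table:PK1||PK2


class KvModel:
    """Dict-based stand-in for Redis showing the row and index key layout.

    Hypothetical mapping to Redis structures:
      hashes -> HSET Detail:<Table>:<PK1>||<PK2>      (column -> value)
      sets   -> SADD Index:<Table>:PK:<Field>:<Value> (primary keys)
      zsets  -> ZADD Index:<Table>:PK:<Field>         (score = numeric value)
    """

    def __init__(self):
        self.hashes: dict[str, dict[str, str]] = {}
        self.sets: dict[str, set[str]] = {}
        # Sorted sets modeled as sorted lists of (score, primary key) pairs
        self.zsets: dict[str, list[tuple[float, str]]] = {}

    def put_row(self, table, pk_cols, row, numeric_cols=frozenset()):
        """Store a new row and its index entries (insert-only for brevity)."""
        pk = SEP.join(str(row[c]) for c in pk_cols)
        self.hashes[f"Detail:{table}:{pk}"] = {k: str(v) for k, v in row.items()}
        for col, val in row.items():
            if col in pk_cols:
                continue  # primary-key columns are already encoded in the key
            if col in numeric_cols:
                zkey = f"Index:{table}:PK:{col}"
                bisect.insort(self.zsets.setdefault(zkey, []), (float(val), pk))
            else:
                skey = f"Index:{table}:PK:{col}:{val}"
                self.sets.setdefault(skey, set()).add(pk)

    def find_eq(self, table, col, value):
        """Equality lookup via the string index (like SMEMBERS then HGETALL)."""
        pks = self.sets.get(f"Index:{table}:PK:{col}:{value}", set())
        return [self.hashes[f"Detail:{table}:{pk}"] for pk in sorted(pks)]

    def find_range(self, table, col, lo, hi):
        """Range lookup via the sorted-set index (like ZRANGEBYSCORE then HGETALL)."""
        zset = self.zsets.get(f"Index:{table}:PK:{col}", [])
        start = bisect.bisect_left(zset, (lo, ""))
        rows = []
        for score, pk in zset[start:]:
            if score > hi:
                break
            rows.append(self.hashes[f"Detail:{table}:{pk}"])
        return rows


kv = KvModel()
pk_cols = ["acct", "seq"]
kv.put_row("Txn", pk_cols, {"acct": "1001", "seq": "1", "channel": "mobile", "amount": 250.0}, {"amount"})
kv.put_row("Txn", pk_cols, {"acct": "1001", "seq": "2", "channel": "atm", "amount": 40.0}, {"amount"})
kv.put_row("Txn", pk_cols, {"acct": "1002", "seq": "1", "channel": "mobile", "amount": 900.0}, {"amount"})

print(sorted(kv.hashes))  # ['Detail:Txn:1001||1', 'Detail:Txn:1001||2', 'Detail:Txn:1002||1']
print([r["acct"] for r in kv.find_eq("Txn", "channel", "mobile")])  # ['1001', '1002']
print([r["amount"] for r in kv.find_range("Txn", "amount", 100, 1000)])  # ['250.0', '900.0']
```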
4.2 Historical Data Warehouse
Built on traditional warehouse technology (e.g., Oracle RAC) and follows a four‑layer model because data standards are applied in source systems. ETL processes are scheduled centrally, and the warehouse stores standardized data for multi‑year analysis.
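Centrally scheduled ETL typically means running jobs in dependency order. A minimal sketch using Python's standard `graphlib.TopologicalSorter` (Python 3.9+) follows. The job names and the stage/base/summary/mart layer names are hypothetical, since the article does not name the four layers or the individual jobs.

```python
from graphlib import TopologicalSorter

# Hypothetical batch: job -> set of upstream jobs that must finish first.
ETL_DEPENDENCIES = {
    "stage_core_banking": set(),
    "stage_card_system": set(),
    "base_accounts": {"stage_core_banking", "stage_card_system"},
    "summary_customer_assets": {"base_accounts"},
    "mart_regulatory_reports": {"summary_customer_assets"},
}


def run_batch(dependencies, run_job):
    """Run every job once, only after all of its upstream jobs are done."""
    sorter = TopologicalSorter(dependencies)
    sorter.prepare()  # raises graphlib.CycleError if dependencies form a loop
    order = []
    while sorter.is_active():
        # Jobs in one ready batch could run in parallel; here they run in name order.
        for job in sorted(sorter.get_ready()):
            run_job(job)
            order.append(job)
            sorter.done(job)
    return order


order = run_batch(ETL_DEPENDENCIES, lambda job: print("running", job))
```

Keeping dependencies as data rather than hard-coding a sequence lets a central scheduler add source systems or layers without reordering existing jobs by hand.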
4.3 Archive Data Warehouse
Stores data older than three years and all unstructured assets (documents, audio, video). Implemented on a Hadoop cluster to provide storage, file handling, and distributed query capabilities.
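Moving data into the archive tier comes down to selecting records past the three-year cutoff and writing them to partitioned storage. The sketch below only plans the move. The `/archive/<table>/dt=<date>` directory layout (a common Hive-style partition convention), the record fields, and the 1,095-day cutoff are assumptions, and the actual writes to HDFS are out of scope.

```python
from collections import defaultdict
from datetime import date, timedelta

ARCHIVE_AGE_DAYS = 3 * 365  # assumption: "three years" approximated in days


def plan_archive(records, today, root="/archive"):
    """Map partition path -> ids of records old enough to move to the archive."""
    cutoff = today - timedelta(days=ARCHIVE_AGE_DAYS)
    plan = defaultdict(list)
    for rec in records:
        if rec["txn_date"] < cutoff:
            # Hive-style date partition, e.g. /archive/txn/dt=2013-06-30
            path = f"{root}/{rec['table']}/dt={rec['txn_date'].isoformat()}"
            plan[path].append(rec["id"])
    return dict(plan)


records = [
    {"table": "txn", "id": 1, "txn_date": date(2013, 6, 30)},
    {"table": "txn", "id": 2, "txn_date": date(2013, 6, 30)},
    {"table": "txn", "id": 3, "txn_date": date(2015, 1, 1)},
]
plan = plan_archive(records, today=date(2016, 6, 30))
print(plan)  # {'/archive/txn/dt=2013-06-30': [1, 2]}
```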
4.4 Data Bus
The Data Bus acts like an Enterprise Service Bus, exposing unified APIs for real‑time, historical, and archive queries. It comprises modules for external service interfaces, security, service control, messaging, data access, traffic control, and common configuration.
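The article does not say how the traffic-control module limits load. A token bucket is one common technique for this, sketched below as an assumption. The class name, the rate and capacity values, and the injectable clock (used here to make the example deterministic) are all hypothetical.

```python
import time


class TokenBucket:
    """Rate limiter: allows bursts up to `capacity`, refilling at `rate` tokens/s."""

    def __init__(self, rate: float, capacity: int, clock=time.monotonic):
        self.rate = rate
        self.capacity = capacity
        self.tokens = float(capacity)
        self.clock = clock
        self.last = clock()

    def allow(self) -> bool:
        now = self.clock()
        # Refill in proportion to elapsed time, never beyond capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


fake_now = [0.0]
bucket = TokenBucket(rate=2.0, capacity=2, clock=lambda: fake_now[0])
results = [bucket.allow() for _ in range(3)]  # burst of 2 allowed, third rejected
fake_now[0] = 0.5                             # 0.5 s later, one token has refilled
results.append(bucket.allow())
print(results)  # [True, True, False, True]
```

In a bus of this kind, one bucket per calling channel (for example mobile banking versus online banking) would keep a burst from one channel from starving the others.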
5. Applications of the Hybrid Data Warehouse
Implemented in 2015‑2016, the hybrid architecture delivered a real‑time data bus, a real‑time warehouse, and a historical warehouse. Performing standardized data conversion in the source systems simplified downstream development, reduced the impact of source‑system schema changes, and lowered overall project complexity.
In mobile banking, the real‑time warehouse powers an instant full‑asset view for customers, enabling rapid balance updates without overloading core systems.
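One way to serve an instant full-asset view without querying core systems is to maintain per-customer aggregates incrementally as each transaction arrives, in the spirit of the lightweight aggregates kept in Redis. The sketch below is hypothetical (class `AssetView`, product labels, customer IDs) and uses `Decimal` rather than floats so that monetary sums are exact.

```python
from collections import defaultdict
from decimal import Decimal


class AssetView:
    """Per-customer balances updated on every transaction event.

    A read is a single lookup, so the mobile channel never has to
    query the core system to show a customer's full asset position.
    """

    def __init__(self):
        # customer_id -> product type -> running balance
        self.totals = defaultdict(lambda: defaultdict(Decimal))

    def apply_txn(self, customer_id: str, product: str, amount: str) -> None:
        # Amounts arrive as strings to avoid binary floating-point rounding.
        self.totals[customer_id][product] += Decimal(amount)

    def full_view(self, customer_id: str) -> dict:
        by_product = dict(self.totals.get(customer_id, {}))
        return {"by_product": by_product, "total": sum(by_product.values(), Decimal("0"))}


view = AssetView()
view.apply_txn("C001", "deposit", "1000.00")
view.apply_txn("C001", "wealth", "5000.00")
view.apply_txn("C001", "deposit", "-200.50")
print(view.full_view("C001")["total"])  # 5799.50
```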
The hybrid approach expands data‑warehouse services beyond post‑transaction reporting to support pre‑ and in‑transaction data needs, thereby increasing the overall value of banking data.
dbaplus Community
