Design and Architecture of a Full‑Chain Data Warehouse for Information Security

The article presents a comprehensive design of an end‑to‑end data warehouse for information‑security governance, detailing background motivations, multi‑layer data architecture, dimension modeling, bus‑matrix mapping, real‑time (lambda/kappa) processing, data‑dictionary integration, and future directions toward unified streaming‑batch solutions.

58 Tech

Background – In the information-security business, massive volumes of heterogeneous data (features, policies, user behavior) must be analyzed and validated. This calls for a "full-chain" data warehouse that integrates data from all business lines into a dense, highly integrated data mesh, so that data becomes a proactive source of security capability.

Data Layering – The warehouse is divided into six layers:

| Seq | Data Layer | Abbreviation | Purpose |
| --- | --- | --- | --- |
| 1 | Raw Data Layer | RAW | Snapshot of source-system data, stored daily with full detail. |
| 2 | Basic Data Layer | ODS | Business-concept organized data with standardized names and codes. |
| 3 | General Data Layer | DWD | Fine-grained aggregated layer built on star or snowflake models; metrics and dimensions are standardized. |
| 4 | Aggregated Data Layer | DWS | Data marts for specific business needs, designed with star or snowflake schemas. |
| 5 | Dimension Layer | DIM | Dimension tables providing rich attributes, historical traceability, and consistency across common dimensions. |
| 6 | Temporary Layer | TMP | Transient tables to reduce computation difficulty and improve runtime efficiency. |
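
As a rough illustration of how the six layers might surface in day-to-day work, the sketch below assumes a naming convention in which every table name starts with its layer abbreviation (for example `dwd_login_event`). The convention, the table names, and the `layer_of` helper are all hypothetical; the article does not describe how tables are actually named.

```python
from enum import Enum


class Layer(Enum):
    """The six layers listed above, keyed by abbreviation."""
    RAW = "Raw Data Layer"
    ODS = "Basic Data Layer"
    DWD = "General Data Layer"
    DWS = "Aggregated Data Layer"
    DIM = "Dimension Layer"
    TMP = "Temporary Layer"


def layer_of(table_name: str) -> Layer:
    """Return the layer encoded in a table name's prefix.

    Assumes (hypothetically) that names look like '<abbr>_<rest>'.
    """
    prefix = table_name.split("_", 1)[0].upper()
    try:
        return Layer[prefix]
    except KeyError:
        raise ValueError(f"unknown layer prefix in table name: {table_name!r}") from None


# Hypothetical table names, used only to exercise the helper.
tables = ["raw_login_log", "dwd_login_event", "dim_user"]
by_layer = {t: layer_of(t).value for t in tables}
```

A check like this could run in a review hook so that a table without a recognized layer prefix is rejected early.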

Dimension Modeling – Two mainstream approaches (normalized vs. dimensional) are compared. Normalized warehouses require heavy upfront work but yield stable long‑term maintenance; dimensional modeling is more agile, suits frequently changing business, and demands less expertise. Four key steps are outlined: selecting business processes, declaring grain, identifying dimensions, and confirming facts.
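
The four steps can be made concrete with a tiny star schema. The sketch below uses Python's built-in `sqlite3` module purely for illustration. The business process (policy hits), the grain (one row per user, policy, and day), and all table and column names are assumptions, not the team's actual model.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Step 1: business process = policy hits (assumed example).
# Step 2: grain = one fact row per user, policy, and day.
# Step 3: dimensions = user and policy.
# Step 4: fact = hit_count.
conn.executescript("""
CREATE TABLE dim_user (user_key INTEGER PRIMARY KEY, user_type TEXT);
CREATE TABLE dim_policy (policy_key INTEGER PRIMARY KEY, policy_name TEXT);
CREATE TABLE fact_policy_hit (
    user_key INTEGER REFERENCES dim_user(user_key),
    policy_key INTEGER REFERENCES dim_policy(policy_key),
    hit_date TEXT,
    hit_count INTEGER
);
""")

conn.executemany("INSERT INTO dim_user VALUES (?, ?)",
                 [(1, "personal"), (2, "merchant")])
conn.executemany("INSERT INTO dim_policy VALUES (?, ?)",
                 [(10, "spam_post"), (11, "fake_account")])
conn.executemany("INSERT INTO fact_policy_hit VALUES (?, ?, ?, ?)", [
    (1, 10, "2024-01-01", 1),
    (2, 10, "2024-01-01", 1),
    (2, 11, "2024-01-02", 1),
])


def hits_by_policy(connection: sqlite3.Connection) -> dict:
    """Aggregate the fact table by an attribute of the policy dimension."""
    rows = connection.execute("""
        SELECT p.policy_name, SUM(f.hit_count)
        FROM fact_policy_hit AS f
        JOIN dim_policy AS p ON p.policy_key = f.policy_key
        GROUP BY p.policy_name
        ORDER BY p.policy_name
    """).fetchall()
    return dict(rows)
```

Because the fact table only stores keys and measures, a new analysis (for example by `user_type`) is a different join rather than a schema change, which is part of why the dimensional approach suits frequently changing business.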

Bus Matrix – The bus matrix acts as a map of the warehouse, linking each business process (rows) with common dimensions (columns). It provides a macro view of which processes share which dimensions, enabling quick alignment of data requirements with warehouse structures.
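
A bus matrix is small enough to hold in a plain data structure. The sketch below is one possible representation, with business processes as keys and their dimensions as sets. The process and dimension names are invented for illustration and do not come from the article.

```python
# Rows are business processes; each set lists the dimensions (columns)
# that process uses. All names here are hypothetical.
BUS_MATRIX = {
    "account_login": {"date", "user", "device"},
    "content_publish": {"date", "user", "content"},
    "policy_hit": {"date", "user", "policy"},
}


def conformed_dimensions(matrix: dict, min_processes: int = 2) -> list:
    """Return dimensions shared by at least `min_processes` processes."""
    counts = {}
    for dims in matrix.values():
        for dim in dims:
            counts[dim] = counts.get(dim, 0) + 1
    return sorted(d for d, n in counts.items() if n >= min_processes)


def processes_using(matrix: dict, dimension: str) -> list:
    """Return the business processes that reference a given dimension."""
    return sorted(p for p, dims in matrix.items() if dimension in dims)
```

Reading the matrix by column (`processes_using`) answers "where can this dimension be joined?", while `conformed_dimensions` highlights the dimensions that need consistent definitions across processes.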

Overall Architecture – The warehouse is split into three logical parts:

General warehouse: stores cross‑business capability data (e.g., hunter‑risk system, cloud authentication).

Business warehouse: built for specific industry‑level analyses.

Subject warehouse: unified, cross‑business subject areas (traffic, content, user, etc.) based on consistent dimensions.

This three-part design follows an IKEA analogy: a public floor (the general warehouse) serves developers, and a dedicated floor (the business warehouse) serves analysts.

Real‑Time Evolution – Two real-time architectures are discussed: Lambda (batch + stream) and Kappa (stream-only). Lambda offers flexibility, but it requires maintaining two engines and can produce inconsistent results between the batch and stream paths. Kappa simplifies the stack by using a message queue (e.g., Kafka) together with Flink, which enables stream-to-Hive writes and automatic small-file compaction.
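
The core idea of Kappa, a single processing path over a replayable log, can be shown without any real infrastructure. In the sketch below an in-memory list stands in for the message queue and a plain function stands in for the stream job. Neither uses Kafka or Flink APIs, and the event fields are assumptions.

```python
from collections import Counter
from typing import Callable, Iterable

Event = dict


def run_stream_job(log: Iterable[Event], key_fn: Callable[[Event], str]) -> Counter:
    """Fold every event in the log into a count per key (a materialized view)."""
    view = Counter()
    for event in log:
        view[key_fn(event)] += 1
    return view


# Stand-in for a retained, replayable message-queue topic.
event_log = [
    {"user": "u1", "action": "login", "risk": "low"},
    {"user": "u2", "action": "post", "risk": "high"},
    {"user": "u1", "action": "post", "risk": "high"},
]

# Version 1 of the job counts events per action.
view_v1 = run_stream_job(event_log, lambda e: e["action"])

# When the logic changes, the same log is replayed through the new job
# instead of maintaining a separate batch pipeline.
view_v2 = run_stream_job(event_log, lambda e: e["risk"])
```

In a Lambda setup the second view would typically need changes in both the batch and the stream code, which is where the double maintenance and the risk of inconsistent results come from.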

Data Dictionary – The data dictionary serves as the core metadata service (Hive Metastore). It supplies schema information to the streaming platforms, which enables zero-code configuration for feature extraction, model training, and online inference.
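
To illustrate what schema-driven, zero-code configuration could look like, the sketch below parses a streaming message using only column metadata. The hard-coded `SCHEMA` dict is a stand-in for a lookup against the metadata service. The table name, columns, and type names are hypothetical, and no real Hive Metastore client is called.

```python
import json

# Stand-in for schema metadata fetched from the data dictionary.
# Table and column names are hypothetical.
SCHEMA = {
    "dwd_login_event": [
        ("user_id", "string"),
        ("fail_count", "int"),
        ("risk_score", "double"),
    ],
}

# Map of metadata type names to Python casts (assumed type vocabulary).
CASTS = {"string": str, "int": int, "double": float}


def extract_features(table: str, raw_message: str) -> dict:
    """Cast a JSON message to the column types declared for `table`.

    Fields not declared in the schema are dropped; missing fields become None.
    """
    record = json.loads(raw_message)
    features = {}
    for name, type_name in SCHEMA[table]:
        value = record.get(name)
        features[name] = None if value is None else CASTS[type_name](value)
    return features


message = '{"user_id": 42, "fail_count": "3", "risk_score": "0.75", "extra": "ignored"}'
row = extract_features("dwd_login_event", message)
```

Adding a feature then means adding a column to the metadata rather than editing the extraction code, which is the sense in which the configuration is "zero-code".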

Future Outlook – The team is exploring data-lake-based stream-batch integration to replace the current Hive + Kafka pattern. It is also addressing emerging security challenges, such as attacks carried in unstructured images and text, which require new data-structuring and linkage solutions.

Tags: big data, real-time processing, data warehouse, information security, dimension modeling