Real-Time Advertising Data Warehouse Architecture Based on Flink

This article presents a comprehensive design of a real-time advertising data warehouse powered by Flink, covering construction background, technical and data‑warehouse architecture, real‑time OLAP, stability and data‑quality guarantees, future plans, and the integration of Hologres for simplified processing.

Big Data Technology & Architecture

Abstract: Real‑time data warehouses aim to provide low‑latency metrics for business decisions. This article introduces the construction of an advertising real‑time data warehouse based on Flink, outlining background, technical architecture, warehouse layers, real‑time OLAP, assurance mechanisms, and future directions.

Construction Background: Traditional advertising optimization relies on next‑day (T+1) metrics, which cannot meet the rapid adjustment needs during large‑scale promotional events. Real‑time data pipelines are required to adjust audience, region, and bidding strategies instantly.

Technical Architecture: Flink serves as the stream processing engine, chosen for its high performance, data consistency, and SQL‑style programming. Server logs and MySQL change data are ingested through Kafka, while offline data is synchronized to HBase with DataX. Flink then processes the streams in four steps: parsing, normalization, widening (joining dimension attributes onto each record), and aggregation. Results are stored in HBase (key‑value) and Hologres (columnar) to serve different query scenarios.
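
The four processing steps can be illustrated with a minimal plain-Python sketch. This is not Flink code: the field names (`cid`, `type`, `cost`), the `CAMPAIGN_DIM` lookup table, and every function name are hypothetical stand-ins chosen only to show the shape of each step.

```python
import json
from collections import defaultdict

# Hypothetical dimension lookup (campaign_id -> advertiser). In the article,
# this kind of lookup data is synchronized into HBase.
CAMPAIGN_DIM = {"c1": "adv_A", "c2": "adv_B"}


def parse(raw_line):
    """Parse one raw JSON log line into a dict."""
    return json.loads(raw_line)


def normalize(event):
    """Map raw fields to a uniform schema with cleaned values."""
    return {
        "campaign_id": str(event["cid"]).strip().lower(),
        "event_type": event["type"].strip().lower(),
        "cost": float(event.get("cost", 0.0)),
    }


def widen(event, dim):
    """Attach dimension attributes to the event (the widening step)."""
    widened = dict(event)
    widened["advertiser"] = dim.get(event["campaign_id"], "unknown")
    return widened


def aggregate(events):
    """Sum exposures, clicks, and spend per advertiser."""
    totals = defaultdict(lambda: {"exposure": 0, "click": 0, "spend": 0.0})
    for e in events:
        bucket = totals[e["advertiser"]]
        if e["event_type"] in ("exposure", "click"):
            bucket[e["event_type"]] += 1
        bucket["spend"] += e["cost"]
    return dict(totals)


def run_pipeline(raw_lines, dim=CAMPAIGN_DIM):
    """Chain the four steps over an iterable of raw log lines."""
    return aggregate(widen(normalize(parse(line)), dim) for line in raw_lines)
```

In a real deployment each step would be a Flink operator or SQL statement reading from Kafka; the sketch only shows how the output of one step feeds the next.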

Data Warehouse Architecture: The warehouse follows a layered model similar to offline designs but with fewer layers to reduce latency. It consists of:

Data Source Layer: DB logs (advertiser, campaign metadata) and server logs (exposure, click, user behavior).

Middle Layer: DIM (dimension) layer built from DB logs via full and incremental loads; DWD (detail) layer merges search and recommendation logs and pre‑joins dimension tables.

Application Layer: Real‑time dashboards, merchant back‑office metrics, real‑time features, and multi‑dimensional analysis, each exposing exposure, click, and spend data.

The real‑time design reduces hierarchy compared to offline warehouses and performs dimension integration in the DWD layer to avoid downstream joins.
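
As a rough illustration of the DIM and DWD layers described above, the following sketch builds a dimension table from a full load plus an ordered change log, then pre-joins it into detail records. The record shapes (`op`, `row`, `id`, `budget`) and the function names are assumptions made for this example, not the article's actual schema.

```python
def build_dim(full_snapshot, change_log):
    """Build a dimension table from a full load, then apply incremental changes in order."""
    dim = {row["id"]: dict(row) for row in full_snapshot}
    for change in change_log:
        key = change["row"]["id"]
        if change["op"] == "delete":
            dim.pop(key, None)
        else:
            # "insert" or "update": keep the latest row image for this key
            dim[key] = dict(change["row"])
    return dim


def to_dwd(event, dim):
    """Pre-join dimension attributes into a detail record so downstream jobs need no join."""
    campaign = dim.get(event["campaign_id"], {})
    return {
        **event,
        "advertiser": campaign.get("advertiser"),
        "budget": campaign.get("budget"),
    }
```

Because each DWD record already carries its dimension attributes, application-layer jobs can read it directly, which is the latency saving the layered design aims for.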

Real‑Time OLAP: Real‑time OLAP addresses two main challenges: (1) operational data requirements change rapidly, so pre‑computing every metric in Flink is costly; (2) mutable MySQL data, such as budget updates, needs point‑in‑time analysis. In the proposed architecture, Flink writes processed detail data into an OLAP store, which serves online queries and also supports offline batch processing that produces layered results.
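
Point-in-time analysis of mutable data comes down to keeping every version of a value and reading the one in effect at a given moment. A minimal sketch of that idea follows, assuming integer timestamps and a hypothetical `VersionedValue` helper; an OLAP store would express the same lookup as a query over the change history.

```python
from bisect import bisect_right


class VersionedValue:
    """Keeps every version of a mutable value so it can be read as of any time."""

    def __init__(self):
        self._times = []   # change timestamps, kept sorted
        self._values = []  # value in effect from the matching timestamp onward

    def record(self, ts, value):
        """Store a new version; late-arriving changes are inserted in time order."""
        idx = bisect_right(self._times, ts)
        self._times.insert(idx, ts)
        self._values.insert(idx, value)

    def as_of(self, ts):
        """Return the value in effect at ts, or None if no version existed yet."""
        idx = bisect_right(self._times, ts)
        return self._values[idx - 1] if idx else None
```

For example, after recording budget changes at times 100 and 300, `as_of(250)` returns the budget set at time 100 rather than the latest one.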

Real‑Time Assurance:

Stability Assurance: Run stress tests ahead of peak traffic, assign priority levels to tasks, and monitor failover, checkpoint, GC, and back‑pressure metrics.

Data Quality Assurance: Ensure correctness through idempotent writes and deduplication (e.g., with row_number), and keep the offline and real‑time pipelines consistent with each other. Guarantee timeliness through stress testing and continuous monitoring of processing progress.
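
The two correctness techniques can be sketched in plain Python. The deduplication function mirrors the common SQL pattern of keeping the row where `ROW_NUMBER() OVER (PARTITION BY key ORDER BY ts)` equals 1, and the sink shows why key-based upserts are idempotent. The field names `event_id` and `ts` and the `IdempotentSink` class are hypothetical.

```python
def dedup_first(events, key="event_id", order_by="ts"):
    """Keep the earliest row per key, like selecting row_number = 1 per partition."""
    first = {}
    for e in events:
        k = e[key]
        if k not in first or e[order_by] < first[k][order_by]:
            first[k] = e
    return sorted(first.values(), key=lambda e: e[order_by])


class IdempotentSink:
    """Key-based upsert: replaying the same rows leaves the stored state unchanged."""

    def __init__(self):
        self.rows = {}

    def write(self, row):
        # Overwrite by primary key instead of appending, so retries are harmless
        self.rows[row["event_id"]] = row
```

If a job restarts from a checkpoint and replays part of its input, the sink ends up in the same state as after a single clean run, which is what makes replays safe.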

Future Planning: Introduce a real‑time DWS layer to consolidate common summary metrics, reducing duplicate calculations and resource waste across merchant dashboards and feature pipelines.
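
One way to picture the planned DWS layer: aggregate the detail rows once, then let each consumer read from the shared summary instead of recomputing it. The sketch below assumes integer second timestamps, a 60-second window, and hypothetical function and field names.

```python
from collections import defaultdict


def build_dws(dwd_rows, window_seconds=60):
    """Aggregate detail rows once into per-advertiser, per-window summaries."""
    summary = defaultdict(lambda: {"exposure": 0, "click": 0, "spend": 0.0})
    for row in dwd_rows:
        window_start = row["ts"] - row["ts"] % window_seconds
        cell = summary[(row["advertiser"], window_start)]
        if row["event_type"] in ("exposure", "click"):
            cell[row["event_type"]] += 1
        cell["spend"] += row.get("cost", 0.0)
    return dict(summary)


def dashboard_view(dws, advertiser):
    """Merchant dashboard consumer: totals for one advertiser across all windows."""
    total = {"exposure": 0, "click": 0, "spend": 0.0}
    for (adv, _window), cell in dws.items():
        if adv == advertiser:
            for metric in total:
                total[metric] += cell[metric]
    return total


def ctr_feature(dws, advertiser, window_start):
    """Feature pipeline consumer: click-through rate for one window."""
    cell = dws.get((advertiser, window_start))
    if not cell or cell["exposure"] == 0:
        return 0.0
    return cell["click"] / cell["exposure"]
```

Both consumers read the same `dws` result, so the per-window counting happens only once.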

Deep OLAP Applications: Extend OLAP usage beyond operations to merchant dashboards, leveraging native sorting and pagination capabilities to replace costly in‑memory post‑processing.
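
Pushing sorting and pagination into the store means the application receives only one page of rows instead of loading everything and sorting in memory. The sketch below uses Python's built-in `sqlite3` purely as a stand-in for the OLAP engine; the `campaign_stats` table and its columns are hypothetical.

```python
import sqlite3


def make_store(rows):
    """Create an in-memory table standing in for the OLAP store."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE campaign_stats (campaign_id TEXT PRIMARY KEY, spend REAL)"
    )
    conn.executemany("INSERT INTO campaign_stats VALUES (?, ?)", rows)
    return conn


def top_spend_page(conn, page, page_size):
    """Let the store sort and slice; only one page of rows reaches the application."""
    offset = (page - 1) * page_size
    cur = conn.execute(
        "SELECT campaign_id, spend FROM campaign_stats "
        "ORDER BY spend DESC, campaign_id LIMIT ? OFFSET ?",
        (page_size, offset),
    )
    return cur.fetchall()
```

The secondary sort key `campaign_id` keeps page boundaries stable when several campaigns have the same spend.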

Hologres HSAP Architecture: Hologres, Alibaba’s interactive analytics product, implements Hybrid Serving/Analytical Processing (HSAP). It provides a row store for KV queries and a column store for multi‑dimensional analysis, so it can replace HBase and simplify the overall data‑processing stack.

Written by

Big Data Technology & Architecture

Wang Zhiwu, a big data expert, dedicated to sharing big data technology.
