How Flink + Hologres Power Real‑Time Streaming Warehouses

This article explains how combining Flink with Hologres creates a unified, real‑time streaming warehouse, detailing traditional layering approaches, the advantages of the Hologres‑based solution, core capabilities like Binlog and resource isolation, and a practical e‑commerce case study demonstrating performance gains.

Alibaba Cloud Big Data AI Platform

Jul 12, 2024

Traditional Real‑Time Warehouse Layering Solutions

1. Flink + Kafka Layering Scheme

Flink processes data in real time and writes the results to a key‑value engine for queries. Building the ODS, DWD, and DWS layers requires repeated round trips between Flink and Kafka. The hierarchy is clear, but the approach incurs heavy data synchronization, high resource consumption, and complex pipelines. Schema evolution is limited, and the intermediate data held in Kafka is hard to inspect.

2. Scheduled Layering Scheme

Flink writes data to the real‑time warehouse forming the DWD layer; high‑frequency scheduling (e.g., minute‑level) builds DWS and ADS layers, providing near‑real‑time updates. This reduces cost compared to Flink+Kafka but adds latency, turning the real‑time warehouse into a near‑real‑time one.

3. Materialized View Layering Scheme

Flink writes data to the warehouse, then materialized views generate DWS/ADS layers. While queries benefit from high QPS point lookups, materialized view support is immature, often batch‑oriented, and cannot meet strict real‑time requirements.

Real‑Time Warehouse Hologres + Flink Layering Scheme

This solution addresses the shortcomings of traditional approaches by using Flink together with Hologres.

1. Introduction to Hologres

Hologres is a unified, real‑time, elastic data warehouse engine that supports OLAP analysis, ad‑hoc queries, point queries, and vector computation. It has set records on the TPC‑H benchmark, offers row‑column coexistence for high‑throughput writes and updates, and provides high‑availability architectures that isolate workloads from one another.

2. Flink + Hologres Streaming Warehouse Scheme

Data flows from MySQL (or other sources) through Flink into the Hologres ODS layer. Hologres generates Binlog for the ODS tables, which Flink reads as source tables. Flink processes the changes and writes the results back to Hologres to form the DWD layer, and the same step repeats to form the DWS layer. Every layer is queryable as soon as it is written. This removes Kafka from the pipeline, simplifies it, and lets downstream services read from the warehouse directly.
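The layer-to-layer flow can be illustrated with a small in-memory model. This is only a sketch under stated assumptions: the `Table` class, the subscriber callbacks, and the table and column names are all hypothetical stand-ins, and the callbacks play the role that Flink jobs reading a change feed would play in the real architecture. No Flink or Hologres API is used.

```python
from typing import Callable, Dict, List


class Table:
    """Toy stand-in for a warehouse table that emits a change feed."""

    def __init__(self, name: str, key: str) -> None:
        self.name = name
        self.key = key
        self.rows: Dict[object, dict] = {}
        self.subscribers: List[Callable[[dict], None]] = []

    def upsert(self, row: dict) -> None:
        # Store the row, then notify every subscriber of the change
        self.rows[row[self.key]] = row
        for notify in self.subscribers:
            notify(row)


# Hypothetical layer tables
ods_orders = Table("ods_orders", key="order_id")
dwd_orders = Table("dwd_orders", key="order_id")
dws_user_spend = Table("dws_user_spend", key="user_id")


def ods_to_dwd(row: dict) -> None:
    # Cleaning step: convert the amount from cents to currency units
    dwd_orders.upsert({
        "order_id": row["order_id"],
        "user_id": row["user_id"],
        "amount": row["amount_cents"] / 100,
    })


def dwd_to_dws(row: dict) -> None:
    # Recompute the user's total from DWD rows (simple, not incremental)
    total = sum(
        r["amount"] for r in dwd_orders.rows.values()
        if r["user_id"] == row["user_id"]
    )
    dws_user_spend.upsert({"user_id": row["user_id"], "total": total})


# Wire the layers: each layer consumes the change feed of the one below
ods_orders.subscribers.append(ods_to_dwd)
dwd_orders.subscribers.append(dwd_to_dws)

ods_orders.upsert({"order_id": 1, "user_id": "u1", "amount_cents": 1250})
ods_orders.upsert({"order_id": 2, "user_id": "u1", "amount_cents": 750})
```

After the two writes, all three tables hold data and can be read directly, which is the property the scheme relies on: no intermediate layer is hidden inside a message queue.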

3. Deep Integration of Flink and Hologres

Hologres can act as a Flink source table (via Binlog), a dimension table (via row‑column coexistence), and a result table (with high‑throughput writes and partial updates). It also provides a unified catalog for metadata management and schema evolution.

Hologres + Flink Enterprise Real‑Time Warehouse Core Capabilities

1. Hologres Binlog

Similar to MySQL Binlog, it records insert, delete, and update events, providing incremental change data for real‑time processing across warehouse layers, enabling stateful event‑driven development.
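To make the idea of incremental change data concrete, the sketch below replays a list of change events to rebuild the current state of a table. The event labels, the split of an update into a before image and an after image, and the `replay` helper are assumptions made for illustration; they are not the actual Hologres Binlog record format.

```python
from typing import Dict, List, Tuple

# Event labels are assumptions for this sketch, not the real Binlog encoding
INSERT = "INSERT"
BEFORE_UPDATE = "BEFORE_UPDATE"
AFTER_UPDATE = "AFTER_UPDATE"
DELETE = "DELETE"

Event = Tuple[str, dict]


def replay(events: List[Event], key: str) -> Dict[object, dict]:
    """Rebuild the latest table state from an ordered change log."""
    state: Dict[object, dict] = {}
    for op, row in events:
        k = row[key]
        if op in (INSERT, AFTER_UPDATE):
            state[k] = row          # new or updated image replaces the old one
        elif op == DELETE:
            state.pop(k, None)      # remove the row if it is present
        # BEFORE_UPDATE carries the old image; state replay does not need it
    return state


binlog: List[Event] = [
    (INSERT, {"order_id": 1, "status": "created"}),
    (INSERT, {"order_id": 2, "status": "created"}),
    (BEFORE_UPDATE, {"order_id": 1, "status": "created"}),
    (AFTER_UPDATE, {"order_id": 1, "status": "paid"}),
    (DELETE, {"order_id": 2, "status": "created"}),
]

orders = replay(binlog, key="order_id")
```

A downstream consumer that processes events in order only needs the log, not a full table scan, to stay in sync with the upstream layer.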

2. Row‑Column Coexistence

Tables store both row and column formats, supporting high‑performance Binlog consumption, dimension table joins, and versatile OLAP or point‑query workloads.
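As a rough illustration of why keeping both layouts helps, the sketch below maintains a row store and a column store for the same data. The `HybridTable` class and its fields are hypothetical and deliberately simplified (no deletes, no persistence); this is not how Hologres implements storage.

```python
from typing import Dict, List


class HybridTable:
    """Toy table that keeps every row in both a row and a column layout."""

    def __init__(self, key: str, columns: List[str]) -> None:
        self.key = key
        self.row_store: Dict[object, dict] = {}                    # key -> full row
        self.col_store: Dict[str, list] = {c: [] for c in columns}  # column -> values
        self.position: Dict[object, int] = {}                      # key -> column index

    def upsert(self, row: dict) -> None:
        k = row[self.key]
        self.row_store[k] = dict(row)
        if k in self.position:
            # Existing key: overwrite the values at its column position
            i = self.position[k]
            for c, values in self.col_store.items():
                values[i] = row[c]
        else:
            # New key: append to every column and remember the position
            self.position[k] = len(self.position)
            for c, values in self.col_store.items():
                values.append(row[c])

    def get(self, k: object) -> dict:
        # Point lookup is served from the row layout
        return self.row_store[k]

    def column_sum(self, column: str) -> float:
        # Analytical scan touches only one column
        return sum(self.col_store[column])


products = HybridTable(key="sku", columns=["sku", "price", "stock"])
products.upsert({"sku": "A", "price": 10, "stock": 5})
products.upsert({"sku": "B", "price": 20, "stock": 7})
products.upsert({"sku": "A", "price": 12, "stock": 4})  # update of an existing key
```

Here `products.get("A")` stands in for a dimension-table lookup, while `products.column_sum("stock")` stands in for an OLAP-style aggregate over the same table.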

3. Resource Isolation

Hologres uses compute groups (Warehouses) to separate resources for offline writes, real‑time writes, OLAP queries, and online services, achieving full load isolation and elastic scaling.
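The effect of load isolation can be sketched with independent slot pools, one per workload type. Everything here is a hypothetical model: the `ComputeGroup` class, the slot counts, and the workload names are invented for illustration and do not reflect how Hologres schedules or sizes compute groups.

```python
from typing import Dict


class ComputeGroup:
    """Hypothetical model of an isolated pool of execution slots."""

    def __init__(self, name: str, slots: int) -> None:
        self.name = name
        self.slots = slots
        self.in_use = 0

    def try_acquire(self) -> bool:
        # Admit the request only if this group still has a free slot
        if self.in_use < self.slots:
            self.in_use += 1
            return True
        return False


# One group per workload type named in the text (slot counts are made up)
GROUPS: Dict[str, ComputeGroup] = {
    "offline_write": ComputeGroup("offline_write", slots=2),
    "realtime_write": ComputeGroup("realtime_write", slots=2),
    "olap": ComputeGroup("olap", slots=2),
    "serving": ComputeGroup("serving", slots=2),
}

# Hypothetical routing of workloads to groups
ROUTES: Dict[str, str] = {
    "batch_load": "offline_write",
    "flink_sink": "realtime_write",
    "dashboard": "olap",
    "recommendation": "serving",
}


def submit(workload: str) -> bool:
    """Route a workload to its group and report whether it was admitted."""
    return GROUPS[ROUTES[workload]].try_acquire()


# Saturate the OLAP group, then check that serving is unaffected
olap_results = [submit("dashboard") for _ in range(3)]
serving_result = submit("recommendation")
```

The third dashboard query is rejected because the OLAP group is full, yet the recommendation request is still admitted, since it draws from a separate pool.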

E‑Commerce Platform Streaming Warehouse Practice

1. Business Background and Architecture

A typical e‑commerce platform syncs MySQL orders, payments, and product catalog to Hologres ODS via Flink Catalog, then builds DWD and DWS layers for real‑time aggregation and downstream services.

2. ODS Real‑Time Sync

Flink Catalog enables real‑time synchronization of an entire database to Hologres, with support for automatic schema evolution and incremental updates.

3. DWD Real‑Time Sync

Flink reads the ODS Binlog, performs multi‑stream joins and dimension‑table lookups, and writes a wide DWD table back to Hologres using partial updates (Fixed Plan) for high throughput.
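A partial update means each writer supplies only the columns it owns, and the wide row is assembled by merging on the primary key. The sketch below shows that merge with a plain dictionary. The `partial_upsert` helper and the column names are hypothetical, and the Fixed Plan execution path mentioned above is not modeled.

```python
from typing import Dict


def partial_upsert(table: Dict[object, dict], key: str, row: dict) -> None:
    """Merge only the supplied columns into the row identified by `key`."""
    existing = table.setdefault(row[key], {})
    existing.update(row)  # columns absent from `row` keep their old values


dwd_order_wide: Dict[object, dict] = {}

# Order stream writes the order columns
partial_upsert(dwd_order_wide, "order_id",
               {"order_id": 1001, "user_id": "u1", "amount": 80})

# Payment stream writes only the payment column for the same key
partial_upsert(dwd_order_wide, "order_id",
               {"order_id": 1001, "pay_status": "paid"})

# Dimension lookup result fills in the product attribute
partial_upsert(dwd_order_wide, "order_id",
               {"order_id": 1001, "product_name": "keyboard"})
```

Because the writers never overwrite one another's columns, the streams can arrive in any order and still converge on the same wide row.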

4. DWS Real‑Time Aggregation

Binlog from DWD is consumed by Flink to compute real‑time metrics, producing user‑ and merchant‑level aggregate tables in DWS.
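Aggregating from a change log means updates and deletes must subtract the old values before the new ones are added. A minimal sketch of that retraction logic follows. The event labels, the `apply_event` helper, and the column names are assumptions made for illustration, and amounts are integers to keep the arithmetic exact.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Event labels are assumptions for this sketch
INSERT = "INSERT"
BEFORE_UPDATE = "BEFORE_UPDATE"
AFTER_UPDATE = "AFTER_UPDATE"
DELETE = "DELETE"

Event = Tuple[str, dict]


def apply_event(totals: Dict[str, Dict[str, int]], event: Event) -> None:
    """Add new images to the aggregate and retract old ones."""
    op, row = event
    sign = 1 if op in (INSERT, AFTER_UPDATE) else -1
    agg = totals[row["user_id"]]
    agg["orders"] += sign
    agg["amount"] += sign * row["amount"]


dwd_binlog: List[Event] = [
    (INSERT, {"order_id": 1, "user_id": "u1", "amount": 100}),
    (INSERT, {"order_id": 2, "user_id": "u1", "amount": 50}),
    (BEFORE_UPDATE, {"order_id": 2, "user_id": "u1", "amount": 50}),
    (AFTER_UPDATE, {"order_id": 2, "user_id": "u1", "amount": 80}),
    (INSERT, {"order_id": 3, "user_id": "u2", "amount": 30}),
    (DELETE, {"order_id": 1, "user_id": "u1", "amount": 100}),
]

# User-level aggregate table: user_id -> {"orders": count, "amount": sum}
dws_user: Dict[str, Dict[str, int]] = defaultdict(lambda: {"orders": 0, "amount": 0})
for ev in dwd_binlog:
    apply_event(dws_user, ev)
```

The same pattern applies to merchant-level aggregates by grouping on a merchant key instead of `user_id`.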

5. Building Data Applications

Downstream services query the DWD layer for detailed reports, or the DWS layer for point lookups (e.g., recommendation engines), with millisecond‑level latency.

Customer Case

37 Mobile Games (37手游) adopted Flink + Hologres to unify its real‑time and offline warehouses, eliminating complex ETL pipelines, reducing latency to milliseconds, supporting schema evolution without job restarts, and improving query performance by over 100%.

The integrated architecture delivers instant data visibility, million‑scale updates with sub‑second delay, and simpler operations. It supports diverse scenarios such as acquisition optimization, reporting, fine‑grained operations, user profiling, and intelligent diagnostics.
