Big Data 13 min read

Understanding Data Lakes, Data Warehouses, and Real-Time Analytics with Hologres

This article analyzes the challenges of traditional data lake and warehouse architectures, explains why unified storage and compute are needed for real‑time and batch workloads, and introduces Hologres as a cloud‑native, high‑performance engine that combines PostgreSQL compatibility with Flink‑driven analytics to deliver a true real‑time data warehouse solution.

Big Data Technology Architecture

Authors Jiang Xiaowei (Alibaba Cloud researcher) and Jin Xiaojun (Alibaba Cloud senior technical expert) present a technical overview of data warehouses, data lakes, and the emerging stream‑batch unified solutions, focusing on the business problems they aim to solve.

They first describe a typical real‑time business scenario in which user behavior data or binlog is ingested into Kafka, processed by Flink, and enriched with dimension tables stored in HBase or Cassandra before being served to downstream applications. The same pattern is common in machine‑learning pipelines.
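The enrichment step in this scenario can be sketched in plain Python. This is only an illustration of the pattern, not Flink or HBase code: the `USER_DIM` dictionary, the `enrich` function, and the event fields are hypothetical stand-ins for the dimension store, the stream job, and the Kafka payload.

```python
from typing import Dict, Iterable, Iterator

# Hypothetical dimension table. In the scenario described it would live in
# HBase or Cassandra; a dict stands in for the point-lookup store here.
USER_DIM: Dict[str, Dict[str, str]] = {
    "u1": {"city": "Hangzhou", "tier": "gold"},
    "u2": {"city": "Beijing", "tier": "silver"},
}


def enrich(events: Iterable[dict], dim: Dict[str, Dict[str, str]]) -> Iterator[dict]:
    """Join each event with its dimension row, one point lookup per event."""
    for event in events:
        attrs = dim.get(event["user_id"], {})  # missing key: no attributes added
        yield {**event, **attrs}


# Stand-in for events consumed from a Kafka topic.
clicks = [
    {"user_id": "u1", "item": "A"},
    {"user_id": "u3", "item": "B"},  # no dimension row, passes through unenriched
]
enriched = list(enrich(clicks, USER_DIM))
```

Each event triggers one key lookup, which is why this step is usually backed by a store optimized for point queries rather than scans.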

The authors then discuss how increasing architectural complexity—adding more storage systems such as ClickHouse, Druid, Hive, and others—creates redundancy, high maintenance cost, and steep learning curves. They point out that point‑query stores (HBase/Cassandra) and column‑store analytical engines (ClickHouse/Druid) each require separate data copies, leading to inefficiencies.

They identify the classic Lambda architecture as the de‑facto solution, with separate speed and batch layers, but note its pain points: data duplication, costly synchronization, and difficulty scaling query QPS.
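A toy version of the Lambda pattern makes the duplication visible. The sketch below is an assumption-laden illustration in plain Python: `batch_view`, `speed_view`, and `serve` are hypothetical names, and in a real deployment each would run on a different system.

```python
from collections import Counter


def batch_view(events):
    """Batch layer: recompute page counts from the full history up to a cutoff."""
    return Counter(e["page"] for e in events)


def speed_view(events):
    """Speed layer: the same counting logic, maintained separately and
    incrementally over events that arrived after the batch cutoff."""
    counts = Counter()
    for e in events:
        counts[e["page"]] += 1
    return counts


def serve(batch, speed):
    """Serving layer: merge both views at query time."""
    return batch + speed


history = [{"page": "home"}, {"page": "cart"}, {"page": "home"}]  # before cutoff
recent = [{"page": "home"}]                                       # after cutoff
merged = serve(batch_view(history), speed_view(recent))
```

The counting logic exists twice and the two results must be reconciled on every query, which is the duplication and synchronization cost the authors point to.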

To simplify, they propose a unified architecture where both real‑time and batch data reside in a single storage system that supports diverse workloads. They argue that a truly unified store can eliminate the need for multiple isolated systems.

The discussion then shifts to data lakes, which store raw data in HDFS, OSS, or S3 and expose it via Hive, Spark, or Flink. While attractive, data lakes suffer from incremental write latency, limited query concurrency, and high resource consumption for interactive analytics.

Based on this analysis, the authors introduce the HSAP (Hybrid Serving/Analytical Processing) concept, which merges analytical and serving workloads into one system and leaves the traditional transaction‑oriented layer out of scope.
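The idea of one store answering both serving and analytical queries can be shown with a small sketch. It uses Python's built-in `sqlite3` purely as a stand-in so the example runs anywhere; it is not Hologres and says nothing about its performance. The `orders` table and the helper names are hypothetical.

```python
import sqlite3

# In-memory SQLite database standing in for a single unified store.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (order_id INTEGER PRIMARY KEY, user_id TEXT, amount REAL)"
)
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(1, "u1", 20.0), (2, "u2", 35.5), (3, "u1", 4.5)],
)


def get_order(order_id):
    """Serving workload: low-latency point lookup by primary key."""
    cur = conn.execute(
        "SELECT user_id, amount FROM orders WHERE order_id = ?", (order_id,)
    )
    return cur.fetchone()


def revenue_by_user():
    """Analytical workload: aggregate scan over the same table."""
    cur = conn.execute(
        "SELECT user_id, SUM(amount) FROM orders GROUP BY user_id ORDER BY user_id"
    )
    return dict(cur.fetchall())
```

Both functions read the same copy of the data, so there is no second system to keep in sync. The hard part, which this sketch does not address, is making both query shapes fast at scale.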

They present Hologres, Alibaba Cloud’s next‑generation real‑time interactive engine, as the concrete implementation of HSAP. Hologres combines PostgreSQL compatibility with a cloud‑native, storage‑compute separated architecture deployed on Kubernetes, supporting shared storage (Pangu, HDFS, OSS, S3) and elastic compute scaling.

Key advantages of Hologres include fully asynchronous, lock‑free high‑concurrency writes, in‑memory caching for fast queries, vectorized columnar processing, mixed‑load scheduling to protect fast queries from slow ones, and customizable query engines that outperform generic storage solutions.

Typical use cases involve Flink streaming data into Hologres for real‑time ETL or model training, after which Hologres serves as the single source of truth for online services, dashboards, and federated queries, effectively acting as a real‑time data warehouse.

The authors conclude that a true real‑time data warehouse can be built simply with Flink + Hologres, where Flink handles both stream and batch processing and Hologres provides unified storage and high‑performance query capabilities.

Tags: big data, Flink, real-time analytics, data warehouse, Hologres, data lake