
How Hologres Powers Real‑Time Data Warehousing and Analytics in the Cloud

Hologres, Alibaba Cloud’s real‑time data warehouse, combines storage‑compute separation, high‑performance OLAP, instant write‑to‑query, serverless computing, and advanced indexing to enable low‑latency analytics, seamless data integration, and scalable, cost‑effective solutions across diverse use cases such as recommendation, monitoring, and advertising.

Alibaba Cloud Big Data AI Platform

Jun 20, 2024


Product Positioning

As technology advances, big data processing is shifting from batch to real time. Users no longer accept traditional T+1 (next‑day) analysis and expect far fresher results for scenarios such as real‑time dashboards, city‑brain traffic monitoring, risk control, and personalized recommendations across industries.

In typical scenarios, data is synchronized from transaction systems and logs to the warehouse in real time. Detailed data is processed instantly for interactive analysis in applications and BI, while aggregated data is also stored offline for layered warehousing. Real‑time data is joined with dimension tables, aggregated, and written to KV stores to serve recommendation systems.

Real‑time data platforms have become online business systems in their own right, combining batch processing, stream processing, OLAP analysis, and KV detail queries to support mixed workloads. Data processing typically follows three pipelines: offline archiving to MaxCompute or Hive, real‑time writes to OLAP systems such as ClickHouse or Druid, and stream processing with Flink, which aggregates data and writes the results to KV databases. Event streams may first be widened with dimension attributes stored in a KV store before they are aggregated.
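The stream pipeline described above (widen events with dimension attributes, aggregate, write to a KV store) can be sketched in plain Python. This is only an illustration of the data flow, not Flink or Hologres code; the event fields, the `DIM_ITEMS` table, and the function names are assumptions made for the example.

```python
from collections import defaultdict

# Hypothetical dimension table held in a KV store: item_id -> attributes.
DIM_ITEMS = {
    "i1": {"category": "shoes"},
    "i2": {"category": "bags"},
}


def widen(event, dim):
    """Attach dimension attributes to an event (a lookup join)."""
    attrs = dim.get(event["item_id"], {"category": "unknown"})
    return {**event, **attrs}


def aggregate(events, dim):
    """Widen each event, then sum amounts per category."""
    totals = defaultdict(float)
    for event in events:
        wide = widen(event, dim)
        totals[wide["category"]] += wide["amount"]
    return dict(totals)


events = [
    {"item_id": "i1", "amount": 10.0},
    {"item_id": "i2", "amount": 5.0},
    {"item_id": "i1", "amount": 2.5},
    {"item_id": "i9", "amount": 1.0},  # no matching dimension row
]

# The aggregated result stands in for what would be written to a KV database.
kv_store = aggregate(events, DIM_ITEMS)
```

In a real deployment the lookup and the aggregation would run continuously over an unbounded stream; the batch loop here only shows the order of the steps.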

Businesses often need to query both historical and real‑time data for comparison, e.g., comparing this year’s Double‑11 sales with last year’s. Federated query systems can be limited by the slowest source, so results are often cached in MySQL or Redis for reporting.

Building a real‑time warehouse this way involves more than ten data technologies, which results in a complex architecture, difficult data synchronization, high resource consumption, and data silos.

Hologres was created to address these challenges. It provides unified storage that serves both multi‑dimensional analysis and online application dashboards, and it supports high‑performance offline imports as well as real‑time updates that are queryable as soon as they are written. Together these make it a one‑stop solution for real‑time ingestion.

Typical Hologres applications include traditional BI reporting, real‑time dashboards, data middle‑platform, user profiling, personalized marketing, real‑time traffic monitoring, network monitoring, risk control, and live‑stream monitoring.

Hologres Core Functions and Advantages

Hologres uses a storage‑compute separation architecture. Each instance’s compute layer is containerized and consists of frontend nodes and worker nodes. The frontend handles protocol access, SQL parsing and optimization, real‑time writes, scheduling, and query planning, and uses a fixed‑plan technique to sustain writes at millions of RPS. Worker nodes execute KV queries and OLAP queries and can read external storage such as MaxCompute or OSS, which provides lake‑warehouse integration. The storage layer runs on Pangu, Alibaba’s self‑developed distributed file system, with tiered hot and cold storage. It supports row, column, and hybrid storage formats as well as a variety of indexes, balancing performance against cost.

Product Advantages

Real‑time OLAP analysis with high‑performance writes; writes are instantly queryable.

Columnar storage with multiple indexes (clustered, bitmap, dictionary) and vectorized engine for maximal hardware utilization.

Native primary‑key model with deduplication, full‑column updates, and partial updates.

Serving‑grade performance for millions of point queries, multi‑replica mode, automatic retry on replica failure, and physical resource isolation for read/write separation.

Lake‑warehouse interactive analysis, with queries on MaxCompute and other external tables returning within seconds, plus automatic table discovery that removes the need for manual DDL.

PostgreSQL compatibility, supporting ecosystem tools, extensions, and spatial analysis.
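The primary‑key model listed above distinguishes three write behaviors: deduplication, full‑column updates, and partial updates. The sketch below models them on an in‑memory dict so the difference is easy to see. The mode names (`ignore`, `replace`, `update`) and the three‑column schema are assumptions for this example, not Hologres identifiers; in PostgreSQL‑compatible SQL, comparable behavior is usually expressed with `INSERT ... ON CONFLICT`.

```python
COLUMNS = ("name", "city", "score")  # hypothetical table schema
MODES = ("ignore", "replace", "update")


def upsert(table, pk, row, mode="replace"):
    """Apply one write to `table`, a dict keyed by primary key.

    ignore:  keep the existing row if the key exists (deduplication)
    replace: overwrite the whole row; columns missing from `row` become None
    update:  overwrite only the columns present in `row` (partial update)
    """
    if mode not in MODES:
        raise ValueError(f"unknown mode: {mode}")
    existing = table.get(pk)
    if existing is None or mode == "replace":
        # New key, or full-row overwrite.
        table[pk] = {c: row.get(c) for c in COLUMNS}
    elif mode == "update":
        existing.update({c: v for c, v in row.items() if c in COLUMNS})
    # mode == "ignore" with an existing row: nothing to do.


users = {}
upsert(users, 1, {"name": "a", "city": "x", "score": 1})
upsert(users, 1, {"name": "b"}, mode="ignore")    # duplicate key is dropped
upsert(users, 1, {"score": 2}, mode="update")     # only `score` changes
```

A partial update is what lets several upstream jobs each fill in their own columns of the same wide row without overwriting one another.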

Key Features

At the time of writing, Hologres ranks first on the TPC‑H benchmark with a score of 27.86 million, demonstrating top‑tier OLAP performance.

Serverless Computing allows large tasks (INSERT/UPDATE/DELETE) to run in a fully managed pool, isolating resources per query, charging by compute usage and duration, eliminating the need to reserve resources for periodic large jobs.

Resource isolation is achieved through compute groups within a shared‑storage instance, providing physical isolation, fault isolation, elastic scaling, and high availability without query interruption.

The Fixed‑Plan write path bypasses the optimizer, coordinator, and execution engine, reducing latency to milliseconds and achieving more than one million RPS for writes, with support for row, column, and hybrid storage.

Hologres parses JSON at ingest, storing keys and values in columnar format, enabling efficient compression, reduced storage cost, and fast bitmap‑indexed queries on semi‑structured data.
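To make the idea concrete, here is a minimal sketch of turning JSON documents into per‑key columns and building a bitmap index over one of them. It assumes flat documents with scalar, hashable values and uses Python integers as uncompressed bitmaps; the function names are hypothetical, and this is not how Hologres implements its storage format.

```python
import json


def columnarize(json_lines):
    """Turn JSON documents into one value list per top-level key."""
    docs = [json.loads(line) for line in json_lines]
    keys = sorted({k for d in docs for k in d})
    # Rows that lack a key get None, so every column has the same length.
    return {k: [d.get(k) for d in docs] for k in keys}


def bitmap_index(column):
    """Map each distinct value to a bitmask of the row ids that hold it."""
    index = {}
    for row_id, value in enumerate(column):
        if value is not None:
            index[value] = index.get(value, 0) | (1 << row_id)
    return index


def matching_rows(bitmask):
    """Decode a bitmask back into a sorted list of row ids."""
    return [i for i in range(bitmask.bit_length()) if (bitmask >> i) & 1]


lines = [
    '{"user": "u1", "device": "ios"}',
    '{"user": "u2", "device": "android", "vip": true}',
    '{"user": "u3", "device": "ios"}',
]
columns = columnarize(lines)
device_index = bitmap_index(columns["device"])
ios_rows = matching_rows(device_index["ios"])
```

Because each key becomes its own column of similar values, it compresses well, and a filter such as `device = 'ios'` only needs to consult one bitmap rather than re‑parse every document.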

Cold‑hot tiered storage is configurable per table, automatically moving data between tiers without impacting queries.

Binlog capability turns Hologres into a data source, recording row‑level changes (INSERT, DELETE, BEFORE UPDATE, AFTER UPDATE) that can be consumed by JDBC or Flink for real‑time data pipelines and CDC.
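The four change types named above are enough for a downstream consumer to rebuild the table. The sketch below records them on an in‑memory table and replays them into a replica. The `Table` class, the tuple layout of a binlog entry, and the underscore spelling of the event names are assumptions for illustration; they do not reflect the actual Hologres Binlog format or its JDBC and Flink consumers.

```python
class Table:
    """In-memory table that records row-level changes in a binlog."""

    def __init__(self):
        self.rows = {}
        self.binlog = []  # entries are (event_type, primary_key, row)

    def write(self, pk, row):
        old = self.rows.get(pk)
        if old is None:
            self.binlog.append(("INSERT", pk, dict(row)))
        else:
            # An update is logged as the old image followed by the new image.
            self.binlog.append(("BEFORE_UPDATE", pk, dict(old)))
            self.binlog.append(("AFTER_UPDATE", pk, dict(row)))
        self.rows[pk] = dict(row)

    def delete(self, pk):
        old = self.rows.pop(pk, None)
        if old is not None:
            self.binlog.append(("DELETE", pk, old))


def replay(binlog):
    """Rebuild table state by applying binlog entries in order."""
    state = {}
    for event_type, pk, row in binlog:
        if event_type in ("INSERT", "AFTER_UPDATE"):
            state[pk] = dict(row)
        else:  # DELETE and BEFORE_UPDATE both retract the old image
            state.pop(pk, None)
    return state


orders = Table()
orders.write(1, {"status": "new"})
orders.write(1, {"status": "paid"})
orders.write(2, {"status": "new"})
orders.delete(2)
replica = replay(orders.binlog)
```

Logging the before image separately from the after image lets a consumer that maintains aggregates subtract the old value before adding the new one.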

Rich analytical functions support funnel, interval‑funnel, path, and retention analysis, enabling complex Sankey diagrams and retention metrics with orders‑of‑magnitude performance gains over traditional nested queries.
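As an illustration of what a funnel function computes, the sketch below measures how far each user gets through an ordered list of steps within a time window, then counts users per step. It is a plain‑Python model with hypothetical function names and an assumed `(timestamp, event_name)` event layout, not the implementation or signature of the built‑in Hologres functions.

```python
def funnel_depth(events, steps, window):
    """Return how many funnel steps a user completed, in order.

    events: list of (timestamp, event_name), sorted by timestamp
    steps:  non-empty ordered list of event names defining the funnel
    window: maximum time allowed between the first step and any later step
    """
    best = 0
    for i, (start_ts, name) in enumerate(events):
        if name != steps[0]:
            continue
        depth = 1  # the first step matched at start_ts
        for ts, later in events[i + 1:]:
            if ts - start_ts > window:
                break
            if depth < len(steps) and later == steps[depth]:
                depth += 1
        best = max(best, depth)
    return best


def funnel_report(events_by_user, steps, window):
    """Count how many users reached each step of the funnel."""
    counts = [0] * len(steps)
    for events in events_by_user.values():
        depth = funnel_depth(sorted(events), steps, window)
        for level in range(depth):
            counts[level] += 1
    return counts


steps = ["view", "cart", "pay"]
events_by_user = {
    "u1": [(0, "view"), (5, "cart"), (100, "view"), (110, "cart"), (120, "pay")],
    "u2": [(0, "view"), (50, "cart")],
    "u3": [(0, "cart"), (1, "pay")],
}
report = funnel_report(events_by_user, steps, window=30)
```

Expressing the same logic in SQL without a dedicated function typically requires one self‑join or nested subquery per step, which is where the performance gap mentioned above comes from.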

Native Roaring Bitmap and BSI functions provide high‑cardinality tag analysis for both attribute and behavior tags, delivering fast set operations and compressed storage.
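The sketch below shows the two ideas in miniature: set operations over attribute tags, and a bit‑sliced index (BSI) that sums a numeric behavior value over any selected group of users. Python integers stand in for bitmaps here, so there is no Roaring‑style compression; the tag names, user ids, and function names are all assumptions for the example rather than Hologres functions.

```python
def to_bitmap(user_ids):
    """Encode a collection of non-negative user ids as an integer bitmap."""
    bitmap = 0
    for uid in user_ids:
        bitmap |= 1 << uid
    return bitmap


def cardinality(bitmap):
    """Number of users in a (non-negative) bitmap."""
    return bin(bitmap).count("1")


def build_bsi(values):
    """Build bit slices from {user_id: non-negative int}.

    Slice k is the bitmap of users whose value has bit k set.
    """
    slices = []
    for uid, value in values.items():
        bit = 0
        while value:
            if value & 1:
                while len(slices) <= bit:
                    slices.append(0)
                slices[bit] |= 1 << uid
            value >>= 1
            bit += 1
    return slices


def bsi_sum(slices, users):
    """Sum the indexed value over the users in the `users` bitmap."""
    return sum(cardinality(s & users) << k for k, s in enumerate(slices))


# Attribute tags: one bitmap per tag value.
female = to_bitmap([1, 2, 5])
beijing = to_bitmap([2, 3, 5])
purchased = to_bitmap([5, 7])

target = female & beijing          # users with both tags
not_yet_bought = target & ~purchased

# Behavior tag: total spend per user, stored as a bit-sliced index.
spend = build_bsi({1: 10, 2: 7, 5: 20, 7: 3})
target_spend = bsi_sum(spend, target)
```

The sum never touches individual rows: it intersects each slice with the selected users and weights the population count by the slice's bit position, which is why the approach scales to very large user sets.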

Typical Cases

Case 1: Xiaohongshu (Social Commerce)

Hologres replaced a self‑built ClickHouse cluster for search and recommendation, eliminating high operational costs, extending data retention to 15 days, accelerating queries, providing built‑in deduplication, and supporting seamless schema evolution.

Case 2: Alibaba Mama (Advertising)

By integrating Flink with Hologres, Alibaba Mama achieved millisecond‑level OLAP and online services, enabling real‑time audience selection, vector‑based recall, and three‑fold development efficiency gains.

Case 3: 37 Mobile Games

After migrating to Hologres, 37 Mobile Games achieved millisecond write‑to‑query latency and automatic schema evolution via Flink, eliminated redundant storage layers, doubled query performance, and simplified operations.

