
How Alibaba’s Dolphin Engine Uses Flink + Hologres for Real‑Time Big Data

The Dolphin engine, built by Alibaba’s Data Engine team, combines Flink and Hologres to provide ultra‑large‑scale OLAP, streaming, batch, and AI computing for real‑time advertising analytics. It offers smart materialization, intelligent indexing, and vector recall, and serves millions of advertisers on petabyte‑scale data.

Alibaba Cloud Big Data AI Platform

Alibaba’s Data Engine team developed the Dolphin engine to power advertising marketing products, supporting millions of advertisers with petabyte‑scale data, millisecond‑level interactive queries, and unified OLAP, streaming, batch, and AI computing.

The engine consists of two main components: the SQL component, which handles SQL parsing, routing, load balancing, and federated queries; and the Index Build component, which handles smart indexing, multi‑level indexes (bitmap and time‑series), and scheduling.
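
The article does not describe how the SQL component makes routing decisions. Purely as an illustration, the following sketch assumes a query is routed to whichever engine owns all of its tables, is handed to a federated coordinator when its tables span engines, and is load-balanced by plain round-robin over replicas. `QueryRouter`, the engine names, and the replica addresses are all hypothetical.

```python
import itertools
import re
from dataclasses import dataclass, field


@dataclass
class QueryRouter:
    """Hypothetical router: sends a query to the engine that owns its tables,
    round-robining across that engine's replicas for load balancing."""
    table_to_engine: dict  # table name -> engine name
    replicas: dict         # engine name -> list of replica addresses
    _cursors: dict = field(default_factory=dict, init=False, repr=False)

    @staticmethod
    def tables_in(sql: str) -> set:
        # Naive FROM/JOIN scan; a real SQL component would use a full parser.
        return set(re.findall(r"\b(?:FROM|JOIN)\s+([A-Za-z_][\w.]*)", sql, re.IGNORECASE))

    def route(self, sql: str) -> tuple:
        engines = {self.table_to_engine[t] for t in self.tables_in(sql)}
        if not engines:
            raise ValueError("no tables found in query")
        if len(engines) > 1:
            # Tables span several engines: hand off to a federated-query coordinator.
            return ("federated", "coordinator")
        engine = engines.pop()
        if engine not in self._cursors:
            # One endless round-robin iterator per engine.
            self._cursors[engine] = itertools.cycle(self.replicas[engine])
        return (engine, next(self._cursors[engine]))
```

Regex-based table extraction keeps the example short; a production router would parse SQL properly and weigh replica health and load rather than cycling blindly.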

Key features include smart materialization (automatically converting frequent SQL queries into materialized views), intelligent indexing (analyzing query predicates to recommend optimal indexes), and approximate computing for large‑scale data.
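
One simple way to approximate both features, offered as a sketch rather than Dolphin's actual algorithm, is to mine the query log. Queries that differ only in literal values are collapsed into a shared fingerprint and counted (frequent shapes become materialization candidates), and columns are ranked by how often they appear in WHERE predicates (frequent ones become index candidates). The function names and the `min_count`/`top_n` thresholds are hypothetical.

```python
import re
from collections import Counter


def fingerprint(sql: str) -> str:
    """Normalize a query so runs that differ only in literal values collapse together."""
    s = re.sub(r"'[^']*'", "?", sql)          # string literals -> ?
    s = re.sub(r"\b\d+(?:\.\d+)?\b", "?", s)  # numeric literals -> ?
    return re.sub(r"\s+", " ", s).strip().lower()


def recommend_materializations(query_log, min_count=3):
    """Query shapes seen at least min_count times are materialized-view candidates."""
    counts = Counter(fingerprint(q) for q in query_log)
    return [fp for fp, n in counts.most_common() if n >= min_count]


def recommend_index_columns(query_log, top_n=2):
    """Rank columns by how often they appear in WHERE-clause predicates."""
    cols = Counter()
    for q in query_log:
        # Take the WHERE clause up to GROUP BY / ORDER BY / LIMIT or end of query.
        m = re.search(r"\bwhere\b(.*?)(?:\bgroup by\b|\border by\b|\blimit\b|$)", fingerprint(q))
        if m:
            # A column is an identifier directly followed by a comparison, IN, or BETWEEN.
            cols.update(re.findall(r"([a-z_]\w*)\s*(?:!=|=|<|>|\bin\b|\bbetween\b)", m.group(1)))
    return [c for c, _ in cols.most_common(top_n)]
```

Running the fingerprinting on the normalized text also keeps string literals from being mistaken for column names when predicates are scanned.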

Currently, Dolphin runs on over 20,000 cores, processes more than 200 million daily requests, and sustains 3,000+ QPS, serving a wide range of core business scenarios such as audience selection and insight analysis.
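
As a quick sanity check on these figures, assuming traffic were spread evenly across the day, 200 million daily requests corresponds to an average of roughly 2,300 requests per second:

```latex
\frac{200 \times 10^{6}\ \text{requests/day}}{86{,}400\ \text{s/day}} \approx 2{,}315\ \text{requests/s}
```

The 3,000+ QPS figure sits above this daily mean, which is consistent with it describing peak or sustained capacity rather than average load.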

The architecture relies on Flink for low‑latency stream processing and on Hologres for scalable storage, vector computation, and bitmap indexing; the team chose both for their performance and extensibility.

Engine Implementation Detail 1: Solving Ultra‑Large‑Scale OLAP

To handle joins across dozens of tables and trillion‑row datasets, Dolphin leverages a bitmap indexing solution co‑built with Hologres, enabling sub‑100 ms query latency and supporting over 200 QPS for massive OLAP workloads.
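
The article does not detail the bitmap solution built with Hologres, but the general reason bitmaps make multi-condition filtering cheap can be shown with a toy index. Each (column, value) pair maps to a bitset where bit i is set if row i has that value, so an `IN` predicate becomes a bitwise OR and combining predicates becomes a bitwise AND. This sketch uses plain Python integers as bitsets; `BitmapIndex` and its methods are hypothetical, and production systems typically use compressed formats such as Roaring bitmaps.

```python
from collections import defaultdict


class BitmapIndex:
    """Toy bitmap index: one bitset per (column, value), stored as a Python int
    where bit i is set if row i has that value."""

    def __init__(self):
        self._bitmaps = defaultdict(int)

    def add(self, row_id: int, **attrs) -> None:
        for col, val in attrs.items():
            self._bitmaps[(col, val)] |= 1 << row_id

    def eq(self, col, val) -> int:
        # .get() avoids inserting empty bitmaps for unseen values.
        return self._bitmaps.get((col, val), 0)

    def any_of(self, col, values) -> int:
        # OR together the bitmaps of every accepted value (an IN predicate).
        result = 0
        for v in values:
            result |= self.eq(col, v)
        return result

    @staticmethod
    def count(bitmap: int) -> int:
        return bin(bitmap).count("1")

    @staticmethod
    def row_ids(bitmap: int) -> list:
        ids = []
        while bitmap:
            lowest = bitmap & -bitmap  # isolate the lowest set bit
            ids.append(lowest.bit_length() - 1)
            bitmap ^= lowest           # clear it and continue
        return ids
```

An audience-selection filter such as "city in (hz, sh) and gender = f" then reduces to `idx.any_of("city", ["hz", "sh"]) & idx.eq("gender", "f")`, with no row-by-row scan.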

Engine Implementation Detail 2: Enabling Low‑Cost Real‑Time Development

Dolphin Streaming wraps Flink and Hologres behind a simple OpenAPI‑driven Dolphin SQL interface, allowing users to submit, pause, and manage jobs without deep knowledge of Flink or Hologres, dramatically reducing development effort.
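
The article does not document Dolphin Streaming's OpenAPI, so the following is only a sketch of the kind of job lifecycle such a facade hides: callers submit SQL, receive a job id, and pause, resume, or stop the job through a small state machine. `StreamingJobManager`, `JobState`, and the allowed transitions are all hypothetical.

```python
from enum import Enum


class JobState(Enum):
    RUNNING = "running"
    PAUSED = "paused"
    STOPPED = "stopped"


# action -> (states the action is allowed from, resulting state)
_TRANSITIONS = {
    "pause": ({JobState.RUNNING}, JobState.PAUSED),
    "resume": ({JobState.PAUSED}, JobState.RUNNING),
    "stop": ({JobState.RUNNING, JobState.PAUSED}, JobState.STOPPED),
}


class StreamingJobManager:
    """Hypothetical facade: callers submit SQL and manage jobs by id,
    never touching the underlying Flink or Hologres APIs directly."""

    def __init__(self):
        self._jobs = {}
        self._next_id = 1

    def submit(self, sql: str) -> int:
        job_id = self._next_id
        self._next_id += 1
        # A real service would compile `sql` into a streaming job here.
        self._jobs[job_id] = {"sql": sql, "state": JobState.RUNNING}
        return job_id

    def _apply(self, job_id: int, action: str) -> None:
        allowed, target = _TRANSITIONS[action]
        job = self._jobs[job_id]
        if job["state"] not in allowed:
            raise ValueError(f"cannot {action} job {job_id} in state {job['state'].value}")
        job["state"] = target

    def pause(self, job_id: int) -> None:
        self._apply(job_id, "pause")

    def resume(self, job_id: int) -> None:
        self._apply(job_id, "resume")

    def stop(self, job_id: int) -> None:
        self._apply(job_id, "stop")

    def state(self, job_id: int) -> JobState:
        return self._jobs[job_id]["state"]
```

Keeping the transition rules in one table makes invalid operations (such as resuming a stopped job) fail loudly instead of silently corrupting job state.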

Demo 1 shows how to compute the latest 50 user behavior events in three steps: define the source table, define the output table, and write the computation logic using Dolphin SQL.
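
The article does not reproduce the Dolphin SQL from Demo 1, so the computation logic is sketched here in plain Python instead, assuming "latest 50" is tracked per user and that each user's events arrive in timestamp order. The source table becomes a stream of `on_event` calls and the output table becomes `snapshot`; `LatestEvents` is a hypothetical name.

```python
from collections import defaultdict, deque


class LatestEvents:
    """Keeps the n most recent behavior events per user (hypothetical stand-in
    for the output table in Demo 1)."""

    def __init__(self, n: int = 50):
        # deque(maxlen=n) silently drops the oldest entry once n is exceeded.
        self._window = defaultdict(lambda: deque(maxlen=n))

    def on_event(self, user_id, ts, action) -> None:
        # Assumes events arrive in timestamp order for each user.
        self._window[user_id].append((ts, action))

    def snapshot(self, user_id) -> list:
        # .get() avoids creating an empty entry for unknown users.
        return list(self._window.get(user_id, ()))
```

In Flink SQL, a comparable result is usually expressed with the Top-N pattern (`ROW_NUMBER() OVER (PARTITION BY user_id ORDER BY event_time DESC)` filtered to `rn <= 50`); the article does not say whether Dolphin SQL compiles to that form.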

Demo 2 illustrates a real‑time debugging feature where a simple SELECT query on a registered table instantly returns the underlying data.

Business Scenario 1: Real‑Time Marketing Recommendation

By streaming merchant behavior logs into Hologres and reading features from it in real time, the system improves recommendation accuracy and more than triples development efficiency.
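
As an illustration of the write-then-read pattern, and not the production schema, the following sketch keeps one kind of real-time feature: how many times each merchant performed an action within a sliding time window. Log events are ingested as they stream in, and the recommender reads the current count at request time. `RealtimeFeatureStore` and its methods are hypothetical, and events are assumed to arrive in timestamp order.

```python
from collections import defaultdict, deque


class RealtimeFeatureStore:
    """Hypothetical in-memory stand-in for a feature table: per-merchant
    action counts over a sliding time window."""

    def __init__(self, window_seconds: int):
        self.window = window_seconds
        self._events = defaultdict(deque)  # (merchant_id, action) -> timestamps

    def ingest(self, merchant_id, action, ts) -> None:
        # Write path: append each log event as it streams in (assumed in ts order).
        self._events[(merchant_id, action)].append(ts)

    def feature(self, merchant_id, action, now) -> int:
        # Read path: evict timestamps that fell out of the window, then count.
        q = self._events.get((merchant_id, action))
        if q is None:
            return 0
        while q and q[0] <= now - self.window:
            q.popleft()
        return len(q)
```

Because eviction happens lazily on read, the cost of each lookup is proportional to the number of expired events, not the full history.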

Business Scenario 2: Vector Recall for Lookalike Audiences

Leveraging Hologres’ vector capabilities, Dolphin provides both real‑time (1000+ QPS, ~50 ms latency) and batch vector recall, enabling high‑performance lookalike targeting for advertising products.
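
The article does not specify how lookalike candidates are scored. One common approach, shown here as a brute-force sketch, averages the seed users' embeddings into a centroid and returns the k non-seed users with the highest cosine similarity to it. `lookalike` and the toy embeddings are hypothetical; at the stated QPS and latency, a real system would rely on an approximate nearest-neighbor index rather than scanning every user.

```python
import heapq
import math


def cosine(a, b) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def lookalike(seed_ids, embeddings: dict, k: int) -> list:
    """Return the k non-seed users closest to the centroid of the seed users."""
    seeds = set(seed_ids)
    dim = len(next(iter(embeddings.values())))
    centroid = [0.0] * dim
    for s in seeds:
        for i, x in enumerate(embeddings[s]):
            centroid[i] += x / len(seeds)
    # Score every candidate (brute force) and keep the top k by similarity.
    scored = ((cosine(centroid, v), uid) for uid, v in embeddings.items() if uid not in seeds)
    return [uid for _, uid in heapq.nlargest(k, scored)]
```

Using a centroid keeps the query to a single vector search per seed audience, which matches how a vector index is typically queried.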

In summary, the Flink + Hologres‑based Dolphin engine delivers high‑performance OLAP via bitmap indexing, simplifies real‑time development with Dolphin Streaming, and harnesses powerful AI vector recall, forming a tightly integrated solution for large‑scale advertising analytics.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact admin@besthub.dev and we will review it promptly.
