Feishu ShenNuo's Real-Time Data Warehouse with Flink, Hudi, and Hologres

Feishu ShenNuo redesigned its data architecture by integrating Flink, Hudi, and Hologres to create a cloud‑native real‑time data warehouse that supports both millisecond‑level ad monitoring and minute‑level game operations, offering scalable storage, low‑latency queries, and comprehensive monitoring and capacity planning.

Alibaba Cloud Big Data AI Platform

Background

Feishu ShenNuo Group provides end‑to‑end digital marketing services for overseas markets. Responding quickly to ad‑performance and game‑operation metrics requires real‑time or near‑real‑time data processing. Existing real‑time capabilities were built independently for each business line as Java services, which led to poor reusability, limited scalability, and latency that fell short of requirements.

Current Architecture Analysis

The current architecture is batch‑oriented: Kafka → Flink → MaxCompute (ODS) → ODPS‑SQL (ETL) → DW → Hologres (ADS), with jobs scheduled daily or hourly. To meet real‑time requirements, the team decided to redesign the architecture, focusing on reusability, scalability, and operational cost.

Current architecture diagram

Real‑Time Storage and Compute Selection

Real‑time latency requirements are divided into four levels. Ad‑effect monitoring needs second‑level or millisecond‑level latency, for which Hologres and Doris were considered. Game‑operation analytics can tolerate minute‑level latency, so it can use low‑cost lake storage queried through a real‑time warehouse engine to achieve near‑real‑time performance.
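The split between the two workloads can be pictured as a simple routing rule. The sketch below is illustrative only: the function name, the tier labels, and the 5‑second threshold are assumptions made for the example, not values from the team's design.

```python
from datetime import timedelta

# Hypothetical cut-off between "second-level" and "minute-level" workloads.
SECOND_LEVEL_LIMIT = timedelta(seconds=5)


def choose_storage(max_latency: timedelta) -> str:
    """Return an illustrative storage tier for a given latency requirement."""
    if max_latency <= SECOND_LEVEL_LIMIT:
        # Millisecond- or second-level needs: serve from the real-time warehouse.
        return "realtime_warehouse"
    # Minute-level needs: cheaper lake storage, queried by a warehouse engine.
    return "lake_storage"


print(choose_storage(timedelta(milliseconds=200)))  # ad-effect monitoring
print(choose_storage(timedelta(minutes=10)))        # game-operation analytics
```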

Real‑Time Warehouse Product Selection

Two products were compared:

Hologres – cloud‑native, storage‑compute separated, supports independent scaling and serverless lake acceleration.

Doris – storage‑compute integrated, limited scaling.

A TPC‑H benchmark (100 GB) shows Hologres to be more than 3× faster than Doris; ad‑monitoring tests show a speedup of more than 2×.

Key limitations of Doris:

No read/write separation; performance degrades under heavy write/DDL.

MySQL protocol limits (e.g., 36‑byte primary key, 1 MB STRING).

Complex table parameters; higher query latency.

Conclusion: Hologres is chosen for its scalability, ease of use, and commercial support.

Data Lake Storage Selection

For sub‑second ad monitoring, data is stored directly on Hologres SSD storage. For near‑real‑time game‑operation workloads (≤20,000 RPS, 10 min freshness), three lake formats were evaluated: Hudi, Delta, and Iceberg. Hudi was selected for its superior batch‑update performance, Bloom‑filter support, and streaming write capabilities.

Hudi offers two table types:

Copy‑On‑Write (COW) – infrequent writes, frequent reads.

Merge‑On‑Read (MOR) – frequent updates, fewer reads.
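As a rough illustration of how the table type is chosen at creation time, the helper below builds a Flink SQL DDL string for a Hudi table. The `connector`, `path`, and `table.type` options are standard Hudi Flink connector options; the table name, columns, bucket path, and the `write_heavy` flag are hypothetical and not taken from the article.

```python
def hudi_table_ddl(name: str, path: str, write_heavy: bool) -> str:
    """Build a Flink SQL DDL string for a Hudi table (illustrative schema)."""
    # MOR suits frequently updated tables; COW suits read-heavy, rarely written ones.
    table_type = "MERGE_ON_READ" if write_heavy else "COPY_ON_WRITE"
    return (
        f"CREATE TABLE {name} (\n"
        "  event_id STRING,\n"          # hypothetical record key
        "  payload STRING,\n"           # hypothetical log body
        "  ts TIMESTAMP(3),\n"          # hypothetical event time
        "  PRIMARY KEY (event_id) NOT ENFORCED\n"
        ") WITH (\n"
        "  'connector' = 'hudi',\n"
        f"  'path' = '{path}',\n"
        f"  'table.type' = '{table_type}'\n"
        ");"
    )


# Example: a frequently updated ODS table, so MOR is selected.
print(hudi_table_ddl("ods_game_events", "oss://example-bucket/ods/game_events", True))
```

The resulting statement would be submitted through a Flink SQL client or table environment; the sketch only assembles the text.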

Core advantages of Hudi:

Supports CDC streams for real‑time sync.

Enables shared ODS tables for batch and streaming.

Integrates with Presto, Trino, Spark, StarRocks, Hologres.

Schema evolution for log field extensions.

Partial updates for deduplication or back‑fill.
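The deduplication and back‑fill behaviour in the last item can be sketched in plain Python. This is not Hudi's implementation, only a toy model of keyed upserts under two assumptions: records carry a key field and an ordering field (named `event_id` and `ts` here, both hypothetical), and a `None` value in an incoming record means "leave the stored value alone".

```python
from typing import Any, Dict, Iterable

Record = Dict[str, Any]


def upsert(table: Dict[str, Record], batch: Iterable[Record],
           key: str = "event_id", ordering: str = "ts") -> Dict[str, Record]:
    """Merge a batch into a keyed table, keeping the newest version per key."""
    for rec in batch:
        current = table.get(rec[key])
        if current is None:
            # First time this key is seen: insert as-is.
            table[rec[key]] = dict(rec)
        elif rec[ordering] >= current[ordering]:
            # Same or newer version: overwrite only the fields that are set,
            # so missing fields are back-filled rather than erased.
            merged = dict(current)
            merged.update({k: v for k, v in rec.items() if v is not None})
            table[rec[key]] = merged
        # Older versions are ignored, which drops late replays.
    return table


table: Dict[str, Record] = {}
upsert(table, [
    {"event_id": "e1", "ts": 1, "level": 3, "country": None},
    {"event_id": "e1", "ts": 1, "level": 3, "country": None},     # duplicate
    {"event_id": "e1", "ts": 2, "level": None, "country": "BR"},  # back-fill
])
print(table["e1"])
```

After the three records are applied, a single row remains for `e1`, carrying the level from the first record and the country from the back‑fill.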

Hudi ODS workflow

Real‑Time Compute Options

Three compute options were evaluated (Flink → Hudi → Hologres, etc.). Considering latency, resource usage, development cost, and complexity, the first option (Flink → Hudi → Hologres) best matches the scenario (≤20,000 RPS, 10 min freshness).

This solution uses external tables or views in Hologres for real‑time processing, requiring no additional development effort from data‑warehouse engineers.

Full‑Link Monitoring

Monitoring covers data integrity, latency, and capacity planning. Alerts are sent via email or SMS. Overload protection and service isolation are configured for Hologres (1 min query timeout, read/write isolation) and Flink‑Hudi (1 min checkpoint, hot/cold storage split).
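A latency check of this kind can be as small as comparing the newest queryable event time against the freshness target. The sketch below assumes the 10‑minute freshness requirement for game‑operation data; the function name and the alert text are hypothetical, and delivery by email or SMS is left out.

```python
from datetime import datetime, timedelta
from typing import Optional

FRESHNESS_TARGET = timedelta(minutes=10)  # freshness goal for game-operation data


def check_freshness(latest_event_time: datetime, now: datetime,
                    target: timedelta = FRESHNESS_TARGET) -> Optional[str]:
    """Return an alert message if the newest queryable record is too old."""
    lag = now - latest_event_time
    if lag > target:
        return f"ALERT: data lag {lag} exceeds target {target}"
    return None  # within target, nothing to report


now = datetime(2024, 1, 1, 12, 0, 0)
print(check_freshness(datetime(2024, 1, 1, 11, 56, 30), now))  # within target
print(check_freshness(datetime(2024, 1, 1, 11, 48, 0), now))   # lagging
```

In practice `latest_event_time` would come from a query such as the maximum event timestamp in the serving table, run on a schedule.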

Capacity Evaluation

Performance tests show end‑to‑end latency stabilizes at ~3.5 minutes, meeting business goals. Flink baseline resources support 6,000 RPS with 60 s checkpoints; scaling resources linearly increases capacity. Hologres benchmark (32 CPU / 128 GB, 2 nodes) processes 20 million records in ~9 minutes, indicating the system can handle up to ~4,166 RPS within the 10‑minute processing window.
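If Flink capacity really does scale linearly with resources, sizing for a target load reduces to a ceiling division against the 6,000 RPS baseline. The helper below is a back‑of‑the‑envelope sketch: the notion of a "baseline unit" and the 20,000 RPS target used in the example are assumptions for illustration.

```python
import math

BASELINE_RPS = 6_000  # throughput of the Flink baseline resources in the test


def baseline_units_needed(target_rps: int, baseline_rps: int = BASELINE_RPS) -> int:
    """Number of baseline-sized resource units needed, assuming linear scaling."""
    if target_rps <= 0:
        return 0
    # Round up: a fractional unit still requires a whole extra allocation.
    return math.ceil(target_rps / baseline_rps)


print(baseline_units_needed(20_000))  # example target load
```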

Future Outlook

The Flink + Hudi + Hologres architecture provides low‑cost, low‑latency real‑time data processing on Alibaba Cloud. Future improvements include leveraging Hudi CDC for higher‑speed scenarios and exploring multi‑cloud disaster‑recovery designs.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact admin@besthub.dev and we will review it promptly.