
Cutting Compute Costs with MaxCompute Materialized Views: Strategies and Results

This article details how MaxCompute leverages fuzzy materialized views, DAG scheduling adjustments, public layer mining, and FBI acceleration techniques to reduce compute resource consumption by up to 10%, improve task visibility, and achieve significant daily savings in large‑scale data warehouse environments.

Alibaba Cloud Big Data AI Platform

01 In-Cluster MaxCompute Materialized View Practice

In MaxCompute, we prioritize fuzzy materialized views over precise ones because they deliver higher overall benefit by merging similar views, reducing computation repetitions, and enabling downstream task reuse.

Fuzzy views cannot use AutoMV, so we must manually schedule their creation during the job scheduling phase in DataWorks, ensuring downstream nodes (e.g., nodes E, F, G) can consume the view after upstream tasks complete.

When a fuzzy view is recommended, it typically represents a common query shared by multiple downstream jobs; creating a single view upstream of those jobs satisfies the prerequisite for all of them.

By adjusting the DAG to insert the materialized view node, we guarantee the view runs before downstream tasks without delaying overall pipeline execution; in many cases it even speeds up processing because downstream jobs avoid redundant computation.
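The DAG adjustment above can be sketched as a small dependency rewrite: the view node is inserted so it runs after its source task and before every downstream consumer. This is an illustrative sketch, not a MaxCompute or DataWorks API; `insert_mv_node` and the task names are assumptions.

```python
# Illustrative sketch (not a platform API): rewiring a task DAG so a
# materialized-view node runs after its source task and before every
# downstream consumer. `deps` maps each task to its prerequisite set.

def insert_mv_node(deps, mv_name, upstreams, consumers):
    deps[mv_name] = set(upstreams)           # the MV waits for its sources
    for c in consumers:                      # consumers now also wait on the MV
        deps[c] = deps.get(c, set()) | {mv_name}
    return deps

# Tasks E, F, G all depend on A and share a common sub-query:
deps = {"A": set(), "E": {"A"}, "F": {"A"}, "G": {"A"}}
insert_mv_node(deps, "MV1", upstreams=["A"], consumers=["E", "F", "G"])
```

Because the view node is placed on an existing edge of the pipeline, it adds no extra critical-path stages; the consumers simply read the precomputed result instead of recomputing the shared sub-query.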

The design also improves user visibility of task changes compared to AutoMV, which hides optimizations in the execution plan.

From a cost perspective, fuzzy views incur extra compute before they are used, unlike AutoMV which is zero‑cost; therefore we must balance performance gains against added resource consumption.

We evaluate a view's benefit as (reuse count − 1) × compute cost − storage cost; to achieve net savings, a view must therefore serve at least two downstream tasks. To reduce storage overhead, we implement a polling mechanism that recycles a view once all of its downstream jobs finish, similar to cache eviction.
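The benefit formula and the recycling rule can be expressed directly; the helper names below are illustrative, not actual platform code.

```python
# Sketch of the benefit formula and recycle check from the text.

def view_benefit(reuse_count, compute_cost, storage_cost):
    # Each reuse beyond the first avoids one recomputation of the
    # shared sub-query; storing the view's result costs storage.
    return (reuse_count - 1) * compute_cost - storage_cost

def should_recycle(consumers, finished_jobs):
    # Poll: drop the view once every downstream consumer has finished,
    # analogous to cache eviction.
    return set(consumers) <= set(finished_jobs)
```

With a single consumer the benefit is negative (the storage cost is pure overhead), which is why at least two downstream tasks are required to break even.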

Additional safeguards include timeout killing, forced success on errors, and conflict‑aware task protection to ensure stable execution.

Since April 2023, materialized views have been deployed across 114 project spaces and 15 business units within Alibaba, covering roughly 70–80% of compute‑intensive tasks. They optimize about 450,000 CU and save roughly 40,000 CU daily, a 9–10% overall efficiency gain.

02 Materialized View – Public Layer Mining

Beyond compute savings, we explore persisting materialized views as a public intermediate layer (similar to DWD/DWS) to share reusable logic across teams. However, divergent data quality requirements make a unified public layer challenging.

When materialized view results are frequently recycled, downstream tasks that need longer‑lived data may miss the benefit, so we identify scenarios where a view should be promoted to a permanent table: high downstream reuse, complex SQL, large UDFs, etc.
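The promotion criteria above can be framed as a simple heuristic check. The field names and thresholds here are assumptions for illustration, not the actual rules used in production.

```python
# Hypothetical promotion heuristic: decide when a recycled materialized
# view should instead be persisted as a permanent public-layer table.

def should_promote(view, reuse_threshold=5, complexity_threshold=100):
    # Promote on high downstream reuse, complex SQL, or a large UDF.
    return (view["downstream_reuse"] >= reuse_threshold
            or view["sql_complexity"] >= complexity_threshold
            or view["has_large_udf"])

candidate = {"downstream_reuse": 8, "sql_complexity": 40, "has_large_udf": False}
should_promote(candidate)  # high reuse alone qualifies it
```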

When a view originates from DWD/DWS layers and contains aggregations or complex UDFs, persisting its SQL as a public table yields greater returns, gradually converging the public layer.

03 Materialized View – FBI Acceleration Scheme

We aim to apply materialized views to online reporting (FBI) scenarios. Offline‑prepared datasets are submitted as SQL jobs by the reporting platform rather than through the scheduler.

By requesting all recommended views and decomposing them into a virtual DAG, we can construct a pipeline where each view is computed once and cached, accelerating report generation.

When a new query arrives, the system dynamically determines required upstream views, stores them in the database, and reuses them, achieving fast report response.
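The compute-once-then-reuse path can be sketched as a keyed store in front of the SQL engine. `run_sql`, the view store, and the view name below are hypothetical stand-ins, not the reporting platform's actual interfaces.

```python
# Sketch of the FBI-side reuse path: each recommended view is computed
# once, stored, and served from the store for later report queries.

computed = []                 # records which views were actually executed

def run_sql(view_sql):
    computed.append(view_sql)
    return f"rows-for:{view_sql}"

view_store = {}

def get_view(view_sql):
    if view_sql not in view_store:       # first request: compute and persist
        view_store[view_sql] = run_sql(view_sql)
    return view_store[view_sql]          # later requests: reuse stored result

get_view("mv_daily_orders")   # computed on first access
get_view("mv_daily_orders")   # served from the store, no recomputation
```

In practice the key would be a normalized fingerprint of the view's SQL rather than the raw text, so equivalent queries from different reports map to the same cached result.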

Written by

Alibaba Cloud Big Data AI Platform

The Alibaba Cloud Big Data AI Platform builds on Alibaba’s leading cloud infrastructure, big‑data and AI engineering capabilities, scenario algorithms, and extensive industry experience to offer enterprises and developers a one‑stop, cloud‑native big‑data and AI capability suite. It boosts AI development efficiency, enables large‑scale AI deployment across industries, and drives business value.
