Cutting Compute Costs with MaxCompute Materialized Views: Strategies and Results
This article details how MaxCompute leverages fuzzy materialized views, DAG scheduling adjustments, public layer mining, and FBI acceleration techniques to reduce compute resource consumption by up to 10%, improve task visibility, and achieve significant daily savings in large‑scale data warehouse environments.
01 In-Cluster MaxCompute Materialized View Practice
In MaxCompute, we prioritize fuzzy materialized views over precise ones because they deliver higher overall benefit by merging similar views, reducing computation repetitions, and enabling downstream task reuse.
Fuzzy views are not supported by AutoMV, so their creation must be scheduled manually during the job scheduling phase in DataWorks, ensuring that downstream nodes (e.g., nodes E, F, G) can consume the view once upstream tasks complete.
When a fuzzy view is recommended, it typically represents a common query shared by multiple downstream jobs; creating a single view upstream of those jobs satisfies the prerequisite for all of them.
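To make "a common query shared by multiple downstream jobs" concrete, here is a minimal Python sketch of fuzzy candidate detection. Queries that differ only in literals are grouped under one normalized signature; any group with two or more members is a candidate for a single shared fuzzy view. The function names and the text-level normalization are illustrative assumptions — a production matcher would normalize the parsed query plan, not raw SQL text:

```python
import re
from collections import defaultdict

def normalize_sql(sql: str) -> str:
    """Crude normalization: lowercase, replace literals with placeholders,
    and collapse whitespace so queries differing only in constants
    produce the same fuzzy signature."""
    s = sql.strip().lower()
    s = re.sub(r"'[^']*'", "?", s)   # string literals -> ?
    s = re.sub(r"\b\d+\b", "?", s)   # numeric literals -> ?
    s = re.sub(r"\s+", " ", s)       # collapse whitespace
    return s

def group_fuzzy_candidates(queries: dict) -> dict:
    """Group job ids by normalized query text; groups with >= 2 members
    are candidates for one shared fuzzy materialized view."""
    groups = defaultdict(list)
    for job_id, sql in queries.items():
        groups[normalize_sql(sql)].append(job_id)
    return {sig: jobs for sig, jobs in groups.items() if len(jobs) >= 2}
```

Two downstream jobs scanning the same table for different partitions would collapse into one candidate group, while an unrelated query stays out.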
By adjusting the DAG to insert the materialized view node, we guarantee the view runs before downstream tasks without delaying overall pipeline execution; in many cases it even speeds up processing because downstream jobs avoid redundant computation.
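The DAG adjustment itself is a simple dependency rewrite: the view node depends on the source tasks, and every consumer gains the view as an extra prerequisite. The sketch below models dependencies as a plain task-to-prerequisites map — a hypothetical structure for illustration, not a DataWorks API:

```python
def insert_mv_node(deps: dict, mv: str, upstreams: set, consumers: set) -> dict:
    """Rewire a dependency map (task -> set of prerequisite tasks) so the
    materialized-view task `mv` runs after its source tasks and before
    every consumer that will read from it."""
    new = {t: set(d) for t, d in deps.items()}  # copy, leave input intact
    new[mv] = set(upstreams)                    # view waits on its sources
    for c in consumers:
        new[c] = new.get(c, set()) | {mv}       # consumers now wait on view
    return new
```

Because the view node sits between existing layers rather than on a new critical path, total pipeline latency is unchanged or reduced when consumers skip the redundant computation.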
The design also improves user visibility of task changes compared to AutoMV, which hides optimizations in the execution plan.
From a cost perspective, fuzzy views incur extra compute before they are first used, unlike AutoMV, which adds no extra cost; we must therefore balance performance gains against the added resource consumption.
We evaluate a view's benefit as (reuse count − 1) × compute cost − storage cost; to achieve net savings, a view should therefore serve at least two downstream tasks. To reduce storage overhead, a polling mechanism recycles a view once all of its downstream jobs finish, much like cache eviction.
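The benefit formula and the polling-based recycle can be sketched in a few lines of Python. The data shapes (a view-to-consumers map, a set of finished task ids) are illustrative assumptions:

```python
def net_benefit(reuse_count: int, compute_cost: float, storage_cost: float) -> float:
    """Benefit formula from the text:
    (reuse count - 1) * compute cost - storage cost.
    With only one consumer the saved compute is zero, so the view
    can never break even against its storage cost."""
    return (reuse_count - 1) * compute_cost - storage_cost

def poll_and_recycle(views: dict, finished: set) -> list:
    """Drop any materialized view whose downstream consumers have all
    finished -- the cache-eviction-style cleanup described above.
    `views` maps view name -> set of consumer task ids."""
    recycled = [v for v, consumers in views.items() if consumers <= finished]
    for v in recycled:
        del views[v]
    return recycled
```

For example, a view costing 5 units of storage that saves a 10-unit query for three consumers nets (3 − 1) × 10 − 5 = 15 units, while a single-consumer view is strictly negative.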
Additional safeguards include timeout killing, forced success on errors, and conflict‑aware task protection to ensure stable execution.
Since April 2023, materialized views have been deployed across 114 spaces and 15 BUs within Alibaba, covering roughly 70-80% of compute-intensive tasks, optimizing about 450,000 CU and saving roughly 40,000 CU daily, a 9-10% overall efficiency gain.
02 Materialized View – Public Layer Mining
Beyond compute savings, we explore persisting materialized views as a public intermediate layer (similar to DWD/DWS) to share reusable logic across teams. However, divergent data quality requirements make a unified public layer challenging.
When materialized view results are frequently recycled, downstream tasks that need longer‑lived data may miss the benefit, so we identify scenarios where a view should be promoted to a permanent table: high downstream reuse, complex SQL, large UDFs, etc.
When a view originates from DWD/DWS layers and contains aggregations or complex UDFs, persisting its SQL as a public table yields greater returns, gradually converging the public layer.
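The promotion criteria above can be summarized as a rule of thumb. The sketch below encodes them as a boolean check; the reuse threshold and flag names are illustrative assumptions, not production values:

```python
def should_promote(reuse_count: int, has_aggregation: bool,
                   has_heavy_udf: bool, source_layer: str) -> bool:
    """Promote a materialized view to a permanent public-layer table when
    it originates from DWD/DWS, is heavily reused downstream, and
    encapsulates aggregation or heavy UDF logic.
    The reuse threshold (3) is an illustrative assumption."""
    return (source_layer in {"DWD", "DWS"}
            and reuse_count >= 3
            and (has_aggregation or has_heavy_udf))
```

Views that pass the check are persisted as shared tables, gradually converging the public layer; the rest stay short-lived and are recycled as usual.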
03 Materialized View – FBI Acceleration Scheme
We aim to apply materialized views to online reporting (FBI) scenarios. Offline‑prepared datasets are submitted as SQL jobs by the reporting platform rather than through the scheduler.
By requesting all recommended views and decomposing them into a virtual DAG, we can construct a pipeline where each view is computed once and cached, accelerating report generation.
When a new query arrives, the system dynamically determines required upstream views, stores them in the database, and reuses them, achieving fast report response.
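The compute-once-then-reuse flow for report queries amounts to a view cache in front of the SQL engine. Here is a minimal Python sketch; the class name and the `compute` callback (standing in for running the view's SQL) are hypothetical:

```python
class ViewCache:
    """Each recommended view is computed once, stored, and reused by
    later report queries -- the FBI-style acceleration described above."""
    def __init__(self, compute):
        self.compute = compute  # callback: view name -> materialized result
        self.store = {}
        self.hits = 0
        self.misses = 0

    def get(self, view: str):
        if view in self.store:
            self.hits += 1
        else:
            self.misses += 1
            self.store[view] = self.compute(view)
        return self.store[view]

def answer_report(cache: ViewCache, required_views: list) -> dict:
    """A new report query resolves its required upstream views through
    the cache, so shared views are never recomputed."""
    return {v: cache.get(v) for v in required_views}
```

Two reports that share a view trigger only one computation of it; subsequent requests are served from the store, which is what keeps report response times low.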
Alibaba Cloud Big Data AI Platform
The Alibaba Cloud Big Data AI Platform builds on Alibaba’s leading cloud infrastructure, big‑data and AI engineering capabilities, scenario algorithms, and extensive industry experience to offer enterprises and developers a one‑stop, cloud‑native big‑data and AI capability suite. It boosts AI development efficiency, enables large‑scale AI deployment across industries, and drives business value.