
How MaxCompute 4.0 Redefines Big Data with Open Architecture & AI Integration

The article outlines MaxCompute's evolution through three phases, highlights five emerging data‑warehouse trends, and details the MaxCompute 4.0 open, real‑time, cost‑effective architecture that unifies storage, compute, and AI to boost performance and developer productivity.

Alibaba Cloud Big Data AI Platform

Speaker: Zhang Zhiguo, Alibaba Cloud Intelligent Computing Platform researcher and MaxCompute lead.

Topic: MaxCompute architecture upgrade and open‑source interpretation.

MaxCompute has progressed through three stages: MaxCompute 1.0 focused on massive data‑processing capability; MaxCompute 2.0 emphasized serverless elasticity and cost‑effectiveness; MaxCompute 3.0 targeted lakehouse (lake‑warehouse) integration, offline‑online convergence, and more.

Five trends drive this evolution:

Data volumes continuously grow as data‑driven businesses expand.

Workloads diversify, covering structured, semi‑structured, and unstructured data, with AI raising warehouse requirements.

Real‑time and timeliness demands increase, requiring large‑scale streaming ingestion and real‑time warehousing.

Data accuracy expectations rise, prompting large‑scale governance and quality control.

AI‑driven decision making pushes for higher value extraction from existing data.

In response, Alibaba Cloud proposes the MaxCompute 4.0 open, integrated architecture, which enhances near‑real‑time processing, openness, cost‑effectiveness, and Data+AI integration.

MaxCompute 4.0 features high‑concurrency, real‑time streaming ingestion; data can be stored in MaxCompute’s native Pangu system or exported to OSS with AliORC as the internal format. A unified language data‑management service governs both native and open storage (OSS, HDFS) via a single API, enabling diverse compute engines to access data without copying.

Open Storage and Compute Architecture

The open storage layer exposes native data formats through open‑source memory formats for various compute engines. The open compute layer allows the built‑in engine to efficiently access data lakes via a unified API.
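The idea of one API over several storage backends can be sketched in a few lines. This is a minimal illustration only: `Catalog`, `InMemoryBackend`, the backend names, and the paths are all hypothetical and are not MaxCompute interfaces. The point is that an engine asks the catalog for a table by name and reads it where it lives, without first copying it into another system.

```python
from dataclasses import dataclass
from typing import Dict, Iterator, List, Tuple


@dataclass
class InMemoryBackend:
    """Stand-in for a real store such as native storage, OSS, or HDFS."""
    files: Dict[str, List[dict]]

    def scan(self, path: str, columns: List[str]) -> Iterator[dict]:
        # Yield only the requested columns, reading rows where they live.
        for row in self.files[path]:
            yield {name: row[name] for name in columns}


class Catalog:
    """Hypothetical metadata service: maps table names to (backend, path)."""

    def __init__(self, backends: Dict[str, InMemoryBackend]) -> None:
        self._backends = backends
        self._tables: Dict[str, Tuple[str, str]] = {}

    def register(self, table: str, backend: str, path: str) -> None:
        if backend not in self._backends:
            raise KeyError(f"unknown backend: {backend}")
        self._tables[table] = (backend, path)

    def scan(self, table: str, columns: List[str]) -> Iterator[dict]:
        # Same call for every table, whichever backend holds the data.
        backend, path = self._tables[table]
        return self._backends[backend].scan(path, columns)


catalog = Catalog({
    "native": InMemoryBackend({"/wh/orders": [{"id": 1, "amount": 30, "note": "a"}]}),
    "oss": InMemoryBackend({"bucket/clicks": [{"id": 7, "url": "/home", "ms": 12}]}),
})
catalog.register("orders", "native", "/wh/orders")
catalog.register("clicks", "oss", "bucket/clicks")

orders = list(catalog.scan("orders", ["id", "amount"]))
clicks = list(catalog.scan("clicks", ["id", "url"]))
```

Both scans go through the same `Catalog.scan` call, so the caller never needs to know which backend holds a table.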

MaxCompute 4.0 also introduces a near‑real‑time processing framework that achieves “one data, one code” for low‑cost, low‑maintenance batch and streaming pipelines, unifying real‑time and offline workloads under a single language data‑management model.
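One way to picture "one data, one code" is a single aggregation function that serves both a full batch run and a sequence of incremental runs. The sketch below is plain Python, not MaxCompute code; `sum_amounts` and the sample events are made up for illustration. It relies on the aggregation being a fold over events, so earlier results can be carried forward as state.

```python
from collections import Counter
from typing import Dict, Iterable, Optional


def sum_amounts(events: Iterable[dict],
                state: Optional[Dict[str, int]] = None) -> Dict[str, int]:
    """Add each event's amount to its user's total, starting from `state`."""
    totals = Counter(state or {})
    for event in events:
        totals[event["user"]] += event["amount"]
    return dict(totals)


events = [
    {"user": "ann", "amount": 5},
    {"user": "bob", "amount": 3},
    {"user": "ann", "amount": 2},
    {"user": "cat", "amount": 7},
    {"user": "bob", "amount": 1},
]

# Offline path: process the whole dataset in one pass.
batch_result = sum_amounts(events)

# Near-real-time path: the same function applied to small increments,
# with the previous result passed in as state.
incremental_result: Optional[Dict[str, int]] = None
for start in range(0, len(events), 2):
    incremental_result = sum_amounts(events[start:start + 2], incremental_result)
```

Because the same function handles both paths, there is only one piece of logic to maintain, and the two results agree on the same input.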

Cost‑Performance Enhancements

Performance improvements include adaptive SQL engine optimizations, better shuffle efficiency, and materialized‑view recommendations. Cost reductions stem from tiered storage based on data temperature, columnar JSON compression, elastic resource scheduling, and intelligent warehouse automation, with CU prices cut by up to 50%.
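Tiering by data temperature can be illustrated with a small classifier that assigns a partition to a tier from the time since its last access. The thresholds (30 and 180 days) and the tier names below are assumptions chosen for the example, not MaxCompute's actual policy.

```python
from datetime import date

# Assumed thresholds for illustration only.
HOT_MAX_AGE_DAYS = 30
WARM_MAX_AGE_DAYS = 180


def storage_tier(last_access: date, today: date) -> str:
    """Return 'hot', 'warm', or 'cold' based on days since last access."""
    age_days = (today - last_access).days
    if age_days <= HOT_MAX_AGE_DAYS:
        return "hot"
    if age_days <= WARM_MAX_AGE_DAYS:
        return "warm"
    return "cold"


today = date(2024, 7, 1)
partitions = {
    "ds=20240620": date(2024, 6, 25),
    "ds=20240301": date(2024, 3, 10),
    "ds=20230101": date(2023, 1, 15),
}
tiers = {name: storage_tier(seen, today) for name, seen in partitions.items()}
```

A real policy would likely weigh access frequency and retrieval cost as well; last-access age is simply the easiest signal to show.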

Elastic resource scheduling matches idle capacity with workload demand, offering quota recommendations based on historical usage.

The adaptive SQL engine performs stage‑level optimizations and dynamic operator tuning.

Storage optimization with the proprietary AliORC format delivers 2‑6× faster reads and writes and ~30% better compression than Parquet/ORC.

The intelligent warehouse provides closed‑loop automatic optimizations for storage and compute based on historical data.
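A quota recommendation based on historical usage could work roughly as follows: reserve enough capacity to cover most hours, and leave the remaining peaks to elastic resources. This is a sketch under that assumption, not the algorithm MaxCompute uses; `recommend_quota`, the 80th‑percentile default, and the sample numbers are all hypothetical.

```python
import math
from typing import Dict, List


def recommend_quota(hourly_cu_usage: List[int], percentile: int = 80) -> Dict[str, int]:
    """Split observed demand into a reserved baseline and an elastic remainder.

    Uses the nearest-rank percentile of past hourly CU usage as the baseline.
    """
    if not hourly_cu_usage:
        raise ValueError("need at least one usage sample")
    if not 0 < percentile <= 100:
        raise ValueError("percentile must be in (0, 100]")
    ordered = sorted(hourly_cu_usage)
    rank = max(1, math.ceil(percentile * len(ordered) / 100))
    reserved = ordered[rank - 1]
    peak = ordered[-1]
    return {"reserved_cu": reserved, "elastic_cu": peak - reserved}


# Ten hourly samples with two short spikes.
usage = [10, 12, 11, 50, 13, 12, 90, 11, 10, 12]
recommendation = recommend_quota(usage)
```

With these samples the baseline covers the steady load, and the two spikes fall into the elastic portion rather than inflating the reserved quota.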

Data+AI: One Env + One Data + One Code

MaxCompute 4.0 bridges the gap between big‑data and AI development by offering a Python‑based notebook environment and the MaxFrame distributed computing framework, fully compatible with Pandas APIs. A single line of code converts native Pandas to MaxFrame, enabling seamless data management, large‑scale analysis, and ML workflow integration.

These advancements aim to deliver open data access, inclusive compute engines, and support for diverse real‑time and incremental processing scenarios, with ongoing development in open architecture, incremental processing, and Data+AI.
