How MaxCompute’s AI‑Native Data Warehouse Redefines Big Data for the Generative AI Era

The article details how Alibaba Cloud is transforming MaxCompute into an AI‑native data warehouse. It highlights serverless elasticity, multimodal data management, a unified model lifecycle, AI Function integration, and a new distributed Python engine, which together address the bursty, highly complex data and compute challenges of the generative AI era.

Alibaba Cloud Big Data AI Platform

Oct 15, 2025

This article summarizes the MaxCompute announcement at the 2025 Cloud Conference, where Alibaba Cloud unveiled a major upgrade that transforms MaxCompute into an AI‑native data warehouse for the generative AI era.

The AI era brings massive volumes of multimodal, highly complex data and bursty compute demand. This exposes pain points such as fragmented unstructured data, development workflows split between SQL‑based ETL and Python‑based modeling, insufficient elastic compute, and weak engineering and operations capabilities.

MaxCompute addresses these with a unified “Data+AI” architecture that combines a unified data foundation, heterogeneous compute scheduling, and model‑data fusion; together these form the four core directions of the AI‑native warehouse.

Serverless remains the core, offering a shared compute pool with on‑demand, pay‑as‑you‑go usage, auto‑scaling, and full GPU support; tests show that 100,000 CUs can be launched within 10 seconds.
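A toy model helps show what a shared, pay‑as‑you‑go pool means in practice: jobs draw compute units (CUs) from common capacity when they need them, return them when done, and are billed only for the CU‑seconds they consumed. The `SharedPool` class, its methods, and the `price_per_cu_second` rate are hypothetical illustrations, not MaxCompute APIs or real pricing, and the sketch does not model auto‑scaling of the pool itself.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class SharedPool:
    """Toy shared compute pool; a hypothetical illustration, not a MaxCompute API."""
    capacity_cu: int
    in_use_cu: int = 0
    allocated: Dict[str, int] = field(default_factory=dict)  # CUs currently held per job
    usage: Dict[str, float] = field(default_factory=dict)    # CU-seconds consumed per job

    def acquire(self, job: str, cu: int) -> int:
        """Grant up to `cu` units from whatever capacity is still free."""
        granted = min(cu, self.capacity_cu - self.in_use_cu)
        self.in_use_cu += granted
        self.allocated[job] = self.allocated.get(job, 0) + granted
        return granted

    def release(self, job: str, seconds: float) -> None:
        """Return all of a job's CUs to the pool and meter the CU-seconds used."""
        cu = self.allocated.pop(job, 0)
        self.in_use_cu -= cu
        self.usage[job] = self.usage.get(job, 0.0) + cu * seconds

    def bill(self, job: str, price_per_cu_second: float) -> float:
        """Pay-as-you-go: charge only for the CU-seconds actually consumed."""
        return self.usage.get(job, 0.0) * price_per_cu_second


pool = SharedPool(capacity_cu=1000)
granted = pool.acquire("etl", 800)    # burst job gets 800 CUs
spill = pool.acquire("train", 500)    # only 200 CUs remain free
pool.release("etl", seconds=60)
pool.release("train", seconds=30)
print(granted, spill, pool.in_use_cu)  # 800 200 0
print(pool.bill("etl", price_per_cu_second=0.001))  # hypothetical unit price
```

For scale, the quoted figure of 100,000 CUs in 10 seconds corresponds to an average ramp‑up of 10,000 CUs per second; this toy ignores launch latency entirely.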

Multimodal data management is realized through Object Table, which maps files in OSS (images, audio, video, PDFs) to table objects with unified metadata, and the Blob type, which stores unstructured content alongside structured fields. Together they enable mixed‑storage rows and schema‑on‑read access to data‑lake assets.
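The Object Table idea, exposing files in object storage as rows of metadata that point at the files rather than copying them, and the Blob idea, keeping raw bytes next to ordinary columns, can be sketched in plain Python. The listing, the `ObjectRow` and `MixedRow` types, and their columns are hypothetical illustrations of the concept; they do not reproduce MaxCompute's Object Table schema, the Blob type, or the OSS SDK.

```python
import mimetypes
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class ObjectRow:
    """One row of a hypothetical object table: metadata that points at a file."""
    key: str
    size_bytes: int
    content_type: str


@dataclass
class MixedRow:
    """A hypothetical mixed-storage row: structured fields plus a binary blob."""
    record_id: int
    label: str
    blob: Optional[bytes]


def build_object_table(listing: Dict[str, int]) -> List[ObjectRow]:
    """Turn an object listing {key: size} into metadata rows.
    Only the key and size are used; file contents are never read or copied."""
    rows = []
    for key, size in sorted(listing.items()):
        ctype, _ = mimetypes.guess_type(key)
        rows.append(ObjectRow(key, size, ctype or "application/octet-stream"))
    return rows


# Hypothetical listing; a real one would come from the object store.
listing = {"raw/cat.jpg": 20480, "raw/call.mp3": 512000, "docs/contract.pdf": 88000}
table = build_object_table(listing)
pdf_keys = [r.key for r in table if r.content_type == "application/pdf"]
print(pdf_keys)  # ['docs/contract.pdf']

# A Blob-style row keeps raw bytes next to ordinary columns.
row = MixedRow(record_id=1, label="cat", blob=b"\xff\xd8\xff")  # JPEG header bytes
```

Queries can then filter files by metadata (here, by content type) before any file is opened.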

Unified AI model management adds a full model lifecycle covering public models (e.g., Qwen‑3, DeepSeek‑R1‑Distill‑Qwen), user‑uploaded models, and remote models, with version control and seamless invocation from both SQL and Python. Example SQL snippet: CREATE MODEL ...

AI Function packages large‑model inference as programmable functions (e.g., AI_EXTRACT, AI_TRANSLATE) that run on CPU or GPU, so SQL or Python jobs can call LLM capabilities without deploying or tuning models themselves.
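The model lifecycle and AI Function ideas can be illustrated together: a registry that stores versioned models and resolves the latest version by name, plus a helper that applies a registered model to every value in a column, the way a scalar SQL function applies to a column. `ModelRegistry`, `apply_ai_function`, and the regex‑based stub models are hypothetical and are not MaxCompute's API; a real AI Function would call a hosted LLM rather than a regular expression.

```python
import re
from typing import Callable, Dict, List, Optional, Tuple


class ModelRegistry:
    """Hypothetical versioned model registry (not a MaxCompute API)."""

    def __init__(self) -> None:
        self._models: Dict[Tuple[str, int], Callable[[str], str]] = {}

    def register(self, name: str, fn: Callable[[str], str]) -> int:
        """Store a new version of `name` and return its version number."""
        version = self.latest_version(name) + 1
        self._models[(name, version)] = fn
        return version

    def latest_version(self, name: str) -> int:
        """Highest registered version of `name`, or 0 if none exists."""
        return max((v for (n, v) in self._models if n == name), default=0)

    def get(self, name: str, version: Optional[int] = None) -> Callable[[str], str]:
        """Resolve a model by name, defaulting to the latest version."""
        if version is None:
            version = self.latest_version(name)
        return self._models[(name, version)]


def apply_ai_function(registry: ModelRegistry, model: str, column: List[str]) -> List[str]:
    """Apply a registered model row by row, like a scalar SQL function."""
    fn = registry.get(model)
    return [fn(value) for value in column]


# Stub "models": a real AI Function would call an LLM instead of a regex.
def extract_amount_v1(text: str) -> str:
    match = re.search(r"\d+", text)  # integers only
    return match.group(0) if match else ""


def extract_amount_v2(text: str) -> str:
    match = re.search(r"\d+(?:\.\d+)?", text)  # also keeps decimals
    return match.group(0) if match else ""


registry = ModelRegistry()
registry.register("extract_amount", extract_amount_v1)
registry.register("extract_amount", extract_amount_v2)
contracts = ["Total due: 1200.50 USD", "No amount stated"]
print(apply_ai_function(registry, "extract_amount", contracts))  # ['1200.50', '']
```

Pinning a version (for example `registry.get("extract_amount", 1)`) keeps an older pipeline reproducible while new jobs pick up the latest model by default.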

Core technical breakthroughs include an SQL engine optimized for nested types (STRUCT, ARRAY) with columnar storage and an UNNEST operator; Auto Partition for precise partition pruning; Delta Table, a unified lake table that supports upserts and incremental materialized views; and MaxQA for interactive query acceleration. The new Distributed Python Engine (DPE), built on Ray, runs Pandas‑compatible scripts at scale while sharing metadata with the SQL engine.
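The semantics behind three of these features can be sketched on in‑memory rows: UNNEST expands an ARRAY column into one row per element, partition pruning reads only the partitions whose key passes the filter, and an upsert merges incoming rows by primary key. The functions `unnest`, `prune_partitions`, and `upsert` are hypothetical plain‑Python illustrations of the semantics only; they say nothing about how MaxCompute's engine, Auto Partition, or Delta Table are implemented, and incremental materialized views are not modeled.

```python
from typing import Callable, Dict, List


def unnest(rows: List[dict], array_col: str, as_col: str) -> List[dict]:
    """Emit one output row per ARRAY element, copying the other columns.
    Rows with an empty array produce no output (cross-join UNNEST semantics)."""
    out = []
    for row in rows:
        for item in row[array_col]:
            new_row = {k: v for k, v in row.items() if k != array_col}
            new_row[as_col] = item
            out.append(new_row)
    return out


def prune_partitions(partitions: Dict[str, List[dict]],
                     keep: Callable[[str], bool]) -> List[dict]:
    """Read rows only from partitions whose key satisfies the filter."""
    return [row for key, rows in partitions.items() if keep(key) for row in rows]


def upsert(table: Dict[int, dict], changes: List[dict], pk: str) -> Dict[int, dict]:
    """Insert new keys and overwrite existing ones (last write wins)."""
    merged = dict(table)
    for row in changes:
        merged[row[pk]] = row
    return merged


orders = [{"id": 1, "tags": ["a", "b"]}, {"id": 2, "tags": []}]
print(unnest(orders, "tags", "tag"))
# [{'id': 1, 'tag': 'a'}, {'id': 1, 'tag': 'b'}]

parts = {"2025-10-14": [{"v": 1}], "2025-10-15": [{"v": 2}]}
print(prune_partitions(parts, lambda dt: dt >= "2025-10-15"))  # [{'v': 2}]

table = {1: {"id": 1, "qty": 5}}
print(upsert(table, [{"id": 1, "qty": 7}, {"id": 2, "qty": 3}], pk="id"))
# {1: {'id': 1, 'qty': 7}, 2: {'id': 2, 'qty': 3}}
```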

In customer scenarios, MaxCompute is used for large‑scale video frame preprocessing, ROS bag parsing for autonomous driving, and rapid extraction of contract or medical‑record information via AI Function, achieving multi‑fold efficiency gains without separate clusters to manage.

Future outlook positions MaxCompute as the trusted data foundation and intelligent decision engine for enterprises, fully integrating data and AI to unlock, create, and amplify value.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact us and we will review it promptly.

Written by

Alibaba Cloud Big Data AI Platform

The Alibaba Cloud Big Data AI Platform builds on Alibaba’s leading cloud infrastructure, big‑data and AI engineering capabilities, scenario algorithms, and extensive industry experience to offer enterprises and developers a one‑stop, cloud‑native big‑data and AI capability suite. It boosts AI development efficiency, enables large‑scale AI deployment across industries, and drives business value.
