How MaxCompute Evolves Data Platforms for AI: Architecture, Features, and Real‑World Cases
The article explains how Alibaba Cloud's MaxCompute transforms a traditional data warehouse into a cloud‑native, multimodal Data+AI platform by introducing a four‑layer architecture, SQL‑based AI functions, the Python‑native MaxFrame framework, and a series of industry case studies that demonstrate performance gains and flexible resource scheduling.
MaxCompute, Alibaba Cloud's core big‑data compute platform, is being re‑engineered for the AI era. Its architecture is divided into four layers (data, model, compute, and engine), each addressing specific AI requirements.
Data Layer
The platform stores both structured and unstructured data and supports multimodal formats (audio, video, images) through a BLOB field type. It connects to external systems such as OSS (Object Storage Service) and Hologres through Object Table and other interfaces, so metadata is managed in one place without moving the underlying data.
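The source describes Object Table only as "unified metadata management without moving data". A minimal sketch of that idea follows. It assumes a hypothetical in-memory dictionary standing in for an OSS bucket. `ObjectRow`, `ObjectTable`, and their methods are illustrative names, not MaxCompute's API. Queries see rows of metadata, and an object's bytes are read only when a query asks for them.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class ObjectRow:
    """One metadata row describing an external object (hypothetical schema)."""
    key: str
    size: int
    content_type: str


class ObjectTable:
    """Exposes objects in an external store as rows without copying their bytes."""

    _TYPES = {"jpg": "image", "png": "image", "mp4": "video", "wav": "audio"}

    def __init__(self, store: Dict[str, bytes]) -> None:
        self._store = store  # stand-in for an external bucket such as OSS

    def rows(self) -> List[ObjectRow]:
        # Only metadata is returned; a real store would report size from its listing.
        return [
            ObjectRow(key, len(data), self._TYPES.get(key.rsplit(".", 1)[-1].lower(), "other"))
            for key, data in sorted(self._store.items())
        ]

    def read_blob(self, key: str) -> bytes:
        # Bytes are fetched only when a query actually needs the payload.
        return self._store[key]


store = {"cams/a.jpg": b"\xff\xd8\xff", "mic/b.wav": b"RIFF", "clips/c.mp4": b"\x00\x00"}
table = ObjectTable(store)
images = [row.key for row in table.rows() if row.content_type == "image"]
```

Filtering by `content_type` touches only metadata, which is the point of registering external objects instead of importing them.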
Model Layer
MaxCompute hosts traditional machine‑learning models (built with XGBoost and LightGBM) and open‑source large models (Qwen, DeepSeek‑R1‑Distill‑Qwen). It also integrates commercial flagship models from the Bailian platform (Alibaba Cloud Model Studio), providing a single point for model registration, versioning, and serving.
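To make "registration, versioning, and serving" concrete, here is a toy registry. It is a hypothetical sketch, not MaxCompute's model‑management interface. `ModelRegistry`, `register`, and `serve` are invented names, and the lambdas stand in for real models.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional

Predictor = Callable[[str], str]


@dataclass
class ModelRegistry:
    """Hypothetical single entry point for registering, versioning, and serving models."""
    versions: Dict[str, List[Predictor]] = field(default_factory=dict)

    def register(self, name: str, predictor: Predictor) -> int:
        """Append a new version and return its 1-based version number."""
        history = self.versions.setdefault(name, [])
        history.append(predictor)
        return len(history)

    def serve(self, name: str, text: str, version: Optional[int] = None) -> str:
        """Run the latest version unless a specific version is pinned."""
        history = self.versions[name]
        predictor = history[-1] if version is None else history[version - 1]
        return predictor(text)


registry = ModelRegistry()
v1 = registry.register("sentiment", lambda t: "positive" if "good" in t else "negative")
v2 = registry.register("sentiment", lambda t: "positive" if ("good" in t or "great" in t) else "negative")
latest = registry.serve("sentiment", "great product")
pinned = registry.serve("sentiment", "great product", version=1)
```

Pinning a version lets a batch job reproduce earlier results, while callers that omit `version` pick up the newest model.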
Compute Layer
MaxCompute offers hybrid CPU/GPU scheduling: users declare the resources a job needs, and the platform allocates them. This meets the heavy compute demands of multimodal AI workloads.
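Declarative scheduling separates what a job needs from where it runs. The sketch below illustrates that split with a simple first‑fit scheduler. `Node`, `ResourceRequest`, `schedule`, and the placement policy are assumptions for illustration only. They do not describe how MaxCompute actually allocates CPU and GPU resources.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional


@dataclass
class Node:
    name: str
    cpu: int  # free CPU cores
    gpu: int  # free GPU cards


@dataclass(frozen=True)
class ResourceRequest:
    """What a job declares it needs; the scheduler decides where it runs."""
    job: str
    cpu: int
    gpu: int = 0


def schedule(requests: List[ResourceRequest], nodes: List[Node]) -> Dict[str, Optional[str]]:
    """First-fit placement, handling GPU jobs first because GPUs are the scarcer resource."""
    placement: Dict[str, Optional[str]] = {}
    for req in sorted(requests, key=lambda r: r.gpu, reverse=True):
        target = next((n for n in nodes if n.cpu >= req.cpu and n.gpu >= req.gpu), None)
        if target is not None:
            target.cpu -= req.cpu
            target.gpu -= req.gpu
        placement[req.job] = target.name if target is not None else None  # None = queued
    return placement


nodes = [Node("cpu-node", cpu=32, gpu=0), Node("gpu-node", cpu=16, gpu=2)]
requests = [
    ResourceRequest("decode-video", cpu=8),
    ResourceRequest("embed-images", cpu=4, gpu=1),
    ResourceRequest("caption-frames", cpu=4, gpu=1),
    ResourceRequest("rerank", cpu=4, gpu=1),
]
placement = schedule(requests, nodes)
```

Here the third GPU job is left queued (`None`) because only two GPUs exist. A production scheduler would also keep CPU‑only work off GPU nodes when possible and release resources when jobs finish. This sketch does neither.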
Engine Layer
Two primary compute interfaces are provided:
SQL Engine: The SQL AI function lets analysts invoke large models directly from SQL for offline inference, lowering the barrier to AI adoption.
MaxFrame: A native Python distributed‑computing framework compatible with Pandas, XGBoost, LightGBM, and other open‑source libraries. MaxFrame runs on MaxCompute's large‑scale compute resources and integrates tightly with DataWorks, custom Docker images, and the MaxCompute Notebook.
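The source does not show the SQL AI function's syntax. The sketch below therefore uses SQLite's `create_function` from Python's standard library as a stand‑in for the pattern: a model call is registered as a SQL function and applied row by row inside a `SELECT`. `ai_classify` and `toy_classify` are hypothetical names, and the keyword rule replaces what would be a large‑model call.

```python
import sqlite3


def toy_classify(text: str) -> str:
    """Stand-in for a large-model call; a real deployment would invoke a hosted model here."""
    lowered = text.lower()
    if any(word in lowered for word in ("refund", "broken", "late")):
        return "complaint"
    return "other"


conn = sqlite3.connect(":memory:")
# Register the Python function so SQL can call it like a built-in, once per row.
conn.create_function("ai_classify", 1, toy_classify)
conn.execute("CREATE TABLE reviews (id INTEGER, body TEXT)")
conn.executemany(
    "INSERT INTO reviews VALUES (?, ?)",
    [(1, "Package arrived broken"), (2, "Love the color"), (3, "Still waiting, very late")],
)
rows = conn.execute(
    "SELECT id, ai_classify(body) AS label FROM reviews ORDER BY id"
).fetchall()
```

The analyst only writes SQL. Everything model‑specific stays behind the function name, which is the accessibility argument the SQL Engine item makes.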
Development Experience
Developers can install MaxFrame locally with pip install maxframe and work in VS Code or Jupyter. DataWorks Notebook offers a magic command to start and stop MaxFrame sessions. The platform also supports PyODPS3 for job submission, and its deep integration with DataWorks provides a stable, interactive development environment.
Key Use Cases
Large‑model data preprocessing: A leading LLM provider processed petabyte‑scale data with a 300,000‑core job, achieving a performance improvement of more than 50% on MinHash operators and elastic scaling up to 1.6 million cores, far exceeding its requirement of 1 million cores.
Automotive embodied intelligence: Using MaxFrame, a customer processed multimodal sensor data (images, video, radar, GPS) with a speedup of more than 40% over single‑node Python pipelines, thanks to elastic resource allocation and distributed processing.
Multimodal image labeling: Using the SQL AI function and MaxFrame's built‑in AI Function, the platform automatically tagged images and generated embeddings for downstream retrieval, running large‑model inference directly on the stored multimodal tables.
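The MinHash operators in the preprocessing case serve near‑duplicate detection, a common step when cleaning LLM training corpora. Below is a minimal single‑machine sketch of the standard MinHash technique. It is not MaxCompute's operator implementation. It assumes word 3‑gram shingles as features and derives 128 hash functions by salting BLAKE2b with each seed.

```python
import hashlib
from typing import List, Set


def shingles(text: str, n: int = 3) -> Set[str]:
    """Word n-grams; near-duplicate documents share most of them."""
    words = text.lower().split()
    return {" ".join(words[i:i + n]) for i in range(max(len(words) - n + 1, 1))}


def minhash(features: Set[str], num_perm: int = 128) -> List[int]:
    """Signature: for each seeded hash function, the minimum hash over all features."""
    signature = []
    for seed in range(num_perm):
        salt = seed.to_bytes(8, "little")  # a distinct salt yields a different hash function
        signature.append(min(
            int.from_bytes(hashlib.blake2b(f.encode(), digest_size=8, salt=salt).digest(), "little")
            for f in features
        ))
    return signature


def estimated_jaccard(sig_a: List[int], sig_b: List[int]) -> float:
    """The fraction of matching positions estimates the Jaccard similarity of the sets."""
    return sum(a == b for a, b in zip(sig_a, sig_b)) / len(sig_a)


def exact_jaccard(a: Set[str], b: Set[str]) -> float:
    return len(a & b) / len(a | b)


doc1 = shingles("the quick brown fox jumps over the lazy dog")
doc2 = shingles("the quick brown fox jumps over the lazy cat")
doc3 = shingles("petabyte scale corpora need near duplicate removal")
est_12 = estimated_jaccard(minhash(doc1), minhash(doc2))
est_13 = estimated_jaccard(minhash(doc1), minhash(doc3))
```

The two fox sentences share 6 of their 8 distinct trigrams (exact Jaccard 0.75), and the signature estimate should land near that, while unrelated text scores near zero. At petabyte scale, implementations typically replace per‑seed rehashing with cheaper hash permutations and add locality‑sensitive hashing so that not every pair of documents is compared.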
Conclusion
MaxCompute delivers end‑to‑end Data+AI capability spanning the data, model, compute, and engine layers. Its cloud‑native, elastic, high‑performance design enables enterprises, from large‑model providers to autonomous‑driving firms, to build AI data assets and deploy intelligent applications at scale.
This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact us and we will review it promptly.
DataFunTalk
Dedicated to sharing and discussing big data and AI technology applications, aiming to empower a million data scientists. Regularly hosts live tech talks and curates articles on big data, recommendation/search algorithms, advertising algorithms, NLP, intelligent risk control, autonomous driving, and machine learning/deep learning.
