
How MaxFrame Enables Scalable Python AI Workloads on MaxCompute

This article introduces MaxFrame, a cloud‑native distributed Python compute service built on MaxCompute, detailing its architecture, seamless integration with the Python ecosystem, and real‑world use cases ranging from large‑scale data analysis and machine learning to offline LLM inference and custom image deployments.

Alibaba Cloud Big Data AI Platform

This article is based on the offline meetup talk "Data+AI Fusion Trend and Intelligent Data Warehouse Platform Construction" presented by Liu Yang, an Alibaba Cloud product expert. It introduces MaxFrame, a distributed Python computing service built on the MaxCompute ecosystem, and outlines four main topics: MaxFrame overview, the MaxCompute Python development ecosystem, key application scenarios, and customer case studies.

01

Distributed Computing Service MaxFrame

MaxFrame is a cloud‑native distributed Python compute service. It exposes a Python programming interface that is fully compatible with Pandas, XGBoost, and other data‑processing and ML operators, and it automatically distributes execution across MaxCompute resources. Jobs use MaxCompute's elastic compute and data interfaces directly, and the service integrates with MaxCompute Notebook and image management.

The overall architecture reuses MaxCompute's underlying compute resources, supporting pay‑as‑you‑go and subscription billing models. MaxFrame can read data directly from MaxCompute, eliminating complex data migration and simplifying data access for users who already store data in MaxCompute.

Three capabilities stand out. First, MaxFrame offers a 100% compatible Pandas API with automatic distributed execution, which removes local resource limits. Second, it processes data directly inside the cluster without pulling it to the client, which improves job efficiency. Third, it integrates deeply with MaxCompute Notebook and DataWorks, providing an out‑of‑the‑box interactive development environment and offline scheduling. MaxFrame also supports multiple Python versions (3.7, 3.11) and both built‑in and custom images, so users do not have to prepare complex environments themselves.
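As a rough sketch of what this looks like in code, the snippet below separates a Pandas‑style transformation from the MaxFrame session plumbing. The calls `new_session`, `read_odps_table`, and `to_odps_table` are taken from the public MaxFrame client and should be verified against the current documentation; the table names, the column `a`, and the `MAXCOMPUTE_PROJECT` and `MAXCOMPUTE_ENDPOINT` environment variables are hypothetical. The remote function is defined but not called, so the file loads even where the SDKs are not installed.

```python
import os

import pandas as pd


def add_prefix(df, column, prefix):
    """Pandas-style transformation that only uses column get/set."""
    df[column] = prefix + df[column]
    return df


def run_on_maxcompute(source_table, target_table):
    """Apply add_prefix inside MaxCompute (assumed client API, not executed here)."""
    # Imported lazily so this module still loads without the SDKs installed.
    from odps import ODPS
    from maxframe import new_session
    import maxframe.dataframe as md

    o = ODPS(
        os.environ["ALIBABA_CLOUD_ACCESS_KEY_ID"],
        os.environ["ALIBABA_CLOUD_ACCESS_KEY_SECRET"],
        project=os.environ["MAXCOMPUTE_PROJECT"],    # hypothetical variable name
        endpoint=os.environ["MAXCOMPUTE_ENDPOINT"],  # hypothetical variable name
    )
    session = new_session(o)
    try:
        df = md.read_odps_table(source_table)  # reference to the table, not a local copy
        result = add_prefix(df, "a", "prefix_")
        # execute() submits the job; the output is written to a MaxCompute table.
        md.to_odps_table(result, target_table).execute()
    finally:
        session.destroy()


# Local check of the same transformation with plain pandas.
local = add_prefix(pd.DataFrame({"a": ["x", "y"]}), "a", "prefix_")
```

Keeping the transformation in its own function makes it possible to test the logic on a small local sample before submitting it against full tables.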

02

MaxCompute Python Development Ecosystem

The MaxCompute Python ecosystem provides a unified development environment, unified data management, and a unified programming language for the entire Data+AI lifecycle. Structured data is managed via MaxCompute Storage, while unstructured data uses OSS. The platform offers PaaS‑level capabilities such as metadata management (BigMeta), built‑in and custom image management, and two programming interfaces: SQL and MaxFrame.

From data preparation to model deployment, the ecosystem enables end‑to‑end management. In the data preparation stage, structured and unstructured data are unified through metadata views. During analysis and preprocessing, MaxFrame handles data volumes orders of magnitude larger than native single‑node interfaces can, with efficiency gains reported in the tens of times. Model development benefits from a cost‑effective elastic compute pool and pre‑packaged environment images, while model training relies on MaxFrame's distributed execution for high‑performance, large‑scale training. Model management and evaluation integrate with Alibaba's PAI platform, and deployment uses PAI‑EAS for online serving.

03

MaxCompute Key Application Scenarios

Using Pandas Distributed Operators for Simple Data Processing

MaxFrame deeply integrates with the Pandas ecosystem and optimizes it for distributed execution. A case study demonstrates filtering products sold in the spring of 2019 from two tables (Product and Sales) using MaxFrame's Pandas‑compatible API, with execution performed on MaxCompute.
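The logic of this case can be sketched with ordinary pandas on made‑up data. The column names, the sample rows, and the reading of "spring of 2019" as 1 January to 31 March 2019 are all assumptions, since the talk does not give the schema. Under MaxFrame, the same operators would be applied to DataFrames read from MaxCompute tables rather than to local ones.

```python
import pandas as pd

# Hypothetical stand-ins for the Product and Sales tables.
product = pd.DataFrame({
    "product_id": [1, 2, 3],
    "product_name": ["S8", "G4", "iPhone"],
})
sales = pd.DataFrame({
    "product_id": [1, 2, 2, 3],
    "sale_date": pd.to_datetime(
        ["2019-01-21", "2019-02-17", "2019-06-02", "2019-05-13"]
    ),
})


def products_sold_between(product, sales, start, end):
    """Return products with at least one sale in the closed window [start, end]."""
    in_window = sales[(sales["sale_date"] >= start) & (sales["sale_date"] <= end)]
    sold_ids = in_window["product_id"].drop_duplicates().to_frame()
    result = product.merge(sold_ids, on="product_id", how="inner")
    return result.sort_values("product_id").reset_index(drop=True)


spring_2019 = products_sold_between(
    product, sales, pd.Timestamp("2019-01-01"), pd.Timestamp("2019-03-31")
)
print(spring_2019)
```

Here products 1 and 2 each have a sale inside the window, while product 3 was only sold in May and is filtered out.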

Using Pandas Distributed Operators for Complex Data Processing

A more complex scenario uses user‑defined functions (UDFs) to encapsulate intricate calculation logic, applying Pandas‑compatible operators such as apply, merge, and reset_index to employee fiscal‑year score data. MaxFrame accelerates the computation by tens of times compared with single‑node Pandas.
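A minimal sketch of this pattern, again in plain pandas on invented data: the tables, column names, and the `score_trend` function are hypothetical and only illustrate how `apply`, `merge`, and `reset_index` combine with a UDF. A distributed engine may additionally require the UDF's output type to be declared; that detail is not covered in the talk.

```python
import pandas as pd

# Hypothetical employee and fiscal-year score tables.
employees = pd.DataFrame({"employee_id": [1, 2], "name": ["Ann", "Bo"]})
scores = pd.DataFrame({
    "employee_id": [1, 1, 2, 2],
    "fiscal_year": [2022, 2023, 2022, 2023],
    "score": [80.0, 90.0, 70.0, 60.0],
})


def score_trend(row):
    """Hypothetical UDF: label the change between the first and last year."""
    delta = row["last_score"] - row["first_score"]
    if delta > 0:
        return "improving"
    if delta < 0:
        return "declining"
    return "flat"


# One row per employee with the earliest and latest fiscal-year score.
per_employee = (
    scores.sort_values("fiscal_year")
    .groupby("employee_id")["score"]
    .agg(first_score="first", last_score="last")
    .reset_index()
)

# Attach employee names, then run the UDF row by row.
report = employees.merge(per_employee, on="employee_id", how="left")
report["trend"] = report.apply(score_trend, axis=1)
print(report)
```

The row‑wise `apply` is the step that is slowest on a single machine and the one that benefits most from being split across workers.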

Custom Image Environments for Data Computation

To simplify image management, MaxCompute provides a custom image capability similar to Docker. Users build images based on MaxCompute's base images, push them to Alibaba Cloud Container Registry (ACR), and import them via the MaxCompute console for use in UDFs, such as rendering HTML pages to images using Chrome inside the custom image.
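The body of such a UDF might look roughly like the following, assuming the custom image ships a Chrome binary at the hypothetical path below. The function names are illustrative; the command‑line flags are standard headless Chrome options. Only the command construction is exercised here, since running it requires Chrome to be installed.

```python
import subprocess

CHROME_BIN = "/usr/bin/google-chrome"  # hypothetical path inside the custom image


def build_screenshot_cmd(html_path, png_path, width=1280, height=720):
    """Build the headless Chrome command that renders a local HTML file to a PNG."""
    return [
        CHROME_BIN,
        "--headless",
        "--no-sandbox",  # commonly needed when Chrome runs as root in a container
        f"--window-size={width},{height}",
        f"--screenshot={png_path}",
        f"file://{html_path}",
    ]


def render_html_to_png(html_path, png_path):
    """Run Chrome and return the output path; raises if Chrome exits non-zero."""
    subprocess.run(build_screenshot_cmd(html_path, png_path), check=True, timeout=60)
    return png_path


cmd = build_screenshot_cmd("/tmp/page.html", "/tmp/page.png")
print(" ".join(cmd))
```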

Offline LLM Inference

For large‑model offline inference, MaxFrame leverages MaxCompute's massive CPU pool (tens of thousands of cores). Using the llama.cpp framework and a GGUF‑format model loaded inside a UDF, text data stored in MaxCompute tables is processed in batches, and the inference result assigns each text a logical category.
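One common way to structure such a UDF is to load the model once per worker process and reuse it for every batch of rows. The sketch below shows only that pattern: `StubModel` is a stand‑in for a GGUF model loaded through llama.cpp bindings, and its rule‑based `classify` method, the file name `model.gguf`, and the category names are all invented for illustration.

```python
from functools import lru_cache


class StubModel:
    """Stand-in for a GGUF model; a real UDF would load it via llama.cpp bindings."""

    load_count = 0  # counts how many times a model has been constructed

    def __init__(self, path):
        StubModel.load_count += 1
        self.path = path

    def classify(self, text):
        # Placeholder rule instead of real inference.
        return "question" if text.rstrip().endswith("?") else "statement"


@lru_cache(maxsize=1)
def get_model(path):
    """Load the model on first use and cache it for later calls in this process."""
    return StubModel(path)


def classify_batch(texts, model_path="model.gguf"):
    """UDF-style entry point: one category per input text."""
    model = get_model(model_path)
    return [model.classify(t) for t in texts]


first = classify_batch(["Is the job done?", "The job is done."])
second = classify_batch(["Another row."])
print(first, second, StubModel.load_count)
```

Caching the loaded model matters because loading a multi‑gigabyte file for every row would dominate the runtime of a batch job.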

04

MaxCompute Customer Cases

Several real‑world cases illustrate MaxFrame's impact:

Pandas Data Processing + XGBoost Offline Training and Prediction: Migrating a workload from ECS to MaxCompute reduced resource costs and eliminated data transfer bottlenecks.

Automotive Telematics Big Data Processing: Distributed Pandas and Scikit‑learn operators achieved 20‑30× performance gains over local processing.

Multimodal Processing for Autonomous Driving: Images and videos stored in OSS were processed with MaxFrame operators for labeling and downstream model training.

Large‑Model Offline Inference: More than 60 models, including a 6 GB Chinese FastText model, ran inference on 5,000 CU of CPU resources. A 21‑billion‑record job completed within an hour using 10,000 concurrent tasks.
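The figures in the offline inference case imply a lower bound on throughput. The arithmetic below assumes the records were spread evenly across the 10,000 tasks and that the job took the full hour; both are simplifications, and a shorter run would mean higher rates.

```python
records = 21_000_000_000   # 21 billion records
tasks = 10_000             # concurrent tasks
seconds = 60 * 60          # upper bound on runtime: one hour

records_per_task = records // tasks
aggregate_rate = records / seconds            # records per second, whole job
per_task_rate = records_per_task / seconds    # records per second, one task

print(f"records per task:        {records_per_task:,}")
print(f"aggregate records/sec:   {aggregate_rate:,.0f}")
print(f"per-task records/sec:    {per_task_rate:,.0f}")
```

That works out to 2.1 million records per task, at least about 5.8 million records per second in aggregate, and at least about 583 records per second per task.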

Overall, MaxFrame provides a cloud‑native, scalable Python development environment that integrates tightly with MaxCompute, DataWorks, and Alibaba's AI platform, enabling end‑to‑end Data+AI pipelines without the limitations of local resources.

