How Alibaba’s MaxCompute Became the Backbone of 99% Data Processing

This article reviews Alibaba's MaxCompute evolution from ODPS to a unified, multi‑cluster big‑data platform, detailing its architecture, development tools, large‑scale deployments, performance optimizations, typical workload scenarios, and why it is the preferred choice for enterprise data processing.

Alibaba Cloud Developer

Overview of Alibaba Cloud Big Data Computing Service

MaxCompute, formerly known as ODPS, is Alibaba's internal unified big‑data platform. It has evolved into the company's core storage and compute engine, holding nearly 99% of Alibaba's data and providing 95% of its compute capacity.

Every day more than 14,000 internal developers use the platform, executing over three million jobs covering use cases such as Alipay credit scoring, Taobao merchant billing, and the massive traffic handling of Double‑11.

The platform runs on tens of thousands of servers across multiple regions and supports multi‑cluster disaster recovery. Its user base is growing by 250% a year, and it is deployed on both public and private clouds for government, security, and city‑brain projects.

Technical Architecture

At the lowest layer is the compute engine, fed by a data bus called DataHub that ingests data into MaxCompute. Above the compute layer sit development suites such as DataWorks and MaxCompute Studio, which provide data management, job development, and job management capabilities.

Integrated services include AI platforms, voice‑to‑text, OCR, machine translation, and intelligent brain products, which together form a complete data processing ecosystem serving both internal Alibaba services and external customers.

Evolution of Alibaba’s Data Platform

Initially, Alibaba relied on Oracle clusters (the "Oracle peak") and later introduced Greenplum as a secondary solution when Oracle reached its limits. By 2009, the need for a more scalable system led to the launch of Alibaba Cloud, which built three core components: the distributed storage system Pangu, the scheduler Fuxi, and the big‑data service ODPS (now MaxCompute).

After a year of development, the first ODPS platform was operational. By 2012 it achieved stable unified storage, standardization, and security, and in 2013 it entered large‑scale commercial use, breaking the 5,000‑node barrier and supporting multi‑cluster capabilities.

In 2014‑2015, Alibaba unified its two parallel data‑processing stacks (cloud‑ladder 1, based on open‑source Hadoop, and the self‑developed cloud‑ladder 2) through the "Moon Landing" project, emphasizing multi‑cluster ability, strong security, and petabyte‑scale processing with financial‑grade stability. The migration had four requirements:

Guarantee Hadoop‑compatible functionality and performance.

Provide programming‑model compatibility.

Offer comprehensive migration and comparison tools.

Enable seamless in‑flight upgrades.
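A comparison tool of the kind listed above could, for example, run the same query on both stacks and check that the results match regardless of row order. The sketch below is only an illustration of that idea: the function name diff_result_sets and the sample rows are hypothetical and do not come from any Alibaba tooling.

```python
from collections import Counter
from typing import Dict, Iterable, Tuple

Row = Tuple  # one result row as a tuple of column values


def diff_result_sets(old_rows: Iterable[Row], new_rows: Iterable[Row]) -> Dict[str, Counter]:
    """Compare two query results as multisets, ignoring row order.

    Returns the rows that appear only (or more often) in the old result
    and the rows that appear only (or more often) in the new result.
    """
    old_counts = Counter(old_rows)
    new_counts = Counter(new_rows)
    return {
        "only_in_old": old_counts - new_counts,
        "only_in_new": new_counts - old_counts,
    }


# Hypothetical output of the same query on the Hadoop stack and on the new platform.
hadoop_result = [("2015-01-01", 120), ("2015-01-02", 98), ("2015-01-02", 98)]
migrated_result = [("2015-01-02", 98), ("2015-01-01", 120), ("2015-01-02", 97)]

report = diff_result_sets(hadoop_result, migrated_result)
print("only in old:", dict(report["only_in_old"]))
print("only in new:", dict(report["only_in_new"]))
```

Treating results as multisets rather than sets keeps duplicate rows significant, which matters when a migration silently drops or doubles records.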

Moon Landing Project – Unified Process

The project consolidated dozens of disparate computing platforms across business units into a single, unified data platform, improving resource utilization and data flow efficiency while reducing overall operational cost.

Key outcomes include:

Enterprise‑wide unified big‑data platform with EB‑scale storage and millions of daily tasks.

Fine‑grained security and multi‑tenant data protection.

High performance, comprehensive data unification, and optimized storage tiers (memory, SSD, HDD, cold storage).

MaxCompute 2.0 – Ongoing Upgrades

Introduced at the 2016 Cloud Expo, MaxCompute 2.0 added a new SQL engine, unstructured data processing, and support for multiple compute modes (batch, interactive, in‑memory, iterative). A forthcoming query language, NewSQL, combines declarative and imperative features.

Engine improvements include cost‑based and history‑based optimization, fully asynchronous I/O, bubble‑based scheduling for efficient resource usage, and tighter integration with Hadoop and Spark ecosystems.
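To make the difference between cost‑based and history‑based optimization concrete, here is a minimal sketch. It is not MaxCompute's optimizer: the join strategies, the row threshold, and all names (TableStats, choose_join, BROADCAST_LIMIT_ROWS) are assumptions chosen for illustration. The cost‑based step picks a plan from estimated input sizes; the history‑based step replaces an estimate with the row count observed in earlier runs when one is available.

```python
from dataclasses import dataclass
from typing import Dict, Optional

BROADCAST_LIMIT_ROWS = 1_000_000  # assumed threshold, for illustration only


@dataclass
class TableStats:
    name: str
    estimated_rows: int  # static estimate taken from table metadata


def effective_rows(stats: TableStats, history: Dict[str, int]) -> int:
    """History-based step: prefer the row count observed in earlier runs."""
    return history.get(stats.name, stats.estimated_rows)


def choose_join(left: TableStats, right: TableStats,
                history: Optional[Dict[str, int]] = None) -> str:
    """Cost-based step: broadcast the smaller input if it is small enough,
    otherwise fall back to a shuffle join."""
    history = history or {}
    smaller = min(effective_rows(left, history), effective_rows(right, history))
    return "broadcast_join" if smaller <= BROADCAST_LIMIT_ROWS else "shuffle_join"


orders = TableStats("orders", estimated_rows=500_000_000)
sellers = TableStats("sellers", estimated_rows=5_000_000)

# With metadata estimates only, both inputs look too large to broadcast.
plan_without_history = choose_join(orders, sellers)
# Earlier runs showed that the filtered sellers input is actually small.
plan_with_history = choose_join(orders, sellers, history={"sellers": 200_000})
print(plan_without_history, plan_with_history)
```

For periodic jobs that rerun the same query every day, observed statistics are usually a better guide than static estimates, which is the motivation for consulting history first.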

Storage enhancements include AliORC, a file format that is compatible with native ORC but faster, and tiered storage spanning SSD, HDD, and cold‑storage layers.
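One simple way to think about tiered storage is as a placement rule keyed on how recently data was read. The sketch below assumes such a rule; the article does not describe the actual policy, so the age thresholds and the names TIER_RULES and pick_tier are hypothetical.

```python
from datetime import date

# Assumed age thresholds in days; the real placement policy is not described here.
TIER_RULES = [
    (7, "SSD"),
    (90, "HDD"),
]
COLD_TIER = "cold storage"


def pick_tier(last_access: date, today: date) -> str:
    """Return the storage tier for data last read on `last_access`."""
    age_days = (today - last_access).days
    for max_age, tier in TIER_RULES:
        if age_days <= max_age:
            return tier
    return COLD_TIER


today = date(2018, 1, 31)
for last_access in (date(2018, 1, 30), date(2017, 12, 1), date(2016, 6, 1)):
    print(last_access, "->", pick_tier(last_access, today))
```

Keeping the rules in a small ordered table makes it easy to add a tier or change a threshold without touching the lookup logic.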

Typical Big‑Data Workloads

Workloads are categorized into three main types:

Batch/Workflow: Periodic jobs (daily, hourly, monthly) handling large data volumes.

Interactive Analysis: Ad‑hoc queries for business decisions, requiring low latency (seconds to tens of seconds) and moderate data size.

Streaming/Real‑time: Low‑latency processing for events such as Double‑11 dashboards.

Key technical considerations include data ingestion throttling, integrity checks, fault‑tolerant data backfill (re‑running jobs to repair missing or faulty data), real‑time debugging, and high‑availability scheduling.
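Integrity checks and backfill often work together: a check finds the partitions that are missing or incomplete, and the backfill reruns only those. The following sketch assumes daily partitions and a row‑count comparison as the integrity check; the function name partitions_to_backfill and the sample counts are hypothetical.

```python
from datetime import date, timedelta
from typing import Dict, List


def partitions_to_backfill(start: date, end: date,
                           source_rows: Dict[date, int],
                           loaded_rows: Dict[date, int]) -> List[date]:
    """Return the days in [start, end] whose partition was never loaded or
    whose loaded row count differs from the source row count.

    `source_rows` must contain an entry for every day in the range.
    """
    to_fix = []
    day = start
    while day <= end:
        expected = source_rows[day]
        actual = loaded_rows.get(day)  # None means the partition was never loaded
        if actual != expected:
            to_fix.append(day)
        day += timedelta(days=1)
    return to_fix


# Hypothetical row counts for three daily partitions.
source = {date(2017, 11, 10): 100, date(2017, 11, 11): 120, date(2017, 11, 12): 90}
loaded = {date(2017, 11, 10): 100, date(2017, 11, 12): 85}

backfill_days = partitions_to_backfill(date(2017, 11, 10), date(2017, 11, 12), source, loaded)
print(backfill_days)
```

In this example the second day is rerun because its partition is absent and the third because its row count does not match the source.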

BI‑Focused Optimizations

For interactive BI scenarios, MaxCompute employs online‑job designs featuring long‑living processes, process reuse, direct network connections (avoiding disk I/O), event‑driven scheduling, and automatic failover based on statistical models.
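The combination of long‑living processes, process reuse, and statistics‑driven failover can be sketched as a small worker pool. This is a deliberately simplified model, not MaxCompute's design: the Worker and WorkerPool classes, the failure‑rate threshold, and the selection rule are all assumptions made for illustration, and the "statistical model" here is nothing more than a per‑worker failure rate.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Worker:
    """A long-lived worker that stays alive between queries."""
    name: str
    outcomes: List[bool] = field(default_factory=list)  # True = query succeeded

    def failure_rate(self) -> float:
        if not self.outcomes:
            return 0.0
        return self.outcomes.count(False) / len(self.outcomes)


class WorkerPool:
    """Reuses existing workers and routes around unhealthy ones."""

    def __init__(self, workers: List[Worker], max_failure_rate: float = 0.5):
        self.workers = workers
        self.max_failure_rate = max_failure_rate  # assumed threshold

    def pick(self) -> Worker:
        healthy = [w for w in self.workers
                   if w.failure_rate() <= self.max_failure_rate]
        if not healthy:
            raise RuntimeError("no healthy worker available")
        # Reuse the healthy worker that has handled the fewest queries so far,
        # instead of starting a new process for each query.
        return min(healthy, key=lambda w: len(w.outcomes))

    def record(self, worker: Worker, ok: bool) -> None:
        worker.outcomes.append(ok)


pool = WorkerPool([Worker("w1"), Worker("w2")])
first = pool.pick()
pool.record(first, ok=False)  # simulate a failed query on the first worker

served_by = []
for _ in range(3):
    worker = pool.pick()
    pool.record(worker, ok=True)
    served_by.append(worker.name)
print(first.name, served_by)
```

A production system would more likely use a sliding window of recent outcomes and periodically probe excluded workers so they can rejoin the pool; the sketch omits both to stay short.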

Performance Benchmarks

In collaboration with Intel, MaxCompute was evaluated on the 2017 BigBench benchmark, executing over 30 queries (SQL, MapReduce, and machine learning) at scales from 10 TB to 100 TB. It became the first engine to reach 7,000 points and demonstrated superior cost‑performance on public cloud.

Why Choose MaxCompute

Out‑of‑the‑box scalability without user‑managed sizing.

Proven performance and cost‑efficiency through extensive benchmarks.

Robust multi‑tenant security built on Alibaba’s internal safeguards.

Support for multiple distributed compute models.

Comprehensive migration tools and ecosystem integration (DataWorks, Studio, AI platforms, recommendation engines, reporting tools).

Data already on Alibaba Cloud can be migrated to MaxCompute via various methods (direct sync, VPN, dedicated lines). Once in the platform, developers can leverage Data IDE, plugins, and seamless integration with machine‑learning and analytics services.
