How Alibaba Built a World‑Class Big Data Platform Over a Decade
This article chronicles Alibaba's ten‑year journey of building and scaling its big‑data platform—from early Oracle clusters and Hadoop, through the launch of ODPS and MaxCompute, to global cloud expansion and cutting‑edge streaming innovations that now power billions of transactions each Double‑11.
After ten Double‑11 events, Alibaba's tech team reflects on a decade of big‑data platform evolution, highlighting five major phases that transformed their data processing capabilities.
Phase 1: Birth of the Data Platform (2009‑2010)
Alibaba's early data systems ran on IOE infrastructure (IBM servers, Oracle databases, EMC storage), including large Oracle clusters. In 2009 the company founded its cloud division and began developing the proprietary distributed storage system “Pangu” and the resource scheduler “Fuxi”. Meanwhile, early workloads ran on Hadoop on the “YunTi‑1” platform.
Phase 2: First‑generation ODPS Engine (2010‑2012)
As the business grew rapidly, the ODPS engine entered production in 2010 to handle massive batch jobs and real‑time analytics. Early jobs ran for a long time, however, and often had to be monitored overnight.
Phase 3: New Generation Platform (2012‑2015)
Key projects “Binghuoniao”, “5K Challenge”, and “Moon Landing” unified offline and real‑time processing, introduced the DataX sync tool, and expanded cluster size from 1,500 to over 5,000 nodes.
“Binghuoniao gave us a data‑driven foundation that linked the platform to cloud computing.” – Zeng Ming
Phase 4: Global Expansion (2015‑2017)
MaxCompute processed more than 300 PB of data per day, and real‑time streaming peaked at about 472 million rows per second during Double‑11. Over the same period the platform opened up to public‑cloud customers, ISVs, and overseas regions.
Phase 5: Innovation Era (2017‑present)
Launch of MaxCompute Lightning, DataWorks, DataHub, and unified streaming engines (Blink) delivered sub‑second interactive queries, cross‑region strong consistency, and automated SRE operations.
MaxCompute's daily throughput exceeded 300 PB with zero baseline failures.
Lightning enabled graph‑based fraud detection.
Blink processed 4.72 × 10⁸ rows/s at a latency of 3 seconds.
Storage‑compute separation and hybrid‑cloud deployment supported 20% of transaction traffic.
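The 300 PB daily figure becomes easier to compare with the per‑second streaming numbers once it is converted to an average rate. The worked conversion below assumes decimal units (1 PB = 10¹⁵ bytes) and load spread evenly over 24 hours. Because the source gives the volume as "more than" 300 PB, the result is a lower bound on the average, not a peak.

$$\frac{300\ \text{PB/day}}{86\,400\ \text{s/day}} = \frac{300 \times 10^{15}\ \text{B}}{8.64 \times 10^{4}\ \text{s}} \approx 3.47 \times 10^{12}\ \text{B/s} \approx 3.5\ \text{TB/s}$$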
Through continuous breakthroughs in batch, streaming, and machine‑learning workloads, Alibaba has built an enterprise‑grade big‑data cloud that now serves global customers and drives the digital economy.
This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact us and we will review it promptly.
Alibaba Cloud Developer
Alibaba's official tech channel, featuring all of its technology innovations.
