How Alibaba Built a World‑Class Big Data Platform Over a Decade
Over ten years, Alibaba's data engineers transformed a modest Hadoop-based system into a globally scalable, high-performance big-data platform, ODPS/MaxCompute, that supports massive offline and real-time workloads. Along the way they delivered milestones such as the 5K cluster expansion, Blink stream processing, and the "Moon" migration that unified the group's data clusters.
After the tenth Double 11 in 2018, Alibaba launched the "Ten Years of Code" series, in which core engineers look back on the evolution of the big-data computing platform that powers the annual shopping festival.
Each Double 11 serves as a massive stress test for data teams, showcasing the platform’s ability to handle unprecedented transaction volumes and real‑time analytics.
Double 11 sales grew from about 50 million yuan in 2009 to 213.5 billion yuan in 2018. Over the same period the platform grew into what Alibaba describes as the world's most powerful concurrent big-data processing system, accumulating a steady stream of technical breakthroughs.
The platform supports over 95% of the group’s storage and compute services, handling massive traffic, high concurrency, and end‑to‑end data pipelines.
Phase 1: Birth of Big‑Data Applications
Before 2009, Alibaba relied on the "IOE" stack (IBM servers, Oracle databases, EMC storage), with Oracle clusters at its core. As the user base exploded, the existing Greenplum clusters hit their scaling limits. This prompted the launch of Alibaba Cloud in September 2009 and the development of the proprietary "Feitian" stack, including the distributed storage system Pangu and the scheduler Fuxi. In the early days, Hadoop served as a stop-gap for large-scale batch processing.
From 2009 to 2010, two systems coexisted: YunTi 1, which handled early Taobao workloads, and YunTi 2, which supported Alibaba Finance's first product.
Phase 2: First‑Generation ODPS Engine
In 2010, YunTi 1 grew to 1,000 nodes, and the first ODPS engine entered production for the finance group's loan services. Early jobs could run for more than 30 hours, so engineers took turns staying on call overnight.
Phase 3: New‑Generation Data Platform (YunTi 2)
From 2012 to 2015, three key projects shaped the platform: Ice-Fire-Bird (unifying data and business), the 5K Challenge (breaking through storage bottlenecks), and the "Moon" migration (consolidating YunTi 1 into YunTi 2).
Ice‑Fire‑Bird
“Ice‑Fire‑Bird gave us a unified data platform that connects to the cloud‑computing layer, enabling the whole group to build data‑driven services.” – Zeng Ming
The project introduced the DataX synchronization tool, the TT data bus, and unified scheduling, along with data-governance tools: DQC (Data Quality Center), a data map, and data lineage tracking.
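Lineage tracking is what lets a platform answer "if this table is late or fails a quality check, what else is affected?" The sketch below is a minimal illustration, assuming lineage is stored as a simple adjacency list from each table to the tables derived from it; the table names and the `downstream_impact` helper are hypothetical and do not reflect Alibaba's actual lineage service.

```python
from collections import deque

# Hypothetical lineage graph: each table maps to the tables derived from it.
# Names are invented for illustration only.
LINEAGE: dict[str, list[str]] = {
    "ods_orders": ["dwd_orders"],
    "dwd_orders": ["dws_gmv_daily", "dws_buyer_stats"],
    "dws_gmv_daily": ["ads_dashboard"],
    "dws_buyer_stats": [],
    "ads_dashboard": [],
}


def downstream_impact(graph: dict[str, list[str]], source: str) -> list[str]:
    """Return every table reachable from `source`, in breadth-first order."""
    seen = {source}
    order: list[str] = []
    queue = deque([source])
    while queue:
        table = queue.popleft()
        for child in graph.get(table, []):
            if child not in seen:  # visit each table once, even in diamond-shaped lineage
                seen.add(child)
                order.append(child)
                queue.append(child)
    return order


if __name__ == "__main__":
    # If a quality rule on ods_orders fails, these tables could be held back.
    print(downstream_impact(LINEAGE, "ods_orders"))
```

Breadth-first order lists the most directly affected tables first, which is a reasonable (assumed) priority for alerting owners.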
5K Challenge (2013)
Expanded cluster from 1,500 to 5,000 nodes in 48 days.
Handled 400,000 concurrent jobs under extreme pressure tests.
Achieved 100 TB sorting in 30 minutes, beating the previous world record.
Completed cross‑cluster migration and achieved 99.95% recovery after a full‑room power‑off drill.
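To put the 5K figures in perspective, the snippet below turns them into average rates. It is a back-of-the-envelope sketch, assuming decimal units (1 TB = 10^12 bytes) and a constant rate over the whole 30-minute sort and 48-day expansion; the helper names are illustrative, not part of any Alibaba tool.

```python
TB = 10**12  # bytes, decimal convention (assumption)


def nodes_added_per_day(start_nodes: int, end_nodes: int, days: int) -> float:
    """Average number of nodes brought online per day during the expansion."""
    return (end_nodes - start_nodes) / days


def sort_throughput_gb_per_s(data_tb: float, minutes: float) -> float:
    """Aggregate sort throughput in decimal GB/s, assuming a constant rate."""
    return data_tb * TB / (minutes * 60) / 10**9


if __name__ == "__main__":
    print(f"~{nodes_added_per_day(1_500, 5_000, 48):.1f} nodes/day")  # ~72.9 nodes/day
    print(f"~{sort_throughput_gb_per_s(100, 30):.1f} GB/s")  # ~55.6 GB/s
```

So the expansion averaged roughly 73 new nodes per day, and the sort moved data at roughly 55.6 GB/s in aggregate.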
Moon Migration (2014‑2015)
The goal was to merge YunTi 1 into YunTi 2, creating a unified data resource pool and retiring the legacy Hadoop clusters. By June 2015 the migration was complete, resulting in a platform with over 10 clusters, nearly 100,000 servers, and exabyte-scale storage.
Phase 4: Global Expansion of Alibaba’s Big‑Data Computing
From 2015 onward, the platform began serving public-cloud users, ISVs, and enterprises, launching hybrid-cloud offerings and the first public-cloud stream-processing service.
MaxCompute 2.0 (MaxCompute is the renamed ODPS) broke the 10,000-node barrier.
DataWorks became a unified web‑based development, scheduling, and governance platform.
StreamCompute V1.0 powered the 2017 Double 11 real‑time dashboard, processing 4.7 × 10⁸ events per second with 3 s latency.
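One way to read the throughput and latency figures together is Little's law, which relates the number of items in a system to their arrival rate and time spent inside. The worked equation below is a rough illustration rather than a figure from the source: it assumes steady state and treats the 3 s latency as the average time an event spends in the pipeline.

```latex
L = \lambda W \approx \left(4.7 \times 10^{8}\ \text{events/s}\right)\left(3\ \text{s}\right) \approx 1.4 \times 10^{9}\ \text{events}
```

Under those assumptions, on the order of 1.4 billion events would be buffered or in flight at any instant, which hints at why low latency at this rate is a memory and state-management problem as much as a compute one.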
Phase 5: New Era of Big‑Data Innovation
Since 2017, Alibaba has introduced MaxCompute Lightning (combining interactive queries with real-time analysis), upgraded the architecture with NewSQL, AliORC, and storage-compute separation, and built DataHub, a PB-scale data bus with strong consistency across regions.
Key 2017 achievements:
MaxCompute processed more than 300 PB per day, and baseline jobs finished an hour ahead of schedule.
Lightning enabled sub‑second interactive queries for thousands of merchants.
Blink streamed 4.72 × 10⁸ logs per second with 3 s end‑to‑end latency.
DataHub migration provided cross‑region strong‑consistency and minute‑level failover.
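For scale, the sketch below converts the 300 PB-per-day batch figure into an average rate. It assumes decimal units (1 PB = 1,000 TB) and spreads the volume evenly over 24 hours; real load is bursty, so peak rates would be higher. The helper name is illustrative only.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def average_rate_tb_per_s(petabytes_per_day: float) -> float:
    """Convert a daily volume in PB to a day-averaged rate in TB/s (decimal units)."""
    terabytes = petabytes_per_day * 1_000  # 1 PB = 1,000 TB (assumption: decimal units)
    return terabytes / SECONDS_PER_DAY


if __name__ == "__main__":
    print(f"{average_rate_tb_per_s(300):.2f} TB/s")  # 3.47 TB/s
```

That is roughly 3.5 TB processed every second, averaged around the clock.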
In 2018, the SRE team automated OS upgrades, built a fully automated disaster‑recovery platform (Tesla), and launched intelligent monitoring to achieve near‑zero‑human‑intervention operations.
Conclusion
Through continuous innovation in batch, stream, and machine‑learning workloads, Alibaba has built an enterprise‑grade, globally‑scalable big‑data platform that now powers not only its own ecosystem but also external customers via Alibaba Cloud, driving the digital transformation of businesses and cities.
Alibaba Cloud Big Data AI Platform
The Alibaba Cloud Big Data AI Platform builds on Alibaba’s leading cloud infrastructure, big‑data and AI engineering capabilities, scenario algorithms, and extensive industry experience to offer enterprises and developers a one‑stop, cloud‑native big‑data and AI capability suite. It boosts AI development efficiency, enables large‑scale AI deployment across industries, and drives business value.
