How Alibaba’s ODPS Platform Redefines Integrated Big Data Computing
This article outlines Alibaba Cloud’s ODPS evolution, its multi‑engine architecture, recent MaxCompute and Hologres enhancements, and how these innovations deliver scalable, fast, and simple big‑data solutions for diverse cloud‑native scenarios.
Speaker: Liu Yiming, Head of Alibaba Cloud’s self‑developed big‑data product team.
Topic: New capabilities of Alibaba Cloud ODPS integrated big‑data intelligent computing platform.
Event: 2022 Cloud Expo – Integrated Big Data Intelligent Summit.
Alibaba has been building big‑data solutions for 13 years, driven by a vision of freeing computing power from hardware limits and using cloud elasticity to process massive data sets. That vision produced four design principles: design for Scale, Speed, Simplicity, and Scenario. Together they emphasize simple, usable technology that creates business value.
Key historical trends in big data include:
MPP technologies forming the foundation of many modern data engines, such as Alibaba’s real‑time data warehouse Hologres.
Open‑source big‑data frameworks evolving with distributed systems, with Alibaba contributing heavily to Flink.
Integration of big data with cloud‑native architectures, lowering entry barriers and spawning services like BigQuery and Snowflake; Alibaba followed this path by incubating a serverless big‑data engine, ODPS.
ODPS Reboot: Integrated Architecture Meets Diverse Computing Needs
Originally Open Data Processing Service, ODPS now stands for Open Data Platform and Service, reflecting its evolution toward a unified platform that supports multiple engines and scenarios, including lake‑warehouse integration and batch‑real‑time convergence.
To address growing speed and interactivity demands, Alibaba introduced a suite of specialized engines: MaxCompute for massive batch jobs, Flink for streaming, and Hologres for interactive analytics, all orchestrated through the DataWorks development and governance platform.
ODPS’s achievements include processing exabytes of data daily, six consecutive TPCx‑BB performance championships, and numerous patents.
Integrated Architecture Layers
Storage layer: Multi‑engine shared storage base Pangu, enabling compute‑storage separation.
Scheduling layer: Unified container scheduling for elastic, hybrid deployment and reduced operational costs.
Multi‑engine layer: Cross‑engine direct reads, supporting MaxCompute and Hologres on the same data with federated queries.
Metadata layer: Unified metadata management providing a single asset view.
Development layer: DataWorks‑based unified data‑warehouse development.
Enterprise capabilities: Fine‑grained security management and cross‑engine authorization.
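The layering above can be pictured with a toy model. The sketch below is only an illustration of the idea that several engines resolve a table through one metadata catalog and read the same stored data directly; the names STORAGE, CATALOG, TableMeta, batch_total, and interactive_lookup are hypothetical and do not correspond to any ODPS, MaxCompute, or Hologres API.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class TableMeta:
    name: str
    path: str          # location in the shared storage layer
    owner_engine: str  # engine that originally wrote the table


# Shared storage: path -> rows (an in-memory stand-in for a distributed file system).
STORAGE: Dict[str, List[dict]] = {
    "/warehouse/orders": [
        {"order_id": 1, "amount": 30},
        {"order_id": 2, "amount": 70},
    ],
}

# Unified metadata: one catalog that every engine consults.
CATALOG: Dict[str, TableMeta] = {
    "orders": TableMeta("orders", "/warehouse/orders", owner_engine="batch"),
}


def read_table(name: str) -> List[dict]:
    """Resolve a table through the catalog and read it straight from storage."""
    return STORAGE[CATALOG[name].path]


def batch_total(name: str) -> int:
    """Toy 'batch engine': full scan and aggregate."""
    return sum(row["amount"] for row in read_table(name))


def interactive_lookup(name: str, order_id: int) -> dict:
    """Toy 'interactive engine': point lookup on the same data, with no copy or export."""
    return next(row for row in read_table(name) if row["order_id"] == order_id)
```

Both functions go through the same catalog entry and the same storage path, which is the property the storage, metadata, and multi‑engine layers are described as providing.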
New MaxCompute Features
At the summit, Alibaba announced that MaxCompute can now read Hologres data directly, which speeds up queries several times over without consuming Hologres resources. Elastic CU capabilities now let compute capacity scale dynamically with the workload, reducing costs while maintaining performance.
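The talk summary does not describe how elastic CU scaling decides on capacity. As a minimal sketch of the general idea, assuming a hypothetical policy in which the backlog is measured in CU‑seconds and should be drained within a deadline, a scaler could be written as follows. The function target_cus and its parameters are illustrative only and are not a MaxCompute API.

```python
import math


def target_cus(pending_work_cu_seconds: float,
               deadline_seconds: float,
               min_cus: int = 1,
               max_cus: int = 64) -> int:
    """Return how many compute units (CUs) a toy autoscaler would request.

    Hypothetical model: the backlog is expressed in CU-seconds and should be
    drained within `deadline_seconds`; the result is clamped to
    the range [min_cus, max_cus].
    """
    if deadline_seconds <= 0:
        raise ValueError("deadline_seconds must be positive")
    needed = math.ceil(pending_work_cu_seconds / deadline_seconds)
    return max(min_cus, min(max_cus, needed))


if __name__ == "__main__":
    for backlog in (0, 900, 12_000, 1_000_000):
        print(backlog, "->", target_cus(backlog, deadline_seconds=300))
```

The clamp is the part that trades cost against performance: the lower bound keeps a small baseline available, and the upper bound caps spending when the backlog spikes.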
Additional enhancements include broader External Schema support (MySQL, PostgreSQL, etc.), fine‑grained permissions for unstructured files via a Volume abstraction, configurable high‑throughput streaming writes, schema evolution, upcoming ACID 2.0 upserts, and intelligent materialized view recommendations that automatically rewrite queries for optimal performance.
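For readers unfamiliar with the term, an upsert inserts a row when its key is new and updates the existing row otherwise. The sketch below shows only those semantics on an in‑memory dictionary; it is not how MaxCompute implements the feature, and the function name upsert and the sample rows are hypothetical.

```python
from typing import Dict, Iterable, Tuple

Row = Dict[str, object]


def upsert(table: Dict[object, Row], rows: Iterable[Row], key: str) -> Tuple[int, int]:
    """Apply `rows` to `table`, keyed by column `key`.

    Returns a pair (inserted, updated) counting what happened.
    """
    inserted = updated = 0
    for row in rows:
        k = row[key]
        if k in table:
            # Existing key: incoming values overwrite matching columns,
            # columns absent from the incoming row are kept.
            table[k] = {**table[k], **row}
            updated += 1
        else:
            # New key: store a copy of the row.
            table[k] = dict(row)
            inserted += 1
    return inserted, updated


if __name__ == "__main__":
    orders: Dict[object, Row] = {}
    print(upsert(orders, [{"id": 1, "city": "Hangzhou"}, {"id": 2, "city": "Beijing"}], "id"))
    print(upsert(orders, [{"id": 2, "city": "Shanghai"}, {"id": 3, "city": "Shenzhen"}], "id"))
```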
Hologres Engine Updates
Hologres now offers columnar storage for JSON, enabling schemaless data handling with high compression, fast filtering, and efficient indexing, with reported query speedups of tens of times. Its shared‑storage, multi‑replica architecture provides global consistency, read/write separation, and robust isolation between OLAP and online queries.
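To see why storing JSON by column helps filtering, consider a minimal sketch that splits JSON objects into one list per top‑level key, so a filter on one key scans only that list. This is a simplified illustration under that assumption, not Hologres's storage format; the functions shred and filter_rows are hypothetical names.

```python
from typing import Any, Dict, List, Optional


def shred(records: List[Dict[str, Any]]) -> Dict[str, List[Optional[Any]]]:
    """Turn row-oriented JSON objects into one list (column) per top-level key.

    Keys missing from a record are stored as None so all columns stay aligned.
    """
    keys: List[str] = []
    for rec in records:
        for k in rec:
            if k not in keys:
                keys.append(k)
    return {k: [rec.get(k) for rec in records] for k in keys}


def filter_rows(columns: Dict[str, List[Any]], key: str, value: Any) -> List[int]:
    """Return row indexes where column `key` equals `value`, reading only that column."""
    return [i for i, v in enumerate(columns.get(key, [])) if v == value]


if __name__ == "__main__":
    logs = [
        {"user": "a", "level": "error"},
        {"user": "b", "level": "info", "ms": 12},
        {"user": "c", "level": "error"},
    ]
    cols = shred(logs)
    print(filter_rows(cols, "level", "error"))
```

Values of the same key also tend to be similar to each other, which is the usual reason columnar layouts compress well.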
In recent TPC‑H 30 TB benchmarks, Hologres set a new world record, surpassing the previous best by 23%, demonstrating how deep performance optimization can reduce hardware needs while handling larger workloads.
These innovations illustrate Alibaba’s commitment to a cloud‑native, integrated big‑data platform that balances scale, speed, and simplicity for modern data‑driven enterprises.
