How Alibaba Cloud EMR 2.0 Redefines Open‑Source Big Data Platforms
This article summarizes Alibaba Cloud senior product expert He Yuan's presentation on EMR 2.0, outlining the challenges of open‑source big data, the evolution of EMR, and the new features—including cloud‑native architecture, enhanced performance, diverse resource models, and expanded analysis scenarios—aimed at reducing cost and complexity.
Abstract
This article is compiled from a talk by Alibaba Cloud senior product expert He Yuan (Jinghang) at the Alibaba Cloud EMR 2.0 online launch. It covers three topics: 1) the pain points of open‑source big data and the EMR product journey; 2) the new features of EMR 2.0; 3) a summary.
1. Pain Points of Open‑Source Big Data
Improving performance while reducing resource cost.
Lowering operation and maintenance expenses as component count and scale grow.
Ensuring data and task reliability across hundreds of machines.
Managing data development and governance with proper methodology and product support.
2. EMR Product Journey
Since its launch in 2016, Alibaba Cloud EMR has worked continuously on these pain points. Performance optimizations took EMR to first place worldwide on the CloudSort and TPC‑DS benchmarks. EMR also introduced fully managed metadata and data‑lake products, and simplified data development and governance through DataWorks on EMR and EMR Studio.
3. New Features of EMR 2.0
3.1 Overview
Built on cloud‑native principles and Alibaba Cloud’s mature infrastructure, EMR 2.0 offers a next‑generation open‑source big data foundation.
3.2 New Platform Experience
Elasticity: Cluster creation more than 2× faster, scaling more than 3× faster, support for clusters of thousands of nodes, and fault‑node migration.
Stability: Automatic fault‑node compensation, component health inspection, and event notifications.
Intelligence: Cluster resource diagnostics, risk alerts, and real‑time detection.
Efficiency: Interactive data development, one‑click task submission, configuration export, and cluster cloning.
3.3 New Data Development
EMR 2.0 provides two solutions:
EMR Studio (Notebook based on Jupyter, Workflow based on DolphinScheduler) – a fully managed SaaS notebook and workflow platform.
DataWorks on EMR – an enterprise‑grade data development and governance platform supporting data integration, development, quality, lineage, security, analysis, service, and open APIs.
3.4 New Resource Forms
EMR on ECS: Supports Intel, AMD, and Yitian CPUs, with more than 40% better cost‑performance.
EMR on ACK (Kubernetes): Full K8s compatibility and 10‑second scheduling; supports Spark, Flink, Presto, and RSS (Remote Shuffle Service).
EMR Serverless: Fully managed, pay‑as‑you‑go, and highly available (99.99% SLA); integrates with EMR Notebook.
3.5 New Analysis Scenarios
Data Lake: Spark, Hive, YARN, Presto, Hudi, Delta Lake, RSS, Kyuubi, etc.
Real‑time Data Stream: Flink, Kafka.
Data Analysis: StarRocks, Doris, ClickHouse.
Data Service: HBase, Phoenix.
Data Science: TensorFlow and PyTorch for machine learning, data mining, and feature engineering.
EMR also supports custom clusters that mix components for multi‑scenario workloads.
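As a rough illustration of what "mixing components" means, the sketch below models the scenarios listed above as a plain Python mapping and computes the combined component set for a multi‑scenario cluster. It is not an EMR API or SDK call: the names `SCENARIO_COMPONENTS` and `custom_cluster_components` and the scenario keys are hypothetical, and the component lists are simply taken from the scenarios named in this article.

```python
# Hypothetical illustration only: this does not call any EMR API or SDK.
# Component lists are taken from the scenarios named in the article;
# the "etc." in the Data Lake entry is deliberately not expanded.
SCENARIO_COMPONENTS = {
    "data_lake": ["Spark", "Hive", "YARN", "Presto", "Hudi", "Delta Lake", "RSS", "Kyuubi"],
    "real_time_stream": ["Flink", "Kafka"],
    "data_analysis": ["StarRocks", "Doris", "ClickHouse"],
    "data_service": ["HBase", "Phoenix"],
    "data_science": ["TensorFlow", "PyTorch"],
}


def custom_cluster_components(scenarios):
    """Return the de-duplicated union of components, preserving first-seen order."""
    selected = []
    for scenario in scenarios:
        if scenario not in SCENARIO_COMPONENTS:
            raise KeyError(f"unknown scenario: {scenario!r}")
        for component in SCENARIO_COMPONENTS[scenario]:
            if component not in selected:
                selected.append(component)
    return selected


if __name__ == "__main__":
    # A cluster serving both streaming ingestion and online data serving.
    print(custom_cluster_components(["real_time_stream", "data_service"]))
    # prints ['Flink', 'Kafka', 'HBase', 'Phoenix']
```

In practice the component selection is made in the EMR console when the cluster is created; the mapping here only makes the scenario-to-component relationship explicit.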
4. Summary
EMR 2.0 brings comprehensive innovations from control plane to engine, from resource models to application scenarios, aiming to better solve the pain points of open‑source big data for users.
Visit the upgraded console at https://emr-next.console.aliyun.com/ for the new EMR experience.