
How Alibaba Cloud EMR 2.0 Redefines Open‑Source Big Data Platforms

This article summarizes Alibaba Cloud senior product expert He Yuan's presentation on EMR 2.0, outlining the challenges of open‑source big data, the evolution of EMR, and the new features—including cloud‑native architecture, enhanced performance, diverse resource models, and expanded analysis scenarios—aimed at reducing cost and complexity.

Alibaba Cloud Big Data AI Platform

Abstract

This article is compiled from the talk given by Alibaba Cloud senior product expert He Yuan (Jinghang) at the Alibaba Cloud EMR 2.0 online launch. It covers three parts: 1) the pain points of open‑source big data and the EMR product journey; 2) the new features of EMR 2.0; 3) a summary.

1. Pain Points of Open‑Source Big Data

Improving performance while reducing resource cost.

Lowering operation and maintenance expenses as component count and scale grow.

Ensuring data and task reliability across hundreds of machines.

Managing data development and governance with proper methodology and product support.

2. EMR Product Journey

Since its launch in 2016, Alibaba Cloud EMR has worked continuously on these pain points. Performance optimizations took EMR to first place worldwide in the CloudSort and TPC‑DS benchmarks. EMR also introduced fully managed metadata and data‑lake products, and it simplified data development and governance through DataWorks on EMR and EMR Studio.

3. New Features of EMR 2.0

3.1 Overview

Built on cloud‑native principles and Alibaba Cloud’s mature infrastructure, EMR 2.0 offers a next‑generation open‑source big data foundation.

3.2 New Platform Experience

Elasticity: Cluster creation is more than 2× faster and scaling is more than 3× faster, with support for clusters of thousands of nodes and migration of faulty nodes.

Stability: Automatic compensation for faulty nodes, component health inspection, and event notifications.

Intelligence: Cluster resource diagnostics, risk alerts, and real‑time detection.

Efficiency: Interactive data development, one‑click task submission, configuration export, and cluster cloning.

3.3 New Data Development

EMR 2.0 provides two solutions:

EMR Studio (Notebook based on Jupyter, Workflow based on DolphinScheduler) – a fully managed SaaS notebook and workflow platform.

DataWorks on EMR – an enterprise‑grade data development and governance platform supporting data integration, development, quality, lineage, security, analysis, service, and open APIs.

3.4 New Resource Forms

EMR on ECS: Supports Intel, AMD, and Yitian CPUs, with a cost‑performance improvement of more than 40%.

EMR on ACK (Kubernetes): Fully compatible with Kubernetes, with 10‑second scheduling and support for Spark, Flink, Presto, and RSS (Remote Shuffle Service).

EMR Serverless: Fully managed and pay‑as‑you‑go, with high availability (99.99% SLA) and integration with EMR Notebook.
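To put the 99.99% availability figure in concrete terms, the short calculation below converts an availability percentage into the downtime it permits. The function name and the 30‑day measurement period are assumptions chosen for illustration. The actual measurement window and terms are defined by the service's SLA, not by this sketch.

```python
def allowed_downtime_minutes(availability_percent: float, period_days: float = 30.0) -> float:
    """Return the downtime (in minutes) that an availability target permits.

    Hypothetical helper for illustration only; the 30-day default period
    is an assumption, not a term taken from any published SLA.
    """
    if not 0.0 <= availability_percent <= 100.0:
        raise ValueError("availability_percent must be between 0 and 100")
    total_minutes = period_days * 24 * 60
    unavailable_fraction = (100.0 - availability_percent) / 100.0
    return total_minutes * unavailable_fraction


if __name__ == "__main__":
    minutes = allowed_downtime_minutes(99.99)
    print(f"99.99% over 30 days allows about {minutes:.2f} minutes of downtime")
```

Under these assumptions, a 99.99% target leaves roughly 4.3 minutes of permitted downtime per 30‑day period.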

3.5 New Analysis Scenarios

Data Lake: Spark, Hive, YARN, Presto, Hudi, Delta Lake, RSS, Kyuubi, etc.

Real‑time Data Stream: Flink, Kafka.

Data Analysis: StarRocks, Doris, ClickHouse.

Data Service: HBase, Phoenix.

Data Science: TensorFlow and PyTorch for machine learning, data mining, and feature engineering.

EMR also supports custom clusters that mix components for multi‑scenario workloads.
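The scenario‑to‑component grouping above can be written down as a small lookup table, which also makes the idea of a custom, mixed cluster concrete. This is only an illustrative sketch: `SCENARIO_COMPONENTS`, `components_for`, and the scenario keys are hypothetical names, not part of any EMR API or console configuration, and the data lake entry leaves out the "etc." from the list.

```python
# Hypothetical lookup table mirroring the scenario list in this section.
# It is not an EMR API object; the keys and structure are assumptions.
SCENARIO_COMPONENTS = {
    "data_lake": ["Spark", "Hive", "YARN", "Presto", "Hudi", "Delta Lake", "RSS", "Kyuubi"],
    "real_time_stream": ["Flink", "Kafka"],
    "data_analysis": ["StarRocks", "Doris", "ClickHouse"],
    "data_service": ["HBase", "Phoenix"],
    "data_science": ["TensorFlow", "PyTorch"],
}


def components_for(scenarios):
    """Return the de-duplicated components for the given scenarios, in first-seen order."""
    selected = []
    for scenario in scenarios:
        if scenario not in SCENARIO_COMPONENTS:
            raise KeyError(f"unknown scenario: {scenario}")
        for component in SCENARIO_COMPONENTS[scenario]:
            if component not in selected:
                selected.append(component)
    return selected


if __name__ == "__main__":
    # A custom cluster that serves both streaming and data-service workloads.
    print(components_for(["real_time_stream", "data_service"]))
```

Modeling a custom cluster as the union of per‑scenario component lists is one simple reading of "mixing components". The real product may impose compatibility or sizing constraints that this sketch does not capture.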

4. Summary

EMR 2.0 brings comprehensive innovations from control plane to engine, from resource models to application scenarios, aiming to better solve the pain points of open‑source big data for users.

Visit the upgraded console at https://emr-next.console.aliyun.com/ for the new EMR experience.
