Tagged articles
29 articles
Alibaba Cloud Big Data AI Platform
Oct 18, 2025 · Big Data

Alibaba Cloud EMR’s AI Evolution: Accelerating Big Data Performance

Since its 2016 launch, Alibaba Cloud EMR has evolved from a basic open‑source Hadoop service into a high‑performance, AI‑enabled big‑data platform. It delivers optimized I/O, vectorized processing, StarRocks and Spark enhancements, and integrated AI functions such as natural‑language SQL, while supporting diverse industry workloads.

EMR · Spark · StarRocks
0 likes · 9 min read
Alibaba Cloud Big Data AI Platform
Dec 8, 2023 · Cloud Computing

How Alibaba Cloud EMR Powers Serverless StarRocks for Seamless Lakehouse Analytics

This article summarizes Li Yu's presentation on Alibaba Cloud EMR's deep collaboration with the StarRocks community, detailing major contributions across versions, the serverless StarRocks product’s core capabilities, and future plans to enhance OLAP‑lakehouse integration, performance, and cloud‑native elasticity.

Alibaba Cloud · EMR · Lakehouse
0 likes · 7 min read
DataFunSummit
Jul 20, 2023 · Big Data

Cloud‑Native OLAP on Volcano EMR: Architecture, Capabilities, and Customer Cases

This article introduces Volcano EMR's cloud‑native OLAP solution, detailing its product overview, storage‑compute separation, elastic scaling, cost and hot‑cold data management, intelligent query analysis, multiple customer case studies, and future roadmap for real‑time and offline data warehousing.

Cost Management · Data Warehouse · EMR
0 likes · 11 min read
Alibaba Cloud Big Data AI Platform
Mar 3, 2023 · Big Data

How Alibaba Cloud EMR Evolved from Open‑Source Compatibility to Enterprise‑Grade Performance

This article outlines Alibaba Cloud EMR's three‑stage evolution—compatibility, contribution, and beyond open source—detailing its early Hadoop adoption, Flink and Spark innovations, cloud‑native optimizations, and enterprise‑grade features such as Remote Shuffle Service, performance benchmarks, and integrated diagnostics.

Alibaba Cloud · Big Data · Cloud Native
0 likes · 13 min read
Alibaba Cloud Big Data AI Platform
Feb 8, 2023 · Big Data

How Alibaba Cloud EMR 2.0 Redefines Open‑Source Big Data Platforms

This article summarizes Alibaba Cloud senior product expert He Yuan's presentation on EMR 2.0, outlining the challenges of open‑source big data, the evolution of EMR, and the new features—including cloud‑native architecture, enhanced performance, diverse resource models, and expanded analysis scenarios—aimed at reducing cost and complexity.

Alibaba Cloud · Big Data · Cloud Native
0 likes · 11 min read
ByteDance Data Platform
Jan 11, 2023 · Big Data

How EMR Stateless Transforms Big Data with Transient, Stateless Clusters

This article explains the concept of transient clusters and the Stateless architecture in Volcano Engine's EMR platform, compares Stateless with traditional Stateful approaches, outlines its evolution, core components, elastic scaling features, and the business value of cost‑effective, on‑demand big‑data processing.

Cloud Native · EMR · Transient Cluster
0 likes · 17 min read
Alibaba Cloud Native
Jan 9, 2023 · Big Data

How Kubernetes Powers Cloud‑Native Big Data with EMR on ACK

This article explains the shift of big data and machine‑learning workloads toward storage‑compute separation and cloud‑native architectures, outlines the technical challenges of running Spark on Kubernetes, and details the EMR on ACK solution with its architecture, performance gains, and real‑world adoption.

ACK · EMR · Spark
0 likes · 6 min read
StarRocks
Dec 1, 2022 · Big Data

How Alibaba Cloud EMR StarRocks Supercharges Data Lake Analytics with Advanced Optimizations

This article explains how Alibaba Cloud EMR StarRocks extends data lake analytics to support Hive, Iceberg, and Hudi, detailing its architecture, Iceberg integration, performance gains over Trino, IO merging, lazy materialization, intelligent caching, and elastic compute capabilities for faster, unified, and cost‑effective queries.

Data Lake · EMR · Elastic Compute
0 likes · 16 min read
StarRocks
Nov 4, 2022 · Big Data

Building a High‑Performance, Cost‑Effective Cloud Lakehouse with StarRocks and EMR

This article explains how to design and implement a cloud‑native Lakehouse using StarRocks and Tencent Cloud EMR, covering core technical requirements, a five‑layer architecture, data ingestion with Iceberg/Hudi, performance tricks like Z‑order clustering, cost‑control through elastic scaling, and the key product features of EMR StarRocks.

Big Data · EMR · Hudi
0 likes · 24 min read
Volcano Engine Developer Services
Sep 21, 2022 · Big Data

Unlocking Enterprise Data Lakehouse: Trends, Challenges, and Volcano Engine EMR Solutions

This article explores the open‑source lakehouse trend, outlines the architectural features of Volcano Engine EMR, examines key challenges of building enterprise‑grade data lakehouses, and presents best‑practice case studies demonstrating how EMR enables scalable, real‑time analytics, storage‑compute separation, and seamless integration with modern big‑data engines.

Data Lakehouse · EMR · Storage Compute Separation
0 likes · 22 min read
Big Data Technology Architecture
May 22, 2022 · Big Data

Delta Lake Overview, File Structure, Metadata, and Its Integration with Alibaba Cloud EMR, DLF, G‑SCD and CDC Solutions

This article introduces Delta Lake as an open‑source storage layer for lake‑house architectures, explains its key features, file and metadata structures, and details how Alibaba Cloud EMR and Data Lake Formation integrate and extend Delta Lake with advanced capabilities such as G‑SCD, CDC, performance optimizations, and future roadmap.

CDC · DLF · Delta Lake
0 likes · 10 min read
Alibaba Cloud Developer
May 13, 2022 · Big Data

Unlocking Delta Lake: Key Features, Architecture, and EMR Integration

Delta Lake, an open‑source storage layer from Databricks, provides ACID transactions, data versioning, schema evolution, and unified batch‑stream processing, with a detailed file structure and metadata mechanism, while Alibaba Cloud EMR enhances it with advanced DML, performance optimizations, deep DLF integration, and solutions for G‑SCD and CDC.

CDC · DLF · Data Lakehouse
0 likes · 11 min read
ByteDance Data Platform
Feb 25, 2022 · Big Data

Optimizing SparkSQL: ByteDance EMR’s Data Lake Integration and Multi‑Tenant Server

ByteDance’s EMR team details how they integrated data‑lake engines such as Hudi and Iceberg into SparkSQL, streamlined jar management, built a custom Spark SQL Server with Hive compatibility, multi‑tenant support, engine pre‑warming, and transaction capabilities, dramatically improving performance and resource efficiency for enterprise workloads.

EMR · Hudi · Iceberg
0 likes · 11 min read
Tencent Tech
Sep 10, 2021 · Big Data

How Sohu Changyou Migrated 1 PB of Game Data to the Cloud Without Downtime

This article details how Sohu Changyou’s data team, together with Tencent Cloud engineers, planned and executed a seamless migration of over one petabyte of game data to Elastic MapReduce, Elasticsearch Service and Oceanus, achieving zero service impact and dramatically improving analytics performance.

Big Data · EMR · Game Analytics
0 likes · 9 min read
Big Data Technology & Architecture
Mar 30, 2021 · Big Data

Implementing Real-Time Data Ingestion with Delta Lake on EMR: Architecture, Challenges, and Solutions

This article describes how Soul's data engineering team replaced nightly batch ETL with real-time Delta Lake ingestion on EMR. It covers the motivations, a comparative analysis of Delta, Hudi, and Iceberg, the implementation architecture, issues encountered such as data skew and schema evolution, and the solutions adopted to improve performance and reliability.

Data Lake · Data Skew · Delta Lake
0 likes · 13 min read
Tencent Cloud Developer
Dec 30, 2020 · Big Data

How Alluxio Boosts Tencent Cloud EMR: Cutting Bandwidth by 50% and Accelerating IO‑Intensive Workloads

This article analyzes the challenges of traditional monolithic big‑data architectures, explains how Tencent Cloud EMR integrates Alluxio for compute‑storage separation, presents detailed performance benchmarks showing 20‑50% bandwidth reduction and 5‑40% query speedup, and outlines the specific tuning measures applied.

Alluxio · Big Data · Compute-Storage Separation
0 likes · 10 min read
Big Data Technology Architecture
Nov 21, 2020 · Big Data

Multi-Engine Support and Future Directions of Alibaba Cloud Data Lake Building Service

The article explains how Alibaba Cloud's Data Lake Building Service enables fine‑grained lake management by integrating multiple compute engines—including EMR, MaxCompute, Blink, Hologres, PAI, and open‑source Hive, Spark, and Presto—through unified metadata and OSS storage, while outlining current features, special format support, and planned future enhancements.

Alibaba Cloud · EMR · OSS
0 likes · 9 min read
Tencent Cloud Developer
Oct 19, 2020 · Big Data

Improving Spark Write Performance for Massive Files on Object Storage with Tencent Cloud EMR

By parallelizing Spark’s driver‑side commit, trash, and move phases, which were previously single‑threaded and triggered costly copy‑on‑rename operations when writing massive numbers of files to object storage, the Tencent Cloud EMR case achieved a more than tenfold (1,100%) speedup, making object storage a viable alternative to HDFS.

Big Data · EMR · Performance Optimization
0 likes · 8 min read
Alibaba Cloud Developer
Sep 27, 2020 · Big Data

Why Spark on Kubernetes Needs a Remote Shuffle Service—and How It Boosts Performance

This article examines the challenges of running Spark on Kubernetes, introduces the Remote Shuffle Service architecture to overcome shuffle bottlenecks, details EMR on ACK integration, showcases performance gains with Terasort benchmarks, and outlines future cloud‑native big‑data strategies such as mixed‑cluster and serverless deployments.

EMR · Remote Shuffle Service · Spark
0 likes · 13 min read
Tencent Cloud Developer
May 21, 2019 · Information Security

Design and Implementation of a Cloud Audit Solution for Tencent Cloud Accounts

The article details a scalable, extensible cloud‑audit architecture for Tencent Cloud accounts that stores API logs in a Shanghai‑region COS bucket, processes them with EMR‑based Hive tables and hourly partition scripts, aggregates results into a hot MySQL store, and enables administrators to monitor all sub‑accounts with a real‑time “god view.”

COS · EMR · Hive
0 likes · 13 min read