Tagged articles
7 articles
Alibaba Cloud Big Data AI Platform
Nov 15, 2025 · Big Data

From a Decade-Long Big Data Journey to a Cloud‑Native Lakehouse

This article chronicles the ten‑year evolution of a self‑built big data platform: early Hadoop clusters, successive migrations to Spark, Hive, Hudi, and StarRocks, and the operational challenges encountered along the way. It then describes the full move to Alibaba Cloud EMR Serverless, which the article credits with significant cost, performance, and stability gains, and outlines plans for a future intelligent ecosystem.

Big Data · Data Lake · Spark
17 min read
Alibaba Cloud Big Data AI Platform
Jun 10, 2025 · Big Data

Boosting Automotive Data Processing with Alibaba Cloud EMR Serverless Spark

This article details how a leading automotive parts supply‑chain platform migrated from a traditional Hadoop stack to Alibaba Cloud EMR Serverless Spark and DataWorks. The migration made data processing faster, more elastic, and more cost‑effective, strengthened AI integration, and brought significant operational improvements across multiple business scenarios.

Big Data · Cloud Native · Data Lake
12 min read
Alibaba Cloud Big Data AI Platform
Mar 20, 2025 · Big Data

How to Read and Write StarRocks Data with EMR Serverless Spark

This step‑by‑step guide explains how to use EMR Serverless Spark with the StarRocks Spark Connector: create a workspace, upload the connector JAR, configure network connectivity, create databases and tables in StarRocks, and read and write data through SQL sessions, Notebook sessions, or batch Spark jobs. It includes code examples and UI screenshots.

Big Data · Data Integration · Spark
14 min read
DataFunSummit
Feb 1, 2025 · Big Data

Spark Native and Cloud Native: Vectorized SQL Engines, Remote Shuffle, and EMR Serverless Spark Practices

This article explains the challenges of big‑data processing in the cloud era, introduces efforts to rewrite Spark’s SQL engine in native languages, and discusses vectorization and code generation techniques. It also describes cloud‑native storage‑compute separation with Remote Shuffle services such as Apache Celeborn, and presents the production benefits of Alibaba Cloud’s EMR Serverless Spark.

Big Data · Codegen · Remote Shuffle
12 min read