Tagged articles

EMR Serverless

9 articles

May 23, 2026 · Cloud Computing

Best Practice: Using EMR Serverless StarRocks AI Function for Financial Text Classification

This article demonstrates how to use StarRocks AI Function on EMR Serverless to perform sentiment analysis, intelligent classification, information extraction, and PII redaction on financial text entirely within SQL. The in‑database approach eliminates data export, reduces latency, and helps ensure compliance, and the article provides concrete code examples, performance benchmarks, and best‑practice recommendations.

AI Function · EMR Serverless · Financial NLP

0 likes · 25 min read

Amazon Cloud Developers

Feb 13, 2026 · Big Data

How EMR Serverless Storage Cuts Costs up to 55% for Shuffle‑Heavy Spark Jobs

A performance comparison of Amazon EMR Serverless Storage on a 3 TB TPC‑DS benchmark shows up to 55% cost reduction and 25% faster runtimes for shuffle‑intensive Spark jobs. The article also outlines usage limits and provides Python tools to analyze shuffle data from Spark event logs.

Cost Savings · EMR Serverless · Shuffle Storage

0 likes · 13 min read

Alibaba Cloud Big Data AI Platform

Nov 15, 2025 · Big Data

From a Decade-Long Big Data Journey to a Cloud‑Native Lakehouse

This article chronicles the ten‑year evolution of a self‑built big data platform: early Hadoop clusters, successive migrations to Spark, Hive, Hudi, and StarRocks, and the operational challenges encountered along the way. It then describes the comprehensive shift to Alibaba Cloud EMR Serverless, which delivered significant cost, performance, and stability gains, and outlines future intelligent‑ecosystem plans.

Big Data · Data Lake · EMR Serverless

0 likes · 17 min read

Alibaba Cloud Big Data AI Platform

Jun 10, 2025 · Big Data

Boosting Automotive Data Processing with Alibaba Cloud EMR Serverless Spark

This article details how a leading automotive parts supply‑chain platform migrated from a traditional Hadoop stack to Alibaba Cloud EMR Serverless Spark and DataWorks, achieving faster, more elastic, and cost‑effective data processing, enhanced AI integration, and significant operational improvements across multiple business scenarios.

Big Data · Cloud Native · Data Lake

0 likes · 12 min read

Alibaba Cloud Big Data AI Platform

Mar 20, 2025 · Big Data

How to Read and Write StarRocks Data with EMR Serverless Spark

This step‑by‑step guide explains how to use EMR Serverless Spark together with the StarRocks Spark Connector to create a workspace, upload the connector JAR, configure network connections, create databases and tables in StarRocks, and perform read/write operations via SQL sessions, Notebook sessions, or batch Spark jobs, complete with code examples and UI screenshots.

Big Data · Data Integration · EMR Serverless

0 likes · 14 min read

DataFunSummit

Feb 1, 2025 · Big Data

Spark Native and Cloud Native: Vectorized SQL Engines, Remote Shuffle, and EMR Serverless Spark Practices

This article explains the challenges of big‑data processing in the cloud era, introduces Spark’s native‑language SQL engine rewrites, discusses vectorization and code generation techniques, describes cloud‑native storage‑compute separation with Remote Shuffle services such as Apache Celeborn, and presents the production benefits of Alibaba Cloud’s EMR Serverless Spark.

Big Data · Codegen · EMR Serverless

0 likes · 12 min read

Alibaba Cloud Big Data AI Platform

Jul 19, 2024 · Big Data

How to Deploy a PySpark Streaming Job on EMR Serverless Spark

This guide walks you through creating a Kafka‑enabled EMR Serverless Spark cluster, configuring network connections and security groups, uploading JARs and Python resources, and finally launching and monitoring a PySpark streaming application.

Big Data · EMR Serverless · PySpark

0 likes · 8 min read

Alibaba Cloud Big Data AI Platform

Jun 25, 2024 · Big Data

Build Real-Time Data Lake Analytics with Flink, Paimon, and EMR Serverless Spark

This guide demonstrates how to use Alibaba Cloud's EMR Serverless Spark and Flink Serverless services together with Apache Paimon to ingest streaming data, perform interactive queries, and schedule offline compaction jobs, creating a unified real‑time and batch data lake solution.

Big Data · Data Lake · EMR Serverless

0 likes · 6 min read

Alibaba Cloud Big Data AI Platform

Jul 6, 2023 · Big Data

Explore World Cup Analytics on EMR Serverless StarRocks – Free Trial Guide

This guide walks you through creating a fully managed EMR Serverless StarRocks instance, loading historical World Cup data, and running OLAP SQL queries to analyze championship counts and host‑nation performance, all using a free trial of compute and storage resources.

Big Data · EMR Serverless · OLAP

0 likes · 11 min read
