How EMR Serverless Spark Cut Batch Processing Time by Over 50% for a 600M‑User Platform
This case study details how Qimao leveraged Alibaba Cloud EMR Serverless Spark with Fusion and Celeborn to overcome multi‑business‑line data‑processing challenges, achieving more than 50% faster batch jobs, significant cost reductions, and improved operational flexibility across its 600 million‑user ecosystem.
Background
Qimao is an internet company focused on cultural entertainment with over 600 million users. Its data‑warehouse team is responsible for offline and real‑time data development, metric construction, and data governance across multiple business lines.
Pain Points of the Legacy Architecture
Complex analytical requirements from data‑warehouse engineers, analysts, and algorithm engineers.
High compute cost: the original cluster supported only open‑source Spark, with no native acceleration and no Remote Shuffle Service, and neither the main cluster nor the ad‑hoc cluster was elastic, leading to significant resource waste.
Operations complexity: manual interventions at the resource layer, Spark engine upgrades, Python environment management, and inability to accurately evaluate per‑job cost.
Why Alibaba Cloud EMR Serverless Spark
EMR Serverless Spark integrates Fusion (vectorized SQL acceleration) and Celeborn (enterprise‑grade shuffle service), delivering more than 50% performance improvement for batch workloads.
Benchmark results (resource configuration shown as CPU cores / memory GB):
User‑behavior incremental analysis – 500C/1500G: Yarn 32 min → Serverless Spark 10 min (69% faster).
User‑log detail processing – 500C/1200G: Yarn 30 min → Serverless Spark 14 min (53% faster).
Content aggregation and statistics – 800C/1200G: Yarn 71 min → Serverless Spark 38 min (46% faster). The job processed 11 TB of shuffle data with stable performance.
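The reported speedups follow directly from the runtimes as (Yarn time − Serverless time) / Yarn time. As a quick sanity check, a standalone sketch (the helper function is illustrative, not part of the case study's tooling) recomputes the percentages from the figures above:

```python
# Recompute the reported batch-job speedups from the benchmark runtimes.
# Runtimes (minutes) are taken from the case study's benchmark table.

def speedup_pct(yarn_min: int, serverless_min: int) -> int:
    """Percentage reduction in runtime when moving from Yarn to Serverless Spark."""
    return round((yarn_min - serverless_min) / yarn_min * 100)

benchmarks = {
    "User-behavior incremental analysis": (32, 10),
    "User-log detail processing": (30, 14),
    "Content aggregation and statistics": (71, 38),
}

for job, (yarn, serverless) in benchmarks.items():
    print(f"{job}: {speedup_pct(yarn, serverless)}% faster")
# Prints 69%, 53%, and 46%, matching the figures reported above.
```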
Technical Solution Design
Application Layer
Data development is orchestrated with Alibaba Cloud DataWorks and a self‑built Apache DolphinScheduler.
Reporting and ad‑hoc analysis are performed in JetBrains IDEs together with a self‑built Apache Superset.
Access Layer
Jobs are submitted via the EMR Serverless Spark spark-submit tool, which is 100% compatible with the open‑source client.
Daily analysis and instant queries use the Kyuubi Gateway, providing RESTful and JDBC interfaces that are also fully compatible with open‑source standards.
Control Plane
Multi‑AZ high availability with transparent failover.
Resource queues isolate different teams and business lines from one another.
Job‑level management enables fine‑grained accounting down to 1 core (1 CU) and per‑job cost tracking.
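With accounting at 1 CU granularity, per‑job cost reduces to CUs allocated times hours run times a unit price. A minimal bookkeeping sketch under that model — the unit price and job records below are hypothetical placeholders, not actual EMR Serverless Spark pricing or Qimao workloads:

```python
# Sketch of per-job cost tracking at 1 CU granularity.
# UNIT_PRICE_PER_CU_HOUR and the job records are assumed for illustration.

UNIT_PRICE_PER_CU_HOUR = 0.05  # hypothetical price per CU-hour

def job_cost(cu: int, hours: float) -> float:
    """Cost of one job: CUs allocated x hours run x unit price."""
    return cu * hours * UNIT_PRICE_PER_CU_HOUR

jobs = [
    {"name": "daily_report", "cu": 500, "hours": 0.5},  # hypothetical job
    {"name": "adhoc_query", "cu": 8, "hours": 0.25},    # hypothetical job
]

for j in jobs:
    print(f'{j["name"]}: {job_cost(j["cu"], j["hours"]):.2f} units')
```

Because every job is metered individually, the same records also roll up by queue or team, which is what makes the per‑business‑line cost evaluation described above possible.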
Post‑Migration Benefits
Technical
Core job runtime reduced by ~30 minutes; daily reports are now ready ~5 hours earlier.
60 days of continuous warehouse operation without SLA breach.
Seconds‑level scaling with a 1 CU step size, achieving near‑100% resource utilization.
Job‑level isolation of Spark versions and Python environments eliminates cross‑job interference and reduces production risk.
Financial
Offline warehouse cost decreased by 35%.
Ad‑hoc query cost decreased by 30%.
Business
Data acquisition efficiency improved, cutting roughly 40% of idle waiting time for business teams.
Data error rate dropped by 90%, preventing costly decision errors.
Future Outlook
Build an acceleration layer to automate collaboration between StarRocks and Serverless Spark.
Deepen lake‑warehouse integration with Paimon + Serverless Spark + StarRocks for end‑to‑end optimization.
Leverage ongoing EMR product upgrades (Unified Catalog, AI Function) to further empower data intelligence.
Alibaba Cloud Big Data AI Platform
The Alibaba Cloud Big Data AI Platform builds on Alibaba’s leading cloud infrastructure, big‑data and AI engineering capabilities, scenario algorithms, and extensive industry experience to offer enterprises and developers a one‑stop, cloud‑native big‑data and AI capability suite. It boosts AI development efficiency, enables large‑scale AI deployment across industries, and drives business value.