
Boosting Game Data Engineering with Alibaba Cloud EMR Serverless Spark

Yingjiao Network rebuilt its game data platform on Alibaba Cloud EMR Serverless Spark to address pain points in its previous architecture. The migration improved data collection, offline scheduling, and online analytics, and delivered faster development, a 50% reduction in metric compute time, and better stability for global game operations.

Alibaba Cloud Big Data AI Platform

Background

Yingjiao Network is a young, innovative game company that develops challenging, artistic games. As its business expanded from a single hit game to multiple platforms and a global strategy, its data services needed comprehensive optimization and upgrades.

Games like Arknights run frequent activity cycles and offer diverse gameplay, which creates high data demand, pronounced tidal (peak-and-trough) load patterns, and a need for efficient development models and flexible resource provisioning. Data support goes beyond traditional BI reports: it is deeply integrated with gameplay and operations, which demands a stable engine and extensions such as Thrift Server.

Why Choose Alibaba Cloud EMR Serverless Spark

The original architecture faced several pain points:

No support for external catalogs and no integration with popular scheduling engines such as DolphinScheduler.

Low compatibility with community Spark, which caused stability issues, and no Remote Shuffle Service, which caused performance problems.

Weak technical support that fell short in resolving user pain points and in driving product iteration.

EMR Serverless Spark offers a cloud‑native, elastic, pluggable architecture that matches these needs. It is a high‑performance Lakehouse product compatible with open‑source Spark, providing end‑to‑end services for task development, debugging, publishing, scheduling, and operation.

Rich functionality: metadata management with Paimon Catalog and an external Hive Metastore; seamless integration with Airflow and DolphinScheduler; a three-level resource model; and ecosystem features such as Spark Thrift Server and Notebook.

Excellent engine performance: built-in Celeborn for shuffle, the high-speed Fusion SQL engine, 100% compatibility with community Spark, and multi-version support.

Comprehensive service guarantee: professional technical consulting and a clear product roadmap.

Technical Design

Architecture diagram

Data Collection

We use a self-developed tracing tool to collect log data and Flink CDC to sync database tables, so that downstream analysis receives accurate data in real time.

Offline Scheduling

Two scheduling engines are provided: Airflow for code‑centric developers and DolphinScheduler for analysts and data‑warehouse engineers. Both integrate with EMR Serverless Spark, offering flexible platform services.

Serverless Spark reduces operational costs and improves stability, and its Celeborn shuffle capability removes the disk bottleneck in large shuffle tasks. Session state stays strongly consistent with the scheduling tools, so job status no longer needs to be verified a second time.

Online Computing

StarRocks serves online queries. High-quality metrics are visualized in an intelligent BI system and integrated into a business analysis platform, and the same data supports the algorithm teams' data science work.

Typical Scenarios

DolphinScheduler Job Development

Serverless Spark is integrated with DolphinScheduler through a dedicated job type, ALIYUN_SERVERLESS_SPARK, which supports SQL, SQL file, and JAR jobs. Jobs are developed locally, deployed to OSS by CI, and executed on Serverless Spark.

DolphinScheduler job flow
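The develop-locally, deploy-to-OSS, run-on-Serverless-Spark flow can be sketched as a small CI helper. This is only an illustration under stated assumptions: the bucket name, path prefix, job name, and every field in the job description are hypothetical and do not reflect the real DolphinScheduler task schema. Only the job type name ALIYUN_SERVERLESS_SPARK and the SQL file and JAR job kinds come from the text.

```python
from pathlib import PurePosixPath

# Hypothetical mapping from artifact extension to the job kinds named in the text.
JOB_KIND_BY_SUFFIX = {".sql": "SQL_FILE", ".jar": "JAR"}


def oss_uri(bucket: str, prefix: str, relative_path: str) -> str:
    """Build the OSS location a CI step would upload an artifact to."""
    key = PurePosixPath(prefix) / relative_path
    return f"oss://{bucket}/{key}"


def build_job_spec(name: str, bucket: str, prefix: str, relative_path: str) -> dict:
    """Describe one scheduled job; field names are illustrative, not a real schema."""
    suffix = PurePosixPath(relative_path).suffix.lower()
    if suffix not in JOB_KIND_BY_SUFFIX:
        raise ValueError(f"unsupported artifact type: {relative_path}")
    return {
        "name": name,
        "task_type": "ALIYUN_SERVERLESS_SPARK",
        "job_kind": JOB_KIND_BY_SUFFIX[suffix],
        "resource": oss_uri(bucket, prefix, relative_path),
    }


# Example: a SQL file kept in the repository under sql/ (hypothetical layout).
spec = build_job_spec(
    "daily_retention", "example-bucket", "jobs/prod", "sql/daily_retention.sql"
)
print(spec["resource"])
```

Keeping the OSS path derivation in one function means the CI upload step and the scheduler definition cannot drift apart, which is the main point of deploying artifacts through CI rather than by hand.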

Thrift Server for Ad‑Hoc Queries

Serverless Spark includes a Thrift Server that accepts JDBC connections for SQL queries. It supports two main scenarios: ad-hoc analysis, where product operators run simple SQL jobs, and data-warehouse development, where query results are passed to downstream jobs.

Thrift Server architecture
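The second scenario, running a query and handing the result to a downstream step, can be sketched against any DB-API 2.0 connection. To keep the example runnable anywhere, an in-memory sqlite3 database stands in for the Thrift Server connection. That substitution is an assumption of this sketch, and the table `login_events`, its columns, and the sample rows are hypothetical. Against a real Thrift Server, the connection would instead come from a HiveServer2-compatible driver (for example PyHive's `hive.connect`), with the endpoint and authentication details taken from the product documentation.

```python
import sqlite3


def run_query(connection, sql: str) -> list:
    """Run one SQL statement over a DB-API 2.0 connection and return all rows."""
    cursor = connection.cursor()
    try:
        cursor.execute(sql)
        return cursor.fetchall()
    finally:
        cursor.close()


def to_downstream_payload(rows: list) -> dict:
    """Turn (day, active_players) rows into a dict a downstream job could consume."""
    return {day: active for day, active in rows}


# Stand-in for the Thrift Server connection so the sketch runs locally.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE login_events (player_id INTEGER, day TEXT)")
conn.executemany(
    "INSERT INTO login_events VALUES (?, ?)",
    [(1, "2024-01-01"), (2, "2024-01-01"), (1, "2024-01-02"), (1, "2024-01-02")],
)

# Daily active players: one row per day, counting each player once.
DAU_SQL = """
SELECT day, COUNT(DISTINCT player_id) AS active_players
FROM login_events
GROUP BY day
ORDER BY day
"""

daily_active = to_downstream_payload(run_query(conn, DAU_SQL))
print(daily_active)
```

Because `run_query` depends only on the DB-API cursor interface, the same function serves both the operator's ad-hoc query and the data-warehouse step that forwards results.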

Benefits After Migration

Improved development efficiency: Spark SQL sessions and DolphinScheduler scheduling accelerate feature delivery and support critical activity data.

Enhanced compute efficiency: metric calculation time dropped from 30 minutes to 15 minutes, a 50% reduction in run time (a 2x speedup), which shortened the overall SLA chain by 1.5 hours.

Higher stability and lower ops pressure: multi-version management enables quick upgrades and a stable runtime experience.
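The compute figures are easy to misread, because a 50% cut in run time is the same thing as a job that runs twice as fast. A short check using the 30-minute and 15-minute values reported above (the function names are only illustrative):

```python
def runtime_reduction(before_min: float, after_min: float) -> float:
    """Fraction of the original run time that was saved."""
    return (before_min - after_min) / before_min


def speedup_factor(before_min: float, after_min: float) -> float:
    """How many times faster the job runs (old time divided by new time)."""
    return before_min / after_min


before, after = 30.0, 15.0  # minutes, from the figures reported above
print(f"reduction: {runtime_reduction(before, after):.0%}")
print(f"speedup: {speedup_factor(before, after):.1f}x")
```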

Conclusion and Future Outlook

This practice shows that EMR Serverless Spark offers strong advantages for classic big-data scenarios in the Spark ecosystem. Looking ahead, we expect further open-source Lakehouse capabilities, such as unified catalog management, and broader coverage of edge and exploratory scenarios.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact admin@besthub.dev and we will review it promptly.
