
How Alibaba Cloud’s New Vectorized Engines Are Revolutionizing Real‑Time Big Data Processing

At the 2024 Apsara Conference (Yunqi Conference), Alibaba Cloud unveiled a suite of vectorized big-data products: Flash, a vectorized stream engine compatible with Flink; EMR Serverless Spark, with up to 300% better performance than open-source Spark; an upgraded lakehouse architecture; and customer case studies showing large performance gains, lower costs, and wider serverless adoption.

Alibaba Cloud Big Data AI Platform

2024 Apsara Conference Highlights

The conference featured talks by Alibaba Cloud researchers and experts, including Wang Feng, Li Yu, Fan Zhen, Li Jinsong, and Jiang Qian.

Flash: The First Vectorized Stream Engine for Flink

Alibaba Cloud announced Flash, a vectorized stream engine for Flink that, according to Alibaba Cloud, runs 5-10× faster than open-source Flink while remaining 100% compatible with it. Trial access is currently available by submitting a support ticket.
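Vectorized execution means operators process a batch of values from one column in a tight loop, instead of pushing one row at a time through the whole operator tree. As a hypothetical illustration of that general technique (not Flash's implementation), the Python sketch below evaluates `SELECT price * qty FROM orders WHERE qty > 1` both ways; the table and column names are invented. Pure Python gains no speed from this, so read it for the data layout and control flow that native engines accelerate with SIMD instructions and cache-friendly loops.

```python
"""Toy contrast between row-at-a-time and batch-at-a-time (vectorized) evaluation.

Hypothetical illustration only: not Flash's implementation. Both functions
compute SELECT price * qty FROM orders WHERE qty > 1 over the same data.
"""
from typing import Dict, List

Row = Dict[str, int]


def row_at_a_time(rows: List[Row]) -> List[int]:
    # Volcano-style loop: each row passes through filter and projection
    # individually, paying per-row interpretation overhead.
    out = []
    for row in rows:
        if row["qty"] > 1:
            out.append(row["price"] * row["qty"])
    return out


def to_columns(rows: List[Row]) -> Dict[str, List[int]]:
    # Pivot rows into a columnar batch: one list per column.
    return {name: [r[name] for r in rows] for name in rows[0]} if rows else {}


def vectorized(batch: Dict[str, List[int]]) -> List[int]:
    # Each step runs one tight loop over whole columns. In a native engine,
    # loops like these are what SIMD and CPU caches speed up.
    if not batch:
        return []
    qty, price = batch["qty"], batch["price"]
    # Filter step: a selection vector of qualifying row positions.
    selection = [i for i, q in enumerate(qty) if q > 1]
    # Projection step: compute only over the selected positions.
    return [price[i] * qty[i] for i in selection]


if __name__ == "__main__":
    orders = [
        {"price": 10, "qty": 1},
        {"price": 7, "qty": 3},
        {"price": 4, "qty": 5},
    ]
    print(row_at_a_time(orders))           # [21, 20]
    print(vectorized(to_columns(orders)))  # [21, 20]
```

The selection vector lets the filter mark surviving rows without copying column data, a common pattern in vectorized engines.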

Wang Feng said Flash will be offered on the public cloud so that small and medium-sized enterprises can run their existing Flink jobs faster and at lower cost, without code changes.

In Alibaba's internal production environment, Flash runs across more than 10 business units on over 100,000 CUs (compute units), cutting costs by 52% on average.

EMR Serverless Spark

EMR Serverless Spark, a cloud-native, fully managed serverless Spark service, is now commercially available. Its self-developed vectorized Fusion engine delivers up to 300% better performance than open-source Spark, and the product includes interactive notebooks, an embedded SQL editor, version control, workflow scheduling, and monitoring.

The product supports elastic scaling with pay-as-you-go pricing and integrates with the Alibaba Cloud Data Lake Formation (DLF) lakehouse platform.
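The announcement does not describe how EMR Serverless Spark decides when to scale. For orientation only, the sketch below reproduces, in simplified form, the target-executor heuristic of open-source Spark's dynamic allocation: enough executors to run every running or pending task at once, scaled by `spark.dynamicAllocation.executorAllocationRatio` and clamped between the configured minimum and maximum. The function name and its default argument values are assumptions for illustration, not Spark's defaults.

```python
"""Simplified target-executor calculation modeled on open-source Spark's
dynamic allocation. Assumption: EMR Serverless Spark's own scaling policy is
not described in the announcement; this only illustrates sizing compute to
outstanding work. Default argument values here are illustrative.
"""
import math


def target_executors(
    running_or_pending_tasks: int,
    executor_cores: int = 4,        # meaning of spark.executor.cores
    task_cpus: int = 1,             # meaning of spark.task.cpus
    allocation_ratio: float = 1.0,  # meaning of spark.dynamicAllocation.executorAllocationRatio
    min_executors: int = 0,         # meaning of spark.dynamicAllocation.minExecutors
    max_executors: int = 100,       # meaning of spark.dynamicAllocation.maxExecutors
) -> int:
    # How many tasks one executor can run concurrently.
    tasks_per_executor = executor_cores // task_cpus
    # Executors needed to run all outstanding tasks at once, scaled by the ratio.
    needed = math.ceil(running_or_pending_tasks * allocation_ratio / tasks_per_executor)
    # Clamp to the configured bounds.
    return max(min_executors, min(needed, max_executors))


if __name__ == "__main__":
    for tasks in (0, 10, 1000):
        print(tasks, "->", target_executors(tasks))  # 0 -> 0, 10 -> 3, 1000 -> 100
```

Releasing executors as the task backlog drains is how this kind of elastic scaling avoids holding idle capacity under pay-as-you-go pricing; the ratio lets operators trade some parallelism for fewer executors.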

EMR Serverless StarRocks 2.0

Marking one year since its commercial launch, EMR Serverless StarRocks has served more than 500 customers across over 20 industries. Version 2.0 introduces a compute-storage separation architecture built on an upgraded StarOS, along with multi-warehouse support, elastic scaling, and table optimizations.
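In a compute-storage separated design, data is stored durably in shared storage (typically object storage) while compute nodes keep a local cache of hot data, so compute can be resized, or split into several warehouses, without moving data. The Python sketch below is a generic, hypothetical model of that pattern, not StarRocks or StarOS code: two isolated warehouses read the same shared store through their own LRU caches. Class names, segment keys, and cache sizes are invented.

```python
"""Generic model of compute-storage separation: one shared store, several
independent compute "warehouses", each with its own LRU cache.
Hypothetical names; not StarRocks' implementation.
"""
from collections import OrderedDict
from typing import Dict


class SharedStorage:
    """Stands in for durable shared storage such as an object store."""

    def __init__(self, objects: Dict[str, bytes]):
        self._objects = dict(objects)
        self.reads = 0  # remote reads: the slow, expensive path

    def get(self, key: str) -> bytes:
        self.reads += 1
        return self._objects[key]


class Warehouse:
    """An isolated compute group with a bounded local LRU cache."""

    def __init__(self, name: str, storage: SharedStorage, cache_slots: int):
        self.name = name
        self.storage = storage
        self.cache_slots = cache_slots
        self.cache = OrderedDict()  # key -> data, least recently used first
        self.hits = 0
        self.misses = 0

    def read(self, key: str) -> bytes:
        if key in self.cache:
            self.hits += 1
            self.cache.move_to_end(key)  # mark as most recently used
            return self.cache[key]
        self.misses += 1
        data = self.storage.get(key)
        self.cache[key] = data
        if len(self.cache) > self.cache_slots:
            self.cache.popitem(last=False)  # evict least recently used
        return data


if __name__ == "__main__":
    store = SharedStorage({"seg1": b"a", "seg2": b"b", "seg3": b"c"})
    etl = Warehouse("etl", store, cache_slots=2)
    adhoc = Warehouse("adhoc", store, cache_slots=2)
    for key in ["seg1", "seg2", "seg1", "seg3", "seg2"]:
        etl.read(key)
    adhoc.read("seg1")  # separate cache, so this is a miss for "adhoc"
    print(etl.hits, etl.misses, adhoc.misses, store.reads)  # 1 4 1 5
```

Because each warehouse has its own compute and cache, a busy ETL warehouse cannot evict the hot data of an ad hoc warehouse, which is one kind of isolation multi-warehouse setups aim for.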

EMR Platform Upgrades

EMR on ACS now integrates more tightly with ACS, adding resource queues, quota management, job monitoring, diagnostics, and support for multiple compute engines. EMR on ECS gains automatic elastic scaling and intelligent diagnostics.
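Resource queues with quotas cap how much compute each team or workload may hold at once and make excess jobs wait rather than fail. The Python sketch below shows one simple policy, strict first-in-first-out admission against a compute-unit quota. It is a hypothetical model: the announcement does not describe EMR's scheduling policy, and the class and field names are invented.

```python
"""Hypothetical queue-with-quota admission, illustrating the kind of control
resource queues and quota management provide. Not EMR's API.
"""
from collections import deque
from dataclasses import dataclass, field
from typing import Deque, List


@dataclass
class Job:
    name: str
    cu: int  # compute units requested (hypothetical unit)


@dataclass
class ResourceQueue:
    name: str
    quota_cu: int  # maximum CU this queue may hold at once
    used_cu: int = 0
    running: List[Job] = field(default_factory=list)
    waiting: Deque[Job] = field(default_factory=deque)

    def submit(self, job: Job) -> None:
        # Reject jobs that could never fit, instead of letting them wait forever.
        if job.cu > self.quota_cu:
            raise ValueError(f"{job.name} can never fit in queue {self.name}")
        self.waiting.append(job)
        self._admit()

    def finish(self, job_name: str) -> None:
        # Release the finished job's CU, then try to admit waiting jobs.
        for job in self.running:
            if job.name == job_name:
                self.running.remove(job)
                self.used_cu -= job.cu
                break
        self._admit()

    def _admit(self) -> None:
        # Strict FIFO: start waiting jobs while the head job fits in the quota.
        while self.waiting and self.used_cu + self.waiting[0].cu <= self.quota_cu:
            job = self.waiting.popleft()
            self.running.append(job)
            self.used_cu += job.cu


if __name__ == "__main__":
    q = ResourceQueue("etl", quota_cu=10)
    for job in (Job("a", 6), Job("b", 6), Job("c", 3)):
        q.submit(job)
    print([j.name for j in q.running], [j.name for j in q.waiting])  # ['a'] ['b', 'c']
    q.finish("a")
    print([j.name for j in q.running], q.used_cu)  # ['b', 'c'] 9
```

Strict FIFO keeps ordering predictable but can leave a small job (here "c") waiting behind a larger one; schedulers often relax this with priorities or backfilling.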

Lakehouse Architecture & Apache Paimon

The upgraded lakehouse architecture uses Apache Paimon as a high-performance, highly scalable storage layer for real-time streaming, accelerated OLAP queries on lake data, and unstructured data processing.

Since it was created in the Flink community in 2022, Paimon has been adopted by many companies to build lakehouses that are more real-time, open, and cost-effective. It is also a core component of Alibaba Cloud OpenLake, which unifies big data, search, and AI workloads.
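Paimon primary-key tables decide what happens when several writes hit the same key through a configurable merge engine (the `merge-engine` table option). The sketch below models, in plain Python, the visible result of two engines Paimon documents: `deduplicate` (the default, last record wins) and `partial-update` (non-null fields overwrite, nulls leave the old value). It ignores Paimon's LSM-tree files and compaction entirely, and the column names are hypothetical.

```python
"""Semantics sketch of two Apache Paimon primary-key merge engines.
Models only the merged result, not Paimon's storage or compaction.
Column names are hypothetical.
"""
from typing import Dict, Iterable, Optional

Record = Dict[str, Optional[object]]


def merge(records: Iterable[Record], key: str, engine: str) -> Dict[object, Record]:
    if engine not in ("deduplicate", "partial-update"):
        raise ValueError(f"unsupported engine: {engine}")
    table: Dict[object, Record] = {}
    for rec in records:  # records arrive in commit (sequence) order
        k = rec[key]
        if engine == "deduplicate" or k not in table:
            # deduplicate: the newest record replaces the old one wholesale.
            table[k] = dict(rec)
        else:
            # partial-update: only non-null fields overwrite existing values.
            current = table[k]
            for col, val in rec.items():
                if val is not None:
                    current[col] = val
    return table


if __name__ == "__main__":
    updates = [
        {"id": 1, "title": "Chapter 1", "reads": None},
        {"id": 1, "title": None, "reads": 42},
    ]
    print(merge(updates, "id", "deduplicate"))     # {1: {'id': 1, 'title': None, 'reads': 42}}
    print(merge(updates, "id", "partial-update"))  # {1: {'id': 1, 'title': 'Chapter 1', 'reads': 42}}
```

Partial updates let separate streams fill different columns of the same row without a join at write time, which is one reason this engine is popular for real-time wide tables.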

Qimao Free Novel Data Warehouse Practice

The Qimao Free Novel case study covered three areas of its data warehouse work:

Compute-storage separation: an architecture upgrade for greater flexibility and scalability.

Metadata and data lineage: building robust data tracking and management.

Data governance: establishing standardized processes.
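Table-level lineage is essentially a directed graph from source tables to the tables derived from them, and one of its main uses is impact analysis: when an upstream table changes, find everything downstream that may need to be rebuilt or checked. The sketch below models that with a breadth-first traversal. It is a hypothetical illustration, not Qimao's system, and the table names (following a common ODS/DWD/DWS/ADS layering) are invented.

```python
"""Hypothetical table-level lineage graph with impact analysis.
Table names are invented for illustration.
"""
from collections import defaultdict, deque
from typing import Dict, List, Set


class Lineage:
    def __init__(self) -> None:
        # Edge source -> target means target is derived from source.
        self.downstream: Dict[str, Set[str]] = defaultdict(set)

    def add_job(self, inputs: List[str], output: str) -> None:
        # Record one ETL job that reads `inputs` and writes `output`.
        for src in inputs:
            self.downstream[src].add(output)

    def impacted_by(self, table: str) -> List[str]:
        # Breadth-first traversal over downstream edges; `seen` guards cycles.
        seen: Set[str] = set()
        order: List[str] = []
        queue = deque([table])
        while queue:
            current = queue.popleft()
            for nxt in sorted(self.downstream.get(current, ())):
                if nxt not in seen:
                    seen.add(nxt)
                    order.append(nxt)
                    queue.append(nxt)
        return order


if __name__ == "__main__":
    lin = Lineage()
    lin.add_job(["ods_reading_log"], "dwd_reading_event")
    lin.add_job(["ods_user"], "dwd_user")
    lin.add_job(["dwd_reading_event", "dwd_user"], "dws_user_daily_reads")
    lin.add_job(["dws_user_daily_reads"], "ads_retention_report")
    print(lin.impacted_by("ods_user"))
    # ['dwd_user', 'dws_user_daily_reads', 'ads_retention_report']
```

Because traversal follows levels, tables closer to the change appear first, which is a natural order for rebuilding them.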

Upcoming Event: Flink Forward Asia 2024

The Flink Forward Asia 2024 conference will be held in Shanghai on November 29‑30, offering a platform to learn about the latest Flink developments, share production experiences, and network with industry leaders. Early‑bird registration provides discounts and exclusive merchandise.

Register at https://asia.flink-forward.org/shanghai-2024/

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact admin@besthub.dev and we will review it promptly.

Written by

Alibaba Cloud Big Data AI Platform

The Alibaba Cloud Big Data AI Platform builds on Alibaba’s leading cloud infrastructure, big‑data and AI engineering capabilities, scenario algorithms, and extensive industry experience to offer enterprises and developers a one‑stop, cloud‑native big‑data and AI capability suite. It boosts AI development efficiency, enables large‑scale AI deployment across industries, and drives business value.
