
How Auron’s Vectorized Engine Doubles Big Data Performance Over Spark

The Auron project, a native vectorized execution engine donated by Kuaishou and now incubating at the Apache Software Foundation, uses Rust and SIMD to cut resource overhead, delivers more than a two‑fold speedup over Spark on TPC‑DS benchmarks, and integrates seamlessly with Spark and other big‑data ecosystems.

DataFunTalk

Auron Introduction

Recently, the vectorized engine Auron (formerly Blaze), open‑sourced and donated by Kuaishou, entered the Apache Software Foundation’s incubation program. This milestone places the project under ASF’s mature open‑source governance and gives it a foundation for sustained development.

Auron is a native execution engine built on vectorization: it executes queries as native code and uses SIMD instructions to reduce resource consumption and accelerate execution.

Core capabilities include:

Native execution implemented in Rust, eliminating JVM overhead for better performance.

Vectorized computation built on the Apache Arrow columnar format, using SIMD to process data in batches.

Pluggable architecture that integrates seamlessly with Apache Spark and can be extended to other engines.

Production‑grade optimizations such as multi‑level memory management, an optimized shuffle format, and adaptive execution strategies.
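To make the idea of vectorized computation over a columnar format more concrete, here is a minimal Rust sketch under stated assumptions. It is not Auron code: plain `Vec<i64>` columns stand in for Arrow arrays (no `arrow` crate, no null bitmaps), and every type and function name (`Row`, `ColumnBatch`, `revenue_rows`, `revenue_vectorized`) is hypothetical. It evaluates the same query, `SUM(price * quantity) WHERE quantity >= min_qty`, first row at a time and then as column‑wide kernels.

```rust
// Hypothetical illustration only; these names are invented and are not Auron's API.

/// Row-oriented layout: each row's fields sit next to each other in memory.
struct Row {
    price_cents: i64,
    quantity: i64,
}

/// Column-oriented batch: each column is one contiguous buffer,
/// loosely mirroring Arrow's layout (validity bitmaps omitted).
struct ColumnBatch {
    price_cents: Vec<i64>,
    quantity: Vec<i64>,
}

impl ColumnBatch {
    /// Transpose rows into columns (a columnar reader would produce this directly).
    fn from_rows(rows: &[Row]) -> Self {
        ColumnBatch {
            price_cents: rows.iter().map(|r| r.price_cents).collect(),
            quantity: rows.iter().map(|r| r.quantity).collect(),
        }
    }
}

/// Row-at-a-time evaluation with a data-dependent branch per row.
fn revenue_rows(rows: &[Row], min_qty: i64) -> i64 {
    let mut total = 0;
    for r in rows {
        if r.quantity >= min_qty {
            total += r.price_cents * r.quantity;
        }
    }
    total
}

/// Vectorized evaluation: the same query as two kernels over whole columns.
fn revenue_vectorized(batch: &ColumnBatch, min_qty: i64) -> i64 {
    // Filter kernel: compare the whole quantity column against a constant,
    // producing a 0/1 selection vector instead of branching per row.
    let selection: Vec<i64> = batch
        .quantity
        .iter()
        .map(|&q| (q >= min_qty) as i64)
        .collect();
    // Project-and-aggregate kernel: branch-free multiply-add over aligned
    // integer slices, a loop shape optimizing compilers can often emit as SIMD.
    batch
        .price_cents
        .iter()
        .zip(&batch.quantity)
        .zip(&selection)
        .map(|((&p, &q), &s)| p * q * s)
        .sum()
}

fn main() {
    let rows = vec![
        Row { price_cents: 250, quantity: 4 },
        Row { price_cents: 1000, quantity: 1 },
        Row { price_cents: 75, quantity: 10 },
    ];
    let batch = ColumnBatch::from_rows(&rows);
    // Rows with quantity >= 2 contribute 250*4 + 75*10 = 1750; both print 1750.
    println!("{}", revenue_rows(&rows, 2));
    println!("{}", revenue_vectorized(&batch, 2));
}
```

Both functions return the same answer. The point of the sketch is the loop shape: contiguous columns plus a selection vector replace per‑row branching, which gives the compiler (or hand‑written SIMD) room to process several values per instruction. Integer columns are used deliberately, because compilers generally will not reorder a floating‑point sum into SIMD lanes without explicit permission.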

On TPC‑DS benchmarks, Auron delivers more than a 2× performance improvement over Spark.

Auron Development History and Current Status

In January 2022, Kuaishou’s big‑data Spark engine team launched the Blaze project and open‑sourced all of its code on GitHub. After more than a year of iteration, by September 2023 Blaze had achieved significant performance gains on the TPC‑H and TPC‑DS benchmarks. The team also carried out extensive optimizations for production use; the engine now runs tens of thousands of tasks daily over exabyte‑scale data, saving millions of dollars in server costs.

Since January 2024, the project has actively built its community: it has shipped more than ten releases, earned over 1.5K GitHub stars, and drawn contributions from more than 30 developers worldwide. Companies such as Didi, Ctrip, Autohome, 58.com, and OPPO have adopted Blaze and praised its performance, stability, and ease of use.

In August 2025, the project entered the ASF incubator and was renamed Auron (pronounced [ˈɔːrɑːn]), a name inspired by “Aura” and meant to evoke the powerful performance a big‑data engine can deliver. The future roadmap includes support for Flink, data‑lake systems, and GPU/DPU integration.

Joining the Apache Incubator

Joining the Apache Incubator reflects our commitment to open‑source sustainability. We will adhere to Apache’s governance model, ensure full transparency of code, documentation, and community processes, and invite more developers to contribute. By aligning with other Apache big‑data projects such as Spark, Flink, and Celeborn, Auron aims to complement and advance the ecosystem.

Acknowledgements

Thanks to all Auron community contributors, upstream project contributors, and especially champion Calvin Kirs and mentors Xuanwo, Becket Qin, and Nicholas Jiang for their support during the incubation process.

Get Involved

We invite developers and users interested in Auron to join the open‑source community.

GitHub repository: https://github.com/apache/auron/

Official website: https://auron.apache.org/

Mailing list: [email protected]

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact admin@besthub.dev and we will review it promptly.
