Inside Alibaba Cloud’s MRACC Engine: How It Won the TPCx‑BB Benchmark
Alibaba Cloud’s self‑developed MRACC (Apsara Compute MapReduce Accelerator) combined hardware‑software integration, Spark and Hadoop optimizations, and eRDMA networking to take the top TPCx‑BB SF3000 result. It runs complex SQL queries 2‑3× faster than community Spark, gains a further 30% from eRDMA‑accelerated Spark shuffle, and delivers significant cost‑efficiency gains.
Background
Recently, the Transaction Processing Performance Council (TPC) published its latest TPCx‑BB (BigBench) world ranking, and Alibaba Cloud’s self‑developed Shenlong Big Data Acceleration Engine took the #1 position in the SF3000 category.
TPCx‑BB evaluates both performance and cost‑effectiveness. On performance, Alibaba Cloud scored 2187.42 BBQpm, 41.6% ahead of the runner‑up. On cost‑effectiveness, it led by 40%, with a price/performance of 346.53 USD/BBQpm (lower is better).
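TPC reports price/performance as the total priced‑system cost divided by the performance score, so the two published figures together imply the cost of the benchmarked configuration. The Python sketch below does that arithmetic; the two constants are the only values taken from the article, and reading "41.6% ahead" as a throughput ratio of 1.416 is an assumption, not a published runner‑up figure.

```python
# Back-of-envelope arithmetic for TPCx-BB's two headline metrics.
# Assumption: price/performance = total priced-system cost / BBQpm score.

BBQPM = 2187.42           # published performance score (higher is better)
PRICE_PER_BBQPM = 346.53  # published USD per BBQpm (lower is better)


def implied_system_cost(bbqpm: float, usd_per_bbqpm: float) -> float:
    """Total priced-system cost implied by the score and its price/performance."""
    return bbqpm * usd_per_bbqpm


def percent_faster(ours: float, theirs: float) -> float:
    """Relative lead, in percent, for a higher-is-better metric such as BBQpm."""
    return (ours / theirs - 1.0) * 100.0


def percent_cheaper(ours: float, theirs: float) -> float:
    """Relative lead, in percent, for a lower-is-better metric such as USD/BBQpm."""
    return (1.0 - ours / theirs) * 100.0


if __name__ == "__main__":
    cost = implied_system_cost(BBQPM, PRICE_PER_BBQPM)
    print(f"implied priced-system cost: ${cost:,.0f}")
    # Assumption: "41.6% ahead" means the runner-up scored BBQPM / 1.416.
    runner_up = BBQPM / 1.416
    print(f"implied runner-up score: {runner_up:.1f} BBQpm")
    print(f"lead check: {percent_faster(BBQPM, runner_up):.1f}%")
```

Note that the two percentages use different formulas because one metric is higher‑is‑better and the other lower‑is‑better; a "40% lead" on price/performance can be read either way, which is why the sketch does not back out the runner‑up's cost.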
MRACC Overview
The Shenlong Big Data Acceleration Engine, MRACC (Apsara Compute MapReduce Accelerator), is the key technology behind this result.
To keep up with surging data‑processing demands, many enterprises build their own clusters on open‑source Spark and Hadoop or on distributions such as HDP and CDH, handling workloads from terabytes to petabytes across dozens to thousands of nodes. MRACC, built on the Shenlong platform, accelerates common components such as Spark, Hadoop, and Alluxio in these self‑built deployments.
By tightly integrating hardware and software, MRACC delivers unique performance advantages: complex SQL queries run 2‑3× faster than community Spark, and eRDMA‑accelerated Spark gains an additional 30% speed boost. Customers running big‑data clusters on Alibaba Cloud ECS benefit from higher performance and better cost efficiency.
MRACC‑Spark Introduction
Since its debut in 2010, Spark has become the de‑facto engine for big‑data batch processing. MRACC focuses on optimizing Spark for I/O‑heavy workloads by accelerating both the network and storage layers. Techniques include SQL engine optimizations, caching, file pruning, indexing, offloading compression to heterogeneous devices, and using eRDMA to accelerate network transfer during the shuffle phase, which reduces latency and lowers CPU overhead.
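File pruning and indexing save I/O by skipping data that cannot match a query. A common mechanism is min/max statistics: Parquet, for example, records per‑column minimum and maximum values for each row group. The article does not describe MRACC's implementation, so the Python sketch below is only a generic illustration; `ChunkStats` and `prune_chunks` are hypothetical names.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class ChunkStats:
    """Min/max statistics for one file or row group (hypothetical structure)."""
    name: str
    min_value: int
    max_value: int


def prune_chunks(chunks: List[ChunkStats], lo: int, hi: int) -> List[ChunkStats]:
    """Return the chunks that may contain rows satisfying lo <= x <= hi.

    A chunk is skipped only when its [min, max] range cannot overlap the
    predicate range, so pruning never drops a matching row.
    """
    return [c for c in chunks if c.max_value >= lo and c.min_value <= hi]


if __name__ == "__main__":
    chunks = [
        ChunkStats("part-0", 1, 100),
        ChunkStats("part-1", 101, 200),
        ChunkStats("part-2", 201, 300),
    ]
    # Predicate: WHERE order_id BETWEEN 150 AND 250
    kept = prune_chunks(chunks, 150, 250)
    print([c.name for c in kept])  # -> ['part-1', 'part-2']
```

Pruning is conservative by design: statistics can only prove a chunk irrelevant, so the reader still filters the rows inside the chunks it keeps.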
Spark SQL Engine Optimizations
From Spark 2 onward, Spark SQL, DataFrames, and Datasets have become the primary programming interfaces, and nearly half of the improvements in Spark 3.0 targeted the SQL engine. MRACC enhances the analyzer, optimizer, planner, and execution stages.
Key enhancements include dynamic sub‑query data pruning (going beyond partition pruning), window top‑N sorting, Parquet row‑group pruning, bloom‑filter joins, a genetic algorithm for join‑order selection, push‑down deduplication, foreign‑key elimination, integrity‑constraint enforcement, and Delta Lake‑compatible DML support.
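A bloom‑filter join builds a compact Bloom filter over the join keys of the smaller (build) side and uses it to drop probe‑side rows whose keys are definitely absent, before they are shuffled and joined. The article does not describe how MRACC implements this, so the Python sketch below is a generic, single‑process illustration; `BloomFilter` and `bloom_filtered_join` are hypothetical names, and the double‑hashing scheme over SHA‑256 is one common choice, not necessarily MRACC's.

```python
import hashlib
from typing import Dict, Iterable, Iterator, List, Tuple


class BloomFilter:
    """Minimal Bloom filter: k bit positions per key in an m-bit array."""

    def __init__(self, num_bits: int, num_hashes: int) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, key: str) -> Iterator[int]:
        # Derive k positions from one SHA-256 digest via double hashing.
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "little")
        h2 = int.from_bytes(digest[8:16], "little") | 1
        for i in range(self.num_hashes):
            yield (h1 + i * h2) % self.num_bits

    def add(self, key: str) -> None:
        for pos in self._positions(key):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, key: str) -> bool:
        # False means "definitely absent"; True may be a false positive.
        return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(key))


def bloom_filtered_join(
    build: Iterable[Tuple[str, object]],
    probe: Iterable[Tuple[str, object]],
    num_bits: int = 1024,
    num_hashes: int = 3,
) -> Tuple[List[Tuple[str, object, object]], int]:
    """Inner join on key; returns joined rows and how many probe rows passed the filter."""
    bloom = BloomFilter(num_bits, num_hashes)
    table: Dict[str, List[object]] = {}
    for key, value in build:
        bloom.add(key)
        table.setdefault(key, []).append(value)
    # In a distributed engine, this filter step would run before the shuffle.
    survivors = [(k, v) for k, v in probe if bloom.might_contain(k)]
    joined = [(k, pv, bv) for k, pv in survivors for bv in table.get(k, [])]
    return joined, len(survivors)


if __name__ == "__main__":
    customers = [("c1", "gold"), ("c3", "silver")]             # small build side
    sales = [("c1", 10), ("c2", 20), ("c3", 30), ("c4", 40)]   # large probe side
    rows, kept = bloom_filtered_join(customers, sales)
    print(rows)  # -> [('c1', 10, 'gold'), ('c3', 30, 'silver')]
    print(f"{kept} of {len(sales)} probe rows passed the filter")
```

Because a Bloom filter has no false negatives, the exact hash lookup after it still produces correct results; false positives only cost some wasted work, never wrong rows.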
Near‑Network RDMA Optimization
At the 2021 Cloud Expo in Hangzhou, Alibaba Cloud unveiled the fourth‑generation Shenlong architecture, introducing what it described as the industry’s first large‑scale elastic RDMA (eRDMA) acceleration. RDMA (remote direct memory access) lets the network card read and write remote memory directly, bypassing the kernel to reduce CPU overhead and deliver low‑latency, high‑throughput networking. MRACC uses eRDMA to turn Spark’s shuffle data exchange into a memory‑network‑memory path, yielding a 30% performance gain on end‑to‑end benchmarks such as TPCx‑HS.
Performance Results
On the TPC‑DS 10 TB dataset, MRACC delivers a 2.19× speedup over Spark 3.1, the latest release at the time. In the TPCx‑BB benchmark, it outperforms the second‑place result by 41.6%.
Outlook
All optimizations are packaged as plugins, so customers need only minimal code changes. Future work will continue to iterate on hardware‑software co‑design to deliver higher‑performance, lower‑cost big‑data acceleration to Alibaba Cloud users.
Appendix: TPCx‑BB Overview
TPCx‑BB, published by the Transaction Processing Performance Council (TPC), is an end‑to‑end big‑data benchmark based on retail scenarios. It supports major distributed big‑data engines, simulating online and offline business processes with 30 queries covering descriptive, data‑mining, and machine‑learning workloads. The benchmark’s large data volume, complex characteristics, and realistic workload make it a valuable reference for infrastructure selection.
The results reflect overall system performance, covering structured, semi‑structured, and unstructured data, and enable comprehensive evaluation of hardware, software, cost‑effectiveness, service quality, and power consumption from a customer’s perspective.
Alibaba Cloud Developer
Alibaba's official tech channel, featuring all of its technology innovations.
