How AliORC Supercharges MaxCompute: Inside the Next‑Gen Columnar Format
This article explains how Alibaba's MaxCompute platform evolved its storage engine from the row‑based CFile format to the columnar AliORC format. It describes AliORC's technical innovations, including async prefetch, small I/O elimination, adaptive dictionary encoding, and range‑aligned reads, and compares ORC with Parquet and AliORC with the open‑source ORC implementations.
Introduction
MaxCompute is Alibaba's EB‑scale compute platform. It has become the core of Alibaba Group's data middle platform and the foundation of Alibaba Cloud's big‑data services. Its storage engine builds on the open‑source Apache ORC file format, which the team has extended into a new columnar format called AliORC.
Background of MaxCompute Storage Engine
MaxCompute provides a secure, high‑performance, low‑cost online big‑data service that scales from GB to EB. It stores 99% of Alibaba Group's data. The storage engine sits between MaxCompute tasks and the underlying Pangu distributed file system, offering a unified logical data model.
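The core of the storage layer is the file format. Row‑based storage writes the fields of each row together, one row after another, while columnar storage keeps all values of a column together. A query can then read only the columns it needs, and because each column holds values of a single type, it compresses better, which further reduces I/O.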
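To make the difference concrete, here is a minimal Python sketch; the helper names `to_columnar` and `project` are hypothetical and not MaxCompute APIs. It transposes row‑major records into per‑column lists and then projects a single column, which is the access pattern that lets a columnar reader skip the bytes of every other column.

```python
from typing import Dict, List, Tuple

Row = Tuple[int, str, float]

def to_columnar(rows: List[Row], names: List[str]) -> Dict[str, list]:
    """Transpose row-major records into one list per column (column-major)."""
    return {name: [row[i] for row in rows] for i, name in enumerate(names)}

def project(columns: Dict[str, list], wanted: List[str]) -> Dict[str, list]:
    """Columnar projection: only the requested columns are touched at all."""
    return {name: columns[name] for name in wanted}

rows: List[Row] = [(1, "hz", 3.5), (2, "bj", 4.0), (3, "hz", 2.5)]
names = ["id", "city", "fare"]
cols = to_columnar(rows, names)
print(project(cols, ["fare"]))  # {'fare': [3.5, 4.0, 2.5]}
```

In a row layout, reading `fare` alone would still walk every row tuple; in the columnar layout the `id` and `city` lists are never accessed.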
MaxCompute’s format evolution: CFile1 (row‑based) → CFile2 (first columnar) → AliORC (next‑gen columnar).
What Is Apache ORC?
Apache ORC is a fast, compact columnar storage format for the Hadoop ecosystem, supporting ACID transactions, lightweight indexes, and complex types. It is used by Spark, Presto, Hive, Impala, Arrow, and others.
Alibaba’s Contributions to the ORC Community
Alibaba’s MaxCompute team has contributed more than 30 patches (about 15,000 lines of code) to the Apache ORC project, including a complete C++ ORC writer, bug fixes, and performance optimizations. ORC's founder highlighted Alibaba’s contributions at the 2017 Hadoop Summit.
Open‑Source ORC File Format Overview
ORC models a schema as a type tree: a struct has one child per field, a map has a key child and a value child, a list has a single element child, and primitive types are leaf nodes.
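The ORC specification flattens this tree into column ids with a pre‑order traversal, so the root struct is column 0 and each node takes the next id. The sketch below reproduces that numbering for a small schema; `TypeNode` and `assign_column_ids` are illustrative names, not part of the ORC libraries.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

@dataclass
class TypeNode:
    kind: str                                    # "struct", "map", "list", or a primitive such as "int"
    name: str = ""                               # field/role name, used for display only
    children: List["TypeNode"] = field(default_factory=list)

def assign_column_ids(root: TypeNode) -> List[Tuple[int, str, str]]:
    """Number every node of the type tree in pre-order; the root struct gets id 0."""
    out: List[Tuple[int, str, str]] = []

    def visit(node: TypeNode) -> None:
        out.append((len(out), node.name or "<root>", node.kind))
        # struct: one child per field; map: key then value; list: one element child
        for child in node.children:
            visit(child)

    visit(root)
    return out

# struct<a:int, b:map<string,double>, c:array<string>>
schema = TypeNode("struct", children=[
    TypeNode("int", "a"),
    TypeNode("map", "b", [TypeNode("string", "key"), TypeNode("double", "value")]),
    TypeNode("list", "c", [TypeNode("string", "element")]),
])
ids = assign_column_ids(schema)
for col_id, name, kind in ids:
    print(col_id, name, kind)
```

The map's key and value, and the list's element, each get their own column id, which is why a nested schema produces more columns than it has top‑level fields.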
Why Choose ORC?
When selecting a next‑generation file format, the team favored formats from the open‑source community. Apache Parquet, co‑developed by Cloudera and Twitter, shares many design goals with ORC, such as columnar storage and compression, although Parquet offers better support for nested types and a broader set of encodings.
ORC vs. Parquet Performance
Benchmarks presented at the Hadoop Summit, using GitHub logs and NYC taxi data, showed that ORC and Parquet achieve similar storage efficiency, while ORC generally reads faster at comparable compression levels.
AliORC Technical Deep Dive
Async Prefetch
A traditional reader issues one I/O request, waits for it, decompresses the data, and only then issues the next request, so the latency of every I/O adds up. AliORC issues all read requests asynchronously up front, letting I/O and decompression overlap; in benchmarks this reduced read time from 14 s to 3 s.
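The overlap can be illustrated with a small Python simulation. It assumes a fixed per‑read latency (modeled with `time.sleep`) and zlib‑compressed streams; `slow_read`, `read_sequential`, and `read_prefetch` are hypothetical helpers, and a thread pool stands in for whatever asynchronous I/O mechanism AliORC actually uses.

```python
import time
import zlib
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, List

def slow_read(store: Dict[str, bytes], key: str, latency_s: float) -> bytes:
    """Simulated remote read: wait for the I/O latency, then return the compressed blob."""
    time.sleep(latency_s)
    return store[key]

def read_sequential(store: Dict[str, bytes], keys: List[str], latency_s: float) -> List[bytes]:
    # Baseline: read one stream, decompress it, and only then issue the next read.
    return [zlib.decompress(slow_read(store, k, latency_s)) for k in keys]

def read_prefetch(store: Dict[str, bytes], keys: List[str], latency_s: float,
                  workers: int = 8) -> List[bytes]:
    # Prefetch: submit every read up front, then decompress results in order
    # while the later reads are still in flight.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(slow_read, store, k, latency_s) for k in keys]
        return [zlib.decompress(f.result()) for f in futures]

store = {f"col{i}": zlib.compress(bytes([i]) * 100_000) for i in range(8)}
keys = sorted(store)

t0 = time.perf_counter()
seq = read_sequential(store, keys, 0.02)
t1 = time.perf_counter()
pre = read_prefetch(store, keys, 0.02)
t2 = time.perf_counter()
print(f"sequential {t1 - t0:.3f}s, prefetch {t2 - t1:.3f}s, same data: {seq == pre}")
```

Exact timings depend on the machine, but the sequential version pays the latency once per stream, while the prefetching version pays it roughly once in total.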
Eliminating Small I/O
During writes, AliORC sorts compressed column blocks by size so that small columns are stored next to each other. A reader can then fetch all of the small columns with one large I/O, which reduced the number of I/O requests smaller than 64 KB to zero.
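A minimal sketch of the layout idea, assuming column sizes are known at write time: columns are laid out smallest‑first, so every column under 64 KB ends up adjacent to the others and can be covered by one coalesced read. The function name `plan_reads` and the 1 MB cap on a single coalesced read are assumptions for illustration.

```python
from typing import List, Tuple

SMALL_IO = 64 * 1024               # reads below this size count as "small I/O"

Read = Tuple[int, int, List[str]]  # (file offset, length, columns covered)

def plan_reads(col_sizes: List[Tuple[str, int]], max_group: int = 1024 * 1024) -> List[Read]:
    """Writer lays column blocks out smallest-first; reader coalesces the small ones.

    max_group (1 MB by default) is an assumed cap on one coalesced read.
    """
    layout = sorted(col_sizes, key=lambda c: c[1])    # writer side: sort by compressed size
    reads: List[Read] = []
    group: List[str] = []
    group_start = group_len = offset = 0
    for name, size in layout:
        if size < SMALL_IO:
            if group and group_len + size > max_group:  # cap reached: flush current group
                reads.append((group_start, group_len, group))
                group, group_len = [], 0
            if not group:
                group_start = offset
            group.append(name)
            group_len += size
        else:
            if group:                                   # the run of small columns ends here
                reads.append((group_start, group_len, group))
                group, group_len = [], 0
            reads.append((offset, size, [name]))        # large column: its own read
        offset += size
    if group:
        reads.append((group_start, group_len, group))
    return reads

cols = [("a", 200_000), ("b", 3_000), ("c", 500_000), ("d", 10_000), ("e", 40_000)]
for r in plan_reads(cols):
    print(r)
```

For these five columns the plan has three reads instead of five, and the small columns `b`, `d`, and `e` come back in a single 53,000‑byte request.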
Memory Management
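Rather than reserving a fixed 1 MB buffer per column, which must be reallocated and copied (an O(N) operation) whenever it grows, AliORC starts each column with a 64 KB block and allocates further 64 KB blocks on demand. This avoids the copies and lowers peak memory usage, especially for dynamic‑partition writes, where many partitions, and therefore many column buffers, are open at the same time.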
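The block‑chaining idea can be sketched as follows. `BlockBuffer` is a hypothetical class, and only the 64 KB block size comes from the description above; note that a column holding three bytes occupies one 64 KB block rather than a fixed 1 MB.

```python
from typing import List

BLOCK = 64 * 1024   # growth unit: one 64 KB block at a time

class BlockBuffer:
    """Append-only column buffer built from fixed 64 KB blocks.

    Growing allocates one more block and never moves bytes already written,
    unlike a contiguous buffer that must reallocate and copy (O(N)) when full.
    """

    def __init__(self) -> None:
        self.blocks: List[bytearray] = []
        self.fill = 0        # bytes used in the last block
        self.size = 0        # total bytes written

    def write(self, data: bytes) -> None:
        view = memoryview(data)
        pos = 0
        while pos < len(view):
            if not self.blocks or self.fill == BLOCK:
                self.blocks.append(bytearray(BLOCK))   # allocate on demand
                self.fill = 0
            take = min(BLOCK - self.fill, len(view) - pos)
            # Same-length slice assignment: copies only the new bytes into place.
            self.blocks[-1][self.fill:self.fill + take] = view[pos:pos + take]
            self.fill += take
            pos += take
        self.size += len(view)

    def allocated(self) -> int:
        return len(self.blocks) * BLOCK

    def getvalue(self) -> bytes:
        if not self.blocks:
            return b""
        return b"".join(self.blocks[:-1]) + bytes(self.blocks[-1][:self.fill])

small = BlockBuffer()
small.write(b"abc")
big = BlockBuffer()
big.write(b"x" * 100_000)
print(small.allocated(), big.allocated())  # 65536 131072
```

Memory grows in proportion to the data actually written, which is what keeps many simultaneously open column writers affordable.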
Seek Optimization
AliORC aligns compression block boundaries with Row Group boundaries, ensuring each Row Group resides in its own block. This avoids unnecessary decompression when seeking to a specific Row Group, cutting seek‑related I/O and CPU cost by about fivefold.
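The saving can be estimated with simple arithmetic. The sketch assumes uncompressed row‑group sizes and, in the unaligned case, compression blocks cut every `block_size` bytes of the whole stream; the 256 KB block size matches Apache ORC's default compression chunk size but is otherwise just an example, and `seek_cost` is a hypothetical helper.

```python
from typing import List, Tuple

def seek_cost(rg_sizes: List[int], k: int, block_size: int, aligned: bool) -> Tuple[int, int]:
    """(compression blocks decompressed, bytes decoded then discarded) to read row group k.

    rg_sizes are uncompressed row-group sizes in bytes. Unaligned: blocks are cut every
    block_size bytes of the whole stream, so block boundaries fall inside row groups.
    Aligned: each row group starts a fresh block, so nothing outside it is decoded.
    """
    size = rg_sizes[k]
    if aligned:
        return -(-size // block_size), 0          # ceil(size / block_size) blocks, no waste
    start = sum(rg_sizes[:k])
    end = start + size
    first = start // block_size
    last = (end - 1) // block_size
    n_blocks = last - first + 1
    # The final block of the stream may be shorter than block_size.
    decoded = min(n_blocks * block_size, sum(rg_sizes) - first * block_size)
    return n_blocks, decoded - size

rgs = [100_000] * 4             # four row groups of 100 KB uncompressed
CHUNK = 256 * 1024              # assumed compression block size
print(seek_cost(rgs, 2, CHUNK, aligned=False))  # (2, 300000)
print(seek_cost(rgs, 2, CHUNK, aligned=True))   # (1, 0)
```

Without alignment, reading row group 2 decodes two blocks and discards 300,000 bytes belonging to neighboring row groups; with alignment the reader decodes exactly one block and nothing else.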
Adaptive Dictionary Encoding
AliORC decides early whether a column should use dictionary encoding, avoiding costly fallback rewrites. It also replaces std::unordered_map with Google's dense_hash_map, gaining ~10% write performance, and removes unnecessary dictionary sorting for a further 3% boost.
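The source does not spell out AliORC's exact heuristic, so the following is only a sketch under stated assumptions: the writer samples a prefix of the column, commits to dictionary encoding when the distinct‑value ratio is at most 0.8 (a value borrowed from the default of Apache ORC's `orc.dictionary.key.threshold` setting), and otherwise uses direct encoding from the start. Python's built‑in `dict` plays the role of the hash map, and dictionary ids follow first‑seen order so no sort is needed. `choose_encoding`, `dictionary_encode`, and the 10,000‑row sample size are hypothetical.

```python
from itertools import islice
from typing import Dict, Iterable, List, Tuple

def choose_encoding(values: Iterable[str], sample_rows: int = 10_000,
                    threshold: float = 0.8) -> str:
    """Decide once, from a prefix sample, between DICTIONARY and DIRECT encoding.

    A high distinct ratio means a dictionary would hardly shrink the column, so the
    writer commits to DIRECT up front instead of building a dictionary and later
    falling back (re-encoding everything buffered so far).
    """
    sample = list(islice(values, sample_rows))
    if not sample:
        return "DIRECT"
    ratio = len(set(sample)) / len(sample)
    return "DICTIONARY" if ratio <= threshold else "DIRECT"

def dictionary_encode(values: Iterable[str]) -> Tuple[List[str], List[int]]:
    """Hash-based dictionary with ids in first-seen order; the dictionary is not sorted."""
    ids: Dict[str, int] = {}
    codes = [ids.setdefault(v, len(ids)) for v in values]
    return list(ids), codes

print(choose_encoding(["hz", "bj"] * 5_000))              # DICTIONARY
print(choose_encoding(str(i) for i in range(20_000)))     # DIRECT
print(dictionary_encode(["b", "a", "b", "c"]))            # (['b', 'a', 'c'], [0, 1, 0, 2])
```

Skipping the sort is safe here because readers only need the id‑to‑value mapping, not a sorted dictionary.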
Range‑Aligned Reads for Range Partitions
For range‑partitioned tables, AliORC stores each bucket in a separate AliORC file with a B+Tree‑like index. During joins, range alignment allows workers to read only the intersecting buckets, eliminating shuffle and dramatically reducing data transfer.
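Finding the bucket pairs that must be joined amounts to a merge over two sorted lists of key ranges. The sketch below assumes each bucket covers a half‑open range `[lo, hi)` and that bucket ranges within a table do not overlap; `intersecting_buckets` is a hypothetical helper, not the MaxCompute planner.

```python
from typing import List, Tuple

Range = Tuple[int, int]  # half-open key range [lo, hi) covered by one bucket

def intersecting_buckets(left: List[Range], right: List[Range]) -> List[Tuple[int, int]]:
    """Return (left_bucket, right_bucket) index pairs whose key ranges overlap.

    Both lists must be sorted by key and non-overlapping within themselves.
    """
    pairs: List[Tuple[int, int]] = []
    i = j = 0
    while i < len(left) and j < len(right):
        (l_lo, l_hi), (r_lo, r_hi) = left[i], right[j]
        if l_lo < r_hi and r_lo < l_hi:   # half-open ranges intersect
            pairs.append((i, j))
        if l_hi <= r_hi:                  # advance whichever bucket ends first
            i += 1
        else:
            j += 1
    return pairs

orders = [(0, 10), (10, 20), (20, 30)]     # bucket key ranges of table A
customers = [(5, 15), (15, 40)]            # bucket key ranges of table B
print(intersecting_buckets(orders, customers))  # [(0, 0), (1, 0), (1, 1), (2, 1)]
```

Each emitted pair can go to one worker that reads only those two bucket files, so rows never need to be shuffled by key, and buckets whose ranges never overlap are not read at all.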
AliORC Performance Results
Internal Alibaba tests show AliORC reads up to twice as fast as the open‑source C++ and Java ORC implementations. Successive internal versions have delivered ~30% performance improvements each, and AliORC is now deployed at large scale in MaxCompute production.
Personal Growth Reflections
The author, a senior technical expert, describes his path from video‑codec work at Uber to becoming a member of the Apache ORC PMC, emphasizing the value of open‑source contributions and the vibrant technical community within Alibaba’s MaxCompute teams.
Alibaba Cloud Developer
Alibaba's official tech channel, featuring all of its technology innovations.
