Multi-Modal Index in Apache Hudi 0.11.0: Design, Implementation, and Performance Benefits

This article explains the motivation, design principles, implementation details, and performance improvements of the new multi‑modal indexing subsystem introduced in Apache Hudi 0.11.0 for Lakehouse architectures, covering scalable metadata, ACID updates, fast lookups, file listing, data skipping, upsert performance, and future work.

Big Data Technology Architecture

Jun 7, 2022

1. Why Use Multi-Modal Index in Hudi

Indexes are widely used in database systems to reduce I/O and improve query efficiency. Hudi has supported indexing from the start to accelerate upserts in Lakehouse workloads. However, index techniques that are well established in RDBMSs have not yet been fully exploited to speed up queries on data lakes, whose data volumes can be 10‑100× larger than those of traditional warehouses. In Hudi 0.11.0 we re‑imagined what a universal multi‑modal index for data lakes should look like, and built it on an enhanced metadata table and an asynchronous indexing mechanism.

2. Design and Implementation

Multi‑modal indexes must satisfy three requirements:

Scalable metadata: table metadata must scale to terabytes and allow easy integration of new index types.

ACID transaction updates: indexes and metadata must stay consistent with the data table, without exposing partial writes.

Fast lookups: point‑lookup operations must be efficient without scanning the entire index, even when the index size reaches TBs.

Based on these requirements we designed and implemented a universal indexing subsystem for Hudi.

2.1 Scalable Metadata

All indexes that contain table metadata are stored in a single Hudi Merge‑On‑Read (MOR) table, the metadata table. This serverless table is independent of compute and query engines. The MOR layout avoids merging data synchronously on every write, which reduces write amplification and lets the metadata scale to TB size, similar to systems like BigQuery. The existing files, column_stats, and bloom_filter indexes are built on this foundation, and the framework can be extended to new index types such as bitmap or R‑tree indexes. Hudi also provides asynchronous indexing, the first of its kind, which builds indexes alongside regular writes without affecting write latency.

2.2 ACID Transaction Updates

The metadata table guarantees ACID transaction updates. Every change to the data table is translated into metadata records that are committed to the metadata table as part of a multi‑table transaction, so a write succeeds only when both the data table and the metadata table commit. This guarantees atomicity and resilience to failures, and prevents partial writes from becoming visible. The metadata table is self‑managed, so users do not need to run separate table services such as compaction or cleaning for it. Future work includes a log‑compaction service to further reduce write amplification.
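The all-or-nothing behaviour described above can be illustrated with a minimal in-memory sketch. This is not Hudi's implementation (Hudi coordinates commits through its timeline); `ToyTable`, `commit_both`, and `to_metadata` are hypothetical names used only to show the idea that nothing is published unless the write to both tables can be completed.

```python
class ToyTable:
    """In-memory stand-in for a table: a list of committed records."""

    def __init__(self):
        self.committed = []


def commit_both(data_table, metadata_table, data_records, to_metadata):
    """Apply a write to both tables, or to neither.

    to_metadata is a hypothetical function that derives one metadata
    record (for example, a file-listing entry) from one data record.
    """
    # Stage the new state of each table without publishing anything yet.
    staged_data = data_table.committed + list(data_records)
    staged_meta = metadata_table.committed + [to_metadata(r) for r in data_records]
    # Publish only after both staged states were built successfully.
    # If staging raised an exception, neither table has changed.
    data_table.committed = staged_data
    metadata_table.committed = staged_meta


data, meta = ToyTable(), ToyTable()
commit_both(data, meta, [{"file": "f1.parquet", "rows": 10}], lambda r: r["file"])
```

If `to_metadata` fails for any record, the exception propagates before either `committed` list is replaced, so readers never observe a data write without its metadata.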

2.3 Fast Lookups

To improve read/write performance, the processing layer needs point lookups to find necessary entries in the metadata table. Columnar Parquet and row‑based Avro are not suitable for point lookups, whereas HBase's HFile format is designed for efficient point queries.

Experiments measuring point‑lookup latency for N entries among 10 million entries show HFile achieving 10‑100× lower latency than Parquet or Avro. Because most accesses to the metadata table are point or range lookups, HFile was chosen as the underlying file format for the metadata table.
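To see why a sorted key-value layout suits point lookups, here is a toy comparison, assuming a simple in-memory model rather than the real HFile, Parquet, or Avro readers. `SortedKeyFile` and `scan_lookup` are illustrative names: the first finds a key by binary search over sorted keys, the second has to walk the entries one by one.

```python
import bisect


def scan_lookup(entries, key):
    """Linear scan: inspects entries one by one until the key is found."""
    for k, v in entries:
        if k == key:
            return v
    return None


class SortedKeyFile:
    """Toy stand-in for a file that stores key-value pairs sorted by key."""

    def __init__(self, entries):
        ordered = sorted(entries, key=lambda kv: kv[0])
        self.keys = [k for k, _ in ordered]
        self.values = [v for _, v in ordered]

    def get(self, key):
        # Binary search over the sorted keys: O(log n) comparisons.
        i = bisect.bisect_left(self.keys, key)
        if i < len(self.keys) and self.keys[i] == key:
            return self.values[i]
        return None


entries = [(f"key-{i:05d}", i) for i in range(100_000)]
sorted_file = SortedKeyFile(entries)
value = sorted_file.get("key-04242")
```

The sketch only captures the access pattern. The latency figures quoted above come from the article's experiments, not from this model.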

The metadata table indexes are served via a centralized timeline‑server cache, further reducing executor lookup latency.

3. How the Multi‑Modal Index Improves Performance

The metadata table improves performance in three areas: file listing, data skipping, and upserts.
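As a rough guide to how these features are switched on, the sketch below collects the relevant options as plain Python dictionaries. The keys are the ones I understand the Hudi 0.11 configuration reference to use; treat them as assumptions and verify names and defaults against the documentation for your release. Required options such as the table name and record key field are omitted.

```python
# Writer-side options (assumed key names; check your Hudi version's docs).
write_options = {
    "hoodie.metadata.enable": "true",                     # maintain the metadata table
    "hoodie.metadata.index.column.stats.enable": "true",  # build the column_stats partition
    "hoodie.metadata.index.bloom.filter.enable": "true",  # build the bloom_filter partition
    "hoodie.bloom.index.use.metadata": "true",            # let upserts read Bloom filters from it
}

# Reader-side options (assumed key names).
read_options = {
    "hoodie.metadata.enable": "true",       # list files from the metadata table
    "hoodie.enable.data.skipping": "true",  # prune files using column statistics
}

# With PySpark these would typically be passed as, for example:
#   df.write.format("hudi").options(**write_options).mode("append").save(base_path)
#   spark.read.format("hudi").options(**read_options).load(base_path)
# where df, spark, and base_path are placeholders for your own objects.
```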

3.1 File Listing

Large analytical pipelines often contain thousands of partitions and hundreds of thousands of files, making direct file listing a bottleneck due to throttling and high I/O. Hudi stores file information in a "files" partition of the metadata table, avoiding filesystem calls such as exists, listStatus, and listFiles.
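The idea of answering listings from metadata instead of from the filesystem can be sketched as follows. `FilesIndex` is a hypothetical in-memory model, not Hudi's reader: each commit records the files it added or removed per partition, and a listing becomes a key lookup rather than a storage call.

```python
class FilesIndex:
    """Toy model of a 'files' partition: partition path -> set of file names."""

    def __init__(self):
        self._files = {}

    def apply_commit(self, partition, added=(), deleted=()):
        """Record the file changes made by one commit in one partition."""
        files = self._files.setdefault(partition, set())
        files.update(added)
        files.difference_update(deleted)

    def list_files(self, partition):
        """Answer a listing from the index, without touching storage."""
        return sorted(self._files.get(partition, ()))


index = FilesIndex()
index.apply_commit("2022/06/01", added=["a.parquet", "b.parquet"])
index.apply_commit("2022/06/01", added=["c.parquet"], deleted=["a.parquet"])
listing = index.list_files("2022/06/01")
```

Because the index is updated in the same transaction as the data write, the listing stays consistent with the table without any `listStatus`-style call.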

Benchmarks on Amazon S3 show that using the file index reduces listing latency by 2‑10× compared to direct S3 listing, especially for tables with millions of files. Reusing the metadata‑table reader and caching the index on the timeline server further lowers latency.

3.2 Data Skipping

The column_stats partition stores statistics (min, max, null count, size, etc.) for each column of every data file. Query predicates can use these statistics to skip non‑matching files, dramatically reducing I/O and query latency.

In the column_stats partition, the record key is composed of the column name, partition name, and data file name, which enables both point and prefix lookups. Because the column name leads the key, a query reads only the index entries for the columns it references, O(num_query_columns), rather than the entries for every column in the table, O(num_table_columns). This yields significant speedups for wide tables.
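A small sketch may make the key layout and the pruning step concrete. It assumes keys are kept as sorted tuples of (column, partition, file) and that only min/max statistics are stored; `ColumnStatsIndex` and its methods are illustrative names, not Hudi classes.

```python
import bisect


class ColumnStatsIndex:
    """Toy column-stats index keyed by (column, partition, file_name)."""

    def __init__(self):
        self._keys = []   # sorted list of (column, partition, file_name)
        self._stats = {}  # key -> (min_value, max_value)

    def put(self, column, partition, file_name, min_value, max_value):
        key = (column, partition, file_name)
        if key not in self._stats:
            bisect.insort(self._keys, key)
        self._stats[key] = (min_value, max_value)

    def prefix_lookup(self, column):
        """Return only the entries whose key starts with the given column."""
        # (column,) sorts before every longer key that shares this prefix.
        start = bisect.bisect_left(self._keys, (column,))
        found = []
        for key in self._keys[start:]:
            if key[0] != column:
                break
            found.append((key, self._stats[key]))
        return found

    def files_to_read(self, column, lower, upper):
        """Keep files whose [min, max] range overlaps [lower, upper]."""
        return [
            key[2]
            for key, (lo, hi) in self.prefix_lookup(column)
            if hi >= lower and lo <= upper
        ]


stats = ColumnStatsIndex()
stats.put("price", "2022/06/01", "a.parquet", 0, 50)
stats.put("price", "2022/06/01", "b.parquet", 40, 120)
stats.put("price", "2022/06/02", "c.parquet", 300, 400)
stats.put("qty", "2022/06/01", "a.parquet", 1, 9)

# Predicate: price BETWEEN 100 AND 200
survivors = stats.files_to_read("price", 100, 200)  # → ["b.parquet"]
```

The lookup for `price` never touches the `qty` entries, which is the effect the key design is meant to achieve on wide tables.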

In experiments with prefix lookups on a file of 10 million entries, HFile showed at least 3× lower latency than the other formats tested. With data skipping served from the metadata table, query latency improved by 10‑30×.

3.3 Upsert Performance

The bloom_filter partition stores Bloom filters for all data files, avoiding the need to read file footers. Point and prefix lookups on this partition make reading Bloom filters up to 3× faster than scanning individual file footers.
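For readers unfamiliar with how Bloom filters help upserts, the sketch below shows the general technique: one filter per data file, consulted to rule out files that cannot contain an incoming record key. It is a generic Bloom filter built on SHA-256, with arbitrarily chosen sizes; it does not reproduce Hudi's filter implementation or its storage format.

```python
import hashlib


class BloomFilter:
    """Minimal Bloom filter; false positives are possible, false negatives are not."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = bytearray(num_bits)  # one byte per bit, for readability

    def _positions(self, key):
        # Derive num_hashes positions by salting the key with a seed.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{key}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key):
        for pos in self._positions(key):
            self.bits[pos] = 1

    def might_contain(self, key):
        return all(self.bits[pos] for pos in self._positions(key))


def candidate_files(filters_by_file, record_key):
    """Return the files that may hold record_key; all others can be skipped."""
    return [name for name, bf in filters_by_file.items() if bf.might_contain(record_key)]


filters = {"f1.parquet": BloomFilter(), "f2.parquet": BloomFilter()}
filters["f1.parquet"].add("user-42")
filters["f2.parquet"].add("user-77")
candidates = candidate_files(filters, "user-42")  # always includes "f1.parquet"
```

Keeping all filters in one indexed partition means the writer can fetch the filters it needs with a few lookups instead of opening the footer of every data file.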

3.4 Future Work

Planned enhancements include a record‑level index that maps record keys to their actual data files, auxiliary column indexes, bitmap indexes, and more Bloom filters, aiming to support ultra‑large datasets with billions of records and tighter SLAs.

4. Conclusion

Hudi introduces a novel, serverless, high‑performance multi‑modal index for Lakehouse architectures, providing scalable, self‑managed auxiliary data storage to improve read/write performance and enabling easy addition of richer indexes in future releases.
