Understanding Hudi Core Concepts: Timeline, Indexes, and Table Types Explained

This article explains Apache Hudi’s core concepts, including its timeline architecture, file layout, indexing mechanisms, and the two primary table types—Copy on Write and Merge on Read—along with their trade‑offs and the various query modes such as snapshot, time‑travel, and incremental queries.

JD Tech Talk

Hudi Architecture Overview

Apache Hudi organizes data around a timeline that records every action performed on the table as an Instant. Each instant has an action type, a timestamp, and a state, and it progresses through the states requested, inflight, and completed. The timeline drives actions such as commits, cleans, and compactions.

Timeline Components

Instant actions: commit, clean, compaction, delta‑commit, replace, rollback, savepoint.

State types: requested, inflight, completed.

Official examples illustrate how instants appear on the timeline.
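
To make the instant lifecycle concrete, here is a minimal sketch in plain Python. It is a toy model, not Hudi's implementation: the class names Instant and Timeline, the string timestamps, and the strictly linear requested → inflight → completed progression are assumptions made for illustration.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict, List


class State(Enum):
    REQUESTED = "requested"
    INFLIGHT = "inflight"
    COMPLETED = "completed"


# Forward transitions allowed in this simplified model.
_NEXT = {State.REQUESTED: State.INFLIGHT, State.INFLIGHT: State.COMPLETED}


@dataclass
class Instant:
    timestamp: str                  # sortable string, e.g. "20240101120000"
    action: str                     # e.g. "commit", "clean", "compaction"
    state: State = State.REQUESTED

    def advance(self) -> None:
        """Move to the next state; a completed instant cannot advance."""
        if self.state not in _NEXT:
            raise ValueError(f"instant {self.timestamp} is already completed")
        self.state = _NEXT[self.state]


class Timeline:
    """Hypothetical in-memory timeline keyed by instant timestamp."""

    def __init__(self) -> None:
        self._instants: Dict[str, Instant] = {}

    def request(self, timestamp: str, action: str) -> Instant:
        if timestamp in self._instants:
            raise ValueError(f"duplicate instant {timestamp}")
        instant = Instant(timestamp, action)
        self._instants[timestamp] = instant
        return instant

    def completed(self) -> List[Instant]:
        """Completed instants in timestamp order; readers only see these."""
        return [i for _, i in sorted(self._instants.items())
                if i.state is State.COMPLETED]
```

In this model a reader that consults only completed() never observes a write that is still requested or inflight, which is the property the timeline states are meant to provide.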

File Layout

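Data is stored in a hierarchical layout: a table is divided into partitions, each partition contains file groups, and each file group contains one or more file slices. A file slice consists of a base file and, for Merge‑on‑Read (MOR) tables, zero or more log files holding changes made since that base file was written. This layout enables efficient reads, incremental processing, and controlled file‑size growth.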
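
The hierarchy can be sketched as a small data structure. In Hudi's documentation the versions of a file group are called file slices; the Python class names, file names, and the files_to_read helper below are hypothetical and only illustrate how a reader might pick files from the latest slice.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class FileSlice:
    base_instant: str                  # instant that produced this slice
    base_file: Optional[str]           # columnar base file, if one exists
    log_files: List[str] = field(default_factory=list)  # MOR tables only


@dataclass
class FileGroup:
    file_id: str
    slices: List[FileSlice] = field(default_factory=list)

    def latest_slice(self) -> FileSlice:
        # Instants are sortable strings, so max() finds the newest slice.
        return max(self.slices, key=lambda s: s.base_instant)


# Partition path -> file groups stored under it.
Table = Dict[str, List[FileGroup]]


def files_to_read(table: Table, partition: str, include_logs: bool) -> List[str]:
    """List the files a reader would open for one partition."""
    paths: List[str] = []
    for group in table.get(partition, []):
        if not group.slices:
            continue
        latest = group.latest_slice()
        if latest.base_file is not None:
            paths.append(latest.base_file)
        if include_logs:
            paths.extend(latest.log_files)
    return paths
```

Passing include_logs=False corresponds to reading base files only, while include_logs=True also picks up the log files that a merged view would need.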

Indexing Mechanisms

Hudi provides several index types to accelerate record location and reduce scan volume:

Bloom filter index

Record index

Column‑range index

Secondary index
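
As an illustration of how a Bloom filter index can reduce scan volume, the sketch below keeps one small Bloom filter per file and uses it to skip files that cannot contain a record key. The bit-array size, hash count, SHA-256 hashing, and all names are assumptions chosen for brevity; they are not Hudi's actual parameters or classes.

```python
import hashlib
from typing import Dict, Iterator, List


class BloomFilter:
    def __init__(self, num_bits: int = 1024, num_hashes: int = 3) -> None:
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = 0  # a Python int used as a bit set

    def _positions(self, key: str) -> Iterator[int]:
        # Derive num_hashes bit positions by salting the key with a seed.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{key}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key: str) -> None:
        for pos in self._positions(key):
            self.bits |= 1 << pos

    def might_contain(self, key: str) -> bool:
        # False means "definitely absent"; True means "possibly present".
        return all((self.bits >> pos) & 1 for pos in self._positions(key))


def candidate_files(filters: Dict[str, BloomFilter], key: str) -> List[str]:
    """Return the files whose filter cannot rule the key out."""
    return [name for name, bf in filters.items() if bf.might_contain(key)]
```

A filter never produces false negatives, so a file holding the key is always among the candidates. False positives are possible, which is why each candidate file still has to be checked for the actual key.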

Table Types

Copy on Write (COW)

COW tables are optimized for read‑heavy workloads. When a record is updated or deleted, Hudi creates a new base file for the affected file group; no log files are generated. Queries therefore read only base files, delivering high read performance. Writes can be slower because entire file groups may be rewritten even for small changes.

Figure: COW timeline illustration
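
The copy-on-write behavior can be sketched in a few lines of Python. This is a simplified model under stated assumptions: a base file is represented as a tuple of (instant, records), and cow_write is a hypothetical helper, not a Hudi API.

```python
from typing import Dict, Tuple

Record = Dict[str, object]
# (commit instant, record key -> record); treated as immutable by convention.
BaseFile = Tuple[str, Dict[str, Record]]


def cow_write(current: BaseFile, instant: str,
              upserts: Dict[str, Record],
              deletes: Tuple[str, ...] = ()) -> BaseFile:
    """Produce a new base file version and leave the old one untouched."""
    _, old_records = current
    # Every record is copied, changed or not. This is the write cost of COW.
    new_records = dict(old_records)
    new_records.update(upserts)
    for key in deletes:
        new_records.pop(key, None)
    return (instant, new_records)
```

Because the previous version is never modified, a query that started against the older base file keeps a consistent view while the new version is being written.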

Advantages of COW

Atomic file‑level updates, without rewriting entire tables or partitions.

Ability to consume only the changed data incrementally, avoiding unnecessary scans.

Tight file‑size control to maintain query performance.

Merge on Read (MOR)

MOR tables balance write latency and read performance by writing updates and deletes to lightweight log files (e.g., Avro or columnar formats) and periodically compacting them with base files. At query time, log records are merged with the base files, providing near‑real‑time data availability. Query speed depends on whether recent log files have been compacted.

Figure: MOR query illustration

Because a write only appends to log files, commits can be made frequently, for example about once a minute.

MOR tables support two query modes. Read Optimized queries read only base files, so they are fast but may miss changes that still sit in uncompacted log files.

Snapshot queries merge base and log files and return the full, latest committed view.
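
The difference between the two MOR query modes, and the effect of compaction, can be shown with a toy model. MorFileGroup and its methods are hypothetical names; a real log file stores serialized blocks rather than Python tuples, and compaction in Hudi is scheduled through the timeline rather than called directly.

```python
from typing import Dict, List, Optional, Tuple

Record = Dict[str, object]
# A log entry is (record key, new value); None as the value marks a delete.
LogEntry = Tuple[str, Optional[Record]]


class MorFileGroup:
    def __init__(self, base: Dict[str, Record]) -> None:
        self.base: Dict[str, Record] = dict(base)
        self.log: List[LogEntry] = []

    def upsert(self, key: str, record: Record) -> None:
        self.log.append((key, record))      # cheap append, no rewrite

    def delete(self, key: str) -> None:
        self.log.append((key, None))

    def read_optimized(self) -> Dict[str, Record]:
        """Base file only: fast, but blind to uncompacted log entries."""
        return dict(self.base)

    def snapshot(self) -> Dict[str, Record]:
        """Merge log entries over the base file, in write order."""
        merged = dict(self.base)
        for key, record in self.log:
            if record is None:
                merged.pop(key, None)
            else:
                merged[key] = record
        return merged

    def compact(self) -> None:
        """Fold the log into a new base file and start an empty log."""
        self.base = self.snapshot()
        self.log = []
```

Before compact() runs, read_optimized() and snapshot() can disagree. Afterwards they return the same records, which matches the staleness trade-off described above.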

Query Types

Snapshot Queries: Return the latest committed view of the table, using indexes when available.

Time Travel Queries: Access the table state at a specific past instant, useful for reproducible experiments.

Read Optimized Queries (MOR only): Use only base columnar files for fast reads, tolerating slight staleness.

Incremental Queries (Latest State): Return rows changed since a given instant, providing the latest value per record key.

Incremental Queries (CDC): Emit change‑data‑capture streams with before/after images for inserts, updates, and deletes.
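
With the Spark datasource, these query types are typically selected through read options. The helper below is a sketch that only builds an options dictionary; hudi_read_options is a hypothetical function, and the option keys are the ones recalled from Hudi's Spark datasource documentation, so they should be checked against the Hudi version in use.

```python
from typing import Dict, Optional


def hudi_read_options(query_type: str, instant: Optional[str] = None) -> Dict[str, str]:
    """Map a query type from the list above to Spark read options.

    Option keys are assumptions based on Hudi's Spark datasource docs.
    """
    if query_type in ("snapshot", "read_optimized"):
        return {"hoodie.datasource.query.type": query_type}

    if instant is None:
        raise ValueError(f"{query_type} queries need an instant")

    if query_type == "time_travel":
        # Read the table as of a past instant.
        return {"as.of.instant": instant}

    if query_type in ("incremental", "cdc"):
        opts = {
            "hoodie.datasource.query.type": "incremental",
            # Return changes made after this instant.
            "hoodie.datasource.read.begin.instanttime": instant,
        }
        if query_type == "cdc":
            # CDC is exposed as a format of the incremental query.
            opts["hoodie.datasource.query.incremental.format"] = "cdc"
        return opts

    raise ValueError(f"unknown query type: {query_type}")
```

Assuming an existing SparkSession named spark and a table path base_path, the result could be passed as spark.read.format("hudi").options(**hudi_read_options("incremental", "20240101000000")).load(base_path). CDC reads additionally assume the table was written with change data capture enabled.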

Figure: Query type overview