Tag: Merge on Read


DataFunTalk
Jul 11, 2023 · Big Data

Analysis of Lakehouse Storage Systems: Design, Metadata, Merge‑On‑Read, and Performance Optimizations for Delta Lake, Apache Hudi, and Apache Iceberg

This article examines the architecture and core design of lakehouse storage systems, compares the metadata handling and Merge‑On‑Read mechanisms of Delta Lake, Apache Hudi, and Apache Iceberg, and presents practical performance‑optimization techniques and real‑world case studies on Alibaba Cloud EMR.

Apache Hudi · Apache Iceberg · Big Data
18 min read
DataFunTalk
May 16, 2021 · Big Data

Efficient Data Update/Delete and Real‑time Processing in the Arctic Lakehouse System

This article explains the evolution from traditional data warehouses to modern lakehouse architectures, introduces the Arctic system’s dynamic hash tree for fast update/delete, describes file splitting with sequence/offset ordering, and compares copy‑on‑write versus merge‑on‑read techniques for achieving low‑latency analytics.

Arctic · Big Data · DELETE
12 min read
Big Data Technology Architecture
Nov 23, 2020 · Big Data

Understanding Hudi: Enabling Record‑Level Updates in Data Lakes

This article explains how Hudi enables efficient record‑level updates in data lakes by adapting database update strategies such as copy‑on‑write and merge‑on‑read, and contrasts them with traditional RDBMS and NoSQL storage mechanisms and their trade‑offs.

Big Data · Hudi · Merge on Read
11 min read
Big Data Technology Architecture
Jun 28, 2020 · Big Data

Key Requirements for Building PB‑Scale Data Lakes and How Apache Hudi Meets Them

This article outlines the essential requirements for building petabyte‑scale data lakes — incremental CDC ingestion, log deduplication, storage management, ACID transactions, fast ETL, and compliance — and explains how Apache Hudi's Copy‑on‑Write and Merge‑on‑Read table types, async compaction, and related features address each need.

ACID Transactions · Apache Hudi · Async Compaction
13 min read