Tag: Merge on Read


DataFunTalk
Jul 11, 2023 · Big Data

Analysis of Lakehouse Storage Systems: Design, Metadata, Merge‑On‑Read, and Performance Optimizations for Delta Lake, Apache Hudi, and Apache Iceberg

This article examines the architecture and core design of lakehouse storage systems, compares the metadata handling and Merge‑On‑Read mechanisms of Delta Lake, Apache Hudi, and Apache Iceberg, and presents practical performance‑optimization techniques and real‑world case studies on Alibaba Cloud EMR.

Apache Hudi · Apache Iceberg · Big Data
18 min read
DataFunTalk
May 16, 2021 · Big Data

Efficient Data Update/Delete and Real‑time Processing in the Arctic Lakehouse System

This article explains the evolution from traditional data warehouses to modern lakehouse architectures, introduces the Arctic system’s dynamic hash tree for fast update/delete, describes file splitting with sequence/offset ordering, and compares copy‑on‑write versus merge‑on‑read techniques for achieving low‑latency analytics.

Arctic · Big Data · DELETE
12 min read
Big Data Technology Architecture
Nov 23, 2020 · Big Data

Understanding Hudi: Enabling Record‑Level Updates in Data Lakes

This article explains how Hudi enables efficient record‑level updates in data lakes by adapting database update strategies such as copy‑on‑write and merge‑on‑read, and contrasts them with traditional RDBMS and NoSQL storage mechanisms and their trade‑offs.

Big Data · Hudi · Merge on Read
11 min read
Big Data Technology Architecture
Jun 28, 2020 · Big Data

Key Requirements for Building PB‑Scale Data Lakes and How Apache Hudi Meets Them

This article outlines the essential requirements for building petabyte‑scale data lakes — incremental CDC ingestion, log deduplication, storage management, ACID transactions, fast ETL, and compliance — and explains how Apache Hudi's Copy‑on‑Write and Merge‑on‑Read table types, async compaction, and related features address each need.

ACID Transactions · Apache Hudi · Async Compaction
13 min read