Tagged articles
7 articles
DataFunTalk
Jul 11, 2023 · Big Data

Analysis of Lakehouse Storage Systems: Design, Metadata, Merge‑On‑Read, and Performance Optimizations for Delta Lake, Apache Hudi, and Apache Iceberg

This article examines the architecture and core design of lakehouse storage systems, compares the metadata handling and Merge‑On‑Read mechanisms of Delta Lake, Apache Hudi, and Apache Iceberg, and presents practical performance‑optimization techniques and real‑world case studies on Alibaba Cloud EMR.

Apache Hudi · Apache Iceberg · Big Data
0 likes · 18 min read
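Merge‑on‑Read, as compared across Delta Lake, Hudi, and Iceberg above, keeps the base data file immutable and applies logged changes when a query reads the table. A minimal Python sketch of that read path follows. The in‑memory dict/list layout, the `merge_on_read` function, and the use of `None` as a delete marker are illustrative assumptions, not how any of the three formats actually stores data.

```python
from typing import Dict, List, Optional, Tuple

# Hypothetical layout (not a real file format): key -> row, deletes encoded as None.
BaseFile = Dict[str, dict]
DeltaLog = List[Tuple[str, Optional[dict]]]  # (key, new row or None), in commit order


def merge_on_read(base: BaseFile, delta_log: DeltaLog) -> Dict[str, dict]:
    """Build the current snapshot by applying delta-log entries over the base file at read time."""
    snapshot = dict(base)  # copy: the base file itself is never rewritten
    for key, row in delta_log:  # later entries win because the log is in commit order
        if row is None:
            snapshot.pop(key, None)  # delete
        else:
            snapshot[key] = row  # insert or update
    return snapshot


base = {"1": {"id": "1", "amount": 10}, "2": {"id": "2", "amount": 20}}
delta = [("2", {"id": "2", "amount": 25}), ("3", {"id": "3", "amount": 30}), ("1", None)]
print(merge_on_read(base, delta))
# {'2': {'id': '2', 'amount': 25}, '3': {'id': '3', 'amount': 30}}
```

The write side stays cheap (append to the log), while every read pays for the merge until the log is compacted into a new base file.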
ITPUB
Apr 26, 2022 · Big Data

Mastering Delta Lake: From Data Lake Basics to Hands‑On Implementation

This article explains the fundamentals of data lakes and data warehouses, compares their architectures, and outlines the challenges of data lakes. It then dives deep into Delta Lake's core features, storage model, ACID guarantees, and concurrency handling, and provides step‑by‑step Spark code examples for practical use.

ACID · Copy-on-Write · Data Lake
0 likes · 18 min read
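Delta Lake's concurrency handling is usually described as optimistic concurrency control over an ordered transaction log: a writer reads the latest table version, does its work, and then tries to record commit N+1, which succeeds only if no other writer created that version first. The Python sketch below models just that mechanism with an in‑memory log. `TransactionLog`, `try_commit`, and `commit_with_retry` are hypothetical names, not Delta Lake's API; the lock stands in for the atomic put‑if‑absent the storage layer must guarantee, and the conflict check a real system performs before retrying is omitted.

```python
import threading
from typing import Dict, List


class TransactionLog:
    """Toy versioned commit log (hypothetical; not Delta Lake's API)."""

    def __init__(self) -> None:
        self._commits: Dict[int, List[str]] = {}
        # Models the atomic put-if-absent that the underlying storage must provide.
        self._lock = threading.Lock()

    def latest_version(self) -> int:
        with self._lock:
            return max(self._commits, default=-1)

    def try_commit(self, version: int, actions: List[str]) -> bool:
        """Write commit `version` only if no other writer created it first."""
        with self._lock:
            if version in self._commits:
                return False  # lost the race: another writer already owns this version
            self._commits[version] = actions
            return True


def commit_with_retry(log: TransactionLog, actions: List[str], max_attempts: int = 5) -> int:
    for _ in range(max_attempts):
        read_version = log.latest_version()
        # A real system would check here whether newer commits conflict with what
        # this transaction read; this sketch assumes they never do.
        if log.try_commit(read_version + 1, actions):
            return read_version + 1
    raise RuntimeError("too many concurrent commits")


log = TransactionLog()
print(commit_with_retry(log, ["add part-000.parquet"]))  # 0
print(commit_with_retry(log, ["add part-001.parquet"]))  # 1
# A writer that still believes the table is at version 0 loses the race for version 1:
print(log.try_commit(1, ["add part-002.parquet"]))  # False
```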
DataFunTalk
May 16, 2021 · Big Data

Efficient Data Update/Delete and Real‑time Processing in the Arctic Lakehouse System

This article explains the evolution from traditional data warehouses to modern lakehouse architectures, introduces the Arctic system’s dynamic hash tree for fast update/delete, describes file splitting with sequence/offset ordering, and compares copy‑on‑write versus merge‑on‑read techniques for achieving low‑latency analytics.

Arctic · Big Data · Copy-on-Write
0 likes · 12 min read
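To make the copy‑on‑write versus merge‑on‑read trade‑off concrete, the sketch below counts the rows written for the same update under each strategy. It is a hypothetical toy model (files as Python lists of `(key, value)` tuples, helper names `cow_update` and `mor_update` invented here), not Arctic's implementation: copy‑on‑write rewrites every affected file, while merge‑on‑read appends only the changed rows and defers the merge to readers.

```python
from typing import Dict, List, Tuple

Row = Tuple[int, str]  # (key, value); a hypothetical row shape for illustration


def cow_update(files: List[List[Row]], updates: Dict[int, str]) -> Tuple[List[List[Row]], int]:
    """Copy-on-write: rewrite every file that contains an updated key."""
    new_files: List[List[Row]] = []
    rows_written = 0
    for f in files:
        if any(k in updates for k, _ in f):
            rewritten = [(k, updates.get(k, v)) for k, v in f]
            new_files.append(rewritten)
            rows_written += len(rewritten)  # the whole file is written again
        else:
            new_files.append(f)  # untouched files are reused as-is
    return new_files, rows_written


def mor_update(delta_log: List[Row], updates: Dict[int, str]) -> Tuple[List[Row], int]:
    """Merge-on-read: append only the changed rows; readers merge them later."""
    appended = list(updates.items())
    return delta_log + appended, len(appended)


files = [[(1, "a"), (2, "b"), (3, "c")], [(4, "d"), (5, "e"), (6, "f")]]
updates = {2: "B"}
_, cow_rows = cow_update(files, updates)
_, mor_rows = mor_update([], updates)
print(cow_rows, mor_rows)  # 3 1
```

Updating one key in a three‑row file writes three rows under copy‑on‑write but one under merge‑on‑read; the cost of the cheaper write is the read‑time merge (and eventual compaction) that merge‑on‑read requires.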
Big Data Technology Architecture
Jun 28, 2020 · Big Data

Key Requirements for Building PB‑Scale Data Lakes and How Apache Hudi Meets Them

The article outlines the essential requirements for constructing petabyte‑scale data lakes (incremental CDC ingestion, log deduplication, storage management, ACID transactions, fast ETL, and compliance) and explains how Apache Hudi’s Copy‑on‑Write (COW) and Merge‑on‑Read (MOR) table types, async compaction, and other advanced features address each need.

ACID Transactions · Apache Hudi · Async Compaction
0 likes · 13 min read
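Log deduplication in Hudi is typically configured through a record key plus an ordering ("precombine") field: when several incoming records share a key, only the one with the greatest ordering value is kept. A minimal Python sketch of that rule follows. The `dedupe_by_key` helper and the `id`/`ts` field names are assumptions for illustration, not Hudi's API, and letting the later record win on ties is a choice made for this sketch.

```python
from typing import Dict, Hashable, Iterable, List


def dedupe_by_key(records: Iterable[dict], key_field: str, order_field: str) -> List[dict]:
    """Keep one record per key: the one with the largest ordering value (hypothetical helper)."""
    latest: Dict[Hashable, dict] = {}
    for rec in records:
        key = rec[key_field]
        current = latest.get(key)
        # Replace when the key is new or the incoming record is at least as recent
        # (on ties the later record wins; an assumption of this sketch).
        if current is None or rec[order_field] >= current[order_field]:
            latest[key] = rec
    return list(latest.values())


batch = [
    {"id": 1, "ts": 5, "v": "a"},
    {"id": 2, "ts": 3, "v": "b"},
    {"id": 1, "ts": 7, "v": "c"},
    {"id": 1, "ts": 6, "v": "d"},  # late-arriving but older duplicate: dropped
]
print(dedupe_by_key(batch, "id", "ts"))
# [{'id': 1, 'ts': 7, 'v': 'c'}, {'id': 2, 'ts': 3, 'v': 'b'}]
```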