Tag

Delta Lake

0 views collected around this technical thread.

DataFunTalk
DataFunTalk
Oct 3, 2024 · Big Data

Data Lake Technology Maturity Curve: Architecture, Design Principles, Core Functions, and Open‑Source Solutions

Amid growing data demands, this article explains the data lake technology maturity curve, detailing lake‑warehouse architectural patterns, design principles, core functionalities, and the four leading open‑source solutions (Hudi, Iceberg, Delta Lake, Paimon) to guide enterprises in building flexible, scalable, and governed data platforms.

Big DataDelta LakeHudi
0 likes · 10 min read
Data Lake Technology Maturity Curve: Architecture, Design Principles, Core Functions, and Open‑Source Solutions
DataFunTalk
DataFunTalk
Sep 24, 2024 · Big Data

Data Lake Technology Maturity Curve: Architecture Modes, Design Principles, Core Functions, and Applications

This article explains the rapid growth of data-driven businesses, the challenges of traditional data warehouses, and how modern data lake technologies such as Delta Lake, Hudi, Iceberg, and Paimon form a maturity curve that guides enterprises in architecture choices, design principles, core capabilities, and practical applications.

Big DataDelta LakeHudi
0 likes · 12 min read
Data Lake Technology Maturity Curve: Architecture Modes, Design Principles, Core Functions, and Applications
DataFunSummit
DataFunSummit
Jun 5, 2024 · Big Data

Databricks Acquires Tabular to Unite Delta Lake and Apache Iceberg for an Open Lakehouse

Databricks announced the acquisition of Tabular, the company founded by the original creators of Apache Iceberg, aiming to integrate Delta Lake and Iceberg into a unified, open lakehouse architecture that enhances format compatibility, reduces data silos, and supports AI workloads.

Apache IcebergBig DataDelta Lake
0 likes · 5 min read
Databricks Acquires Tabular to Unite Delta Lake and Apache Iceberg for an Open Lakehouse
DataFunSummit
DataFunSummit
Apr 27, 2024 · Big Data

Delta Lake 3.1: New Features, Metadata Optimization, and Universal Format Overview

This article introduces Delta Lake 3.1, detailing its release background, the addition of Deletion Vector to Update and Merge commands, metadata‑driven count/min/max optimizations, the Universal Format for cross‑engine compatibility, and a comparative evaluation with Iceberg and Hudi.

Big DataDeletion VectorDelta Lake
0 likes · 8 min read
Delta Lake 3.1: New Features, Metadata Optimization, and Universal Format Overview
GuanYuan Data Tech Team
GuanYuan Data Tech Team
Jul 27, 2023 · Big Data

How Delta Lake Powers Scalable BI & AI: Real-World Practices and Optimizations

Guandata’s R&D leader outlines how their analytics platform leverages Delta Lake and Spark to deliver fast, ACID‑compliant BI and AI workloads, detailing architecture, key features like schema evolution and time travel, and practical performance tricks such as compaction, vacuuming, and multi‑engine integration.

AIBIBig Data
0 likes · 14 min read
How Delta Lake Powers Scalable BI & AI: Real-World Practices and Optimizations
DataFunTalk
DataFunTalk
Jul 11, 2023 · Big Data

Analysis of Lakehouse Storage Systems: Design, Metadata, Merge‑On‑Read, and Performance Optimizations for Delta Lake, Apache Hudi, and Apache Iceberg

This article examines the architecture and core design of lakehouse storage systems, compares the metadata handling and Merge‑On‑Read mechanisms of Delta Lake, Apache Hudi, and Apache Iceberg, and presents practical performance‑optimization techniques and real‑world case studies on Alibaba Cloud EMR.

Apache HudiApache IcebergBig Data
0 likes · 18 min read
Analysis of Lakehouse Storage Systems: Design, Metadata, Merge‑On‑Read, and Performance Optimizations for Delta Lake, Apache Hudi, and Apache Iceberg
DataFunTalk
DataFunTalk
Jun 29, 2023 · Big Data

Practical Deployment of Delta Lake in BI and AI Products

This article summarizes a technical presentation on how Delta Lake is integrated into a BI+AI platform, covering the product background, data‑lake architecture, Delta Lake features such as ACID transactions, schema management, multi‑engine support, performance optimizations, and future development directions.

AIBIBig Data
0 likes · 12 min read
Practical Deployment of Delta Lake in BI and AI Products
DataFunTalk
DataFunTalk
Apr 10, 2023 · Big Data

Interview on Data Lakehouse: Current Applications, Challenges, and Evolution

This interview with NetEase data‑lake technology manager Ma Jin explains the distinction between data lakes and lakehouses, reviews the evolution of table‑format technologies such as Iceberg, Hudi and Delta Lake, evaluates feature maturity and performance trade‑offs, and discusses systematic versus non‑systematic adoption in enterprises.

Big DataData LakehouseDelta Lake
0 likes · 13 min read
Interview on Data Lakehouse: Current Applications, Challenges, and Evolution
DataFunSummit
DataFunSummit
Mar 10, 2023 · Big Data

Interview on Data Lake and Lakehouse: Current Applications, Challenges, and Evolution

This interview with NetEase’s data‑lake technology manager explores the distinction between data lakes and lakehouses, the evolution of table‑format technologies such as Iceberg, Hudi and Delta Lake, their maturity across key capabilities, and the practical adoption challenges faced by enterprises.

Big DataDelta LakeHudi
0 likes · 14 min read
Interview on Data Lake and Lakehouse: Current Applications, Challenges, and Evolution
Big Data Technology Architecture
Big Data Technology Architecture
Feb 24, 2023 · Big Data

Implementing Change Data Capture (CDC) on Data Lake Formats with Apache Hudi

This article reviews lake‑format concepts, Apache Hudi architecture, CDC fundamentals, design considerations for CDC on lake formats, implementation details of Hudi CDC, and streaming optimizations including automated lake‑table management and a simplified StreamingSQL for Spark.

Apache HudiBig DataCDC
0 likes · 19 min read
Implementing Change Data Capture (CDC) on Data Lake Formats with Apache Hudi
DataFunSummit
DataFunSummit
Dec 29, 2022 · Big Data

Understanding Lakehouse Systems: Architecture, Practices, and Innovations by Databricks

This article explains the Lakehouse concept, why it is needed, the limitations of traditional data warehouses and data lakes, and how Databricks’ unified architecture—through open storage formats, fine‑grained governance, and optimized query engines—delivers high‑quality, low‑latency data for BI, analytics, and machine learning workloads.

Big DataData EngineeringDelta Lake
0 likes · 21 min read
Understanding Lakehouse Systems: Architecture, Practices, and Innovations by Databricks
DataFunSummit
DataFunSummit
Oct 11, 2022 · Big Data

Building Lakehouse Architecture with Delta Lake: Core Concepts, Technologies, Ecosystem, and Use Cases

This article explains how to construct a lakehouse architecture using Delta Lake by covering its basic concepts, version‑2 features, internal kernel and key technologies, ecosystem integrations, and classic data‑warehouse use cases such as G‑SCD and change‑data‑capture, providing practical guidance for modern big‑data engineering.

ACID TransactionsBig DataChange Data Capture
0 likes · 27 min read
Building Lakehouse Architecture with Delta Lake: Core Concepts, Technologies, Ecosystem, and Use Cases
Big Data Technology Architecture
Big Data Technology Architecture
Aug 23, 2022 · Big Data

Comparative Analysis of Apache Hudi, Delta Lake, and Apache Iceberg for Lakehouse Architectures

This article examines the technical differences and feature sets of Apache Hudi, Delta Lake, and Apache Iceberg, highlighting incremental pipelines, concurrency control, merge‑on‑read storage, partition evolution, multi‑mode indexing, and real‑world use cases to help practitioners choose the most suitable lakehouse solution for their workloads.

Apache HudiApache IcebergDelta Lake
0 likes · 18 min read
Comparative Analysis of Apache Hudi, Delta Lake, and Apache Iceberg for Lakehouse Architectures
DataFunTalk
DataFunTalk
Aug 10, 2022 · Big Data

Delta Lake 2.0, Iceberg, Hudi: A Comparative Study and the Arctic Lakehouse Service

The article reviews recent developments in data‑lake table formats—Delta Lake 2.0, Iceberg, and Hudi—examining their features, benchmark results, and ecosystem impact, and then introduces Arctic, an open‑source streaming lakehouse service built on Iceberg that aims to bridge batch‑stream gaps for enterprises.

BenchmarkDelta LakeHudi
0 likes · 24 min read
Delta Lake 2.0, Iceberg, Hudi: A Comparative Study and the Arctic Lakehouse Service
DataFunTalk
DataFunTalk
Aug 5, 2022 · Big Data

Delta Lake Principles, eBay Migration, and Practical Enhancements

This talk by eBay software engineer Zhu Feng explains the fundamentals of Delta Lake and Lakehouse architecture, outlines eBay’s migration from Teradata to a Spark‑based platform, and details the custom enhancements, performance optimizations, and operational improvements implemented to support large‑scale update and delete workloads.

Big DataData EngineeringDelta Lake
0 likes · 16 min read
Delta Lake Principles, eBay Migration, and Practical Enhancements
Big Data Technology Architecture
Big Data Technology Architecture
May 22, 2022 · Big Data

Delta Lake Overview, File Structure, Metadata, and Its Integration with Alibaba Cloud EMR, DLF, G‑SCD and CDC Solutions

This article introduces Delta Lake as an open‑source storage layer for lake‑house architectures, explains its key features, file and metadata structures, and details how Alibaba Cloud EMR and Data Lake Formation integrate and extend Delta Lake with advanced capabilities such as G‑SCD, CDC, performance optimizations, and future roadmap.

Big DataCDCDLF
0 likes · 10 min read
Delta Lake Overview, File Structure, Metadata, and Its Integration with Alibaba Cloud EMR, DLF, G‑SCD and CDC Solutions
NetEase Yanxuan Technology Product Team
NetEase Yanxuan Technology Product Team
Mar 30, 2022 · Big Data

Data Lake Construction and Practice at NetEase Yanxuan

NetEase Yanxuan replaced its cumbersome data‑warehouse with a flexible Delta‑Lake/Iceberg data lake, creating a unified metadata layer and real‑time ingestion pipelines that cut latency from nightly batches to seconds, slashed compute and storage costs, supported diverse business scenarios and machine‑learning feature engineering, and set the stage for broader future expansion.

Big DataDelta LakeFeature Engineering
0 likes · 16 min read
Data Lake Construction and Practice at NetEase Yanxuan
DataFunTalk
DataFunTalk
Jan 8, 2022 · Big Data

Lakehouse: Concepts, Architecture, Implementation, and Cloud Practices

This article provides a comprehensive overview of the Lakehouse paradigm, tracing its origins from traditional data warehouses and data lakes, comparing architectures, detailing core components such as Delta Lake and Iceberg, and illustrating practical cloud implementations and future directions.

Apache IcebergBig DataCloud Data Platform
0 likes · 14 min read
Lakehouse: Concepts, Architecture, Implementation, and Cloud Practices