Tagged articles
54 articles
DataFunTalk
Oct 3, 2024 · Big Data

Data Lake Technology Maturity Curve: Architecture, Design Principles, Core Functions, and Open‑Source Solutions

Amid growing data demands, this article explains the data lake technology maturity curve, detailing lake‑warehouse architectural patterns, design principles, core functionalities, and the four leading open‑source solutions (Hudi, Iceberg, Delta Lake, Paimon) to guide enterprises in building flexible, scalable, and governed data platforms.

Big Data · Data Architecture · Data Lake
10 min read
DataFunTalk
Sep 24, 2024 · Big Data

Data Lake Technology Maturity Curve: Architecture Modes, Design Principles, Core Functions, and Applications

This article explains the rapid growth of data-driven businesses, the challenges of traditional data warehouses, and how modern data lake technologies such as Delta Lake, Hudi, Iceberg, and Paimon form a maturity curve that guides enterprises in architecture choices, design principles, core capabilities, and practical applications.

Big Data · Data Lake · Delta Lake
12 min read
dbaplus Community
Nov 8, 2023 · Big Data

Choosing Between Data Warehouse, Data Lake, and Lakehouse: When to Use Each

This article compares traditional data warehouses, modern data lakes, and emerging lakehouse architectures, explaining their design patterns, advantages, disadvantages, and suitable use cases, while detailing implementation considerations such as schema design, ETL/ELT processes, table formats like Delta, Iceberg, and Hudi, and factors influencing platform selection.

Apache Spark · Data Lake · Delta Lake
20 min read

How Lakehouse Architecture is Transforming Hadoop: A Deep Dive into Hudi, Iceberg, and Delta Lake

This article analyzes the rise of lake‑house architecture in the Hadoop ecosystem, compares the technical capabilities of Hudi, Iceberg and Delta Lake, details implementation enhancements such as MOR and multi‑writer support, showcases Flink integration, presents a real‑time marketing use case, and outlines future development directions.

Big Data · Data Governance · Delta Lake
14 min read
DataFunTalk
Jul 11, 2023 · Big Data

Analysis of Lakehouse Storage Systems: Design, Metadata, Merge‑On‑Read, and Performance Optimizations for Delta Lake, Apache Hudi, and Apache Iceberg

This article examines the architecture and core design of lakehouse storage systems, compares the metadata handling and Merge‑On‑Read mechanisms of Delta Lake, Apache Hudi, and Apache Iceberg, and presents practical performance‑optimization techniques and real‑world case studies on Alibaba Cloud EMR.

Apache Hudi · Apache Iceberg · Big Data
18 min read
DataFunTalk
Jun 29, 2023 · Big Data

Practical Deployment of Delta Lake in BI and AI Products

This article summarizes a technical presentation on how Delta Lake is integrated into a BI+AI platform, covering the product background, data‑lake architecture, Delta Lake features such as ACID transactions, schema management, multi‑engine support, performance optimizations, and future development directions.

BI · Big Data · Data Lake
12 min read
DataFunTalk
Apr 10, 2023 · Big Data

Interview on Data Lakehouse: Current Applications, Challenges, and Evolution

This interview with NetEase data‑lake technology manager Ma Jin explains the distinction between data lakes and lakehouses, reviews the evolution of table‑format technologies such as Iceberg, Hudi and Delta Lake, evaluates feature maturity and performance trade‑offs, and discusses systematic versus non‑systematic adoption in enterprises.

Big Data · Data Lakehouse · Delta Lake
13 min read
DataFunSummit
Dec 29, 2022 · Big Data

Understanding Lakehouse Systems: Architecture, Practices, and Innovations by Databricks

This article explains the Lakehouse concept, why it is needed, the limitations of traditional data warehouses and data lakes, and how Databricks’ unified architecture—through open storage formats, fine‑grained governance, and optimized query engines—delivers high‑quality, low‑latency data for BI, analytics, and machine learning workloads.

Databricks · Delta Lake · Lakehouse
21 min read
Big Data Technology Architecture
Aug 23, 2022 · Big Data

Comparative Analysis of Apache Hudi, Delta Lake, and Apache Iceberg for Lakehouse Architectures

This article examines the technical differences and feature sets of Apache Hudi, Delta Lake, and Apache Iceberg, highlighting incremental pipelines, concurrency control, merge‑on‑read storage, partition evolution, multi‑mode indexing, and real‑world use cases to help practitioners choose the most suitable lakehouse solution for their workloads.

Apache Hudi · Apache Iceberg · Concurrency Control
18 min read
DataFunTalk
Aug 10, 2022 · Big Data

Delta Lake 2.0, Iceberg, Hudi: A Comparative Study and the Arctic Lakehouse Service

The article reviews recent developments in data‑lake table formats—Delta Lake 2.0, Iceberg, and Hudi—examining their features, benchmark results, and ecosystem impact, and then introduces Arctic, an open‑source streaming lakehouse service built on Iceberg that aims to bridge batch‑stream gaps for enterprises.

Data Lake · Delta Lake · Hudi
24 min read
DataFunTalk
Aug 5, 2022 · Big Data

Delta Lake Principles, eBay Migration, and Practical Enhancements

This talk by eBay software engineer Zhu Feng explains the fundamentals of Delta Lake and Lakehouse architecture, outlines eBay’s migration from Teradata to a Spark‑based platform, and details the custom enhancements, performance optimizations, and operational improvements implemented to support large‑scale update and delete workloads.

Data Lake · Delta Lake · Lakehouse
16 min read
Big Data Technology Architecture
May 22, 2022 · Big Data

Delta Lake Overview, File Structure, Metadata, and Its Integration with Alibaba Cloud EMR, DLF, G‑SCD and CDC Solutions

This article introduces Delta Lake as an open‑source storage layer for lake‑house architectures, explains its key features, file and metadata structures, and details how Alibaba Cloud EMR and Data Lake Formation integrate and extend Delta Lake with advanced capabilities such as G‑SCD, CDC, performance optimizations, and future roadmap.

CDC · DLF · Delta Lake
10 min read
Alibaba Cloud Developer
May 18, 2022 · Big Data

Why Delta Lake Is Revolutionizing Data Lakes with ACID Guarantees

This article explains how Delta Lake adds reliability to data lakes by offering ACID transactions, scalable metadata, and unified batch‑and‑stream processing, outlines the challenges it solves, details its implementation principles, and demonstrates a practical demo for building an integrated data warehouse.

ACID · Big Data · Data Lake
9 min read
Alibaba Cloud Developer
May 13, 2022 · Big Data

Unlocking Delta Lake: Key Features, Architecture, and EMR Integration

Delta Lake, an open‑source storage layer from Databricks, provides ACID transactions, data versioning, schema evolution, and unified batch‑stream processing, with a detailed file structure and metadata mechanism, while Alibaba Cloud EMR enhances it with advanced DML, performance optimizations, deep DLF integration, and solutions for G‑SCD and CDC.

CDC · DLF · Data Lakehouse
11 min read
ITPUB
Apr 26, 2022 · Big Data

Mastering Delta Lake: From Data Lake Basics to Hands‑On Implementation

This article explains the fundamentals of data lakes and data warehouses, compares their architectures, outlines the challenges of data lakes, and then dives deep into Delta Lake's core features, storage model, ACID guarantees, concurrency handling, and provides step‑by‑step Spark code examples for practical use.

ACID · Copy-on-Write · Data Lake
18 min read
Zuoyebang Tech Team
Apr 13, 2022 · Big Data

How Delta Lake Transformed Our Offline Data Warehouse Performance

This article details how Zuoyebang's engineering team migrated their Hive‑based offline data warehouse to Delta Lake, tackling latency, scalability, and query‑performance challenges through stream‑to‑batch processing, data‑lake architecture, and optimizations like DPP and Z‑ordering.

Big Data · Delta Lake · Presto
15 min read

Data Lake Construction and Practice at NetEase Yanxuan

NetEase Yanxuan replaced its cumbersome data‑warehouse with a flexible Delta‑Lake/Iceberg data lake, creating a unified metadata layer and real‑time ingestion pipelines that cut latency from nightly batches to seconds, slashed compute and storage costs, supported diverse business scenarios and machine‑learning feature engineering, and set the stage for broader future expansion.

Data Integration · Data Lake · Delta Lake
16 min read
Yanxuan Tech Team
Mar 29, 2022 · Big Data

How NetEase Yanxuan Built a Real‑Time Data Lake to Boost Efficiency

This article explains how NetEase Yanxuan evolved from a traditional data‑warehouse pipeline to a cloud‑native data‑lake architecture, detailing the business challenges, design choices, technology stack (Delta, Iceberg, Hudi), implementation steps, and the resulting gains in real‑time data access, cost reduction, and feature‑engineering support.

Data Lake · Delta Lake · Hudi
18 min read
DataFunTalk
Jan 8, 2022 · Big Data

Lakehouse: Concepts, Architecture, Implementation, and Cloud Practices

This article provides a comprehensive overview of the Lakehouse paradigm, tracing its origins from traditional data warehouses and data lakes, comparing architectures, detailing core components such as Delta Lake and Iceberg, and illustrating practical cloud implementations and future directions.

Apache Iceberg · Big Data · Cloud Data Platform
14 min read
Big Data Technology Architecture
Nov 13, 2021 · Big Data

Case Study: Migrating Baicaowei's On‑Premise Hadoop Data Platform to Alibaba Cloud Native Data Lake

This article details Baicaowei's migration from an IDC‑hosted Hadoop cluster to a cloud‑native data lake on Alibaba Cloud, outlining the business drivers, pain points of the legacy platform, architectural goals, design principles, solution selection, implementation steps, and future outlook for the new big‑data ecosystem.

Alibaba Cloud · Big Data · Delta Lake
16 min read
Big Data Technology & Architecture
Aug 24, 2021 · Big Data

Comprehensive Overview of Data Lake Technologies: Iceberg, Hudi, and Delta Lake

This article provides an in-depth overview of data lake concepts, definitions, and essential features, followed by detailed case studies of enterprise data lake implementations and comparative analysis of leading data lake table formats—Iceberg, Hudi, and Delta Lake—highlighting their architectures, capabilities, and trade‑offs.

Data Lake · Delta Lake · Flink
19 min read
dbaplus Community
Aug 17, 2021 · Big Data

How JD Transformed Its Data Warehouse with Delta Lake for Real‑Time Analytics

This article examines JD's shift from a traditional Lambda‑based data warehouse to a Delta Lake‑powered real‑time data lake, detailing the challenges of legacy architectures, the evaluation of open‑source table formats, Delta Lake's core mechanisms, and the resulting simplified batch‑stream development workflow.

Batch-Stream · Big Data · Data Lake
11 min read
Big Data Technology Architecture
Aug 12, 2021 · Big Data

Enterprise Data Lake Architecture, Delta Lake Core Capabilities, and Stream‑Batch Integrated Analytics on Alibaba Cloud

This article explains the rapid growth of data, the limitations of traditional warehouses, and how a cloud‑based data lake built on object storage with Delta Lake format provides low‑cost, flexible, and ACID‑compliant analytics, followed by a step‑by‑step guide to ingest, manage, and analyze data using Alibaba Cloud DLF and Databricks DDI with Spark streaming and batch jobs.

Alibaba Cloud · Delta Lake · Spark
19 min read
DataFunTalk
Apr 18, 2021 · Big Data

Comparing Apache Hudi, Apache Iceberg, and Delta Lake for Data Lake Storage

This article compares Apache Hudi, Apache Iceberg, and Delta Lake, examining their storage formats, platform compatibility, update performance, concurrency guarantees, and integration with lakeFS to help readers choose the most suitable solution for their data lake use case.

Apache Hudi · Apache Iceberg · Delta Lake
16 min read
Big Data Technology & Architecture
Mar 30, 2021 · Big Data

Implementing Real-Time Data Ingestion with Delta Lake on EMR: Architecture, Challenges, and Solutions

This article describes how Soul's data engineering team replaced nightly batch ETL with real-time Delta Lake ingestion on EMR, detailing the motivations, comparative analysis of Delta, Hudi, Iceberg, the implementation architecture, encountered issues such as data skew and schema evolution, and the solutions adopted to improve performance and reliability.

Data Lake · Data Skew · Delta Lake
13 min read
Big Data Technology & Architecture
Mar 23, 2021 · Big Data

Practical Implementations of Data Lakes: Huawei Production Scenario, Real-Time Financial Data Lake, and Soul's Delta Lake

This article presents a comprehensive overview of data lake implementations, detailing Huawei's production‑scene platform, a real‑time financial data lake architecture using Kafka, Flink and Iceberg, and Soul's Delta Lake practice with Spark, Hive, and custom ETL tools, highlighting design choices, processing flows, and operational considerations.

Data Lake · Delta Lake · Flink
8 min read
Big Data Technology Architecture
Mar 2, 2021 · Big Data

Implementing Real-Time Log Ingestion with Delta Lake on EMR: Architecture, Challenges, and Solutions

This article describes how a data engineering team replaced nightly batch ETL with a Delta Lake‑based real‑time log ingestion pipeline on EMR, detailing the motivations, architecture, implementation steps, encountered issues such as data skew and schema evolution, and the practical solutions they applied to achieve low‑latency, reliable data delivery.

Delta Lake · Spark · hive
14 min read
DataFunTalk
Dec 15, 2020 · Big Data

Exploring JD's Real‑Time Data Lake with Delta Lake: Architecture, Challenges, and Practical Insights

This article introduces JD's real‑time data warehouse evolution, outlines the limitations of traditional Lambda‑based warehouses, compares open‑source lake formats (Delta, Hudi, Iceberg), explains Delta Lake's transaction‑log architecture and read flow, and demonstrates how a unified batch‑stream development model simplifies data processing and improves reliability.

ACID · Data Lake · Delta Lake
12 min read
Big Data Technology Architecture
Nov 23, 2020 · Big Data

One‑Stop Data Lake Ingestion Solution with Alibaba Cloud Data Lake Formation (DLF)

The article describes Alibaba Cloud's Data Lake Formation service, presenting a unified, real‑time, and low‑latency solution for ingesting heterogeneous data sources—including RDS, DTS, TableStore, and SLS—into an OSS‑backed data lake using templates, a Spark‑based ingestion engine, and modern table formats such as Delta Lake.

Alibaba Cloud · Delta Lake · Real-time Processing
10 min read
Big Data Technology & Architecture
Nov 14, 2020 · Big Data

Comparative Analysis of Apache Hudi, Apache CarbonData, and Delta Lake for Data Lake Solutions

This article examines the core requirements of data lakes and provides an in‑depth comparison of three major open‑source solutions—Apache Hudi, Apache CarbonData, and Delta Lake—highlighting their architectures, ACID support, query capabilities, and suitability for various real‑time and batch use cases.

ACID · Apache CarbonData · Apache Hudi
9 min read
Big Data Technology Architecture
May 10, 2020 · Big Data

The Flourishing Big Data Ecosystem and the Rise of Delta Lake

The article reviews the evolution of the big‑data ecosystem from 2017 to 2019, highlights Spark’s dominance, examines storage‑layer challenges of traditional Hive‑based warehouses, and explains how Delta Lake’s metadata‑driven library simplifies architecture, adds ACID features, and competes with Hudi and Iceberg.

Delta Lake · Spark
8 min read
Big Data Technology Architecture
Mar 24, 2020 · Big Data

Comparative Analysis of Delta Lake, Apache Iceberg, and Apache Hudi for Data Lake Solutions

This article examines the three leading open‑source data‑lake projects—Delta Lake, Apache Iceberg, and Apache Hudi—by outlining their origins, core problems they address, key features, and a detailed seven‑dimension comparison to help practitioners choose the most suitable solution for their scenarios.

Apache Hudi · Apache Iceberg · Comparison
17 min read
dbaplus Community
Mar 17, 2020 · Big Data

Choosing the Right Open‑Source Data Lake: Delta vs Iceberg vs Hudi

An in‑depth comparison of the three leading open‑source data lake platforms—Delta Lake, Apache Iceberg, and Apache Hudi—examines their origins, core challenges they address, key features, and performance across seven evaluation dimensions to guide practitioners in selecting the optimal solution for their workloads.

Apache Hudi · Apache Iceberg · Data Lake
15 min read
Big Data Technology & Architecture
Jan 7, 2020 · Big Data

Why Small Files Are a Problem in Big Data and How Delta Lake Compaction Solves It

This article examines the root causes and performance impact of massive small-file proliferation in traditional data warehouses, explains why HDFS metadata limits scalability, and details how Delta Lake’s custom compaction process can safely merge these files for append-only tables without disrupting reads or writes.

Delta Lake · HDFS · Small Files
5 min read
Big Data Technology & Architecture
Oct 17, 2019 · Big Data

Delta Lake: Architecture, Features, and Hands‑On Tutorial

This article explains the origins and motivations of Delta Lake, details its ACID transaction support, schema enforcement, metadata handling, versioning, and unified batch‑and‑stream processing, and provides a step‑by‑step Maven and Spark code tutorial for creating, updating, and querying Delta tables.

ACID · Apache Spark · Big Data
10 min read
Big Data Technology & Architecture
Aug 5, 2019 · Big Data

Apache Spark Latest Technological Developments and Outlook for Spark 3.0+

The article provides a comprehensive overview of recent Apache Spark advancements—including Delta Lake, Data Source V2, runtime optimizations, relational cache, cloud‑native challenges, AI integration via Project Hydrogen, and the anticipated features of Spark 3.0—highlighting how these innovations address modern data‑warehouse, cloud, and machine‑learning workloads.

Apache Spark · Big Data · Delta Lake
17 min read
Liulishuo Tech Team
Jun 12, 2018 · Big Data

Highlights from Spark+AI Summit 2018: Hydrogen, MLflow, Delta, Spark 2.3, and Shuffle Optimization

The 2018 Spark+AI Summit in San Francisco showcased Spark's evolution toward unified AI and big‑data processing, introducing the Hydrogen project with gang scheduling, the open‑source MLflow platform, the Delta unified analytics engine, Spark 2.3 enhancements, and Facebook's shuffle I/O optimizations.

Delta Lake · Hydrogen · Shuffle Optimization
8 min read