Tagged articles

117 articles

Feb 25, 2022 · Big Data

Optimizing SparkSQL: ByteDance EMR’s Data Lake Integration and Multi‑Tenant Server

ByteDance’s EMR team details how they integrated data‑lake engines such as Hudi and Iceberg into SparkSQL, streamlined jar management, built a custom Spark SQL Server with Hive compatibility, multi‑tenant support, engine pre‑warming, and transaction capabilities, dramatically improving performance and resource efficiency for enterprise workloads.

EMR · Hudi · Iceberg

0 likes · 11 min read

Bilibili Tech

Feb 17, 2022 · Big Data

Bilibili's Lakehouse Architecture: Building a Unified Data Lake and Data Warehouse

Bilibili replaced its Hive‑Spark‑Presto ETL pipeline with a lakehouse built on Iceberg, using Magnus, Trino and Alluxio to unify a PB‑scale data lake and warehouse, adding Z‑Order sorting and indexing for fast multi‑dimensional queries while planning further schema and pre‑computation optimizations.

Data Lake · Data Warehouse · Iceberg

0 likes · 14 min read

Big Data Technology Architecture

Nov 24, 2021 · Big Data

Using Iceberg Catalogs with HiveCatalog and HadoopCatalog: Table Creation, Data Ingestion, and Querying

This article explains the concept of Iceberg catalogs, compares HiveCatalog and HadoopCatalog, and provides step‑by‑step Spark examples for downloading the Iceberg jar, creating tables, loading data, querying, and examining the underlying metadata and directory structures.

HadoopCatalog · HiveCatalog · Iceberg

0 likes · 15 min read

HomeTech

Nov 17, 2021 · Big Data

Lakehouse Architecture Practice with Flink and Iceberg: Real‑time Data Ingestion and Management

This article details a lakehouse architecture built on Flink and Iceberg that addresses Hive‑based warehouse limitations by enabling ACID transactions, incremental snapshots, stream‑batch unification, CDC support, and various operational optimizations, ultimately achieving near real‑time data ingestion and analytics.

CDC · Flink · Iceberg

0 likes · 10 min read

Big Data Technology & Architecture

Oct 12, 2021 · Big Data

Data Lake Evolution and a Practical Flink + Iceberg Implementation Guide

This article explores the evolution of data lakes, compares major cloud providers' lake architectures, introduces the emerging lakehouse concept, and provides a step‑by‑step Flink‑Iceberg implementation—including dependencies, catalog setup, table creation, checkpointing, and Kafka ingestion—demonstrating practical big‑data streaming solutions.

Data Lake · Flink · Iceberg

0 likes · 14 min read

Big Data Technology Architecture

Aug 31, 2021 · Big Data

Real-time CDC Data Read/Write Solutions in Data Lake Architecture with Flink and Iceberg

This article, compiled by community volunteers, examines several solutions for reading and writing CDC data in real time within data lake architectures. It compares offline HBase, Apache Kudu, Hive, and Spark + Delta, and ultimately advocates Flink + Iceberg for efficient, correct, and scalable streaming ingestion and analytics.

CDC · Flink · Iceberg

0 likes · 18 min read

Big Data Technology & Architecture

Aug 24, 2021 · Big Data

Comprehensive Overview of Data Lake Technologies: Iceberg, Hudi, and Delta Lake

This article provides an in-depth overview of data lake concepts, definitions, and essential features, followed by detailed case studies of enterprise data lake implementations and comparative analysis of leading data lake table formats—Iceberg, Hudi, and Delta Lake—highlighting their architectures, capabilities, and trade‑offs.

Data Lake · Delta Lake · Flink

0 likes · 19 min read

Big Data Technology Architecture

Jul 15, 2021 · Big Data

Building Data Lake Solutions with Iceberg and Object Storage: Architecture, Write/Read Processes, and Storage Optimization

This article presents a comprehensive overview of using Apache Iceberg with object storage to construct scalable data lake solutions, covering lake architecture, Iceberg table organization, Flink‑based write and read workflows, catalog abstractions, object storage versus HDFS comparisons, append‑upload and atomic‑commit challenges, a demonstration setup, and ideas for storage optimization.

Catalog · Flink · Iceberg

0 likes · 16 min read

DataFunTalk

Jun 21, 2021 · Big Data

Flink + Iceberg 0.11 Practices in Qunar Data Platform

This article shares Qunar's experience using Flink together with Apache Iceberg 0.11 to address real‑time data warehouse challenges, covering background pain points, Iceberg architecture, solutions for Kafka data loss and Hive latency, and optimization practices such as small‑file handling, sorting, and checkpoint management.

Big Data · Data Lake · Flink

0 likes · 13 min read

Qunar Tech Salon

Jun 21, 2021 · Big Data

Using Apache Iceberg 0.11 with Flink for Real‑time Data Lake: Architecture, Pain Points, and Solutions

This article examines the challenges of using Kafka, Flink, and Hive for real‑time data warehousing, introduces Apache Iceberg 0.11 as a solution, details its architecture, query planning, Flink integration, code examples, optimization techniques, and summarizes the benefits for large‑scale data processing.

Big Data · Data Lake · Flink

0 likes · 12 min read

Big Data Technology Architecture

May 31, 2021 · Big Data

Practical Experience of Using Flink + Iceberg 0.11 on Qunar Data Platform

This article presents Qunar's practical experience with Flink and Iceberg 0.11, covering background challenges such as Kafka data loss and Hive metadata pressure, explaining Iceberg architecture, query planning, and detailed solutions including real‑time ingestion, small‑file handling, sorting, and code examples for seamless migration.

Flink · Iceberg · Real-time Processing

0 likes · 12 min read

Tencent Cloud Developer

May 25, 2021 · Cloud Native

Next‑Generation Cloud‑Native Data Lake Architecture: Value, Principles, Challenges, and Tencent Solutions

The talk outlines a next‑generation cloud‑native data lake that leverages elastic Kubernetes compute, object‑storage, and Apache Iceberg to cut costs 3‑10× while boosting performance, and presents Tencent’s Data Lake Compute and Data Lake Fabric solutions that address scalability, reliability, and operational challenges through serverless, unified, multi‑engine architecture.

Cost Optimization · Data Lake · Iceberg

0 likes · 13 min read

Big Data Technology Architecture

Apr 5, 2021 · Big Data

Evolution of Real‑Time Data Warehouses: From 1.0 to 3.0 and the Road to Batch‑Stream Unified Architecture

The article reviews the current state of offline Hive‑based data warehouses, explains the emergence of real‑time data warehouses (1.0) built on Kafka and Flink, discusses their limitations, and outlines the progression toward batch‑stream unified architectures (2.0 and 3.0) leveraging data‑lake technologies such as Iceberg.

Batch-Stream Integration · Big Data · Flink

0 likes · 13 min read

Big Data Technology & Architecture

Mar 20, 2021 · Big Data

Understanding Data Lakes and a Comparative Overview of Iceberg, Hudi, and Delta Lake

This article explains what a data lake is, outlines its key characteristics, and compares three major data lake frameworks—Iceberg, Hudi, and Delta Lake—highlighting their architectures, features, and trade‑offs for large‑scale data storage and processing.

Data Architecture · Data Lake · Delta Lake

0 likes · 13 min read

DataFunTalk

Oct 9, 2020 · Big Data

NetEase’s Data Lake Iceberg: Challenges, Core Principles, and Practical Implementation

This article examines the pain points of traditional data warehouse platforms, explains the core concepts and advantages of the Iceberg data lake table format, compares it with Metastore, reviews the current Iceberg community ecosystem, and details NetEase’s practical integration with Hive, Impala, and Flink to improve ETL efficiency and support unified batch‑stream processing.

Data Lake · ETL · Flink

0 likes · 13 min read

Big Data Technology Architecture

Feb 19, 2020 · Big Data

Comparative Analysis of Hudi, Iceberg, and Delta Lake for Data Lake Storage

This article compares three open‑source data‑lake storage layers—Hudi, Iceberg, and Delta Lake—examining their shared reliance on meta‑files for schema and transaction handling, and detailing their differing designs for upserts, streaming support, query performance, and ecosystem integration.

Delta Lake · Hudi · Iceberg

0 likes · 13 min read

Big Data Technology & Architecture

Feb 6, 2020 · Big Data

Comparison of Hudi, Iceberg, and Delta Lake Table Formats

This article compares the design goals of three data‑lake table formats (Hudi, Iceberg, and Delta Lake), highlighting their common reliance on meta files and their distinct strengths for upserts, analytics, and unified streaming‑batch processing in modern big‑data environments.

Big Data · Data Lake · Delta Lake

0 likes · 10 min read
