Tagged articles

Apache Iceberg

68 articles · Page 1 of 1

Jun 26, 2026 · Cloud Native

One-Click Real-Time Stream Ingestion: Alibaba Cloud Kafka’s Native Data Lake Integration

Alibaba Cloud Message Queue for Kafka introduces a native message‑to‑lake capability that integrates Apache Iceberg with OSS Table Bucket, removing the need for Spark, Flink, or Kafka Connect. It provides exactly‑once semantics, automatic schema management, two write modes, smart partitioning, and up to ten‑fold performance gains across a range of real‑time analytics scenarios.

Apache Iceberg · Cloud Native · Data Lake

12 min read

StarRocks

Jun 25, 2026 · Databases

StarRocks 4.1 Enables Faster Iceberg Queries While Preserving Data Freshness

StarRocks 4.1 introduces an incremental materialized view for Apache Iceberg that ties refresh cost to data changes instead of table size, dramatically cutting refresh time, maintaining low latency, and keeping query results fresh even as tables scale to terabytes or petabytes, with a fallback to partition refresh when needed.

Apache Iceberg · Data Freshness · Incremental Materialized View

8 min read

StarRocks

Jun 4, 2026 · Databases

How StarRocks and Iceberg Enable Federated Queries: A Practical Walkthrough

This article details Fresha's real‑world integration of StarRocks with Apache Iceberg, covering metadata planning, distributed execution, adaptive metadata retrieval, hot‑cold data layering, missing statistics handling, catalog configuration, and performance optimizations that together demonstrate how federated queries can be efficiently executed over data‑lake tables.

Apache Iceberg · Data Lake · Federated Query

14 min read

Past Memory Big Data

Apr 13, 2026 · Big Data

Why Iceberg v3 Marks the “iPhone Moment” for Data Lakehouses

Apache Iceberg v3 introduces deletion vectors, row‑level lineage, a native VARIANT type, default column values, and nanosecond timestamps, delivering up to ten‑fold faster updates, native CDC, seamless semi‑structured data handling, and industry‑wide adoption that effectively ends the format war between lake and warehouse solutions.

Apache Iceberg · Data Lakehouse · Default Column Values

14 min read

StarRocks

Mar 5, 2026 · Big Data

How Fanatics Scaled to PB‑Level Data with StarRocks & Apache Iceberg Lakehouse

Fanatics unified its fragmented data stack by building a StarRocks‑powered Lakehouse on Apache Iceberg, replacing Redshift, Snowflake, Athena, and Druid, which cut costs by up to 95%, delivered sub‑second dashboard queries on petabyte‑scale data, and enabled real‑time and historical analytics on a single platform.

Apache Iceberg · Data Architecture · Fanatics

10 min read

DevOps Coach

Jan 25, 2026 · Operations

Why Infra Companies Are Racing Into Observability and What It Means for 2026

The article examines how SRE and infrastructure teams are converging, why major infra vendors are acquiring observability assets, the rising cost pressures, and how OpenTelemetry combined with Apache Iceberg forms a new standard stack that AI‑driven incident response will rely on in the coming years.

AI incident response · Apache Iceberg · SRE

11 min read

DataFunSummit

Dec 1, 2025 · Big Data

7 Cutting-Edge Data Engineering Practices Shaping AI-Driven Data Lakes

This article collection showcases seven advanced data engineering solutions—from Tencent Cloud's Iceberg batch‑stream integration and Apache Gravitino metadata lineage to Xiaohongshu's Lakehouse evolution and multimodal AI data lake implementations—highlighting architectural innovations, performance optimizations, and real‑world deployment insights for modern big‑data platforms.

Apache Gravitino · Apache Iceberg · Batch-Stream Integration

7 min read

Past Memory Big Data

Jul 30, 2025 · Big Data

Why Iceberg Is Dropping Positional Deletes in Merge‑on‑Read Tables

The article explains how Apache Iceberg v3 replaces the poorly scaling positional‑delete mechanism in Merge‑on‑Read tables with compact deletion vectors. It details the performance, I/O, and metadata drawbacks of positional deletes and shows how the new bitmap‑based approach resolves them.

Apache Iceberg · Data Lake · Deletion Vector

20 min read

Baidu Geek Talk

Jun 30, 2025 · Big Data

How Baidu’s Turing 3.0 Leverages Apache Iceberg to Boost Data Lake Performance

This article explains how Baidu’s next‑generation data platform Turing 3.0 integrates Apache Iceberg to solve the inefficiencies of the legacy MEG stack, detailing ecosystem components, migration strategies from Hive, table‑level optimizations, and the future roadmap for high‑frequency, low‑latency analytics.

Apache Iceberg · Data Lake · Hive Migration

17 min read

DataFunSummit

Jun 3, 2025 · Big Data

BiFang: A Unified Lake‑Stream Storage Engine for Real‑Time and Batch Data Processing

BiFang is a lake‑stream integrated storage engine that merges Apache Pulsar message‑queue capabilities with Iceberg data‑lake features, providing a single unified data store with full‑incremental queries, sub‑second visibility, exactly‑once semantics, and seamless integration with Flink, Spark, and StarRocks for both real‑time analytics and batch processing.

Apache Iceberg · Apache Pulsar · Lakehouse

13 min read

Alibaba Cloud Infrastructure

Mar 6, 2025 · Big Data

Leveraging Apache Iceberg and AutoMQ for Real-Time Data Lake Ingestion: Architecture, Best Practices, and Cost Optimization

This article examines how Apache Iceberg’s snapshot‑based ACID transactions, logical‑physical partition evolution, and COW/MOR update modes enable efficient real‑time data lake ingestion, and demonstrates AutoMQ’s Kafka‑to‑Iceberg Table Topic solution that simplifies schema management, reduces latency, and cuts operational costs.

Apache Iceberg · AutoMQ · Big Data

14 min read

Past Memory Big Data

Dec 26, 2024 · Big Data

Eliminate Shuffle: Deep Dive into Spark’s Storage Partition Join (SPJ)

This article explains how Spark ≥ 3.3’s Storage Partition Join (SPJ) can avoid costly shuffle operations by using Iceberg tables, outlines the required table properties and Spark configurations, demonstrates the effect with code examples and execution plans, and explores several realistic join scenarios.

Apache Iceberg · Big Data · SPJ

16 min read

DataFunSummit

Nov 20, 2024 · Artificial Intelligence

How Data Lakes Empower AI: Expert Insights on Feature Management, Columnar Storage, and Vector Formats

In a panel discussion, experts explain how data‑lake‑warehouse integration, columnar storage and open table formats like Apache Iceberg, and emerging variant types enable efficient feature engineering, support large‑language‑model workloads, and provide flexible vector storage, thereby driving the evolution of AI from traditional ML to the GenAI era.

Apache Iceberg · Data Lake · GenAI

6 min read

DataFunTalk

Nov 6, 2024 · Big Data

How Data Lakes Empower AI: Insights from Industry Experts

In a panel discussion, experts from Kuaishou, Ping An, and Datastrato explain how data lake architectures, columnar storage and table formats like Apache Iceberg, and vector‑enabled lake formats are enhancing feature management, supporting generative AI workloads, and accelerating machine‑learning pipelines.

AI · Apache Iceberg · Big Data

6 min read

StarRocks

Sep 5, 2024 · Big Data

Accelerate Lakehouse Queries: A Hands‑On Guide to StarRocks + Apache Iceberg

This tutorial walks you through the fundamentals of Apache Iceberg, its architecture and key features, explains why it’s advantageous for lakehouse workloads, and provides a step‑by‑step Docker‑Compose setup to integrate Iceberg with StarRocks for fast, ACID‑compliant analytics on real‑world taxi data.

Apache Iceberg · Data Engineering · Docker

15 min read

DataFunTalk

Sep 4, 2024 · Artificial Intelligence

Data+AI Data Lake Technologies: Challenges, Apache Iceberg Overview, and Vector Table Implementations with PyIceberg

This article explores the evolution of data lakes for AI, discusses the challenges of AI-era data management, introduces Apache Iceberg and its architecture, demonstrates PyIceberg-based AI training and inference pipelines, and presents vector table designs with LSH indexing and performance optimizations.

AI · Apache Iceberg · Big Data

22 min read

StarRocks

Jul 24, 2024 · Big Data

Why Lakehouse Architecture Is Redefining Big Data Infrastructure in the AI Era

The article examines the rapid rise of lakehouse architecture, its market momentum, core components—including storage, metadata, table formats, and compute layers—compares Iceberg, Hudi, and Delta Lake, discusses the shift from HDFS to object storage, and outlines the strategic importance of lakehouses for AI-driven data management and future data infrastructure trends.

AI · Apache Iceberg · Big Data

28 min read

DataFunSummit

Jun 20, 2024 · Big Data

Data+AI Data Lake Technologies: Apache Iceberg, PyIceberg, and Vector Table Solutions

This article presents a comprehensive overview of modern Data+AI data lake challenges and solutions, covering the evolution of data lakes, an introduction to Apache Iceberg, practical use of PyIceberg for AI training and inference pipelines, and advanced vector table and indexing techniques for efficient similarity search.

AI training · Apache Iceberg · Big Data

22 min read

DataFunSummit

Jun 5, 2024 · Big Data

Databricks Acquires Tabular to Unite Delta Lake and Apache Iceberg for an Open Lakehouse

Databricks announced the acquisition of Tabular, the company founded by the original creators of Apache Iceberg, aiming to integrate Delta Lake and Iceberg into a unified, open lakehouse architecture that enhances format compatibility, reduces data silos, and supports AI workloads.

Apache Iceberg · Big Data · Databricks

5 min read

Past Memory Big Data

Jun 5, 2024 · Industry Insights

Databricks Acquires Tabular, the Company Behind Apache Iceberg, to Boost Lakehouse Interoperability

Databricks announced its acquisition of Tabular, the company founded by the creators of Apache Iceberg, aiming to unify lakehouse formats through Delta Lake UniForm, while highlighting the rise of lakehouse architecture, format fragmentation, and the push toward open data interoperability.

Apache Iceberg · Data Interoperability · Databricks

9 min read

StarRocks

May 22, 2024 · Big Data

Unlocking Data Lake Power: Iceberg Architecture & StarRocks Acceleration

Apache Iceberg offers a modern, ACID‑compliant table format for data lakes with features like hidden partitions and schema evolution, while StarRocks provides high‑performance query acceleration, metadata caching, and distributed planning to address Iceberg’s latency challenges, enabling seamless lake‑warehouse integration and real‑time analytics.

Apache Iceberg · Data Lake · Metadata Caching

19 min read

Xiaohongshu Tech REDtech

Mar 4, 2024 · Big Data

Integrating Data Lake Technologies with Data Warehouse Architecture at Xiaohongshu: Practices and Performance Optimizations

Xiaohongshu’s data‑warehouse team integrated Apache Iceberg‑based data‑lake techniques into its existing warehouse, replacing the legacy Hive/Spark stack with global sorting, Z‑order, and upsert‑enabled tables, which cut query latency by up to 90%, boosted data freshness by 50%, slashed storage costs by 83%, and saved tens of thousands of GB‑hours of compute daily.

Apache Iceberg · Data Lake · Data Warehouse

19 min read

DataFunTalk

Jan 9, 2024 · Big Data

Analyzing Lakehouse Storage Systems: Metadata, Merge‑On‑Read, and Performance Optimizations for Delta Lake, Hudi, and Iceberg

This article examines the design of lakehouse storage systems by comparing Delta Lake, Apache Hudi, and Apache Iceberg, focusing on metadata management, Merge‑On‑Read mechanisms, and a series of query and write performance optimizations with real‑world EMR case studies.

Apache Hudi · Apache Iceberg · Big Data

16 min read

DataFunSummit

Dec 20, 2023 · Cloud Native

Building a Cloud‑Native Lakehouse with Apache Iceberg and Amoro

This article introduces the background, challenges, and cloud‑native solutions of lakehouse architecture, explains Apache Iceberg’s open table format and its cloud‑native features, details Amoro’s management and self‑optimizing capabilities, showcases three real‑world cloud migration cases, and outlines future development plans.

Amoro · Apache Iceberg · Data Management

12 min read

DataFunTalk

Nov 24, 2023 · Big Data

Amoro Lakehouse Management System: Deployment Practices and AWS Integration for Apache Iceberg

This article introduces Amoro, a lakehouse management platform built on Apache Iceberg, explains why Webex adopted it to overcome Hive limitations, details its AWS GlueCatalog and S3 integration with DynamoDB lock management, and provides step‑by‑step Helm‑based deployment instructions on Kubernetes.

AWS · Amoro · Apache Iceberg

19 min read

DataFunTalk

Oct 5, 2023 · Big Data

Building a Unified Streaming‑Batch Lakehouse with Amoro Mixed Iceberg

This article describes how Shanghai Steel Union leveraged Amoro Mixed Iceberg on top of Apache Iceberg to create a unified streaming‑batch lakehouse, addressing small‑file and upsert challenges, simplifying architecture, improving data freshness, and providing a scalable solution for real‑time and batch analytics.

Amoro · Apache Iceberg · Big Data

13 min read

iQIYI Technical Product Team

Sep 22, 2023 · Big Data

Data Lake: Concepts, Architecture, and Application in iQIYI's Data Platform

iQIYI’s data‑middle‑platform team built a four‑zone data lake (raw, product, work, and sensitive) integrated with unified ODS/DWD/MID layers, a metadata catalog, and self‑service tools. The platform runs on HDFS, Hive/Iceberg, Spark/Trino, and Flink; the team migrated to Apache Iceberg to improve data freshness and now aims to further streamline modules and adopt new technologies.

Apache Iceberg · Data Governance · Data Lake

13 min read

ITPUB

Aug 23, 2023 · Cloud Native

Build a Cloud‑Native Lakehouse on AWS with Apache Iceberg and Amoro

This guide explains the cloud‑native lakehouse concept, outlines its advantages and challenges, compares lake‑table projects such as Iceberg, and provides a step‑by‑step AWS deployment of Apache Iceberg and Amoro—including environment setup, AMS installation, catalog configuration, optimizer launch, data ingestion with Flink, and query verification with Spark.

AWS · Amoro · Apache Iceberg

33 min read

DataFunTalk

Jul 11, 2023 · Big Data

Analysis of Lakehouse Storage Systems: Design, Metadata, Merge‑On‑Read, and Performance Optimizations for Delta Lake, Apache Hudi, and Apache Iceberg

This article examines the architecture and core design of lakehouse storage systems, compares the metadata handling and Merge‑On‑Read mechanisms of Delta Lake, Apache Hudi, and Apache Iceberg, and presents practical performance‑optimization techniques and real‑world case studies on Alibaba Cloud EMR.

Apache Hudi · Apache Iceberg · Big Data

18 min read

dbaplus Community

Jun 6, 2023 · Big Data

Why Data Lakes Are Transforming Big Data: Concepts, Benefits, and Iceberg in Practice

This article explains the evolution of data lakes, compares public‑cloud and private‑cloud implementations, outlines key technical features, presents three real‑world scenarios, details the selection and inner workings of Apache Iceberg versus Hive, and showcases multiple production use cases at iQIYI.

Apache Iceberg · Batch Processing · Big Data

25 min read

DataFunTalk

May 11, 2023 · Big Data

Scaling ByteDance Feature Store to EB‑Level with Apache Iceberg: Architecture, Practices, and Future Roadmap

This article describes how ByteDance tackled exabyte‑scale (EB‑level) feature storage by adopting Apache Iceberg, detailing the problem background, design choices, implementation of COW and MOR back‑fill strategies, performance optimizations, and future plans such as lake‑cold‑layering and materialized views.

Apache Iceberg · Big Data · Data Lake

16 min read

DataFunSummit

Apr 30, 2023 · Big Data

Arctic: Efficient Management of Apache Iceberg Lakehouse Tables – Concepts, Practices, and Roadmap

This article introduces the Arctic lakehouse management system built on Apache Iceberg, explains Iceberg’s core principles, format versions, and real‑world implementations at NetEase, and details Arctic’s automated table optimization, governance workflows, and future development plans.

Apache Iceberg · Arctic · Data Governance

22 min read

StarRing Big Data Open Lab

Mar 22, 2023 · Big Data

Why Lakehouse Architecture Is Revolutionizing Data Analytics: Hudi vs Iceberg

This article explains how the integrated lakehouse architecture combines data lake and data warehouse capabilities, outlines its key features, compares three implementation paths, and provides an in‑depth technical overview of Apache Hudi and Apache Iceberg for modern big‑data analytics.

Apache Hudi · Apache Iceberg · Data Lake

15 min read

iQIYI Technical Product Team

Feb 3, 2023 · Big Data

Data Lake Concepts, Benefits, and Iceberg‑Based Implementations at iQIYI

iQIYI’s data lake combines public‑cloud and private storage with Apache Iceberg’s snapshot‑based table format to enable near‑real‑time, unified batch‑and‑stream analytics, reducing costs, simplifying architecture, and improving data freshness across use cases such as log collection, audit, pingback, and member order processing.

Apache Iceberg · Data Architecture · Data Lake

25 min read

DataFunTalk

Dec 8, 2022 · Big Data

Arctic: NetEase’s Real-Time Lakehouse System Built on Apache Iceberg

This article introduces NetEase’s Arctic, a real‑time lakehouse system built on Apache Iceberg that unifies streaming and batch processing, explains the challenges of Lambda architecture, details Arctic’s features such as change/base stores, hidden queue, transaction handling, and shares internal practice cases and future roadmap.

Apache Iceberg · Arctic · Data Lake

12 min read

DataFunSummit

Oct 29, 2022 · Big Data

Apache Iceberg in Tencent: Architecture, Spark Read/Write, Production Practices, and Data Governance

This article presents an in‑depth overview of Apache Iceberg as used at Tencent, covering its table format architecture, Spark read/write mechanisms, production challenges and optimizations such as schema evolution, file filtering, upsert strategies, and the surrounding data‑governance services.

Apache Iceberg · Big Data · Data Governance

19 min read

NetEase Cloud Music Tech Team

Oct 26, 2022 · Big Data

Arctic: NetEase's Streaming Lakehouse Service and Hive-Based Stream-Batch Integration Practice

Arctic, NetEase’s streaming lakehouse built on Apache Iceberg, unifies streaming and batch workloads with millisecond‑level latency, Hive compatibility, and built‑in message‑queue support, delivering CDC, upserts and OLAP without a Lambda architecture, as demonstrated by real‑time processing of 2 PB of Hive data for Cloud Music.

Apache Iceberg · Arctic · Big Data Architecture

15 min read

DataFunSummit

Oct 21, 2022 · Big Data

Exploring Real‑Time Data Lake Practices at Xiaohongshu Using Apache Iceberg

This article details Xiaohongshu's data platform architecture and three real‑time lake initiatives—log ingestion, CDC ingestion, and lake analysis—showcasing how Apache Iceberg, Flink, and custom shuffling algorithms solve small‑file and cross‑cloud challenges while enabling schema evolution and future multi‑cloud optimizations.

Apache Iceberg · Big Data · CDC

16 min read

DataFunSummit

Sep 5, 2022 · Big Data

DataFun Summit 2022 – Modern Data Stack Forum: Speaker Lineup and Session Overviews

The DataFun Summit 2022 featured a Data Lake & Warehouse forum with expert talks on PALO, ByteDance LAS, Iceberg at Huawei, and Presto‑Alluxio acceleration, providing detailed technical outlines, speaker backgrounds, and audience takeaways for modern big‑data architectures.

Apache Iceberg · Big Data · Data Lake

7 min read

Big Data Technology Architecture

Aug 23, 2022 · Big Data

Comparative Analysis of Apache Hudi, Delta Lake, and Apache Iceberg for Lakehouse Architectures

This article examines the technical differences and feature sets of Apache Hudi, Delta Lake, and Apache Iceberg, highlighting incremental pipelines, concurrency control, merge‑on‑read storage, partition evolution, multi‑mode indexing, and real‑world use cases to help practitioners choose the most suitable lakehouse solution for their workloads.

Apache Hudi · Apache Iceberg · Concurrency Control

18 min read

DataFunTalk

Aug 6, 2022 · Big Data

Exploring Real‑Time Data Lake Practices at Xiaohongshu Using Apache Iceberg

This article details Xiaohongshu's data platform engineering, describing how Apache Iceberg is leveraged for real‑time data lake ingestion, CDC pipelines, multi‑cloud storage, small‑file mitigation, schema evolution, and future plans across storage, compute, and management within a big‑data ecosystem.

Apache Iceberg · CDC · Flink

16 min read

Big Data Technology & Architecture

Jul 7, 2022 · Big Data

Deep Dive into Apache Iceberg Core Features and Flink Integration

This article explains Apache Iceberg’s architecture, core capabilities such as time‑travel, fast scans, delete handling, and schema evolution, and provides a step‑by‑step guide for configuring Flink to use Iceberg with Hive and Hadoop catalogs, including DDL commands and streaming queries.

Apache Iceberg · Big Data · Data Lake

22 min read

Big Data Technology & Architecture

Jul 6, 2022 · Big Data

Understanding Apache Iceberg File Storage Format and Write Processes in Spark and Flink

This article explains the Apache Iceberg file storage format, its metadata hierarchy, and demonstrates how Spark and Flink write data to Iceberg tables, including detailed code examples, manifest handling, snapshot management, and commit processes for efficient data lake operations.

Apache Iceberg · Big Data · Data Lake

31 min read

DataFunSummit

Apr 29, 2022 · Big Data

Optimizing Query Performance in Apache Iceberg with Z‑Order Data Organization

This article explains how Apache Iceberg’s DataSkipping technique can lose efficiency when many filter columns are used, and presents a data‑organization optimization using space‑filling curves and Z‑Order to improve query I/O, details the OPTIMIZE implementation, and shares performance benchmark results and future plans.

Apache Iceberg · Big Data · Data Skipping

12 min read

DataFunTalk

Apr 9, 2022 · Big Data

Optimizing Apache Iceberg Query Performance with Z‑Order Data Organization

This talk explains how Apache Iceberg’s DataSkipping can lose efficiency with many filter columns, and presents a data‑organization redesign using space‑filling curves and Z‑Order to improve query I/O, detailing the OPTIMIZE syntax, implementation steps, performance benchmarks, and future roadmap.

Apache Iceberg · Big Data · Data Skipping

12 min read

DataFunTalk

Mar 1, 2022 · Cloud Native

Alibaba Cloud Native Data Lake with Apache Iceberg: Architecture, Challenges, and Solutions

The presentation outlines Alibaba Cloud's native data lake solution built on Apache Iceberg, covering data lake fundamentals, cloud migration challenges, Iceberg's architecture and features, real‑time ingestion with Flink, unified metadata management, security guarantees, and testing practices to ensure reliable, scalable big‑data analytics.

Apache Iceberg · Big Data · Data Lake

16 min read

DataFunTalk

Feb 25, 2022 · Big Data

Tencent's Application of Apache Iceberg for Real‑Time Data Lake Ingestion, Governance, and Query Optimization

This article explains how Tencent leverages Apache Iceberg together with Flink to build a real‑time data lake pipeline, covering data ingestion, Iceberg's snapshot‑based read/write model, compaction and governance services, Z‑order based query optimization, performance results, and future roadmap.

Apache Iceberg · Big Data · Compaction

24 min read

DataFunTalk

Feb 12, 2022 · Big Data

NetEase Internal Data Lake Project Arctic: Architecture, Requirements, and Future Roadmap

This article introduces NetEase's internally incubated data lake project Arctic, explains the concept of data lakes, outlines NetEase's specific requirements for a unified streaming‑batch platform, details Arctic's core architecture, storage strategy, data‑merge mechanisms, current achievements, and future development plans.

Apache Iceberg · Arctic · Big Data

10 min read

DataFunTalk

Jan 8, 2022 · Big Data

Lakehouse: Concepts, Architecture, Implementation, and Cloud Practices

This article provides a comprehensive overview of the Lakehouse paradigm, tracing its origins from traditional data warehouses and data lakes, comparing architectures, detailing core components such as Delta Lake and Iceberg, and illustrating practical cloud implementations and future directions.

Apache Iceberg · Big Data · Cloud Data Platform

14 min read

Big Data Technology & Architecture

Nov 8, 2021 · Big Data

Why Choose Apache Iceberg? Tencent’s Optimizations and Real‑World Practices

This article examines the strengths and weaknesses of Apache Iceberg, explains why Tencent selected it over alternatives, details Tencent’s own enhancements and integration with Flink, Spark, and other engines, and shares multiple real‑world implementations for building enterprise‑grade real‑time data lakes.

Apache Iceberg · Data Lake · Flink

17 min read

Big Data Technology Architecture

Aug 10, 2021 · Big Data

Building a Real‑Time Data Warehouse with Apache Flink and Apache Iceberg: Architecture, Challenges, and Best Practices

This article presents Tencent's practical experience of constructing a real‑time data warehouse by integrating Apache Flink with Apache Iceberg, covering background pain points of traditional Lambda architectures, Iceberg's table format and capabilities, Flink‑Iceberg sink design, small‑file handling, and future roadmap for a unified streaming‑batch data lake.

Apache Flink · Apache Iceberg · Batch Processing

20 min read

DataFunTalk

Jul 10, 2021 · Big Data

Building a Lakehouse Architecture with Apache Iceberg and Flink: Practices and Insights

This article explains how to construct a lakehouse architecture using Apache Iceberg, detailing the migration from Hive, Flink SQL integration, proxy‑user support, CDC handling, copy‑on‑write sinks, and the resulting benefits for near‑real‑time data visibility and unified batch‑stream processing.

Apache Iceberg · CDC · Flink

10 min read

Big Data Technology & Architecture

Jun 16, 2021 · Big Data

Practical Experience and Optimizations of Apache Iceberg in Tencent’s Big Data Ecosystem

This article reviews the advantages of Apache Iceberg for data lake storage, details Tencent’s custom optimizations and integration with Flink and Spark, and shares multiple real‑world implementations that demonstrate how Iceberg improves data consistency, reduces small‑file overhead, and enables near‑real‑time analytics in large‑scale big‑data environments.

Apache Iceberg · Data Lake · Flink

18 min read

Big Data Technology Architecture

Jun 10, 2021 · Big Data

Understanding Apache Iceberg: Design, Architecture, and Its Application at NetEase Cloud Music

This article explains Apache Iceberg’s table‑format design, compares it with Hive’s limitations, details its snapshot‑based architecture and metadata handling, and describes how NetEase Cloud Music leveraged Iceberg to dramatically improve large‑scale log processing performance and stability.

Apache Iceberg · Spark · metadata management

0 likes · 12 min read

dbaplus Community

Jun 5, 2021 · Big Data

How Flink + Iceberg Transform Data Lakes for Real‑Time Streaming

This article explains the concept of data lakes, outlines a four‑layer open‑source architecture, presents several classic Flink‑Iceberg use cases, explains why Iceberg was chosen, and describes the design of the Flink streaming sink and the upcoming community roadmap.

Apache Flink · Apache Iceberg · Big Data

0 likes · 14 min read

DataFunTalk

Apr 26, 2021 · Big Data

Detailed Design and Practical Application of Apache Iceberg at NetEase Cloud Music

This article explains the motivations behind Apache Iceberg and its design principles, such as snapshots and MVCC, compares it with Hive, and describes how NetEase Cloud Music adopted Iceberg to improve metadata handling, query performance, and operational stability for massive daily log data.

Apache Iceberg · Big Data · Data Lake

0 likes · 13 min read

DataFunTalk

Apr 18, 2021 · Big Data

Comparing Apache Hudi, Apache Iceberg, and Delta Lake for Data Lake Storage

This article compares Apache Hudi, Apache Iceberg, and Delta Lake, examining their storage formats, platform compatibility, update performance, concurrency guarantees, and integration with lakeFS to help readers choose the most suitable solution for their data lake use case.

Apache Hudi · Apache Iceberg · Delta Lake

0 likes · 16 min read

Big Data Technology Architecture

Apr 5, 2021 · Big Data

Understanding Apache Iceberg: Table Format Architecture, Comparison with Hive Metastore, and Business Benefits

This article introduces Apache Iceberg as an open table format for massive analytic datasets, explains its underlying concepts such as schema, partitioning, statistics, and read/write APIs, compares it with Hive Metastore, outlines its ACID commit process, highlights the performance and operational advantages for big‑data workloads, and previews upcoming community features.

ACID · Apache Iceberg · Metadata

0 likes · 19 min read

DataFunTalk

Feb 17, 2021 · Big Data

Apache Iceberg 0.11.0: New Partition Support, SortOrder, Flink Streaming Reader, and Ecosystem Integrations

The article details Apache Iceberg 0.11.0's core enhancements, including partition changes, SortOrder, extensive Flink and Spark integrations, CDC/Upsert support, hash‑based write distribution to reduce small files, and the upcoming 0.12.0 roadmap, and provides practical SQL and API examples for data‑lake practitioners.

Apache Iceberg · Big Data · CDC

0 likes · 13 min read

DataFunTalk

Feb 14, 2021 · Big Data

Impala at NetEase: Architecture, Iceberg Integration, Management System, Optimizations and Future Roadmap

This talk presents NetEase's practical experience with Impala, covering its core architecture, new features in version 3.x, integration with Apache Iceberg, a custom management platform, profiling and statistics enhancements, as well as future plans involving Kubernetes, Alluxio caching and pre‑computation strategies.

Apache Iceberg · Big Data · Impala

0 likes · 13 min read

Big Data Technology & Architecture

Feb 2, 2021 · Big Data

An Introduction to Apache Iceberg: Features, Spark & Flink Integration, and Real‑World Use Cases

This article provides a comprehensive overview of Apache Iceberg, covering its origins, key features, practical Spark and Flink code examples, notable deployments at Alibaba and Tencent, and its future role as a universal table format for big‑data analytics.

Apache Iceberg · Data Lake · Flink

0 likes · 9 min read

DataFunTalk

Feb 1, 2021 · Big Data

Building a Real-Time Data Warehouse with Apache Flink and Apache Iceberg: Architecture, Challenges, and Best Practices

This article presents Tencent's experience building a real‑time data warehouse by integrating Apache Flink with Apache Iceberg. It covers the background pain points, Iceberg's table format and capabilities, Flink‑Iceberg streaming and batch processing, practical implementations, and the future roadmap for data‑lake acceleration.

Apache Flink · Apache Iceberg · Big Data

0 likes · 21 min read

Youzan Coder

Dec 21, 2020 · Big Data

Youzan Big Data Technology Salon – Cost Governance, Apache Iceberg, Flink, and Data‑Driven Growth

At Youzan’s Big Data Technology Salon, more than 100 attendees heard speakers from Youzan, NetEase Yishu, and Didi discuss cost governance, Apache Iceberg data lakes, large‑scale Flink real‑time computing, and data‑driven growth strategies, with an emphasis on practical implementations, cost savings in the millions, and tools that empower merchants.

Apache Iceberg · Data Growth · Flink

0 likes · 5 min read

Youzan Coder

Dec 9, 2020 · Big Data

Youzan Big Data Technology Salon: Practices in Data Cost Governance, Apache Iceberg, Flink, and Data-Driven Growth

The Youzan Big Data Technology Salon brought together Youzan, NetEase and Didi to share practical approaches for cutting data‑infrastructure costs, building an Apache Iceberg‑based data lake, scaling Flink real‑time workloads, and creating a data‑driven growth platform that leverages tracking, A/B testing and analytics.

Apache Iceberg · Big Data · Data Cost Governance

0 likes · 5 min read

DataFunTalk

Dec 3, 2020 · Big Data

Streaming Data Lake Ingestion with Apache Flink and Apache Iceberg

This article explains how Apache Flink integrates with data lake architectures, especially using Apache Iceberg as a table format, to enable real‑time streaming ingestion, CDC processing, near‑real‑time lambda architectures, and future enhancements like automatic file merging and row‑level deletes.

Apache Iceberg · Data Lake · Flink

0 likes · 13 min read

Big Data Technology Architecture

Nov 27, 2020 · Big Data

Integrating Apache Flink with Data Lakes Using Apache Iceberg: Architecture, Use Cases, and Future Roadmap

This article explains how Apache Flink combines with Apache Iceberg to build unified stream‑batch data lake solutions, covering data lake fundamentals, architectural layers, classic business scenarios, reasons for choosing Iceberg, streaming ingestion design, and upcoming community enhancements.

Apache Flink · Apache Iceberg · table format

0 likes · 13 min read

Big Data Technology Architecture

Mar 24, 2020 · Big Data

Comparative Analysis of Delta Lake, Apache Iceberg, and Apache Hudi for Data Lake Solutions

This article examines the three leading open‑source data‑lake projects—Delta Lake, Apache Iceberg, and Apache Hudi—by outlining their origins, core problems they address, key features, and a detailed seven‑dimension comparison to help practitioners choose the most suitable solution for their scenarios.

Apache Hudi · Apache Iceberg · Comparison

0 likes · 17 min read

dbaplus Community

Mar 17, 2020 · Big Data

Choosing the Right Open‑Source Data Lake: Delta vs Iceberg vs Hudi

An in‑depth comparison of the three leading open‑source data lake platforms (Delta Lake, Apache Iceberg, and Apache Hudi) that examines their origins, the core challenges they address, their key features, and their performance across seven evaluation dimensions to help practitioners choose the best solution for their workloads.

Apache Hudi · Apache Iceberg · Data Lake

0 likes · 15 min read
