Tag

Apache Iceberg


DataFunSummit
Jun 3, 2025 · Big Data

BiFang: A Unified Lake‑Stream Storage Engine for Real‑Time and Batch Data Processing

BiFang is a lake-stream integrated storage engine that merges Apache Pulsar message-queue capabilities with Iceberg data-lake features. It provides a single unified data store with full and incremental queries, sub-second visibility, exactly-once semantics, and integration with Flink, Spark, and StarRocks for both real-time analytics and batch processing.

Apache Iceberg · Apache Pulsar · Big Data
0 likes · 13 min read
Alibaba Cloud Infrastructure
Mar 6, 2025 · Big Data

Leveraging Apache Iceberg and AutoMQ for Real-Time Data Lake Ingestion: Architecture, Best Practices, and Cost Optimization

This article examines how Apache Iceberg’s snapshot‑based ACID transactions, logical‑physical partition evolution, and COW/MOR update modes enable efficient real‑time data lake ingestion, and demonstrates AutoMQ’s Kafka‑to‑Iceberg Table Topic solution that simplifies schema management, reduces latency, and cuts operational costs.

Apache Iceberg · AutoMQ · Big Data
0 likes · 14 min read
DataFunSummit
Nov 20, 2024 · Artificial Intelligence

How Data Lakes Empower AI: Expert Insights on Feature Management, Columnar Storage, and Vector Formats

In a panel discussion, experts explain how data-lake-warehouse integration, columnar storage under open table formats like Apache Iceberg, and emerging variant types enable efficient feature engineering, support large-language-model workloads, and provide flexible vector storage, driving the evolution of AI from traditional ML to the GenAI era.

Apache Iceberg · Artificial Intelligence · Feature Engineering
0 likes · 6 min read
DataFunTalk
Nov 6, 2024 · Big Data

How Data Lakes Empower AI: Insights from Industry Experts

In a panel discussion, experts from Kuaishou, Ping An, and Datastrato explain how data lake architectures, columnar storage under table formats like Apache Iceberg, and vector-enabled lake formats are enhancing feature management, supporting generative AI workloads, and accelerating machine-learning pipelines.

AI · Apache Iceberg · Big Data
0 likes · 6 min read
DataFunTalk
Sep 4, 2024 · Artificial Intelligence

Data+AI Data Lake Technologies: Challenges, Apache Iceberg Overview, and Vector Table Implementations with PyIceberg

This article explores the evolution of data lakes for AI, discusses the challenges of AI-era data management, introduces Apache Iceberg and its architecture, demonstrates PyIceberg-based AI training and inference pipelines, and presents vector table designs with LSH indexing and performance optimizations.

AI · Apache Iceberg · Big Data
0 likes · 22 min read
DataFunSummit
Jun 20, 2024 · Big Data

Data+AI Data Lake Technologies: Apache Iceberg, PyIceberg, and Vector Table Solutions

This article presents a comprehensive overview of modern Data+AI data lake challenges and solutions, covering the evolution of data lakes, an introduction to Apache Iceberg, practical use of PyIceberg for AI training and inference pipelines, and advanced vector table and indexing techniques for efficient similarity search.

AI training · Apache Iceberg · Big Data
0 likes · 22 min read
DataFunSummit
Jun 5, 2024 · Big Data

Databricks Acquires Tabular to Unite Delta Lake and Apache Iceberg for an Open Lakehouse

Databricks announced the acquisition of Tabular, the company founded by the original creators of Apache Iceberg, aiming to integrate Delta Lake and Iceberg into a unified, open lakehouse architecture that enhances format compatibility, reduces data silos, and supports AI workloads.

Apache Iceberg · Big Data · Delta Lake
0 likes · 5 min read
Xiaohongshu Tech REDtech
Mar 4, 2024 · Big Data

Integrating Data Lake Technologies with Data Warehouse Architecture at Xiaohongshu: Practices and Performance Optimizations

Xiaohongshu’s data-warehouse team integrated Apache Iceberg-based data-lake techniques into its existing warehouse, replacing the legacy Hive/Spark stack with global sorting, Z-order, and upsert-enabled tables, which cut query latency by up to 90%, boosted data freshness by 50%, slashed storage costs by 83%, and saved tens of thousands of GB-hours of compute daily.

Apache Iceberg · Big Data · Data Warehouse
0 likes · 19 min read
DataFunSummit
Dec 20, 2023 · Cloud Native

Building a Cloud‑Native Lakehouse with Apache Iceberg and Amoro

This article introduces the background, challenges, and cloud‑native solutions of lakehouse architecture, explains Apache Iceberg’s open table format and its cloud‑native features, details Amoro’s management and self‑optimizing capabilities, showcases three real‑world cloud migration cases, and outlines future development plans.

Amoro · Apache Iceberg · Lakehouse
0 likes · 12 min read
DataFunTalk
Nov 24, 2023 · Big Data

Amoro Lakehouse Management System: Deployment Practices and AWS Integration for Apache Iceberg

This article introduces Amoro, a lakehouse management platform built on Apache Iceberg, explains why Webex adopted it to overcome Hive limitations, details its AWS GlueCatalog and S3 integration with DynamoDB lock management, and provides step‑by‑step Helm‑based deployment instructions on Kubernetes.

AWS · Amoro · Apache Iceberg
0 likes · 19 min read
DataFunTalk
Oct 5, 2023 · Big Data

Building a Unified Streaming‑Batch Lakehouse with Amoro Mixed Iceberg

This article describes how Shanghai Steel Union leveraged Amoro Mixed Iceberg on top of Apache Iceberg to create a unified streaming‑batch lakehouse, addressing small‑file and upsert challenges, simplifying architecture, improving data freshness, and providing a scalable solution for real‑time and batch analytics.

Amoro · Apache Iceberg · Big Data
0 likes · 13 min read
iQIYI Technical Product Team
Sep 22, 2023 · Big Data

Data Lake: Concepts, Architecture, and Application in iQIYI's Data Platform

iQIYI’s data-middle-platform team built a data lake with four zones (raw, product, work, and sensitive), integrated with unified ODS/DWD/MID layers, a metadata catalog, and self-service tools, on a stack of HDFS, Hive/Iceberg, Spark/Trino, and Flink. The team migrated to Apache Iceberg for real-time freshness and now aims to further streamline modules and adopt new technologies.

Apache Iceberg · Big Data · Flink
0 likes · 13 min read
DataFunTalk
Jul 11, 2023 · Big Data

Analysis of Lakehouse Storage Systems: Design, Metadata, Merge‑On‑Read, and Performance Optimizations for Delta Lake, Apache Hudi, and Apache Iceberg

This article examines the architecture and core design of lakehouse storage systems, compares the metadata handling and Merge‑On‑Read mechanisms of Delta Lake, Apache Hudi, and Apache Iceberg, and presents practical performance‑optimization techniques and real‑world case studies on Alibaba Cloud EMR.

Apache Hudi · Apache Iceberg · Big Data
0 likes · 18 min read
DataFunTalk
May 11, 2023 · Big Data

Scaling ByteDance Feature Store to EB‑Level with Apache Iceberg: Architecture, Practices, and Future Roadmap

This article describes how ByteDance tackled exabyte-scale feature storage by adopting Apache Iceberg, detailing the problem background, design choices, implementation of COW and MOR back-fill strategies, performance optimizations, and future plans such as hot/cold tiering of lake data and materialized views.

Apache Iceberg · Big Data · Feature Store
0 likes · 16 min read
iQIYI Technical Product Team
Feb 3, 2023 · Big Data

Data Lake Concepts, Benefits, and Iceberg‑Based Implementations at iQIYI

iQIYI’s data lake combines public‑cloud and private storage with Apache Iceberg’s snapshot‑based table format to enable near‑real‑time, unified batch‑and‑stream analytics, reducing costs, simplifying architecture, and improving data freshness across use cases such as log collection, audit, pingback, and member order processing.

Apache Iceberg · Big Data · Real-time Analytics
0 likes · 25 min read
DataFunTalk
Dec 8, 2022 · Big Data

Arctic: NetEase’s Real-Time Lakehouse System Built on Apache Iceberg

This article introduces NetEase’s Arctic, a real-time lakehouse system built on Apache Iceberg that unifies streaming and batch processing. It explains the challenges of the Lambda architecture, details Arctic’s features such as change/base stores, the hidden queue, and transaction handling, and shares internal practice cases and the future roadmap.

Apache Iceberg · Arctic · Flink
0 likes · 12 min read
NetEase Cloud Music Tech Team
Oct 26, 2022 · Big Data

Arctic: NetEase's Streaming Lakehouse Service and Hive-Based Stream-Batch Integration Practice

Arctic, NetEase’s streaming lakehouse built on Apache Iceberg, unifies streaming and batch workloads with millisecond‑level latency, Hive compatibility, and built‑in message‑queue support, delivering CDC, upserts and OLAP without a Lambda architecture, as demonstrated by real‑time processing of 2 PB of Hive data for Cloud Music.

Apache Iceberg · Arctic · Hive Compatibility
0 likes · 15 min read