Tagged articles

Iceberg

124 articles · Page 1 of 2

Jul 2, 2026 · Industry Insights

How ColdFront Sets pgEdge Apart in the OLTP‑OLAP‑AI Showdown

The article compares four emerging data‑lake‑for‑PostgreSQL solutions—Databricks LTAP, EDB Fusion Analytics, Snowflake pg_lake, and pgEdge's ColdFront—highlighting ColdFront's unique transparent Iceberg layer, writable cold data, DuckDB integration, and the strategic trade‑offs developers must weigh when choosing a modern OLTP/OLAP/AI architecture.

Agentic AI · ColdFront · Data Lake

0 likes · 9 min read

Alibaba Cloud Native

Jun 19, 2026 · Big Data

Why Real-Time Data Lake Ingestion Is Dropping ETL in the AI Era: Architecture Simplification from Kafka to Iceberg

In the AI‑driven era, enterprises need a data foundation that supports both real‑time consumption and long‑term historical analysis, and the emerging "zero‑ETL" trend moves generic ingestion capabilities from external Flink/Spark jobs into a streamlined Kafka‑to‑Iceberg pipeline, reducing complexity while preserving low latency, consistency, schema evolution, CDC semantics and open‑ecosystem compatibility.

Data Lake · Iceberg · Streaming

0 likes · 25 min read

Alibaba Cloud Developer

Jun 18, 2026 · Big Data

How AI-Driven Real-Time Data Lakes Are Ditching ETL: A Kafka‑to‑Iceberg Architecture Simplification

In the AI era, enterprises need a data foundation that supports both low‑latency streaming and long‑term analytics, and the combination of Kafka, Iceberg and object storage is emerging as a preferred solution; by moving ingestion capabilities closer to the message layer and eliminating external ETL jobs, a "zero‑ETL" approach reduces architectural complexity, improves consistency, and streamlines schema evolution and small‑file management.

CDC · Data Lake · Iceberg

0 likes · 27 min read

StarRocks

Jun 17, 2026 · Databases

How StarRocks 4.1 Simplifies Operations and Boosts Production Performance

StarRocks 4.1 introduces automatic multi‑tenant data management, large‑capacity tablets, second‑level schema evolution, enhanced cache observability, and deeper Iceberg support, addressing static data distribution, data skew, high repair costs and expertise requirements while delivering up to 1.86× higher throughput and dramatically lower latency in production workloads.

Cache Observability · Data Distribution · Fast Schema Evolution

0 likes · 13 min read

Big Data Technology & Architecture

Jun 16, 2026 · Big Data

Deep Dive: Multimodal Data Lake Formats – Paimon vs. Hudi vs. Iceberg

This article analytically compares three open table‑format projects—Paimon, Hudi, and Iceberg—examining how each addresses multimodal data lake challenges such as massive volume, sparse access patterns, and combined scalar‑vector retrieval, and provides concrete feature breakdowns and selection guidance.

BLOB · Hudi · Iceberg

0 likes · 11 min read

DataFunTalk

Mar 3, 2026 · Big Data

Exploring Tencent Cloud’s Iceberg Batch‑Stream Integration and AI‑Driven Data Governance

This article presents a series of seven technical case studies—including Tencent Cloud’s Iceberg‑based batch‑stream integration, AI‑driven data governance with Apache Gravitino, Xiaohongshu’s lakehouse evolution, and a multimodal data‑lake solution—detailing challenges, architectural designs, implementation steps, performance results, and future directions.

AI · Big Data · Data Lake

0 likes · 8 min read

Past Memory Big Data

Dec 1, 2025 · Big Data

Apache XTable: A Universal Translator for Data Lake Format Interoperability

Apache XTable introduces a lightweight metadata translation layer that decouples data storage from format metadata, enabling zero‑copy, omni‑directional conversion among Hudi, Iceberg, and Delta Lake, allowing organizations to write with one format and read with any engine without duplicating Parquet files.

Apache XTable · Data Lake · Delta Lake

0 likes · 7 min read

DataFunSummit

Nov 24, 2025 · Big Data

How Tencent Cloud Uses Iceberg, Gravitino and Multimodal Lakes for Unified Data Processing

This article series explores Tencent Cloud's Iceberg‑based batch‑stream integration, Apache Gravitino's unified metadata and lineage solution, Xiaohongshu's data‑architecture evolution for the Big AI Data era, and a practical Data+AI multimodal data‑lake implementation, highlighting challenges, architectural designs, and performance gains.

Big Data · Data Lake · Iceberg

0 likes · 7 min read

DataFunTalk

Nov 22, 2025 · Big Data

How Modern Data Lakes and AI Governance Transform Enterprise Analytics

This article collection examines Tencent Cloud’s Iceberg batch‑stream integration, AI‑driven game data governance, Apache Gravitino unified metadata and lineage, Xiaohongshu’s multimodal data‑lake evolution, and Volcano Engine’s Data+AI multimodal lake, highlighting architectures, techniques, performance gains, and practical implementations.

AI Governance · Data Lake · Gravitino

0 likes · 7 min read

DataFunTalk

Sep 6, 2025 · Big Data

How Xiaomi Cuts Costs and Boosts Efficiency with a Cloud‑Native Lakehouse Architecture

Xiaomi’s data‑lake team explains how they tackled small‑file issues, unified metadata with Gravitino, migrated Hive to Iceberg and Fileset, leveraged JuiceFS for multi‑cloud storage, and combined Iceberg and Paimon to achieve cost‑effective, high‑performance batch and real‑time analytics.

Big Data · Cloud Native · Data Lake

0 likes · 13 min read

iQIYI Technical Product Team

Aug 7, 2025 · Big Data

Building a Low‑Latency, High‑Capacity Real‑Time Data Platform for Finance

Facing growing data demands in finance, we replaced two legacy synchronization pipelines with a unified, low‑latency architecture using BabelX Real‑Time, Flink CDC, Iceberg v2 and Paimon, achieving minute‑level data freshness, ten‑to‑thirty‑fold query speedups, reduced storage costs, and streamlined schema management across multiple business units.

Big Data · Flink · Iceberg

0 likes · 12 min read

Big Data Technology & Architecture

Jul 1, 2025 · Big Data

What’s New in Apache Hive 4.0? Key Features and Industry Outlook

After a weekend dive into Apache Hive’s official Wiki and GitHub, this article highlights Hive’s declining visibility compared to Spark and Flink, examines its 4.0 release’s major features—including Iceberg integration, enhanced ACID, cost‑based optimizer upgrades, and Ozone support—while reflecting on its role in modern data ecosystems.

Apache Hive · Big Data · Data Warehouse

0 likes · 4 min read

Big Data Technology & Architecture

Jun 10, 2025 · Big Data

Transforming Real‑Time Analytics: Incremental Computing with Lakehouse Architecture

This article examines how Xiaohongshu replaced its costly Lambda architecture with a real‑time lakehouse built on Iceberg, Paimon, Spark, and StarRocks, achieving minute‑level latency, higher data quality, lower resource consumption, and dramatically faster query performance.

Big Data Architecture · Iceberg · Lakehouse

0 likes · 7 min read

Xiaohongshu Tech REDtech

May 19, 2025 · Industry Insights

How Xiaohongshu Built a Minute‑Level Near‑Real‑Time Data Warehouse with Incremental Computing

Facing billions of daily logs and the need for minute‑level experiment metrics, Xiaohongshu partnered with Yunqi Tech to design a generic incremental‑compute solution that delivers near‑real‑time data warehousing with lower cost, higher accuracy, simplified pipelines, and improved query performance.

Big Data · Data Lake · Flink

0 likes · 24 min read

DataFunSummit

May 4, 2025 · Big Data

Iceberg Table Format Practice in Huawei Terminal Cloud

This article explains how Huawei's terminal cloud adopts the Apache Iceberg table format to efficiently manage large-scale datasets, detailing its architecture, feature engineering, merge operations, LSM-based storage, schema versioning, AB testing support, catalog enhancements, and future roadmap for full lifecycle data governance.

Big Data · Data Lake · Huawei Cloud

0 likes · 13 min read

Big Data Technology & Architecture

Apr 29, 2025 · Big Data

Big Data Interview Preparation: Data Governance, Iceberg Metadata, Lakehouse Best Practices, and Xiaohongshu HR Updates

The article reports that Xiaohongshu has scrapped its "big‑small week" schedule (alternating six‑day and five‑day workweeks) and its non‑compete clause, then provides a collection of big‑data interview questions covering data governance, Iceberg metadata management, and lakehouse production best practices, along with concise answers and resources for candidates.

Data Governance · Iceberg · Lakehouse

0 likes · 7 min read

iQIYI Technical Product Team

Mar 27, 2025 · Big Data

Cost‑Effective Real‑Time Data Warehouse 2.0: Migrating from Kafka to Iceberg

iQIYI transformed its real‑time data warehouse by replacing a costly Kafka‑based Lambda stack with a unified stream‑batch Iceberg lake, cutting storage expenses by 90%, halving compute costs, extending data retention, and delivering minute‑level freshness for 90% of use cases while preserving second‑level processing where needed.

Flink · Iceberg · Real-Time Data Warehouse

0 likes · 11 min read

StarRocks

Feb 20, 2025 · Big Data

How RedBI Boosted Query Speed 3× with StarRocks & Iceberg Lakehouse

The article details how Xiaohongshu's RedBI self‑service analytics platform transformed its architecture by integrating StarRocks and Iceberg, replacing ClickHouse‑based storage with Parquet, introducing DataCache, Z‑Order sorting and intelligent key selection, achieving a three‑fold P90 query speed improvement, sub‑10‑second latency, and halving storage consumption.

DataCache · Iceberg · Lakehouse

0 likes · 19 min read

DataFunSummit

Jan 14, 2025 · Big Data

Tencent Real-Time Lakehouse Intelligent Optimization Practice

This presentation details Tencent's real‑time lakehouse architecture and the four key topics—lakehouse design, intelligent optimization services, scenario‑driven capabilities, and future outlook—covering components such as Spark, Flink, Iceberg, Auto‑Optimize Service, indexing, clustering, AutoEngine, and PyIceberg implementations.

Auto Optimize · Big Data · Flink

0 likes · 12 min read

DataFunSummit

Jan 3, 2025 · Big Data

Tencent Real‑Time Lakehouse Intelligent Optimization Practices

This article presents Tencent's end‑to‑end real‑time lakehouse architecture, detailing its three‑layer design, the Auto Optimize Service modules such as compaction, indexing, clustering and engine acceleration, as well as scenario‑driven capabilities like multi‑stream joins, primary‑key tables, in‑place migration and PyIceberg support, and concludes with future optimization directions.

Big Data · Flink · Iceberg

0 likes · 11 min read

DataFunSummit

Dec 27, 2024 · Big Data

Tencent Real-time Lakehouse Intelligent Optimization Practice

This presentation describes Tencent's real-time lakehouse architecture, including data lake compute, management, and storage layers, and details the intelligent optimization services—such as compaction, indexing, clustering, and auto-engine—designed to improve query performance, storage cost, and operational efficiency for large-scale data processing.

AutoEngine · Compaction · Flink

0 likes · 11 min read

Bilibili Tech

Dec 27, 2024 · Big Data

Consistency Architecture for Bilibili Recommendation Model Data Flow

The article outlines Bilibili’s revamped recommendation data‑flow architecture that eliminates timing and calculation inconsistencies by snapshotting online features, unifying feature computation in a single C++ library accessed via JNI, and orchestrating label‑join and sample extraction through near‑line Kafka/Flink pipelines, with further performance gains and Iceberg‑based future extensions.

Data Consistency · Flink · Iceberg

0 likes · 12 min read

dbaplus Community

Dec 24, 2024 · Big Data

How Bilibili Scaled Its Tag System for Massive Data and Real‑Time Accuracy

The article details Bilibili's comprehensive redesign of its tag system—including background challenges, architectural layers, technical upgrades like Iceberg integration and shard‑based ClickHouse writes, crowd selection methods, online service guarantees, performance metrics, and future plans—showcasing a data‑driven solution that boosts stability, speed, and business coverage.

ClickHouse · Data Engineering · Distributed Computing

0 likes · 24 min read

Bilibili Tech

Dec 17, 2024 · Big Data

Apache Gravitino: Metadata Management Practices and Production Experience at Bilibili

Bilibili adopted Apache Gravitino as a unified metadata platform that decouples consumers, consolidates schemas and Fileset‑based unstructured data across heterogeneous sources, cuts metadata and storage costs, resolves inconsistencies, boosts Hive Metastore performance, and enables features such as Iceberg branching and future AI‑centric governance.

Apache Gravitino · Big Data · Fileset

0 likes · 20 min read

Tencent Advertising Technology

Dec 6, 2024 · Big Data

Building a High‑Performance Advertising Feature Data Lake with Apache Iceberg at Tencent

Tencent's advertising team replaced a traditional HDFS‑Hive warehouse with an Apache Iceberg‑based data lake, adding primary‑key tables, multi‑stream merging, adaptive compaction, and Spark SPJ optimizations to achieve minute‑level feature update latency, 10× back‑fill speed, and up to 60% storage savings.

Big Data · CDC · Compaction

0 likes · 25 min read

Bilibili Tech

Nov 26, 2024 · Big Data

Bilibili’s Iceberg‑Based Streaming‑Batch Integration: Architecture, Optimizations, and Practices

Bilibili migrated its massive user‑behavior, commercial AI training, and database synchronization pipelines from Hive and Kafka to an Iceberg‑based streaming‑batch architecture, using Flink and the Magnus optimizer to achieve minute‑level freshness, cut CPU and memory usage by about 20‑22%, save roughly 3.55 million CNY annually, and dramatically improve query latency and join performance.

Batch · Data Integration · Data Lake

0 likes · 20 min read

DataFunSummit

Nov 23, 2024 · Big Data

Bilibili's Iceberg‑Based Streaming‑Batch Integration: Architecture, Optimizations, and Practice

This article presents Bilibili's end‑to‑end exploration of a streaming‑batch unified data pipeline built on Apache Iceberg, detailing the original and iterated architectures for massive user behavior transmission, online AI training, DB synchronization, and dimension‑join, along with performance gains, cost savings, and future plans.

Batch Processing · Data Lake · Flink

0 likes · 20 min read

DataFunSummit

Nov 12, 2024 · Big Data

Data Lake and Data Warehouse Architectures: Expert Insights from Industry Leaders

The article summarizes a roundtable discussion where experts compare four lake‑warehouse architectural patterns, explain their suitability for different business scenarios, contrast them with traditional data warehouses, and highlight practical considerations for choosing and evolving data platforms.

Hudi · Iceberg · Lakehouse Architecture

0 likes · 6 min read

Bilibili Tech

Nov 12, 2024 · Big Data

Scalable Tag System Architecture and Optimization

The rebuilt tag system introduces a three‑layer architecture, standard pipelines, Iceberg‑backed storage and custom ClickHouse sharding, a DSL for crowd selection, and a stateless online service, achieving a 99.9% success rate, sub‑5 ms latency, and support for thousands of tags across dozens of business scenarios, with real‑time processing and automated lifecycle management planned next.

ClickHouse · Iceberg · Online Service

0 likes · 23 min read

DataFunSummit

Nov 8, 2024 · Big Data

Roundtable Discussion on Data Lake Technology Maturity and Governance Practices

Experts from Kuaishou, Ping An Insurance and other companies, including former Tencent staff, discuss data lake maturity, column‑level governance, resource management for unstructured data, and automated optimization techniques such as Iceberg small‑file merging, highlighting how these advances improve data quality and business decision‑making.

Big Data · Column-level Governance · Data Lake

0 likes · 6 min read

DataFunSummit

Nov 5, 2024 · Big Data

Tencent Real-Time Lakehouse Architecture and Intelligent Optimization Practices

This article presents Tencent's real-time lakehouse architecture, detailing its three-layer design, the Auto Optimize Service with compaction, indexing, clustering and engine acceleration, scenario capabilities such as multi‑stream joins and in‑place migration, and outlines future optimization directions.

Iceberg · Tencent

0 likes · 11 min read

Bilibili Tech

Nov 1, 2024 · Big Data

Magnus: Intelligent Data Optimization Service for Iceberg Tables in Bilibili's Lakehouse Platform

Magnus is Bilibili’s self‑developed intelligent service that continuously optimizes Iceberg tables by scheduling snapshot expiration, orphan‑file cleanup, manifest rewriting, and multi‑dimensional data optimizations—including small‑file merging, sorting, distribution, and index creation—while automatically recommending configurations from real‑time query logs, delivering over 99.9% task success and up to 30% scan‑data reduction.

Data Lake · Iceberg · Intelligent Recommendation

0 likes · 15 min read

Bilibili Tech

Oct 25, 2024 · Big Data

DataFunSummit2024: Next-Generation Data Architecture Technology Summit

DataFunSummit2024, co-hosted by Bilibili, convenes industry experts, scholars, and enterprise leaders across six forums to discuss next‑generation data architecture, showcasing Bilibili’s Iceberg‑based stream‑batch innovations, AI‑BI analytics, NoETL practices, and emerging alternatives to Lambda architecture.

AI+BI · Big Data · Data Architecture

0 likes · 3 min read

DataFunTalk

Oct 3, 2024 · Big Data

Data Lake Technology Maturity Curve: Architecture, Design Principles, Core Functions, and Open‑Source Solutions

Amid growing data demands, this article explains the data lake technology maturity curve, detailing lake‑warehouse architectural patterns, design principles, core functionalities, and the four leading open‑source solutions (Hudi, Iceberg, Delta Lake, Paimon) to guide enterprises in building flexible, scalable, and governed data platforms.

Big Data · Data Architecture · Data Lake

0 likes · 10 min read

DataFunSummit

Sep 27, 2024 · Big Data

Data Lake Technology Maturity Curve: Architecture Modes, Design Principles, Core Functions, and Applications

This article explains the data lake technology maturity curve, covering lake‑warehouse architecture patterns, design principles, core capabilities of major open‑source lake engines (Hudi, Iceberg, Delta Lake, Paimon), and practical application scenarios for modern data‑driven enterprises.

Big Data · Data Lake · Delta Lake

0 likes · 10 min read

DataFunTalk

Sep 24, 2024 · Big Data

Data Lake Technology Maturity Curve: Architecture Modes, Design Principles, Core Functions, and Applications

This article explains the rapid growth of data-driven businesses, the challenges of traditional data warehouses, and how modern data lake technologies such as Delta Lake, Hudi, Iceberg, and Paimon form a maturity curve that guides enterprises in architecture choices, design principles, core capabilities, and practical applications.

Big Data · Data Lake · Delta Lake

0 likes · 12 min read

DataFunSummit

Sep 9, 2024 · Big Data

Exploring Real-Time Lakehouse Architecture with Apache Paimon

This article presents Xiaomi's real-time lakehouse architecture, outlines its current challenges, introduces Apache Paimon and several use‑case scenarios—including stream join optimization, streaming upserts, and lookup joins—while discussing expected benefits and future directions for a more efficient, unified data platform.

Apache Paimon · Flink · Iceberg

0 likes · 12 min read

Data Thinking Notes

Aug 15, 2024 · Big Data

How to Build a Scalable Data Warehouse: Theory, Architecture, and Best Practices

This article outlines practical approaches to data warehouse construction, covering dimensional modeling, layered architecture, capability development, real‑time and batch processing with technologies like Hive, Spark, Flink, Iceberg, and discusses governance, security, and future trends toward data value and real‑time metrics.

Data Governance · Data Warehouse · Iceberg

0 likes · 13 min read

DataFunTalk

Aug 10, 2024 · Big Data

Xiaomi Sales Data Warehouse: Construction Practices, Architecture, and Capability Evolution

This article presents a comprehensive overview of Xiaomi's sales data warehouse, detailing its development history, dimensional modeling theory, multi‑layer architecture, Lambda design with batch and streaming processing, capability layers, security measures, and answers to common technical questions.

Big Data · Data Warehouse · Flink

0 likes · 15 min read

DataFunSummit

May 15, 2024 · Big Data

Xiaomi Sales Data Warehouse: Architecture, Construction Theory, and Capability Evolution

This article details Xiaomi's sales data warehouse development, covering its history, architecture, dimensional modeling, layer design, streaming‑batch integration, governance, security, and future directions, while also addressing practical Q&A on implementation challenges and best practices.

Big Data · Data Warehouse · Flink

0 likes · 15 min read

iQIYI Technical Product Team

Apr 26, 2024 · Big Data

iQIYI Real-time Lakehouse: Stream‑Batch Unified Architecture

iQIYI replaced its costly Lambda architecture with a unified Iceberg‑based lakehouse that combines Flink streaming and batch processing, cutting data latency from hours to minutes, supporting thousands of tables via a multi‑table sink, guaranteeing completeness, and saving millions of RMB in operational costs.

Data Lake · Flink · Iceberg

0 likes · 18 min read

DataFunSummit

Apr 22, 2024 · Big Data

Intelligent Optimization of Bilibili’s Iceberg‑Based Lakehouse for Query Acceleration

This article describes Bilibili’s intelligent optimization project that automatically analyzes historical query workloads to configure multi‑dimensional sorting, various indexes, and pre‑aggregation on Iceberg tables, thereby reducing scan volume by 28% across dozens of tables and improving OLAP query latency.

Big Data · Data Warehouse · Iceberg

0 likes · 15 min read

DataFunSummit

Mar 25, 2024 · Big Data

Exploring Real-Time Data Lake Practices at Kangaroo Cloud

This article shares Kangaroo Cloud's exploration and practice of a real-time data lake, covering background, data lake concepts, challenges, solution architecture using the Shuzhan platform with Iceberg/Hudi, CDC ingestion, small file handling, cross-cluster ingestion, materialized view acceleration, and future development plans.

CDC · Cross-Cluster Ingestion · Hudi

0 likes · 12 min read

DataFunSummit

Mar 17, 2024 · Big Data

OPPO Smart Data Lakehouse: Architecture, Real‑time Lakehouse, and Technical Practices

This article presents OPPO's smart data lakehouse solution, describing its massive EB‑scale architecture, the integration of batch and streaming engines, the Glacier service for table management, schema‑adaptive ingestion, performance optimizations, and future technical road‑maps for unified data processing.

Big Data · Data Lakehouse · Flink

0 likes · 15 min read

DataFunSummit

Mar 14, 2024 · Big Data

Tencent Game Data Analysis: Lakehouse Integration Practice

This article presents Tencent Game's comprehensive lakehouse integration practice, detailing the project background, storage‑compute separation, data layering, unified DDL/DML operations, performance optimizations, and future plans, illustrating how StarRocks, Iceberg, and Spark are combined to achieve scalable, cost‑effective analytics for massive game data.

Compute-Storage Separation · Data Warehouse · Iceberg

0 likes · 16 min read

iQIYI Technical Product Team

Mar 8, 2024 · Big Data

Smooth Migration from Hive to Iceberg Data Lake at iQIYI: Architecture, Techniques, and Performance Evaluation

iQIYI migrated hundreds of petabytes of Hive tables to Apache Iceberg using dual‑write, in‑place, and CTAS strategies, combined with partition pruning, Bloom filters, and Trino/Alluxio optimizations, achieving up to 40% lower query latency, simplified pipelines, and faster, cost‑effective data lake operations.

Data Lake · Hive · Iceberg

0 likes · 20 min read

DataFunSummit

Feb 29, 2024 · Big Data

Trino at Xiaomi: Architecture, Practices, and Future Plans

This article details Xiaomi’s practical deployment of Trino, covering its architectural role, core and extended capabilities, performance comparisons, integration with Iceberg and Spark, operational enhancements, multi‑cluster and ad‑hoc query scenarios, future cloud‑storage plans, and a Q&A session.

Big Data · Iceberg · OLAP

0 likes · 20 min read

DataFunSummit

Jan 21, 2024 · Big Data

Xiaomi Sales Data Warehouse: Architecture, Construction Theory, and Capability Layers

This article presents Xiaomi's sales data warehouse practice, detailing its evolution, positioning, dimensional modeling, layered architecture, Lambda design, Iceberg integration, capability building, security governance, and future directions toward data value and real‑time metrics.

Big Data · Data Warehouse · Flink

0 likes · 15 min read

DataFunTalk

Nov 13, 2023 · Big Data

Xiaomi Sales Data Warehouse: Architecture, Construction Theory, and Capability Evolution

This article introduces Xiaomi's sales data warehouse practices, covering its development history, positioning, architecture, dimensional modeling, layer theory, capability building, real‑time and batch processing using Lambda architecture, Iceberg, Flink, and Hologres, and discusses future trends and Q&A.

Hologres · Iceberg · sales analytics

0 likes · 15 min read

DataFunSummit

Oct 16, 2023 · Big Data

Bilibili's Iceberg‑Based Lakehouse Platform: Technical Practices for Sub‑Second Query Response

This article details Bilibili's implementation of an Iceberg‑based lakehouse platform that unifies storage and analytics, addressing Hive’s performance and latency issues through multidimensional sorting, various file‑level indexes, cube pre‑aggregation, star‑tree structures, and an automated Magnus service for intelligent optimization, achieving near‑second query responses.

Big Data · Iceberg · Lakehouse

0 likes · 14 min read

DataFunSummit

Oct 1, 2023 · Big Data

Iceberg Data Lake: Core Features, Xiaomi Use Cases, and Future Plans

This presentation introduces Iceberg's core capabilities, details Xiaomi's practical applications—including log ingestion, near‑real‑time warehousing, offline challenges, column‑level encryption, and Hive migration—and outlines future development directions such as materialized views and cloud migration, providing a comprehensive view of modern data‑lake engineering.

Big Data · Data Lake · Flink

0 likes · 22 min read

DataFunSummit

Sep 25, 2023 · Big Data

Trino in Bilibili Lakehouse: Compute Engine, Stability, and Containerization Practices

This article presents Bilibili's practical implementation of Trino within a lakehouse architecture, focusing on the compute engine placement, stability enhancements, and containerized deployment, while detailing indexing strategies, pre‑computation techniques, Iceberg metadata optimizations, and performance gains for large‑scale analytical queries.

Iceberg · Indexing · Lakehouse

0 likes · 14 min read

dbaplus Community

Sep 3, 2023 · Big Data

How NetEase Yanxuan Migrated from Lambda to Iceberg for Seamless Batch‑Stream Integration

This article explains how NetEase Yanxuan upgraded its legacy Lambda architecture to an Iceberg‑based batch‑stream unified platform, detailing the original data pipeline, the challenges faced, the evaluation of Iceberg against Hudi and Delta Lake, and the concrete engineering optimizations and governance measures implemented to achieve lower latency and higher query performance.

Batch-Stream Integration · Big Data · Flink

0 likes · 14 min read

iQIYI Technical Product Team

Aug 25, 2023 · Big Data

Venus Log Platform Architecture Evolution: From ELK to Data Lake

The Venus log platform at iQIYI migrated from an Elasticsearch‑Kibana architecture to an Iceberg‑based data lake queried with Trino, cutting storage and compute costs by over 70%, improving stability by 85%, and efficiently handling billions of daily logs in a write‑heavy, query‑light workload.

Big Data · Elasticsearch · Iceberg

0 likes · 22 min read

Tencent Cloud Developer

Aug 23, 2023 · Big Data

WeChat Experiment Platform: Architecture Design and Iceberg Lakehouse Optimization

The WeChat Experiment Platform migrated its data pipeline (60,000 metrics, 200,000 cores, and over 30 PB of data) to an Iceberg‑based lakehouse, leveraging three‑layer metadata, fine‑grained partitioning, MERGE INTO writes, time‑travel snapshots, and skew‑handling UDFs; this cut core time by 69%, saved ~100 PB of storage, and reduced latency by up to 70%.

Big Data · Data Warehouse · Iceberg

0 likes · 18 min read

Big Data Technology & Architecture

Aug 21, 2023 · Big Data

Key Features and Benefits of Lakehouse Frameworks Hudi, Iceberg, and Paimon

This note outlines how Hudi, Iceberg, and Paimon provide unified batch‑stream storage, UPSERT support, time‑travel capabilities, and lower development costs, enabling a streaming‑warehouse architecture that offers near‑real‑time latency, consistent semantics, persisted intermediate results, and easier historical data repair.

Batch Processing · Hudi · Iceberg

0 likes · 5 min read

AsiaInfo Technology: New Tech Exploration

Aug 18, 2023 · Big Data

How Lakehouse Architecture Is Transforming Hadoop: A Deep Dive into Hudi, Iceberg, and Delta Lake

This article analyzes the rise of lake‑house architecture in the Hadoop ecosystem, compares the technical capabilities of Hudi, Iceberg and Delta Lake, details implementation enhancements such as MOR and multi‑writer support, showcases Flink integration, presents a real‑time marketing use case, and outlines future development directions.

Big Data · Data Governance · Delta Lake

0 likes · 14 min read

DataFunSummit

Aug 7, 2023 · Big Data

Performance Optimizations in Impala for Data Lake Queries: Iceberg and Codegen Enhancements

This article presents a comprehensive overview of Impala's high‑performance MPP query engine, its architecture for data‑lake workloads, and detailed performance optimizations including Iceberg table format improvements, manifest caching, and various Codegen techniques such as asynchronous compilation and caching.

Big Data · Codegen · Data Lake

0 likes · 17 min read

dbaplus Community

Jul 17, 2023 · Big Data

How Bilibili Built Billions 3.0: A Low‑Cost, Scalable Log Platform with ClickHouse, Iceberg, and Trino

This article details Bilibili's evolution from the ClickHouse‑based Billions 2.0 log system to the Billions 3.0 architecture, explaining how they reduced storage costs, improved troubleshooting, adopted a lake‑house design with Iceberg on HDFS, leveraged ClickHouse for acceleration, and integrated Trino as the unified query engine.

ClickHouse · Iceberg · Observability

0 likes · 37 min read

iQIYI Technical Product Team

Jun 30, 2023 · Big Data

Advertising Data Lake Architecture and Real-time Optimizations

By replacing the costly Lambda architecture with a unified data‑lake built on Iceberg and Flink CDC, the advertising team achieved minute‑level latency, strong consistency, and lower storage expenses, cutting end‑to‑end processing times from hours to a few minutes across budgeting, warehousing, OLAP and ETL workloads.

Advertising · Big Data · Flink

0 likes · 13 min read

Bilibili Tech

Jun 20, 2023 · Big Data

Design and Evolution of Bilibili's Billions 3.0 Log Platform: A Lakehouse Architecture with ClickHouse, Iceberg, and Trino

Bilibili evolved its log platform from the ClickHouse‑based Billions 2.0 to the Billions 3.0 lakehouse built on Iceberg, HDFS, and Trino, retaining ClickHouse for acceleration; the new design reduces storage cost by over 20%, improves observability, resolves the compute‑storage mismatch, adds flexible indexing, and supports complex ETL while remaining open source.

ClickHouse · Iceberg · Indexing

0 likes · 36 min read

DataFunSummit

Jun 13, 2023 · Big Data

Building a Sub‑Second Response Lakehouse Platform with Apache Iceberg at Bilibili

This article details Bilibili's implementation of a sub‑second response lakehouse platform using Apache Iceberg, covering background challenges, query acceleration techniques such as multi‑dimensional sorting, indexing, cube pre‑aggregation, and intelligent automated optimizations via the Magnus service, and reports current production metrics.

Cube · Iceberg · Lakehouse

0 likes · 14 min read

Big Data Technology & Architecture

Jun 13, 2023 · Big Data

Iceberg Data Lake Implementation and Optimization at iQIYI

This article details iQIYI's adoption of Iceberg for its data lake, covering the OLAP architecture, reasons for a data lake, Iceberg's table format advantages over Hive, platform construction, streaming ingestion, query and performance optimizations, real‑world business deployments, and future plans.

Big Data · Data Lake · Flink

0 likes · 21 min read

DataFunSummit

Jun 10, 2023 · Big Data

Performance Optimization of Iceberg Real‑time Data Warehouse and Arctic Enhancements

This article presents an overview of Iceberg merge‑on‑read (MOR) principles, Arctic‑based performance optimizations, benchmark evaluations using CH‑benchmark, and future roadmap items, highlighting how various file‑type strategies, self‑optimizing mechanisms, and task balancing improve real‑time data lake query efficiency.

Arctic · Data Lake · Iceberg

0 likes · 14 min read

DataFunTalk

Jun 2, 2023 · Big Data

Iceberg Data Lake Implementation and Optimization at iQIYI

This article details iQIYI's adoption of the Iceberg data lake, covering its OLAP architecture, reasons for a lake, Iceberg table format advantages over Hive, platform construction, extensive performance optimizations, and real‑world business use cases such as ad‑flow unification, log analysis, audit, and CDC pipelines.

Big Data · Data Lake · Flink

0 likes · 18 min read

DataFunTalk

May 23, 2023 · Big Data

Building a Millisecond‑Response Lakehouse Platform with Apache Iceberg: Architecture, Query Acceleration, and Intelligent Optimization

This article details Bilibili's technical practice of constructing a millisecond‑response lake‑warehouse platform using Apache Iceberg, covering the background challenges, unified architecture, multi‑dimensional sorting and indexing for query acceleration, the Magnus service for intelligent optimization, and the current production deployment and performance metrics.

Big Data · Cube · Iceberg

0 likes · 14 min read

DataFunSummit

Apr 28, 2023 · Big Data

Building a Unified Streaming‑Batch Storage Architecture at Xiaohongshu

This article presents Xiaohongshu's design and implementation of a unified streaming‑batch storage system that integrates Lambda architecture, Kafka, Flink, Iceberg, and modern OLAP engines to solve real‑time data warehouse pain points and enable consistent, exactly‑once analytics across streaming and batch workloads.

Batch Processing · Flink · Iceberg

0 likes · 16 min read

DataFunTalk

Apr 10, 2023 · Big Data

Interview on Data Lakehouse: Current Applications, Challenges, and Evolution

This interview with NetEase data‑lake technology manager Ma Jin explains the distinction between data lakes and lakehouses, reviews the evolution of table‑format technologies such as Iceberg, Hudi and Delta Lake, evaluates feature maturity and performance trade‑offs, and discusses systematic versus non‑systematic adoption in enterprises.

Big Data · Data Lakehouse · Delta Lake

0 likes · 13 min read

DataFunSummit

Mar 10, 2023 · Big Data

Interview on Data Lake and Lakehouse: Current Applications, Challenges, and Evolution

This interview with NetEase’s data‑lake technology manager explores the distinction between data lakes and lakehouses, the evolution of table‑format technologies such as Iceberg, Hudi and Delta Lake, their maturity across key capabilities, and the practical adoption challenges faced by enterprises.

Data Lake · Delta Lake · Hudi

0 likes · 14 min read

DataFunSummit

Feb 28, 2023 · Big Data

Iceberg Technology Overview and Its Application at Xiaomi: Practices, Stream‑Batch Integration, and Future Plans

This article introduces the Iceberg table format, explains its core architecture and advantages such as transactionality, implicit partitioning and row‑level updates, details Xiaomi's practical deployments—including CDC pipelines, partition strategies, compaction services, and stream‑batch integration—and outlines future development directions.

Compaction · Data Lake · Flink

0 likes · 20 min read

NetEase Yanxuan Technology Product Team

Feb 27, 2023 · Big Data

How NetEase Yanxuan Migrated from Lambda to Iceberg for Real‑Time Batch‑Stream Integration

This article details how NetEase Yanxuan transformed its data platform from a dual Lambda architecture to a unified batch‑stream solution built on Apache Iceberg, covering the original challenges, the evaluation of Iceberg versus Hudi and Delta Lake, implementation of stream‑batch pipelines, message ordering fixes, snapshot generation, and extensive table‑governance optimizations.

Apache Flink · Apache Spark · Batch-Stream Integration

0 likes · 14 min read

DataFunTalk

Feb 24, 2023 · Big Data

Presto and Alluxio Integration for Iceberg: Architecture, Best Practices, and Future Work

This article explains how Presto and Alluxio work together to query Iceberg tables, describes their architectures, deployment options, best‑practice recommendations such as using Iceberg native catalogs and local caches, and outlines future research directions for improving CPU usage and off‑heap caching.

Alluxio · Big Data · Cache

0 likes · 14 min read

DataFunTalk

Feb 20, 2023 · Big Data

Understanding Data Lakes and Their Application at iQIYI: Concepts, Scenarios, and Iceberg Implementation

This article explains the definition of data lakes (public‑cloud and non‑public‑cloud), outlines their key characteristics, presents three typical business scenarios—real‑time event analysis, change‑data analysis, and stream‑batch integration—summarizes required product features, evaluates open‑source lake formats, and details iQIYI's adoption of Apache Iceberg across multiple services to achieve low‑latency, large‑scale, cost‑effective analytics.

Big Data · Data Lake · Iceberg

0 likes · 23 min read

dbaplus Community

Feb 15, 2023 · Big Data

How Bilibili Scaled User Behavior Analytics with ClickHouse, Flink, and Iceberg

This article details Bilibili's PolarStar user behavior analysis platform, tracing its evolution from early Spark‑Jar models to Flink‑ClickHouse pipelines and Iceberg‑based full aggregation, and explains the technical solutions for event, retention, funnel, and path analysis, data ingestion, cluster rebalancing, and performance optimizations that enable massive real‑time analytics on billions of daily events.

ClickHouse · Data Engineering · Flink

0 likes · 32 min read

ITPUB

Jan 26, 2023 · Big Data

How NetEase’s Arctic Unifies Streaming and Batch with Iceberg for Real‑Time Lakehouse

This article explains the challenges of a Lambda‑architecture data pipeline, introduces NetEase’s Arctic lakehouse built on Apache Iceberg, details its table‑store design, optimization cycles, consistency mechanisms, real‑time features, practical use cases, and future roadmap, highlighting its advantages over similar solutions.

Arctic · Data Integration · Flink

0 likes · 14 min read

DataFunSummit

Jan 10, 2023 · Big Data

Exploring Iceberg in Huawei Terminal Cloud: Architecture, Features, and Future Plans

This article presents a comprehensive overview of Iceberg's adoption in Huawei Terminal Cloud, covering its architecture, key features such as Git‑style data management, real‑time processing, and acceleration layers, and future development directions, along with a Q&A session addressing performance and implementation details.

Big Data · Data Lake · Flink

0 likes · 15 min read

Bilibili Tech

Jan 10, 2023 · Big Data

Technical Evolution of Bilibili's PolarStar User Behavior Analysis Platform

Bilibili’s PolarStar platform evolved from Spark‑based batch jobs to a Flink‑driven real‑time pipeline and finally to a unified Iceberg‑on‑ClickHouse model, cutting query latency to seconds, saving thousands of CPU cores and hundreds of gigabytes of Redis memory while enabling complex, near‑real‑time user‑behavior analyses and scalable data‑import, rebalancing, and compression optimizations.

ClickHouse · Flink · Iceberg

0 likes · 30 min read

DataFunTalk

Dec 31, 2022 · Big Data

Glacier: An Intelligent Data Lake Architecture for Real‑Time Analytics and Machine Learning

This article presents Glacier, OPPO's intelligent data lake solution that builds on the Iceberg table format to provide real‑time data ingestion, low‑latency queries, advanced indexing, and robust multi‑version management for both structured and unstructured data, tightly integrating with machine‑learning workflows.

Data Lake · Glacier · Iceberg

0 likes · 20 min read

Tencent Advertising Technology

Dec 27, 2022 · Big Data

Design and Optimization of Tencent Advertising Log Data Lake Using Iceberg, Spark, and Flink

The article details how Tencent Advertising re‑architected its massive log pipeline by consolidating heterogeneous real‑time and offline logs into an Iceberg‑based data lake, introducing multi‑level partitioning, Spark and Flink ingestion, and numerous performance and cost optimizations for scalable big‑data analytics.

Big Data · Data Lake · Flink

0 likes · 20 min read

StarRocks

Dec 1, 2022 · Big Data

How Alibaba Cloud EMR StarRocks Supercharges Data Lake Analytics with Advanced Optimizations

This article explains how Alibaba Cloud EMR StarRocks extends data lake analytics to support Hive, Iceberg, and Hudi, detailing its architecture, Iceberg integration, performance gains over Trino, IO merging, lazy materialization, intelligent caching, and elastic compute capabilities for faster, unified, and cost‑effective queries.

Data Lake · EMR · Elastic Compute

0 likes · 16 min read

ITPUB

Nov 18, 2022 · Big Data

How Xiaomi Uses Iceberg for Real‑Time Streaming and Batch Data Lakes

This article introduces Iceberg’s table‑format fundamentals, details Xiaomi’s large‑scale deployment of Iceberg for CDC and log ingestion, explores their streaming‑batch integration experiments, outlines future roadmap items, and provides a comprehensive Q&A covering practical challenges and solutions.

Batch Processing · Big Data · Data Lake

0 likes · 23 min read

DataFunTalk

Nov 13, 2022 · Big Data

Iceberg Data Lake: Technology Overview, Xiaomi Practices, and Stream‑Batch Integration

This article presents an overview of the Iceberg table format, its core architecture and advantages, details Xiaomi’s large‑scale deployment and use cases, explores stream‑batch integration with Spark and Flink, outlines data correction methods, future plans, and answers common technical questions.

Data Lake · Flink · Iceberg

0 likes · 20 min read

StarRocks

Nov 4, 2022 · Big Data

Building a High‑Performance, Cost‑Effective Cloud Lakehouse with StarRocks and EMR

This article explains how to design and implement a cloud‑native Lakehouse using StarRocks and Tencent Cloud EMR, covering core technical requirements, a five‑layer architecture, data ingestion with Iceberg/Hudi, performance tricks like Z‑order clustering, cost‑control through elastic scaling, and the key product features of EMR StarRocks.

Big Data · Cloud Computing · EMR

0 likes · 24 min read

Bilibili Tech

Sep 30, 2022 · Big Data

Bilibili's Efficient Lakehouse Platform Built on Trino and Iceberg

Bilibili’s new lake‑house platform, built on Trino and Iceberg, replaces Hive‑based pipelines by ingesting logs and DB data into Iceberg tables and applying advanced sorting, Z‑order/Hilbert clustering, bitmap and Bloom filter indexes, virtual join columns, and pre‑aggregation, enabling 70,000 daily queries over 2 PB of data with an average of 2 GB scanned per query and sub‑2‑second response times.

Big Data · Data Skipping · Iceberg

0 likes · 15 min read

Big Data Technology & Architecture

Sep 19, 2022 · Big Data

Apache Iceberg Table and Catalog Configuration Guide for Hadoop

This article outlines the configuration settings for Apache Iceberg tables and catalogs on Hadoop, covering read and write properties, combine behavior for small HDFS files, reserved table properties, catalog lock options, and Hive Metastore connector Hadoop settings, supplemented with illustrative screenshots.

Big Data · Catalog · Hadoop

0 likes · 3 min read

DataFunTalk

Aug 29, 2022 · Big Data

Migrating from Lambda Architecture to an Iceberg‑Based Unified Batch‑Stream Architecture at NetEase Yanxuan

This article details how NetEase Yanxuan upgraded its legacy Lambda data pipeline to a unified batch‑stream architecture built on Apache Iceberg, covering the original challenges, the evaluation of Iceberg versus Hudi and Delta Lake, implementation specifics, table‑governance techniques, and future roadmap.

Batch-Stream · Data Lake · Flink

0 likes · 14 min read

Past Memory Big Data

Aug 11, 2022 · Big Data

What Kind of Data Lake Do Enterprises Really Need? Lessons from Delta 2.0

The article examines the open‑source release of Delta 2.0, compares its features and benchmark results with Iceberg and Hudi, discusses the core capabilities required by enterprises for a lakehouse architecture, and introduces the Arctic project as a multi‑engine streaming lake service.

Arctic · Data Lake · Delta Lake

0 likes · 25 min read

DataFunTalk

Aug 10, 2022 · Big Data

Delta Lake 2.0, Iceberg, Hudi: A Comparative Study and the Arctic Lakehouse Service

The article reviews recent developments in data‑lake table formats—Delta Lake 2.0, Iceberg, and Hudi—examining their features, benchmark results, and ecosystem impact, and then introduces Arctic, an open‑source streaming lakehouse service built on Iceberg that aims to bridge batch‑stream gaps for enterprises.

Data Lake · Delta Lake · Hudi

0 likes · 24 min read

DataFunTalk

Aug 1, 2022 · Big Data

Bilibili Lakehouse Integration: Iceberg and Alluxio Optimization Practices

This article details Bilibili's lakehouse implementation using Apache Iceberg and Alluxio, covering background challenges, architectural components, data organization techniques like Z‑order and bitmap indexes, performance benchmarks, and future optimization plans for large‑scale analytics.

Alluxio · Bitmap Index · Iceberg

0 likes · 21 min read

Big Data Technology & Architecture

Jul 27, 2022 · Big Data

Step-by-Step Guide to Installing and Using Flink with Iceberg for Real-Time Data Lake

This article provides a comprehensive tutorial on setting up Flink 1.11 with Iceberg 0.11.1, creating Hive catalogs, building databases and tables, inserting data, and exploring Iceberg components, file structures, partitioned tables, execution plans, and programmatic access via Scala.

Big Data · Data Lake · Flink

0 likes · 10 min read

JavaEdge

Jul 25, 2022 · Big Data

Choosing Between Lambda and Kappa: Real‑Time Data Warehouse Strategies

The article uses an acorn‑moving analogy to highlight latency and traceability challenges in enterprise data warehouses, then explains offline versus real‑time approaches, compares Lambda and Kappa architectures, discusses Iceberg integration, and shares a detailed e‑commerce real‑time warehouse case study with optimization tips.

Big Data · Flink · Iceberg

0 likes · 15 min read

DataFunTalk

Jul 15, 2022 · Big Data

Lakehouse Architecture at Bilibili: Query Acceleration and Index Enhancement Practices

This article explains Bilibili's lake‑warehouse integrated architecture, describing how Iceberg, Magnus, Trino, and Alluxio are used to achieve flexible data storage, high‑performance query acceleration, and automated indexing through Z‑Order, Hilbert curve, Bloom filter, and advanced BitMap techniques.

Big Data · Data Warehouse · Iceberg

0 likes · 18 min read

Bilibili Tech

Jul 15, 2022 · Big Data

Lakehouse Architecture Practice at Bilibili: Query Acceleration and Index Enhancement

Bilibili’s lakehouse architecture merges Iceberg‑based data lake flexibility with data‑warehouse efficiency, using Kafka‑Flink real‑time ingestion, Spark offline loads, Trino queries, Alluxio caching, Z‑Order/Hilbert sorting, and enhanced BloomFilter and bitmap indexes to boost query speed up to tenfold while drastically cutting file reads.

Big Data Architecture · Bitmap Index · Data Lake

0 likes · 17 min read

DataFunSummit

Jul 12, 2022 · Big Data

Practical Use of Apache Iceberg in Microvision's Data Warehouse: Architecture, Real‑time Integration, and Table Maintenance

This article details why Microvision adopted Apache Iceberg, how it replaces parts of their Lambda‑architecture data pipeline, the real‑time and offline use cases, table‑maintenance practices such as snapshot cleanup and small‑file merging, and lessons learned from the implementation.

Big Data · Data Lake · Flink

0 likes · 17 min read

Big Data Technology & Architecture

Jul 12, 2022 · Big Data

Analyzing Spark's Iceberg Data Reading Process and Small‑File Merging

This article explains how Spark reads data from Apache Iceberg tables by parsing snapshots and manifest files into DataFile objects, creates Batch and InputPartition objects, uses readers to materialize InternalRows, and then demonstrates how Iceberg's RewriteDataFilesAction can merge tiny Parquet files into larger ones through Spark‑driven tasks.

Big Data · Data Lake · Iceberg

0 likes · 17 min read

DataFunTalk

Jun 28, 2022 · Big Data

JD Retail Traffic Data Warehouse Architecture and Processing Practices

This article presents a comprehensive technical overview of JD.com’s retail traffic data processing pipeline, detailing the multi‑layer data warehouse architecture, real‑time and offline data flows, a large‑scale back‑fill case using Iceberg and OLAP, data‑skew detection and mitigation techniques, and future directions involving unified Flink‑Spark streaming‑batch solutions.

Data Skew · Flink · Iceberg

0 likes · 12 min read

Volcano Engine Developer Services

Jun 20, 2022 · Big Data

How ByteDance Scaled Feature Storage with Iceberg and Parquet: A Big Data Case Study

ByteDance tackled massive feature‑storage challenges by replacing row‑based HDFS files with columnar Parquet and the Iceberg table format, enabling schema evolution, selective reads, efficient backfill, and training optimizations that cut storage costs by over 40% and reduced CPU and network I/O dramatically.

Big Data · Data Lake · Iceberg

0 likes · 13 min read

DataFunSummit

May 30, 2022 · Big Data

Lakehouse Architecture at Bilibili: Query Acceleration and Index Enhancement Practices

This article explains Bilibili's lake‑warehouse integrated architecture, describing how Iceberg, Z‑Order sorting, and advanced indexing techniques such as BloomFilter and BitMap are used to accelerate queries and improve data organization in large‑scale analytics workloads.

Big Data · Data Lake · Data Warehouse

0 likes · 18 min read

ITPUB

Apr 19, 2022 · Big Data

Which Real-Time Data Warehouse Architecture Fits Your Needs? A Deep Dive

This article explains why modern enterprises need real‑time data‑warehouse architectures, breaks down traditional layered warehouse concepts, compares Lambda and Kappa models, evaluates five practical real‑time solutions—including Iceberg‑based lakehouse and MPP databases—provides code snippets, and offers selection guidance with real‑world company examples.

Big Data · Flink · Iceberg

0 likes · 19 min read

Big Data Technology & Architecture

Apr 15, 2022 · Big Data

Configuring Flink SQL Client with Iceberg: Catalogs, DDL, Data Insertion and Query

This guide explains how to set up the Flink SQL client to work with Apache Iceberg, covering Scala version requirements, downloading and deploying Iceberg jars, configuring Hive and HDFS catalogs, creating databases and tables, performing insert and overwrite operations, and querying data in both batch and streaming modes.

Big Data · Catalog · Flink

0 likes · 18 min read
