Tagged articles
61 articles
StarRocks
Mar 5, 2026 · Big Data

How Fanatics Scaled to PB‑Level Data with StarRocks & Apache Iceberg Lakehouse

Fanatics unified its fragmented data stack by building a StarRocks‑powered Lakehouse on Apache Iceberg, replacing Redshift, Snowflake, Athena, and Druid, which cut costs by up to 95%, delivered sub‑second dashboard queries on petabyte‑scale data, and enabled real‑time and historical analytics on a single platform.

Apache Iceberg · Data Architecture · Fanatics
0 likes · 10 min read
DevOps Coach
Jan 25, 2026 · Operations

Why Infra Companies Are Racing Into Observability and What It Means for 2026

The article examines how SRE and infrastructure teams are converging, why major infra vendors are acquiring observability assets, the rising cost pressures, and how OpenTelemetry combined with Apache Iceberg forms a new standard stack that AI‑driven incident response will rely on in the coming years.

AI incident response · Apache Iceberg · SRE
0 likes · 11 min read
DataFunSummit
Dec 1, 2025 · Big Data

7 Cutting-Edge Data Engineering Practices Shaping AI-Driven Data Lakes

This article collection showcases seven advanced data engineering solutions—from Tencent Cloud's Iceberg batch‑stream integration and Apache Gravitino metadata lineage to Xiaohongshu's Lakehouse evolution and multimodal AI data lake implementations—highlighting architectural innovations, performance optimizations, and real‑world deployment insights for modern big‑data platforms.

Apache Gravitino · Apache Iceberg · Batch-Stream Integration
0 likes · 7 min read
Baidu Geek Talk
Jun 30, 2025 · Big Data

How Baidu’s Turing 3.0 Leverages Apache Iceberg to Boost Data Lake Performance

This article explains how Baidu’s next‑generation data platform Turing 3.0 integrates Apache Iceberg to solve the inefficiencies of the legacy MEG stack, detailing ecosystem components, migration strategies from Hive, table‑level optimizations, and future roadmap for high‑frequency, low‑latency analytics.

Apache Iceberg · Data Lake · Hive Migration
0 likes · 17 min read
DataFunSummit
Jun 3, 2025 · Big Data

BiFang: A Unified Lake‑Stream Storage Engine for Real‑Time and Batch Data Processing

BiFang is a lake‑stream integrated storage engine that merges Apache Pulsar message‑queue capabilities with Iceberg data‑lake features, providing a single unified data store with full‑incremental queries, sub‑second visibility, exactly‑once semantics, and seamless integration with Flink, Spark, and StarRocks for both real‑time analytics and batch processing.

Apache Iceberg · Apache Pulsar · Lakehouse
0 likes · 13 min read
Alibaba Cloud Infrastructure
Mar 6, 2025 · Big Data

Leveraging Apache Iceberg and AutoMQ for Real-Time Data Lake Ingestion: Architecture, Best Practices, and Cost Optimization

This article examines how Apache Iceberg’s snapshot‑based ACID transactions, logical‑physical partition evolution, and COW/MOR update modes enable efficient real‑time data lake ingestion, and demonstrates AutoMQ’s Kafka‑to‑Iceberg Table Topic solution that simplifies schema management, reduces latency, and cuts operational costs.

Apache Iceberg · AutoMQ · Big Data
0 likes · 14 min read
DataFunSummit
Nov 20, 2024 · Artificial Intelligence

How Data Lakes Empower AI: Expert Insights on Feature Management, Columnar Storage, and Vector Formats

In a panel discussion, experts explain how data‑lake‑warehouse integration, columnar storage, table formats like Apache Iceberg, and emerging variant types enable efficient feature engineering, support large‑language‑model workloads, and provide flexible vector storage, driving the evolution of AI from traditional ML to the GenAI era.

Apache Iceberg · Data Lake · GenAI
0 likes · 6 min read
DataFunTalk
Nov 6, 2024 · Big Data

How Data Lakes Empower AI: Insights from Industry Experts

In a panel discussion, experts from Kuaishou, Ping An, and Datastrato explain how data lake architectures, columnar storage, table formats like Apache Iceberg, and vector‑enabled lake formats are enhancing feature management, supporting generative AI workloads, and accelerating machine‑learning pipelines.

AI · Apache Iceberg · Big Data
0 likes · 6 min read
StarRocks
Sep 5, 2024 · Big Data

Accelerate Lakehouse Queries: A Hands‑On Guide to StarRocks + Apache Iceberg

This tutorial walks you through the fundamentals of Apache Iceberg, its architecture and key features, explains why it’s advantageous for lakehouse workloads, and provides a step‑by‑step Docker‑Compose setup to integrate Iceberg with StarRocks for fast, ACID‑compliant analytics on real‑world taxi data.

Apache Iceberg · Docker · Lakehouse
0 likes · 15 min read
DataFunTalk
Sep 4, 2024 · Artificial Intelligence

Data+AI Data Lake Technologies: Challenges, Apache Iceberg Overview, and Vector Table Implementations with PyIceberg

This article explores the evolution of data lakes for AI, discusses the challenges of AI-era data management, introduces Apache Iceberg and its architecture, demonstrates PyIceberg-based AI training and inference pipelines, and presents vector table designs with LSH indexing and performance optimizations.

AI · Apache Iceberg · Big Data
0 likes · 22 min read
StarRocks
Jul 24, 2024 · Big Data

Why Lakehouse Architecture Is Redefining Big Data Infrastructure in the AI Era

The article examines the rapid rise of lakehouse architecture, its market momentum, core components—including storage, metadata, table formats, and compute layers—compares Iceberg, Hudi, and Delta Lake, discusses the shift from HDFS to object storage, and outlines the strategic importance of lakehouses for AI-driven data management and future data infrastructure trends.

AI · Apache Iceberg · Big Data
0 likes · 28 min read
DataFunSummit
Jun 20, 2024 · Big Data

Data+AI Data Lake Technologies: Apache Iceberg, PyIceberg, and Vector Table Solutions

This article presents a comprehensive overview of modern Data+AI data lake challenges and solutions, covering the evolution of data lakes, an introduction to Apache Iceberg, practical use of PyIceberg for AI training and inference pipelines, and advanced vector table and indexing techniques for efficient similarity search.

AI training · Apache Iceberg · Big Data
0 likes · 22 min read
StarRocks
May 22, 2024 · Big Data

Unlocking Data Lake Power: Iceberg Architecture & StarRocks Acceleration

Apache Iceberg offers a modern, ACID‑compliant table format for data lakes with features like hidden partitions and schema evolution, while StarRocks provides high‑performance query acceleration, metadata caching, and distributed planning to address Iceberg’s latency challenges, enabling seamless lake‑warehouse integration and real‑time analytics.

Apache Iceberg · Data Lake · Metadata Caching
0 likes · 19 min read
Xiaohongshu Tech REDtech
Mar 4, 2024 · Big Data

Integrating Data Lake Technologies with Data Warehouse Architecture at Xiaohongshu: Practices and Performance Optimizations

Xiaohongshu’s data‑warehouse team integrated Apache Iceberg‑based data‑lake techniques into its existing warehouse, replacing the legacy Hive/Spark stack with global sorting, Z‑order, and upsert‑enabled tables, which cut query latency by up to 90%, boosted data freshness by 50%, slashed storage costs by 83%, and saved tens of thousands of GB‑hours of compute daily.

Apache Iceberg · Data Lake · Data Warehouse
0 likes · 19 min read
DataFunSummit
Dec 20, 2023 · Cloud Native

Building a Cloud‑Native Lakehouse with Apache Iceberg and Amoro

This article introduces the background, challenges, and cloud‑native solutions of lakehouse architecture, explains Apache Iceberg’s open table format and its cloud‑native features, details Amoro’s management and self‑optimizing capabilities, showcases three real‑world cloud migration cases, and outlines future development plans.

Amoro · Apache Iceberg · Data Management
0 likes · 12 min read
DataFunTalk
Oct 5, 2023 · Big Data

Building a Unified Streaming‑Batch Lakehouse with Amoro Mixed Iceberg

This article describes how Shanghai Steel Union leveraged Amoro Mixed Iceberg on top of Apache Iceberg to create a unified streaming‑batch lakehouse, addressing small‑file and upsert challenges, simplifying architecture, improving data freshness, and providing a scalable solution for real‑time and batch analytics.

Amoro · Apache Iceberg · Big Data
0 likes · 13 min read
iQIYI Technical Product Team
Sep 22, 2023 · Big Data

Data Lake: Concepts, Architecture, and Application in iQIYI's Data Platform

iQIYI’s data‑middle‑platform team built a four‑zone data lake—raw, product, work, and sensitive—integrated with unified ODS/DWD/MID layers, a metadata catalog, and self‑service tools, leveraging HDFS, Hive/Iceberg, Spark/Trino, and Flink, migrated to Apache Iceberg for real‑time freshness, and now aims to further streamline modules and adopt new technologies.

Apache Iceberg · Data Governance · Data Lake
0 likes · 13 min read
ITPUB
Aug 23, 2023 · Cloud Native

Build a Cloud‑Native Lakehouse on AWS with Apache Iceberg and Amoro

This guide explains the cloud‑native lakehouse concept, outlines its advantages and challenges, compares lake‑table projects such as Iceberg, and provides a step‑by‑step AWS deployment of Apache Iceberg and Amoro—including environment setup, AMS installation, catalog configuration, optimizer launch, data ingestion with Flink, and query verification with Spark.

AWS · Amoro · Apache Iceberg
0 likes · 33 min read
DataFunTalk
Jul 11, 2023 · Big Data

Analysis of Lakehouse Storage Systems: Design, Metadata, Merge‑On‑Read, and Performance Optimizations for Delta Lake, Apache Hudi, and Apache Iceberg

This article examines the architecture and core design of lakehouse storage systems, compares the metadata handling and Merge‑On‑Read mechanisms of Delta Lake, Apache Hudi, and Apache Iceberg, and presents practical performance‑optimization techniques and real‑world case studies on Alibaba Cloud EMR.

Apache Hudi · Apache Iceberg · Big Data
0 likes · 18 min read
DataFunTalk
May 11, 2023 · Big Data

Scaling ByteDance Feature Store to EB‑Level with Apache Iceberg: Architecture, Practices, and Future Roadmap

This article describes how ByteDance tackled exabyte‑scale feature storage by adopting Apache Iceberg, detailing the problem background, design choices, implementation of COW and MOR back‑fill strategies, performance optimizations, and future plans such as lake‑cold‑layering and materialized views.

Apache Iceberg · Big Data · Data Lake
0 likes · 16 min read
iQIYI Technical Product Team
Feb 3, 2023 · Big Data

Data Lake Concepts, Benefits, and Iceberg‑Based Implementations at iQIYI

iQIYI’s data lake combines public‑cloud and private storage with Apache Iceberg’s snapshot‑based table format to enable near‑real‑time, unified batch‑and‑stream analytics, reducing costs, simplifying architecture, and improving data freshness across use cases such as log collection, audit, pingback, and member order processing.

Apache Iceberg · Data Architecture · Data Lake
0 likes · 25 min read
DataFunTalk
Dec 8, 2022 · Big Data

Arctic: NetEase’s Real-Time Lakehouse System Built on Apache Iceberg

This article introduces NetEase’s Arctic, a real‑time lakehouse system built on Apache Iceberg that unifies streaming and batch processing, explains the challenges of Lambda architecture, details Arctic’s features such as change/base stores, hidden queue, transaction handling, and shares internal practice cases and future roadmap.

Apache Iceberg · Arctic · Data Lake
0 likes · 12 min read
NetEase Cloud Music Tech Team
Oct 26, 2022 · Big Data

Arctic: NetEase's Streaming Lakehouse Service and Hive-Based Stream-Batch Integration Practice

Arctic, NetEase’s streaming lakehouse built on Apache Iceberg, unifies streaming and batch workloads with millisecond‑level latency, Hive compatibility, and built‑in message‑queue support, delivering CDC, upserts and OLAP without a Lambda architecture, as demonstrated by real‑time processing of 2 PB of Hive data for Cloud Music.

Apache Iceberg · Arctic · Big Data Architecture
0 likes · 15 min read
DataFunSummit
Oct 21, 2022 · Big Data

Exploring Real‑Time Data Lake Practices at Xiaohongshu Using Apache Iceberg

This article details Xiaohongshu's data platform architecture and three real‑time lake initiatives—log ingestion, CDC ingestion, and lake analysis—showcasing how Apache Iceberg, Flink, and custom shuffling algorithms solve small‑file and cross‑cloud challenges while enabling schema evolution and future multi‑cloud optimizations.

Apache Iceberg · Big Data · CDC
0 likes · 16 min read
Big Data Technology Architecture
Aug 23, 2022 · Big Data

Comparative Analysis of Apache Hudi, Delta Lake, and Apache Iceberg for Lakehouse Architectures

This article examines the technical differences and feature sets of Apache Hudi, Delta Lake, and Apache Iceberg, highlighting incremental pipelines, concurrency control, merge‑on‑read storage, partition evolution, multi‑mode indexing, and real‑world use cases to help practitioners choose the most suitable lakehouse solution for their workloads.

Apache Hudi · Apache Iceberg · Concurrency Control
0 likes · 18 min read
DataFunTalk
Aug 6, 2022 · Big Data

Exploring Real‑Time Data Lake Practices at Xiaohongshu Using Apache Iceberg

This article details Xiaohongshu's data platform engineering, describing how Apache Iceberg is leveraged for real‑time data lake ingestion, CDC pipelines, multi‑cloud storage, small‑file mitigation, schema evolution, and future plans across storage, compute, and management within a big‑data ecosystem.

Apache Iceberg · CDC · Flink
0 likes · 16 min read
DataFunSummit
Apr 29, 2022 · Big Data

Optimizing Query Performance in Apache Iceberg with Z‑Order Data Organization

This article explains how Apache Iceberg’s DataSkipping technique can lose efficiency when many filter columns are used, and presents a data‑organization optimization using space‑filling curves and Z‑Order to improve query I/O, details the OPTIMIZE implementation, and shares performance benchmark results and future plans.

Apache Iceberg · Big Data · Data Skipping
0 likes · 12 min read
DataFunTalk
Apr 9, 2022 · Big Data

Optimizing Apache Iceberg Query Performance with Z‑Order Data Organization

This talk explains how Apache Iceberg’s DataSkipping can lose efficiency with many filter columns, and presents a data‑organization redesign using space‑filling curves and Z‑Order to improve query I/O, detailing the OPTIMIZE syntax, implementation steps, performance benchmarks, and future roadmap.

Apache Iceberg · Big Data · Data Skipping
0 likes · 12 min read
DataFunTalk
Mar 1, 2022 · Cloud Native

Alibaba Cloud Native Data Lake with Apache Iceberg: Architecture, Challenges, and Solutions

The presentation outlines Alibaba Cloud's native data lake solution built on Apache Iceberg, covering data lake fundamentals, cloud migration challenges, Iceberg's architecture and features, real‑time ingestion with Flink, unified metadata management, security guarantees, and testing practices to ensure reliable, scalable big‑data analytics.

Apache Iceberg · Big Data · Data Lake
0 likes · 16 min read
DataFunTalk
Feb 25, 2022 · Big Data

Tencent's Application of Apache Iceberg for Real‑Time Data Lake Ingestion, Governance, and Query Optimization

This article explains how Tencent leverages Apache Iceberg together with Flink to build a real‑time data lake pipeline, covering data ingestion, Iceberg's snapshot‑based read/write model, compaction and governance services, Z‑order based query optimization, performance results, and future roadmap.

Apache Iceberg · Big Data · Data Lake
0 likes · 24 min read
DataFunTalk
Feb 12, 2022 · Big Data

NetEase Internal Data Lake Project Arctic: Architecture, Requirements, and Future Roadmap

This article introduces NetEase's internally incubated data lake project Arctic, explains the concept of data lakes, outlines NetEase's specific requirements for a unified streaming‑batch platform, details Arctic's core architecture, storage strategy, data‑merge mechanisms, current achievements, and future development plans.

Apache Iceberg · Arctic · Big Data
0 likes · 10 min read
DataFunTalk
Jan 8, 2022 · Big Data

Lakehouse: Concepts, Architecture, Implementation, and Cloud Practices

This article provides a comprehensive overview of the Lakehouse paradigm, tracing its origins from traditional data warehouses and data lakes, comparing architectures, detailing core components such as Delta Lake and Iceberg, and illustrating practical cloud implementations and future directions.

Apache Iceberg · Big Data · Cloud Data Platform
0 likes · 14 min read
Big Data Technology & Architecture
Nov 8, 2021 · Big Data

Why Choose Apache Iceberg? Tencent’s Optimizations and Real‑World Practices

This article examines the strengths and weaknesses of Apache Iceberg, explains why Tencent selected it over alternatives, details Tencent’s own enhancements and integration with Flink, Spark, and other engines, and shares multiple real‑world implementations for building enterprise‑grade real‑time data lakes.

Apache Iceberg · Data Lake · Flink
0 likes · 17 min read
Big Data Technology Architecture
Aug 10, 2021 · Big Data

Building a Real‑Time Data Warehouse with Apache Flink and Apache Iceberg: Architecture, Challenges, and Best Practices

This article presents Tencent's practical experience of constructing a real‑time data warehouse by integrating Apache Flink with Apache Iceberg, covering background pain points of traditional Lambda architectures, Iceberg's table format and capabilities, Flink‑Iceberg sink design, small‑file handling, and future roadmap for a unified streaming‑batch data lake.

Apache Flink · Apache Iceberg · Batch Processing
0 likes · 20 min read
Big Data Technology & Architecture
Jun 16, 2021 · Big Data

Practical Experience and Optimizations of Apache Iceberg in Tencent’s Big Data Ecosystem

This article reviews the advantages of Apache Iceberg for data lake storage, details Tencent’s custom optimizations and integration with Flink and Spark, and shares multiple real‑world implementations that demonstrate how Iceberg improves data consistency, reduces small‑file overhead, and enables near‑real‑time analytics in large‑scale big‑data environments.

Apache Iceberg · Data Lake · Flink
0 likes · 18 min read
Big Data Technology Architecture
Jun 10, 2021 · Big Data

Understanding Apache Iceberg: Design, Architecture, and Its Application at NetEase Cloud Music

This article explains Apache Iceberg’s table‑format design, compares it with Hive’s limitations, details its snapshot‑based architecture and metadata handling, and describes how NetEase Cloud Music leveraged Iceberg to dramatically improve large‑scale log processing performance and stability.

Apache Iceberg · Spark · Table Format
0 likes · 12 min read
dbaplus Community
Jun 5, 2021 · Big Data

How Flink + Iceberg Transform Data Lakes for Real‑Time Streaming

This article explains the concept of data lakes, outlines a four‑layer open‑source architecture, presents several classic Flink‑Iceberg use cases, details why Iceberg was chosen, and describes the design of Flink’s streaming sink and upcoming community roadmap.

Apache Flink · Apache Iceberg · Big Data
0 likes · 14 min read
DataFunTalk
Apr 18, 2021 · Big Data

Comparing Apache Hudi, Apache Iceberg, and Delta Lake for Data Lake Storage

This article compares Apache Hudi, Apache Iceberg, and Delta Lake, examining their storage formats, platform compatibility, update performance, concurrency guarantees, and integration with lakeFS to help readers choose the most suitable solution for their data lake use case.

Apache Hudi · Apache Iceberg · Delta Lake
0 likes · 16 min read
Big Data Technology Architecture
Apr 5, 2021 · Big Data

Understanding Apache Iceberg: Table Format Architecture, Comparison with Hive Metastore, and Business Benefits

This article introduces Apache Iceberg as an open table format for massive analytic datasets, explains its underlying concepts such as schema, partitioning, statistics, and read/write APIs, compares it with Hive Metastore, outlines its ACID commit process, highlights the performance and operational advantages for big‑data workloads, and previews upcoming community features.

ACID · Apache Iceberg · Parquet
0 likes · 19 min read
DataFunTalk
Feb 17, 2021 · Big Data

Apache Iceberg 0.11.0: New Partition Support, SortOrder, Flink Streaming Reader, and Ecosystem Integrations

The article details Apache Iceberg 0.11.0's core enhancements—including partition changes, SortOrder, extensive Flink and Spark integrations, CDC/Upsert support, hash‑based write distribution to reduce small files, and upcoming 0.12.0 roadmap—while providing practical SQL and API examples for data‑lake practitioners.

Apache Iceberg · Big Data · CDC
0 likes · 13 min read
DataFunTalk
Feb 14, 2021 · Big Data

Impala at NetEase: Architecture, Iceberg Integration, Management System, Optimizations and Future Roadmap

This talk presents NetEase's practical experience with Impala, covering its core architecture, new features in version 3.x, integration with Apache Iceberg, a custom management platform, profiling and statistics enhancements, as well as future plans involving Kubernetes, Alluxio caching and pre‑computation strategies.

Apache Iceberg · Big Data · Cluster Management
0 likes · 13 min read
DataFunTalk
Feb 1, 2021 · Big Data

Building a Real-Time Data Warehouse with Apache Flink and Apache Iceberg: Architecture, Challenges, and Best Practices

This article presents Tencent's experience of constructing a real‑time data warehouse by integrating Apache Flink with Apache Iceberg, covering background pain points, Iceberg's table format and capabilities, Flink‑Iceberg streaming and batch processing, practical implementations, and future roadmap for data‑lake acceleration.

Apache Flink · Apache Iceberg · Big Data
0 likes · 21 min read
Youzan Coder
Dec 21, 2020 · Big Data

Youzan Big Data Technology Salon – Cost Governance, Apache Iceberg, Flink, and Data‑Driven Growth

At Youzan’s Big Data Technology Salon, over 100 attendees heard leaders from Youzan, NetEase Yishu, and Didi discuss cost governance, Apache Iceberg data lakes, large‑scale Flink real‑time computing, and data‑driven growth strategies, highlighting practical implementations, savings of millions and tools for merchant empowerment.

Apache Iceberg · Data Growth · Flink
0 likes · 5 min read
Youzan Coder
Dec 9, 2020 · Big Data

Youzan Big Data Technology Salon: Practices in Data Cost Governance, Apache Iceberg, Flink, and Data-Driven Growth

The Youzan Big Data Technology Salon brought together Youzan, NetEase and Didi to share practical approaches for cutting data‑infrastructure costs, building an Apache Iceberg‑based data lake, scaling Flink real‑time workloads, and creating a data‑driven growth platform that leverages tracking, A/B testing and analytics.

Apache Iceberg · Big Data · Data Cost Governance
0 likes · 5 min read
DataFunTalk
Dec 3, 2020 · Big Data

Streaming Data Lake Ingestion with Apache Flink and Apache Iceberg

This article explains how Apache Flink integrates with data lake architectures, especially using Apache Iceberg as a table format, to enable real‑time streaming ingestion, CDC processing, near‑real‑time lambda architectures, and future enhancements like automatic file merging and row‑level deletes.

Apache Iceberg · Data Lake · Flink
0 likes · 13 min read
Big Data Technology Architecture
Nov 27, 2020 · Big Data

Integrating Apache Flink with Data Lakes Using Apache Iceberg: Architecture, Use Cases, and Future Roadmap

This article explains how Apache Flink combines with Apache Iceberg to build unified stream‑batch data lake solutions, covering data lake fundamentals, architectural layers, classic business scenarios, reasons for choosing Iceberg, streaming ingestion design, and upcoming community enhancements.

Apache Flink · Apache Iceberg · Table Format
0 likes · 13 min read
Big Data Technology Architecture
Mar 24, 2020 · Big Data

Comparative Analysis of Delta Lake, Apache Iceberg, and Apache Hudi for Data Lake Solutions

This article examines the three leading open‑source data‑lake projects—Delta Lake, Apache Iceberg, and Apache Hudi—by outlining their origins, core problems they address, key features, and a detailed seven‑dimension comparison to help practitioners choose the most suitable solution for their scenarios.

Apache Hudi · Apache Iceberg · Comparison
0 likes · 17 min read
dbaplus Community
Mar 17, 2020 · Big Data

Choosing the Right Open‑Source Data Lake: Delta vs Iceberg vs Hudi

An in‑depth comparison of the three leading open‑source data lake platforms—Delta Lake, Apache Iceberg, and Apache Hudi—examines their origins, core challenges they address, key features, and performance across seven evaluation dimensions to guide practitioners in selecting the optimal solution for their workloads.

Apache Hudi · Apache Iceberg · Data Lake
0 likes · 15 min read