Tag

Iceberg

0 views collected around this technical thread.

DataFunSummit
DataFunSummit
May 4, 2025 · Big Data

Iceberg Table Format Practice in Huawei Terminal Cloud

This article explains how Huawei's terminal cloud adopts the Apache Iceberg table format to efficiently manage large-scale datasets, detailing its architecture, feature engineering, merge operations, LSM-based storage, schema versioning, AB testing support, catalog enhancements, and future roadmap for full lifecycle data governance.

Big DataData LakeHuawei Cloud
0 likes · 13 min read
Iceberg Table Format Practice in Huawei Terminal Cloud
iQIYI Technical Product Team
iQIYI Technical Product Team
Mar 27, 2025 · Big Data

Cost‑Effective Real‑Time Data Warehouse 2.0: Migrating from Kafka to Iceberg

iQIYI transformed its real‑time data warehouse by replacing a costly Kafka‑based Lambda stack with a unified stream‑batch Iceberg lake, cutting storage expenses by 90%, halving compute costs, extending data retention, and delivering minute‑level freshness for 90% of use cases while preserving second‑level processing where needed.

IcebergKafkaSpark
0 likes · 11 min read
Cost‑Effective Real‑Time Data Warehouse 2.0: Migrating from Kafka to Iceberg
DataFunSummit
DataFunSummit
Jan 14, 2025 · Big Data

Tencent Real-Time Lakehouse Intelligent Optimization Practice

This presentation details Tencent's real‑time lakehouse architecture and the four key topics—lakehouse design, intelligent optimization services, scenario‑driven capabilities, and future outlook—covering components such as Spark, Flink, Iceberg, Auto‑Optimize Service, indexing, clustering, AutoEngine, and PyIceberg implementations.

Auto OptimizeBig DataData Optimization
0 likes · 12 min read
Tencent Real-Time Lakehouse Intelligent Optimization Practice
DataFunSummit
DataFunSummit
Jan 3, 2025 · Big Data

Tencent Real‑Time Lakehouse Intelligent Optimization Practices

This article presents Tencent's end‑to‑end real‑time lakehouse architecture, detailing its three‑layer design, the Auto Optimize Service modules such as compaction, indexing, clustering and engine acceleration, as well as scenario‑driven capabilities like multi‑stream joins, primary‑key tables, in‑place migration and PyIceberg support, and concludes with future optimization directions.

Big DataData OptimizationIceberg
0 likes · 11 min read
Tencent Real‑Time Lakehouse Intelligent Optimization Practices
DataFunSummit
DataFunSummit
Dec 27, 2024 · Big Data

Tencent Real-time Lakehouse Intelligent Optimization Practice

This presentation describes Tencent's real-time lakehouse architecture, including data lake compute, management, and storage layers, and details the intelligent optimization services—such as compaction, indexing, clustering, and auto-engine—designed to improve query performance, storage cost, and operational efficiency for large-scale data processing.

AutoEngineCompactionData Optimization
0 likes · 11 min read
Tencent Real-time Lakehouse Intelligent Optimization Practice
Bilibili Tech
Bilibili Tech
Dec 27, 2024 · Big Data

Consistency Architecture for Bilibili Recommendation Model Data Flow

The article outlines Bilibili’s revamped recommendation data‑flow architecture that eliminates timing and calculation inconsistencies by snapshotting online features, unifying feature computation in a single C++ library accessed via JNI, and orchestrating label‑join and sample extraction through near‑line Kafka/Flink pipelines, with further performance gains and Iceberg‑based future extensions.

Big DataIcebergProtobuf
0 likes · 12 min read
Consistency Architecture for Bilibili Recommendation Model Data Flow
Bilibili Tech
Bilibili Tech
Dec 17, 2024 · Big Data

Apache Gravitino: Metadata Management Practices and Production Experience at Bilibili

Bilibili adopted Apache Gravitino as a unified metadata platform that decouples consumers, consolidates schemas and Fileset‑based unstructured data across heterogeneous sources, cuts metadata and storage costs, resolves inconsistencies, boosts Hive Metastore performance, and enables features such as Iceberg branching and future AI‑centric governance.

Apache GravitinoBig DataData Governance
0 likes · 20 min read
Apache Gravitino: Metadata Management Practices and Production Experience at Bilibili
Tencent Advertising Technology
Tencent Advertising Technology
Dec 6, 2024 · Big Data

Building a High‑Performance Advertising Feature Data Lake with Apache Iceberg at Tencent

Tencent's advertising team replaced a traditional HDFS‑Hive warehouse with an Apache Iceberg‑based data lake, adding primary‑key tables, multi‑stream merging, adaptive compaction, and Spark SPJ optimizations to achieve minute‑level feature update latency, 10× back‑fill speed, and up to 60% storage savings.

Big DataCDCCompaction
0 likes · 25 min read
Building a High‑Performance Advertising Feature Data Lake with Apache Iceberg at Tencent
Bilibili Tech
Bilibili Tech
Nov 26, 2024 · Big Data

Bilibili’s Iceberg‑Based Streaming‑Batch Integration: Architecture, Optimizations, and Practices

Bilibili migrated its massive user‑behavior, commercial AI training, and database synchronization pipelines from Hive and Kafka to an Iceberg‑based streaming‑batch architecture, using Flink and the Magnus optimizer to achieve minute‑level freshness, reduce CPU and memory usage by about 20‑22 %, save roughly 3.55 M CNY annually, and dramatically improve query latency and join performance.

Data LakeIcebergStreaming
0 likes · 20 min read
Bilibili’s Iceberg‑Based Streaming‑Batch Integration: Architecture, Optimizations, and Practices
DataFunSummit
DataFunSummit
Nov 23, 2024 · Big Data

Bilibili's Iceberg‑Based Streaming‑Batch Integration: Architecture, Optimizations, and Practice

This article presents Bilibili's end‑to‑end exploration of a streaming‑batch unified data pipeline built on Apache Iceberg, detailing the original and iterated architectures for massive user behavior transmission, online AI training, DB synchronization, and dimension‑join, along with performance gains, cost savings, and future plans.

Data LakeIcebergReal-time Analytics
0 likes · 20 min read
Bilibili's Iceberg‑Based Streaming‑Batch Integration: Architecture, Optimizations, and Practice
DataFunSummit
DataFunSummit
Nov 12, 2024 · Big Data

Data Lake and Data Warehouse Architectures: Expert Insights from Industry Leaders

The article summarizes a roundtable discussion where experts compare four lake‑warehouse architectural patterns, explain their suitability for different business scenarios, contrast them with traditional data warehouses, and highlight practical considerations for choosing and evolving data platforms.

Big DataData LakeData Warehouse
0 likes · 6 min read
Data Lake and Data Warehouse Architectures: Expert Insights from Industry Leaders
Bilibili Tech
Bilibili Tech
Nov 12, 2024 · Big Data

Scalable Tag System Architecture and Optimization

The rebuilt tag system introduces a three‑layer architecture, standard pipelines, Iceberg‑backed storage and custom ClickHouse sharding, a DSL for crowd selection, and a stateless online service, achieving 99.9% success, sub‑5 ms latency, and supporting thousands of tags across dozens of business scenarios while planning real‑time processing and automated lifecycle management.

Big DataClickHouseData Pipeline
0 likes · 23 min read
Scalable Tag System Architecture and Optimization
DataFunSummit
DataFunSummit
Nov 8, 2024 · Big Data

Roundtable Discussion on Data Lake Technology Maturity and Governance Practices

Experts from Kuaishou, former Tencent, Ping An Insurance and others discuss data lake maturity, column‑level governance, resource management of unstructured data, and automated optimization techniques such as Iceberg small‑file merging, highlighting how these advances improve data quality and business decision‑making.

Big DataColumn-level GovernanceData Governance
0 likes · 6 min read
Roundtable Discussion on Data Lake Technology Maturity and Governance Practices
DataFunSummit
DataFunSummit
Nov 5, 2024 · Big Data

Tencent Real-Time Lakehouse Architecture and Intelligent Optimization Practices

This article presents Tencent's real-time lakehouse architecture, detailing its three-layer design, the Auto Optimize Service with compaction, indexing, clustering and engine acceleration, scenario capabilities such as multi‑stream joins and in‑place migration, and outlines future optimization directions.

Big DataData LakeIceberg
0 likes · 11 min read
Tencent Real-Time Lakehouse Architecture and Intelligent Optimization Practices
Bilibili Tech
Bilibili Tech
Nov 1, 2024 · Big Data

Magnus: Intelligent Data Optimization Service for Iceberg Tables in Bilibili's Lakehouse Platform

Magnus is Bilibili’s self‑developed intelligent service that continuously optimizes Iceberg tables by scheduling snapshot expiration, orphan‑file cleanup, manifest rewriting, and multi‑dimensional data optimizations—including small‑file merging, sorting, distribution, and index creation—while automatically recommending configurations from real‑time query logs, delivering over 99.9% task success and up to 30% scan‑data reduction.

Data LakeIcebergIntelligent Recommendation
0 likes · 15 min read
Magnus: Intelligent Data Optimization Service for Iceberg Tables in Bilibili's Lakehouse Platform
Bilibili Tech
Bilibili Tech
Oct 25, 2024 · Big Data

DataFunSummit2024: Next-Generation Data Architecture Technology Summit

DataFunSummit2024, co-hosted by Bilibili, convenes industry experts, scholars, and enterprise leaders across six forums to discuss next‑generation data architecture, showcasing Bilibili’s Iceberg‑based stream‑batch innovations, AI‑BI analytics, NoETL practices, and emerging alternatives to Lambda architecture.

AI+BIBig DataData Architecture
0 likes · 3 min read
DataFunSummit2024: Next-Generation Data Architecture Technology Summit
DataFunTalk
DataFunTalk
Oct 3, 2024 · Big Data

Data Lake Technology Maturity Curve: Architecture, Design Principles, Core Functions, and Open‑Source Solutions

Amid growing data demands, this article explains the data lake technology maturity curve, detailing lake‑warehouse architectural patterns, design principles, core functionalities, and the four leading open‑source solutions (Hudi, Iceberg, Delta Lake, Paimon) to guide enterprises in building flexible, scalable, and governed data platforms.

Big DataData ArchitectureData Lake
0 likes · 10 min read
Data Lake Technology Maturity Curve: Architecture, Design Principles, Core Functions, and Open‑Source Solutions
DataFunTalk
DataFunTalk
Sep 24, 2024 · Big Data

Data Lake Technology Maturity Curve: Architecture Modes, Design Principles, Core Functions, and Applications

This article explains the rapid growth of data-driven businesses, the challenges of traditional data warehouses, and how modern data lake technologies such as Delta Lake, Hudi, Iceberg, and Paimon form a maturity curve that guides enterprises in architecture choices, design principles, core capabilities, and practical applications.

Big DataData LakeDelta Lake
0 likes · 12 min read
Data Lake Technology Maturity Curve: Architecture Modes, Design Principles, Core Functions, and Applications
Sohu Tech Products
Sohu Tech Products
Sep 11, 2024 · Big Data

Tencent Real-time Lakehouse Intelligent Optimization Practice

Tencent’s real‑time lakehouse combines Spark, Flink, StarRocks and Presto compute layers with Iceberg‑based management and HDFS/COS storage, and its Intelligent Optimize Service—comprising Compaction, Expiration, Cleaning, Clustering, Index and Auto‑Engine modules—automatically reduces merge time, improves query performance, enables secondary indexing, and dynamically routes hot partitions, while future plans target cold/hot separation, materialized view acceleration, and AI‑driven optimizations.

Big DataClusteringCompaction
0 likes · 12 min read
Tencent Real-time Lakehouse Intelligent Optimization Practice