
How Xiaomi Uses Iceberg for Real‑Time Streaming and Batch Data Lakes

This article introduces Iceberg’s table‑format fundamentals, details Xiaomi’s large‑scale deployment of Iceberg for CDC and log ingestion, explores their streaming‑batch integration experiments, outlines future roadmap items, and provides a comprehensive Q&A covering practical challenges and solutions.

ITPUB

Iceberg Technology Overview

Iceberg is a table format for large analytical datasets. It sits between compute engines such as Spark, Trino, Flink, Hive, and Presto and the underlying storage layer, decoupling the two. It supports the Parquet, ORC, and Avro file formats and can store data on S3, OSS, or HDFS. Because compute and storage are separated, users can choose engines freely and do not need to deal with file‑format details.

Iceberg architecture diagram

Iceberg organizes a table in four levels. At the bottom are the data files. Manifest files list those data files together with the partition each one belongs to. A snapshot captures the full state of the table after a commit, and metadata files track the list of snapshots. Because a reader always resolves one complete snapshot, this structure enables transactional reads, time‑travel queries, and efficient data management.
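The four levels can be pictured with a small in-memory model. This is a toy sketch, not Iceberg's API: the class names (DataFile, Manifest, Snapshot, TableMetadata), the file paths, and the rows are all invented for illustration, and real manifests carry far more detail than shown here.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class DataFile:
    path: str
    partition: str
    rows: Tuple[Tuple[int, str], ...]  # (id, value) pairs, held inline for the sketch


@dataclass(frozen=True)
class Manifest:
    files: Tuple[DataFile, ...]  # a manifest lists data files and their partitions


@dataclass(frozen=True)
class Snapshot:
    snapshot_id: int
    manifests: Tuple[Manifest, ...]  # the complete table state at commit time


@dataclass
class TableMetadata:
    snapshots: List[Snapshot] = field(default_factory=list)
    current_id: Optional[int] = None

    def _get(self, snapshot_id: int) -> Snapshot:
        return next(s for s in self.snapshots if s.snapshot_id == snapshot_id)

    def commit_append(self, new_files: List[DataFile]) -> int:
        """Create a snapshot that reuses the old manifests and adds one more."""
        previous = () if self.current_id is None else self._get(self.current_id).manifests
        snapshot = Snapshot(len(self.snapshots) + 1, previous + (Manifest(tuple(new_files)),))
        self.snapshots.append(snapshot)
        self.current_id = snapshot.snapshot_id  # the commit is one pointer swap
        return snapshot.snapshot_id

    def scan(self, snapshot_id: Optional[int] = None) -> List[Tuple[int, str]]:
        """Read the rows visible in one snapshot (the current one by default)."""
        target = self.current_id if snapshot_id is None else snapshot_id
        if target is None:
            return []
        snapshot = self._get(target)
        return [row for m in snapshot.manifests for f in m.files for row in f.rows]


table = TableMetadata()
first = table.commit_append([DataFile("f1.parquet", "day=1", ((1, "a"), (2, "b")))])
second = table.commit_append([DataFile("f2.parquet", "day=2", ((3, "c"),))])

latest_rows = table.scan()            # reads the current snapshot
time_travel_rows = table.scan(first)  # reads the table as of the first commit
```

The second commit does not touch the first snapshot, so a reader pinned to the earlier snapshot id keeps seeing exactly the rows that were committed at that point. That is the property time travel and read/write isolation rely on.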

Iceberg file hierarchy

Iceberg’s three core advantages are:

Transactional guarantees that prevent dirty writes and enable snapshot‑based read/write isolation.

Implicit partitioning with transform functions, allowing automatic partition determination and partition evolution.

Row‑level updates: V1 tables apply them by rewriting data files (Copy‑On‑Write), while V2 tables add delete files so that changes can be merged at read time (Merge‑On‑Read).
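The difference between the two row-level update strategies can be sketched in a few lines. This is a simplified model under stated assumptions, not Iceberg code: files are plain Python lists, the function names are hypothetical, and a delete is recorded as a (key, sequence) pair that hides matching rows only in files written before it.

```python
def copy_on_write_update(data_file, key, new_value):
    """Rewrite the whole file with the changed row; readers need no extra work."""
    return [(k, new_value if k == key else v) for k, v in data_file]


def merge_on_read_update(data_files, delete_keys, key, new_value):
    """Leave existing files untouched: record a delete and append a small new file."""
    sequence = len(data_files)  # position of the file that is about to be appended
    return data_files + [[(key, new_value)]], delete_keys + [(key, sequence)]


def merge_on_read_scan(data_files, delete_keys):
    """Apply deletes at read time; a delete hides matching rows in older files only."""
    rows = []
    for index, data_file in enumerate(data_files):
        for k, v in data_file:
            hidden = any(dk == k and index < seq for dk, seq in delete_keys)
            if not hidden:
                rows.append((k, v))
    return sorted(rows)


original = [(1, "alice"), (2, "bob"), (3, "carol")]

# Copy-on-write: one changed row costs a full rewrite of the file.
cow_file = copy_on_write_update(original, 2, "robert")

# Merge-on-read: the write is small, and the reader reconciles the result.
mor_files, mor_deletes = merge_on_read_update([original], [], 2, "robert")
mor_rows = merge_on_read_scan(mor_files, mor_deletes)
```

Both paths return the same logical rows. Copy-on-write pays at write time, while merge-on-read pays at read time and accumulates small files and deletes until they are compacted.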

Iceberg advantages

Xiaomi’s Practical Deployment

Xiaomi operates over 4,000 Iceberg tables, storing roughly 8 PB of data. More than 1,000 tables are V1 (log‑type workloads) and over 3,000 are V2 (CDC‑type workloads). CDC pipelines ingest MySQL, Oracle, and TiDB change logs via Flink CDC, push them to an intermediate MQ, and finally write to Iceberg V2 tables.

Xiaomi CDC pipeline

Key benefits observed include near‑real‑time analysis of CDC data, Flink‑native streaming support, automatic schema evolution, and lower cost from replacing dedicated storage systems (e.g., Kudu) with Iceberg.

Real‑time analytics on CDC streams.

Flink streaming consumption of Iceberg tables.

Automatic schema synchronization from upstream sources.

Lower storage costs compared with legacy solutions.

Iceberg cost‑saving diagram

Two common ingestion patterns are presented. Bucket partitioning hashes the key into a fixed number of buckets. Truncate partitioning derives the partition from ranges of an auto‑increment ID, so new partitions appear as the IDs grow. For auto‑increment primary keys, Truncate partitioning is recommended: it avoids the compaction bottleneck of buckets that keep growing, and it avoids having to change the bucket count later.
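A rough way to see why the two transforms behave differently as a table grows is to count rows per partition. The sketch below is an illustration under assumptions: CRC32 stands in for the hash (Iceberg's bucket transform uses a 32-bit Murmur3 hash), and the bucket count, truncate width, and ID ranges are made-up numbers.

```python
from collections import Counter
import zlib


def bucket_partition(record_id: int, num_buckets: int) -> int:
    """Hash the key into a fixed number of buckets (CRC32 is a stand-in hash)."""
    return zlib.crc32(str(record_id).encode("utf-8")) % num_buckets


def truncate_partition(record_id: int, width: int) -> int:
    """Round the key down to a multiple of width, e.g. 12345 -> 10000 for width 10000."""
    return record_id - (record_id % width)


def partition_sizes(ids, transform) -> Counter:
    """Count how many rows land in each partition value."""
    return Counter(transform(i) for i in ids)


small = range(1, 20_001)    # the table early on
large = range(1, 200_001)   # the same table after ten times more inserts

bucket_small = partition_sizes(small, lambda i: bucket_partition(i, 4))
bucket_large = partition_sizes(large, lambda i: bucket_partition(i, 4))
trunc_small = partition_sizes(small, lambda i: truncate_partition(i, 10_000))
trunc_large = partition_sizes(large, lambda i: truncate_partition(i, 10_000))
```

With buckets, the partition count stays at 4 and every bucket keeps growing, so each compaction has more data to rewrite. With truncate, the largest partition stays at 10,000 rows and growth shows up as additional partitions, which keeps the work per partition bounded.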

Bucket vs Truncate partitioning

Streaming‑Batch Integration Exploration

Building on the existing Lambda architecture, Xiaomi replaces both Hive and Kafka storage layers with Iceberg across ODS, DWD, and DWS stages. This unifies storage, reduces duplication, and enables Flink to serve as both batch and streaming engine.

Unified Iceberg storage stack

To address data inconsistencies between real‑time and batch layers, Xiaomi employs offline correction jobs using either partition‑overwrite or MERGE INTO statements. Overwrite is simple and fast but may affect downstream streaming jobs; MERGE INTO provides incremental updates with lower impact but incurs join overhead.
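The trade-off between the two correction strategies can be illustrated with a toy table held in dictionaries. This is a sketch only: the function names, the partition value, and the order amounts are hypothetical, and a real MERGE INTO finds the differing rows through a join rather than a dictionary lookup.

```python
def overwrite_partition(table, partition, corrected_rows):
    """Replace every row of one partition with the corrected batch result."""
    new_table = dict(table)
    new_table[partition] = dict(corrected_rows)
    # Every row counts as rewritten, whether or not its value changed.
    changed = set(table.get(partition, {})) | set(corrected_rows)
    return new_table, changed


def merge_into(table, partition, corrected_rows):
    """Touch only rows whose value differs, plus rows missing from the target."""
    current = table.get(partition, {})
    changed = {k for k, v in corrected_rows.items() if current.get(k) != v}
    merged = dict(current)
    merged.update({k: corrected_rows[k] for k in changed})
    new_table = dict(table)
    new_table[partition] = merged
    return new_table, changed


# Hypothetical real-time result with one wrong amount, and the offline correction.
realtime = {"2024-01-01": {"order-1": 100, "order-2": 250, "order-3": 75}}
corrected = {"order-1": 100, "order-2": 260, "order-3": 75}

overwritten, overwrite_changed = overwrite_partition(realtime, "2024-01-01", corrected)
merged, merge_changed = merge_into(realtime, "2024-01-01", corrected)
```

Both strategies end with the same table, but the overwrite reports all three rows as changed while the merge reports only order-2, which is why a downstream incremental reader is disturbed less by the merge. One difference this sketch leaves out: an overwrite also drops rows that are absent from the corrected batch, whereas this merge keeps them.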

Overwrite vs MERGE INTO

Two isolation levels for MERGE INTO are discussed. Serializable (the default) aborts the merge when a concurrent commit has written data that could match the merge condition. Snapshot isolation skips that check and lets the merge commit anyway, so conflicting writes can go unnoticed and the result may be nondeterministic.
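The two levels differ in which concurrent commits they treat as a conflict. The check below is a deliberately simplified model, not Iceberg's validation code: the Commit record, the key sets, and the file names are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import FrozenSet, List


@dataclass(frozen=True)
class Commit:
    added_keys: FrozenSet[int]        # keys of rows written in newly added files
    rewritten_files: FrozenSet[str]   # existing files this commit replaced


def can_commit(merge_keys, merge_files, concurrent: List[Commit], level: str) -> bool:
    """Decide whether a merge that started from an older snapshot may commit."""
    for other in concurrent:
        # Both levels reject a merge whose input files were changed underneath it.
        if other.rewritten_files & merge_files:
            return False
        # Only serializable also rejects new rows that the merge condition could match.
        if level == "serializable" and other.added_keys & merge_keys:
            return False
    return True


merge_keys = frozenset({1, 2, 3})          # keys the merge condition covers
merge_files = frozenset({"f1.parquet"})    # files the merge read and rewrites

concurrent_append = [Commit(frozenset({2}), frozenset())]
concurrent_rewrite = [Commit(frozenset(), frozenset({"f1.parquet"}))]

append_serializable = can_commit(merge_keys, merge_files, concurrent_append, "serializable")
append_snapshot = can_commit(merge_keys, merge_files, concurrent_append, "snapshot")
rewrite_snapshot = can_commit(merge_keys, merge_files, concurrent_rewrite, "snapshot")
```

In this model a concurrent append of key 2 fails the serializable merge but passes under snapshot isolation, and the appended row is then never seen by the merge. A concurrent rewrite of a file the merge depends on is rejected at either level.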

MERGE INTO isolation levels

Future Roadmap

Continue tracking Flink CDC 2.0 for smoother full‑to‑incremental switches.

Optimize compaction governance services to reduce resource contention.

Integrate Iceberg with Flink 1.14 to improve stability and back‑pressure handling.

Future roadmap illustration

Q&A Highlights

Rollback to a specific snapshot: Use Iceberg's rollback operation, for example the rollback_to_snapshot procedure in Spark.

Upsert causing many small files: Increase compaction frequency and lengthen the checkpoint interval, since every checkpoint commits a new set of files.

Upsert partitioning strategy: Updates are based on primary keys; the partition column must be part of the primary key.

Flink reading monthly/annual billing data: Implement windowed aggregations on the incremental stream.

Why choose Iceberg over Hudi: Iceberg offered earlier Flink support and better community activity at the time of evaluation.

Current Iceberg table versions at Xiaomi: Both V1 and V2 are used; V2 for CDC, V1 for log workloads.

Migration ratio from Hive to Iceberg: Migration is gradual due to legacy Spark 2.3 constraints; new projects adopt Iceberg more readily.

Key benefits for users: Accurate ODS data, implicit partitioning for delayed data, and ZSTD compression for cost savings.

Performance of upserts on large data: The write path is not the bottleneck, although compaction can be slow; for batch loads, deduplicate the data before merging.

Storage format after Iceberg adoption: Unified Parquet storage on HDFS.

Difference from Alluxio: Iceberg provides table abstraction; Alluxio offers caching for faster reads.

Support for ClickHouse: Not directly supported yet; external tables may be considered.

Relation to StarRocks: Iceberg focuses on pipeline construction in data warehouses; StarRocks targets multidimensional analytics.

Why is there a Spark task in the DWD layer of the Flink CDC pipeline: Spark is currently the only engine that supports MERGE INTO; a custom Flink‑like merge is also implemented.

Schema evolution handling: Custom jobs capture DDL changes and apply them to Iceberg schemas; Kafka DDL is not captured directly.

Low‑latency data reading: Aim for minute‑level latency; configure Flink checkpoints to 3–5 minutes.
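Several answers above come back to the checkpoint interval, which trades latency against file count. A back-of-the-envelope estimate makes the trade-off concrete. The formula is a rough upper bound under assumptions (every writer emits one data file per touched partition at every checkpoint), and the parallelism and partition counts are made-up numbers.

```python
def files_per_day(checkpoint_minutes: float, writer_parallelism: int,
                  partitions_per_writer: int) -> int:
    """Upper bound: each writer emits one file per touched partition per checkpoint."""
    checkpoints = int(24 * 60 / checkpoint_minutes)
    return checkpoints * writer_parallelism * partitions_per_writer


# Hypothetical job: 4 writer subtasks, each touching 8 partitions per checkpoint.
one_minute = files_per_day(1, writer_parallelism=4, partitions_per_writer=8)
five_minutes = files_per_day(5, writer_parallelism=4, partitions_per_writer=8)
```

Under these assumptions, moving from a 1-minute to a 5-minute checkpoint cuts the daily file count from 46,080 to 9,216 while data is still visible within minutes. Upsert jobs on V2 tables also write delete files, so their real counts would be higher than this estimate.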

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact admin@besthub.dev and we will review it promptly.
