How Xiaomi Uses Iceberg for Real‑Time Streaming and Batch Data Lakes
This article introduces Iceberg’s table‑format fundamentals, details Xiaomi’s large‑scale deployment of Iceberg for CDC and log ingestion, explores their streaming‑batch integration experiments, outlines future roadmap items, and provides a comprehensive Q&A covering practical challenges and solutions.
Iceberg Technology Overview
Iceberg is a table format designed for large analytical datasets that abstracts compute engines such as Spark, Trino, Flink, Hive, and Presto from the underlying storage layer. It supports file formats like Parquet, ORC, and Avro, and can store data on S3, OSS, or HDFS. The architecture separates compute and storage, allowing flexible engine choice and hiding file‑format details from users.
Iceberg organizes data in four levels: data files at the bottom, manifest files that record file‑to‑partition mappings, snapshots that capture the state of the table after each commit, and metadata files that track snapshots. This structure enables transactional reads, time‑travel queries, and efficient data management.
Iceberg’s three core advantages are:
Transactional guarantees that prevent dirty writes and enable snapshot‑based read/write isolation.
Implicit partitioning with transform functions, allowing automatic partition determination and partition evolution.
Row‑level updates via V1 (Copy‑On‑Write) and V2 (Merge‑On‑Read) table formats.
Xiaomi’s Practical Deployment
Xiaomi operates over 4,000 Iceberg tables, storing roughly 8 PB of data. More than 1,000 tables are V1 (log‑type workloads) and over 3,000 are V2 (CDC‑type workloads). CDC pipelines ingest MySQL, Oracle, and TiDB change logs via Flink CDC, push them to an intermediate MQ, and finally write to Iceberg V2 tables.
Key benefits observed include near‑real‑time analysis of CDC data, Flink‑native streaming support, automatic schema evolution, and cost reduction by replacing proprietary databases (e.g., Kudu) with Iceberg.
Real‑time analytics on CDC streams.
Flink streaming consumption of Iceberg tables.
Automatic schema synchronization from upstream sources.
Lower storage costs compared with legacy solutions.
Two common ingestion patterns are presented: Bucket partitioning (fixed number of buckets) and Truncate partitioning (dynamic range based on auto‑increment IDs). For auto‑increment primary keys, Truncate partitioning is recommended to avoid compaction bottlenecks and allow easier bucket changes.
Streaming‑Batch Integration Exploration
Building on the existing Lambda architecture, Xiaomi replaces both Hive and Kafka storage layers with Iceberg across ODS, DWD, and DWS stages. This unifies storage, reduces duplication, and enables Flink to serve as both batch and streaming engine.
To address data inconsistencies between real‑time and batch layers, Xiaomi employs offline correction jobs using either partition‑overwrite or MERGE INTO statements. Overwrite is simple and fast but may affect downstream streaming jobs; MERGE INTO provides incremental updates with lower impact but incurs join overhead.
Two isolation levels for MERGE INTO are discussed: serializable (default) which aborts on conflicting writes, and snapshot isolation which may silently overwrite conflicts, potentially leading to nondeterministic results.
Future Roadmap
Continue tracking Flink CDC 2.0 for smoother full‑to‑incremental switches.
Optimize compaction governance services to reduce resource contention.
Integrate Iceberg with Flink 1.14 to improve stability and back‑pressure handling.
Q&A Highlights
Rollback to a specific snapshot: Use the ROLLBACK command.
Upsert causing many small files: Increase compaction frequency and adjust checkpoint intervals.
Upsert partitioning strategy: Updates are based on primary keys; the partition column must be part of the primary key.
Flink reading monthly/annual billing data: Implement windowed aggregations on the incremental stream.
Why choose Iceberg over Hudi: Iceberg offered earlier Flink support and better community activity at the time of evaluation.
Current Iceberg table versions at Xiaomi: Both V1 and V2 are used; V2 for CDC, V1 for log workloads.
Migration ratio from Hive to Iceberg: Migration is gradual due to legacy Spark 2.3 constraints; new projects adopt Iceberg more readily.
Key benefits for users: Accurate ODS data, implicit partitioning for delayed data, and ZSTD compression for cost savings.
Performance of upserts on large data: Write path is not a bottleneck; compaction may be slower; recommend deduplication before merge for batch loads.
Storage format after Iceberg adoption: Unified Parquet storage on HDFS.
Difference from Alluxio: Iceberg provides table abstraction; Alluxio offers caching for faster reads.
Support for ClickHouse: Not directly supported yet; external tables may be considered.
Relation to StarRocks: Iceberg focuses on pipeline construction in data warehouses; StarRocks targets multidimensional analytics.
Why Spark task in DWD layer of Flink CDC pipeline: Spark currently is the only engine supporting MERGE INTO; a custom Flink‑like merge is also implemented.
Schema evolution handling: Custom jobs capture DDL changes and apply them to Iceberg schemas; Kafka DDL is not captured directly.
Low‑latency data reading: Aim for minute‑level latency; configure Flink checkpoints to 3–5 minutes.
Signed-in readers can open the original source through BestHub's protected redirect.
This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.
ITPUB
Official ITPUB account sharing technical insights, community news, and exciting events.
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.
