Detailed Design and Practical Application of Apache Iceberg at NetEase Cloud Music
This article explains the motivations behind Apache Iceberg, its design principles such as snapshot and MVCC, compares it with Hive, and describes how NetEase Cloud Music adopted Iceberg to improve metadata handling, query performance, and operational stability for massive daily log data.
Apache Iceberg is an open‑source table format originally created by Netflix to overcome limitations of file‑based formats like Parquet and ORC; unlike those formats, Iceberg manages collections of files through a metadata layer, enabling atomic updates, time‑travel, and efficient partition pruning.
Traditional Hive suffers from unreliable updates, costly column renames, excessive partition metadata scans, and fragmented metadata storage, which lead to performance bottlenecks and lack of ACID guarantees.
Iceberg’s design goals include an open, language‑agnostic standard, strong extensibility, automatic metadata handling, and support for schema evolution, versioned snapshots, and time‑travel queries. Each write creates a snapshot containing a list of data files, and MVCC ensures readers always see a consistent version.
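The snapshot/MVCC model can be illustrated with a small sketch (hypothetical Python, not Iceberg's actual implementation): each commit produces an immutable snapshot and atomically moves a "current" pointer, so a reader that started earlier keeps seeing the file list of the snapshot it pinned.

```python
# Minimal sketch of snapshot-based MVCC (illustrative only, not Iceberg code).
class Table:
    def __init__(self):
        self.snapshots = {0: []}   # snapshot_id -> list of data files
        self.current_id = 0

    def commit(self, new_files):
        """A write creates a new immutable snapshot, then atomically
        swaps the 'current' pointer; existing snapshots are untouched."""
        new_id = self.current_id + 1
        self.snapshots[new_id] = self.snapshots[self.current_id] + new_files
        self.current_id = new_id  # an atomic pointer swap in a real system

    def open_reader(self):
        """A reader pins the current snapshot id and keeps seeing a
        consistent file list even if later commits happen."""
        pinned = self.current_id
        return lambda: self.snapshots[pinned]

table = Table()
table.commit(["part-0001.parquet"])
reader = table.open_reader()          # pins snapshot 1
table.commit(["part-0002.parquet"])   # a concurrent write creates snapshot 2
old_view = reader()                   # still sees only part-0001.parquet
new_view = table.open_reader()()      # a fresh reader sees both files
```

Keeping every snapshot around is also what makes time-travel queries possible: reading an old version is just pinning an old snapshot ID.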
Key components are snapshots, manifests, and data files. Snapshots reference multiple manifest files; each manifest records partition information, file statistics, and column‑level metrics, allowing query engines to push down filters and avoid full directory listings.
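The effect of column-level metrics can be sketched as follows (hypothetical Python, heavily simplified): if a manifest records each file's min/max for a column, the engine can skip files whose value range cannot satisfy the predicate, without listing directories or opening the files.

```python
# Illustrative sketch of min/max pruning from manifest metrics (not Iceberg code).
manifest = [
    # (file path, min(logtime), max(logtime)) -- hypothetical entries
    ("data/f1.parquet", 100, 199),
    ("data/f2.parquet", 200, 299),
    ("data/f3.parquet", 300, 399),
]

def prune(manifest, lo, hi):
    """Keep only files whose [min, max] range can contain rows
    with lo <= logtime <= hi."""
    return [path for path, fmin, fmax in manifest
            if fmax >= lo and fmin <= hi]

files = prune(manifest, 250, 320)
# only f2 and f3 overlap [250, 320]; f1 is skipped without being opened
```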
Iceberg stores schema using column IDs, enabling column renames without rewriting data files and preserving compatibility with older files.
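Column-ID-based resolution can be sketched like this (hypothetical Python, not Iceberg's on-disk format): data files key values by field ID, and the table schema maps IDs to names, so a rename only updates the mapping and old files remain readable.

```python
# Sketch of column-ID schema evolution (illustrative, not Iceberg's format).
schema = {1: "user_id", 2: "action"}          # field id -> column name
data_file_row = {1: "u42", 2: "click"}        # files key values by field id

def read_row(schema, row):
    """Resolve columns by field id, so files written under an old
    schema stay readable after a rename."""
    return {name: row[fid] for fid, name in schema.items() if fid in row}

before = read_row(schema, data_file_row)      # column still named 'action'
schema[2] = "behavior"                        # rename field 2; no data rewrite
after = read_row(schema, data_file_row)       # same value, new name
```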
Iceberg is a scalable table format with many best practices built in. At NetEase Cloud Music, daily user-behavior logs amount to 25-30 TB across more than 110,000 files. Reading them directly with Spark causes long initialization times because of the massive number of NameNode requests. After creating an Iceberg table partitioned by hour and behavior type and writing the cleaned logs into it, initialization dropped from 30-60 minutes to 5-10 minutes, greatly improving ETL stability.
The Iceberg table layout consists of a metadata directory (containing snapshot, manifest‑list, and manifest files) and a data directory (holding the actual data files). Each metadata file captures schema, snapshot ID, task information, and manifest locations.
A snapshot file can be inspected with avro-tools, for example:

java -jar avro-tools-1.9.2.jar tojson --pretty snap-8844883026140670978-1-0e32a3de-51d1-4641-9235-181c87a8a2f8.avro

When writing to Iceberg, data must be globally sorted by the partition columns; Spark settings such as spark.driver.maxResultSize and spark.sql.shuffle.partitions should be tuned to avoid driver OOM and to control the number of output files.
uaDF.sort(expr("hour"), expr("group"), expr("action"), expr("logtime"))
  .write.format("iceberg")
  .option("write.parquet.row-group-size-bytes", 256 * 1024 * 1024)
  .mode(SaveMode.Overwrite)
  .save(output)

Iceberg works with underlying formats such as Parquet, ORC, and Avro; its benefits are most evident at very large scale, where precise statistics and snapshot isolation reduce query latency.
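To see why the shuffle settings matter for file count, a back-of-the-envelope calculation helps (assumed numbers: ~25 TB of cleaned logs per day and a 256 MB target file size, matching the row-group option above; one output file per shuffle partition per dynamic partition is a common rough model):

```python
# Rough sizing sketch (assumed numbers, not a tuning recommendation).
daily_bytes = 25 * 1024**4            # ~25 TB of cleaned logs per day
target_file_bytes = 256 * 1024**2     # 256 MB files, matching the row-group size

ideal_file_count = daily_bytes // target_file_bytes
# With spark.sql.shuffle.partitions set too high, the same data fans
# out into far more, smaller files than this ideal count.
```

Even in the ideal case this works out to roughly 100,000 files per day, which is consistent with the file counts cited earlier and explains the pressure on the NameNode.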
Future work includes adding merge support to handle updates/deletes (similar to Hudi and Delta Lake) and integrating Flink to address small‑file problems, positioning Iceberg as the primary storage for batch‑stream unified data warehouses.
DataFunTalk
Dedicated to sharing and discussing big data and AI technology applications, aiming to empower a million data scientists. Regularly hosts live tech talks and curates articles on big data, recommendation/search algorithms, advertising algorithms, NLP, intelligent risk control, autonomous driving, and machine learning/deep learning.