
Apache Iceberg 0.11.0: New Partition Support, SortOrder, Flink Streaming Reader, and Ecosystem Integrations

This article covers the core enhancements in Apache Iceberg 0.11.0: partition changes, SortOrder, extensive Flink and Spark integrations, CDC/Upsert support, and hash‑based write distribution to reduce small files. It also outlines the 0.12.0 roadmap and gives practical SQL and API examples for data‑lake practitioners.

DataFunTalk

On January 27, 2021, Apache Iceberg released version 0.11.0. It introduces core features such as partition definitions that can be changed at the Core API level and a new SortOrder specification, which clusters data on high‑cardinality columns to reduce small files and improve read efficiency.

Flink Integration: The release adds a Flink Streaming Reader, making Flink the first engine to support both streaming and batch reads and writes on Iceberg. It also implements limit and filter pushdown, supports ingestion of CDC and Upsert events, and supports write.distribution-mode=hash to mitigate small‑file generation.

Users can launch streaming jobs via Flink SQL, for example:

SET execution.type = streaming;
SET table.dynamic-table-options.enabled=true;
SELECT * FROM sample /*+ OPTIONS('streaming'='true','monitor-interval'='1s') */;
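The monitor-interval option suggests a polling model: at each interval the reader checks for snapshots committed since the last one it consumed and emits only the newly appended files. The following Java sketch illustrates that idea with a hypothetical in-memory Snapshot class. It is not Iceberg's or Flink's implementation, and all class and method names are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical model of incremental snapshot polling; not Iceberg's API.
final class IncrementalPollSketch {

    // One committed snapshot: an increasing sequence number plus the files it appended.
    static final class Snapshot {
        final long sequence;
        final List<String> addedFiles;

        Snapshot(long sequence, String... addedFiles) {
            this.sequence = sequence;
            this.addedFiles = Arrays.asList(addedFiles);
        }
    }

    // Sequence number of the newest snapshot already emitted downstream.
    private long lastConsumed = 0;

    // One monitor tick: return files from snapshots newer than lastConsumed, then advance.
    List<String> poll(List<Snapshot> tableHistory) {
        List<String> fresh = new ArrayList<>();
        long newest = lastConsumed;
        for (Snapshot s : tableHistory) {
            if (s.sequence > lastConsumed) {
                fresh.addAll(s.addedFiles);
                newest = Math.max(newest, s.sequence);
            }
        }
        lastConsumed = newest;
        return fresh;
    }

    public static void main(String[] args) {
        List<Snapshot> history = new ArrayList<>();
        history.add(new Snapshot(1, "a.parquet", "b.parquet"));

        IncrementalPollSketch reader = new IncrementalPollSketch();
        System.out.println(reader.poll(history)); // [a.parquet, b.parquet]

        history.add(new Snapshot(2, "c.parquet"));
        System.out.println(reader.poll(history)); // [c.parquet]
        System.out.println(reader.poll(history)); // []
    }
}
```

A shorter interval lowers latency at the cost of more frequent metadata checks.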

Limit and filter pushdown are demonstrated with queries such as SELECT * FROM sample LIMIT 10; and various WHERE clauses covering equality, inequality, range, NULL checks, and LIKE patterns.
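Filter pushdown pays off because Iceberg keeps per-file column statistics such as lower and upper bounds, so a predicate handed to the table scan can rule out whole data files before any rows are read. The Java sketch below models that pruning step for a range predicate on id. FileStats and planFiles are hypothetical names for illustration, not Iceberg classes, and the real planner handles many more predicate types.

```java
import java.util.ArrayList;
import java.util.List;

// Simplified model of min/max file pruning; not Iceberg's planner.
final class PushdownPruningSketch {

    // Per-file statistics for one column (id): smallest and largest value in the file.
    static final class FileStats {
        final String path;
        final long minId;
        final long maxId;

        FileStats(String path, long minId, long maxId) {
            this.path = path;
            this.minId = minId;
            this.maxId = maxId;
        }
    }

    // Keep only files whose [minId, maxId] range overlaps the predicate lo <= id <= hi.
    static List<String> planFiles(List<FileStats> files, long lo, long hi) {
        List<String> selected = new ArrayList<>();
        for (FileStats f : files) {
            if (f.maxId >= lo && f.minId <= hi) {
                selected.add(f.path);
            }
        }
        return selected;
    }

    public static void main(String[] args) {
        List<FileStats> files = new ArrayList<>();
        files.add(new FileStats("f1.parquet", 1, 100));
        files.add(new FileStats("f2.parquet", 101, 200));
        files.add(new FileStats("f3.parquet", 201, 300));

        // WHERE id BETWEEN 150 AND 180 only needs the middle file.
        System.out.println(planFiles(files, 150, 180)); // [f2.parquet]
    }
}
```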

CDC and Upsert Support: The integration enables real‑time ingestion of relational database binlogs and Flink‑generated upsert streams into Iceberg. So far it has passed a first stage of correctness validation and medium‑scale stability testing (for example, in deployments at Tencent, Bilibili, and AutoHome).
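The correctness target for CDC and upsert ingestion is that, after the change events are replayed in order, the table holds the latest row for each key. The Java sketch below models only that end state with an in-memory map. The Kind values mirror Flink's RowKind (INSERT, UPDATE_BEFORE, UPDATE_AFTER, DELETE), while Change and apply are hypothetical names; how Iceberg physically stores deletes and updates is not shown.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Models the expected table state after replaying a changelog; not Iceberg's write path.
final class ChangelogApplySketch {

    // Same four change kinds as Flink's RowKind.
    enum Kind { INSERT, UPDATE_BEFORE, UPDATE_AFTER, DELETE }

    // One change event for a table with primary key id and a single data column.
    static final class Change {
        final Kind kind;
        final long id;
        final String data;

        Change(Kind kind, long id, String data) {
            this.kind = kind;
            this.id = id;
            this.data = data;
        }
    }

    // Replay events in order: inserts and update-afters write the row, the others retract it.
    static Map<Long, String> apply(List<Change> changes) {
        Map<Long, String> table = new TreeMap<>();
        for (Change c : changes) {
            switch (c.kind) {
                case INSERT:
                case UPDATE_AFTER:
                    table.put(c.id, c.data);
                    break;
                case UPDATE_BEFORE:
                case DELETE:
                    table.remove(c.id);
                    break;
            }
        }
        return table;
    }

    public static void main(String[] args) {
        List<Change> log = new ArrayList<>();
        log.add(new Change(Kind.INSERT, 1, "a"));
        log.add(new Change(Kind.INSERT, 2, "b"));
        log.add(new Change(Kind.UPDATE_BEFORE, 1, "a"));
        log.add(new Change(Kind.UPDATE_AFTER, 1, "a2"));
        log.add(new Change(Kind.DELETE, 2, "b"));
        System.out.println(apply(log)); // {1=a2}
    }
}
```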

Spark 3 Integration: The release adds high‑level SQL capabilities including MERGE INTO, DELETE FROM, ALTER TABLE ... ADD/DROP PARTITION, ALTER TABLE ... WRITE ORDERED BY, and stored procedures invoked with CALL for file compaction and cleanup.

Ecosystem Extensions: New modules integrate with AWS S3 and Glue Catalog, and with the open‑source catalog service Nessie.

Hash‑Based Write Distribution: To reduce small files, users can set the table property:

CREATE TABLE sample (
    id BIGINT,
    data STRING
) PARTITIONED BY (data) WITH (
    'write.distribution-mode'='hash'
);
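To see why hash distribution cuts the file count, consider a simplified model in which every writer task opens one data file per partition it receives rows for. Without a shuffle, rows for the same partition are spread across all writers; with a hash shuffle on the partition key, each partition lands on exactly one writer. The Java sketch below counts files under both schemes. It is a toy model with hypothetical names, using round-robin as the assumed unshuffled routing and String.hashCode as a stand-in hash, not a description of Flink's actual operators.

```java
import java.util.HashSet;
import java.util.Set;

// Toy model: one data file per distinct (writer, partition) pair.
final class WriteDistributionSketch {

    // No shuffle: record i goes to writer i % writers regardless of its partition.
    static int filesWithoutShuffle(String[] partitionOfRecord, int writers) {
        Set<String> files = new HashSet<>();
        for (int i = 0; i < partitionOfRecord.length; i++) {
            int writer = i % writers;
            files.add(writer + "/" + partitionOfRecord[i]);
        }
        return files.size();
    }

    // Hash shuffle: all records of a partition are routed to the same writer.
    static int filesWithHash(String[] partitionOfRecord, int writers) {
        Set<String> files = new HashSet<>();
        for (String partition : partitionOfRecord) {
            int writer = Math.floorMod(partition.hashCode(), writers);
            files.add(writer + "/" + partition);
        }
        return files.size();
    }

    public static void main(String[] args) {
        String[] partitions = {"a", "b", "c"};
        String[] records = new String[12];
        for (int i = 0; i < records.length; i++) {
            records[i] = partitions[i % partitions.length];
        }
        // 4 writers x 3 partitions: every writer sees every partition.
        System.out.println(filesWithoutShuffle(records, 4)); // 12
        // With hashing, each partition has a single writer.
        System.out.println(filesWithHash(records, 4));       // 3
    }
}
```

The trade-off in this model is that a partition with heavy traffic is handled by a single writer, so skewed keys can become a bottleneck.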

For finer control, the Java API can add bucket fields:

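import org.apache.iceberg.expressions.Expressions;
// table is assumed to be an already loaded org.apache.iceberg.Table.
// Evolve the partition spec by adding a 32-bucket hash partition on id.
table.updateSpec()
    .addField(Expressions.bucket("id", 32))
    .commit();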
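For intuition about what bucket("id", 32) assigns to each row: per the Iceberg table spec, the bucket transform takes a 32-bit Murmur3 hash of the value (a long is hashed as its 8 little-endian bytes, seed 0), clears the sign bit, and takes the remainder modulo the bucket count. The Java sketch below is a from-scratch illustration of that rule for long values, not Iceberg's code; the class and method names are assumptions.

```java
// Illustrative re-implementation of the bucket rule for long values; not Iceberg's code.
final class BucketSketch {

    // 32-bit Murmur3 (x86 variant, seed 0) over the 8 little-endian bytes of a long.
    static int murmur3Long(long value) {
        int h1 = 0;
        h1 = mixH1(h1, mixK1((int) value));          // low 4 bytes first
        h1 = mixH1(h1, mixK1((int) (value >>> 32))); // then high 4 bytes
        return fmix(h1 ^ 8);                         // 8 = input length in bytes
    }

    private static int mixK1(int k1) {
        k1 *= 0xcc9e2d51;
        k1 = Integer.rotateLeft(k1, 15);
        k1 *= 0x1b873593;
        return k1;
    }

    private static int mixH1(int h1, int k1) {
        h1 ^= k1;
        h1 = Integer.rotateLeft(h1, 13);
        return h1 * 5 + 0xe6546b64;
    }

    // Final avalanche step of Murmur3.
    private static int fmix(int h) {
        h ^= h >>> 16;
        h *= 0x85ebca6b;
        h ^= h >>> 13;
        h *= 0xc2b2ae35;
        h ^= h >>> 16;
        return h;
    }

    // Clear the sign bit, then reduce to a bucket index in [0, numBuckets).
    static int bucket(long id, int numBuckets) {
        return (murmur3Long(id) & Integer.MAX_VALUE) % numBuckets;
    }

    public static void main(String[] args) {
        for (long id = 1; id <= 5; id++) {
            System.out.println("id=" + id + " -> bucket " + bucket(id, 32));
        }
    }
}
```

Because the bucket depends only on the value, rows with the same id always map to the same bucket, which bounds the number of partitions a high‑cardinality column can create.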

Additional strategies include periodic major compaction jobs and a planned automatic small‑file merger for version 0.12.0.

Adoption and Community Impact: Major users such as Tencent, Netflix, AutoHome, and Tongcheng-Elong have migrated large Hive tables to Iceberg, leveraging Flink+Iceberg for real‑time analytics and cost savings. The article also notes contributions from Alibaba, Apple, and Cloudera, and outlines the roadmap for version 0.12.0 to further enhance CDC/Upsert stability, performance, and usability.
