Apache Beam 2.28.0 Release Highlights and New Features
Apache Beam 2.28.0 brings a batch of Parquet support improvements, new hash functions in BeamSQL and ZetaSQL, an HLL-based ApproximateDistinct, and I/O connector updates including BigDecimal for Numeric fields in SpannerIO, Beam schema support in ParquetIO, thrift support in KafkaTableProvider, and an option to skip key/value cloning in HadoopFormatIO.
Apache Beam 2.28.0 has been released. Beam provides a unified programming model for defining and executing data processing pipelines, covering ETL, batch, and streaming workloads. It abstracts the execution engine, so the same pipeline can run on any supported distributed processing backend.
Update Highlights
Many improvements related to Parquet support (BEAM-11460, BEAM-8202, BEAM-11526)
Hash functions in BeamSQL (BEAM-10074)
Hash functions in ZetaSQL (BEAM-11624)
ApproximateDistinct built on an HLL implementation (BEAM-10324)
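For readers unfamiliar with HLL (HyperLogLog), the idea behind an approximate distinct count can be sketched in a few lines of plain Python. This is a toy illustration and not Beam's ApproximateDistinct implementation: the class name `HyperLogLog`, the 12-bit precision, and the SHA-1 based hash are all assumptions chosen for the example.

```python
import hashlib
import math


class HyperLogLog:
    """Toy HyperLogLog counter (hypothetical, not Beam's implementation)."""

    def __init__(self, p=12):
        self.p = p                       # number of bits used to pick a register
        self.m = 1 << p                  # number of registers (4096 for p=12)
        self.registers = [0] * self.m
        # Standard bias-correction constant, valid for m >= 128.
        self.alpha = 0.7213 / (1 + 1.079 / self.m)

    def add(self, item):
        # Derive a 64-bit hash from the first 8 bytes of SHA-1 (assumed hash choice).
        digest = hashlib.sha1(str(item).encode("utf-8")).digest()
        x = int.from_bytes(digest[:8], "big")
        idx = x >> (64 - self.p)                  # top p bits select the register
        rest = x & ((1 << (64 - self.p)) - 1)     # remaining 64 - p bits
        # Rank: 1-based position of the leftmost 1-bit within the remaining bits.
        rank = (64 - self.p) - rest.bit_length() + 1
        self.registers[idx] = max(self.registers[idx], rank)

    def merge(self, other):
        # Sketches with equal precision combine by taking the register-wise maximum.
        if other.p != self.p:
            raise ValueError("precision mismatch")
        self.registers = [max(a, b) for a, b in zip(self.registers, other.registers)]

    def estimate(self):
        z = sum(2.0 ** -r for r in self.registers)
        raw = self.alpha * self.m * self.m / z
        zeros = self.registers.count(0)
        if raw <= 2.5 * self.m and zeros:
            # Small-range correction: fall back to linear counting.
            return self.m * math.log(self.m / zeros)
        return raw
```

The register-wise `merge` is what makes this kind of sketch a good fit for distributed pipelines: partial sketches built on separate workers can be combined without revisiting the input.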
I/Os
SpannerIO supports using BigDecimal for Numeric fields (BEAM-11643)
Added Beam schema support to ParquetIO (BEAM-11526)
Support for the ParquetTable writer (BEAM-8202)
The GCP BigQuery sink (streaming inserts) uses runner-determined sharding (BEAM-11408)
PubSub support for the types TIMESTAMP, DATE, TIME, DATETIME (BEAM-11533)
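The SpannerIO item above is easier to appreciate with a small demonstration of why an arbitrary-precision decimal type suits Numeric columns better than a binary float. The sketch below uses Python's `decimal.Decimal` as a stand-in for Java's BigDecimal. The bounds (38 digits of precision, 9 of them after the decimal point) are an assumption about Spanner's NUMERIC type that the release notes do not state, and the helper names are hypothetical.

```python
from decimal import Decimal

# Assumed NUMERIC bounds (not taken from the release notes).
MAX_PRECISION = 38
MAX_SCALE = 9


def fits_numeric(value):
    """Return True if a Decimal fits the assumed NUMERIC precision and scale.

    Trailing zeros in the fractional part count toward the scale here,
    so Decimal("1.0000000000") is rejected even though it equals 1.
    """
    if not value.is_finite():
        return False
    _, digits, exponent = value.as_tuple()
    scale = max(0, -exponent)                       # digits after the decimal point
    integer_digits = max(0, len(digits) + exponent)  # digits before the decimal point
    return scale <= MAX_SCALE and integer_digits <= MAX_PRECISION - MAX_SCALE


def survives_float_roundtrip(text):
    """Return True if parsing the text as a float preserves its exact decimal value."""
    return Decimal(float(text)) == Decimal(text)
```

Under these assumptions, a value such as `12345678901234567890.123456789` passes `fits_numeric` but does not survive a trip through a 64-bit float, which is the loss an exact decimal type avoids.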
New Features / Improvements
ParquetIO adds the readGenericRecords and readFilesGenericRecords methods for reading files with an unknown schema (see PR-13554, BEAM-11460)
Added thrift support to KafkaTableProvider (BEAM-11482)
Added support for skipping key/value cloning in HadoopFormatIO (BEAM-11457)
Support for converting to GenericRecords in the Convert.to transform (BEAM-11571)
Support for reading Parquet files with an unknown schema (BEAM-11460)
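The key/value cloning item in the list above deserves a short illustration. One plausible reason a connector clones every record is that the underlying reader may reuse a single mutable object between records; that motivation is an assumption here, not something the release notes spell out. The Python sketch below uses no Beam or Hadoop APIs, and `ReusingReader`, `FreshReader`, and `collect` are hypothetical names.

```python
import copy


class ReusingReader:
    """Hypothetical reader that overwrites and re-yields one mutable record."""

    def __init__(self, rows):
        self._rows = rows
        self._buffer = {}

    def __iter__(self):
        for key, value in self._rows:
            self._buffer["key"] = key
            self._buffer["value"] = value
            yield self._buffer          # same object every time


class FreshReader:
    """Hypothetical reader that yields a new immutable tuple per record."""

    def __init__(self, rows):
        self._rows = rows

    def __iter__(self):
        for key, value in self._rows:
            yield (key, value)


def collect(reader, skip_clone=False):
    """Gather records, deep-copying each one unless skip_clone is set."""
    out = []
    for record in reader:
        out.append(record if skip_clone else copy.deepcopy(record))
    return out
```

In this sketch, skipping the clone is only safe with `FreshReader`: with `ReusingReader`, every collected entry ends up aliasing the last record. The trade-off is that cloning costs a copy per record, so skipping it is attractive when the records are known to be immutable or never reused.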