Tagged articles
4 articles
Alibaba Cloud Big Data AI Platform
Apr 17, 2026 · Big Data

What Spark 4.0 Brings: VARIANT Type, Native SQL UDFs, and Serverless Enhancements

Apache Spark 4.0 introduces a high‑performance VARIANT data type for semi‑structured JSON, native SQL UDFs that sidestep Python UDF bottlenecks, a richer Python DataSource API, a new SQL pipe syntax, and upgraded state management for Structured Streaming. Combined with Alibaba Cloud EMR Serverless optimizations, these changes deliver up to 30% speed gains and a smooth migration path from Spark 3.x.

Apache Spark · Python API · SQL UDF
12 min read
Past Memory Big Data
Apr 13, 2026 · Big Data

11 Critical Pitfalls to Watch When Upgrading from Spark 3 to Spark 4

Spark 4.0 delivers 20‑50% performance gains and features such as Spark Connect, the VARIANT type, and enhanced SQL, but it also introduces breaking changes: JDK 17 is now the minimum, Scala 2.12 support is dropped, ANSI mode is on by default, Mesos support is removed, and JDBC type mappings have changed. Upgrades therefore need careful planning and a staged migration to avoid runtime failures.

ANSI mode · Apache Spark · JDK 17
19 min read
Past Memory Big Data
Apr 13, 2026 · Big Data

Why Iceberg v3 Marks the “iPhone Moment” for Data Lakehouses

Apache Iceberg v3 introduces deletion vectors, row‑level lineage, a native VARIANT type, default column values, and nanosecond timestamps. Together these deliver up to ten‑fold faster updates, native CDC, and seamless handling of semi‑structured data, and with industry‑wide adoption they effectively end the format war between lake and warehouse solutions.

Apache Iceberg · Data Lakehouse · Default Column Values
14 min read
Big Data Technology & Architecture
Oct 13, 2025 · Databases

Apache Doris 3.1 Unveiled: Variant, Index, and Lakehouse Boosts

The Apache Doris 3.1 release strengthens lakehouse capabilities with major upgrades to the VARIANT data type, vertical compaction, inverted index storage, new tokenizers, and enhanced materialized view support for Iceberg, Paimon, and Hudi. It also adds numerous query‑performance optimizations, such as faster partition pruning (including dynamic partition pruning), making it easier to handle tables with thousands of columns and large‑scale semi‑structured data.

Apache Doris · Databases · Lakehouse
8 min read