Key New Features of Apache Doris 3.0: Storage‑Compute Separation, Lakehouse Integration, Semi‑Structured Data, ETL Enhancements, Materialized Views, and Java UDTF

Apache Doris 3.0 introduces storage‑compute separation, native lakehouse write‑back, optimized Variant handling for semi‑structured data, stronger ETL transaction support, enhanced multi‑table materialized views, and Java UDTF capabilities, providing developers with more flexible, cost‑effective, and high‑performance analytics solutions.

Big Data Technology & Architecture

Oct 21, 2024

Last week the Doris community released version 3.0, a milestone for the lake‑warehouse convergence roadmap, and the official documentation has been updated.

Storage‑Compute Separation Architecture

Starting with version 3.0, Doris supports a storage‑compute separation mode, allowing users to choose between integrated or separated deployment.

The separation decouples compute nodes from primary data storage, introducing a shared storage layer (HDFS or object storage) as a unified data repository, which enables multiple clusters to share the same data and significantly reduce costs.

Version 2.0 offered only limited decoupling, mainly through querying external tables; version 3.0 extends the separation to Doris's own primary data.
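As a rough sketch of what separated mode looks like from SQL, the statements below register object storage as a shared storage vault and place a table in it. The vault name, endpoint, bucket, credentials, and the `demo.orders` table are all hypothetical, and the property names are written as the Doris 3.0 storage vault documentation describes them, so verify them against your release before use.

```sql
-- Hypothetical names: s3_vault, the endpoint, bucket, keys, and demo.orders.
-- Register shared object storage as a storage vault
-- (only available when the cluster runs in storage-compute separation mode).
CREATE STORAGE VAULT IF NOT EXISTS s3_vault
PROPERTIES (
    "type" = "S3",
    "s3.endpoint" = "s3.us-east-1.amazonaws.com",
    "s3.region" = "us-east-1",
    "s3.bucket" = "my-doris-bucket",
    "s3.root.path" = "doris/data",
    "s3.access_key" = "<access-key>",
    "s3.secret_key" = "<secret-key>",
    "provider" = "S3"
);

CREATE DATABASE IF NOT EXISTS demo;

-- Store this table's data in the vault instead of on local BE disks.
CREATE TABLE demo.orders (
    order_id    BIGINT,
    customer_id BIGINT,
    amount      DECIMAL(18, 2),
    dt          DATE
)
DUPLICATE KEY(order_id)
DISTRIBUTED BY HASH(order_id) BUCKETS 8
PROPERTIES ("storage_vault_name" = "s3_vault");
```

Because the data lives in the shared layer, compute nodes hold no primary copy of it, which is what lets several clusters read the same table.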

Lakehouse Integration

The lakehouse has become a lasting focus in the data field, especially as frameworks like Paimon challenge traditional data‑warehouse development.

While Doris 2.0 could only read data from lakehouse sources, version 3.0 adds write‑back support for Hive and Iceberg, allowing users to create Hive/Iceberg tables directly from Doris and write data back to offline lakehouses, simplifying and accelerating lakehouse construction.
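A minimal sketch of the write-back path for Hive follows. The catalog name, metastore URI, database, and table are hypothetical, and the example assumes an internal table `internal.demo.orders` with matching columns already exists.

```sql
-- Hypothetical names: hive_ctl, the metastore address, lake_db, orders_out.
-- Connect Doris to a Hive Metastore.
CREATE CATALOG IF NOT EXISTS hive_ctl PROPERTIES (
    "type" = "hms",
    "hive.metastore.uris" = "thrift://127.0.0.1:9083"
);

SWITCH hive_ctl;
CREATE DATABASE IF NOT EXISTS lake_db;
USE lake_db;

-- Create a Hive table from Doris; the files are written as Parquet.
CREATE TABLE orders_out (
    order_id BIGINT,
    amount   DOUBLE,
    dt       STRING
) ENGINE = hive
PROPERTIES ("file_format" = "parquet");

-- Write Doris data back to the lake (source table is assumed to exist).
INSERT INTO hive_ctl.lake_db.orders_out
SELECT order_id, CAST(amount AS DOUBLE), CAST(dt AS STRING)
FROM internal.demo.orders;
```

Iceberg write-back follows the same pattern with an Iceberg catalog in place of the Hive one.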

Semi‑Structured Data Analysis Enhancements

Version 2.0 introduced inverted indexes, N‑Gram Bloom filters, and the Variant data type, but query performance for complex structures remained poor.

Version 3.0 optimizes the Variant type, making JSON processing faster, and adds the following capabilities:

1. Variant data type supports index creation (inverted index, Bloom Filter, ZoneMap);
2. Unique tables with Variant allow flexible partial column updates;
3. Variant works in storage‑compute separation mode with optimized metadata storage;
4. Variant can be exported to Parquet, CSV, etc.
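To make the first point concrete, here is a small sketch of a table with an indexed Variant column and a query over its sub-fields. The table, column, and index names are hypothetical, and `replication_num` is set to 1 only so the example runs on a single-node test cluster.

```sql
-- Hypothetical names: demo.events, payload, idx_payload.
CREATE DATABASE IF NOT EXISTS demo;

CREATE TABLE demo.events (
    id      BIGINT,
    payload VARIANT,
    -- Inverted index declared directly on the Variant column.
    INDEX idx_payload (payload) USING INVERTED
)
DUPLICATE KEY(id)
DISTRIBUTED BY HASH(id) BUCKETS 1
PROPERTIES ("replication_num" = "1");

-- JSON text is parsed into the Variant column on write.
INSERT INTO demo.events VALUES
    (1, '{"status": 200, "user": {"name": "alice"}}'),
    (2, '{"status": 500, "user": {"name": "bob"}, "error": "timeout"}');

-- Sub-fields are addressed with bracket paths and cast to a concrete type.
SELECT CAST(payload['user']['name'] AS STRING) AS user_name
FROM demo.events
WHERE CAST(payload['status'] AS INT) = 200;
```

The two rows deliberately carry different keys; Variant does not require the JSON schema to be declared up front.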

ETL Capability Enhancements

Improvements focus on transactional support and observability.

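Transactional enhancements add explicit transaction support for INSERT INTO SELECT, DELETE, and UPDATE, so several statements can be committed as one unit. This prevents issues such as phantom reads, for example when a date range is deleted and then reloaded: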

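BEGIN;
-- Delete July's rows, then reload them from the staging table in the same transaction.
-- `table` is a reserved word, so it is quoted with backticks.
DELETE FROM `table` WHERE `date` >= '2024-07-01' AND `date` <= '2024-07-31';
INSERT INTO `table` SELECT * FROM stage_table;
COMMIT;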

Observability is improved by delivering query profiles earlier, reducing the impact of heavy queries on production systems.

Multi‑Table Materialized Views

Version 3.0 strengthens the construction and stability of multi‑table materialized views. It refactors the rewrite logic for synchronous materialized views, extends transparent rewrite to more query shapes, and makes asynchronous materialized views easier to use, which gives faster query acceleration and more reliable data modeling.
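The sketch below shows one way an asynchronous multi-table materialized view might be defined and then used through transparent rewrite. It assumes two hypothetical base tables, `demo.orders` and `demo.customers`, joined on `customer_id`; the view name and refresh interval are also illustrative.

```sql
-- Hypothetical names: demo.orders, demo.customers, mv_region_daily_sales.
-- Precompute a join plus aggregation and refresh it every hour.
CREATE MATERIALIZED VIEW demo.mv_region_daily_sales
BUILD IMMEDIATE
REFRESH AUTO ON SCHEDULE EVERY 1 HOUR
DISTRIBUTED BY RANDOM BUCKETS 2
AS
SELECT o.dt, c.region, SUM(o.amount) AS total_amount
FROM demo.orders o
JOIN demo.customers c ON o.customer_id = c.customer_id
GROUP BY o.dt, c.region;

-- The query still targets the base tables. When transparent rewrite applies,
-- the optimizer may answer it from the materialized view instead.
SELECT c.region, SUM(o.amount) AS total_amount
FROM demo.orders o
JOIN demo.customers c ON o.customer_id = c.customer_id
GROUP BY c.region;
```

Running `EXPLAIN` on the second query is a simple way to check whether the materialized view was actually chosen.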

Java UDTF Support

Version 3.0 adds support for Java User‑Defined Table Functions (UDTF), offering powerful extensibility for custom data processing.
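A Java UDTF is registered from SQL and then invoked through `LATERAL VIEW`. The sketch below assumes a hypothetical jar containing a class `com.example.SplitWords` whose `evaluate(String, String)` method returns an `ArrayList<String>`, and a hypothetical table `demo.docs(id, body)`; none of these come from the release itself.

```sql
-- Hypothetical names: split_words, the jar path, com.example.SplitWords, demo.docs.
-- Register the Java class as a table function that returns an array of strings.
CREATE TABLES FUNCTION split_words(STRING, STRING) RETURNS ARRAY<STRING>
PROPERTIES (
    "file" = "file:///path/to/udtf-demo.jar",
    "symbol" = "com.example.SplitWords",
    "always_nullable" = "true",
    "type" = "JAVA_UDF"
);

-- Each element of the returned array becomes one output row.
SELECT id, word
FROM demo.docs
LATERAL VIEW split_words(body, ' ') tmp AS word;
```

Unlike a scalar UDF, which returns one value per input row, the table function can expand a single input row into many rows.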

Written by

Big Data Technology & Architecture

Wang Zhiwu, a big data expert, dedicated to sharing big data technology.
