
How Will Apache Doris Evolve in 2026 to Power AI‑Driven Data Workloads?

The article outlines Apache Doris's 2026 roadmap, detailing how the database will shift from pure analytics to a unified AI‑enabled platform with enhanced semi‑structured data support, vector and hybrid search, agent‑focused capabilities, and expanded storage and lakehouse integrations to meet emerging AI workloads.

DataFunTalk

Background: From Faster Analytics to AI‑Centric Data Infrastructure

For years, data infrastructure has focused on accelerating data analysis. In 2026, the rise of AI applications redefines this goal: data systems are no longer just query engines but part of intelligent platforms that serve agents, supply model inputs, and support real‑time consumption.

2025 – The Starting Point

In 2025, Apache Doris released versions 3.1 and 4.0, achieving breakthroughs in analytical and retrieval capabilities.

Version 3.1 strengthened semi‑structured JSON analysis and Lakehouse support, improving inverted index and full‑text search performance and introducing a flexible tokenizer plugin mechanism.

Version 3.1 also enhanced support for external data sources such as Iceberg and Paimon, improving materialized view and query optimization, as well as data write and update performance.

These improvements laid the groundwork for handling the rapidly growing JSON‑style data that AI agents generate.

2026 – Scenario‑Based Evolution

Doris’s 2026 roadmap is organized around four key scenarios:

1. Semi‑Structured Data Analysis & AI Observability

AI workloads generate massive, schema‑less JSON logs and agent traces. Doris will continue to enhance the Variant type, optimize storage for sparse and string columns, and integrate OpenTelemetry to unify trace, log, and metric data.
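To make the idea of sub-columns concrete, the following is a minimal Python sketch of how schema-less JSON records can be flattened into one sparse column per path. It only illustrates the concept; it is not how Doris implements the Variant type, and the names `flatten`, `to_columns`, and the sample trace fields are assumptions made up for this example.

```python
from typing import Any


def flatten(record: dict[str, Any], prefix: str = "") -> dict[str, Any]:
    """Flatten nested dicts into dotted paths, e.g. {"a": {"b": 1}} -> {"a.b": 1}."""
    flat: dict[str, Any] = {}
    for key, value in record.items():
        path = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            flat.update(flatten(value, path))
        else:
            flat[path] = value
    return flat


def to_columns(records: list[dict[str, Any]]) -> dict[str, list[Any]]:
    """Build one column per path; rows lacking a path get None (a sparse column)."""
    rows = [flatten(r) for r in records]
    paths = sorted({p for row in rows for p in row})
    return {p: [row.get(p) for row in rows] for p in paths}


# Hypothetical agent traces with differing shapes.
traces = [
    {"trace_id": "t1", "span": {"name": "llm_call", "ms": 820}},
    {"trace_id": "t2", "span": {"name": "tool_call"}, "error": "timeout"},
]
columns = to_columns(traces)
print(columns["span.ms"])  # [820, None]
print(columns["error"])    # [None, 'timeout']
```

Paths that appear in only a few records produce mostly empty columns, which is why storage for sparse columns matters as the number of distinct JSON paths grows.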

2. Hybrid Search and Analytics Processing (HSAP)

Doris 4.0 introduced vector search, enabling a single engine to handle structured, semi‑structured, and vector data. Future work includes disk‑based ANN algorithms for billions of vectors, merged vector‑text indexing, and global index improvements for efficient top‑N semantic queries.
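As a rough illustration of what a hybrid top‑N query combines, the sketch below filters documents by a keyword and then ranks the survivors by cosine similarity to a query vector. It is a brute‑force, exact search written for clarity, not an ANN index and not Doris's implementation; `hybrid_top_n`, the document fields, and the sample vectors are assumptions for this example.

```python
import heapq
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def hybrid_top_n(docs: list[dict], query_vec: list[float], keyword: str, n: int) -> list[dict]:
    """Keep documents whose text contains the keyword, then return the n most similar."""
    candidates = [d for d in docs if keyword.lower() in d["text"].lower()]
    return heapq.nlargest(n, candidates, key=lambda d: cosine(d["vec"], query_vec))


# Hypothetical documents with 2-dimensional embeddings.
docs = [
    {"id": 1, "text": "Doris vector search", "vec": [1.0, 0.0]},
    {"id": 2, "text": "Doris inverted index", "vec": [0.6, 0.8]},
    {"id": 3, "text": "Unrelated note", "vec": [1.0, 0.0]},
]
top = hybrid_top_n(docs, query_vec=[1.0, 0.1], keyword="doris", n=2)
print([d["id"] for d in top])  # [1, 2]
```

Document 3 is as close to the query vector as document 1 but is dropped by the text filter. This scan compares the query against every surviving document, so its cost grows linearly with the data; approximate indexes exist to avoid that at large scale.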

3. Multi‑Modal Scenarios & AI SQL

To support multi‑modal data, Doris will add AI‑SQL and Python UDF integration, and introduce a File data type that can expose file metadata in SQL and process file contents directly in AI‑SQL or Python UDFs, enabling end‑to‑end pipelines from ingestion to vector construction.

4. Agent‑Focused Analysis

With agents becoming primary callers, Doris will build a richer semantic layer (metadata APIs, tagging) and deepen Data Agent integration, allowing agents to interact naturally and retrieve accurate results.

Query Engine Enhancements

Capability Completion: Add ASOF Join, Recursive CTE, UNNEST, and enhance MERGE INTO for full CDC workflows.
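For readers unfamiliar with ASOF Join, the sketch below shows the semantics it is commonly given: each left row is matched with the latest right row whose timestamp is at or before its own. The article does not specify Doris's syntax or match conditions, so this is an assumption about the general technique; `asof_join` and the sample trades and quotes are hypothetical.

```python
from bisect import bisect_right


def asof_join(left, right):
    """For each (ts, value) in left, attach the latest right value with ts <= left ts."""
    right_sorted = sorted(right, key=lambda r: r[0])
    right_ts = [r[0] for r in right_sorted]
    joined = []
    for ts, value in left:
        # Index of the first right timestamp strictly greater than ts.
        i = bisect_right(right_ts, ts)
        match = right_sorted[i - 1][1] if i > 0 else None
        joined.append((ts, value, match))
    return joined


# Hypothetical (timestamp, payload) rows.
trades = [(5, "buy"), (12, "sell"), (1, "buy")]
quotes = [(2, 100.0), (10, 101.5), (12, 102.0)]
result = asof_join(trades, quotes)
print(result)  # [(5, 'buy', 100.0), (12, 'sell', 102.0), (1, 'buy', None)]
```

The trade at timestamp 1 has no earlier quote, so it is kept with a null match, which mirrors left‑join behavior.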

Performance Optimization: Strengthen Condition Cache, redesign ZoneMap expressions, and improve JSON column pruning for high‑concurrency scenarios.
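The principle behind a zone map can be sketched in a few lines of Python: keep the minimum and maximum of each block of values and skip any block whose range cannot satisfy the predicate. This is a generic illustration, not Doris's ZoneMap design; the function names, block size, and data are assumptions for this example.

```python
def build_zone_map(values, block_size):
    """Record (start offset, min, max) for each fixed-size block of values."""
    zones = []
    for start in range(0, len(values), block_size):
        block = values[start:start + block_size]
        zones.append((start, min(block), max(block)))
    return zones


def scan_greater_than(values, zones, block_size, threshold):
    """Return values > threshold, reading only blocks whose max exceeds the threshold."""
    hits, blocks_read = [], 0
    for start, _lo, hi in zones:
        if hi <= threshold:
            continue  # no value in this block can satisfy value > threshold
        blocks_read += 1
        hits.extend(v for v in values[start:start + block_size] if v > threshold)
    return hits, blocks_read


values = [1, 3, 2, 10, 12, 11, 4, 5, 6, 20, 21, 19]
zones = build_zone_map(values, block_size=3)
hits, blocks_read = scan_greater_than(values, zones, block_size=3, threshold=10)
print(hits, blocks_read)  # [12, 11, 20, 21, 19] 2
```

Only two of the four blocks are read here. Pruning works best when data is clustered so that block ranges overlap little.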

Large‑Scale Task Stability: Optimize spill‑to‑disk and global buffer management to ensure stable execution of massive jobs under limited resources.
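Spill‑to‑disk can be illustrated with a classic external merge sort: when the in‑memory buffer fills, it is sorted and written to a temporary file, and the sorted runs are merged at the end. This is a textbook sketch of the general technique, not Doris's execution engine; `external_sort` and `max_in_memory` are names assumed for this example.

```python
import heapq
import tempfile


def external_sort(numbers, max_in_memory):
    """Sort integers while buffering at most max_in_memory items before spilling."""
    runs = []
    buffer = []

    def spill():
        # Sort the current buffer and write it out as one sorted run on disk.
        buffer.sort()
        run = tempfile.TemporaryFile(mode="w+")
        run.writelines(f"{n}\n" for n in buffer)
        run.seek(0)
        runs.append(run)
        buffer.clear()

    for n in numbers:
        buffer.append(n)
        if len(buffer) >= max_in_memory:
            spill()
    if buffer:
        spill()

    # Stream each run back and merge them lazily; the result is collected into a
    # list here only to keep the demo simple.
    streams = [(int(line) for line in run) for run in runs]
    merged = list(heapq.merge(*streams))
    for run in runs:
        run.close()
    return merged, len(runs)


merged, run_count = external_sort([9, 3, 7, 1, 8, 2, 5], max_in_memory=3)
print(merged, run_count)  # [1, 2, 3, 5, 7, 8, 9] 3
```

The merge step holds only one item per run in memory at a time, which is what lets a job larger than available memory complete.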

Storage Engine: Scale, Cache, Elasticity

Scale: Address ultra‑wide tables and massive Tablet metadata, handling thousands of columns generated from Variant sub‑columns.

Cache: Advance Smart Caching with fine‑grained policies (time‑range, table‑level, and partition‑level controls) and support targeted pre‑warming of hot partitions.

Elasticity: Leverage cloud infrastructure for dynamic scaling, improve read‑write separation, and accelerate node startup via persistent metadata and local caches.

Open Data Lake: Read, Write, Governance

Doris will deepen integration with Iceberg and Paimon to deliver near‑native query performance on lake tables without data migration. Planned work includes enhancing the Parquet page cache, extending Condition Cache to lake scenarios, and supporting full DDL/DML lifecycle management.

Governance will be strengthened through deeper catalog integrations, third‑party authentication, and a comprehensive Open API for metadata and semantic services.

Conclusion

Data formats are shifting from purely structured tables to JSON, vectors, and multi‑modal assets, while usage expands from human users to AI agents. Apache Doris’s 2026 vision moves beyond raw analytical speed to become a unified data platform that supports analysis, retrieval, and AI‑agent workloads in a single engine.
