
What Does a Decade Mean for Apache Doris? – Highlights from Doris Summit 2022

The Doris Summit 2022 recap outlines a ten‑year journey from an internal Baidu project to a top‑level Apache OLAP database, detailing explosive community growth, 2022 milestones, major feature releases up to version 1.2, and an ambitious 2023 roadmap focused on performance, lakehouse integration, multi‑modal analysis, cost efficiency, and enhanced usability.

DataFunTalk

Ten Years of Apache Doris

The talk reflects on a decade of evolution for Apache Doris, from its origins in Baidu's advertising system in 2008, through the formal establishment of the OLAP engine in 2013, to its open‑source release in 2017 and graduation to an Apache top‑level project in 2022.

Community Milestones in 2022

Contributors grew from ~200 to nearly 420, a >100% increase.

Monthly active contributors doubled from 50 to 100.

GitHub stars rose from 3.6k to 6.8k, with multiple trending rankings.

Total commits increased from 3.7k to 7.6k, so the roughly 3.9k commits made in 2022 exceeded the total from all previous years combined.

Over 1,000 enterprise users across industries adopted Apache Doris, and the project became one of the most active open‑source communities in the database space.
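
The growth figures above are easy to sanity-check. The short sketch below recomputes the relative increases from the numbers quoted in the talk; the starting contributor count is given only as "~200", so the results are approximate, and the dictionary keys are illustrative names rather than anything from the project.

```python
# Figures quoted in the recap: (start of 2022, end of 2022).
# The contributor baseline is approximate ("~200" in the source).
metrics = {
    "contributors": (200, 420),
    "monthly_active_contributors": (50, 100),
    "github_stars": (3600, 6800),
    "total_commits": (3700, 7600),
}


def growth_pct(before, after):
    """Relative growth as a percentage of the starting value."""
    return (after - before) / before * 100


for name, (before, after) in metrics.items():
    print(f"{name}: +{after - before} ({growth_pct(before, after):.0f}%)")
```

On these inputs, contributors and commits both grew by slightly more than 100%, which is what makes the "more than all previous years combined" claim hold for commits.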

2022 Feature Releases (v1.0 – v1.2)

v1.0 introduced vectorized execution.

v1.1 refined the vectorized engine, enabled LTS releases, and added Merge‑On‑Write for Unique Key tables.

v1.2 delivered a ten‑fold query performance boost, Multi‑Catalog for lakehouse integration, support for Array and JSONB types, and numerous stability and testing improvements.

Core Feature Evolution

Performance: Achieved top‑3 rankings on ClickBench and multi‑fold gains on SSB/TPC‑H.

Real‑time: Merge‑On‑Write provides 5‑10× faster updates for high‑frequency workloads.

Semi‑structured support: Native Array and JSONB types enable efficient log and JSON analysis.

Lakehouse: Multi‑Catalog delivers 3‑5× faster queries than Trino/Presto and 10‑100× faster than Hive on external tables.
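
To make the Merge‑On‑Write idea more concrete, here is a toy model of a unique‑key table that resolves duplicate keys at write time instead of read time. It is a minimal sketch under stated assumptions, not Doris's storage engine: the class name, the in‑memory list, and the set standing in for a delete bitmap are all hypothetical.

```python
class MergeOnWriteTable:
    """Toy unique-key table: duplicates are resolved when rows are written."""

    def __init__(self):
        self.rows = []        # append-only row storage: (key, value)
        self.deleted = set()  # positions of superseded rows (stand-in for a delete bitmap)
        self.key_index = {}   # key -> position of the current live row

    def upsert(self, key, value):
        old_pos = self.key_index.get(key)
        if old_pos is not None:
            # Mark the previous version as deleted at write time.
            self.deleted.add(old_pos)
        self.rows.append((key, value))
        self.key_index[key] = len(self.rows) - 1

    def scan(self):
        # Reads only skip marked rows; no per-key merge happens at query time.
        return [row for pos, row in enumerate(self.rows) if pos not in self.deleted]


table = MergeOnWriteTable()
table.upsert("a", 1)
table.upsert("b", 2)
table.upsert("a", 3)  # supersedes the first row for key "a"
print(table.scan())
```

The design trade‑off this illustrates is that each write does a key lookup and marks the old row, so that scans avoid merging multiple versions of the same key.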

2023 Roadmap

High Performance

New query optimizer with richer rule models for complex SQL and TPC‑DS coverage.

Short‑Circuit Plan, Prepared Statement, and Query Cache to reach tens of thousands of QPS per node.

Multi‑table materialized views with asynchronous refresh and incremental computation.
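
The "incremental computation" item for materialized views can be illustrated with a small sketch: rather than recomputing an aggregate over the whole base table, a refresh folds in only the rows that arrived since the last refresh. This is a simplified illustration that assumes an append‑only base table and a SUM aggregate; the class and method names are hypothetical and do not reflect how Doris implements the feature.

```python
from collections import defaultdict


class IncrementalSumView:
    """Toy materialized view holding SUM(amount) GROUP BY key."""

    def __init__(self):
        self.totals = defaultdict(int)
        self.applied = 0  # number of base rows already folded into the view

    def refresh(self, base_rows):
        """Apply only rows appended since the previous refresh.

        Assumes base_rows is append-only; updates and deletes are out of scope.
        Returns the number of rows processed in this refresh.
        """
        delta = base_rows[self.applied:]
        for key, amount in delta:
            self.totals[key] += amount
        self.applied = len(base_rows)
        return len(delta)


base = [("cn", 10), ("us", 5)]
view = IncrementalSumView()
view.refresh(base)       # first refresh processes both rows
base.append(("cn", 7))
view.refresh(base)       # second refresh processes only the new row
print(dict(view.totals))
```

An asynchronous refresh would run a step like `refresh` on a schedule or trigger rather than inside the write path.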

Cost Efficiency

Cold‑data tiering to object storage and cache mechanisms to lower storage costs.

Elastic Compute Nodes for independent scaling of compute resources.
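
A cold‑data tiering policy can be as simple as an age threshold. The sketch below shows one hypothetical rule that sends partitions to object storage once they have gone unwritten for a cooldown period; the function name, tier labels, and 30‑day default are assumptions for illustration, not Doris settings.

```python
from datetime import date, timedelta


def choose_tier(last_write, today, cooldown_days=30):
    """Return the storage tier for a partition based on how long it has been idle."""
    idle = today - last_write
    if idle > timedelta(days=cooldown_days):
        return "object_storage"  # cold: cheaper, higher latency
    return "local_disk"          # hot: kept on local storage


today = date(2023, 3, 1)
print(choose_tier(date(2023, 1, 1), today))   # idle for 59 days
print(choose_tier(date(2023, 2, 20), today))  # idle for 9 days
```

A cache in front of the object‑storage tier, as the roadmap item suggests, would then absorb repeated reads of cold partitions.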

Mixed Workloads

Pipeline execution engine for flexible parallelism.

Workload Manager for fine‑grained resource isolation.

Lightweight fault tolerance for robust ETL/ELT pipelines.

Support for Hive/Trino/Spark functions and multi‑language UDFs.

Multi‑Modal Data Analysis

Future support for Map, Struct, IP, GEO, and time‑series types.

Advanced text analysis algorithms, full‑text search, and N‑gram Bloom filters.

Dynamic schema tables that adapt automatically to incoming data.
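
For readers unfamiliar with N‑gram Bloom filters, the sketch below shows the general technique: every n‑character substring of the stored text is hashed into a Bloom filter, and a substring search can skip a data block whenever one of the pattern's n‑grams is definitely absent. This is a generic illustration with hypothetical names and parameters (trigrams, 1024 bits, 3 hashes), not the index format Doris uses.

```python
import hashlib


class NgramBloomFilter:
    """Toy Bloom filter over the character n-grams of added strings."""

    def __init__(self, n=3, size_bits=1024, num_hashes=3):
        self.n = n
        self.size = size_bits
        self.num_hashes = num_hashes
        self.bits = 0  # a Python int used as a bit array

    def _ngrams(self, text):
        return {text[i:i + self.n] for i in range(len(text) - self.n + 1)}

    def _positions(self, gram):
        # Derive several bit positions per n-gram by salting the hash input.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{gram}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.size

    def add(self, text):
        for gram in self._ngrams(text):
            for pos in self._positions(gram):
                self.bits |= 1 << pos

    def might_contain(self, pattern):
        """False means the pattern is definitely absent; True means 'maybe'.

        Patterns shorter than n have no n-grams, so the filter cannot prune them.
        """
        return all(
            (self.bits >> pos) & 1
            for gram in self._ngrams(pattern)
            for pos in self._positions(gram)
        )


block_filter = NgramBloomFilter()
block_filter.add("connection timeout on node-7")
print(block_filter.might_contain("timeout"))
```

A "maybe" answer still requires scanning the block, since Bloom filters can return false positives but never false negatives.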

Lakehouse Enhancements

Extended catalog support for Delta Lake, Iceberg, Hudi, and snapshot management.

Write‑back capabilities and materialized views for lakehouse workflows.

Usability & Stability

Simplified table creation by removing bucket settings.

RBAC‑based security, row‑level permissions, and data masking.

Enhanced profiling tools and visualizations.

Improved BI compatibility and official integrations with dbt, Airbyte, and major BI platforms.

Looking Forward

The speaker envisions Apache Doris as a unified, high‑performance, real‑time, multi‑modal analytical database that eliminates the need for multiple disparate systems, thereby reducing operational complexity and boosting productivity for the next decade.

Written by

DataFunTalk

Dedicated to sharing and discussing big data and AI technology applications, aiming to empower a million data scientists. Regularly hosts live tech talks and curates articles on big data, recommendation/search algorithms, advertising algorithms, NLP, intelligent risk control, autonomous driving, and machine learning/deep learning.
