
Apache Doris 1.0: Features, Architecture, Performance Improvements and Future Roadmap

This article introduces Apache Doris 1.0, detailing its simplified architecture, high‑concurrency support, MPP execution engine, vectorized engine, memory‑controlled stability, multi‑source integration, upcoming lake‑house unification, storage‑compute separation, real‑time ingestion, and community growth.

DataFunTalk

Apache Doris is an MPP analytical database for the big‑data ecosystem. Its architecture has only two roles: the Frontend (FE) stores metadata and handles query parsing and planning, while the Backend (BE) stores data and executes queries. Neither role relies on third‑party services.

Doris ingests data from upstream transactional (TP) systems, business event logs, and web logs through batch or streaming pipelines, and serves it to real‑time dashboards, multi‑dimensional reports, traffic statistics, self‑service queries, and user profiling.

Key features of Doris are presented in six points:

Minimalist Architecture: Only Frontend (FE) and Backend (BE) processes are needed, with a MySQL‑compatible protocol and easy deployment.

Self‑Managing High Availability: Data sharding, rebalancing, and replica repair across BE nodes happen automatically, without manual intervention.

High‑Concurrency Support: Partition pruning and data caching reduce resource usage, enabling thousands of QPS on a single node.

MPP Execution Engine: Parallel physical plans run on multiple BE nodes, with exchange nodes shuffling data between them, achieving massive parallelism.

Detail and Aggregate Data: Materialized views provide pre‑aggregated results while guaranteeing strong consistency with base tables.

Portable Data Ingestion: Supports Kafka, Spark/Flink connectors, broker load, and stream load for batch and near‑real‑time data import.
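
To make the MPP point above more concrete, here is a minimal sketch of a two‑phase aggregation with a hash shuffle between the phases. It is an illustration only, not Doris code: the function names (`local_aggregate`, `shuffle`, `mpp_sum`) are hypothetical, backends are simulated as plain lists, and the exchange step is an in‑memory hand‑off rather than a network transfer.

```python
from collections import defaultdict


def local_aggregate(rows):
    """Sum values per key over the rows held by one (simulated) backend."""
    partial = defaultdict(int)
    for key, value in rows:
        partial[key] += value
    return dict(partial)


def shuffle(partials, num_receivers):
    """Exchange step: route every key to exactly one receiver by hash."""
    inboxes = [[] for _ in range(num_receivers)]
    for partial in partials:
        for key, value in partial.items():
            inboxes[hash(key) % num_receivers].append((key, value))
    return inboxes


def mpp_sum(backend_rows, num_receivers=2):
    """Phase 1 aggregates locally, the shuffle regroups by key, phase 2 merges."""
    partials = [local_aggregate(rows) for rows in backend_rows]
    inboxes = shuffle(partials, num_receivers)
    result = {}
    for inbox in inboxes:
        # Each key lands in exactly one inbox, so the per-inbox results are disjoint.
        result.update(local_aggregate(inbox))
    return result


backend_rows = [
    [("cn", 1), ("us", 2)],
    [("cn", 3), ("de", 4)],
    [("us", 5)],
]
totals = mpp_sum(backend_rows)
```

Because the shuffle sends all partial sums for a given key to the same receiver, the second phase can run on every receiver independently, which is what lets the plan scale across nodes.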

Apache Doris 1.0 Version Analysis

The 1.0 release focuses on three aspects: speed, stability, and multi‑source support.

1. Speed – Vectorized Engine

The vectorized engine combines a column‑oriented memory layout, a vectorized computation framework, better cache affinity, fewer virtual‑function calls, and SIMD instructions. Together these deliver 3‑10× performance gains on both SSB (Star Schema Benchmark) join queries and single‑table analytical queries.
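
The difference between row‑at‑a‑time and block‑at‑a‑time execution can be sketched as follows. This is a simplified illustration under stated assumptions: the function names and the `block_size` parameter are hypothetical, and pure Python cannot show SIMD or cache effects, so the sketch only conveys the columnar layout and the control flow of processing one block per operator call.

```python
from array import array


def sum_revenue_rows(rows, min_qty):
    """Row-oriented: the filter and the expression run once per row."""
    total = 0
    for qty, price in rows:
        if qty >= min_qty:
            total += qty * price
    return total


def sum_revenue_columns(qty_col, price_col, min_qty, block_size=1024):
    """Column-oriented: each column is a contiguous typed array read in blocks."""
    total = 0
    for start in range(0, len(qty_col), block_size):
        qty = qty_col[start:start + block_size]
        price = price_col[start:start + block_size]
        # Step 1: evaluate the predicate over the whole block (selection vector).
        selected = [i for i, q in enumerate(qty) if q >= min_qty]
        # Step 2: compute the expression only at the selected positions.
        total += sum(qty[i] * price[i] for i in selected)
    return total


rows = [(1, 10), (5, 2), (3, 7), (8, 1)]
qty_col = array("q", [q for q, _ in rows])
price_col = array("q", [p for _, p in rows])

row_total = sum_revenue_rows(rows, min_qty=3)
col_total = sum_revenue_columns(qty_col, price_col, min_qty=3, block_size=2)
```

In a native engine, the tight per‑block loops over typed arrays are what the compiler can turn into SIMD instructions, and the per‑block (rather than per‑row) operator call is where the virtual‑function overhead is saved.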

2. Stability – Memory Control

Memory usage is now tracked by a hierarchical MemTracker system (process, module, query, load, task, cache) with thread‑local hooks and TCMalloc integration, so that consumption can be observed precisely and strict limits can be enforced to prevent OOM crashes.
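
The idea of a tracker hierarchy can be sketched in a few lines. This is not the Doris implementation (which lives in the C++ backend); the class below, its method names, and the byte limits are assumptions chosen for illustration. It shows only the core rule: every allocation is charged to a tracker and to all of its ancestors, and it is rejected if any level would exceed its limit.

```python
class MemLimitExceeded(Exception):
    """Raised when an allocation would push some tracker over its limit."""


class MemTracker:
    def __init__(self, label, limit=None, parent=None):
        self.label = label
        self.limit = limit        # bytes; None means no limit at this level
        self.parent = parent
        self.consumption = 0

    def _chain(self):
        """Yield this tracker and then each ancestor up to the root."""
        node = self
        while node is not None:
            yield node
            node = node.parent

    def consume(self, nbytes):
        # Check the whole chain first so a rejected allocation changes nothing.
        for node in self._chain():
            if node.limit is not None and node.consumption + nbytes > node.limit:
                raise MemLimitExceeded(f"{node.label}: limit {node.limit} exceeded")
        for node in self._chain():
            node.consumption += nbytes

    def release(self, nbytes):
        for node in self._chain():
            node.consumption -= nbytes


process = MemTracker("process", limit=100)
query = MemTracker("query-1", limit=60, parent=process)
load = MemTracker("load-1", parent=process)

query.consume(50)
load.consume(40)
```

With these hypothetical limits, a further `query.consume(20)` fails at the query level, and a further `load.consume(20)` fails at the process level even though the load tracker itself has no limit. In both cases only the offending task is rejected instead of the whole process running out of memory.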

3. Multi‑Source – Data Lake Integration

Doris now supports external Hive and Iceberg tables, enabling query federation without data migration, and plans to introduce a Multi‑Catalog mechanism and merge‑on‑read capabilities for real‑time lake updates.

Future Roadmap

Lake‑house unification with Multi‑Catalog and support for Hudi and merge‑on‑read.

Storage‑compute separation using cheap S3 storage and elastic stateless compute nodes.

Real‑time ingestion with high‑concurrency write paths and support for UNIQUE‑KEY (merge‑on‑read) and future COPY‑ON‑WRITE models.

Enhanced stability and observability, including tracing for rapid issue diagnosis.
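
The merge‑on‑read behaviour mentioned for the UNIQUE‑KEY model can be sketched as follows. This is a toy model, not Doris internals: the `UniqueKeyTable` class, its methods, and the integer version counter are hypothetical. Writes only append a new versioned batch, and the cost of resolving duplicate keys is paid at read time, where the newest version of each key wins.

```python
class UniqueKeyTable:
    """Toy unique-key table: append-only writes, de-duplication on read."""

    def __init__(self):
        self.rowsets = []          # list of (version, [(key, value), ...])
        self.next_version = 1

    def write(self, rows):
        # The write path never touches existing data; it only appends a batch.
        self.rowsets.append((self.next_version, list(rows)))
        self.next_version += 1

    def read(self):
        # Merge on read: apply batches in version order so later writes win.
        latest = {}
        for _version, rows in sorted(self.rowsets, key=lambda rs: rs[0]):
            for key, value in rows:
                latest[key] = value
        return latest


table = UniqueKeyTable()
table.write([(1, "a"), (2, "x")])
table.write([(1, "b")])
table.write([(3, "c")])
snapshot = table.read()
```

The trade‑off is visible even in this sketch: `write` is cheap and constant per row, while `read` has to scan every batch. Models that resolve duplicates at write time move that cost in the other direction.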

Open‑Source Community

As of April, Doris has more than 300 contributors and 70 monthly active contributors, and work toward graduation from the Apache Incubator is ongoing. The community aims to be a vibrant hub for database and OLAP enthusiasts, with resources available through GitHub, the project roadmap, and the DataFunTalk WeChat channel.
