Apache Doris Graduates to Top-Level Project: Features and Community Overview
On June 16, 2022, the Apache Software Foundation announced that Apache Doris had successfully graduated from the Apache Incubator to become an official Apache Top-Level Project (TLP), highlighting its high-performance MPP analytical database capabilities, broad industry adoption, and vibrant open-source community.
Apache Doris is a modern, high‑performance, real‑time analytical database built on a massively parallel processing (MPP) architecture. It delivers sub‑second query responses on massive data sets, supporting both high‑concurrency point queries and high‑throughput complex analytics, making it suitable for multidimensional reporting, user profiling, ad‑hoc queries, and real‑time dashboards.
Doris originated as Palo, Baidu's internal advertising reporting system, and was open-sourced in 2017. In July 2018 Baidu donated the project to the Apache Software Foundation, where it entered incubation under the guidance of Apache mentors.
According to Apache Doris VP Chen Mingyu, graduation marks a major milestone that reflects the community's growth and the project's alignment with the Apache Way.
The Doris community now includes over 300 contributors from nearly 100 enterprises across various industries, with about 100 active contributors each month. Since incubation, Doris has released eight major versions, introducing storage engine upgrades, a vectorized execution engine, and other key features, culminating in the 1.0 release.
Today Apache Doris is deployed in production at more than 500 companies worldwide, including over 80% of the top‑50 Chinese internet firms such as Baidu, Meituan, Xiaomi, JD.com, ByteDance, Tencent, Kuaishou, NetEase, Weibo, Sina, and 360, as well as many enterprises in finance, energy, manufacturing, and telecommunications.
Key advantages of Apache Doris:
Outstanding Performance: Columnar storage, high compression ratios, rich indexing, partition pruning, and a vectorized execution engine, combined with intelligent materialized views and cost-based and plan-based query optimization, enable thousands of QPS per node with ultra-low latency.
Ease of Use: Full ANSI SQL support (including joins, subqueries, window functions, and grouping sets) and compatibility with the MySQL protocol allow seamless integration with existing client tools and BI platforms.
Simplified Architecture: Only two components – Frontend (FE) and Backend (BE). FE handles request routing, query planning, metadata, and cluster management; BE stores data and executes plans. The system scales horizontally to hundreds of nodes and over 10 PB of data without third‑party components.
Stability and Reliability: Multi‑replica storage with self‑healing, automatic data rebalancing, and online scaling/maintenance ensure continuous service without downtime.
Rich Ecosystem: Native connectors for Hadoop, Flink, Spark, Kafka, SeaTunnel, MySQL, PostgreSQL, Oracle, S3, Hive, Iceberg, and Elasticsearch, with the ability to read and write data directly from these sources.
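Because Doris speaks the MySQL wire protocol, any MySQL client or driver can submit DDL and queries to a Frontend node. As a minimal sketch of the multi-replica, hash-distributed storage described above, the helper below assembles a hypothetical Doris CREATE TABLE statement; the table, columns, bucket count, and replication factor are illustrative defaults, not values taken from the article.

```python
# Sketch: build a Doris CREATE TABLE statement using the duplicate-key model,
# hash distribution, and multi-replica storage properties. The schema here
# (demo.site_visits and its columns) is invented for illustration.

def doris_create_table_ddl(table: str, replicas: int = 3, buckets: int = 10) -> str:
    """Return a Doris DDL string that a MySQL client could submit to an FE node."""
    return (
        f"CREATE TABLE {table} (\n"
        "  event_date DATE,\n"
        "  user_id BIGINT,\n"
        "  url VARCHAR(256)\n"
        ")\n"
        "DUPLICATE KEY(event_date, user_id)\n"
        f"DISTRIBUTED BY HASH(user_id) BUCKETS {buckets}\n"
        f'PROPERTIES ("replication_num" = "{replicas}")'
    )

print(doris_create_table_ddl("demo.site_visits"))
```

The `replication_num` property corresponds to the multi-replica storage behind Doris' self-healing and rebalancing, while `DISTRIBUTED BY HASH ... BUCKETS` controls how data is spread across Backend nodes.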
Chen Mingyu emphasized that graduation is not the end but the start of a new journey, with future work focused on a new query optimizer, lake-house integration, and cloud-native architecture evolution. He invited more contributors to join the community.
Big Data Technology Architecture
Exploring Open Source Big Data and AI Technologies