
Apache Doris and Palo: Architecture, Core Features, Use Cases, and Future Roadmap

This article introduces Apache Doris, an open‑source OLAP database, outlines its development history, typical application scenarios, key architectural components and performance optimizations, and then discusses the commercial Palo product and the planned future enhancements for both platforms.

DataFunTalk

Apache Doris is a Baidu‑originated open‑source OLAP database that supports standard SQL via the MySQL protocol, enabling real‑time, sub‑second query responses on massive datasets and serving a wide range of analytical workloads.

2008: launched internally to power reporting for Baidu's Fengchao advertising platform, achieving minute‑level data refresh.

2009‑2012: generalized for internal reporting across Baidu.

2013: upgraded to a full MPP framework with distributed computing.

2017: open‑sourced on GitHub.

2018: donated to the Apache Software Foundation and renamed Apache Doris; the project has since gained more than 5.4k GitHub stars and 360+ contributors.

June 2022: graduated to a top‑level Apache project.

Doris addresses common pain points such as high query latency, the lack of near‑real‑time analytics, long development cycles, and data silos caused by stitching together heterogeneous tools (Hive, Impala, etc.). It provides a unified platform with the following capabilities:

Sub‑second query latency on massive data.

Streaming data ingestion for real‑time insights.

Unified data‑flow architecture across big‑data platforms.

Federated queries across ODBC, Hive, Iceberg, Hudi, Elasticsearch, and more.

MySQL‑compatible SQL interface for seamless BI tool integration.
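To make the two main entry points above concrete, here is a minimal sketch of how a client would reach a Doris cluster: queries arrive over the MySQL protocol on the FE query port (9030 by default), and streaming ingestion goes through the HTTP Stream Load endpoint on the FE HTTP port (8030 by default). The host name is a placeholder and nothing here contacts a live cluster; we only construct the command line and the request description.

```python
# Sketch of Doris client entry points. Assumptions: default ports 9030 (MySQL
# protocol) and 8030 (FE HTTP); "fe.example.com" is a placeholder host.
from base64 import b64encode

FE_HOST = "fe.example.com"  # placeholder frontend host

def mysql_cli(user="root", port=9030):
    """Command line for any stock MySQL client to reach Doris over the MySQL protocol."""
    return f"mysql -h {FE_HOST} -P {port} -u {user}"

def stream_load_request(db, table, user="root", password="", http_port=8030):
    """Describe a Stream Load call: an HTTP PUT with basic auth; a CSV/JSON body
    containing the rows to ingest would follow."""
    token = b64encode(f"{user}:{password}".encode()).decode()
    return {
        "method": "PUT",
        "url": f"http://{FE_HOST}:{http_port}/api/{db}/{table}/_stream_load",
        "headers": {
            "Authorization": f"Basic {token}",
            "Expect": "100-continue",       # let the FE redirect to a BE first
            "column_separator": ",",
        },
    }
```

Because the interface is plain MySQL protocol, BI tools and drivers that already speak MySQL need no Doris-specific connector.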

The system follows a master‑slave architecture: Frontend (FE) nodes manage metadata and handle query planning and scheduling, while Backend (BE) nodes execute plan fragments and store the data. Performance is driven by a columnar storage engine, vectorized execution, and multiple optimizer layers.
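The FE/BE division of labor can be illustrated with a toy simulation: the "FE" decides which backend scans which tablet and fans plan fragments out, and each "BE" executes its fragment against local storage. This is a plain-Python illustration under invented names; round-robin assignment stands in for Doris's real replica-aware scheduling.

```python
# Toy FE/BE split. Assumptions: names (plan_scan, execute_fragment) and the
# round-robin placement are illustrative, not Doris internals.

def plan_scan(tablet_ids, backends):
    """FE role: assign each tablet to a backend (round-robin stand-in for
    replica-aware scheduling)."""
    assignment = {}
    for i, tablet in enumerate(sorted(tablet_ids)):
        assignment.setdefault(backends[i % len(backends)], []).append(tablet)
    return assignment

def execute_fragment(tablets, storage):
    """BE role: execute a plan fragment over locally stored tablets."""
    return [row for t in tablets for row in storage.get(t, [])]

storage = {1: ["a"], 2: ["b"], 3: ["c"]}          # tablet id -> rows
plan = plan_scan([1, 2, 3], ["be1", "be2"])       # FE builds the scan plan
results = [row for be, tablets in plan.items()    # each BE runs its fragment
           for row in execute_fragment(tablets, storage)]
```

The key point the sketch captures is that planning (FE) and execution plus storage (BE) are separate tiers, so each can be scaled independently.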

Key technical highlights include:

Highly optimized storage engine with columnar layout, multiple encodings, and up to 1:8 compression.

Rich index support: sparse short‑key (prefix) index, Min‑Max (ZoneMap), Bloom filter, and bitmap indexes.

MPP execution model where each node processes data in parallel, using a Volcano‑style operator pipeline.

Runtime filters (In, Min‑Max, Bloom) that push down small‑table predicates to large tables, reducing data transfer.

Rule‑based optimizations such as constant folding, sub‑query rewrite, and predicate push‑down.

Cost‑based optimizations including join reorder, Colocation Join, and Bucket Shuffle Join to minimize data shuffling.

Vectorized engine leveraging SIMD instructions for CPU‑friendly computation.
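Of the techniques above, the runtime-filter idea is worth spelling out: at execution time the small (build) side of a join produces a filter that is pushed down to the scan of the large (probe) side, so non-joining rows are dropped before they are transferred. The sketch below simulates an "In" runtime filter in plain Python; it is an illustration of the concept, not Doris's actual implementation, and the size cutoff is an invented parameter.

```python
# Simulated "In" runtime filter for a join. Assumptions: max_size cutoff and
# all names are illustrative; real systems fall back to Bloom/Min-Max filters
# when the exact key set grows too large.

def build_in_filter(small_side_keys, max_size=1024):
    """Build side: collect the exact join-key set if it is small enough,
    else give up (return None) rather than ship a huge filter."""
    keys = set(small_side_keys)
    return keys if len(keys) <= max_size else None

def probe_with_filter(large_side_rows, key_of, runtime_filter):
    """Probe side: skip rows the filter proves cannot join; with no filter,
    fall back to a full scan."""
    if runtime_filter is None:
        return list(large_side_rows)
    return [row for row in large_side_rows if key_of(row) in runtime_filter]

small = [("u1",), ("u2",)]                                  # build side keys
large = [("u1", 10), ("u3", 20), ("u2", 30), ("u4", 40)]    # probe side rows
rf = build_in_filter(k for (k,) in small)
surviving = probe_with_filter(large, lambda row: row[0], rf)
# only rows keyed u1/u2 ever reach the join operator
```

The benefit is largest in MPP settings: the rows pruned at the scan never cross the network, which is exactly the data-transfer reduction the article describes.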

Future directions for Doris include a new Cascades‑style cost‑based optimizer that is more sensitive to data statistics, multi‑table materialized views for accelerating complex queries, and expanded support for complex data types (Array, Struct, Map) to improve compatibility with ecosystems such as Spark and Hive.

Palo is a commercial data‑warehouse product built entirely on Doris, offering cloud‑native resource elasticity, hot‑cold data separation via object storage, comprehensive monitoring, and flexible deployment options (public cloud, private cloud, on‑premises). It has been used internally at Baidu for over a decade across dozens of business lines and is now serving hundreds of external customers in finance, e‑commerce, government, transportation, manufacturing, and media.

Readers are encouraged to try the product on Baidu Cloud, follow the Doris community on GitHub and the official website, and contribute code or feedback to help the project grow into a world‑class open‑source solution.

Written by DataFunTalk

Dedicated to sharing and discussing big data and AI technology applications, aiming to empower a million data scientists. Regularly hosts live tech talks and curates articles on big data, recommendation/search algorithms, advertising algorithms, NLP, intelligent risk control, autonomous driving, and machine learning/deep learning.
