Doris Architecture, Principles, and Key Features Overview
This article gives an overview of Doris's architecture, covering its FE and BE components, metadata management, data organization, and execution planning. It then describes the major features, including adaptive two-stage aggregation, join strategies, vectorized execution, materialized views, and Elasticsearch integration, with example DDL and query code.
Doris is an MPP analytical database that consists of two main components: Frontend (FE) and Backend (BE). FE handles query compilation, distribution, and metadata management using in‑memory structures similar to HDFS NameNode, while BE executes queries and stores physical data.
The overall architecture is simple, requiring only FE and BE processes without external dependencies, which eases deployment and operation. FE maintains cluster metadata in memory with high availability achieved through a leader‑follower‑observer model; observers can be added to scale read‑only query capacity.
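Metadata management in FE relies on the Paxos protocol combined with a Memory + Checkpoint + Journal mechanism. Each update is first written to a write-ahead journal on disk and then applied to memory, and the in-memory state is periodically checkpointed. On restart, FE can load the latest checkpoint and replay only the journal entries written after it, which gives fast recovery and durability.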
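The Memory + Checkpoint + Journal flow can be sketched in a few lines of Python. This is a toy single-process model, not Doris's FE code: the class name `MetadataStore`, the file names `journal.log` and `image.json`, and the JSON encoding are all assumptions made for illustration, and replication between FE nodes is left out entirely.

```python
import json
import os
import tempfile


class MetadataStore:
    """Toy metadata store: in-memory dict + journal file + checkpoint image."""

    def __init__(self, directory):
        # Hypothetical file names, chosen only for this sketch.
        self.journal_path = os.path.join(directory, "journal.log")
        self.image_path = os.path.join(directory, "image.json")
        self.state = {}        # in-memory metadata
        self.last_applied = 0  # id of the last journal entry applied to memory
        self._recover()

    def _recover(self):
        # Step 1: load the latest checkpoint image, if one exists.
        if os.path.exists(self.image_path):
            with open(self.image_path) as f:
                image = json.load(f)
            self.state = image["state"]
            self.last_applied = image["last_applied"]
        # Step 2: replay journal entries that are newer than the image.
        if os.path.exists(self.journal_path):
            with open(self.journal_path) as f:
                for line in f:
                    entry = json.loads(line)
                    if entry["id"] > self.last_applied:
                        self._apply(entry)

    def _apply(self, entry):
        self.state[entry["key"]] = entry["value"]
        self.last_applied = entry["id"]

    def put(self, key, value):
        entry = {"id": self.last_applied + 1, "key": key, "value": value}
        # Write-ahead: persist the entry to disk before touching memory.
        with open(self.journal_path, "a") as f:
            f.write(json.dumps(entry) + "\n")
            f.flush()
            os.fsync(f.fileno())
        self._apply(entry)

    def checkpoint(self):
        # Write the image to a temp file, then rename it into place atomically.
        tmp_path = self.image_path + ".tmp"
        with open(tmp_path, "w") as f:
            json.dump({"state": self.state, "last_applied": self.last_applied}, f)
            f.flush()
            os.fsync(f.fileno())
        os.replace(tmp_path, self.image_path)
        # Everything up to last_applied is now in the image; drop old entries.
        open(self.journal_path, "w").close()


def demo():
    with tempfile.TemporaryDirectory() as directory:
        store = MetadataStore(directory)
        store.put("db1.tbl1", "replicas=3")
        store.checkpoint()                   # tbl1 is now in the image
        store.put("db1.tbl2", "replicas=3")  # tbl2 exists only in the journal
        recovered = MetadataStore(directory)  # simulate a restart
        return recovered.state, recovered.last_applied
```

Calling `demo()` simulates a restart after one checkpointed update and one journal-only update; the recovered store should contain both entries, because recovery combines the image with the journal tail.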
Data is stored on BE nodes with default triple replication for reliability; FE schedules replica placement and rebalancing. Query execution plans are generated entirely in FE, following a two‑step process: first creating a single‑node logical plan, then transforming it into distributed PlanFragments by inserting ExchangeNodes to minimize data movement and maximize local scans.
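To make the two-step planning concrete, here is a minimal Python sketch that takes a single-node plan tree and cuts it into fragments by inserting exchange nodes. It is an illustration under simplifying assumptions, not Doris's planner: `PlanNode`, `Fragment`, and `to_fragments` are hypothetical names, and the only rule modelled is "keep the left input of a join local and receive the right input through an exchange".

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class PlanNode:
    name: str
    children: List["PlanNode"] = field(default_factory=list)


@dataclass
class Fragment:
    fragment_id: int
    root: PlanNode


def to_fragments(single_node_plan: PlanNode) -> List[Fragment]:
    """Split a single-node plan into fragments connected by exchange nodes."""
    fragments: List[Fragment] = []

    def cut(subtree: PlanNode) -> PlanNode:
        # Close `subtree` as its own fragment and return an exchange node
        # that the consuming fragment uses to receive its output.
        fragment = Fragment(len(fragments), subtree)
        fragments.append(fragment)
        return PlanNode(f"Exchange(from=F{fragment.fragment_id})")

    def visit(node: PlanNode) -> PlanNode:
        if not node.children:
            return node  # a scan stays in the current fragment (local scan)
        if node.name.startswith("HashJoin"):
            left = visit(node.children[0])         # probe side stays local
            right = cut(visit(node.children[1]))   # build side arrives via exchange
            return PlanNode(node.name, [left, right])
        return PlanNode(node.name, [visit(child) for child in node.children])

    distributed_root = cut(visit(single_node_plan))
    # The last fragment gathers the results for the client.
    fragments.append(Fragment(len(fragments), PlanNode("ResultSink", [distributed_root])))
    return fragments


def render(node: PlanNode, depth: int = 0) -> str:
    lines = ["  " * depth + node.name]
    for child in node.children:
        lines.append(render(child, depth + 1))
    return "\n".join(lines)


if __name__ == "__main__":
    plan = PlanNode("HashJoin(o.cid = c.id)",
                    [PlanNode("OlapScan(orders)"), PlanNode("OlapScan(customers)")])
    for fragment in to_fragments(plan):
        print(f"Fragment F{fragment.fragment_id}")
        print(render(fragment.root, 1))
```

For a join of two scans this yields three fragments: one that scans the right-hand table, one that scans the left-hand table locally and joins it against the exchanged rows, and one that collects the result. A real planner would additionally choose between broadcasting and shuffling at each exchange.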
Key features of Doris include:
Adaptive two-stage aggregation, plus Broadcast and Shuffle join strategies with optional, manually specified colocation joins.
Vectorized execution engine for high‑performance processing.
Dynamic roll‑up addition, delayed materialized views, and prefix indexing.
Support for Roaring Bitmap indexes and low‑cardinality dictionary encoding.
MPP architecture and integration with Elasticsearch via external tables.
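As an illustration of the first item in the list, the following Python sketch shows the general idea of two-stage aggregation for a `SUM ... GROUP BY` query: each node pre-aggregates its own rows, the partial results are shuffled by group key, and a second stage merges them. The function names are hypothetical, the "nodes" are plain lists, and the adaptive part (deciding when pre-aggregation is worth doing) is not modelled.

```python
import zlib
from collections import defaultdict


def local_aggregate(rows):
    """Stage 1: sum values per key using only the rows held by one node."""
    partial = defaultdict(int)
    for key, value in rows:
        partial[key] += value
    return dict(partial)


def shuffle(partials, num_targets):
    """Route each (key, subtotal) pair to a target chosen by hashing the key."""
    buckets = [[] for _ in range(num_targets)]
    for partial in partials:
        for key, subtotal in partial.items():
            target = zlib.crc32(key.encode("utf-8")) % num_targets
            buckets[target].append((key, subtotal))
    return buckets


def two_stage_sum(node_rows, num_targets=2):
    """Run both stages and return the final key -> sum mapping."""
    partials = [local_aggregate(rows) for rows in node_rows]
    buckets = shuffle(partials, num_targets)
    result = {}
    for bucket in buckets:
        # Stage 2: for SUM, merging subtotals is the same operation as stage 1.
        # Every key lands in exactly one bucket, so the updates never collide.
        result.update(local_aggregate(bucket))
    return result


if __name__ == "__main__":
    node_rows = [
        [("a", 1), ("b", 2), ("a", 3)],  # rows stored on the first node
        [("a", 10), ("c", 5)],           # rows stored on the second node
    ]
    print(two_stage_sum(node_rows))
```

The benefit is that only one subtotal per key per node crosses the network instead of every raw row, which matters most when the number of distinct keys is small compared with the number of rows.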
Example of creating an Elasticsearch external table in Doris:
CREATE EXTERNAL TABLE `es_table` (
`id` bigint(20) COMMENT "",
`k1` bigint(20) COMMENT "",
`k2` datetime COMMENT "",
`k3` varchar(20) COMMENT "",
`k4` varchar(100) COMMENT "",
`k5` float COMMENT ""
) ENGINE=ELASTICSEARCH
PARTITION BY RANGE(`id`)
PROPERTIES (
"hosts" = "http://192.168.0.1:8200,http://192.168.0.2:8200",
"user" = "root",
"password" = "root",
"index" = "tindex",
"type" = "doc"
);
Querying the external table:
select * from es_table where esquery(k4, '{ "match": { "k4": "doris on elasticsearch" } }');
Big Data Technology & Architecture
Wang Zhiwu, a big data expert, dedicated to sharing big data technology.
