Doris Architecture, Principles, and Key Features Overview

This article gives an overview of Doris's architecture (the FE and BE components, metadata management, data organization, and query planning) and its major features, including its join strategies and adaptive two-stage aggregation, vectorized execution, materialized views, and Elasticsearch integration, with example DDL and query code.

Big Data Technology & Architecture

Doris is an MPP analytical database made up of two components: the Frontend (FE) and the Backend (BE). FE parses and plans queries, dispatches the resulting plans to BEs, and manages cluster metadata, which it holds in memory, much as the HDFS NameNode does. BE executes the query plans and stores the data.

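The overall architecture is simple: a cluster needs only FE and BE processes and no external dependencies, which eases deployment and operation. FE keeps cluster metadata in memory and achieves high availability through a Leader/Follower/Observer model. Metadata writes go through the Leader, and the Leader and Followers take part in leader election; Observers only replicate metadata and do not vote, so they can be added to scale query capacity without enlarging the voting group.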
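
The role split can be illustrated with a minimal Python sketch of a commit check. It assumes, as a simplification, that a metadata write commits once a simple majority of the electable FEs (the Leader and the Followers) has acknowledged it, and that Observer acknowledgements never count. FENode and is_committed are hypothetical names, not Doris classes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FENode:
    name: str
    role: str  # "LEADER", "FOLLOWER", or "OBSERVER" (hypothetical labels)


def is_committed(nodes: list[FENode], acked: set[str]) -> bool:
    """Return True if a majority of electable FEs acknowledged the write.

    Simplifying assumption: only the Leader and Followers vote; Observers
    replay metadata and serve reads but are never counted.
    """
    electable = [n for n in nodes if n.role in ("LEADER", "FOLLOWER")]
    votes = sum(1 for n in electable if n.name in acked)
    return votes > len(electable) // 2


cluster = [
    FENode("fe1", "LEADER"),
    FENode("fe2", "FOLLOWER"),
    FENode("fe3", "FOLLOWER"),
    FENode("fe4", "OBSERVER"),
    FENode("fe5", "OBSERVER"),
]

print(is_committed(cluster, {"fe1", "fe2"}))         # True: 2 of 3 electable
print(is_committed(cluster, {"fe1", "fe4", "fe5"}))  # False: Observers do not count
```

Adding more Observers to cluster changes neither result, which is the property that lets them absorb read load without growing the voting group.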

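FE metadata management relies on a Paxos-based replication protocol combined with a Memory + Checkpoint + Journal mechanism. Each metadata update is first appended to the journal, a write-ahead log on disk, and then applied to the in-memory metadata. The in-memory state is periodically saved as a checkpoint image, so on restart FE loads the latest checkpoint and replays only the journal entries written after it. This keeps recovery fast while ensuring that no committed update is lost.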
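
The write path and the recovery path can be modeled in a few lines. This is a toy sketch, assuming a Python list stands in for the on-disk journal and a dictionary copy stands in for the checkpoint image; MetadataStore and its methods are hypothetical and do not mirror Doris's actual FE code.

```python
import copy
from typing import Any


class MetadataStore:
    """Toy Memory + Checkpoint + Journal model (hypothetical, not Doris's FE code)."""

    def __init__(self) -> None:
        self.journal: list[tuple[int, str, Any]] = []  # stands in for the on-disk WAL
        self.memory: dict[str, Any] = {}               # live in-memory metadata
        self.image: dict[str, Any] = {}                # last checkpoint image
        self.image_journal_id = 0                      # last journal id the image covers
        self.next_id = 1

    def update(self, key: str, value: Any) -> None:
        # 1. Append to the journal first so the change survives a crash.
        self.journal.append((self.next_id, key, value))
        self.next_id += 1
        # 2. Only then apply it to the in-memory metadata.
        self.memory[key] = value

    def checkpoint(self) -> None:
        # 3. Periodically persist a full copy of memory as the image.
        self.image = copy.deepcopy(self.memory)
        self.image_journal_id = self.next_id - 1
        # Journal entries already covered by the image are no longer needed.
        self.journal = [e for e in self.journal if e[0] > self.image_journal_id]

    def recover(self) -> dict[str, Any]:
        # On restart: load the image, then replay only newer journal entries.
        state = copy.deepcopy(self.image)
        for entry_id, key, value in self.journal:
            if entry_id > self.image_journal_id:
                state[key] = copy.deepcopy(value)
        return state


store = MetadataStore()
store.update("table_orders", {"schema_version": 1})
store.checkpoint()
store.update("table_orders", {"schema_version": 2})
store.update("table_users", {"schema_version": 1})
print(len(store.journal))               # 2
print(store.recover() == store.memory)  # True
```

Because the checkpoint already covers the first update, recovery replays only the two newer journal entries, which is why periodic checkpoints keep restart time short.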

Data is stored on BE nodes, with three replicas of each data shard (tablet) by default for reliability; FE decides where replicas are placed and rebalances them across BEs.

Query execution plans are generated entirely in FE in two steps. FE first builds a single-node plan, then turns it into a distributed plan by splitting it into PlanFragments and inserting ExchangeNodes at the boundaries where data must move between nodes. The goal is to minimize data movement and to scan as much data as possible locally, on the BEs that store it.
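
The second planning step can be sketched as a walk over a simplified single-node plan that cuts it wherever an operator needs its input redistributed. This is an illustration only, assuming a linear plan (each node has at most one child) and a hypothetical needs_exchange flag that a real planner would derive from data partitioning. The example plan corresponds to a query such as SELECT k1, SUM(v) FROM t GROUP BY k1.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional


@dataclass
class PlanNode:
    name: str
    # Hypothetical flag: True if this operator needs its input redistributed
    # across BEs (a real planner derives this from data partitioning).
    needs_exchange: bool = False
    child: Optional[PlanNode] = None


@dataclass
class PlanFragment:
    nodes: list[str] = field(default_factory=list)


def split_into_fragments(root: PlanNode) -> list[PlanFragment]:
    """Walk a linear single-node plan top-down, cutting at each exchange."""
    fragments = [PlanFragment()]
    node: Optional[PlanNode] = root
    while node is not None:
        fragments[-1].nodes.append(node.name)
        if node.needs_exchange and node.child is not None:
            # An ExchangeNode receives data in the parent fragment; the child
            # subtree starts a new fragment that sends data to it.
            fragments[-1].nodes.append("ExchangeNode")
            fragments.append(PlanFragment())
        node = node.child
    return fragments


# SELECT k1, SUM(v) FROM t GROUP BY k1, as a single-node plan:
scan = PlanNode("OlapScan")
partial_agg = PlanNode("Agg(partial)", child=scan)
final_agg = PlanNode("Agg(merge)", needs_exchange=True, child=partial_agg)

for i, frag in enumerate(split_into_fragments(final_agg)):
    print(i, frag.nodes)
# 0 ['Agg(merge)', 'ExchangeNode']
# 1 ['Agg(partial)', 'OlapScan']
```

In this shape, fragment 1 can run on every BE that holds tablets of t, scanning and pre-aggregating locally, so only the partial results cross the network to fragment 0.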
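
Replica placement can likewise be sketched. The heuristic here (least-loaded BE first, never two replicas of one tablet on the same host) is an assumption made for illustration, not a description of Doris's actual scheduler; place_replicas and the backend and host names are hypothetical.

```python
from collections import Counter


def place_replicas(backends: dict[str, str], load: Counter, replication_num: int = 3) -> list[str]:
    """Choose BEs for one tablet's replicas (hypothetical heuristic).

    backends maps a BE id to its host; load counts tablets already on each BE.
    Least-loaded BEs are tried first (ties broken by id), and no two replicas
    may share a host.
    """
    chosen: list[str] = []
    used_hosts: set[str] = set()
    for be in sorted(backends, key=lambda b: (load[b], b)):
        if backends[be] in used_hosts:
            continue  # a second replica on the same host adds little safety
        chosen.append(be)
        used_hosts.add(backends[be])
        if len(chosen) == replication_num:
            break
    if len(chosen) < replication_num:
        raise ValueError("not enough distinct hosts for the requested replicas")
    for be in chosen:
        load[be] += 1  # record the new tablet so later placements spread out
    return chosen


backends = {"be1": "host-a", "be2": "host-a", "be3": "host-b", "be4": "host-c", "be5": "host-d"}
load = Counter()
print(place_replicas(backends, load))  # ['be1', 'be3', 'be4']
print(place_replicas(backends, load))  # ['be2', 'be5', 'be3']
```

Updating load after each placement is what makes the second tablet land mostly on the BEs the first one skipped, a crude form of the balancing the FE performs.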

Key features of Doris include:

Broadcast and Shuffle joins, optional Colocation joins for tables that are deliberately co-located, and adaptive two-stage aggregation.

A vectorized execution engine that processes data in column batches rather than one row at a time.

Rollups (pre-aggregated or re-sorted materialized views of a base table) that can be added dynamically, delayed (late) materialization, and prefix indexes.

Roaring Bitmap indexes and dictionary encoding for low-cardinality columns.

MPP architecture and integration with Elasticsearch via external tables.
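
Two-stage aggregation can be sketched in Python as a local pre-aggregation on each BE followed by a merge after the shuffle. The "adaptive" part is modeled with an assumed heuristic that skips pre-aggregation when it would not shrink the data enough; the threshold, the function names, and the row shape are all hypothetical, and for brevity a single merge call stands in for a shuffle that would hash-partition keys across several BEs.

```python
from collections import defaultdict

Row = tuple[str, int]  # (group key, value): a hypothetical row shape


def local_agg(rows: list[Row], min_reduction: float = 0.5) -> list[Row]:
    """Stage 1 on each BE: pre-aggregate SUM(value) by key.

    Assumed adaptive heuristic: if pre-aggregation would not shrink the input
    to at most min_reduction of its size, pass the rows through unchanged.
    """
    partial: dict[str, int] = defaultdict(int)
    for key, value in rows:
        partial[key] += value
    if rows and len(partial) > min_reduction * len(rows):
        return rows  # too many distinct keys: pre-aggregation is not worth it
    return list(partial.items())


def merge_agg(inputs: list[list[Row]]) -> dict[str, int]:
    """Stage 2 after the shuffle: merge partial sums (or raw rows) by key."""
    final: dict[str, int] = defaultdict(int)
    for rows in inputs:
        for key, value in rows:
            final[key] += value
    return dict(final)


be1_rows = [("a", 1), ("a", 2), ("b", 3), ("a", 4)]  # few distinct keys
be2_rows = [("x", 1), ("y", 2), ("z", 3)]            # every key distinct
stage1 = [local_agg(be1_rows), local_agg(be2_rows)]
print(stage1[0])          # [('a', 7), ('b', 3)]: pre-aggregated
print(stage1[1])          # [('x', 1), ('y', 2), ('z', 3)]: passed through
print(merge_agg(stage1))  # {'a': 7, 'b': 3, 'x': 1, 'y': 2, 'z': 3}
```

Pre-aggregating be1's rows halves what it sends over the network, while for be2 building partial results would cost work without reducing traffic, which is the trade-off an adaptive strategy has to weigh.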
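
Dictionary encoding and bitmap indexes both pay off on low-cardinality columns. The sketch below encodes a string column as small integer codes and builds a per-value bitmap index, then answers a two-predicate filter with a bitwise AND. Python integers stand in for Roaring Bitmaps (which add compression on top of the same set operations), and the column names and data are made up for illustration.

```python
def dict_encode(values: list[str]) -> tuple[list[str], list[int]]:
    """Replace each string with a small integer code into a dictionary."""
    dictionary: list[str] = []
    code_of: dict[str, int] = {}
    codes: list[int] = []
    for v in values:
        if v not in code_of:
            code_of[v] = len(dictionary)
            dictionary.append(v)
        codes.append(code_of[v])
    return dictionary, codes


def build_bitmap_index(values: list[str]) -> dict[str, int]:
    """Map each distinct value to a bitmap of row ids (bit i set: row i matches)."""
    index: dict[str, int] = {}
    for row_id, v in enumerate(values):
        index[v] = index.get(v, 0) | (1 << row_id)
    return index


def rows_of(bitmap: int) -> list[int]:
    """List the row ids whose bits are set."""
    return [i for i in range(bitmap.bit_length()) if (bitmap >> i) & 1]


city = ["beijing", "shanghai", "beijing", "shenzhen", "shanghai", "beijing"]
device = ["ios", "android", "android", "ios", "ios", "ios"]

dictionary, codes = dict_encode(city)
print(dictionary)  # ['beijing', 'shanghai', 'shenzhen']
print(codes)       # [0, 1, 0, 2, 1, 0]

city_idx = build_bitmap_index(city)
device_idx = build_bitmap_index(device)
# WHERE city = 'beijing' AND device = 'ios'
print(rows_of(city_idx["beijing"] & device_idx["ios"]))  # [0, 5]
```

With few distinct values, both the dictionary and the set of bitmaps stay small, and combining predicates becomes a cheap bitwise operation instead of a row-by-row comparison.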
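
A prefix index is a sparse index over data sorted by its key columns. Doris describes its prefix index as built from the first 36 bytes of each row's sort key, with one entry per block of 1024 rows; the sketch below is a simplified model that uses tiny blocks and counts leading key columns instead of bytes so the output stays readable. The function names are hypothetical.

```python
import bisect


def build_prefix_index(sorted_keys: list[tuple], block_size: int, prefix_len: int) -> list[tuple]:
    """Sparse index: the key prefix of the first row in every block of rows."""
    return [sorted_keys[i][:prefix_len] for i in range(0, len(sorted_keys), block_size)]


def candidate_blocks(index: list[tuple], prefix: tuple) -> range:
    """Blocks that may hold rows whose sort key starts with `prefix`."""
    entries = [entry[: len(prefix)] for entry in index]
    # The block just before the first entry >= prefix may end with matching rows.
    first = max(bisect.bisect_left(entries, prefix) - 1, 0)
    # Blocks whose first key is already > prefix cannot match.
    last = bisect.bisect_right(entries, prefix)
    return range(first, last)


keys = sorted(
    [(2024, "beijing", n) for n in range(3)]
    + [(2024, "shanghai", n) for n in range(3)]
    + [(2025, "beijing", n) for n in range(2)]
)
index = build_prefix_index(keys, block_size=3, prefix_len=2)
print(index)                                     # [(2024, 'beijing'), (2024, 'shanghai'), (2025, 'beijing')]
print(list(candidate_blocks(index, (2025,))))    # [1, 2]
print(list(candidate_blocks(index, (2023,))))    # []
```

Block 1 is returned for (2025,) even though it holds only 2024 rows: a sparse index can only rule blocks out, so the scan still filters the rows it reads. This is also why the choice and order of key columns matter for which queries benefit.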

Example of creating an Elasticsearch external table in Doris:

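```sql
CREATE EXTERNAL TABLE `es_table` (
  `id` bigint(20) COMMENT "",
  `k1` bigint(20) COMMENT "",
  `k2` datetime COMMENT "",
  `k3` varchar(20) COMMENT "",
  `k4` varchar(100) COMMENT "",
  `k5` float COMMENT ""
) ENGINE=ELASTICSEARCH  -- the engine must be ELASTICSEARCH
PROPERTIES (
  "hosts" = "http://192.168.0.1:8200,http://192.168.0.2:8200",  -- comma-separated Elasticsearch HTTP endpoints
  "user" = "root",      -- credentials for an Elasticsearch cluster with Basic authentication enabled
  "password" = "root",
  "index" = "tindex",   -- Elasticsearch index backing this table
  "type" = "doc"        -- mapping type; not needed for Elasticsearch 7.x and later
);
```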

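Querying the external table; the esquery function passes an Elasticsearch Query DSL clause (here a full-text match on k4) through to Elasticsearch: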

```sql
SELECT * FROM es_table
WHERE esquery(k4, '{ "match": { "k4": "doris on elasticsearch" } }');
```
Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contact admin@besthub.dev and we will review it promptly.

Written by Wang Zhiwu (Big Data Technology & Architecture), a big data expert dedicated to sharing big data technology.
