
How Does GaussDB Execute SQL? Inside Its Engine and Optimization

This article explains GaussDB's system architecture, the roles of GTM, CN, and DN, the SQL and storage engines, parsing stages, rule‑based and cost‑based optimizers (including AI‑based techniques), distributed query plans, execution operators, and the parallel execution framework that together enable high‑performance SQL processing in a cloud‑native distributed database.

Huawei Cloud Developer Alliance

GaussDB System Architecture

GaussDB can be deployed in centralized or distributed mode on Huawei Cloud, as shown in the architecture diagrams.

Centralized deployment
Distributed deployment

Key Components

GTM : Global Transaction Manager that generates and maintains global transaction IDs, snapshots, timestamps, and sequence information.

CN : Coordinator Node that receives client requests, decomposes tasks, and schedules parallel execution on Data Nodes.

DN : Data Node that stores business data (row, column, or hybrid), executes query tasks, and returns results to the CN.
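The division of labor between CN and DN can be sketched with a toy in-memory model. This is only an illustration: the class names `Coordinator` and `DataNode`, the CRC32-based hash routing, and the fan-out query are assumptions made for the example, not GaussDB's actual interfaces or distribution strategy, and the GTM is left out entirely.

```python
import zlib


class DataNode:
    """Hypothetical stand-in for a DN: stores rows and runs a local scan."""

    def __init__(self, name):
        self.name = name
        self.rows = []

    def scan(self, predicate):
        # Execute the query task locally and return matching rows.
        return [row for row in self.rows if predicate(row)]


class Coordinator:
    """Hypothetical stand-in for a CN: routes rows and fans out queries."""

    def __init__(self, data_nodes):
        self.data_nodes = data_nodes

    def _route(self, key):
        # Assumed hash distribution: a stable hash keeps a key on one DN.
        index = zlib.crc32(str(key).encode()) % len(self.data_nodes)
        return self.data_nodes[index]

    def insert(self, row, key_column):
        self._route(row[key_column]).rows.append(row)

    def query(self, predicate):
        # Decompose the request into one task per DN, then merge the results.
        results = []
        for dn in self.data_nodes:
            results.extend(dn.scan(predicate))
        return results
```

A client would only talk to the `Coordinator`; which `DataNode` holds a given row stays an internal detail of the routing function.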

Logical Architecture

GaussDB logical architecture

GaussDB consists of a SQL engine and a storage engine. The SQL engine handles parsing, optimization, and execution, while the storage engine manages I/O, buffering, and transaction logging.
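To make the parsing stage concrete, here is a minimal sketch of the three passes a SQL parser runs: lexical analysis into tokens, syntax analysis into an AST, and semantic analysis into a query tree. It handles only `SELECT <columns> FROM <table>`; the function names, the dictionary-based AST, and the `catalog` mapping are hypothetical and far simpler than what a real engine does.

```python
import re

TOKEN_RE = re.compile(r"\s*(?:(?P<ident>[A-Za-z_][A-Za-z0-9_]*)|(?P<punct>[,*]))")
KEYWORDS = {"SELECT", "FROM"}


def lex(sql):
    """Lexical analysis: split the text into (kind, value) tokens."""
    tokens, pos = [], 0
    sql = sql.strip()
    while pos < len(sql):
        match = TOKEN_RE.match(sql, pos)
        if not match:
            raise SyntaxError(f"unexpected character at {pos}: {sql[pos]!r}")
        word = match.group("ident")
        if word is None:
            tokens.append(("PUNCT", match.group("punct")))
        elif word.upper() in KEYWORDS:
            tokens.append(("KEYWORD", word.upper()))
        else:
            tokens.append(("IDENT", word))
        pos = match.end()
    return tokens


def parse(tokens):
    """Syntax analysis: build a small AST (comma placement is not checked)."""
    if len(tokens) < 4 or tokens[0] != ("KEYWORD", "SELECT"):
        raise SyntaxError("expected SELECT")
    try:
        from_idx = tokens.index(("KEYWORD", "FROM"))
    except ValueError:
        raise SyntaxError("expected FROM") from None
    columns = [v for kind, v in tokens[1:from_idx] if kind == "IDENT" or v == "*"]
    table_tokens = tokens[from_idx + 1:]
    if not columns or len(table_tokens) != 1 or table_tokens[0][0] != "IDENT":
        raise SyntaxError("malformed SELECT list or table name")
    return {"type": "select", "columns": columns, "table": table_tokens[0][1]}


def analyze(ast, catalog):
    """Semantic analysis: resolve names against a catalog into a query tree."""
    if ast["table"] not in catalog:
        raise LookupError(f"unknown table {ast['table']}")
    table_columns = catalog[ast["table"]]
    resolved = []
    for col in ast["columns"]:
        if col == "*":
            resolved.extend(table_columns)  # expand the wildcard
        elif col in table_columns:
            resolved.append(col)
        else:
            raise LookupError(f"unknown column {col}")
    return {"type": "query", "table": ast["table"], "target_list": resolved}
```

Chaining the passes as `analyze(parse(lex(sql)), catalog)` mirrors the pipeline: only the last step needs metadata about tables and columns, which is why syntax errors and unknown-name errors surface at different stages.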

SQL Engine Layers

Parser : Performs lexical analysis, syntax analysis (producing an AST), and semantic analysis (producing a query tree).

Optimizer : Includes Rule‑Based Optimization (RBO) and Cost‑Based Optimization (CBO). GaussDB also explores AI‑Based Optimization (ABO) to adapt cost models using machine learning.

Key RBO Techniques : Predicate pushdown, redundant operation elimination, subquery pull‑up, outer‑join to inner‑join conversion, join reordering, and inequality‑join elimination.
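Predicate pushdown, the first rule in that list, can be illustrated on a toy plan tree. The sketch below assumes inner joins only (pushing a filter below an outer join is not always safe) and predicates that reference a single table; the node classes `Scan`, `Join`, and `Filter` are hypothetical and not GaussDB's plan representation.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Scan:
    table: str
    filter: Optional[str] = None  # predicate text applied while scanning


@dataclass
class Join:  # assumed to be an inner join
    left: object
    right: object


@dataclass
class Filter:
    table: str       # the single table the predicate refers to
    predicate: str
    child: object


def _push(node, table, predicate):
    """Try to attach the predicate to the scan of `table` below `node`."""
    if isinstance(node, Scan) and node.table == table:
        merged = predicate if node.filter is None else f"({node.filter}) AND ({predicate})"
        return Scan(node.table, merged), True
    if isinstance(node, Join):
        left, ok = _push(node.left, table, predicate)
        if ok:
            return Join(left, node.right), True
        right, ok = _push(node.right, table, predicate)
        if ok:
            return Join(node.left, right), True
    return node, False


def push_down(node):
    """Rewrite the plan so filters run as close to the scans as possible."""
    if isinstance(node, Filter):
        child = push_down(node.child)
        pushed, ok = _push(child, node.table, node.predicate)
        # Keep the filter in place when no matching scan exists below it.
        return pushed if ok else Filter(node.table, node.predicate, child)
    if isinstance(node, Join):
        return Join(push_down(node.left), push_down(node.right))
    return node
```

The rewrite never changes the result under these assumptions, only where the filtering happens: rows are discarded at the scan, before they reach the join.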

Query optimization steps
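Cost-based optimization, in contrast to fixed rules, estimates the cost of each candidate plan and keeps the cheapest one. The following sketch compares a sequential scan with an index scan under a deliberately crude model; the constants, the formulas, and the function names are illustrative assumptions, not GaussDB's cost model.

```python
# Hypothetical cost constants, chosen only for illustration.
SEQ_PAGE_COST = 1.0      # cost of reading one page sequentially
RANDOM_PAGE_COST = 4.0   # cost of reading one page at a random position
CPU_TUPLE_COST = 0.01    # cost of processing one row


def seq_scan_cost(pages, rows):
    """Read every page once and process every row."""
    return pages * SEQ_PAGE_COST + rows * CPU_TUPLE_COST


def index_scan_cost(rows, selectivity):
    """Assume the worst case: one random page fetch per matching row."""
    matched = rows * selectivity
    return matched * (RANDOM_PAGE_COST + CPU_TUPLE_COST)


def choose_scan(pages, rows, selectivity):
    """Return the name of the cheaper access path under this toy model."""
    candidates = {
        "SeqScan": seq_scan_cost(pages, rows),
        "IndexScan": index_scan_cost(rows, selectivity),
    }
    return min(candidates, key=candidates.get)
```

With 1,000 pages and 100,000 rows, a predicate that matches 0.1% of the rows favors the index scan in this model, while one that matches half the table favors the sequential scan. The quality of the selectivity estimate therefore decides the plan, which is the part that statistics and learned models aim to improve.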

Distributed Query Optimization

GaussDB supports four distributed execution plans:

CN Lightweight : Directly sends the statement to a single DN (LIGHT_PROXY).

Fast Query Shipping (FQS) : Generates a RemoteQuery plan and pushes it to DN(s) for execution, aggregating results on the CN.

STREAM : Generates a plan with stream operators that enable data exchange between DNs.

Remote‑Query : Splits the query into remote parts executed on DNs and a final part executed on the CN.
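The idea of running part of a query on the DNs and a final part on the CN can be sketched with a two-phase average. This is a simplified illustration under stated assumptions: `dn_partial_avg` and `cn_final_avg` are hypothetical names, and each inner list stands for the rows held by one DN.

```python
def dn_partial_avg(rows, column):
    """DN side: reduce local rows to a (sum, count) pair."""
    values = [row[column] for row in rows]
    return (sum(values), len(values))


def cn_final_avg(partials):
    """CN side: combine the per-DN pairs into the global average."""
    total = sum(s for s, _ in partials)
    count = sum(c for _, c in partials)
    return total / count if count else None


def distributed_avg(partitions, column):
    """Run the DN part on every partition, then the CN part on the results."""
    return cn_final_avg([dn_partial_avg(rows, column) for rows in partitions])
```

Each DN ships a sum and a count rather than its own average, because averaging the per-DN averages would give the wrong answer whenever the DNs hold different numbers of rows. Only two numbers per DN cross the network instead of every row.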

Four distributed plans

Stream Operators

Stream operators
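One way to picture data exchange between DNs is a redistribute-style step before a join: every DN sends each row to the DN that owns the hash of its join key, so matching rows meet on the same node and can be joined locally. The sketch below models that with in-memory lists; the function names and the CRC32 hash are assumptions for illustration and do not describe GaussDB's actual operators.

```python
import zlib


def target_dn(key, dn_count):
    """Pick the DN that owns this join key (stable hash, assumed scheme)."""
    return zlib.crc32(str(key).encode()) % dn_count


def redistribute(partitions, key_column):
    """Each DN sends every row to the DN that owns hash(key)."""
    dn_count = len(partitions)
    out = [[] for _ in range(dn_count)]
    for rows in partitions:
        for row in rows:
            out[target_dn(row[key_column], dn_count)].append(row)
    return out


def local_hash_join(left_rows, right_rows, key_column):
    """Join the rows that ended up on one DN."""
    table = {}
    for row in left_rows:
        table.setdefault(row[key_column], []).append(row)
    return [
        {**left, **right}
        for right in right_rows
        for left in table.get(right[key_column], [])
    ]


def distributed_join(left_parts, right_parts, key_column):
    """Redistribute both inputs, join per DN, then gather the results."""
    left = redistribute(left_parts, key_column)
    right = redistribute(right_parts, key_column)
    gathered = []
    for left_rows, right_rows in zip(left, right):
        gathered.extend(local_hash_join(left_rows, right_rows, key_column))
    return gathered
```

Because both inputs are redistributed with the same hash function, no DN needs to see rows for keys it does not own, and the final gather is a plain concatenation.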

Execution Operators

Scan Plan Node : Reads data from storage (e.g., SeqScan, IndexScan).

Control Plan Node : Handles flow control, for example LIMIT and UNION.

Materialize Plan Node : Buffers intermediate results for operators like AGG and SORT.

Join Plan Node : Implements the join algorithms MergeJoin, NestLoop, and HashJoin.

Other Operators : Various auxiliary operators.
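Executors of this kind are commonly built as a tree of operators that pull rows from their children. The sketch below uses Python generators to show one operator from each category above: a scan, a control node, a materializing node, and a join. It is a generic iterator-model illustration with hypothetical function names, not GaussDB's executor.

```python
from itertools import islice


def seq_scan(table):
    """Scan node: read rows from storage one at a time."""
    for row in table:
        yield row


def limit(child, n):
    """Control node: stop pulling from the child after n rows."""
    return islice(child, n)


def sort(child, key):
    """Materializing node: must consume all input before emitting anything."""
    yield from sorted(child, key=lambda row: row[key])


def hash_join(outer, inner, key):
    """Join node: build a hash table on the inner side, probe with the outer."""
    table = {}
    for row in inner:
        table.setdefault(row[key], []).append(row)
    for row in outer:
        for match in table.get(row[key], []):
            yield {**row, **match}
```

A plan is then just nested calls, for example `limit(sort(hash_join(seq_scan(orders), seq_scan(users), "user_id"), "amount"), 2)`. The scan, join probe, and limit stream rows through, while the sort has to buffer its whole input first, which is the distinction the materialize category captures.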

Parallel Execution Architecture

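GaussDB employs a fully parallel execution engine that combines several levels of parallelism to maximize CPU utilization: MPP across nodes, SMP across threads within a node, vectorized execution that processes rows in batches, LLVM‑based code generation, and SIMD instructions for data‑level parallelism.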
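Two of these ideas, processing data in batches and spreading the batches over several workers, can be shown structurally in a few lines. This is a sketch of the shape only: the function names and default sizes are hypothetical, and Python threads do not deliver real CPU parallelism for this workload, so it illustrates how the work is divided rather than the speedup.

```python
from concurrent.futures import ThreadPoolExecutor


def batches(values, batch_size):
    """Split a column of values into fixed-size batches."""
    for start in range(0, len(values), batch_size):
        yield values[start:start + batch_size]


def sum_batch(batch):
    """Vectorized style: one call processes a whole batch, not a single row."""
    return sum(batch)


def parallel_sum(values, workers=4, batch_size=1024):
    """SMP style: workers aggregate batches, then the partial sums are merged."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(sum_batch, batches(values, batch_size)))
```

The same divide, aggregate, and merge pattern repeats at each level: across nodes for MPP, across threads for SMP, and across lanes of a register for SIMD.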

Full parallel execution architecture