How Does GaussDB Execute SQL? Inside Its Engine and Optimization
This article explains GaussDB's system architecture, the roles of GTM, CN, and DN, the SQL and storage engines, parsing stages, rule‑based and cost‑based optimizers (including AI‑based techniques), distributed query plans, execution operators, and the parallel execution framework that together enable high‑performance SQL processing in a cloud‑native distributed database.
GaussDB System Architecture
GaussDB can be deployed in centralized or distributed mode on Huawei Cloud, as shown in the architecture diagrams.
Key Components
GTM : Global Transaction Manager that generates and maintains global transaction IDs, snapshots, timestamps, and sequence information.
CN : Coordinator Node that receives client requests, decomposes tasks, and schedules parallel execution on Data Nodes.
DN : Data Node that stores business data (row, column, or hybrid), executes query tasks, and returns results to the CN.
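The division of labor between the CN and the DNs can be illustrated with a small scatter-gather sketch. This is not GaussDB code: the class names `Coordinator` and `DataNode`, the CRC32-based hash distribution, and the three-node cluster size are all assumptions made for illustration. The sketch only shows the general idea that the CN routes rows to DNs, fans a query out, and merges the partial results.

```python
from zlib import crc32

NUM_DNS = 3  # hypothetical cluster size, chosen only for this example


def dn_for_key(key, num_dns=NUM_DNS):
    """Map a distribution key to a DN index with a stable hash (assumed scheme)."""
    return crc32(str(key).encode("utf-8")) % num_dns


class DataNode:
    """Toy DN: stores rows and computes a filtered partial sum locally."""

    def __init__(self):
        self.rows = []

    def local_sum(self, column, predicate):
        return sum(row[column] for row in self.rows if predicate(row))


class Coordinator:
    """Toy CN: routes inserts, sends the query to every DN, merges results."""

    def __init__(self, num_dns=NUM_DNS):
        self.dns = [DataNode() for _ in range(num_dns)]

    def insert(self, row, dist_key):
        target = dn_for_key(row[dist_key], len(self.dns))
        self.dns[target].rows.append(row)

    def sum_where(self, column, predicate):
        # Each DN computes a partial sum; the CN adds the partial sums together.
        return sum(dn.local_sum(column, predicate) for dn in self.dns)


cn = Coordinator()
for i in range(10):
    cn.insert({"id": i, "amount": i * 10}, dist_key="id")

total = cn.sum_where("amount", lambda r: r["id"] >= 5)
print(total)
```

Because each DN filters and sums its own rows, only one number per DN travels back to the coordinator rather than the rows themselves.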
Logical Architecture
GaussDB consists of a SQL engine and a storage engine. The SQL engine handles parsing, optimization, and execution, while the storage engine manages I/O, buffering, and transaction logging.
SQL Engine Layers
Parser : Performs lexical analysis, syntax analysis (producing an AST), and semantic analysis (producing a query tree).
Optimizer : Includes Rule‑Based Optimization (RBO) and Cost‑Based Optimization (CBO). GaussDB also explores AI‑Based Optimization (ABO) to adapt cost models using machine learning.
Key RBO Techniques : Predicate pushdown, elimination of redundant operations, subquery pull-up, conversion of outer joins to inner joins, join reordering, and inequality-join elimination.
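Predicate pushdown, the first rule in the list above, can be sketched on a toy plan tree. The `Scan`, `Join`, and `Filter` node classes and the `push_down` function below are hypothetical and do not mirror GaussDB's internal plan structures. The sketch assumes inner joins and predicates that reference a single table, which is the simple case where moving the filter below the join is always safe.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Scan:
    table: str
    rows: list
    pushed: list = field(default_factory=list)  # predicates applied while scanning


@dataclass
class Join:  # inner equi-join on a shared column name
    left: object
    right: object
    key: str


@dataclass
class Filter:
    child: object
    table: str  # the single table whose columns the predicate reads
    pred: Callable


def _try_push(plan, table, pred):
    """Attach pred to the Scan of `table` if that Scan is reachable through joins."""
    if isinstance(plan, Scan) and plan.table == table:
        plan.pushed.append(pred)
        return True
    if isinstance(plan, Join):
        return _try_push(plan.left, table, pred) or _try_push(plan.right, table, pred)
    return False


def push_down(plan):
    """Rewrite the tree so single-table filters run inside the matching Scan."""
    if isinstance(plan, Filter):
        child = push_down(plan.child)
        if _try_push(child, plan.table, plan.pred):
            return child  # the Filter node is no longer needed
        return Filter(child, plan.table, plan.pred)
    if isinstance(plan, Join):
        return Join(push_down(plan.left), push_down(plan.right), plan.key)
    return plan


def execute(plan):
    if isinstance(plan, Scan):
        return [r for r in plan.rows if all(p(r) for p in plan.pushed)]
    if isinstance(plan, Filter):
        return [r for r in execute(plan.child) if plan.pred(r)]
    right = execute(plan.right)
    return [{**l, **r} for l in execute(plan.left) for r in right
            if l[plan.key] == r[plan.key]]


orders = Scan("orders", [{"cust_id": 1, "amount": 50}, {"cust_id": 2, "amount": 500}])
customers = Scan("customers", [{"cust_id": 1, "name": "Ann"}, {"cust_id": 2, "name": "Bo"}])
plan = Filter(Join(orders, customers, "cust_id"), "orders", lambda r: r["amount"] > 100)

before = execute(plan)        # filter runs after the join
optimized = push_down(plan)   # filter now runs during the scan of orders
after = execute(optimized)
```

Both plans return the same rows, but the rewritten plan discards non-matching orders before the join sees them, which is the point of the rule.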
Distributed Query Optimization
GaussDB supports four distributed execution plans:
CN Lightweight : Directly sends the statement to a single DN (LIGHT_PROXY).
Fast Query Shipping (FQS) : Generates a RemoteQuery plan and pushes it to DN(s) for execution, aggregating results on the CN.
STREAM : Generates a plan with stream operators that enable data exchange between DNs.
Remote‑Query : Splits the query into remote parts executed on DNs and a final part executed on the CN.
Stream Operators
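The STREAM plan described above depends on operators that move rows between DNs. One common MPP exchange pattern is to re-shuffle rows by join key so that matching rows meet on the same node and each node can then join locally. The sketch below illustrates that general pattern only. The function names `redistribute` and `local_join`, the CRC32 hash, and the two-node layout are assumptions, not GaussDB operator names or internals.

```python
from zlib import crc32


def target_dn(value, num_dns):
    """Pick the destination DN for a key value (assumed hashing scheme)."""
    return crc32(str(value).encode("utf-8")) % num_dns


def redistribute(partitions, key):
    """Re-shuffle rows so that rows with equal `key` land on the same DN."""
    num_dns = len(partitions)
    out = [[] for _ in range(num_dns)]
    for rows in partitions:      # every source DN ...
        for row in rows:         # ... sends each row to its target DN
            out[target_dn(row[key], num_dns)].append(row)
    return out


def local_join(left_rows, right_rows, key):
    """Hash join executed entirely on one DN."""
    index = {}
    for r in right_rows:
        index.setdefault(r[key], []).append(r)
    return [{**l, **r} for l in left_rows for r in index.get(l[key], [])]


# orders are distributed by order_id, so cust_id values are scattered across DNs
orders_by_dn = [
    [{"order_id": 1, "cust_id": 7}],
    [{"order_id": 2, "cust_id": 8}, {"order_id": 3, "cust_id": 7}],
]
customers_by_dn = [
    [{"cust_id": 8, "name": "Bo"}],
    [{"cust_id": 7, "name": "Ann"}],
]

orders_r = redistribute(orders_by_dn, "cust_id")
customers_r = redistribute(customers_by_dn, "cust_id")

joined = [
    row
    for dn in range(len(orders_r))
    for row in local_join(orders_r[dn], customers_r[dn], "cust_id")
]
```

After both inputs are redistributed on `cust_id`, no DN needs rows held by another DN to finish its share of the join.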
Execution Operators
Scan Plan Node : Reads data from storage (e.g., SeqScan, IndexScan).
Control Plan Node : Handles flow control such as LIMIT, UNION.
Materialize Plan Node : Buffers intermediate results for operators like AGG and SORT.
Join Plan Node : Implements MergeJoin, NestLoop, HashJoin.
Other Operators : Various auxiliary operators.
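The operator categories above can be illustrated with a pull-based pipeline in which each operator consumes rows from its child. The sketch uses Python generators and assumes an iterator-style executor. The function names are hypothetical and the code is not taken from GaussDB. It includes one operator from each of the scan, join, materialize, and control categories.

```python
from itertools import islice


def seq_scan(rows):
    """Scan node: reads rows one at a time from a table (here, a list)."""
    for row in rows:
        yield row


def hash_join(left, right, key):
    """Join node: builds a hash table on the right input, probes with the left."""
    table = {}
    for r in right:
        table.setdefault(r[key], []).append(r)
    for l in left:
        for r in table.get(l[key], []):
            yield {**l, **r}


def sort_rows(child, key):
    """Materializing node: must consume all input before emitting any row."""
    yield from sorted(child, key=lambda r: r[key])


def limit(child, n):
    """Control node: stops pulling from its child after n rows."""
    yield from islice(child, n)


emps = [
    {"name": "a", "dept_id": 1, "salary": 300},
    {"name": "b", "dept_id": 2, "salary": 100},
    {"name": "c", "dept_id": 1, "salary": 200},
]
depts = [{"dept_id": 1, "dept": "eng"}, {"dept_id": 2, "dept": "ops"}]

# Roughly: SELECT ... FROM emps JOIN depts USING (dept_id) ORDER BY salary LIMIT 2
plan = limit(sort_rows(hash_join(seq_scan(emps), seq_scan(depts), "dept_id"), "salary"), 2)
result = list(plan)
```

The scan, join, and limit stages stream rows through, while the sort stage has to buffer its whole input first, which is why operators such as SORT and AGG are grouped with materialization.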
Parallel Execution Architecture
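GaussDB employs a fully parallel execution engine that combines several techniques to maximize CPU utilization: MPP (massively parallel processing across nodes), SMP (symmetric multiprocessing, that is, multi-threaded parallelism within a node), vectorized execution that processes rows in batches, LLVM-based code generation, and SIMD instructions.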
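Two of these ideas, batch-at-a-time processing and intra-node parallel workers, can be sketched together. The helper names, the batch size, and the worker count below are assumptions made for illustration. CPython threads do not give real CPU speed-up for this kind of work, so the sketch shows only the structure: split the input into batches, compute partial aggregates concurrently, then combine them.

```python
from concurrent.futures import ThreadPoolExecutor


def chunks(values, size):
    """Split the input into fixed-size batches."""
    for start in range(0, len(values), size):
        yield values[start:start + size]


def partial_even_sum(batch):
    # One call handles a whole batch rather than a single row,
    # in the spirit of vectorized execution.
    return sum(v for v in batch if v % 2 == 0)


def parallel_even_sum(values, workers=4, batch_size=1000):
    # Workers compute partial aggregates; the caller combines them.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(partial_even_sum, chunks(values, batch_size)))


data = list(range(10_000))
result = parallel_even_sum(data)
serial = sum(v for v in data if v % 2 == 0)
```

The parallel result matches the serial one because summation can be split into partial sums and recombined in any order, a property that parallel aggregation in general relies on.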