
Parallel Query Capabilities in GaiaDB 4.0: HTAP and Cloud‑Native Database Performance

GaiaDB 4.0 introduces cloud‑native parallel query execution, allowing the same MySQL‑compatible cluster to handle both OLTP and OLAP workloads with up to fourteen‑fold speedup on complex queries. Parallelism is configurable via global parameters or query hints, and the release also adds columnar indexes for further analytical acceleration.

Baidu Geek Talk

In enterprises there are two typical data‑processing scenarios: OLTP (online transaction processing) for transaction systems and OLAP (online analytical processing) for business reporting.

OLTP databases excel at insert, update, delete and low‑volume queries, emphasizing real‑time response, high throughput and transactional guarantees. OLAP focuses on large‑scale, complex queries, requiring scalability and heavy computation.

To satisfy both scenarios, companies usually adopt either an OLTP + OLAP combination or a distributed HTAP database solution.

The OLTP + OLAP combination runs each workload on the database best suited for it and synchronizes data asynchronously via ETL. This approach handles complex business needs but demands strong IT capabilities.

The HTAP approach uses a distributed database with zero‑ETL, hiding the complexity of data synchronization between OLTP and OLAP and simplifying the architecture. It works well for "heavy‑TP, light‑AP" scenarios, though it requires migrating the existing system to a distributed architecture.

Beyond these two options, many enterprises still rely on MySQL (a TP‑type database) to handle AP workloads because they cannot find a lower‑cost, lightweight solution.

GaiaDB 4.0 adds parallel query capability

GaiaDB is a cloud‑native, MySQL‑compatible database that offers higher scalability and performance than open‑source MySQL. Prior to version 4.0, GaiaDB could not efficiently support complex analytical queries; a single complex query could take dozens of seconds.

Version 4.0 introduces parallel query for real‑time analytical workloads, allowing the same cluster to serve both transactional and analytical needs without any code changes.

Performance gains stem from better utilization of multi‑core CPUs. In MySQL, a single query runs on one thread (one CPU core), which limits performance for I/O‑intensive or complex queries. Parallel execution distributes work across multiple cores, dramatically improving throughput.

For example, GaiaDB partitions a table, dispatches each partition to a separate CPU core, computes partial count(*) results, and finally aggregates them. This parallelism can improve query performance by one to two orders of magnitude compared with single‑threaded execution.
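The partition, count, and aggregate flow described above can be sketched in a few lines of Python. This is only an illustration of the data flow, not GaiaDB code: the function names (split_ranges, partial_count, parallel_count) and the in-memory list standing in for a table are assumptions made for the example, and Python threads will not show the multi-core speedup a database engine gets.

```python
from concurrent.futures import ThreadPoolExecutor

def split_ranges(n_rows, n_parts):
    """Split [0, n_rows) into n_parts contiguous, near-equal ranges."""
    step, extra = divmod(n_rows, n_parts)
    ranges, start = [], 0
    for i in range(n_parts):
        end = start + step + (1 if i < extra else 0)
        ranges.append((start, end))
        start = end
    return ranges

def partial_count(rows, start, end, predicate):
    """Worker: count the matching rows in one partition."""
    return sum(1 for row in rows[start:end] if predicate(row))

def parallel_count(rows, predicate, dop=4):
    """Leader: dispatch one task per partition, then add up the partial counts."""
    with ThreadPoolExecutor(max_workers=dop) as pool:
        futures = [pool.submit(partial_count, rows, s, e, predicate)
                   for s, e in split_ranges(len(rows), dop)]
        return sum(f.result() for f in futures)

# Hypothetical "table": the integers 0..999, counting the multiples of 3.
rows = list(range(1_000))
total = parallel_count(rows, lambda r: r % 3 == 0, dop=4)
```

The final sum is the only step that needs all workers to finish; everything before it touches disjoint row ranges, which is what makes count(*) an easy operator to parallelize.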

Technical implementation

GaiaDB parallelizes operators that can be executed concurrently (e.g., scan, gather). During query execution, data is sharded and multiple worker threads compute their portions, after which results are merged and returned to the client. An additional stream operator enables data exchange between workers to ensure correctness.

The scan operator is split into multiple ParallelTblScanIterator contexts, each processed by a separate thread, unlike the single‑threaded TableScanIterator.

The gather operator transforms a single‑threaded query into a one‑to‑many thread model: it creates new threads, copies necessary variables, and aggregates results (e.g., sums) from all workers before returning the final result.

GaiaDB also supports parallelization of other operators:

Parallel Filter (WHERE/HAVING)

Parallel Scan (Projection)

Parallel Join (HashJoin, NestLoopJoin, SemiJoin)

Parallel Aggregation (SUM, AVG, COUNT, BIT_AND, BIT_OR, BIT_XOR)

Parallel Sort (ORDER BY)

Parallel Group‑by

Other parallel operators such as LIMIT/OFFSET and UNION
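The aggregate functions listed above share one property: each worker can keep a small partial state, and the partial states can be merged afterwards without rescanning any rows. The sketch below illustrates that idea; the PartialAgg class and its fields are hypothetical names chosen for the example, not GaiaDB internals. Note that AVG is carried as a (sum, count) pair, because averaging the per-worker averages would give the wrong answer when partitions differ in size.

```python
from dataclasses import dataclass
from functools import reduce

@dataclass
class PartialAgg:
    """Per-worker aggregate state that can be merged without rescanning rows."""
    count: int = 0
    total: int = 0
    bit_and: int = -1   # all bits set in Python ints: identity element for AND
    bit_or: int = 0
    bit_xor: int = 0

    def add(self, value):
        """Fold one row value into this worker's state."""
        self.count += 1
        self.total += value
        self.bit_and &= value
        self.bit_or |= value
        self.bit_xor ^= value
        return self

    def merge(self, other):
        """Combine two partial states into a new one."""
        return PartialAgg(
            self.count + other.count,
            self.total + other.total,
            self.bit_and & other.bit_and,
            self.bit_or | other.bit_or,
            self.bit_xor ^ other.bit_xor,
        )

    @property
    def avg(self):
        # AVG is derived from the merged sum and count, never from partial averages.
        return self.total / self.count if self.count else None

def aggregate(values):
    """What a single worker does over its own slice of the data."""
    return reduce(lambda state, v: state.add(v), values, PartialAgg())

# Three unevenly sized partitions, as three workers might see them.
chunks = [[1, 2, 3], [10], [20, 30]]
merged = reduce(PartialAgg.merge, (aggregate(c) for c in chunks), PartialAgg())
```

Merging the three partial states gives the same result as aggregating all six values in one pass, which is the correctness condition any parallel aggregation has to meet.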

Performance testing

In TPC‑H benchmark tests on a 32‑core, 2‑NUMA‑node Xeon Silver 4110 machine with 128 GB RAM and 100 GB of data, GaiaDB achieved up to 14× speedup for single‑table complex queries and an average of more than 8× improvement compared with single‑threaded MySQL.

CPU utilization reached 100 % on 32 cores.
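To put the reported speedups in perspective, the short calculation below derives parallel efficiency and the serial fraction implied by Amdahl's law. It assumes the benchmark ran with a degree of parallelism equal to the 32 cores mentioned above, which the article does not state explicitly, so the resulting figures are rough indications rather than measured values.

```python
def parallel_efficiency(speedup, workers):
    """Fraction of the ideal linear speedup that was actually achieved."""
    return speedup / workers

def implied_serial_fraction(speedup, workers):
    """Solve Amdahl's law S = 1 / (f + (1 - f) / N) for the serial fraction f."""
    return (workers / speedup - 1) / (workers - 1)

WORKERS = 32  # assumption: one worker per core on the test machine

for label, speedup in [("best case", 14), ("average", 8)]:
    eff = parallel_efficiency(speedup, WORKERS)
    serial = implied_serial_fraction(speedup, WORKERS)
    print(f"{label}: {speedup}x speedup, efficiency {eff:.1%}, "
          f"implied serial fraction {serial:.1%}")
```

Under that assumption, even a small share of work that cannot be parallelized (plan setup, the final merge, result transfer) is enough to keep the speedup well below the core count.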

Enabling parallel query in GaiaDB

There are two ways to turn on parallel execution:

Modify global cluster parameters: SET force_parallel_execute=ON enables parallel query; parallel_default_dop=4 sets the default degree of parallelism (the default is 4 threads); parallel_cost_threshold=1000 sets the cost threshold that triggers parallel execution (the default is 1000).

Use a query hint, e.g., SELECT /*+ PQ(8) */ … FROM …, which forces parallel execution with the specified degree of parallelism (8 in this example).
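One way to picture how the two mechanisms might interact is the decision sketch below. The parameter names and defaults come from the text above; everything else is an assumption made for illustration, including the precedence of the hint over the global settings, the "cost at or above the threshold" comparison, the choose_dop function itself, and the table name t. The actual GaiaDB optimizer may decide differently.

```python
import re
from dataclasses import dataclass

@dataclass
class ParallelSettings:
    force_parallel_execute: bool = False   # global switch named in the article
    parallel_default_dop: int = 4          # default stated in the article
    parallel_cost_threshold: int = 1000    # default stated in the article

# Matches a hint of the form /*+ PQ(n) */ and captures n.
PQ_HINT = re.compile(r"/\*\+\s*PQ\((\d+)\)\s*\*/")

def choose_dop(sql, estimated_cost, settings):
    """Return the degree of parallelism to use, or 1 for serial execution.

    Assumed precedence: an explicit PQ hint wins; otherwise the global
    switch and the cost threshold decide.
    """
    hint = PQ_HINT.search(sql)
    if hint:
        return int(hint.group(1))
    if (settings.force_parallel_execute
            and estimated_cost >= settings.parallel_cost_threshold):
        return settings.parallel_default_dop
    return 1

cluster = ParallelSettings(force_parallel_execute=True)
hinted = choose_dop("SELECT /*+ PQ(8) */ COUNT(*) FROM t", 10, cluster)
expensive = choose_dop("SELECT COUNT(*) FROM t", 5000, cluster)
cheap = choose_dop("SELECT COUNT(*) FROM t", 10, cluster)
```

Read this way, the cost threshold keeps short transactional statements on the serial path, while the hint lets a specific analytical query opt in regardless of the cluster-wide defaults.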

In addition to parallel execution, GaiaDB 4.0 introduces columnar indexes, which further accelerate query performance by storing selected columns in a columnar format.
