Databases 17 min read

How TDSQL‑C Accelerates Parallel Queries in Cloud‑Native Databases

This transcript details the design, implementation, and performance evaluation of TDSQL‑C's parallel query feature, covering product background, workload challenges, execution plans, Amdahl's law analysis, task scheduling, data partitioning, and future enhancements for cloud‑native database systems.

ITPUB

Dec 29, 2022

How TDSQL‑C Accelerates Parallel Queries in Cloud‑Native Databases

Reference Materials

TDSQL‑C product page: https://cloud.tencent.com/product/cynosdb

Column‑store vs. row‑store study: Abadi et al., 2008

TDSQL‑C parallel query user manual: https://cloud.tencent.com/document/product/1003/81872

TPC‑H benchmark specification: https://www.tpc.org/tpch/default5.asp

Amdahl's law: https://en.wikipedia.org/wiki/Amdahl%27s_law

SQL computation model: Astrahan & Chamberlin, 1975

Volcano parallel query system: Graefe, 1994

Immediate‑trigger parallel scheduling: Graefe, 1990

HASH partitioning: Kitsuregawa et al., 1989

Decomposable aggregate functions: https://en.wikipedia.org/wiki/Aggregate_function

Control vs. data flow in parallel DBMS: Teeuw & Blanken, 1993

CSP interaction model: https://en.wikipedia.org/wiki/Communicating_sequential_processes

Product Overview

TDSQL‑C is a cloud‑native relational database built on physical log replication and shared storage. It provides near‑100% MySQL compatibility, high‑availability clustering, and elastic scaling. The architecture separates compute and storage, allowing cost‑effective expansion of CPU cores without moving data.

Parallel Query Feature

The parallel query engine introduces *query‑level concurrency*: a single SQL statement is split into multiple tasks that run on different CPU cores. Users first allocate a core budget (e.g., 4 cores) for a session; the optimizer then generates a parallel execution plan containing parallel‑capable operators and special exchange operators for data movement between threads.

Performance was measured with the TPC‑H benchmark (22 queries representing typical analytical workloads). Speed‑up varies per query and follows Amdahl's law: overall acceleration = 1 / ((1‑p) + p/s), where p is the parallelizable fraction and s is the local speed‑up. Most queries showed noticeable gains, while queries with a small p (e.g., heavy single‑threaded aggregation) exhibited limited improvement.

Parallelism Principles

Parallel execution is achieved by decomposing both tasks and data :

**Task decomposition** – the optimizer inserts iterator boundaries (e.g., after a filter or join) and treats each sub‑iterator as an independent task.

**Data decomposition** – input tables are partitioned (hash or range) so that each task works on a disjoint subset.

Example: a SELECT SUM(col) FROM tbl aggregation. In a serial plan, each row updates a global hash table, then a final pass emits the sum. In the parallel plan, the table is split into two halves, each half builds a local hash table in parallel, and a final merge operator combines the two partial sums. Because the final merge is sequential, the speed‑up is less than 2×.

The SQL computation model consists of three layers:

Relational operators (filters, joins, scans)

Statistical functions (e.g., SUM, COUNT) that require a full‑set scan and therefore introduce a pipeline break.

Expression evaluation (row‑wise arithmetic, user‑defined functions)

Hash‑based partitioning enables parallel joins: the build side is hashed on the join key, and each probe thread only receives matching partitions, reducing data shuffling.

Scheduling Framework

A dedicated scheduling framework coordinates task execution:

The optimizer produces a task graph that captures dependencies between parallel tasks.

A coordinator thread (which may be a dedicated thread or the user thread) dispatches tasks to worker threads.

Worker threads execute their assigned tasks, report status, and participate in error handling integrated with MySQL’s exception mechanism.

Task representation follows a “task shape” model that reuses the optimizer‑generated physical plan. The shape is intercepted, cached, and reused for subsequent executions, avoiding redundant plan construction. When a parallel‑capable operator is detected, the framework:

Defines task boundaries at iterator edges.

Partitions input data according to the operator’s requirements.

Allocates the requested core resources.

Data Partitioning and Exchange

InnoDB stores tables as B+‑trees. The partitioner recursively splits index ranges until each partition contains enough rows for a worker thread. If a partition is still too large, it is further subdivided at a lower tree level. This logic is based on the existing community code and extended to support both forward and reverse scans and multiple index types.

Data exchange between upstream and downstream tasks is performed by *bridge* operators. Each bridge acts as a network node that receives rows from its producer thread, buffers them, and forwards them to the consumer thread(s). This design decouples the execution of independent tasks while preserving the logical data flow of the original plan.

Future Directions

Planned enhancements include:

Extending parallel query coverage to more operator types (e.g., window functions, sub‑queries).

Deep performance tuning of partitioning heuristics and exchange buffering.

Hybrid row‑column scheduling that combines vectorized columnar execution with row‑based processing for additive speed‑ups.

Improving visibility into the execution engine (trace logs, visual plan explorers) to aid debugging and performance tuning.

Original Source

Signed-in readers can open the original source through BestHub's protected redirect.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.

Task scheduling TDSQL-C Data Partitioning cloud-native database Parallel Query Amdahl's Law SQL Performance

Written by

ITPUB

Official ITPUB account sharing technical insights, community news, and exciting events.

0 followers

Reader feedback

How this landed with the community

Rate this article

Was this worth your time?

Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.