How Elastic Multi‑Node Parallel Query Supercharges PolarDB MySQL Performance
This article explains the background, concept, advantages, applicable scenarios, technical implementation, and performance evaluation of Elastic Parallel Query (ePQ) in PolarDB MySQL, showing how multi‑node parallelism leverages idle CPU resources, adaptive scheduling, and cloud‑native architecture to accelerate large‑scale analytical workloads while maintaining HTAP capabilities.
Background
Parallel Query has been a core enterprise‑grade acceleration feature of PolarDB MySQL since its inception, tightly tied to the product’s cloud‑native compute‑storage separation that removes single‑node data size limits. Users naturally demand complex analytics on massive data while still requiring online (HTAP) capabilities, i.e., real‑time queries on the freshest data.
Operational monitoring of many cloud instances shows low CPU utilization; Elastic Parallel Query (ePQ) exploits idle CPU to reduce query latency, improve user experience, and increase cost‑effectiveness.
For a full history of Parallel Query see “The Past and Present of Parallel Query”. This article focuses on the next‑generation Elastic Multi‑Node Parallel Query.
Concept
Elastic Multi‑Node Parallel Query uses multiple compute nodes to execute a single query. The existing Parallel Query runs multi‑threaded within a single RW or RO node, which suits datasets in the hundreds of GB. When data grows to TB scale, a single node’s CPU and IO become the bottleneck; ePQ distributes the work across nodes, removing that bottleneck and balancing resource usage across the cluster.
Unlike traditional MPP, PolarDB’s compute layer is one‑write‑multiple‑read; any node can access the full dataset, enabling elastic scaling without data migration.
Traditional MPP couples compute and storage; tasks run on the data‑holding node.
PolarDB’s cloud‑native separation lets any node see all data, allowing stateless tasks to run on any node.
ePQ not only gains multi‑node speedup but also dynamically adapts to topology and resource changes, adjusting parallelism on‑the‑fly to maintain high utilization.
Advantages
100% MySQL compatibility – syntax, types, and behavior are fully compatible; users cannot tell whether parallelism is enabled.
Extreme cost‑effectiveness – reuses the same data for in‑place analysis (no extra storage), leverages existing compute resources (no extra compute cost unless more nodes are added), uses adaptive scheduling to utilize idle resources, and adjusts parallelism automatically during scaling.
Minimal operational overhead – enable cross‑node parallelism with a simple switch and per‑node parallel degree; no code changes required.
Real‑time online analysis – sub‑millisecond replication lag ensures fresh data on RO nodes; higher parallelism yields faster results.
Flexible deployment – supports mixed workloads (OLTP and OLAP) by isolating subsets of nodes or sharing resources during off‑peak periods.
Applicable Scenarios
ePQ inherits all scenarios of Parallel Query and adds:
Massive data analysis where a single node’s CPU/Memory/IO limits are reached.
Resource‑load‑imbalanced environments where some RO nodes become hot.
Elastic compute situations where scaling out adds nodes to share the workload.
Mixed offline/online workloads that can share idle resources across clusters.
Technical Implementation
The previous article “The Past and Present of PolarDB Parallel Query” details the original design. This section highlights extensions for elasticity.
Distributed Optimizer
The multi‑stage optimizer enumerates all parallel plans and selects a global parallelism based on resource view and query cost, e.g., disabling parallelism when resources are insufficient or scaling up parallelism proportionally when cost thresholds are exceeded.
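The parallelism decision described above can be sketched roughly as follows. This is a minimal illustration, not PolarDB's optimizer: the `ResourceView` class, the `choose_parallelism` function, and every threshold value are hypothetical names and numbers chosen for the example.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ResourceView:
    """Hypothetical snapshot of idle worker slots on each compute node."""
    idle_workers_per_node: List[int]


def choose_parallelism(query_cost: float,
                       view: ResourceView,
                       cost_threshold: float = 1000.0,   # assumed value
                       cost_per_worker: float = 500.0,   # assumed value
                       max_dop: int = 256) -> int:
    """Return a global degree of parallelism (1 means run serially)."""
    available = sum(view.idle_workers_per_node)
    # Cheap queries, or clusters with no spare workers, stay serial.
    if query_cost < cost_threshold or available < 2:
        return 1
    # Scale the worker count with cost, capped by what is actually idle.
    wanted = int(query_cost // cost_per_worker)
    return max(2, min(wanted, available, max_dop))
```

With two nodes that each have 8 idle workers, a query of cost 10,000 would want 20 workers under these assumed constants but is capped at the 16 that are available.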
Parallel Execution Strategies
New strategies include parallel materialization of semijoin and derived‑table/CTE, improving TPC‑H benchmark performance.
Node‑to‑Node Interaction
Control channels extend MySQL’s command protocol; a “migrant leader” model lets the leader keep a single control connection per remote node instead of one per worker. Data channels use TCP with port reuse, buffered queues, and a fully asynchronous design to avoid connection explosion and improve throughput.
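The buffered, asynchronous data channel can be illustrated with a bounded in-process queue. This is only a sketch of the backpressure idea: it uses Python's `asyncio.Queue` in place of a real TCP channel, and `send_rows`, `receive_rows`, and `exchange` are hypothetical names.

```python
import asyncio


async def send_rows(channel: asyncio.Queue, rows):
    """Producer side: put() suspends while the buffer is full (backpressure)."""
    for row in rows:
        await channel.put(row)
    await channel.put(None)  # sentinel marking end of stream (rows must not be None)


async def receive_rows(channel: asyncio.Queue):
    """Consumer side: drain the channel until the sentinel arrives."""
    received = []
    while True:
        row = await channel.get()
        if row is None:
            return received
        received.append(row)


async def exchange(rows, buffer_size: int = 4):
    """Run producer and consumer concurrently over one bounded buffer."""
    channel = asyncio.Queue(maxsize=buffer_size)
    _, received = await asyncio.gather(send_rows(channel, rows),
                                       receive_rows(channel))
    return received
```

Because the queue is bounded, a fast producer cannot run arbitrarily far ahead of a slow consumer, which keeps memory use predictable without blocking a thread.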
Cross‑Node Parallel Scan
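Within a node, round‑robin granule assignment balances load across workers. Across nodes, a range of granules is pre‑assigned to each node, and the workers on that node claim granules through a node‑local shared counter. The result is a hybrid scheme: static partitioning between nodes, dynamic assignment inside each node.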
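A minimal sketch of this hybrid scheme, under the assumption that granules are numbered contiguously from 0; `split_ranges` and `NodeGranuleCursor` are hypothetical names, and a lock-protected integer stands in for the node-local shared counter.

```python
import threading


def split_ranges(num_granules: int, num_nodes: int):
    """Pre-assign a contiguous granule range [start, end) to each node."""
    base, extra = divmod(num_granules, num_nodes)
    ranges, start = [], 0
    for node in range(num_nodes):
        size = base + (1 if node < extra else 0)  # spread the remainder
        ranges.append((start, start + size))
        start += size
    return ranges


class NodeGranuleCursor:
    """Node-local shared counter: workers on one node claim the next granule."""

    def __init__(self, start: int, end: int):
        self._next = start
        self._end = end
        self._lock = threading.Lock()

    def claim(self):
        """Return the next unclaimed granule id, or None when the range is done."""
        with self._lock:
            if self._next >= self._end:
                return None
            granule = self._next
            self._next += 1
            return granule
```

No cross-node coordination is needed once the ranges are fixed, while faster workers inside a node naturally claim more granules than slower ones.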
Transaction Consistency
Read‑write transaction info is replicated via redo logs; RO nodes rebuild active transaction chains. Global LSN synchronization ensures workers wait until the required LSN is applied before constructing a consistent read view. SCC (Strong Consistency) further reduces latency.
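The "wait until the required LSN is applied" step can be sketched with a condition variable. This is an illustration of the synchronization pattern only; `AppliedLsn` and its methods are hypothetical and do not correspond to PolarDB internals.

```python
import threading


class AppliedLsn:
    """Tracks the highest redo LSN applied on a read-only node."""

    def __init__(self):
        self._lsn = 0
        self._cond = threading.Condition()

    def advance(self, lsn: int) -> None:
        """Called by the redo-apply thread after replaying up to `lsn`."""
        with self._cond:
            if lsn > self._lsn:
                self._lsn = lsn
                self._cond.notify_all()

    def wait_until(self, required_lsn: int, timeout=None) -> bool:
        """Block a worker until `required_lsn` is applied; False on timeout."""
        with self._cond:
            return self._cond.wait_for(lambda: self._lsn >= required_lsn,
                                       timeout=timeout)
```

A worker would call `wait_until(leader_lsn)` before building its read view, so that every node sees at least the data the leader saw when the query started.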
Resource‑Based Adaptive Scheduling
Three strategies are available: LOCAL (single‑node execution), MULTI_NODES (parallelism scales linearly with node count), and AUTO (the default), which adapts to real‑time load and switches to multi‑node execution when needed.
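One way to picture the three strategies is as a node-selection function. The sketch below assumes AUTO stays local while the local node has enough idle workers and otherwise spreads to the least-loaded nodes; that rule, the `Policy` enum, and `pick_nodes` are illustrative assumptions rather than the documented behavior.

```python
from enum import Enum


class Policy(Enum):
    LOCAL = "LOCAL"
    MULTI_NODES = "MULTI_NODES"
    AUTO = "AUTO"


def pick_nodes(policy, local_node, idle_workers, wanted_workers):
    """Return the node names a query should run on.

    idle_workers maps node name -> currently idle worker slots (assumed input).
    """
    if policy is Policy.LOCAL:
        return [local_node]
    if policy is Policy.MULTI_NODES:
        return sorted(idle_workers)
    # AUTO: stay local if the local node can satisfy the request by itself.
    if idle_workers.get(local_node, 0) >= wanted_workers:
        return [local_node]
    # Otherwise add nodes, most idle first, until the request is covered.
    chosen, covered = [], 0
    for node, idle in sorted(idle_workers.items(), key=lambda kv: (-kv[1], kv[0])):
        if idle <= 0:
            continue
        chosen.append(node)
        covered += idle
        if covered >= wanted_workers:
            break
    return chosen or [local_node]
```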
Distributed Task Scheduler
Each compute node runs a coordinator; the scheduler pulls plans from a FIFO queue, requests compute‑resource budgets (CRB) from a global resource view, and assigns workers based on cache affinity and cross‑node data transfer minimization.
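The scheduling loop can be sketched as a FIFO queue in front of a shared table of free workers. The `Scheduler` class below is hypothetical: it models the budget request as a simple check against total free workers and models cache affinity as a single preferred node per plan.

```python
from collections import deque


class Scheduler:
    """Toy FIFO scheduler over a global view of free workers per node."""

    def __init__(self, free_workers):
        self.free = dict(free_workers)  # node name -> free worker slots
        self.queue = deque()

    def submit(self, plan_id, workers_needed, preferred_node):
        self.queue.append((plan_id, workers_needed, preferred_node))

    def schedule_next(self):
        """Assign workers to the head plan, or return None if it must wait."""
        if not self.queue:
            return None
        plan_id, needed, preferred = self.queue[0]
        if sum(self.free.values()) < needed:
            return None  # strict FIFO: the head plan blocks those behind it
        self.queue.popleft()
        # Try the preferred (cache-warm) node first, then the most idle nodes.
        others = sorted((n for n in self.free if n != preferred),
                        key=lambda n: (-self.free[n], n))
        assignment = {}
        for node in [preferred] + others:
            take = min(self.free.get(node, 0), needed)
            if take:
                assignment[node] = take
                self.free[node] -= take
                needed -= take
            if needed == 0:
                break
        return plan_id, assignment
```

Filling the preferred node before spilling to others keeps most workers next to warm caches and limits the number of nodes that have to exchange data.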
Performance Evaluation
Benchmarks include TPC‑H 100 GB (CPU‑bound) and 1 TB (IO‑bound) runs, single‑node vs. multi‑node comparisons, scale‑out experiments, and mixed OLTP/OLAP workloads. Results show consistent latency reductions, linear scalability up to 256 DoP, and rapid adaptation (1‑2 s) to workload changes.
set optimizer_switch="hash_join_cost_based=off";
set cbqt_enabled=off;
Summary and Outlook
Elastic Multi‑Node Parallel Query (ePQ) provides powerful real‑time acceleration for large‑scale analytical queries on PolarDB MySQL, while also enabling dynamic resource balancing and cost‑effective scaling. Future work includes enhancing the MySQL optimizer for better join trees, improving statistics, adopting RDMA for data channels, supporting partition‑wise joins, federated queries, and adaptive execution mechanisms.
Alibaba Cloud Developer
Alibaba's official tech channel, featuring all of its technology innovations.
