How TDSQL‑PG Achieves Real‑Time HTAP: Architecture, Storage, and Optimization Insights
This article presents a comprehensive overview of TDSQL‑PG's HTAP capabilities, detailing the evolution of its storage, compute, transaction management, hybrid row‑column storage, cost‑based optimizer, vectorized execution, and resource isolation strategies for handling mixed OLTP and OLAP workloads.
HTAP Overview
Hybrid Transactional/Analytical Processing (HTAP) combines high‑throughput OLTP workloads with large‑scale OLAP queries in a single database system. Traditional OLTP uses row‑oriented storage for low‑latency transactions, while OLAP benefits from column‑oriented storage for efficient scans on petabyte‑scale data. Modern applications require both high concurrency and massive analytical capability, motivating integrated HTAP solutions.
TDSQL‑PG Architecture
TDSQL‑PG is a massively parallel processing (MPP) distributed relational database designed for large volumes of online, real‑time data. Its core components are:
Coordinator Nodes (CN): Multiple CNs accept client connections and coordinate request routing, providing high concurrency for both TP (transaction processing) and AP (analytical processing) workloads.
Data Nodes (DN): Storage‑compute nodes form an MPP sharing layer; the system can scale to thousands of DNs for parallel execution.
Global Transaction Manager (GTM): A distributed transaction coordinator that eliminates a single‑point bottleneck by using a monotonic logical timestamp and delegating coordination to CNs/DNs.
Transaction Management and GTM Redesign
The original GTM relied on a global snapshot, causing network overhead and a single‑node bottleneck. TDSQL‑PG replaced this with a monotonic logical timestamp and moved transaction coordination to CN/DN nodes. This redesign enables up to 12 million QPS on a single TS85 server while supporting hybrid row‑column transactions.
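The timestamp idea can be illustrated with a minimal sketch. This is not TDSQL‑PG's implementation: the names TimestampOracle, RowVersion, and is_visible are hypothetical, and the visibility rule shown (a version is visible when it committed at or before the reader's start timestamp) is the common textbook rule for timestamp‑based MVCC, assumed here for illustration.

```python
import itertools
import threading
from dataclasses import dataclass
from typing import Optional


class TimestampOracle:
    """Hypothetical stand-in for the GTM: hands out strictly increasing timestamps."""

    def __init__(self) -> None:
        self._counter = itertools.count(1)
        self._lock = threading.Lock()

    def next_ts(self) -> int:
        # The lock keeps timestamps unique and ordered across threads.
        with self._lock:
            return next(self._counter)


@dataclass
class RowVersion:
    value: str
    commit_ts: Optional[int] = None  # None means the writer has not committed yet


def is_visible(version: RowVersion, start_ts: int) -> bool:
    """Assumed rule: visible if committed at or before the reader's start timestamp."""
    return version.commit_ts is not None and version.commit_ts <= start_ts


oracle = TimestampOracle()
v1 = RowVersion("balance=100", commit_ts=oracle.next_ts())  # committed first
reader_start = oracle.next_ts()                             # reader begins here
v2 = RowVersion("balance=80", commit_ts=oracle.next_ts())   # committed after reader began
uncommitted = RowVersion("balance=50")                      # still in flight
```

With this rule a reader needs only one integer, its start timestamp, to decide visibility on any node, rather than a list of every in‑flight transaction as a global snapshot would require.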
Hybrid Row‑Column Storage Model
TDSQL‑PG provides a transparent hybrid storage format. When a table is created with the hybrid_storage=on switch, the system internally maintains:
A row‑store for OLTP‑type inserts/updates.
A column‑store for bulk analytical loads.
A background process (Stash Merge) continuously merges row‑store data into the column‑store, guaranteeing a single source of truth without data duplication.
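The row‑store, column‑store, and merge steps above can be sketched as a toy in‑memory table. The class HybridTable and its methods are hypothetical names chosen for illustration, and the sketch assumes that analytical reads consult both the merged column data and rows not yet merged; the source does not describe how TDSQL‑PG handles unmerged rows during scans.

```python
from typing import Dict, List


class HybridTable:
    """Toy hybrid table: a row stash for recent writes plus a column store."""

    def __init__(self, columns: List[str]) -> None:
        self.columns = columns
        self.row_stash: List[Dict[str, int]] = []  # recent OLTP writes, row format
        self.column_store: Dict[str, List[int]] = {c: [] for c in columns}

    def insert(self, row: Dict[str, int]) -> None:
        # Transactional writes land in the row stash first.
        self.row_stash.append(row)

    def stash_merge(self) -> int:
        """Move every stashed row into the column store; return the rows moved."""
        moved = len(self.row_stash)
        for row in self.row_stash:
            for c in self.columns:
                self.column_store[c].append(row[c])
        # Each row lives in exactly one place, so clear the stash after merging.
        self.row_stash.clear()
        return moved

    def column_sum(self, column: str) -> int:
        """Analytical read over merged data plus rows still waiting in the stash."""
        merged = sum(self.column_store[column])
        pending = sum(row[column] for row in self.row_stash)
        return merged + pending


table = HybridTable(["id", "amount"])
table.insert({"id": 1, "amount": 10})
table.insert({"id": 2, "amount": 32})
before = table.column_sum("amount")
moved = table.stash_merge()
after = table.column_sum("amount")
```

The aggregate is the same before and after the merge, which is the property the single‑source‑of‑truth claim depends on: merging changes where a row is stored, not what a query sees.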
Cost‑Based Optimizer and Vectorized Execution
The optimizer combines cost‑based optimization (CBO) with rule‑based optimization (RBO) and uses dynamic programming to explore execution plans across multiple levels, estimating costs for table scans, index scans, join orders, and bitmap‑index operations. Separate cost models exist for row‑store and column‑store paths. Vectorized execution is applied selectively:
Analytical operators (e.g., column‑store scans, aggregations) run fully vectorized.
Transactional operators may retain row‑wise processing when it yields lower latency.
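To make the idea of separate cost models concrete, here is a small sketch of choosing between a row‑store index scan and a column‑store scan. The cost formulas and the constants random_page_cost and seq_value_cost are invented for illustration and are not TDSQL‑PG's actual cost model; only the general approach (estimate a cost for each path, pick the cheapest) comes from the text.

```python
from typing import Dict


def row_index_scan_cost(row_count: int, selectivity: float,
                        random_page_cost: float = 4.0) -> float:
    # Assumed model: one random fetch per matching row, whole row read each time.
    return row_count * selectivity * random_page_cost


def column_scan_cost(row_count: int, columns_read: int,
                     seq_value_cost: float = 0.01) -> float:
    # Assumed model: sequentially read every value of the referenced columns only.
    return row_count * columns_read * seq_value_cost


def choose_path(row_count: int, selectivity: float, columns_read: int) -> str:
    """Return the name of the access path with the lowest estimated cost."""
    costs: Dict[str, float] = {
        "row_index_scan": row_index_scan_cost(row_count, selectivity),
        "column_scan": column_scan_cost(row_count, columns_read),
    }
    return min(costs, key=costs.get)


ROWS = 1_000_000
# Point lookup: one matching row, all 20 columns returned.
point_lookup = choose_path(ROWS, selectivity=1e-6, columns_read=20)
# Aggregate: half the rows match, but only 2 columns are touched.
aggregate = choose_path(ROWS, selectivity=0.5, columns_read=2)
```

Under these assumed constants the point lookup favors the row path and the wide aggregate favors the column path, matching the intuition that transactional and analytical operators benefit from different storage layouts.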
Resource Isolation and Multi‑Tenant Support
Resource groups partition CPU, memory, and concurrency limits per tenant or workload class. The optimizer dynamically estimates per‑operator memory usage and enforces group limits. Physical isolation is achieved via primary‑standby plane replication: read‑only traffic can be served from standby nodes while preserving strong consistency.
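Group‑level limits of this kind are often enforced through admission control. The following sketch assumes a simple policy: a query is admitted only if the group has a free concurrency slot and the query's estimated memory fits within the group's remaining budget. The ResourceGroup class and its fields are hypothetical and do not reflect TDSQL‑PG's configuration interface.

```python
from dataclasses import dataclass


@dataclass
class ResourceGroup:
    name: str
    max_concurrency: int
    memory_limit_mb: int
    running: int = 0
    memory_used_mb: int = 0

    def try_admit(self, estimated_memory_mb: int) -> bool:
        """Admit a query only if both the slot and the memory limit allow it."""
        if self.running >= self.max_concurrency:
            return False
        if self.memory_used_mb + estimated_memory_mb > self.memory_limit_mb:
            return False
        self.running += 1
        self.memory_used_mb += estimated_memory_mb
        return True

    def release(self, estimated_memory_mb: int) -> None:
        """Return a finished query's slot and memory to the group."""
        self.running -= 1
        self.memory_used_mb -= estimated_memory_mb


ap_group = ResourceGroup("analytics", max_concurrency=2, memory_limit_mb=1024)
results = [
    ap_group.try_admit(600),  # fits: 600 of 1024 MB, 1 of 2 slots
    ap_group.try_admit(600),  # rejected: would need 1200 MB
    ap_group.try_admit(300),  # fits: 900 of 1024 MB, 2 of 2 slots
    ap_group.try_admit(100),  # rejected: no free concurrency slot
]
ap_group.release(600)
admitted_after_release = ap_group.try_admit(100)
```

A production system would more likely queue rejected queries than refuse them outright; the sketch returns a boolean only to keep the two limits easy to see.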
Future Exploration
Ongoing work focuses on:
Dynamic concurrency control that adapts to diurnal workload shifts (e.g., TP‑heavy daytime, AP‑heavy nighttime).
Fine‑grained memory management with automatic per‑operator budgeting.
Elastic resource partitioning across resource groups.
Automated parameter tuning based on workload characteristics.
The goal is a fully transparent, single‑system HTAP platform that eliminates the need for separate streaming replication pipelines.
ITPUB
Official ITPUB account sharing technical insights, community news, and exciting events.
