How HTAP and DRDS HTAP Enable Real‑Time OLTP/OLAP Integration
This article explains the concepts of OLTP, OLAP and HTAP, describes the DRDS HTAP architecture—including its engine and storage layers, Fireworks Spark‑based engine, optimizer stages, and streaming capabilities—and demonstrates cross‑database MPP queries and streaming joins while outlining suitable use cases and limitations.
Liang Chenghui, a technical expert from Alibaba's Database Division and lead of the DRDS HTAP project, presented a detailed technical overview of HTAP and the DRDS HTAP solution at the dbaplus Data Architecture and Optimization Salon in Shanghai.
1. What is HTAP
Before introducing HTAP, it helps to review OLTP and OLAP. OLTP workloads demand high concurrency, low latency (typically on the order of milliseconds), strong consistency, and mostly simple point-lookup queries. OLAP workloads run complex, large-scale analytical queries (joins, group-by, sub-queries) and tolerate higher latency and lower concurrency. Traditionally the two run on separate systems, with data copied from OLTP to OLAP by daily sync pipelines. That adds operational overhead, data-quality risk, and maintenance cost, and it means analytical data can lag the source by up to a day.
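To make the contrast concrete, here is a minimal sketch of the two query shapes. It uses Python's built-in sqlite3 module purely as a local stand-in; the `orders` table and its rows are made up for illustration and do not come from the talk.

```python
import sqlite3

# Hypothetical single-table dataset, used only to contrast query shapes.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE orders (id INTEGER PRIMARY KEY, user_id INTEGER, amount REAL);
INSERT INTO orders VALUES (1, 10, 25.0), (2, 10, 75.0), (3, 11, 40.0);
""")

# OLTP-style: a point lookup by primary key that touches one row.
point_lookup = conn.execute(
    "SELECT amount FROM orders WHERE id = ?", (2,)
).fetchone()

# OLAP-style: an aggregate that scans and groups every row.
totals = conn.execute(
    "SELECT user_id, SUM(amount) FROM orders GROUP BY user_id ORDER BY user_id"
).fetchall()

print(point_lookup)
print(totals)
```

The first statement reads a single row through the primary-key index, while the second has to visit the whole table, which is why the two workloads stress a database so differently.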
2. HTAP Overview
HTAP (Hybrid Transactional/Analytical Processing) combines OLTP and OLAP capabilities within a single database system, providing real-time data freshness and eliminating the need for separate sync pipelines. It offers transaction processing (TP) alongside analytical processing (AP) with millisecond-level data latency. An HTAP system has to address four concerns:
TP/AP data freshness: MySQL primary-replica replication can keep replica delay under a second, so analytical queries read nearly the same data that transactional queries see.
TP/AP stability & HA: Physical or logical isolation (separate clusters, standby nodes) ensures high availability for both workloads.
Cross-database joins: The query engine can access multiple MySQL-compatible storage nodes, so a single query can join tables that live in different databases.
Complex SQL handling: Integrated optimizer, MPP engine, and streaming engine accelerate heavy analytical queries.
3. DRDS HTAP Architecture and Key Technologies
The architecture consists of two layers:
Storage layer: Distributed MySQL instances (RDS) with primary-replica HA.
Engine layer: Stateless DRDS Server nodes handling MySQL protocol, query optimization, and TP execution operators.
TP queries run entirely in the TP engine. OLAP queries are dispatched to the Fireworks engine, a Spark‑based DAG execution engine that supports both MPP and streaming computations. Workers pull data from MySQL replicas, enabling parallel processing.
The optimizer works in three stages:
PreOptimizer: Logical rewrites such as sub‑query unnesting and constant folding.
Optimizer: Predicate inference, operator push‑down, column pruning, join reordering, etc.
PostOptimizer: DRDS‑specific sharding calculations and partition‑aware optimizations for distributed execution.
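As an illustration of the kind of logical rewrite attributed to the PreOptimizer, the following is a minimal sketch of constant folding over a toy expression tree. The `Const`, `Column`, and `BinOp` classes and the `fold_constants` function are hypothetical names invented for this example; they are not DRDS internals.

```python
from dataclasses import dataclass
from typing import Union


@dataclass(frozen=True)
class Const:
    value: int


@dataclass(frozen=True)
class Column:
    name: str


@dataclass(frozen=True)
class BinOp:
    op: str  # "+" or "*" in this toy example
    left: "Expr"
    right: "Expr"


Expr = Union[Const, Column, BinOp]


def fold_constants(expr: Expr) -> Expr:
    """Replace sub-expressions made only of constants with their value."""
    if not isinstance(expr, BinOp):
        return expr  # constants and column references are already minimal
    left = fold_constants(expr.left)
    right = fold_constants(expr.right)
    if isinstance(left, Const) and isinstance(right, Const):
        if expr.op == "+":
            return Const(left.value + right.value)
        if expr.op == "*":
            return Const(left.value * right.value)
    return BinOp(expr.op, left, right)


# price * (60 * 60): the constant sub-tree is evaluated once at plan time.
original = BinOp("*", Column("price"), BinOp("*", Const(60), Const(60)))
folded = fold_constants(original)
print(folded)
```

Folding at planning time means the constant part is computed once rather than for every row, and a simpler predicate is also easier for later stages to push down.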
Operators are pushed down to storage whenever possible, reducing network traffic and CPU load. For queries that span multiple shards or databases, the optimizer automatically falls back to MPP execution in Fireworks.
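The benefit of push-down can be sketched with a partial aggregation: each shard computes its own SUM and COUNT, and the engine layer only merges one small row per shard. This is a simplified illustration that uses in-memory SQLite databases as stand-ins for MySQL shards; the `make_shard` and `pushed_down_avg` helpers and the sample rows are assumptions, not DRDS code.

```python
import sqlite3


def make_shard(rows):
    """Create one in-memory database that plays the role of a storage shard."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, amount REAL)")
    conn.executemany("INSERT INTO orders VALUES (?, ?)", rows)
    return conn


def pushed_down_avg(shards):
    """Compute AVG(amount) by pushing SUM and COUNT down to every shard."""
    total, count = 0.0, 0
    for shard in shards:
        # Each shard returns a single (sum, count) row instead of raw rows.
        shard_sum, shard_count = shard.execute(
            "SELECT COALESCE(SUM(amount), 0), COUNT(*) FROM orders"
        ).fetchone()
        total += shard_sum
        count += shard_count
    return total / count if count else None


shards = [make_shard([(1, 10.0), (2, 30.0)]), make_shard([(3, 20.0)])]
average = pushed_down_avg(shards)
print(average)
```

Note that the merge step combines sums and counts rather than averaging the per-shard averages, which would give the wrong answer when shards hold different numbers of rows.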
The streaming engine builds on Spark Streaming, adding a RocksDB state store for fault‑tolerant state management and supporting MySQL‑compatible timestamp columns as input streams. Users can write standard streaming‑SQL, including streaming‑streaming joins, without managing external streaming clusters.
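To show why a streaming-streaming join needs a state store at all, here is a minimal sketch of a symmetric hash join. Plain Python dictionaries stand in for the RocksDB-backed state, and the `StreamJoin` class, its method names, and the sample events are hypothetical. It omits time windows, watermarks, and fault tolerance, so state grows without bound.

```python
class StreamJoin:
    """Symmetric hash join: each side buffers its rows and probes the other."""

    def __init__(self):
        # In a real engine this state would live in a fault-tolerant store.
        self.left_state = {}
        self.right_state = {}

    def on_left(self, key, row):
        """Handle a row arriving on the left stream; return new join results."""
        self.left_state.setdefault(key, []).append(row)
        return [{**row, **other} for other in self.right_state.get(key, [])]

    def on_right(self, key, row):
        """Handle a row arriving on the right stream; return new join results."""
        self.right_state.setdefault(key, []).append(row)
        return [{**other, **row} for other in self.left_state.get(key, [])]


join = StreamJoin()
first = join.on_left(1, {"order_id": 1, "amount": 25.0})
second = join.on_right(1, {"order_id": 1, "status": "paid"})
print(first)
print(second)
```

Because either side may arrive first, both sides must be remembered. That buffered state is what a persistent state store protects across failures.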
4. DRDS HTAP Demonstrations
Cross‑Database MPP Query
A simulated e‑commerce scenario shows three logical databases (transaction, user, product) each backed by separate MySQL instances. DRDS abstracts them as logical schemas, allowing a single SQL statement to join across all three without manual data migration.
The demo logs into each RDS instance, then into DRDS, and executes the cross-schema join, producing a result set that would otherwise require loading the data into a data warehouse.
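The shape of such a query can be approximated locally. The sketch below attaches three separate in-memory SQLite databases to one connection and joins across them in a single statement. SQLite is only a stand-in for DRDS here, and the schema names (`trade_db`, `user_db`, `product_db`), tables, and rows are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Each ATTACH creates an independent in-memory database, standing in for
# one logical schema backed by its own storage instance.
for schema in ("trade_db", "user_db", "product_db"):
    conn.execute(f"ATTACH DATABASE ':memory:' AS {schema}")

conn.executescript("""
CREATE TABLE trade_db.orders (
    order_id INTEGER PRIMARY KEY, user_id INTEGER,
    product_id INTEGER, amount REAL);
CREATE TABLE user_db.users (user_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE product_db.products (product_id INTEGER PRIMARY KEY, title TEXT);
INSERT INTO trade_db.orders VALUES (1, 10, 100, 25.0), (2, 11, 101, 40.0);
INSERT INTO user_db.users VALUES (10, 'alice'), (11, 'bob');
INSERT INTO product_db.products VALUES (100, 'keyboard'), (101, 'monitor');
""")

# One statement joins tables that live in three different databases.
rows = conn.execute("""
SELECT u.name, p.title, o.amount
FROM trade_db.orders AS o
JOIN user_db.users AS u ON u.user_id = o.user_id
JOIN product_db.products AS p ON p.product_id = o.product_id
ORDER BY o.order_id
""").fetchall()
print(rows)
```

In this stand-in all three databases share one process, so the join is trivial. The point of the demo is that DRDS offers the same single-statement experience while the data stays on separate MySQL instances.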
Streaming Join
The streaming demo creates two input streams (T1, T2) and performs a join to produce a wide table in real time. The workflow uses CREATE STREAM statements, inserts data into the streams, and writes the join result into a persistent table.
After creating the streams and executing the join, the resulting table can be queried instantly, demonstrating low‑latency analytics without an intermediate batch step.
5. Use Cases and Limitations
Suitable for low‑concurrency, real‑time analytical scenarios that involve multiple business databases or tables.
Ideal for time‑based streaming joins (e.g., fact‑table joins, dimension‑fact joins).
Not optimal for high‑concurrency OLTP workloads, full‑text search, or ad‑hoc analytical queries that require extreme flexibility.
DRDS HTAP is currently offered as an analytical read‑only instance in public cloud, with plans to extend support for more complex online and analytical workloads.
Overall, DRDS HTAP provides a unified platform that delivers millisecond‑level data freshness, high availability, and scalable query processing for mixed transactional and analytical workloads.
