How PolarDB MySQL’s In‑Memory Column Index Supercharges HTAP Performance
This article explains how PolarDB MySQL uses an In‑Memory Column Index (IMCI) to combine row‑store OLTP strength with column‑store analytical speed. It covers the architecture, optimizer decisions, data organization, and resource isolation strategies, and reports benchmark results in which IMCI runs queries tens to hundreds of times faster than native MySQL and matches ClickHouse.
Analytical databases are booming in both capital markets and the tech community, and cloud‑native advances make rebuilding analytical database stacks on the cloud a huge market opportunity.
MySQL ecosystem HTAP solutions
MySQL is primarily an OLTP system; its open‑source community focuses on transaction throughput, leaving analytical capabilities lagging. Users increasingly need real‑time analytics, prompting three HTAP approaches.
1. Separate TP and AP systems
Deploy two databases (one for OLTP, one for OLAP) and synchronize data between them in real time. The drawbacks are synchronization delay and added operational complexity.
2. Divergent Design with multi‑replica
Use a replica that stores data in columnar format (e.g., TiDB/TiFlash) for analytical workloads, requiring migration to a NewSQL system and incurring compatibility challenges.
3. Integrated row‑column hybrid storage
Commercial DBMS such as Oracle, SQL Server, and DB2 adopt a row‑column hybrid storage model; PolarDB follows this proven approach.
Evolution of PolarDB MySQL AP capabilities
PolarDB provides up to 100 TB per instance and a one‑write‑many‑read architecture, allowing read‑only (RO) nodes to run complex analytical queries without affecting TP load.
Why MySQL struggles in AP scenarios
The Volcano iterator model relies on deep function nesting and virtual calls, hurting CPU pipeline and cache efficiency.
Execution is largely single‑threaded, limiting multi‑core utilization.
Row‑store format forces reading unnecessary columns, causing I/O waste and memory bandwidth pressure.
Parallel query breakthrough
PolarDB’s Parallel Query automatically switches to parallel execution when the data volume exceeds a threshold: it partitions the data across threads, lets each thread process its own partition, and merges the partial results, so latency can drop roughly in proportion to the degree of parallelism.
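The partition, process, and merge pattern can be sketched in a few lines of Python. This is an illustration of the general technique only, not PolarDB's implementation: the threshold value, the function names, and the choice of `SUM` as the aggregate are all assumptions made for the example.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical threshold; the article does not state the real trigger condition.
PARALLEL_ROW_THRESHOLD = 10_000


def partial_sum(chunk):
    """Worker step: aggregate one partition independently."""
    return sum(chunk)


def parallel_sum(values, workers=4, threshold=PARALLEL_ROW_THRESHOLD):
    """Sum values, switching to parallel execution only above the threshold."""
    if len(values) < threshold or workers <= 1:
        return sum(values)  # small input: stay on the serial path
    # Partition the input into roughly equal contiguous ranges.
    size = (len(values) + workers - 1) // workers
    chunks = [values[i:i + size] for i in range(0, len(values), size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(partial_sum, chunks))
    # Leader step: merge the partial aggregates into the final result.
    return sum(partials)


data = list(range(100_000))
total = parallel_sum(data)
```

CPython threads share one interpreter lock, so this sketch shows the structure of the plan rather than a real speedup. A database engine runs the worker step on native threads.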
Why column‑store is needed
Columnar layout improves I/O efficiency through compression, data skipping, and column pruning.
Columns stored contiguously enhance CPU cache friendliness and enable SIMD vectorization for per‑core speedup.
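A toy comparison makes the column pruning argument concrete. The table, column names, and helper functions below are invented for illustration. The point is only that a row layout forces a scan to walk whole records, while a column layout lets the scan touch a single contiguous array.

```python
from array import array

# Toy table with three columns: id, price, comment (all hypothetical).
rows = [
    (1, 10.0, "first order"),
    (2, 25.5, "second order"),
    (3, 7.25, "third order"),
]


def sum_price_row_store(rows):
    """Row layout: each record is stored together, so the scan visits every record in full."""
    total = 0.0
    for record in rows:      # the whole record is fetched...
        total += record[1]   # ...although only one field is needed
    return total


# Column layout: each column is one contiguous, homogeneously typed array.
columns = {
    "id": array("q", [r[0] for r in rows]),
    "price": array("d", [r[1] for r in rows]),
    "comment": [r[2] for r in rows],
}


def sum_price_column_store(columns):
    """Column pruning: only the 'price' array is read; 'id' and 'comment' are never touched."""
    return sum(columns["price"])
```

Because `price` is a packed array of doubles, it is also the kind of input that compresses well and that a native engine can process with SIMD instructions.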
PolarDB In‑Memory Column Index (IMCI)
IMCI adds columnar storage and in‑memory computation to PolarDB, allowing a single database to serve both TP and AP workloads while preserving PolarDB’s strong OLTP performance.
Key technical innovations
Support for columnar indexes in InnoDB as secondary indexes; indexes can reside fully in memory or be persisted to shared storage.
A column‑oriented execution engine processes data in 4 KB batches, leverages SIMD, and parallelizes key operators, delivering multi‑order‑magnitude speedup over MySQL’s row engine.
A cost‑based optimizer evaluates row, parallel, and column plans, selecting the lowest‑cost execution path.
RO nodes can host column indexes for analytics, using all available CPU and memory without impacting TP resources.
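The batch-oriented execution mentioned above can be illustrated with a small operator pipeline. This is a minimal sketch under stated assumptions: the operator names are hypothetical, and `BATCH_ROWS` is a tiny illustrative value, not the engine's real batch size. Each operator call handles a whole batch, so the per-call overhead that hurts a row-at-a-time iterator is amortized over many values.

```python
BATCH_ROWS = 4  # illustrative only; not the engine's actual batch size


def scan(column, batch_rows=BATCH_ROWS):
    """Yield the column in fixed-size batches instead of one value per call."""
    for start in range(0, len(column), batch_rows):
        yield column[start:start + batch_rows]


def filter_greater(batches, bound):
    """Filter operator: one call processes a whole batch."""
    for batch in batches:
        yield [v for v in batch if v > bound]


def sum_batches(batches):
    """Aggregate operator: consumes batches and keeps a running total."""
    total = 0
    calls = 0
    for batch in batches:
        total += sum(batch)
        calls += 1
    return total, calls


quantity = list(range(1, 11))  # values 1..10
total, calls = sum_batches(filter_greater(scan(quantity), 5))
```

Here the aggregate operator is invoked once per batch (three times for ten values), whereas a row-at-a-time iterator would be invoked once per value.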
Hybrid optimizer
The optimizer uses a whitelist and cost model to decide whether a query runs on the row engine, Parallel Query, or IMCI, ensuring compatibility and performance.
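The routing decision can be pictured as a whitelist check followed by a cost comparison. The sketch below is hypothetical throughout: the whitelist contents, the `Query` fields, and the cost formulas are invented to show the shape of the decision, since the article does not describe the real cost model.

```python
from dataclasses import dataclass

# Hypothetical whitelist of features the column engine is assumed to support.
COLUMN_ENGINE_WHITELIST = {"scan", "filter", "join", "group_by", "sum", "count"}


@dataclass
class Query:
    features: set          # operators and functions the query uses
    rows_scanned: int      # estimated number of input rows
    columns_needed: int    # columns the query actually references
    columns_total: int     # columns in the table
    has_column_index: bool


def estimate_costs(q, dop=8):
    """Toy cost formulas; startup constants penalize the heavier engines."""
    row_cost = float(q.rows_scanned)
    costs = {"row": row_cost, "parallel": row_cost / dop + 1_000}
    whitelisted = q.features <= COLUMN_ENGINE_WHITELIST
    if q.has_column_index and whitelisted:
        fraction = q.columns_needed / q.columns_total
        costs["column"] = row_cost * fraction / dop + 5_000
    return costs


def choose_engine(q):
    """Pick the candidate plan with the lowest estimated cost."""
    costs = estimate_costs(q)
    return min(costs, key=costs.get)


point_lookup = Query({"scan", "filter"}, 10, 2, 10, True)
analytic = Query({"scan", "filter", "group_by", "sum"}, 10_000_000, 2, 16, True)
unsupported = Query({"scan", "json_table"}, 10_000_000, 2, 16, True)
```

With these toy numbers, the point lookup stays on the row engine, the wide aggregation goes to the column engine, and the query that uses a non-whitelisted feature falls back to the parallel row plan.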
Column index as secondary index
Implemented as a secondary index, IMCI reuses InnoDB’s transaction handling, redo log, and replication mechanisms, allowing seamless fallback to row storage if needed.
Data organization
Data is packed into RowGroups and DataPacks; writes are append‑only, and background compaction reclaims space. Statistics (min, max, sum, null count, row count) are stored per DataPack to enable coarse‑grained pruning.
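Per-pack statistics allow a scan to skip whole packs whose value range cannot match the predicate. The following sketch assumes a hypothetical `DataPack` class and a tiny `PACK_ROWS` value chosen for readability; it demonstrates min/max pruning in general, not IMCI's on-disk format.

```python
from dataclasses import dataclass
from typing import Optional

PACK_ROWS = 4  # illustrative; the real DataPack size is not given in the article


@dataclass
class DataPack:
    values: list
    min_value: Optional[int]
    max_value: Optional[int]
    sum_value: int
    null_count: int
    row_count: int


def build_packs(column, pack_rows=PACK_ROWS):
    """Split a column into packs and record statistics for each one."""
    packs = []
    for start in range(0, len(column), pack_rows):
        chunk = column[start:start + pack_rows]
        present = [v for v in chunk if v is not None]
        packs.append(DataPack(
            values=chunk,
            min_value=min(present) if present else None,
            max_value=max(present) if present else None,
            sum_value=sum(present),
            null_count=len(chunk) - len(present),
            row_count=len(chunk),
        ))
    return packs


def scan_between(packs, low, high):
    """Return matching values and the number of packs actually read."""
    matches, packs_read = [], 0
    for pack in packs:
        # Skip packs that are all NULL or whose range lies outside [low, high].
        if pack.min_value is None or pack.max_value < low or pack.min_value > high:
            continue
        packs_read += 1
        matches.extend(v for v in pack.values if v is not None and low <= v <= high)
    return matches, packs_read


column = [1, 2, 3, 4, 10, 11, None, 13, 20, 21, 22, 23]
packs = build_packs(column)
matches, packs_read = scan_between(packs, 10, 13)
```

In this example only one of the three packs is read. The stored sum and row count could likewise answer simple aggregates over a fully qualifying pack without touching its values.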
TP/AP resource isolation
Three deployment patterns are supported: (1) mixed storage on the RW node for light AP workloads, (2) a dedicated AP RO node with isolated CPU and memory, and (3) a standby node with separate shared storage, achieving CPU, memory, and I/O isolation.
Performance evaluation
In a TPC‑H benchmark (100 GB, 22 queries) IMCI outperforms native MySQL by tens to hundreds of times (up to ~400× on Q6) and matches ClickHouse’s performance, demonstrating its competitiveness as an analytical engine.
Future work
Automated index recommendation based on query patterns.
Standalone column tables and OSS storage to further reduce costs.
Fine‑grained row‑column hybrid execution where parts of a plan run on row storage and parts on column storage.
Alibaba Cloud Developer
Alibaba's official tech channel, featuring all of its technology innovations.
