How OceanBase Achieves Real‑Time HTAP: Inside Its Unified Storage and Vectorized Engine
This article details OceanBase's evolution from a distributed OLTP system to a unified HTAP database, covering its cost‑based optimizer, vectorized execution, integrated row‑column storage, bypass import, materialized views, external tables, full‑text search, and real‑world use cases for real‑time analytics.
Background
OceanBase was originally built for high‑concurrency, strongly consistent transactional workloads. To serve emerging scenarios such as fine‑grained real‑time analytics, risk control, interactive analysis and AI/RAG, it has evolved into a unified HTAP engine that combines OLTP and OLAP capabilities within a single distributed system.
SQL Layer – Cost‑Based Optimizer (CBO) and Adaptive Execution
The optimizer gathers comprehensive statistics and applies a large set of rewrite rules, including many OceanBase‑specific transformations. Rewrite decisions are cost‑driven: rewrites that are not beneficial in every case are applied only when they reduce the estimated cost. Adaptive plan caching and SQL Plan Management (SPM) protect against plan regressions caused by upgrades or data skew. AutoDOP automatically decides whether to enable parallel execution and selects an appropriate degree of parallelism, so that analytical queries are accelerated without impacting transactional latency.
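The idea of picking a degree of parallelism from a cost estimate can be illustrated with a small sketch. This is not OceanBase's AutoDOP algorithm: the cost model, the `PlanEstimate` type, the per‑worker startup overhead and the thresholds are all hypothetical values chosen for illustration.

```python
from dataclasses import dataclass


@dataclass
class PlanEstimate:
    # Estimated cost of running the plan serially (hypothetical unit: ms).
    serial_cost_ms: float
    # Assumed fixed overhead added by each parallel worker (hypothetical).
    startup_cost_ms_per_worker: float = 2.0


def choose_dop(est: PlanEstimate, max_dop: int = 16,
               min_serial_cost_ms: float = 100.0) -> int:
    """Return a degree of parallelism under a toy cost model.

    Cheap (OLTP-like) plans stay serial; for expensive plans, try
    power-of-two DOPs and keep the one with the lowest estimated cost.
    """
    if est.serial_cost_ms < min_serial_cost_ms:
        return 1  # short transactional query: parallelism only adds overhead

    best_dop, best_cost = 1, est.serial_cost_ms
    dop = 2
    while dop <= max_dop:
        cost = est.serial_cost_ms / dop + est.startup_cost_ms_per_worker * dop
        if cost < best_cost:
            best_dop, best_cost = dop, cost
        dop *= 2
    return best_dop


print(choose_dop(PlanEstimate(5.0)))      # 1
print(choose_dop(PlanEstimate(200.0)))    # 8
print(choose_dop(PlanEstimate(10000.0)))  # 16
```

In this toy model, a point lookup never pays the parallel startup cost, while a long scan is spread over as many workers as the overhead term justifies.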
Vectorized Execution Engine
Since version 3.2, data is stored in an in‑memory columnar layout and processed by a vectorized engine. Version 4.3 refines the memory format for SIMD friendliness and rewrites operators to minimise branch mispredictions and improve cache utilisation, with the aim of world‑class analytical performance. Separately, the PL engine uses JIT compilation with multi‑level caching (memory and disk) to deliver high‑performance procedural execution.
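The difference between row‑at‑a‑time and vectorized processing can be shown with a minimal sketch. It only illustrates the general technique (column batches plus a selection vector) and says nothing about OceanBase's actual operators or memory format; the function names, column names and batch size are assumptions.

```python
from array import array


def filter_sum_row_at_a_time(rows, threshold):
    """Evaluate SUM(price * qty) WHERE price > threshold one row at a time."""
    total = 0
    for price, qty in rows:
        if price > threshold:
            total += price * qty
    return total


def filter_sum_vectorized(price_col, qty_col, threshold, batch_size=1024):
    """Same query, processed per column batch with a selection vector."""
    total = 0
    for start in range(0, len(price_col), batch_size):
        prices = price_col[start:start + batch_size]
        qtys = qty_col[start:start + batch_size]
        # Filter operator: emit positions of qualifying rows in this batch.
        selected = [i for i, p in enumerate(prices) if p > threshold]
        # Downstream operators touch only the selected positions.
        total += sum(prices[i] * qtys[i] for i in selected)
    return total


price_col = array("q", [5, 20, 15, 3])
qty_col = array("q", [2, 1, 4, 10])
rows = list(zip(price_col, qty_col))

print(filter_sum_row_at_a_time(rows, 10))                       # 80
print(filter_sum_vectorized(price_col, qty_col, 10, batch_size=2))  # 80
```

In a native engine the per‑batch loops run over contiguous typed arrays, which is what makes SIMD instructions and cache‑friendly access possible; Python is used here only to show the control flow.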
Unified Row‑Column Storage
Row storage uses a PAX format: each micro‑block (4 KB–64 KB) stores columns internally, enabling better compression and predicate push‑down than classic row stores. For large‑scale analytical workloads, a true column store is built on top of the LSM‑Tree base during the lowest‑level compaction. This column store supports multiple encodings (PFOR, Delta, Dictionary, RLE) combined with zstd or lz4 compression, and maintains min/max/sum/count skip‑indexes at micro‑block, macro‑block and SSTable levels. The design preserves full DML, CDC, and transaction semantics while delivering real‑time analytical performance.
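How min/max/sum/count skip‑indexes avoid reading data can be sketched as follows. This is a simplified, single‑level model with hypothetical names (`MicroBlock`, `build_blocks`, `sum_where_between`) and a tiny block size; the real on‑disk layout, encodings and multi‑level structure are not represented.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class MicroBlock:
    values: List[int]  # column values stored in this block
    min_v: int         # skip-index entries kept alongside the block
    max_v: int
    sum_v: int
    count: int


def build_blocks(column: List[int], block_rows: int = 4) -> List[MicroBlock]:
    """Split a column into fixed-size blocks and compute skip-index entries."""
    blocks = []
    for start in range(0, len(column), block_rows):
        vals = column[start:start + block_rows]
        blocks.append(MicroBlock(vals, min(vals), max(vals), sum(vals), len(vals)))
    return blocks


def sum_where_between(blocks: List[MicroBlock], lo: int, hi: int) -> Tuple[int, int]:
    """Return (SUM(v) WHERE lo <= v <= hi, number of blocks actually scanned)."""
    total, scanned = 0, 0
    for b in blocks:
        if b.max_v < lo or b.min_v > hi:
            continue  # no row can match: skip the block entirely
        if lo <= b.min_v and b.max_v <= hi:
            total += b.sum_v  # every row matches: answer from the index alone
            continue
        scanned += 1  # partial overlap: the block must be read and filtered
        total += sum(v for v in b.values if lo <= v <= hi)
    return total, scanned


blocks = build_blocks(list(range(1, 13)))  # blocks: 1..4, 5..8, 9..12
print(sum_where_between(blocks, 5, 10))    # (45, 1)
```

Only one of the three blocks is read in the example: the first is pruned by its max value, and the second is answered directly from its stored sum.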
Bypass Import for Massive Data Loads
The bypass import path allows parallel DML, LOAD DATA, OBLoader and Table API to write directly into the columnar baseline, skipping the MemTable and flush stages. This yields order‑of‑magnitude speedups compared with conventional parallel DML.
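The contrast between the conventional write path and a bypass path can be sketched with a toy LSM‑style table. Everything here is hypothetical (the `ToyLsmTable` class, its method names and the tiny MemTable limit) and it ignores transactions, logging and compaction; it only shows that a bulk load can produce one sorted baseline run without passing through the MemTable and repeated flushes.

```python
class ToyLsmTable:
    """Toy LSM-style table: a MemTable plus a list of sorted runs."""

    def __init__(self, memtable_limit=2):
        self.memtable = {}
        self.sstables = []  # sorted lists of (key, value); newest last
        self.flushes = 0
        self.memtable_limit = memtable_limit

    def insert(self, key, value):
        """Conventional path: buffer in the MemTable, flush when it is full."""
        self.memtable[key] = value
        if len(self.memtable) >= self.memtable_limit:
            self._flush()

    def _flush(self):
        self.sstables.append(sorted(self.memtable.items()))
        self.memtable = {}
        self.flushes += 1

    def bypass_import(self, rows):
        """Bypass path: sort the whole batch once and write it as one run."""
        self.sstables.append(sorted(rows))

    def get(self, key):
        if key in self.memtable:
            return self.memtable[key]
        for sst in reversed(self.sstables):  # newest run wins
            for k, v in sst:
                if k == key:
                    return v
        return None


rows = [(3, "c"), (1, "a"), (4, "d"), (2, "b")]

conventional = ToyLsmTable()
for k, v in rows:
    conventional.insert(k, v)

bulk = ToyLsmTable()
bulk.bypass_import(rows)

print(conventional.flushes, len(conventional.sstables))  # 2 2
print(bulk.flushes, len(bulk.sstables))                  # 0 1
```

Both tables return the same data on lookup, but the bulk‑loaded one holds a single sorted run and never triggered a flush.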
Materialized Views
Materialized view support includes full‑refresh, incremental refresh, real‑time views, nested views and outer‑join incremental refresh. These features accelerate complex queries and simplify data‑warehouse architectures by allowing query rewrite to use pre‑computed results.
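The difference between full and incremental refresh can be sketched for a simple aggregate view. The `SumCountView` class below is a hypothetical illustration of the general technique (maintaining SUM and COUNT per group from a change log); it does not reflect how OceanBase records or applies changes.

```python
class SumCountView:
    """Toy view for: SELECT k, SUM(v), COUNT(*) FROM base GROUP BY k."""

    def __init__(self):
        self.groups = {}  # key -> [sum, count]

    def full_refresh(self, base_rows):
        """Recompute the view from scratch by rescanning the base table."""
        self.groups = {}
        self.apply_delta(inserted=base_rows)

    def apply_delta(self, inserted=(), deleted=()):
        """Incremental refresh: fold only the changed rows into the view."""
        for key, value in inserted:
            agg = self.groups.setdefault(key, [0, 0])
            agg[0] += value
            agg[1] += 1
        for key, value in deleted:
            agg = self.groups[key]
            agg[0] -= value
            agg[1] -= 1
            if agg[1] == 0:
                del self.groups[key]  # group has no rows left

    def rows(self):
        return {k: (s, c) for k, (s, c) in self.groups.items()}


view = SumCountView()
view.full_refresh([("a", 10), ("b", 5)])
view.apply_delta(inserted=[("a", 7)], deleted=[("b", 5)])
print(view.rows())  # {'a': (17, 2)}
```

SUM and COUNT are easy to maintain because inserts and deletes can be added and subtracted directly; aggregates such as MIN or MAX need extra state when the current extreme value is deleted.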
External Table Capabilities
OceanBase provides Oracle‑compatible DBLink for cross‑cluster access and external tables for CSV, Parquet, ORC and other formats. Integration with S3, OSS, HDFS, ODPS, Hive Metastore and Iceberg enables predicate push‑down and caching, improving query efficiency on external data sources.
Full‑Text Search and Complex Data Types
Full‑text indexing supports customizable tokenizers, and the system also offers vector, spatial and scalar indexes for high‑quality retrieval in RAG applications. Additional OLAP‑oriented types such as Bitmap, Array and Map are available, along with dynamic partition management and heap tables to ease migration from traditional OLAP systems.
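The role of a pluggable tokenizer in a full‑text index can be sketched with a minimal inverted index. The `InvertedIndex` class and `simple_tokenizer` function are hypothetical and unrelated to OceanBase's implementation; they only show how the tokenizer determines which terms are indexed and matched.

```python
import re
from collections import defaultdict


def simple_tokenizer(text):
    """Lower-case the text and split it into alphanumeric terms."""
    return re.findall(r"[a-z0-9]+", text.lower())


class InvertedIndex:
    """Toy inverted index: term -> set of document ids."""

    def __init__(self, tokenizer=simple_tokenizer):
        self.tokenizer = tokenizer  # swap in a different tokenizer here
        self.postings = defaultdict(set)

    def add(self, doc_id, text):
        for term in self.tokenizer(text):
            self.postings[term].add(doc_id)

    def search(self, query):
        """Return ids of documents containing every term of the query."""
        terms = self.tokenizer(query)
        if not terms:
            return set()
        result = set(self.postings.get(terms[0], set()))
        for term in terms[1:]:
            result &= self.postings.get(term, set())
        return result


index = InvertedIndex()
index.add(1, "Vectorized execution engine")
index.add(2, "Row-column storage engine")

print(sorted(index.search("engine")))          # [1, 2]
print(sorted(index.search("storage ENGINE")))  # [2]
```

Passing a different `tokenizer` callable (for example, one that emits character bigrams) changes what counts as a match without touching the index structure itself.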
Key Performance Highlights
V3.2 introduced in‑memory columnar storage and basic vectorized execution.
V4.2 added general‑purpose materialized views with incremental and real‑time refresh.
V4.3 refined SIMD‑friendly memory layout and operator implementation.
V4.4 (2025) incorporated column‑store replicas, advanced encodings, and further performance optimisations.
Typical Use Cases
Real‑time analytics workloads run on OceanBase in finance, government and internet domains, covering low‑latency risk‑control algorithms, operational dashboards and AI‑driven analytics. The unified architecture enables automatic tuning, storage‑compute separation and AI‑enhanced features, positioning OceanBase as a foundational data platform for the AI era.
ITPUB
Official ITPUB account sharing technical insights, community news, and exciting events.
