OushuDB: A Cloud‑Native Real‑Time Lakehouse Database – Architecture, Evolution and Practice
This article introduces OushuDB, a cloud‑native real‑time lakehouse database, tracing the evolution of cloud‑native lakehouse architectures, detailing OushuDB’s multi‑engine, multi‑storage design, and sharing practical insights on compute‑storage separation, high‑availability, and integration with Hadoop, Hive and Hudi.
OushuDB is a cloud‑native database designed to realize a real‑time, multi‑engine lakehouse system. The article first outlines the historical milestones of cloud‑native real‑time lakehouses, from the early OLTP‑focused database era, through distributed databases, big‑data platforms, lake‑warehouse separation, to the emergence of cloud‑native integrated architectures.
The cloud‑native lakehouse architecture is described as a unified platform that combines data lake storage and data warehouse processing, supporting both real‑time and batch workloads, multi‑modal storage (object storage, HDFS, Magma), and multiple compute engines sharing a single data copy.
Key features of OushuDB include:
Full real‑time data ingestion and integration.
Unified metadata management.
Native support for multiple storage backends (OSS, COS, HDFS, Magma).
Shared compute engines to avoid data silos.
Elastic resource management with on‑demand scaling.
High‑performance, fully real‑time analytics.
The system’s architecture is layered: a client layer (JDBC/ODBC), a stateless master‑node layer for query compilation and routing, a virtual compute cluster layer with stateless compute nodes, and a virtual storage cluster layer that abstracts object storage, HDFS and Magma via a virtual storage sub‑cluster (VSC) mechanism.
Practices for building applications on OushuDB focus on compute‑storage separation, dynamic scheduling close to data, managing the mapping between compute and storage layers, handling node failures, and ensuring read‑write consistency (including ACID transactions via Raft and multi‑replica metadata).
Additional capabilities include native support for the Hudi table format (enabling multi‑engine data sharing), seamless Hive metastore integration for metadata sharing and column‑level security via Ranger, and compatibility with Hive partitioned tables.
Overall, OushuDB demonstrates how cloud‑native technologies, distributed storage, and modern compute frameworks can be combined to build a scalable, high‑availability, real‑time lakehouse solution.
Signed-in readers can open the original source through BestHub's protected redirect.
This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.
DataFunSummit
Official account of the DataFun community, dedicated to sharing big data and AI industry summit news and speaker talks, with regular downloadable resource packs.
How this landed with the community
Was this worth your time?
0 Comments
Thoughtful readers leave field notes, pushback, and hard-won operational detail here.
