Next‑Generation Cloud‑Native Data Warehouse: Architecture, Principles and Implementation
The article defines cloud‑native data warehouses as storage‑compute separated systems that elastically scale across clouds, outlines their key traits, describes a three‑layer architecture, compares Snowflake and OushuDB implementations, and illustrates a large bank’s migration to such a platform.
The article introduces the background and definition of cloud‑native data warehouses. With the evolution of analytical data warehouse architectures—from shared‑storage to MPP and SQL‑on‑Hadoop—traditional solutions can no longer meet the elasticity and cost requirements of cloud‑native environments. A cloud‑native data warehouse is defined as a loosely‑coupled, storage‑compute separated database system that can elastically scale in public, private or hybrid clouds.
It outlines the major characteristics of cloud‑native data warehouses: independent scaling of compute and storage, high performance through near‑linear scalability, cost‑effective resource usage, strong consistency with ACID support, multi‑cloud capability, and simplified operations.
The article then describes the typical cloud‑native data warehouse architecture, which consists of three layers: a cloud services layer for query parsing, optimization and metadata management; a query‑compute layer that can spin up isolated compute clusters for multi‑tenant workloads; and a storage layer that usually relies on object storage (e.g., S3) rather than HDFS.
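To make the division of responsibilities concrete, the three layers can be sketched as a toy in‑memory model. This is only an illustration under stated assumptions: the class names (`ObjectStore`, `ComputeCluster`, `CloudServices`), the per‑tenant cluster policy, and the `tables/<name>` key layout are all hypothetical and do not correspond to the API of Snowflake, OushuDB, or any other product.

```python
from dataclasses import dataclass


class ObjectStore:
    """Storage layer: in-memory stand-in for an object store such as S3."""

    def __init__(self):
        self._objects = {}

    def put(self, key, rows):
        self._objects[key] = list(rows)

    def get(self, key):
        return self._objects[key]


@dataclass
class ComputeCluster:
    """Query-compute layer: an isolated cluster that holds no table data itself."""
    name: str
    nodes: int
    store: ObjectStore

    def scan(self, key, predicate):
        # Rows are fetched from the shared store on demand, then filtered.
        return [row for row in self.store.get(key) if predicate(row)]


class CloudServices:
    """Cloud services layer: metadata catalog plus routing of queries to clusters."""

    def __init__(self, store):
        self.store = store
        self.catalog = {}   # table name -> object key (hypothetical layout)
        self.clusters = {}  # tenant -> ComputeCluster

    def create_table(self, table, rows):
        key = f"tables/{table}"
        self.store.put(key, rows)
        self.catalog[table] = key

    def cluster_for(self, tenant, nodes=2):
        # Assumed policy: one isolated cluster per tenant, created on first use.
        if tenant not in self.clusters:
            self.clusters[tenant] = ComputeCluster(f"{tenant}-cluster", nodes, self.store)
        return self.clusters[tenant]

    def query(self, tenant, table, predicate):
        key = self.catalog[table]  # metadata lookup happens in the services layer
        return self.cluster_for(tenant).scan(key, predicate)


services = CloudServices(ObjectStore())
services.create_table("orders", [{"id": 1, "amount": 50}, {"id": 2, "amount": 500}])
large_orders = services.query("analytics", "orders", lambda r: r["amount"] > 100)
all_orders = services.query("reporting", "orders", lambda r: True)
```

In this sketch the two tenants get separate `ComputeCluster` objects but read the same stored table, which is the property that lets compute be added or removed without moving data.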
Two concrete implementations are presented:
Snowflake: a three‑layer architecture with a cloud services layer, a compute layer that creates many small, isolated clusters, and a storage layer built on object storage.
OushuDB: a cloud‑native data warehouse that also separates storage from compute, supports multiple storage back‑ends (object storage and HDFS), offers a pluggable distributed table store (Magma) for mixed workloads, and claims a 5‑10× performance improvement over Greenplum while maintaining full SQL compatibility.
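Supporting several storage back‑ends behind one compute layer is commonly done with a small storage interface that each back‑end implements. The following is a minimal sketch of that general pattern, not OushuDB's actual design: `StorageBackend`, `FakeObjectStore`, `FakeHdfs`, `make_backend`, and the scheme names are hypothetical, and both back‑ends are in‑memory stand‑ins.

```python
from abc import ABC, abstractmethod


class StorageBackend(ABC):
    """Hypothetical interface that the compute layer codes against."""

    @abstractmethod
    def write(self, path: str, data: bytes) -> None:
        ...

    @abstractmethod
    def read(self, path: str) -> bytes:
        ...


class FakeObjectStore(StorageBackend):
    """Stand-in for an S3-style store: flat keys inside one bucket."""

    def __init__(self):
        self._bucket = {}

    def write(self, path, data):
        self._bucket[path.lstrip("/")] = data

    def read(self, path):
        return self._bucket[path.lstrip("/")]


class FakeHdfs(StorageBackend):
    """Stand-in for HDFS: absolute paths in a directory tree."""

    def __init__(self):
        self._files = {}

    def write(self, path, data):
        self._files["/" + path.lstrip("/")] = data

    def read(self, path):
        return self._files["/" + path.lstrip("/")]


# Registry keyed by scheme names chosen for this sketch only.
BACKENDS = {"s3": FakeObjectStore, "hdfs": FakeHdfs}


def make_backend(scheme: str) -> StorageBackend:
    """Instantiate the back-end registered for the given scheme."""
    try:
        return BACKENDS[scheme]()
    except KeyError:
        raise ValueError(f"unsupported storage scheme: {scheme}") from None


def row_count(backend: StorageBackend, path: str) -> int:
    """Compute-side code: identical regardless of which back-end is plugged in."""
    return len(backend.read(path).splitlines())
```

Because `row_count` depends only on the interface, swapping `make_backend("s3")` for `make_backend("hdfs")` changes where the bytes live without touching the query code.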
The piece further discusses the evolution of data platform architectures, from traditional on‑premise data warehouses to data lakes and finally to integrated “lake‑warehouse” cloud‑native platforms where raw data is stored in a shared object store and processed by various compute clusters as needed.
An application case is described where a large state‑owned bank with dozens of petabytes of data migrates its workloads to a cloud‑native data warehouse to overcome high‑concurrency bottlenecks, data silos, and strict stability requirements.
Finally, the speaker’s biography is provided: Dr. Chang Lei, founder and CEO of Oushu Technology, former EMC senior researcher and director, with extensive experience in AI, big data, and database research, author of multiple papers in SIGMOD and holder of several international patents.
Tencent Cloud Developer
