
How Huawei’s GaussDB(DWS) 3.0 Redefines Cloud‑Native Data Warehousing

This article summarizes Wang Chuanting’s DTCC2022 talk on Huawei Cloud GaussDB(DWS) 3.0, detailing its cloud‑native architecture, layered elasticity, lake‑warehouse integration, performance acceleration techniques, and how it seamlessly couples data‑processing pipelines with AI workloads for modern, real‑time analytics.

ITPUB

Background and Trends

Modern data‑analysis requirements have outgrown traditional BI, driving a shift toward cloud‑native, lake‑warehouse‑integrated, and intelligence‑fused data stacks. GaussDB(DWS) 3.0 is Huawei Cloud’s response, offering a next‑generation, cloud‑native data‑warehouse platform.

GaussDB(DWS) Overview

GaussDB(DWS) 3.0 delivers high concurrency, high performance, and interactive query experiences within a lake‑warehouse architecture. It enables seamless, efficient collaboration between data‑production and AI pipelines, leveraging a resource‑pooled, compute‑storage‑separated design tightly integrated with the cloud.

Evolution History

Huawei began OLAP research in 2011, launched its first product in 2014, and has since amassed roughly 2,000 enterprise customers. Early versions focused on columnar storage and vectorized execution for OLAP workloads. Subsequent releases added large‑cluster communication, dynamic load management, LVM, multi‑tenant support, SQL‑on‑HD, SQL‑on‑OBS, backup, and disaster‑recovery capabilities. In 2020, development of DWS 3.0 began, emphasizing cloud‑native principles.

Serverless Cloud‑Native Architecture

GaussDB(DWS) adopts a three‑layer separation of compute, storage, and management:

Compute Layer: Logical clusters, referred to as virtual warehouses (VWs), scale independently of one another, support multiple deployment modes (public cloud, hybrid, on‑premises), and run on VMs, bare‑metal servers, or physical servers.

Storage Layer: Supports open formats (ORC, Parquet, Hudi) alongside a proprietary format with richer indexing. Data is stored in Huawei Cloud Object Storage Service (OBS) buckets, which keeps storage low‑cost and elastic.

Management Layer: Provides query optimization, access control, global transaction handling, and Hive MetaStore integration so that lake tables can be read directly.
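
The separation described above can be illustrated with a small model. This is only a sketch under stated assumptions: the class names `ObjectStore`, `Catalog`, and `VirtualWarehouse`, the object keys, and the position-based assignment are all hypothetical and are not taken from GaussDB(DWS). The point is that a VW holds no table data, so resizing it changes only which node scans which object.

```python
from dataclasses import dataclass, field


@dataclass
class ObjectStore:
    """Storage layer stand-in: object key -> rows. Hypothetical, not the OBS API."""
    objects: dict = field(default_factory=dict)


@dataclass
class Catalog:
    """Management layer stand-in: table name -> object keys holding its data."""
    tables: dict = field(default_factory=dict)


@dataclass
class VirtualWarehouse:
    """Compute layer stand-in: holds no table data, only a node count."""
    name: str
    nodes: int
    catalog: Catalog
    store: ObjectStore

    def assignment(self, table):
        # Map each object to a node by position. The mapping is recomputed on
        # every call, so resizing changes it without moving any stored bytes.
        keys = self.catalog.tables[table]
        return {key: i % self.nodes for i, key in enumerate(keys)}

    def count_rows(self, table):
        # Each "node" scans only the objects assigned to it.
        per_node = [0] * self.nodes
        for key, node in self.assignment(table).items():
            per_node[node] += len(self.store.objects[key])
        return sum(per_node)


store = ObjectStore({
    "sales/part-0": [1, 2, 3],
    "sales/part-1": [4, 5],
    "sales/part-2": [6],
})
catalog = Catalog({"sales": ["sales/part-0", "sales/part-1", "sales/part-2"]})

# Two VWs share the same catalog and the same stored objects.
etl_vw = VirtualWarehouse("etl", nodes=2, catalog=catalog, store=store)
bi_vw = VirtualWarehouse("bi", nodes=3, catalog=catalog, store=store)

before = etl_vw.count_rows("sales")
etl_vw.nodes = 4  # scale out: only the object-to-node mapping changes
after = etl_vw.count_rows("sales")
```

Both VWs read the same objects through the same catalog, and the query result is identical before and after the resize because nothing in the store was rewritten.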

Architecture diagram

Key Technical Characteristics

Layered Elasticity: Separating compute, storage, and management into independent layers enables serverless‑style scaling and zero‑copy data sharing.

Horizontal Integration: Supports a wide range of programming languages, drivers, and BI tools, and fuses lake and warehouse to provide end‑to‑end services.

Intelligence Fusion: Internally, the system analyzes and tunes workloads automatically; externally, it integrates with AI pipelines to streamline model training and inference.

Elasticity Advantages

Compute elasticity allows rapid scaling of VWs without data reshuffling. Storage elasticity relies on OBS, reducing costs while maintaining high performance. Two sharing modes are provided:

Near‑real‑time sharing: Incremental data is written to OBS by one VW and read by another, incurring minimal latency.

Real‑time sharing (Oracle‑RAC‑like): VW2 reads the in‑memory state of VW1 directly, achieving immediate data visibility.
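
The difference between the two sharing modes comes down to what the reading VW can see. The toy model below is a sketch, not the product's mechanism: `WriterVW`, its `buffer`, and the two read functions are hypothetical names, and a Python list stands in for objects on OBS.

```python
class WriterVW:
    """Hypothetical writer: buffers rows in memory, then flushes to shared storage."""

    def __init__(self, shared_store):
        self.shared_store = shared_store  # list standing in for objects on OBS
        self.buffer = []                  # rows written but not yet flushed

    def insert(self, row):
        self.buffer.append(row)

    def flush(self):
        # Incremental write: move buffered rows to the shared store.
        self.shared_store.extend(self.buffer)
        self.buffer.clear()


def read_near_real_time(shared_store):
    # The reader sees only what has already been flushed to shared storage.
    return list(shared_store)


def read_real_time(shared_store, writer):
    # The reader also consults the writer's in-memory state.
    return list(shared_store) + list(writer.buffer)


shared = []
vw1 = WriterVW(shared)
vw1.insert("order-1")
vw1.flush()
vw1.insert("order-2")  # not flushed yet

near = read_near_real_time(shared)   # sees order-1 only
real = read_real_time(shared, vw1)   # sees order-1 and order-2
```

In this model the near-real-time reader lags by at most one flush interval, while the real-time reader needs a channel to the writer, which is the coupling the Oracle RAC comparison refers to.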

Elasticity diagram

Lake‑Warehouse Integration

External schemas replace cumbersome external tables, allowing direct schema.table access to Hive and Spark metadata. GaussDB(DWS) can read and write data in OBS using ORC, Parquet, or Hudi formats, enabling seamless analytics across data‑lake and warehouse environments.
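
One way to picture why an external schema is lighter than per-table external definitions is as a name-resolution rule. The sketch below is illustrative only: `FakeMetastore`, `Resolver`, the schema names, and the `obs://example-bucket/...` location are hypothetical, and none of this is the GaussDB(DWS) or Hive MetaStore client API.

```python
class FakeMetastore:
    """Stand-in for a Hive MetaStore client; not a real client API."""

    def __init__(self, tables):
        self._tables = tables  # (database, table) -> {"location": ..., "format": ...}

    def get_table(self, database, table):
        return self._tables[(database, table)]


class Resolver:
    """Resolves schema.table to either a warehouse table or a lake table."""

    def __init__(self):
        self.local_tables = {}      # (schema, table) -> description
        self.external_schemas = {}  # schema -> (metastore, remote database)

    def create_external_schema(self, schema, metastore, database):
        # One registration exposes every table in the remote database.
        self.external_schemas[schema] = (metastore, database)

    def resolve(self, qualified_name):
        schema, table = qualified_name.split(".", 1)
        if schema in self.external_schemas:
            metastore, database = self.external_schemas[schema]
            # Metadata is looked up at query time, so tables created later
            # in the lake are visible without extra DDL in the warehouse.
            return {"source": "lake", **metastore.get_table(database, table)}
        return {"source": "warehouse", **self.local_tables[(schema, table)]}


hms = FakeMetastore({
    ("sales_db", "orders"): {"location": "obs://example-bucket/orders/", "format": "parquet"},
})
resolver = Resolver()
resolver.local_tables[("public", "customers")] = {"format": "native"}
resolver.create_external_schema("lake", hms, "sales_db")

orders = resolver.resolve("lake.orders")
customers = resolver.resolve("public.customers")
```

With per-table external definitions, each lake table would need its own entry in `local_tables`; here a single `create_external_schema` call covers the whole remote database.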

External schema illustration

Performance Acceleration

Once compute and storage are separated, every read may have to travel to remote object storage. Three main techniques recover performance:

Caching: Local VW‑level caches and a network‑wide cache service pre‑warm data, reducing latency during VW start‑up.

Operator Push‑Down: Simple filter predicates are pushed to OBS, filtering data before it reaches the compute layer.

IO Scheduling: OBS’s high bandwidth compensates for its latency; a priority‑based scheduler allocates IO resources fairly among concurrent queries.
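
The effect of the first two techniques can be shown by counting how many rows cross the network. This is a minimal sketch with assumed names: `ToyObjectStore` and `LRUCache` are hypothetical, the LRU eviction policy is an assumption (the talk summary does not name one), and `select` merely imitates a storage-side filter.

```python
from collections import OrderedDict


class ToyObjectStore:
    """Stand-in for remote object storage that counts rows sent over the network."""

    def __init__(self, objects):
        self.objects = objects
        self.rows_transferred = 0

    def get(self, key):
        rows = self.objects[key]
        self.rows_transferred += len(rows)
        return rows

    def select(self, key, predicate):
        # Push-down: the filter runs next to the data; only matches are sent.
        rows = [r for r in self.objects[key] if predicate(r)]
        self.rows_transferred += len(rows)
        return rows


class LRUCache:
    """Small local cache in front of the store (least-recently-used eviction)."""

    def __init__(self, store, capacity):
        self.store, self.capacity = store, capacity
        self.entries = OrderedDict()
        self.hits = 0

    def get(self, key):
        if key in self.entries:
            self.entries.move_to_end(key)  # mark as most recently used
            self.hits += 1
            return self.entries[key]
        rows = self.store.get(key)
        self.entries[key] = rows
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)  # evict the least recently used
        return rows


store = ToyObjectStore({"t/part-0": list(range(1000))})

# Without push-down: fetch everything, then filter in the compute layer.
local = [r for r in store.get("t/part-0") if r >= 990]
fetched_without_pushdown = store.rows_transferred

# With push-down: only matching rows cross the network.
store.rows_transferred = 0
pushed = store.select("t/part-0", lambda r: r >= 990)
fetched_with_pushdown = store.rows_transferred

# Cache: the second read of the same object is served locally.
store.rows_transferred = 0
cache = LRUCache(store, capacity=2)
cache.get("t/part-0")
cache.get("t/part-0")
```

Both query paths return the same rows, but the pushed-down filter transfers only the matches, and the cached second read transfers nothing.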

IO scheduling diagram
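
A priority-based IO scheduler of the kind described above can be sketched with a heap. The details here are assumptions, not the product's design: `IOScheduler`, `max_in_flight`, and the query names are hypothetical, and fairness is reduced to first-in, first-out order within each priority level.

```python
import heapq
import itertools


class IOScheduler:
    """Issues object-storage reads in priority order, a bounded number at a time."""

    def __init__(self, max_in_flight):
        self.max_in_flight = max_in_flight
        self._heap = []
        self._seq = itertools.count()

    def submit(self, priority, query_id, object_key):
        # Lower number = higher priority; seq keeps FIFO order within a priority.
        heapq.heappush(self._heap, (priority, next(self._seq), query_id, object_key))

    def next_batch(self):
        # Issue up to max_in_flight requests, highest priority first.
        batch = []
        while self._heap and len(batch) < self.max_in_flight:
            _, _, query_id, object_key = heapq.heappop(self._heap)
            batch.append((query_id, object_key))
        return batch


sched = IOScheduler(max_in_flight=2)
sched.submit(5, "batch-report", "t/part-0")
sched.submit(5, "batch-report", "t/part-1")
sched.submit(1, "dashboard", "t/part-7")

first = sched.next_batch()
second = sched.next_batch()
```

The interactive request jumps ahead of the batch reads even though it was submitted last, while the batch reads keep their original order. Issuing several requests per batch is how high bandwidth can hide per-request latency.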

Data & AI Production Line Coupling

Two scenarios enable fast data flow to AI workloads: (1) batch data is stored in OBS for shared access; (2) real‑time queries use optimized plugins to retrieve data from DWS with minimal latency.
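
For the second scenario, the general pattern is to stream query results to the AI side in batches rather than exporting a full copy first. The sketch below uses Python's built-in `sqlite3` purely as a local stand-in for a warehouse connection; the `features` table, the `iter_batches` helper, and the batch size are hypothetical, and the actual DWS plugins are not shown in the source.

```python
import sqlite3


def iter_batches(conn, sql, batch_size):
    """Yield query results in fixed-size batches instead of materializing them all."""
    cursor = conn.execute(sql)
    while True:
        rows = cursor.fetchmany(batch_size)
        if not rows:
            break
        yield rows


# sqlite3 is only a local stand-in for the warehouse connection.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE features (x REAL, label INTEGER)")
conn.executemany(
    "INSERT INTO features VALUES (?, ?)",
    [(i / 10, i % 2) for i in range(25)],
)

batch_sizes = []
label_sum = 0
for batch in iter_batches(conn, "SELECT x, label FROM features ORDER BY x", batch_size=10):
    batch_sizes.append(len(batch))  # a training step would consume the batch here
    label_sum += sum(label for _, label in batch)
```

Because the consumer pulls one batch at a time, training can begin before the query has finished returning rows.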

Data‑AI pipeline illustration

Conclusion

GaussDB(DWS) 3.0 exemplifies a cloud‑native, fully elastic, lake‑warehouse‑integrated data‑warehouse that delivers high performance, strong security, and seamless AI integration, positioning it as a strategic platform for enterprises pursuing modern, real‑time analytics and digital transformation.

