Databases 11 min read

Databend: Cloud‑Native Modern Data Warehouse Architecture and Features

This article explains how Databend, a cloud‑native data warehouse, addresses modern OLAP requirements through storage‑compute separation, elastic scaling, multi‑cloud support, advanced query planning, and serverless‑ready design, contrasting it with traditional data warehouse limitations.

DataFunSummit

Oct 17, 2021

Databend: Cloud‑Native Modern Data Warehouse Architecture and Features

Databend, whose name derives from the relativistic concept of "Time Bend," aims to let users rethink data to extract greater value.

The article outlines three main topics: (1) why traditional data warehouse architectures no longer meet current needs, (2) the advantages of modern data warehouse designs, and (3) Databend's cloud‑native architecture.

Traditional data warehouses integrate compute and storage on the same machines, leading to costly scaling, low elasticity, and performance degradation as data grows.

Modern data warehouses require five key capabilities: no hardware management, no software configuration, no resource bottlenecks, second‑level elastic scaling, and pay‑for‑use only when resources are active.

Databend satisfies these needs with a cloud‑native design that separates storage, compute, and metadata services, uses shared cloud storage (e.g., S3, Azure Blob), and provides elastic, multi‑tenant meta services.

The architecture consists of four layers: data ingestion (SQL‑compatible), Meta Service (metadata and multi‑tenant key‑value store), compute layer (independent SQL parsing, planning, and execution nodes), and storage layer (cloud object storage with columnar Parquet files, MinMax and sparse indexes).

Query planning in Databend generates logical plans, optimizes them, and produces physical pipeline plans represented as directed acyclic graphs. Example execution plan code:

explain pipeline SELECT avg(age) FROM class WHERE age > 13 GROUP BY city

The physical pipeline creates parallel execution nodes, enabling vectorized processing and work‑stealing task scheduling to keep CPUs busy.

Databend also employs Arrow Flight RPC for efficient data transfer, avoiding serialization overhead, and supports automatic data clustering to accelerate hot‑data queries.

Future plans include early‑stage Alpha releases, continued performance and elasticity improvements, and a move toward serverless operation where compute resources are provisioned on demand and billed per use.

Original Source

Signed-in readers can open the original source through BestHub's protected redirect.

Republication Notice

This article has been distilled and summarized from source material, then republished for learning and reference. If you believe it infringes your rights, please contactand we will review it promptly.

cloud-native Query Planning Elastic Scaling Databend

Written by

DataFunSummit

Official account of the DataFun community, dedicated to sharing big data and AI industry summit news and speaker talks, with regular downloadable resource packs.

0 followers

Reader feedback

How this landed with the community

Rate this article

Was this worth your time?

Discussion

0 Comments

Thoughtful readers leave field notes, pushback, and hard-won operational detail here.