
How to Deploy and Scale ByConity’s Cloud‑Native Data Warehouse on Kubernetes

ByConity is a cloud‑native, storage‑compute separated data warehouse engine that supports multi‑tenant isolation, high performance, and elastic scaling; this guide explains its three‑layer architecture, hardware requirements, Helm‑based Kubernetes deployment, dynamic scaling, and practical SQL testing steps.

ByteDance Cloud Native

Introduction

ByConity is an open‑source cloud‑native data warehouse engine from ByteDance that uses a storage‑compute separation architecture to achieve read‑write isolation, elastic scaling, high performance, and strong data consistency. It supports multi‑tenant isolation and leverages OLAP optimizations such as columnar storage, vectorized execution, massively parallel processing (MPP), and query optimization.

ByConity Storage‑Compute Separation Architecture

The architecture consists of three layers:

Shared Service Layer: the entry point for all queries, containing the Cloud Service and Metadata Storage; responsible for query parsing, optimization, and metadata management.

Compute Layer: the compute resource groups, implemented as Virtual Warehouses (Read VWs and Write VWs).

Cloud Storage Layer: distributed unified storage (e.g., HDFS, S3) where all data resides; compute nodes read data from this layer.

Additional shared components include TSO, Daemon Manager, Resource Manager, background tasks, and service discovery.

Deploying ByConity on Kubernetes

Kubernetes provides scalability, high availability, load balancing, and fault tolerance, making it an ideal platform for ByConity.

Hardware Requirements

Minimum hardware for testing:

TSO: 1 CPU, 300 MiB memory, 5 GiB disk, 1 GbE, 1 instance

Server: 8 CPU, 32 GiB memory, 100 GiB disk, 1 GbE, 1 instance

Worker: 4 CPU, 16 GiB memory, 100 GiB+ disk, 1 GbE, 1 instance

DaemonManager: 1 CPU, 500 MiB memory, 5 GiB disk, 1 GbE, 1 instance

ResourceManager: 1 CPU, 2 GiB memory, 5 GiB disk, 1 GbE, 1 instance

Recommended production hardware:

TSO: 2 CPU, 2 GiB memory, 5 GiB disk, 10 GbE, 3 instances

Server: 16 CPU, 60 GiB memory, 1 TiB disk, 10 GbE, ≥1 instance

Worker: 16 CPU, 100 GiB memory, 2 TiB+ disk, 10 GbE, ≥1 instance

DaemonManager: 4 CPU, 10 GiB memory, 10 GiB disk, 10 GbE, 1 instance

ResourceManager: 8 CPU, 16 GiB memory, 10 GiB disk, 10 GbE, 1 instance
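The sizes above map naturally onto Kubernetes resource requests in the Helm values file. The sketch below covers the test-sized footprint; the key names (`server.resources`, `worker.resources`, etc.) are assumptions to verify against the chart's own values file:

```yaml
# Hypothetical excerpt of a custom values file -- key names are
# illustrative; confirm them against ./chart/byconity/values.yml.
server:
  replicas: 1
  resources:
    requests:
      cpu: "8"
      memory: 32Gi
worker:
  replicas: 1
  resources:
    requests:
      cpu: "4"
      memory: 16Gi
tso:
  resources:
    requests:
      cpu: "1"
      memory: 300Mi
```

For production, raise these requests to the recommended figures and set matching limits so the scheduler places each component on adequately sized nodes.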

Tool Installation

Install kubectl for Kubernetes cluster management.

Install helm for package management.

Clone the deployment repository:

<code>git clone git@github.com:ByConity/byconity-deploy.git
cd byconity-deploy</code>

Configure Storage

For the best balance of cost (TCO) and performance, use local storage (e.g., an OpenEBS local PV) for the ByConity Server and Worker pods. Server/Worker storage holds only the disk cache and can be removed at any time.
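If you go with OpenEBS Local PV hostpath, the operator ships a StorageClass along these lines (shown here for reference; your cluster may already have it installed as `openebs-hostpath`):

```yaml
# StorageClass backing the Server/Worker disk cache with
# node-local hostpath volumes (OpenEBS Local PV).
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: openebs-hostpath
provisioner: openebs.io/local
# Delay binding until a pod is scheduled, so the volume is
# provisioned on the node that will actually run the pod.
volumeBindingMode: WaitForFirstConsumer
reclaimPolicy: Delete
```

Whatever class you use, its name is what goes into `storageClassName` in the Helm values in the next step.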

Configure Helm Values

Copy ./chart/byconity/values.yml from the cloned repo and modify the following fields:

storageClassName

timezone

Replica counts for server and worker

HDFS storage requests
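Taken together, a minimal set of overrides might look like the following; the nesting of the replica and HDFS keys is an assumption, so check it against the chart's default values file before applying:

```yaml
# Illustrative overrides for a custom values.yaml; verify key
# paths against ./chart/byconity/values.yml.
storageClassName: openebs-hostpath   # your local-PV StorageClass
timezone: Etc/UTC                    # cluster-wide timezone
server:
  replicas: 1                        # number of server pods
worker:
  replicas: 2                        # number of worker pods
hdfs:
  datanode:
    storageRequest: 100Gi            # hypothetical key: HDFS capacity
```

Pass this file to Helm with `-f` in the deployment step below.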

Deploy the Cluster

Install with or without the FDB CRD:

<code># Install without FDB CRD
helm upgrade --install --create-namespace --namespace byconity -f ./your/custom/values.yaml byconity ./chart/byconity --set fdb.enabled=false

# Install with FDB cluster
helm upgrade --install --create-namespace --namespace byconity -f ./your/custom/values.yaml byconity ./chart/byconity</code>

Wait for pods to start:

<code>kubectl -n byconity get po</code>

Launch the ClickHouse client:

<code>kubectl -n byconity exec -it sts/byconity-server -- bash
root@byconity-server-0:/# clickhouse client</code>

Test the Cluster

<code>CREATE DATABASE IF NOT EXISTS test;
USE test;
DROP TABLE IF EXISTS test.lc;
CREATE TABLE test.lc (b LowCardinality(String)) engine=CnchMergeTree ORDER BY b;
INSERT INTO test.lc SELECT '0123456789' FROM numbers(100000000);
SELECT count(), b FROM test.lc GROUP BY b;
DROP TABLE IF EXISTS test.lc;
DROP DATABASE test;</code>

Manual Cluster Update (Add Virtual Warehouses)

Update values.yaml with the new virtual warehouses, run a Helm upgrade, and then create the warehouses via DDL:

<code>CREATE WAREHOUSE IF NOT EXISTS `my-new-vw-default` SETTINGS num_workers = 0, type = 'Read';
CREATE WAREHOUSE IF NOT EXISTS `my-new-vw-write` SETTINGS num_workers = 0, type = 'Write';</code>
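The warehouse names in the DDL must line up with the worker groups declared in values.yaml. A hedged sketch of that declaration, with key names that are assumptions about the chart's schema rather than documented fields:

```yaml
# Hypothetical sketch: extra worker groups backing the two new
# virtual warehouses; actual key names depend on the chart.
byconity:
  virtualWarehouses:
    - name: my-new-vw-default   # matches the Read warehouse DDL
      replicas: 1
    - name: my-new-vw-write     # matches the Write warehouse DDL
      replicas: 1
```

After editing, re-run the same `helm upgrade --install ...` command used for the initial deployment so the new worker pods are created before you issue the DDL.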

Seamless Scaling on Kubernetes

Define load thresholds (e.g., 80% CPU utilization) using the Horizontal Pod Autoscaler (HPA). When a threshold is crossed, the HPA automatically adds or removes ByConity worker pods and load is rebalanced across them; because all data lives in shared cloud storage, scaling compute nodes does not compromise data consistency. Monitoring and alerting can be integrated with Prometheus.
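As a concrete sketch, an HPA targeting a worker StatefulSet might look like this; the workload name is an assumption, so check `kubectl -n byconity get sts` for the real one:

```yaml
# Scale ByConity workers between 1 and 8 replicas, targeting
# 80% average CPU utilization across the pods.
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: byconity-worker-hpa
  namespace: byconity
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: StatefulSet
    name: byconity-worker        # assumed workload name
  minReplicas: 1
  maxReplicas: 8
  metrics:
    - type: Resource
      resource:
        name: cpu
        target:
          type: Utilization
          averageUtilization: 80
```

Note that CPU-based autoscaling requires resource requests on the worker pods and a running metrics-server in the cluster.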

Conclusion

Deploying ByConity on Kubernetes provides elastic scaling, high availability, load balancing, and fault tolerance while simplifying management. Seamless scaling improves system availability and flexibility and reduces operational costs. Additional deployment options (single‑node Docker, physical machines, compiling from source) are also available to the community.

Written by

ByteDance Cloud Native

Sharing ByteDance's cloud-native technologies, technical practices, and developer events.
