How to Deploy and Scale ByConity’s Cloud‑Native Data Warehouse on Kubernetes
ByConity is a cloud‑native, storage‑compute separated data warehouse engine that supports multi‑tenant isolation, high performance, and elastic scaling; this guide explains its three‑layer architecture, hardware requirements, Helm‑based Kubernetes deployment, dynamic scaling, and practical SQL testing steps.
Introduction
ByConity is an open‑source cloud‑native data warehouse engine from ByteDance that uses a storage‑compute separation architecture to achieve read‑write isolation, elastic scaling, high performance, and strong data consistency. It supports multi‑tenant isolation and leverages OLAP optimizations such as columnar storage, vectorized execution, MPP, and query optimization.
ByConity Storage‑Compute Separation Architecture
The architecture consists of three layers:
Shared Service Layer: The entry point for all queries, containing the Cloud Service and Metadata Storage; responsible for query parsing, optimization, and metadata management.
Compute Layer: The compute resource groups, implemented as Virtual Warehouses (Read VW and Write VW).
Cloud Storage Layer: Distributed unified storage (e.g., HDFS, S3) where all data resides; compute nodes read data from this layer.
Additional shared components include TSO, Daemon Manager, Resource Manager, background tasks, and service discovery.
Deploying ByConity on Kubernetes
Kubernetes provides scalability, high availability, load balancing, and fault tolerance, making it an ideal platform for ByConity.
Hardware Requirements
Minimum hardware for testing:
TSO: 1 CPU, 300 MiB memory, 5 GiB disk, 1 GbE, 1 instance
Server: 8 CPU, 32 GiB memory, 100 GiB disk, 1 GbE, 1 instance
Worker: 4 CPU, 16 GiB memory, 100 GiB+ disk, 1 GbE, 1 instance
DaemonManager: 1 CPU, 500 MiB memory, 5 GiB disk, 1 GbE, 1 instance
ResourceManager: 1 CPU, 2 GiB memory, 5 GiB disk, 1 GbE, 1 instance
Recommended production hardware:
TSO: 2 CPU, 2 GiB memory, 5 GiB disk, 10 GbE, 3 instances
Server: 16 CPU, 60 GiB memory, 1 TiB disk, 10 GbE, ≥1 instance
Worker: 16 CPU, 100 GiB memory, 2 TiB+ disk, 10 GbE, ≥1 instance
DaemonManager: 4 CPU, 10 GiB memory, 10 GiB disk, 10 GbE, 1 instance
ResourceManager: 8 CPU, 16 GiB memory, 10 GiB disk, 10 GbE, 1 instance
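As an illustration, the minimum test sizing above could be expressed as Kubernetes resource requests in a Helm values override. The field layout below is a hypothetical sketch; the real schema lives in chart/byconity/values.yml:

```yaml
# Hypothetical values override mapping the minimum test sizing to
# Kubernetes resource requests (key names are illustrative; check
# chart/byconity/values.yml for the chart's actual schema).
tso:
  resources:
    requests:
      cpu: "1"
      memory: 300Mi
server:
  resources:
    requests:
      cpu: "8"
      memory: 32Gi
worker:
  resources:
    requests:
      cpu: "4"
      memory: 16Gi
```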
Tool Installation
Install kubectl for Kubernetes cluster management.
Install helm for package management.
Clone the deployment repository:
<code>git clone [email protected]:ByConity/byconity-deploy.git
cd byconity-deploy</code>
Configure Storage
Use local storage (e.g., OpenEBS local PV) for the ByConity Server and Worker to get the best balance of TCO and performance. Server/Worker local storage holds only the disk cache, so it can be safely deleted at any time.
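For example, a local StorageClass such as the one below can back the Server/Worker disk cache. It uses OpenEBS's local PV provisioner; the StorageClass name is an assumption and must match what you set in your Helm values:

```yaml
# Illustrative OpenEBS local-PV StorageClass for the disk cache.
# WaitForFirstConsumer delays binding until a pod is scheduled,
# so the volume is provisioned on that pod's node.
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: openebs-hostpath   # assumed name; keep consistent with values.yml
provisioner: openebs.io/local
volumeBindingMode: WaitForFirstConsumer
reclaimPolicy: Delete
```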
Configure Helm Values
Copy ./chart/byconity/values.yml from the cloned repo and modify the following fields:
storageClassName
timezone
Replica counts for server and worker
HDFS storage requests
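A hypothetical excerpt of such an override might look like the following; the key names are indicative and should be verified against the chart's values.yml:

```yaml
# Indicative overrides for the fields listed above (key names are
# assumptions; verify against chart/byconity/values.yml).
byconity:
  storageClassName: openebs-hostpath   # assumed StorageClass name
  timezone: "Europe/London"            # example timezone
server:
  replicas: 1
worker:
  replicas: 1
hdfs:
  datanode:
    storage: 100Gi
```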
Deploy the Cluster
Install with or without the FDB CRD:
<code># Install without FDB CRD
helm upgrade --install --create-namespace --namespace byconity -f ./your/custom/values.yaml byconity ./chart/byconity --set fdb.enabled=false
# Install with FDB cluster
helm upgrade --install --create-namespace --namespace byconity -f ./your/custom/values.yaml byconity ./chart/byconity</code>
Wait for the pods to start:
<code>kubectl -n byconity get po</code>
Launch the ClickHouse client:
<code>kubectl -n byconity exec -it sts/byconity-server -- bash
root@byconity-server-0:/# clickhouse client</code>
Test the Cluster
<code>CREATE DATABASE IF NOT EXISTS test;
USE test;
DROP TABLE IF EXISTS test.lc;
CREATE TABLE test.lc (b LowCardinality(String)) engine=CnchMergeTree ORDER BY b;
INSERT INTO test.lc SELECT '0123456789' FROM numbers(100000000);
SELECT count(), b FROM test.lc GROUP BY b;
DROP TABLE IF EXISTS test.lc;
DROP DATABASE test;</code>
Manual Cluster Update (Add Virtual Warehouses)
Update values.yaml with the new virtual warehouses, run a Helm upgrade, then create the warehouses via DDL:
<code>CREATE WAREHOUSE IF NOT EXISTS `my-new-vw-default` SETTINGS num_workers = 0, type = 'Read';
CREATE WAREHOUSE IF NOT EXISTS `my-new-vw-write` SETTINGS num_workers = 0, type = 'Write';</code>
Seamless Scaling on Kubernetes
Define load thresholds (e.g., 80% CPU utilization) using the Horizontal Pod Autoscaler (HPA). When load crosses a threshold, HPA automatically scales ByConity pods up or down, load balancing adjusts accordingly, and data consistency is preserved. Monitoring and alerting can be integrated with Prometheus.
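As a sketch, an HPA targeting 80% average CPU utilization on the worker workload might look like this. The target name "byconity-worker" and its workload kind are assumptions about how the chart names its resources:

```yaml
# Hypothetical HPA for a ByConity worker workload; the target name
# and kind are assumptions (workers may be a StatefulSet in the chart).
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: byconity-worker-hpa
  namespace: byconity
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: StatefulSet
    name: byconity-worker
  minReplicas: 1
  maxReplicas: 8
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 80   # scale out above 80% average CPU
```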
Conclusion
Deploying ByConity on Kubernetes provides elastic scaling, high availability, load balancing, and fault tolerance while simplifying management. Seamless scaling improves system availability and flexibility while reducing operational costs. Additional deployment options (single‑node Docker, physical machines, source compilation) are also available to the community.
ByteDance Cloud Native
Sharing ByteDance's cloud-native technologies, technical practices, and developer events.