How Cloud TiDB Automates Ops, Scaling, and High Availability on Kubernetes

This article explains how Cloud TiDB leverages Kubernetes and tidb‑operator to achieve automated operation management, dynamic scaling, resource isolation, and high availability for a cloud‑native distributed database.

UCloud Tech

Nov 15, 2017

The earlier article "Cloud TiDB Technical Secrets (Part 1)" analyzed how TiDB differs from traditional single‑node relational databases and described the overall architecture after integration with the various supporting technologies. This follow‑up looks at TiDB’s key features under a cloud computing architecture and at how they are implemented.

Automated Operations – A cloud database product has to be operated automatically; manual operations do not scale. The first step is to use Kubernetes to pool host resources into one large resource pool. Components such as tidb‑operator and tidb‑cloud‑manager then automate one‑click deployment, scaling, rolling upgrades, and failover of TiDB instances.

Cluster Creation – TiDB consists of three core components (tidb, tikv, pd). Each is a multi‑node distributed service, and the components depend on one another at startup. pd nodes are brought up much as etcd members are: an initial single‑node cluster is created first, and the remaining nodes then join it. Because a StatefulSet alone cannot handle this complexity, tidb‑operator implements custom controllers to orchestrate the process. It relies on Kubernetes scheduling to distribute nodes evenly and exposes the cluster through a load balancer.
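The ordering constraint can be illustrated with a short sketch. It is not tidb‑operator code: the names `Step` and `creation_plan` and the pod naming scheme are hypothetical, and the pd, then tikv, then tidb order is an assumption about the startup dependencies mentioned above.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Step:
    component: str
    pod: str
    action: str  # "bootstrap", "join", or "start"


def creation_plan(pd: int, tikv: int, tidb: int) -> List[Step]:
    """Build an ordered start-up plan for a new cluster (illustrative only)."""
    if min(pd, tikv, tidb) < 1:
        raise ValueError("each component needs at least one node")
    # The first pd node forms a single-node cluster, as with etcd.
    steps = [Step("pd", "pd-0", "bootstrap")]
    # The remaining pd nodes join the cluster that already exists.
    steps += [Step("pd", f"pd-{i}", "join") for i in range(1, pd)]
    # Assumed dependency order: storage nodes next, SQL nodes last.
    steps += [Step("tikv", f"tikv-{i}", "start") for i in range(tikv)]
    steps += [Step("tidb", f"tidb-{i}", "start") for i in range(tidb)]
    return steps
```

Calling `creation_plan(3, 3, 2)` yields eight steps, with one bootstrap step followed by two join steps before any tikv or tidb pod is started.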

Online Upgrade – tikv and pd pods use local storage, so their data directories cannot be moved arbitrarily. The operator keeps each pod on the same node throughout an upgrade, and it sequences the upgrade so that Raft consistency and the dependencies between services are respected.
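A minimal sketch of that idea follows, assuming a pd, tikv, tidb upgrade order (the source does not state the order) and using hypothetical names `Pod` and `rolling_upgrade`. The point it shows is that only the version changes; the node assignment is carried over untouched.

```python
from dataclasses import dataclass, replace
from typing import List

# Assumed order; the article only says that service dependencies are respected.
UPGRADE_ORDER = ("pd", "tikv", "tidb")


@dataclass(frozen=True)
class Pod:
    name: str
    component: str
    node: str
    version: str


def rolling_upgrade(pods: List[Pod], target: str) -> List[Pod]:
    """Return the pods in upgrade order, each at the target version."""
    upgraded = []
    for component in UPGRADE_ORDER:
        members = sorted(
            (p for p in pods if p.component == component),
            key=lambda p: p.name,
        )
        for pod in members:
            # replace() copies every field except version, so the pod
            # stays bound to the node that holds its local data.
            upgraded.append(replace(pod, version=target))
    return upgraded
```

A real controller would also wait for each pod to become healthy before touching the next one; that step is left out here to keep the sketch short.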

Failure Handling – When a node fails, the operator first waits until the failure is confirmed, so that a transient outage does not trigger a migration. It then schedules a new pod on another node and notifies TiDB to discard the failed node. The pd module restores the replica count and migrates data, which keeps the cluster balanced.
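The two decisions in this flow, confirming the failure and choosing where the replacement goes, might look like the sketch below. The grace period, the function names, and the "fewest pods wins" rule are all assumptions made for illustration.

```python
from typing import Dict, Optional


def confirmed_failed(last_seen: float, now: float, grace_seconds: float) -> bool:
    """Treat a node as failed only after it has been silent for the grace period."""
    return now - last_seen >= grace_seconds


def pick_replacement_node(
    failed_node: str, pods_per_node: Dict[str, int]
) -> Optional[str]:
    """Pick the least-loaded node other than the failed one, or None."""
    candidates = {n: c for n, c in pods_per_node.items() if n != failed_node}
    if not candidates:
        return None
    # Sorting the names first makes ties deterministic.
    return min(sorted(candidates), key=lambda n: candidates[n])
```

With a 300 second grace period, a node last seen 100 seconds ago is not yet treated as failed, which is the "wait to confirm" step described above.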

Dynamic Scaling – TiDB scales horizontally and elastically, so users can change the node count from the cloud console without downtime. The Kubernetes reconcile mechanism detects the difference between the desired and the actual state and schedules new pods on nodes that have an available local PV; tidb‑operator then adds those pods to the cluster. Scaling down follows a similar process: pods are terminated safely, and pd handles the data migration.
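One pass of such a reconcile loop can be sketched as a pure function. The name `reconcile`, the action labels, and the choice to remove the most recently listed pods first are hypothetical; the sketch only shows the desired-versus-actual comparison and the local PV constraint.

```python
from typing import List, Tuple


def reconcile(
    desired: int, running: List[str], nodes_with_free_pv: List[str]
) -> List[Tuple[str, str]]:
    """Compute the actions needed to move from the actual to the desired state."""
    if desired < 0:
        raise ValueError("desired replica count cannot be negative")
    actions: List[Tuple[str, str]] = []
    missing = desired - len(running)
    if missing > 0:
        # Scale out: a pod can only land on a node with a free local PV.
        # Pods that cannot be placed are picked up by a later pass.
        for node in nodes_with_free_pv[:missing]:
            actions.append(("create_pod_on", node))
    elif missing < 0:
        surplus = -missing
        # Scale in: data must be migrated away before a pod is deleted.
        for pod in reversed(running[-surplus:]):
            actions.append(("drain_then_delete", pod))
    return actions
```

Because the function returns nothing when desired and actual already match, running it repeatedly is harmless, which is the property a reconcile loop depends on.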

Resource Isolation – Containers provide cgroup‑based CPU, memory, and I/O limits and namespace isolation. Kubernetes schedules pods based on resource availability, and tidb‑operator respects TiDB’s constraints. Physical isolation is achieved by distributing pods across zones, racks, and hosts, while logical isolation uses namespaces to separate business data.
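As a rough sketch of how resource‑aware placement and host‑level spreading combine, the function below places a pod only on a host that has enough free CPU and memory and does not already run a pod of the same cluster. The `Node` type, the `place` function, and the first‑fit policy are assumptions for illustration, not the Kubernetes scheduler's actual algorithm.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set


@dataclass
class Node:
    name: str
    free_cpu: float       # cores
    free_mem_gib: float
    clusters: Set[str] = field(default_factory=set)  # clusters with a pod here


def place(nodes: List[Node], cluster: str, cpu: float, mem_gib: float) -> Optional[str]:
    """First-fit placement that never puts two pods of one cluster on a host."""
    for node in nodes:
        fits = node.free_cpu >= cpu and node.free_mem_gib >= mem_gib
        if fits and cluster not in node.clusters:
            # Reserve the resources so later placements see the new totals.
            node.free_cpu -= cpu
            node.free_mem_gib -= mem_gib
            node.clusters.add(cluster)
            return node.name
    return None  # no host satisfies both constraints
```

With two hosts, the first two pods of a cluster land on different hosts and a third is refused, while a pod from another cluster can still share a host if resources allow.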

High Availability – TiDB’s distributed architecture and the Raft algorithm keep the service available even when individual replicas fail. The operator adds topology labels (region, zone, rack, host) so that pd can place the replicas of the same data in distinct failure domains. Kubernetes scheduling and its self‑healing controllers add a further layer of protection.
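Why the labels matter can be shown with a small check: a Raft group stays available only while a majority of its replicas survive, so no single failure domain should hold enough replicas to break that majority. The function name and the dictionary layout below are hypothetical; only the four label names come from the text.

```python
from collections import Counter
from typing import Dict, List

LEVELS = ("region", "zone", "rack", "host")


def survives_domain_loss(replicas: List[Dict[str, str]], level: str) -> bool:
    """True if losing any single domain at `level` still leaves a Raft majority."""
    depth = LEVELS.index(level) + 1
    # A domain is identified by its full label path, e.g. (region, zone).
    domains = Counter(tuple(r[name] for name in LEVELS[:depth]) for r in replicas)
    quorum = len(replicas) // 2 + 1
    return all(len(replicas) - lost >= quorum for lost in domains.values())
```

Three replicas in three zones of one region survive the loss of any zone but not of the region; if two of the three share a zone, losing that zone leaves a single replica, which is below the majority of two.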

Conclusion – As a cloud‑native database, TiDB relies on tidb‑operator and Kubernetes for automated management, dynamic scaling, multi‑tenant resource isolation, and high availability, which substantially reduces operational overhead.
