How Operators Turn Kubernetes into a Database Management Powerhouse
This article explains how Kubernetes' reconciliation loop, originally designed for stateless resources, can be extended to manage stateful workloads like PostgreSQL databases using Operators such as CloudNativePG and Atlas, providing a declarative, GitOps‑friendly workflow for provisioning, upgrading, and schema migration.
Running and managing stateful workloads such as databases on Kubernetes has historically been difficult because the declarative model works well for replaceable resources but not for persistent ones like PostgreSQL.
At its core, resource management means maintaining the desired state. At cloud‑native scale this becomes extremely complex, spanning VPCs, security groups, EC2 instances, Kubernetes objects, load balancers, secrets, databases, CI/CD pipelines, and more.
This complexity amplifies configuration drift, makes observability and troubleshooting harder, and increases the risk of cross‑environment inconsistencies.
Air‑conditioner model: how the reconciliation loop changed everything
The air‑conditioner analogy illustrates a reconciliation loop. A sensor continuously reads the current temperature, a controller compares it to the target temperature, and the system adjusts cooling to reach the desired state. The same loop underpins large‑scale infrastructure automation:
Auto‑scaling groups continuously monitor metrics such as CPU or memory and add or remove instances to meet target utilization.
Circuit breakers track error rates in service calls, adjust routing or reject requests when thresholds are exceeded, and restore traffic once conditions stabilize.
Kubernetes controllers watch declared resources (e.g., Deployments or StatefulSets) and automatically reconcile the actual state with the desired state.
The reconciliation loop scales well. Clusters with thousands of nodes and tens of thousands of pods rely on it without manual SSH or ad‑hoc scripts.
Why Kubernetes struggled with databases
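The loop works well for stateless infrastructure but breaks down for stateful resources. Upgrading a PostgreSQL pod from version 16 to 17 highlights the problem. Shutting down the old pod causes downtime. A new pod that mounts the same PVC cannot simply reuse the existing data, because PostgreSQL's on‑disk format changes between major versions: a version 17 server will not start on a data directory initialized by version 16 unless it is first converted (for example with pg_upgrade).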
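PostgreSQL records the major version that initialized a data directory in a PG_VERSION file at the directory's root, and the server checks it at startup. The sketch below is a simplified imitation of that guard. It shows why "just point the new pod at the old PVC" fails. The helper names `read_data_dir_major` and `can_start_on` are hypothetical, and the real server check is not implemented this way.

```python
import tempfile
from pathlib import Path


def read_data_dir_major(data_dir: Path) -> int:
    """Return the major version recorded in the data directory's PG_VERSION file."""
    return int((data_dir / "PG_VERSION").read_text().strip())


def can_start_on(data_dir: Path, server_major: int) -> bool:
    """Hypothetical helper: imitate the server's refusal to open a data
    directory that a different major version initialized."""
    return read_data_dir_major(data_dir) == server_major


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        pgdata = Path(tmp)
        # Simulate a PVC whose data directory was initialized by PostgreSQL 16.
        (pgdata / "PG_VERSION").write_text("16\n")
        print(can_start_on(pgdata, 16))  # True: same major version
        print(can_start_on(pgdata, 17))  # False: needs conversion or a new instance
```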
A safe zero‑downtime upgrade typically follows a controlled sequence: deploy a new PostgreSQL instance alongside the old one, create a compatible schema, set up logical replication, monitor replication lag until it reaches zero, then switch traffic to the new instance.
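The final two steps of that sequence (wait for lag to reach zero, then switch traffic) form a small control loop of their own. The sketch below simulates them under stated assumptions. The new instance, its schema, and logical replication are assumed to exist already. `FakeReplica`, `Router`, and `cut_over` are hypothetical stand‑ins, and the lag is faked rather than read from PostgreSQL.

```python
import time
from dataclasses import dataclass, field


@dataclass
class FakeReplica:
    """Hypothetical stand-in for the new PostgreSQL 17 instance.

    A real orchestrator would read lag from PostgreSQL's statistics views;
    here the lag simply shrinks by `step` bytes on every poll."""
    lag_bytes: int
    step: int = 4096

    def poll_lag(self) -> int:
        self.lag_bytes = max(0, self.lag_bytes - self.step)
        return self.lag_bytes


@dataclass
class Router:
    """Hypothetical stand-in for whatever sends client traffic to an instance."""
    target: str = "pg16"
    switches: list = field(default_factory=list)

    def switch_to(self, name: str) -> None:
        self.switches.append((self.target, name))
        self.target = name


def cut_over(replica: FakeReplica, router: Router,
             poll_interval: float = 0.0, max_polls: int = 1000) -> bool:
    """Poll replication lag until it reaches zero, then switch traffic.

    If lag never drains within `max_polls`, traffic stays on the old instance."""
    for _ in range(max_polls):
        if replica.poll_lag() == 0:
            router.switch_to("pg17")
            return True
        time.sleep(poll_interval)
    return False


if __name__ == "__main__":
    router = Router()
    print(cut_over(FakeReplica(lag_bytes=10_000), router), router.target)  # True pg17
```

A real cutover also has to stop writes to the old instance just before switching, so that the lag stays at zero while traffic moves.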
Schema migrations face similar challenges. Because migrations are often non‑idempotent, applying them imperatively can break applications or corrupt data if not carefully orchestrated.
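To make "non‑idempotent" concrete, the sketch below runs the same imperative migration twice. It uses an in‑memory SQLite database as a stand‑in for PostgreSQL so that it runs without a server. PostgreSQL rejects a repeated ADD COLUMN in the same way. The helper `apply_imperatively` is hypothetical.

```python
import sqlite3

# An imperative migration: correct the first time, an error on any retry.
MIGRATION = "ALTER TABLE t1 ADD COLUMN name TEXT"


def apply_imperatively(conn: sqlite3.Connection, statement: str) -> str:
    """Run a migration statement blindly and report what happened."""
    try:
        conn.execute(statement)
        return "applied"
    except sqlite3.OperationalError as exc:
        return f"failed: {exc}"


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE t1 (id INTEGER)")
    print(apply_imperatively(conn, MIGRATION))  # applied
    print(apply_imperatively(conn, MIGRATION))  # failed: the column already exists
```

A CI job or init container that retries after a partial failure runs into exactly this problem. The outcome depends on what has already run, not only on the script itself.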
Kubernetes Operator: the game‑changer for database management
Operators extend Kubernetes with custom resources, defined through Custom Resource Definitions (CRDs), and with controllers that encode operational knowledge. Together they install, configure, upgrade, and monitor complex stateful systems while continuously maintaining the desired state.
Core of every Operator
Operators rely on two Kubernetes primitives:
Custom Resource Definitions (CRDs) that add new API types such as PostgresCluster, KafkaTopic, or AtlasSchema, enabling you to describe resources Kubernetes does not natively understand.
Controllers that watch those custom resources, compare the actual state to the desired state defined in the CRD, and take actions to reconcile any differences.
CRDs and controllers together form a self‑healing loop for tasks that previously required manual scripts or fragile CI jobs.
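The controller half of that pairing can be sketched as a toy, in‑memory model. A declared spec (a stand‑in for a custom resource) is compared with observed state, and the difference is acted on. Running the pass again on a converged state does nothing. `ClusterSpec`, `ObservedState`, and `reconcile` are hypothetical names. Real controllers, including CloudNativePG's (written in Go), talk to the Kubernetes API server and handle far more cases.

```python
from dataclasses import dataclass, field


@dataclass
class ClusterSpec:
    """Desired state, as a user would declare it in a custom resource."""
    instances: int


@dataclass
class ObservedState:
    """Actual state; stands in for what a controller reads from the API server."""
    pods: list = field(default_factory=list)


def _next_pod_name(pods: list, prefix: str = "cluster-example") -> str:
    """Pick the lowest unused ordinal, e.g. cluster-example-2 if it went missing."""
    n = 1
    while f"{prefix}-{n}" in pods:
        n += 1
    return f"{prefix}-{n}"


def reconcile(spec: ClusterSpec, state: ObservedState) -> list:
    """One pass of the loop: compare desired and actual, act on the difference,
    and report the actions taken. A converged state produces no actions."""
    actions = []
    while len(state.pods) < spec.instances:
        name = _next_pod_name(state.pods)
        state.pods.append(name)  # a real controller would create a Pod via the API
        actions.append(f"create {name}")
    while len(state.pods) > spec.instances:
        name = state.pods.pop()  # a real controller would delete via the API
        actions.append(f"delete {name}")
    return actions


if __name__ == "__main__":
    spec = ClusterSpec(instances=3)
    state = ObservedState()
    print(reconcile(spec, state))  # creates cluster-example-1, -2 and -3
    state.pods.remove("cluster-example-2")  # simulate a lost pod
    print(reconcile(spec, state))  # ['create cluster-example-2']
    print(reconcile(spec, state))  # []
```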
CloudNativePG and Atlas: production‑ready database solutions
Using CloudNativePG (a PostgreSQL Operator) and Atlas Operator (a schema‑management Operator), you can manage PostgreSQL clusters and schema changes declaratively through a GitOps‑native workflow.
Step‑by‑step implementation
Step 0: Install CloudNativePG and create a cluster
helm repo add cnpg https://cloudnative-pg.github.io/charts
helm repo update
helm install cnpg cnpg/cloudnative-pg
Save the following as cluster.yaml and apply it:
apiVersion: postgresql.cnpg.io/v1
kind: Cluster
metadata:
  name: cluster-example
spec:
  instances: 1
  imageName: ghcr.io/cloudnative-pg/postgresql:15
  storage:
    storageClass: standard
    size: 1Gi

kubectl apply -f cluster.yaml
This creates a PostgreSQL database named app, a user app, a secret holding that user's credentials (cluster-example-app), and a read‑write service reachable in the cluster as cluster-example-rw.default.
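The pieces listed above (service host, user, database, secret) are the same values that any client, including the AtlasSchema in Step 2, needs to connect. As a sketch, they combine into a standard postgres:// connection URL. `rw_service_host` and `postgres_url` are hypothetical helpers, not part of CloudNativePG, and the password shown is a placeholder for the value stored in the cluster-example-app secret.

```python
from typing import Optional
from urllib.parse import quote, urlencode


def rw_service_host(cluster: str, namespace: str) -> str:
    """In-cluster DNS name of the read-write service, e.g. cluster-example-rw.default."""
    return f"{cluster}-rw.{namespace}"


def postgres_url(user: str, password: str, host: str, port: int,
                 database: str, params: Optional[dict] = None) -> str:
    """Build a postgres:// connection URL, percent-encoding user and password."""
    url = (f"postgres://{quote(user, safe='')}:{quote(password, safe='')}"
           f"@{host}:{port}/{database}")
    if params:
        url += "?" + urlencode(params)
    return url


if __name__ == "__main__":
    host = rw_service_host("cluster-example", "default")
    # "s3cr3t/+" is a placeholder; the real password lives in the cluster-example-app secret.
    print(postgres_url("app", "s3cr3t/+", host, 5432, "app", {"sslmode": "disable"}))
    # postgres://app:s3cr3t%2F%2B@cluster-example-rw.default:5432/app?sslmode=disable
```

Percent‑encoding matters because generated passwords can contain characters such as / or + that would otherwise break URL parsing.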
Step 1: Install Atlas Operator
helm install atlas-operator oci://ghcr.io/ariga/charts/atlas-operator
Verify that the CRDs are registered:
kubectl get crd | grep atlas
Expected output:
atlasmigrations.db.atlasgo.io
atlasschemas.db.atlasgo.io
Step 2: Apply a schema
Define an AtlasSchema resource in atlas-schema.yaml:
apiVersion: db.atlasgo.io/v1alpha1
kind: AtlasSchema
metadata:
  name: atlasschema-pg
spec:
  credentials:
    scheme: postgres
    host: cluster-example-rw.default
    user: app
    passwordFrom:
      secretKeyRef:
        key: password
        name: cluster-example-app
    database: app
    port: 5432
    parameters:
      sslmode: disable
  schema:
    sql: |
      create table t1 (
        id int
      );

Apply the schema:
kubectl apply -f atlas-schema.yaml
Check the reconciliation status:
kubectl get atlasschemas.db.atlasgo.io
Expected output:
NAME             READY   REASON
atlasschema-pg   True    Applied
Step 3: Verify and evolve
Connect to the database:
kubectl exec -ti cluster-example-1 -- psql app
To evolve the schema, edit the sql block (e.g., add a name column) and re‑apply the manifest. Atlas detects the diff and safely applies the migration.
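The core idea behind "detects the diff" is that you declare the target schema and the tool computes the statements needed to reach it from what currently exists. The sketch below shows that idea for plain columns only. It is not Atlas's algorithm: Atlas inspects the live database and plans changes across many more object types and edge cases. The function name `plan` and the dict‑based schema representation are hypothetical.

```python
def plan(table: str, current: dict, desired: dict) -> list:
    """Return the ALTER statements that move `current` columns to `desired`.

    Both arguments map column name -> SQL type. Only column additions,
    type changes and drops are handled in this sketch."""
    stmts = []
    for col, typ in desired.items():
        if col not in current:
            stmts.append(f"ALTER TABLE {table} ADD COLUMN {col} {typ};")
        elif current[col] != typ:
            stmts.append(f"ALTER TABLE {table} ALTER COLUMN {col} TYPE {typ};")
    for col in current:
        if col not in desired:
            stmts.append(f"ALTER TABLE {table} DROP COLUMN {col};")
    return stmts


if __name__ == "__main__":
    current = {"id": "int"}
    desired = {"id": "int", "name": "text"}
    print(plan("t1", current, desired))  # ['ALTER TABLE t1 ADD COLUMN name text;']
    print(plan("t1", desired, desired))  # []: re-applying the same manifest is a no-op
```

Because the plan is computed from the current state each time, re‑applying an unchanged manifest yields an empty plan. This is what makes the declarative approach safe to run repeatedly from a reconciliation loop.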
Key takeaways
The reconciliation loop is the pillar of cloud‑native infrastructure and can be extended to stateful resources via Operators.
Traditional methods for managing stateful systems (init containers, CI jobs, manual scripts) do not fit a declarative, autonomous model.
Operators bring Kubernetes' powerful control plane to databases, allowing you to manage clusters and schemas just like Deployments and Services.
CloudNativePG and Atlas enable an end‑to‑end, GitOps‑friendly database lifecycle—from provisioning to migration—entirely within Kubernetes.
Why this matters
As Kubernetes becomes the control plane for everything, stateful workloads can no longer be the exception. Operators and tools like Atlas make it possible to integrate databases into the same automated, repeatable workflows that power modern, scalable systems.
Cloud Native Technology Community
The Cloud Native Technology Community, part of the CNBPA Cloud Native Technology Practice Alliance, focuses on evangelizing cutting‑edge cloud‑native technologies and practical implementations. It shares in‑depth content, case studies, and event/meetup information on containers, Kubernetes, DevOps, Service Mesh, and other cloud‑native tech, along with updates from the CNBPA alliance.