After Years Using Kubernetes, I Finally Grasped CRDs – Build One from Scratch
The article reveals why most Kubernetes engineers use Custom Resource Definitions without truly understanding them, explains how CRDs act as the language that extends the Kubernetes API, and provides a step‑by‑step walkthrough to create a production‑ready DatabaseCluster CRD, interact with it via kubectl and the Python client, and avoid common pitfalls.
CRDs as the foundation of many Kubernetes tools
Argo CD Application, KEDA ScaledObject, Crossplane resources, and cert‑manager Certificate are all implemented as Custom Resource Definitions (CRDs). Recognizing this reveals that extending Kubernetes is fundamentally about defining new API objects.
Kubernetes as a language
CRDs add new vocabulary to the Kubernetes API. After a CRD is registered, resources such as DatabaseCluster, MLModel or TenantConfig become first‑class objects stored in etcd, queryable with kubectl and protected by RBAC.
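To make "first‑class object" concrete: once a CRD is registered, the API server serves the new kind under a REST path built from the group, version, namespace and plural name. The sketch below only assembles those paths as strings. The helper name resource_path is invented for illustration, and the group and plural values are the ones this article's DatabaseCluster example uses.

```python
def resource_path(group, version, plural, namespace=None, name=None):
    """Build the REST path under which a custom resource is served."""
    parts = ["/apis", group, version]
    if namespace is not None:
        # Namespaced resources live under /namespaces/<namespace>/
        parts += ["namespaces", namespace]
    parts.append(plural)
    if name is not None:
        parts.append(name)
    return "/".join(parts)


collection = resource_path(
    "infra.example.com", "v1alpha1", "databaseclusters", namespace="databases"
)
single = resource_path(
    "infra.example.com", "v1alpha1", "databaseclusters",
    namespace="databases", name="production-postgres",
)
print(collection)
print(single)
```

kubectl and client libraries talk to exactly these kinds of paths, which is why a custom resource behaves like a built‑in one.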
Creating a CRD from scratch – DatabaseCluster
The minimal definition requires three sections:
group: the API group, e.g. infra.example.com
names: plural, singular, kind and optional shortNames
versions: each version declares served (accepts requests) and storage (persisted in etcd)
apiVersion: apiextensions.k8s.io/v1
kind: CustomResourceDefinition
metadata:
  name: databaseclusters.infra.example.com
spec:
  group: infra.example.com
  scope: Namespaced
  names:
    plural: databaseclusters
    singular: databasecluster
    kind: DatabaseCluster
    shortNames:
      - dbc
  versions:
    - name: v1alpha1
      served: true
      storage: true
      schema:
        openAPIV3Schema:
          type: object
          properties:
            spec:
              type: object
              properties: {}
            status:
              type: object
              properties: {}
served makes the version reachable via the API; storage designates the version that is written to etcd, enabling safe multi‑version upgrades.
Scope, served vs storage, and status subresource
CRDs can be Namespaced (each resource lives in a namespace) or Cluster scoped (global). The served flag indicates that the API version accepts requests, while storage marks the version in which objects are persisted; exactly one version must have storage: true. Enabling the status subresource puts status behind its own /status endpoint: updates to the main resource ignore changes to status, so RBAC can reserve status writes for the controller and keep users from overwriting system state.
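The served and storage rule is easy to check mechanically. The following is a small sketch that inspects a versions list shaped like the one in the CRD manifest; check_versions is a made‑up helper, and the v1beta1 entry is a hypothetical second version added only to show a multi‑version layout.

```python
def check_versions(versions):
    """Return a list of problems with a CRD's versions list (empty if none found)."""
    problems = []
    storage = [v["name"] for v in versions if v.get("storage")]
    if len(storage) != 1:
        problems.append(f"expected exactly one storage version, found {len(storage)}")
    if not any(v.get("served") for v in versions):
        problems.append("no version is served, so the API is unreachable")
    return problems


versions = [
    {"name": "v1alpha1", "served": True, "storage": False},  # still reachable by old clients
    {"name": "v1beta1", "served": True, "storage": True},    # hypothetical: what gets persisted
]
print(check_versions(versions))  # prints []
```

The API server enforces the storage rule itself when the CRD is applied; a check like this is only useful earlier, for example in CI.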
CRD = vocabulary, Operator = behavior.
Registering and using the CRD
# Register the CRD
kubectl apply -f database-cluster-crd.yaml
# Verify registration
kubectl get crd databaseclusters.infra.example.com
After registration, create an instance like any native resource:
apiVersion: infra.example.com/v1alpha1
kind: DatabaseCluster
metadata:
  name: production-postgres
  namespace: databases
spec:
  engine: postgres
  replicas: 3
  region: ap-south-1
  storageGB: 100
  version: "16.2"
Custom columns defined in the CRD allow kubectl get dbc -n databases to display concise information.
Interacting with the CRD via the Python client
# pip install kubernetes
from kubernetes import client, config

config.load_kube_config()
custom_api = client.CustomObjectsApi()

GROUP = "infra.example.com"
VERSION = "v1alpha1"
PLURAL = "databaseclusters"
NAMESPACE = "databases"

# List resources
clusters = custom_api.list_namespaced_custom_object(
    group=GROUP, version=VERSION, namespace=NAMESPACE, plural=PLURAL
)
for c in clusters["items"]:
    phase = c.get("status", {}).get("phase", "Unknown")
    print(f"{c['metadata']['name']}: phase={phase}, replicas={c['spec']['replicas']}")

# Create a new resource
new = {
    "apiVersion": f"{GROUP}/{VERSION}",
    "kind": "DatabaseCluster",
    "metadata": {"name": "staging-mysql", "namespace": NAMESPACE},
    "spec": {"engine": "mysql", "replicas": 1, "region": "ap-south-1", "storageGB": 20, "version": "8.0"},
}
custom_api.create_namespaced_custom_object(
    group=GROUP, version=VERSION, namespace=NAMESPACE, plural=PLURAL, body=new
)

# Patch status (normally done by the operator; requires the status subresource)
status_patch = {"status": {"phase": "Provisioning"}}
custom_api.patch_namespaced_custom_object_status(
    group=GROUP, version=VERSION, namespace=NAMESPACE, plural=PLURAL,
    name="staging-mysql", body=status_patch,
)
This low‑level API gives full control before a higher‑level operator framework (e.g., kopf) abstracts the reconciliation loop.
Common pitfalls
A missing or overly permissive schema lets the API accept malformed objects.
Missing status subresource allows users to overwrite system state.
Destructive schema changes without proper migration break existing resources.
A CRD without an operator performs no action.
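The last pitfall is worth making concrete. An operator is, at its core, a loop that compares spec with what actually exists and records the result in status. The function below is a toy version of a single reconcile step under invented assumptions: observed_replicas stands in for whatever a real operator would query from the infrastructure, and the phase names Provisioning and Ready are illustrative.

```python
def reconcile(resource, observed_replicas):
    """Compute the status an operator might write for one DatabaseCluster."""
    desired = resource["spec"]["replicas"]
    if observed_replicas != desired:
        phase = "Provisioning"  # actual state has not converged on spec yet
    else:
        phase = "Ready"
    return {"phase": phase}


cluster = {"spec": {"engine": "postgres", "replicas": 3}}
converging = reconcile(cluster, observed_replicas=1)
converged = reconcile(cluster, observed_replicas=3)
print(converging)
print(converged)
```

Without code of this kind running somewhere, a DatabaseCluster object is just a record in etcd; nothing provisions a database.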
Production‑grade DatabaseCluster CRD
Key enhancements:
Required fields: engine, replicas, region
Validation rules: enums for engine, minimum/maximum for replicas, defaults for storageGB
Status subresource with phase, endpoint, and a conditions array
Additional printer columns for Replicas, Region, Phase, Age
# database-cluster-crd.yaml
apiVersion: apiextensions.k8s.io/v1
kind: CustomResourceDefinition
metadata:
  name: databaseclusters.infra.example.com
  annotations:
    controller-gen.kubebuilder.io/version: v0.14.0
spec:
  group: infra.example.com
  scope: Namespaced
  names:
    plural: databaseclusters
    singular: databasecluster
    kind: DatabaseCluster
    shortNames:
      - dbc
  versions:
    - name: v1alpha1
      served: true
      storage: true
      subresources:
        status: {}
      additionalPrinterColumns:
        - name: Replicas
          type: integer
          jsonPath: .spec.replicas
        - name: Region
          type: string
          jsonPath: .spec.region
        - name: Phase
          type: string
          jsonPath: .status.phase
        - name: Age
          type: date
          jsonPath: .metadata.creationTimestamp
      schema:
        openAPIV3Schema:
          type: object
          properties:
            spec:
              type: object
              required: ["engine", "replicas", "region"]
              properties:
                engine:
                  type: string
                  enum: ["postgres", "mysql", "mariadb"]
                  description: "Database engine to use"
                replicas:
                  type: integer
                  minimum: 1
                  maximum: 9
                  description: "Number of database replicas"
                region:
                  type: string
                  description: "AWS region or datacenter location"
                storageGB:
                  type: integer
                  minimum: 10
                  default: 20
                  description: "Storage size in GB"
                version:
                  type: string
                  description: "Database engine version"
            status:
              type: object
              properties:
                phase:
                  type: string
                  description: "Current lifecycle phase"
                endpoint:
                  type: string
                  description: "Connection endpoint when Ready"
                conditions:
                  type: array
                  items:
                    type: object
                    properties:
                      type:
                        type: string
                      status:
                        type: string
                      lastTransitionTime:
                        type: string
                        format: date-time
                      reason:
                        type: string
                      message:
                        type: string
Why schema matters
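Without a meaningful schema (for example, one that merely sets x-kubernetes-preserve-unknown-fields: true), Kubernetes accepts almost any payload, turning the CRD into a bypass for validation. With a precise schema, the API server validates objects before they are stored and acts as the first line of defense.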
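The API server performs this validation itself; the sketch below merely mirrors the rules from the DatabaseCluster schema (required fields, the engine enum, replicas between 1 and 9, storageGB defaulting to 20 with a minimum of 10) in plain Python, to show what a rejected object looks like. The function name validate_spec and the error wording are invented for this example.

```python
ENGINES = ("postgres", "mysql", "mariadb")
REQUIRED = ("engine", "replicas", "region")


def validate_spec(spec):
    """Return (normalized_spec, errors) following the rules of the CRD schema."""
    errors = []
    spec = dict(spec)                 # work on a copy, leave the caller's dict alone
    spec.setdefault("storageGB", 20)  # schema default
    for field in REQUIRED:
        if field not in spec:
            errors.append(f"spec.{field} is required")
    if "engine" in spec and spec["engine"] not in ENGINES:
        errors.append(f"spec.engine must be one of {list(ENGINES)}")
    replicas = spec.get("replicas")
    if replicas is not None and not (isinstance(replicas, int) and 1 <= replicas <= 9):
        errors.append("spec.replicas must be an integer between 1 and 9")
    storage = spec["storageGB"]
    if not (isinstance(storage, int) and storage >= 10):
        errors.append("spec.storageGB must be an integer >= 10")
    return spec, errors


good, good_errors = validate_spec({"engine": "postgres", "replicas": 3, "region": "ap-south-1"})
_, bad_errors = validate_spec({"engine": "oracle", "replicas": 12})
print(good["storageGB"], good_errors)
print(bad_errors)
```

The second object fails on three counts: region is missing, oracle is not in the enum, and 12 replicas exceeds the maximum.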
Status subresource importance
The status subresource enforces a separation of concerns: users declare the desired state in spec, while the operator writes the actual state to status. This prevents accidental overwrites of system state.
Updates to the main resource cannot change status; only writes to the /status endpoint can, and RBAC should grant those to the operator alone.
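A simplified model may help here. With the status subresource enabled, a write to the main resource ignores any change to status, and a write to the /status endpoint ignores everything except status. The function below simulates that split on plain dictionaries; it is an illustration of the documented behavior rather than anything the API server actually runs, it leaves metadata out entirely, and apply_update is a hypothetical name.

```python
import copy


def apply_update(stored, incoming, endpoint):
    """Simulate which parts of an update survive, depending on the endpoint used."""
    result = copy.deepcopy(stored)
    if endpoint == "main":
        # Write to the resource itself: changes to status are ignored.
        result["spec"] = copy.deepcopy(incoming.get("spec", {}))
    elif endpoint == "status":
        # Write to /status: everything except status is ignored.
        result["status"] = copy.deepcopy(incoming.get("status", {}))
    else:
        raise ValueError(f"unknown endpoint: {endpoint}")
    return result


stored = {"spec": {"replicas": 3}, "status": {"phase": "Ready"}}
user_edit = {"spec": {"replicas": 5}, "status": {"phase": "Tampered"}}
operator_edit = {"spec": {"replicas": 1}, "status": {"phase": "Provisioning"}}

after_user = apply_update(stored, user_edit, "main")
after_operator = apply_update(stored, operator_edit, "status")
print(after_user)
print(after_operator)
```

The user's edit changes replicas but leaves the phase at Ready, while the operator's edit changes the phase and leaves replicas at 3.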
Dashboard‑style output with additional printer columns
Adding additionalPrinterColumns turns kubectl get dbc into a lightweight dashboard, showing columns such as REPLICAS, REGION, PHASE and AGE without a separate UI.
kubectl now displays concise, operator‑enriched information.
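Each printer column is just a name plus a JSONPath evaluated against the object. As a rough sketch of that idea, the code below resolves the simple dotted paths used by the DatabaseCluster columns and builds one table row. It handles only plain dotted paths, not full JSONPath, and it skips the Age column because kubectl renders dates as relative times. The helper names resolve and render_row are invented.

```python
COLUMNS = [
    ("REPLICAS", ".spec.replicas"),
    ("REGION", ".spec.region"),
    ("PHASE", ".status.phase"),
]


def resolve(obj, json_path):
    """Follow a dotted path such as '.spec.replicas'; return '' if any key is missing."""
    current = obj
    for key in json_path.strip(".").split("."):
        if not isinstance(current, dict) or key not in current:
            return ""
        current = current[key]
    return current


def render_row(obj):
    """Build one table row: the object name followed by each column's value."""
    return [obj["metadata"]["name"]] + [str(resolve(obj, path)) for _, path in COLUMNS]


cluster = {
    "metadata": {"name": "production-postgres"},
    "spec": {"replicas": 3, "region": "ap-south-1"},
    "status": {"phase": "Ready"},
}
row = render_row(cluster)
print(["NAME"] + [name for name, _ in COLUMNS])
print(row)
```

Because the Phase column reads from status, it only shows something useful once an operator has written to that field.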
Common errors that can break a system
No meaningful schema – invalid objects are accepted.
No status subresource – users can overwrite operational state.
Destructive schema changes – existing resources become invalid.
Missing operator logic – the CRD does nothing on its own.
These are design issues in the API, not bugs in Kubernetes itself.
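Destructive schema changes in particular can be caught before they ship. The sketch below compares two versions of a spec schema, written as plain dictionaries in the same shape as the CRD's openAPIV3Schema, and reports three kinds of change that can invalidate stored objects: newly required fields, removed or retyped properties, and enum values that disappeared. It is deliberately incomplete (it ignores tightened minimum/maximum bounds and newly added enums, for instance), and breaking_changes is a hypothetical helper.

```python
def breaking_changes(old, new):
    """List schema changes that may invalidate objects written under the old schema."""
    issues = []
    for field in sorted(set(new.get("required", [])) - set(old.get("required", []))):
        issues.append(f"'{field}' is newly required")
    new_props = new.get("properties", {})
    for name, old_prop in old.get("properties", {}).items():
        new_prop = new_props.get(name)
        if new_prop is None:
            issues.append(f"'{name}' was removed")
            continue
        if old_prop.get("type") != new_prop.get("type"):
            issues.append(f"'{name}' changed type")
        old_enum = old_prop.get("enum", [])
        removed = set(old_enum) - set(new_prop.get("enum", old_enum))
        if removed:
            issues.append(f"'{name}' no longer allows {sorted(removed)}")
    return issues


old_schema = {
    "required": ["engine"],
    "properties": {
        "engine": {"type": "string", "enum": ["postgres", "mysql", "mariadb"]},
        "replicas": {"type": "integer"},
    },
}
new_schema = {
    "required": ["engine", "region"],
    "properties": {
        "engine": {"type": "string", "enum": ["postgres", "mysql"]},
        "replicas": {"type": "string"},
        "region": {"type": "string"},
    },
}
issues = breaking_changes(old_schema, new_schema)
for issue in issues:
    print(issue)
```

Any non‑empty result suggests the change belongs in a new API version with a migration path rather than an in‑place edit.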
Design exercise – DatabaseBackup CRD
Example user manifest:
apiVersion: backup.example.com/v1
kind: DatabaseBackup
metadata:
  name: my-postgres-daily
spec:
  database: "postgres-prod"
  schedule: "0 2 * * *"  # Daily at 2 AM
  retention: 7           # Keep 7 backups
Key design considerations derived from the article:
Desired state: the user wants a backup to run on the specified schedule and retain the defined number of snapshots.
Actual state: the operator must track the number of existing backups, the timestamp of the last backup, its success/failure status, and any other relevant metadata.
Optional fields that may be added: storage location (S3, GCS, local), backup type (full or incremental), notification settings for success/failure, and any additional metadata required by the operator.
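One way to start the exercise is to write down what the operator would compute from these two states. The sketch below takes a list of backup timestamps, applies the retention count from spec, and assembles a possible status block. Everything here is an assumption made for illustration: the function names, and the status field names backupCount, lastBackupTime and lastBackupSucceeded, are not defined anywhere in the article.

```python
from datetime import datetime


def backups_to_delete(backup_times, retention):
    """Return the timestamps that fall outside the newest `retention` backups."""
    newest_first = sorted(backup_times, reverse=True)
    return newest_first[retention:]


def build_status(backup_times, last_succeeded):
    """Assemble a possible status block; all field names are illustrative."""
    return {
        "backupCount": len(backup_times),
        "lastBackupTime": max(backup_times).isoformat() if backup_times else None,
        "lastBackupSucceeded": last_succeeded,
    }


# Nine daily backups taken at 02:00, matching the "0 2 * * *" schedule.
times = [datetime(2024, 1, day, 2, 0) for day in range(1, 10)]
stale = backups_to_delete(times, retention=7)
status = build_status(times, last_succeeded=True)
print([t.day for t in stale])  # prints [2, 1]
print(status)
```

With retention set to 7, the two oldest backups are the ones marked for deletion; the status block is what users would then see when they inspect the DatabaseBackup object.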
