Solving K8s Stateful App Storage Pain: Production-Ready Longhorn + MySQL StatefulSet
This article dissects the challenges of running MySQL as a stateful workload on Kubernetes, explains why storage, consistency, and fail‑over are the real pain points, and provides a production‑grade solution that combines Longhorn distributed block storage with a carefully engineered MySQL 8.0 StatefulSet, complete with YAML manifests, performance tuning, backup strategies, and disaster‑recovery playbooks.
1. Why Stateful Applications on Kubernetes Are Hard
Kubernetes excels at orchestrating stateless services, but databases, message queues, and search engines care about data placement, durability, node‑failure recovery, primary‑replica topology, scheduling constraints, backup/restore, and latency under high concurrency.
Common pitfalls when moving MySQL to K8s include focusing only on the StatefulSet and ignoring the underlying PersistentVolume failure model, testing only that data survives pod deletion, neglecting node‑failure + volume‑rebuild + replica recovery, and treating storage HA as DB HA.
2. Scope of the Solution
The article covers six dimensions: architecture principles, engineering implementation, production‑grade YAML, performance testing, failure‑scenario drills, and evolution roadmap.
2.1 Where the Solution Fits
Small‑to‑medium production environments that are fully containerized and want a unified delivery and ops entry point.
Workloads with RPO/RTO requirements that do not yet need a full‑blown database operator.
Teams familiar with K8s/Helm rather than traditional VM ops.
Organizations that want MySQL, monitoring, and backup under GitOps.
2.2 Where It Does Not Fit
Ultra‑low‑latency write workloads.
Very large OLTP primary nodes with sustained high IOPS.
Scenarios requiring automatic failover, automated topology rebuild, automated backup orchestration, or PITR.
Cross‑region disaster‑recovery as a hard requirement.
2.3 Core Capabilities of Longhorn
Volume replica redundancy.
Volume attach/detach and rebuild.
Snapshots and backup.
Node‑failure volume recovery.
Longhorn does not provide automatic primary election, topology repair, connection‑string switching, or SQL‑level consistency fixes.
3. Real‑World Business Background
In a typical e‑commerce order system, components include Nginx Ingress, Spring Boot order service (20‑50 pods), Redis, Kafka, and MySQL for order data, transaction deduction, and payment status. The database workload is write‑heavy, with many transactions, random I/O, and latency directly affecting order response time.
4. Overall Architecture – Not Just “Put MySQL In”
4.1 Target Architecture
Kubernetes multi‑node cluster.
Longhorn as distributed block storage.
MySQL 8.0 single primary + two replicas.
GTID replication.
StatefulSet for stable network identity and per‑pod PVC.
Separate Services for write (primary) and read (replicas).
Backup jobs that push snapshots or logical backups to object storage.
PDB, anti‑affinity, topology spread, probes, and resource quotas for stability.
4.2 Design Principles
Stable pod identity: each MySQL pod keeps a stable hostname and its own volume.
Storage-scheduler coordination: volume placement, pod scheduling, and Longhorn replica placement must be aligned.
Layered disaster recovery: Longhorn guarantees volume resilience; MySQL handles primary-replica consistency.
Node-failure first: design for recovery after a node crash, not just pod loss.
High-concurrency first: provision CPU, memory, and I/O headroom.
Future-proof: use a StatefulSet today; migrate to an operator later.
5. What Longhorn Actually Solves
Longhorn provides distributed block storage with replica redundancy, automatic volume attach/detach, snapshots, backups, and node‑failure recovery. It does not handle MySQL‑level HA such as automatic primary election or GTID topology repair.
5.1 Longhorn Architecture
API / control layer: a Longhorn manager runs on each node, watches CRDs, schedules volumes, manages replicas, and orchestrates snapshots/backups.
CSI integration layer: Kubernetes talks to Longhorn via CSI for volume create, attach, expand, and detach.
Data plane: each volume has an Engine that processes I/O and forwards writes to all healthy replicas.
Replica layer: each replica stores block data and snapshots on the host filesystem.
5.2 Why “One Volume, One Engine” Matters
Each volume runs its own Engine and Replica processes, so a failure in one volume does not affect others. This enables independent rebuild, snapshot, and migration, and provides clear performance and fault isolation.
5.3 Write Path Through Longhorn
MySQL writes page, redo log, and binlog.
The container’s filesystem forwards the write to the attached block device.
The block device is provided by Longhorn CSI.
The Engine receives the write and synchronously forwards it to all healthy replicas.
After all replicas acknowledge, the Engine returns success to MySQL.
This path shows that transaction latency depends on network jitter, node load, disk performance, and replica health. Parameters such as innodb_flush_log_at_trx_commit and sync_binlog interact with the number of Longhorn replicas.
5.4 Node‑Failure Recovery
When a node crashes, Longhorn selects healthy replicas, reassembles the volume, and re‑attaches it to a new pod. MySQL starts with the existing data directory, but automatic primary‑switching still requires application‑level handling.
6. What StatefulSet Can and Cannot Do
Provides stable pod names (e.g., mysql-0), stable network IDs via a Headless Service, and per‑pod PVC.
Does NOT automatically select a primary, switch primary, repair replication topology, manage backups, perform PITR, or handle connection‑string changes.
Thus, StatefulSet is the runtime shell; the database governance layer must be built on top.
7. Common Demo Pitfalls
All pods share the same config, no primary/replica distinction.
Root password stored in plain‑text ConfigMap.
No PDB, anti‑affinity, or resource limits.
Probes too aggressive, causing premature pod kills.
Missing init scripts for data directory creation.
No GTID, making failover complex.
Only YAML is provided without operational or recovery steps.
8. Production‑Ready Selection – Single Primary + Two Replicas + GTID + Longhorn 3‑Replica
8.1 Topology Choice
mysql-0 as primary; mysql-1 and mysql-2 as replicas.
Write traffic goes to Service mysql-primary.
Read traffic goes to Service mysql-replicas.
Reasons: clear architecture, easy to understand failure model, cost‑controlled, and smooth migration to an operator later.
8.2 Replication Method
MySQL 8.0 GTID replication is used instead of file‑position replication because GTID simplifies primary‑replica rebuild, automatic alignment during failover, and reduces binlog management complexity.
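In server-configuration terms, GTID replication maps to a small set of options. A sketch of a conf.d fragment (these are standard MySQL 8.0 option names; the specific values are illustrative, not the article's exact configuration):

```ini
[mysqld]
# GTID-based replication
gtid_mode=ON
enforce_gtid_consistency=ON
# Binary logging is on by default in 8.0; shown explicitly for clarity
log_bin=mysql-bin
binlog_format=ROW
# Helps a replica resume cleanly after a crash
relay_log_recovery=ON
```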
8.3 Replica Count
Longhorn defaults to three replicas, which balances fault tolerance and resource usage. One replica offers no HA; two replicas improve resilience but still risk data loss if both fail; three replicas cover most single‑node failures. More replicas increase write amplification, network traffic, and disk usage.
9. Pre‑Deployment Requirements
9.1 Node Requirements
At least three worker nodes, each with local SSD or cloud disk.
Stable low‑latency network.
Proper time synchronization.
Kernel parameters and disk scheduler tuned for database workloads.
9.2 Mandatory Dependencies
Longhorn relies on iSCSI. Install open-iscsi (Ubuntu/Debian) or iscsi-initiator-utils (CentOS/Rocky/RHEL) and enable the iscsid service on every node.
# Ubuntu / Debian
sudo apt-get update
sudo apt-get install -y open-iscsi nfs-common
sudo systemctl enable iscsid --now
# CentOS / Rocky / RHEL
sudo yum install -y iscsi-initiator-utils nfs-utils
sudo systemctl enable iscsid --now
9.3 Node Labels
kubectl label node worker-1 storage-node=true mysql-node=true disk=ssd
kubectl label node worker-2 storage-node=true mysql-node=true disk=ssd
kubectl label node worker-3 storage-node=true mysql-node=true disk=ssd
9.4 Resource Baseline per MySQL Instance
CPU request: 2, limit: 4
Memory request: 4Gi, limit: 8Gi
Volume size: ≥100Gi; high-concurrency order systems typically need more, along with bigger nodes (e.g., 8C/16G, NVMe, dedicated storage nodes).
10. Longhorn Production Installation
helm repo add longhorn https://charts.longhorn.io
helm repo update
helm upgrade --install longhorn longhorn/longhorn \
--namespace longhorn-system \
--create-namespace \
--set defaultSettings.defaultReplicaCount=3 \
--set persistence.defaultClass=true \
--set persistence.defaultClassReplicaCount=3 \
--set defaultSettings.replicaAutoBalance=best-effort \
--set defaultSettings.storageOverProvisioningPercentage=150 \
--set defaultSettings.storageMinimalAvailablePercentage=15
Key Parameters
defaultReplicaCount: number of volume replicas (3 for production).
replicaAutoBalance: best-effort keeps replicas evenly spread across nodes.
storageMinimalAvailablePercentage: keep at least 10-20% free space on each disk.
dataLocality: best-effort places an active replica on the same node as the consuming pod.
11. StorageClass for MySQL
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: longhorn-mysql-prod
provisioner: driver.longhorn.io
allowVolumeExpansion: true
reclaimPolicy: Retain
volumeBindingMode: WaitForFirstConsumer
parameters:
  numberOfReplicas: "3"
  staleReplicaTimeout: "30"
  fsType: "ext4"
  dataLocality: "best-effort"
  replicaAutoBalance: "best-effort"
  diskSelector: "ssd"
  nodeSelector: "storage-node"
  unmapMarkSnapChainRemoved: "ignored"
Why these settings?
Retain prevents accidental data loss if a PVC is deleted.
WaitForFirstConsumer delays volume binding until the pod is scheduled, avoiding bad node placement.
dataLocality: best-effort reduces cross-node I/O latency.
allowVolumeExpansion enables online growth, but expansion should be coordinated with a maintenance window.
12. Service Design – Stable DNS and Read/Write Separation
# Headless Service for stable DNS
apiVersion: v1
kind: Service
metadata:
  name: mysql
  namespace: order-system
spec:
  clusterIP: None
  selector:
    app: mysql
  ports:
  - name: mysql
    port: 3306
    targetPort: 3306
---
# Write entry (primary)
apiVersion: v1
kind: Service
metadata:
  name: mysql-primary
  namespace: order-system
spec:
  selector:
    app: mysql
    statefulset.kubernetes.io/pod-name: mysql-0
  ports:
  - name: mysql
    port: 3306
    targetPort: 3306
---
# Read-only entry (replicas)
apiVersion: v1
kind: Service
metadata:
  name: mysql-replicas
  namespace: order-system
spec:
  selector:
    app: mysql
    role: replica
  ports:
  - name: mysql
    port: 3306
    targetPort: 3306
Note: the primary Service is hard-wired to mysql-0; automatic failover would require an external controller or operator.
13. PodDisruptionBudget
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: mysql-pdb
  namespace: order-system
spec:
  minAvailable: 2
  selector:
    matchLabels:
      app: mysql
This guarantees that at least two MySQL pods stay up during node maintenance or cluster upgrades.
14. Full Production‑Grade StatefulSet
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: mysql
  namespace: order-system
spec:
  serviceName: mysql
  replicas: 3
  podManagementPolicy: OrderedReady
  updateStrategy:
    type: RollingUpdate
  selector:
    matchLabels:
      app: mysql
  template:
    metadata:
      labels:
        app: mysql
    spec:
      serviceAccountName: default
      terminationGracePeriodSeconds: 120
      nodeSelector:
        mysql-node: "true"
      affinity:
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
          - labelSelector:
              matchLabels:
                app: mysql
            topologyKey: kubernetes.io/hostname
      topologySpreadConstraints:
      - maxSkew: 1
        topologyKey: kubernetes.io/hostname
        whenUnsatisfiable: DoNotSchedule
        labelSelector:
          matchLabels:
            app: mysql
      initContainers:
      - name: init-config
        image: mysql:8.0.36
        command: ["/bin/bash", "/mnt/config/init-mysql.sh"]
        env:
        - name: HOSTNAME
          valueFrom:
            fieldRef:
              fieldPath: metadata.name
        volumeMounts:
        - name: conf
          mountPath: /mnt/conf.d
        - name: config
          mountPath: /mnt/config
      - name: init-data-dir
        image: busybox:1.36
        command: ["/bin/sh", "-c", "set -e; mkdir -p /var/lib/mysql; chown -R 999:999 /var/lib/mysql"]
        volumeMounts:
        - name: data
          mountPath: /var/lib/mysql
      containers:
      - name: mysql
        image: mysql:8.0.36
        imagePullPolicy: IfNotPresent
        ports:
        - containerPort: 3306
          name: mysql
        env:
        - name: MYSQL_ROOT_PASSWORD
          valueFrom:
            secretKeyRef:
              name: mysql-secret
              key: MYSQL_ROOT_PASSWORD
        - name: MYSQL_REPL_PASSWORD
          valueFrom:
            secretKeyRef:
              name: mysql-secret
              key: MYSQL_REPL_PASSWORD
        startupProbe:
          exec:
            command: ["/bin/bash", "-c", "mysqladmin ping -h 127.0.0.1 -uroot -p\"${MYSQL_ROOT_PASSWORD}\""]
          failureThreshold: 60
          periodSeconds: 5
          timeoutSeconds: 3
        readinessProbe:
          exec:
            command: ["/etc/mysql/config/readiness-check.sh"]
          failureThreshold: 3
          periodSeconds: 10
          timeoutSeconds: 5
        livenessProbe:
          exec:
            command: ["/bin/bash", "-c", "mysqladmin ping -h 127.0.0.1 -uroot -p\"${MYSQL_ROOT_PASSWORD}\""]
          initialDelaySeconds: 60
          periodSeconds: 15
          timeoutSeconds: 5
          failureThreshold: 5
        lifecycle:
          preStop:
            exec:
              command: ["/bin/bash", "-c", "mysqladmin -uroot -p\"${MYSQL_ROOT_PASSWORD}\" shutdown || true; sleep 10"]
        resources:
          requests:
            cpu: "2"
            memory: 4Gi
          limits:
            cpu: "4"
            memory: 8Gi
        volumeMounts:
        - name: data
          mountPath: /var/lib/mysql
        - name: conf
          mountPath: /etc/mysql/conf.d
        - name: config
          mountPath: /etc/mysql/config
      volumes:
      - name: conf
        emptyDir: {}
      - name: config
        configMap:
          name: mysql-config
          defaultMode: 0755
  volumeClaimTemplates:
  - metadata:
      name: data
    spec:
      accessModes: ["ReadWriteOnce"]
      storageClassName: longhorn-mysql-prod
      resources:
        requests:
          storage: 100Gi
Why each field matters
terminationGracePeriodSeconds: 120 gives MySQL enough time to flush logs and shut down cleanly.
podAntiAffinity prevents multiple MySQL pods from landing on the same node, reducing correlated failures.
topologySpreadConstraints spreads pods across hosts for better resilience.
startupProbe tolerates MySQL's long cold start or crash recovery; without it, the kubelet could kill the pod prematurely.
preStop runs mysqladmin shutdown to avoid unclean shutdowns.
OrderedReady ensures the primary (mysql-0) becomes ready before the replicas start.
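The StatefulSet mounts a mysql-config ConfigMap whose scripts (init-mysql.sh, readiness-check.sh) are referenced but not shown. A minimal sketch of what it might contain; the server-id scheme and read-only handling are assumptions, not the article's exact scripts:

```yaml
apiVersion: v1
kind: ConfigMap
metadata:
  name: mysql-config
  namespace: order-system
data:
  init-mysql.sh: |
    #!/bin/bash
    # Derive a unique server-id from the pod ordinal (mysql-0 -> 100, mysql-1 -> 101, ...).
    set -e
    ordinal="${HOSTNAME##*-}"
    cat > /mnt/conf.d/server-id.cnf <<EOF
    [mysqld]
    server-id=$((100 + ordinal))
    EOF
    # Replicas run read-only; mysql-0 stays writable as the primary.
    if [ "${ordinal}" != "0" ]; then
      printf 'read_only=ON\nsuper_read_only=ON\n' >> /mnt/conf.d/server-id.cnf
    fi
  readiness-check.sh: |
    #!/bin/bash
    # Ready only when the server answers a real query, not just a TCP ping.
    exec mysql -h 127.0.0.1 -uroot -p"${MYSQL_ROOT_PASSWORD}" -e "SELECT 1" >/dev/null
```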
15. Role Tagging for Read‑Only Service
StatefulSet does not automatically label pods as primary or replica. Two approaches:
External controller or GitOps pipeline adds role=primary / role=replica labels after pod creation.
Split into two StatefulSets: one with 1 replica (primary) and another with 2 replicas (read‑only). The latter simplifies Service selectors.
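For the first approach, the label commands a controller or pipeline would run once the pods exist might look like the following sketch (printed for review rather than executed; the role values match the mysql-replicas Service selector):

```shell
# Build the labeling commands for the fixed 3-pod topology:
# mysql-0 is the primary, mysql-1/mysql-2 are replicas.
NAMESPACE="order-system"
CMDS="kubectl label pod mysql-0 -n ${NAMESPACE} role=primary --overwrite"
for i in 1 2; do
  CMDS="${CMDS}
kubectl label pod mysql-${i} -n ${NAMESPACE} role=replica --overwrite"
done
printf '%s\n' "$CMDS"
```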
16. Replication Initialization Job
MySQL containers start without replication configured. A one‑off Job performs the following steps:
Wait for the primary (mysql-0) to become reachable.
Create replication user repl with the password from the secret.
For each replica, wait until the pod is reachable, then run
STOP REPLICA; RESET REPLICA ALL; CHANGE REPLICATION SOURCE TO …; START REPLICA; using GTID auto-position.
apiVersion: batch/v1
kind: Job
metadata:
  name: mysql-init-replication
  namespace: order-system
spec:
  ttlSecondsAfterFinished: 600
  template:
    spec:
      restartPolicy: OnFailure
      containers:
      - name: init-replication
        image: mysql:8.0.36
        env:
        - name: MYSQL_ROOT_PASSWORD
          valueFrom:
            secretKeyRef:
              name: mysql-secret
              key: MYSQL_ROOT_PASSWORD
        - name: MYSQL_REPL_PASSWORD
          valueFrom:
            secretKeyRef:
              name: mysql-secret
              key: MYSQL_REPL_PASSWORD
        command:
        - /bin/bash
        - -c
        - |
          set -euo pipefail
          echo "waiting primary..."
          until mysql -h mysql-0.mysql.order-system.svc.cluster.local -uroot -p"${MYSQL_ROOT_PASSWORD}" -e "SELECT 1"; do sleep 5; done
          mysql -h mysql-0.mysql.order-system.svc.cluster.local -uroot -p"${MYSQL_ROOT_PASSWORD}" -e "
            CREATE USER IF NOT EXISTS 'repl'@'%' IDENTIFIED BY '${MYSQL_REPL_PASSWORD}';
            GRANT REPLICATION SLAVE, REPLICATION CLIENT ON *.* TO 'repl'@'%';
            FLUSH PRIVILEGES;"
          for i in 1 2; do
            host="mysql-${i}.mysql.order-system.svc.cluster.local"
            echo "waiting replica ${host}..."
            until mysql -h "${host}" -uroot -p"${MYSQL_ROOT_PASSWORD}" -e "SELECT 1"; do sleep 5; done
            mysql -h "${host}" -uroot -p"${MYSQL_ROOT_PASSWORD}" -e "
              STOP REPLICA;
              RESET REPLICA ALL;
              CHANGE REPLICATION SOURCE TO
                SOURCE_HOST='mysql-0.mysql.order-system.svc.cluster.local',
                SOURCE_PORT=3306,
                SOURCE_USER='repl',
                SOURCE_PASSWORD='${MYSQL_REPL_PASSWORD}',
                SOURCE_AUTO_POSITION=1,
                GET_SOURCE_PUBLIC_KEY=1;
              START REPLICA;"
          done
This Job is suitable for first-time deployment, test-environment rebuilds, or manual recovery. In large-scale production, a dedicated operator or automation script should manage the replication lifecycle.
17. Backup Strategy – Longhorn Snapshots + Logical Dumps
Longhorn snapshots provide fast volume‑level rollback but do not replace logical backups needed for point‑in‑time recovery, table‑level restores, or cross‑version migrations.
17.1 Logical Backup CronJob
apiVersion: batch/v1
kind: CronJob
metadata:
  name: mysql-logical-backup
  namespace: order-system
spec:
  schedule: "0 2 * * *"
  successfulJobsHistoryLimit: 3
  failedJobsHistoryLimit: 3
  jobTemplate:
    spec:
      template:
        spec:
          restartPolicy: OnFailure
          containers:
          - name: backup
            image: mysql:8.0.36
            env:
            - name: MYSQL_ROOT_PASSWORD
              valueFrom:
                secretKeyRef:
                  name: mysql-secret
                  key: MYSQL_ROOT_PASSWORD
            command:
            - /bin/bash
            - -c
            - |
              set -euo pipefail
              ts=$(date +%F-%H%M%S)
              mkdir -p /backup
              mysqldump -h mysql-primary.order-system.svc.cluster.local -uroot -p"${MYSQL_ROOT_PASSWORD}" \
                --single-transaction \
                --set-gtid-purged=OFF \
                --routines --events --triggers \
                --all-databases > /backup/mysql-${ts}.sql
            volumeMounts:
            - name: backup
              mountPath: /backup
          volumes:
          - name: backup
            persistentVolumeClaim:
              claimName: mysql-backup-pvc
17.2 Recovery Scenarios
Single‑table or accidental delete: restore to a temporary instance and copy data back.
Single‑instance failure: let Longhorn reattach the volume or rebuild the replica.
Primary disaster: create a new volume, restore from logical backup, apply binlog/PITR to catch up, then switch application traffic.
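Daily dumps accumulate on the backup PVC, so a retention sweep belongs alongside the backup job. A self-contained sketch (the 14-day window and file-name pattern are assumptions; the demo setup simulates the backup mount with a scratch directory):

```shell
# Demo setup: a scratch directory standing in for the backup PVC mount,
# with one stale dump (20 days old) and one fresh dump.
BACKUP_DIR=$(mktemp -d)
RETENTION_DAYS=14
touch -d '20 days ago' "$BACKUP_DIR/mysql-old.sql" 2>/dev/null \
  || touch -t 202001010000 "$BACKUP_DIR/mysql-old.sql"   # BSD touch fallback
touch "$BACKUP_DIR/mysql-new.sql"

# The actual retention sweep: delete dumps older than RETENTION_DAYS.
find "$BACKUP_DIR" -name 'mysql-*.sql' -type f -mtime +"$RETENTION_DAYS" -print -delete

# The fresh dump survives; the stale one is gone.
ls "$BACKUP_DIR"
```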
18. High‑Concurrency Production Tuning
18.1 Where Bottlenecks Appear
Application layer: connection‑pool saturation, slow SQL, hotspot row contention.
Database layer: insufficient buffer pool, redo/fsync pressure, lock waits.
Storage layer: Longhorn replica sync latency, replica rebuild bandwidth.
Scheduler layer: pod drift causing data locality loss.
Network layer: replication traffic sharing bandwidth with business traffic.
18.2 MySQL Parameter Choices
Strong consistency (e.g., payment, inventory)
innodb_flush_log_at_trx_commit=1
sync_binlog=1
Pros: guarantees durability per transaction. Cons: higher fsync overhead; storage latency adds directly to request latency.
Performance‑first (e.g., analytics, tolerant of minor loss)
innodb_flush_log_at_trx_commit=2
sync_binlog=100
Pros: reduces fsync pressure and improves throughput. Cons: possible loss of the last few seconds of data on a crash.
18.3 Connection‑Pool Coordination
Limit per‑pod connection count; avoid 100 connections per pod when 50 pods would create 5,000 connections.
Use HikariCP maximumPoolSize tuned to the DB capacity.
Route read traffic to mysql-replicas service.
Employ async queues (Kafka, Redis) to smooth write spikes.
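The arithmetic behind the pool-sizing warning above can be checked with a quick capacity calculation (the pod count, pool size, max_connections value, and 80% headroom factor are all illustrative assumptions):

```shell
# Total app-side connections must stay below max_connections with headroom.
PODS=50
POOL_SIZE=20           # assumed HikariCP maximumPoolSize per pod
MAX_CONNECTIONS=2000   # assumed MySQL max_connections
TOTAL=$((PODS * POOL_SIZE))
HEADROOM=$((MAX_CONNECTIONS * 80 / 100))   # leave 20% for admin/replication
if [ "$TOTAL" -le "$HEADROOM" ]; then
  echo "OK: ${TOTAL} connections fit within 80% of max_connections (${HEADROOM})"
else
  echo "RISK: ${TOTAL} connections exceed 80% of max_connections (${HEADROOM})"
fi
```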
18.4 Storage & Network Isolation
Separate traffic for Longhorn replica sync, MySQL replication, application queries, and backups. Use dedicated storage nodes, high‑bandwidth NICs, and node‑level QoS or rate‑limiting to prevent replica rebuild storms from choking business traffic.
18.5 Replica Rebuild Storm Mitigation
When a node experiences network jitter, Longhorn may mark a replica degraded and start rebuilding, consuming bandwidth and increasing MySQL latency. Guard against this by limiting rebuild speed, setting alerts on Degraded state, and avoiding changes during peak load.
19. Evolution Path – From Single Primary to Operator
Horizontal read scaling : increase replica count in the StatefulSet or create a separate replica StatefulSet.
Storage expansion : enlarge PVC during low‑traffic windows, then resize the filesystem inside the pod.
Operator migration : adopt Percona or Oracle MySQL Operator for automated primary election, backup orchestration, and topology repair.
Read‑write proxy : introduce ProxySQL or MySQL Router for smarter routing and failover.
Sharding or cloud‑managed DB when single‑instance limits are reached.
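For the storage-expansion step above, online PVC growth is a single patch when the StorageClass sets allowVolumeExpansion: true. A sketch that builds the command and prints it for review rather than executing it (the PVC name follows the <claimTemplate>-<statefulset>-<ordinal> pattern; 200Gi is an assumed target):

```shell
# Construct an online PVC expansion command for the primary's volume.
PVC="data-mysql-0"
NAMESPACE="order-system"
NEW_SIZE="200Gi"
PATCH=$(printf '{"spec":{"resources":{"requests":{"storage":"%s"}}}}' "$NEW_SIZE")
echo "kubectl patch pvc ${PVC} -n ${NAMESPACE} --type merge -p '${PATCH}'"
```

After applying the patch, the filesystem inside the pod grows once Longhorn finishes expanding the volume.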
20. Pre‑Launch Acceptance Checklist
20.1 Storage Checks
All nodes have open-iscsi running.
Longhorn nodes and disks are schedulable.
StorageClass parameters verified.
Replica count matches design.
Test create/attach/delete of a volume.
20.2 Database Checks
Primary-replica replication works (SHOW REPLICA STATUS\G is clean).
Slow‑query log enabled.
CPU/memory/buffer‑pool tuned per benchmark.
Application connection strings separated for read/write.
20.3 HA Checks
PDB active.
Anti‑affinity enforced.
Pods spread across nodes.
Node‑failure and pod‑deletion drills passed.
20.4 Backup & Restore Checks
Logical backup job succeeds.
Longhorn snapshot succeeds.
At least one full‑restore drill performed.
Recovery time meets RTO.
20.5 Monitoring & Alerting
MySQL metrics: QPS, TPS, active connections, slow queries, buffer‑pool hit rate, lock waits, Seconds_Behind_Source.
Longhorn metrics: volume health, replica rebuild count, IOPS, latency, node disk free space, snapshot count.
Kubernetes metrics: pod restarts, node NotReady, PVC status changes, scheduling failures, eviction events.
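An alert on Seconds_Behind_Source can be as simple as parsing the replica status output and comparing it to a threshold. A sketch with a mocked status line (the 30s threshold is an assumption; in a real probe STATUS would come from `mysql -e 'SHOW REPLICA STATUS\G'` against a replica):

```shell
# Mock one line of SHOW REPLICA STATUS output for illustration.
STATUS="        Seconds_Behind_Source: 3"
THRESHOLD=30
# Extract the numeric lag value after the "key: value" separator.
LAG=$(printf '%s\n' "$STATUS" | awk -F': ' '/Seconds_Behind_Source/ {print $2}')
if [ "$LAG" -gt "$THRESHOLD" ]; then
  echo "ALERT: replication lag ${LAG}s exceeds ${THRESHOLD}s"
else
  echo "replication lag ok (${LAG}s)"
fi
```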
21. Production‑Level Drills
21.1 Write Verification
kubectl exec -it -n order-system mysql-0 -- mysql -uroot -p
Create a test database and table, insert a row, then verify it on a replica.
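Those steps might look like the following (illustrative SQL; the drill_test names are placeholders):

```sql
-- On the primary (mysql-0):
CREATE DATABASE IF NOT EXISTS drill_test;
CREATE TABLE IF NOT EXISTS drill_test.ping (id INT PRIMARY KEY, note VARCHAR(64));
INSERT INTO drill_test.ping VALUES (1, 'written on primary');

-- On a replica (via the mysql-replicas Service), the row should appear:
SELECT * FROM drill_test.ping;
```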
21.2 Pod Deletion Drill
kubectl delete pod mysql-1 -n order-system
kubectl get pod -n order-system -w
Confirm the PVC persists, the pod name stays the same, Longhorn re-attaches the volume, and replication resumes.
21.3 Node Failure Drift
NODE=$(kubectl get pod mysql-0 -n order-system -o jsonpath='{.spec.nodeName}')
kubectl cordon "$NODE"
kubectl delete pod mysql-0 -n order-system
Observe pod migration, volume re-attach, and the impact on write latency.
21.4 Disk‑Full Scenario
Simulate by filling the Longhorn replica disk; expect volume to enter Degraded, snapshots to fail, and alerts to fire. Prevent by monitoring disk free space and snapshot count.
21.5 Replication Breakage
If Replica_IO_Running=No or Replica_SQL_Running=No, verify primary health, network, credentials, GTID set, and re‑run CHANGE REPLICATION SOURCE TO … if needed.
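A quick diagnostic sequence on the affected replica might look like (a sketch; the final re-point statement is elided as in the text above):

```sql
-- Inspect thread state and Last_IO_Error / Last_SQL_Error:
SHOW REPLICA STATUS\G
-- Compare executed GTID sets against the primary's:
SELECT @@global.gtid_executed;
-- After fixing the root cause, re-point and restart replication:
-- STOP REPLICA; CHANGE REPLICATION SOURCE TO …; START REPLICA;
```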
22. Common Pitfalls
Confusing storage HA with database HA – Longhorn protects the volume, not the primary election.
Missing iSCSI on nodes – pods stay in ContainerCreating.
Wrong PVC reclaimPolicy (Delete) – accidental data loss.
Over‑strict probes – MySQL gets killed during normal cold start.
Using the same Service for read, write, and replication – makes troubleshooting impossible.
Placing pod and its replicas on the same node – a single node failure kills both.
Leaving snapshots unchecked – long snapshot chains degrade performance.
Expanding storage or rebuilding replicas during traffic peaks – causes latency spikes.
Backing up without practicing restores – false sense of safety.
No evolution plan – staying with manual scripts when operator automation becomes necessary.
23. Decision Guidance for Tech Leaders
If the business can tolerate minimal data loss and occasional write latency spikes, you may relax innodb_flush_log_at_trx_commit and sync_binlog for higher throughput.
If the team is strong in K8s but not yet in database operators, Longhorn + StatefulSet gives a solid intermediate step.
When automatic failover, standardized backup, and large‑scale governance become required, migrate to a MySQL Operator.
Always perform capacity and performance benchmarks before production rollout.
24. Final Takeaway
StatefulSet gives MySQL a stable identity; Longhorn gives the volume resilience. The real production‑grade solution emerges from a systematic design that addresses consistency, performance, recovery, and future evolution.
25. Appendix – Practical Checklist
Use a dedicated StorageClass for MySQL (do not reuse default).
Set reclaimPolicy: Retain for all DB PVCs.
Enable anti‑affinity and topology spread for every DB pod.
Configure startupProbe and generous terminationGracePeriodSeconds.
Standardize on GTID replication.
Separate read/write traffic via distinct Services.
Govern Longhorn snapshot count.
Run logical backups alongside volume snapshots.
Monitor MySQL, Longhorn, and K8s layers together.
Quarterly node‑failure and recovery drills.
Upgrade to an operator when manual primary/replica handling becomes frequent.
In summary, the hardest part of running stateful workloads on Kubernetes is not getting the YAML to apply, but anticipating and engineering for failure models across the application, database, storage, and orchestration layers.
Ray's Galactic Tech
Practice together, never alone. We cover programming languages, development tools, learning methods, and pitfall notes. We simplify complex topics, guiding you from beginner to advanced. Weekly practical content—let's grow together!