Mastering Velero: Backup and Restore OpenShift Clusters to Alibaba Cloud
This guide explains how to install Velero, configure Alibaba Cloud OSS credentials, create backup storage locations, perform manual and scheduled backups, restore clusters, use hooks, debug operations, expose Prometheus metrics, and handle disaster recovery for OpenShift environments.
Velero backs up and restores Kubernetes/OpenShift clusters by storing cluster data in object storage, which reduces the impact of a disaster and simplifies disaster recovery (DR). It works with any supported object storage, such as Alibaba Cloud OSS or a local MinIO instance.
Installation
Install Velero on both the source OpenShift cluster running on OpenStack (for backup) and the target OpenShift cluster on Alibaba Cloud (for restore). Use the official Alibaba Cloud OSS plugin, and create a bucket and a RAM user with the following permissions:
{
  "Version": "1",
  "Statement": [{
    "Action": [
      "ecs:DescribeSnapshots",
      "ecs:CreateSnapshot",
      "ecs:DeleteSnapshot",
      "ecs:DescribeDisks",
      "ecs:CreateDisk",
      "ecs:Addtags",
      "oss:PutObject",
      "oss:GetObject",
      "oss:DeleteObject",
      "oss:GetBucket",
      "oss:ListObjects"
    ],
    "Resource": ["*"],
    "Effect": "Allow"
  }]
}
Update install/credentials-velero with your Alibaba Cloud access keys and OSS endpoint:
ALIBABA_CLOUD_ACCESS_KEY_ID=<ALIBABA_CLOUD_ACCESS_KEY_ID>
ALIBABA_CLOUD_ACCESS_KEY_SECRET=<ALIBABA_CLOUD_ACCESS_KEY_SECRET>
ALIBABA_CLOUD_OSS_ENDPOINT=<ALIBABA_CLOUD_OSS_ENDPOINT>
Set environment variables for the bucket and region (e.g., oss-cn-beijing.aliyuncs.com → region beijing).
BUCKET=<YOUR_BUCKET>
REGION=<YOUR_REGION>
Create the Velero namespace, secret, and CRDs, then edit install/01-velero.yaml to replace <BUCKET> and <REGION> placeholders:
kubectl create namespace velero
kubectl create secret generic cloud-credentials --namespace velero --from-file cloud=install/credentials-velero
kubectl apply -f install/00-crds.yaml
sed -i "s#<BUCKET>#$BUCKET#" install/01-velero.yaml
sed -i "s#<REGION>#$REGION#" install/01-velero.yaml
kubectl apply -f install/01-velero.yaml
The image used in install/01-velero.yaml is synced from gcr.io/heptio-images/velero:latest. You can replace it with the latest official image.
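After applying the manifests, it can help to confirm that the deployment rolled out and the CRDs were registered. The following is a minimal sketch, assuming the deployment is named velero (as in the debugging commands in this guide) and the CRDs carry the component=velero label (as in the uninstall commands). The function name and the 120-second timeout are illustrative choices, not part of the plugin's install procedure.

```shell
#!/usr/bin/env bash
# Sketch: check that Velero came up after `kubectl apply`.
# Assumptions: deployment "velero" in namespace "velero", CRDs labeled component=velero.
verify_velero_install() {
  local ns="${1:-velero}"
  # Wait for the Velero deployment to finish rolling out (timeout is arbitrary).
  kubectl -n "$ns" rollout status deployment/velero --timeout=120s || return 1
  # List the Velero CRDs; an empty list suggests 00-crds.yaml was not applied.
  kubectl get crds -l component=velero || return 1
  # Show the configured backup storage location(s), including bucket and region.
  kubectl -n "$ns" get backupstoragelocations -o yaml
}
```

Calling verify_velero_install with no argument checks the velero namespace; a non-zero return suggests looking at the deployment logs next.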
Using Velero
Run velero --help to see all commands. Common operations include:
Manual backup of all resources:
velero backup create ${BACKUP_NAME}
Backup specific namespaces:
velero backup create ${BACKUP_NAME} --include-namespaces ns1,ns2
Exclude namespaces:
velero backup create ${BACKUP_NAME} --exclude-namespaces ns1,ns2
Backup selected resource types:
velero backup create ${BACKUP_NAME} --include-resources pod,secret
Set backup TTL (default 30 days; the 2160h below is 90 days):
velero backup create ${BACKUP_NAME} --ttl 2160h
Delete backup:
velero backup delete ${BACKUP_NAME} --confirm
Scheduled backups
Create a cron‑like schedule (e.g., every 6 hours):
velero create schedule ${SCHEDULE_NAME} --schedule="0 */6 * * *"
# or using @every notation
velero create schedule ${SCHEDULE_NAME} --schedule="@every 6h"
Restore
velero restore create ${RESTORE_NAME} --from-backup ${BACKUP_NAME}
# Restore from the most recent backup created by a schedule
velero restore create --from-schedule ${SCHEDULE_NAME}
# Restore selected resources
velero restore create --from-backup backup-2 --include-resources pod,secret
Hooks
Velero supports pre and post hooks, configured through pod annotations or the backup spec, with settings for the container, command, on‑error behavior, and timeout. Use them with caution: by default, a failing hook causes the backup to fail.
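As an illustration of the annotation route, the sketch below attaches a pre hook and a post hook to pods matched by a label selector. The annotation keys are Velero's backup hook annotations; the sidecar container name fsfreeze, the mount path /var/lib/data, and the function name are hypothetical and would need to match your own workload.

```shell
#!/usr/bin/env bash
# Sketch: attach pre/post backup hooks to pods via annotations.
# Hypothetical names: sidecar container "fsfreeze", mount path /var/lib/data.
annotate_backup_hooks() {
  local ns="$1" selector="$2"
  # on-error=Continue lets the backup proceed if the pre hook fails;
  # timeout=60s bounds how long Velero waits for the pre hook command.
  kubectl -n "$ns" annotate pod -l "$selector" --overwrite \
    pre.hook.backup.velero.io/container=fsfreeze \
    pre.hook.backup.velero.io/command='["/sbin/fsfreeze", "--freeze", "/var/lib/data"]' \
    pre.hook.backup.velero.io/on-error=Continue \
    pre.hook.backup.velero.io/timeout=60s \
    post.hook.backup.velero.io/container=fsfreeze \
    post.hook.backup.velero.io/command='["/sbin/fsfreeze", "--unfreeze", "/var/lib/data"]'
}
```

Setting on-error to Continue is a deliberate trade-off here: a hook failure no longer fails the backup, but the data may then be captured without the filesystem being frozen.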
Debugging
velero backup describe <backupName> --detail
velero backup logs <backupName>
velero restore describe <restoreName> --detail
velero restore logs <restoreName>
kubectl logs deployment/velero -n velero
Download backup
velero backup download ${BACKUP_NAME}
Uninstall Velero
kubectl delete namespace/velero clusterrolebinding/velero
kubectl delete crds -l component=velero
Expose Prometheus metrics
Create a Service to expose metrics on port 8085:
apiVersion: v1
kind: Service
metadata:
  labels:
    velero: exporter
  name: velero-exporter
  namespace: velero
spec:
  ports:
  - name: https
    port: 8085
    protocol: TCP
    targetPort: 8085
  selector:
    component: velero
Disaster Recovery
Create a Velero schedule on cluster1 for periodic backups.
When a disaster occurs, patch the BackupStorageLocation on cluster2 to ReadOnly so that no backups are created or deleted in it during the restore, run a restore from the latest backup, then set the location back to ReadWrite.
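The sequence on cluster2 can be sketched as follows. This is a minimal sketch, assuming the velero namespace used throughout this guide; the function names are illustrative, and the storage location and backup names are supplied by the caller rather than taken from the original procedure.

```shell
#!/usr/bin/env bash
# Sketch of the DR sequence on cluster2.
# Assumptions: Velero runs in namespace "velero"; names are passed in by the caller.
set_bsl_access_mode() {
  local location="$1" mode="$2"   # mode: ReadOnly or ReadWrite
  kubectl patch backupstoragelocation "$location" \
    --namespace velero --type merge \
    --patch "{\"spec\":{\"accessMode\":\"$mode\"}}"
}

dr_restore() {
  local location="$1" backup="$2"
  set_bsl_access_mode "$location" ReadOnly || return 1
  # --wait blocks until the restore finishes; keep its exit status so the
  # location is switched back to ReadWrite whether or not the restore succeeded.
  local rc=0
  velero restore create --from-backup "$backup" --wait || rc=$?
  set_bsl_access_mode "$location" ReadWrite || return 1
  return "$rc"
}
```

A call would look like dr_restore followed by the storage location name and the name of the latest backup, which can be found with velero backup get.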
Tips
OpenStack cannot directly access Alibaba OSS; use an ECS reverse proxy and add hostAliases in install/01-velero.yaml for internal endpoint resolution.
Both clusters must run compatible Kubernetes versions.
Specify backup contents to avoid failures with unsupported resources (e.g., PV/PVC).
Velero's backup sync controller checks every 30 seconds, but it only resyncs backups from object storage when more than 1 minute has passed since the last sync.
GC removes stuck or expired backups after 60 minutes.
Velero integrates Restic for file-level backup of PV data; Restic does not support Alibaba OSS, so use native snapshots or another storage backend.
If OSS is unavailable, deploy MinIO as an S3‑compatible object store.
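For the hostAliases tip, an alternative to editing install/01-velero.yaml is to patch the running deployment. The sketch below assumes the deployment is named velero in the velero namespace; the function name is illustrative, and the proxy IP and OSS hostname are placeholders supplied by the caller. Note that a merge patch replaces any hostAliases list already present on the pod template.

```shell
#!/usr/bin/env bash
# Sketch: point the OSS hostname at a reverse proxy via hostAliases.
# Both arguments are caller-supplied placeholders (proxy IP, OSS endpoint host).
add_oss_host_alias() {
  local proxy_ip="$1" oss_host="$2"
  # Merge-patch the pod template; this triggers a rollout of the Velero pod.
  kubectl -n velero patch deployment velero --type merge --patch \
    "{\"spec\":{\"template\":{\"spec\":{\"hostAliases\":[{\"ip\":\"$proxy_ip\",\"hostnames\":[\"$oss_host\"]}]}}}}"
}
```

The first argument would be the address of the ECS reverse proxy and the second the OSS endpoint hostname configured in install/credentials-velero.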
MaGe Linux Operations
