Step-by-Step Guide to Building a Distributed Rook/Ceph Storage Cluster on Kubernetes
This tutorial walks you through preparing three identical VMs, installing required packages, configuring Rook and Ceph versions, deploying the storage cluster on a Kubernetes 1.20 environment, exposing the Ceph dashboard, and cleaning up the installation, complete with command examples and troubleshooting tips.
Environment Preparation
Three identical virtual machines are required. Each VM must have:
4 CPU cores, 8 GB RAM
CentOS 7 installed
Two disks: vda (40 GB) for the OS and vdb (20 GB) left unformatted for Ceph OSDs
Kubernetes version 1.20.0 (installed with kubeadm)
Docker 20.10.7
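You can confirm that every node matches the kubelet and Docker versions listed above by comparing them with the node info the API server reports. The following is a minimal sketch. It assumes kubectl access from the master, and the helper name check_node_versions is hypothetical. It reads the output of a custom-columns query and flags any node whose versions differ:

```shell
#!/usr/bin/env bash
# Hypothetical helper: report nodes whose kubelet or container runtime version
# differs from the expected values. Reads "NAME KUBELET RUNTIME" rows on stdin
# and exits non-zero if any node mismatches or no rows were read at all.
check_node_versions() {
  local want_kubelet="$1" want_runtime="$2"
  awk -v k="$want_kubelet" -v r="$want_runtime" '
    BEGIN { n = 0; bad = 0 }
    NF >= 3 {
      n++
      if ($2 != k || $3 != r) {
        printf "mismatch: %s kubelet=%s runtime=%s\n", $1, $2, $3
        bad = 1
      }
    }
    END { exit (n == 0 || bad) }
  '
}

# Usage (status.nodeInfo fields come from the Node API object):
# kubectl get nodes --no-headers \
#   -o custom-columns=NAME:.metadata.name,KUBELET:.status.nodeInfo.kubeletVersion,RUNTIME:.status.nodeInfo.containerRuntimeVersion \
#   | check_node_versions v1.20.0 docker://20.10.7
```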
Verify the raw disk with:
lsblk -f
In the output, the disk that has an empty FSTYPE column and no partitions beneath it (e.g., vdb) will be used as the raw device for Ceph OSDs.
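Reading the lsblk output by eye is fine for three VMs, but the check can also be scripted. A script must skip disks that have children: vda itself also shows an empty FSTYPE, because its filesystems live on its partitions. The following sketch assumes lsblk supports the PKNAME (parent kernel name) column; the function name find_raw_disks is hypothetical:

```shell
#!/usr/bin/env bash
# Hypothetical helper: print whole disks that carry no filesystem signature
# and have no child devices (partitions, LVM volumes).
# Expects `lsblk -P -o NAME,TYPE,FSTYPE,PKNAME` output on stdin.
find_raw_disks() {
  awk '
    {
      split("", f)                         # reset the per-line KEY -> value map
      for (i = 1; i <= NF; i++) {
        eq = index($i, "=")
        val = substr($i, eq + 1)
        gsub(/"/, "", val)                 # strip the surrounding quotes
        f[substr($i, 1, eq - 1)] = val
      }
      if (f["PKNAME"] != "") parent[f["PKNAME"]] = 1   # this device has a parent
      if (f["TYPE"] == "disk" && f["FSTYPE"] == "") cand[++n] = f["NAME"]
    }
    END { for (i = 1; i <= n; i++) if (!(cand[i] in parent)) print cand[i] }
  '
}

# Usage on each VM:
# lsblk -P -o NAME,TYPE,FSTYPE,PKNAME | find_raw_disks
```

On the VMs described above, this should print vdb only.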
Rook Overview
Rook is a Kubernetes Operator that automates the deployment, configuration, provisioning, monitoring, and upgrade of storage back‑ends such as Ceph, Cassandra, and NFS. It does not provide storage itself; it orchestrates existing storage systems to become self‑managing, self‑scaling, and self‑healing services.
Installation and Deployment
Prerequisites
This guide uses Rook v1.6.3 with Ceph Octopus v15.2.11. Newer Rook releases can be used, but their example manifests may live under different paths. Ensure the Kubernetes cluster is up and running.
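"Up and running" can be checked from the node list before anything is deployed. The following is a minimal sketch; the helper name all_nodes_ready is hypothetical. It treats only an exact Ready status as healthy, so cordoned nodes (Ready,SchedulingDisabled) are reported too, because Deployment-managed pods such as OSDs cannot be scheduled onto them:

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed only if at least MIN_NODES nodes were listed and
# every one reports STATUS exactly "Ready". Offending nodes are printed.
# Reads `kubectl get nodes --no-headers` output (NAME STATUS ROLES AGE VERSION).
all_nodes_ready() {
  awk -v min="${1:-1}" '
    BEGIN { n = 0; bad = 0 }
    NF >= 2 {
      n++
      if ($2 != "Ready") { print "not ready: " $1 " (" $2 ")"; bad = 1 }
    }
    END { exit (n < min || bad) }
  '
}

# Usage for the three-node cluster in this guide:
# kubectl get nodes --no-headers | all_nodes_ready 3
```

kubectl also offers a built-in blocking alternative: `kubectl wait --for=condition=Ready node --all --timeout=120s`.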
Install LVM2 and load the RBD kernel module
# Run on each of the three nodes
yum install -y lvm2
modprobe rbd
# Persist the module load across reboots
cat > /etc/sysconfig/modules/rbd.modules <<'EOF'
modprobe rbd
EOF
chmod 755 /etc/sysconfig/modules/rbd.modules
# Confirm the module is loaded
lsmod | grep rbd
Clone the Rook repository
git clone --single-branch --branch v1.6.3 https://github.com/rook/rook.git
Adjust the operator configuration
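Navigate to the Ceph example directory (rook/cluster/examples/kubernetes/ceph) and edit operator.yaml. Replace the default GCR (k8s.gcr.io) image references, which may be unreachable from some networks, with a reachable mirror (e.g., an Alibaba Cloud registry). In Rook v1.6 these keys are commented out by default, so uncomment them as you update the following values: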
ROOK_CSI_REGISTRAR_IMAGE: "registry.cn-beijing.aliyuncs.com/dotbalo/csi-node-driver-registrar:v2.0.1"
ROOK_CSI_RESIZER_IMAGE: "registry.cn-beijing.aliyuncs.com/dotbalo/csi-resizer:v1.0.1"
ROOK_CSI_PROVISIONER_IMAGE: "registry.cn-beijing.aliyuncs.com/dotbalo/csi-provisioner:v2.0.4"
ROOK_CSI_SNAPSHOTTER_IMAGE: "registry.cn-beijing.aliyuncs.com/dotbalo/csi-snapshotter:v4.0.0"
ROOK_CSI_ATTACHER_IMAGE: "registry.cn-beijing.aliyuncs.com/dotbalo/csi-attacher:v3.0.2"
In the same file, enable the discovery daemon by setting ROOK_ENABLE_DISCOVERY_DAEMON: "true".
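Editing these keys by hand is error-prone, especially when they start out commented. The following sketch uncomments and sets a single key in place while keeping its indentation. The helper name set_operator_key is hypothetical. It assumes each key appears on one line in the form `KEY: value`. Values containing `|`, `&` or `\` would need escaping before being passed to sed:

```shell
#!/usr/bin/env bash
# Hypothetical helper: set KEY to "VALUE" in FILE (default operator.yaml),
# uncommenting a "# KEY: ..." line if needed and preserving indentation.
set_operator_key() {
  local key="$1" value="$2" file="${3:-operator.yaml}"
  # \1 keeps the leading whitespace; the optional "#" and spaces are dropped.
  sed -E "s|^([[:space:]]*)#?[[:space:]]*${key}:.*$|\1${key}: \"${value}\"|" "$file" > "$file.tmp" &&
    mv "$file.tmp" "$file"
}

# Usage (inside rook/cluster/examples/kubernetes/ceph):
# set_operator_key ROOK_CSI_REGISTRAR_IMAGE "registry.cn-beijing.aliyuncs.com/dotbalo/csi-node-driver-registrar:v2.0.1"
# set_operator_key ROOK_ENABLE_DISCOVERY_DAEMON true
# grep -nE 'ROOK_CSI_.*_IMAGE|ROOK_ENABLE_DISCOVERY_DAEMON' operator.yaml   # review the result
```

If a key occurs more than once in the file, sed rewrites every occurrence, so the final grep is worth running.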
Deploy the Rook operator and CRDs
cd rook/cluster/examples/kubernetes/ceph
kubectl create -f crds.yaml -f common.yaml -f operator.yaml
Wait until all pods in the rook-ceph namespace report Running.
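You can script this wait instead of re-running kubectl get pods by hand. The following sketch uses three hypothetical helpers: pods_settled, retry and rook_pods_settled. It counts a pod as settled when it is Completed (as the OSD prepare pods created later will be), or when it is Running with every container ready:

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed only if at least one pod row was read and every
# pod is Completed, or Running with all containers ready (READY column "n/n").
# Reads `kubectl get pods --no-headers` output (NAME READY STATUS RESTARTS AGE).
pods_settled() {
  awk '
    BEGIN { n = 0; bad = 0 }
    NF >= 3 {
      n++
      if ($3 == "Completed") next
      split($2, ready, "/")
      if ($3 != "Running" || ready[1] != ready[2]) bad = 1
    }
    END { exit (n == 0 || bad) }
  '
}

# Run "$@" up to ATTEMPTS times, sleeping DELAY seconds between failed tries.
retry() {
  local attempts="$1" delay="$2" i=1
  shift 2
  while [ "$i" -le "$attempts" ]; do
    "$@" && return 0
    if [ "$i" -lt "$attempts" ]; then sleep "$delay"; fi
    i=$((i + 1))
  done
  return 1
}

# Query the rook-ceph namespace and check the result.
rook_pods_settled() {
  kubectl -n rook-ceph get pods --no-headers | pods_settled
}

# Usage: poll every 10 s for up to about 10 minutes.
# retry 60 10 rook_pods_settled && echo "rook-ceph pods are up"
```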
Create the CephCluster custom resource
kubectl create -f cluster.yaml
Creation has succeeded once the OSD pods (names beginning with rook-ceph-osd-0, rook-ceph-osd-1, and rook-ceph-osd-2, followed by a generated suffix) reach the Running state.
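The OSD pod names carry a generated suffix, and the short-lived rook-ceph-osd-prepare-* pods share the same prefix. Any name match therefore has to exclude the prepare pods. The following sketch uses two hypothetical helpers: count_running_osds and wait_for_osds. It matches on pod names rather than labels:

```shell
#!/usr/bin/env bash
# Hypothetical helper: count OSD daemon pods in Running state. The pattern
# requires a numeric OSD id after "rook-ceph-osd-", which excludes the
# rook-ceph-osd-prepare-* job pods. Reads `kubectl get pods --no-headers` output.
count_running_osds() {
  awk '$1 ~ /^rook-ceph-osd-[0-9]+-/ && $3 == "Running" { n++ } END { print n + 0 }'
}

# Poll every 10 s until at least WANT OSDs run, giving up after ATTEMPTS polls.
wait_for_osds() {
  local want="${1:-3}" attempts="${2:-60}" i=1
  while [ "$i" -le "$attempts" ]; do
    if [ "$(kubectl -n rook-ceph get pods --no-headers | count_running_osds)" -ge "$want" ]; then
      return 0
    fi
    sleep 10
    i=$((i + 1))
  done
  return 1
}

# Usage: expect one OSD per node (three in this guide).
# wait_for_osds 3 60 && echo "all OSDs running"
```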
Deploy the Ceph toolbox
kubectl create -f toolbox.yaml -n rook-ceph
Enter the toolbox pod to run Ceph CLI commands:
kubectl -n rook-ceph exec -it deploy/rook-ceph-tools -- bash
ceph status
Expose the Ceph Dashboard
The default dashboard service is ClusterIP. Apply the external NodePort service definition to make it reachable from outside the cluster:
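kubectl apply -f dashboard-external-https.yaml
Retrieve the allocated NodePort (e.g., 32529) with kubectl -n rook-ceph get svc rook-ceph-mgr-dashboard-external-https, then open https://<node-ip>:32529 in a browser. Any node's IP works for a NodePort service. The default username is admin. Obtain the password with: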
kubectl -n rook-ceph get secret rook-ceph-dashboard-password \
  -o jsonpath="{['data']['password']}" | base64 --decode && echo
Cleanup and Data Removal
Delete the CephCluster resource
kubectl -n rook-ceph delete cephcluster rook-ceph
Remove operator resources
kubectl delete -f operator.yaml
kubectl delete -f common.yaml
kubectl delete -f crds.yaml
Delete host-side data
Rook stores cluster data under /var/lib/rook (the dataDirHostPath set in cluster.yaml). Remove this directory on every node to avoid conflicts on a subsequent deployment:
rm -rf /var/lib/rook
Wipe the raw disk
Before recreating OSDs, the underlying disk must be clean. The following script shows a typical wipe procedure for /dev/vdb. Run it on every node that hosted an OSD:
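#!/usr/bin/env bash
# Adjust DISK if your raw device differs
DISK="/dev/vdb"
# Zap the GPT and MBR partition structures so the disk looks brand new
sgdisk --zap-all "$DISK"
# Overwrite the start of the disk to remove leftover Ceph/LVM signatures
dd if=/dev/zero of="$DISK" bs=1M count=100 oflag=direct,dsync
# On SSDs, blkdiscard can replace dd; it fails on devices without discard support
blkdiscard "$DISK" || true
# Remove device-mapper entries left by ceph-volume, which otherwise lock the disk
ls /dev/mapper/ceph-* 2>/dev/null | xargs -r -I% -- dmsetup remove %
# Remove leftover ceph-<UUID> entries in /dev and /dev/mapper
rm -rf /dev/ceph-*
rm -rf /dev/mapper/ceph--*
# Ask the kernel to re-read the partition table (partprobe is in the parted package)
partprobe "$DISK"
FAQ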
Stuck namespace deletion
If the rook-ceph namespace remains in Terminating, clear its finalizers:
NAMESPACE=rook-ceph
kubectl proxy &
PROXY_PID=$!
sleep 2   # give the proxy time to start listening on 127.0.0.1:8001
kubectl get namespace $NAMESPACE -o json | \
  jq '.spec = {"finalizers":[]}' >temp.json
curl -k -H "Content-Type: application/json" -X PUT \
  --data-binary @temp.json \
  127.0.0.1:8001/api/v1/namespaces/$NAMESPACE/finalize
kill $PROXY_PID
CephCluster not deleting after namespace removal
Edit the CephCluster resource and remove the finalizers field:
kubectl edit cephcluster.ceph.rook.io -n rook-ceph rook-ceph
# Delete the finalizers entry under metadata and save.
Dashboard shows HEALTH_WARN
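If ceph status reports "mons are allowing insecure global_id reclaim" (CVE-2021-20288), disable insecure global ID reclaim from inside the toolbox pod. Do this only after all Ceph clients have been updated, because unpatched clients will no longer be able to reconnect: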
ceph config set mon auth_allow_insecure_global_id_reclaim false
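For scripted checks, you can run the Ceph CLI in the toolbox without opening an interactive shell. The following sketch uses two hypothetical helpers, ceph_tool and health_level, and a hypothetical ROOK_NAMESPACE variable that defaults to rook-ceph. It reads the first word of `ceph health` output, which is the overall health level:

```shell
#!/usr/bin/env bash
# Hypothetical helper: run a Ceph CLI command inside the toolbox deployment
# (no -it needed for non-interactive use).
ceph_tool() {
  kubectl -n "${ROOK_NAMESPACE:-rook-ceph}" exec deploy/rook-ceph-tools -- ceph "$@"
}

# Print the overall level (HEALTH_OK, HEALTH_WARN or HEALTH_ERR) from `ceph health` output.
health_level() {
  awk 'NR == 1 { print $1 }'
}

# Usage:
# ceph_tool config set mon auth_allow_insecure_global_id_reclaim false
# ceph_tool health | health_level   # HEALTH_OK if no other warnings remain
```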
Cloud Native Technology Community
The Cloud Native Technology Community, part of the CNBPA Cloud Native Technology Practice Alliance, focuses on evangelizing cutting‑edge cloud‑native technologies and practical implementations. It shares in‑depth content, case studies, and event/meetup information on containers, Kubernetes, DevOps, Service Mesh, and other cloud‑native tech, along with updates from the CNBPA alliance.