Step-by-Step Guide to Build a Distributed Rook/Ceph Storage Cluster on Kubernetes

This tutorial walks you through preparing three identical VMs, installing required packages, configuring Rook and Ceph versions, deploying the storage cluster on a Kubernetes 1.20 environment, exposing the Ceph dashboard, and cleaning up the installation, complete with command examples and troubleshooting tips.

Cloud Native Technology Community

Dec 8, 2021

Environment Preparation

Three identical virtual machines are required. Each VM must have:

4 CPU cores, 8 GB RAM

CentOS 7 installed

Two disks: vda (40 GB) for the OS and vdb (20 GB) left unformatted for Ceph OSDs

Kubernetes version 1.20.0 (installed with kubeadm)

Docker 20.10.7

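Verify the raw disk with:

lsblk -f

In the output, a disk whose FSTYPE column is empty and that has no partitions beneath it (e.g., vdb) is the raw device Ceph will use for OSDs. The OS disk (vda) can also show an empty FSTYPE on its own row, but its partitions carry filesystems.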
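The raw-disk check can also be scripted. The following is a minimal sketch, not something from the Rook tooling: the function name raw_disks is hypothetical, and it assumes the installed lsblk supports key="value" output (-P) and the PKNAME column.

```shell
# raw_disks: hypothetical helper, not part of Rook or Ceph.
# Reads `lsblk -P -o NAME,TYPE,FSTYPE,PKNAME` output on stdin and prints
# every disk that has no filesystem signature and no child devices.
raw_disks() {
  awk -F'"' '
    $8 != ""                 { parent[$8] = 1 }   # row has a parent, so that parent is in use
    $4 == "disk" && $6 == "" { cand[++n] = $2 }   # disk row with an empty FSTYPE
    END { for (i = 1; i <= n; i++) if (!(cand[i] in parent)) print cand[i] }
  '
}

# Usage on a node:
#   lsblk -P -o NAME,TYPE,FSTYPE,PKNAME | raw_disks
```

On the VM layout described above, this should list only vdb, since vda is the parent of the OS partitions.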

Rook Overview

Rook is a Kubernetes Operator that automates the deployment, configuration, provisioning, monitoring, and upgrade of storage back‑ends such as Ceph, Cassandra, and NFS. It does not provide storage itself; it orchestrates existing storage systems to become self‑managing, self‑scaling, and self‑healing services.

Installation and Deployment

Prerequisites

This guide uses Rook v1.6.3 with Ceph Octopus v15.2.11. Newer Rook releases should also work, but file paths and image tags may differ from the ones shown here. Ensure the Kubernetes cluster is up and running.

Install LVM2 and load the RBD kernel module

# Run as root on every node
yum install -y lvm2
modprobe rbd
# Persist the module load across reboots
cat > /etc/sysconfig/modules/rbd.modules <<'EOF'
#!/bin/bash
modprobe rbd
EOF
chmod 755 /etc/sysconfig/modules/rbd.modules
lsmod | grep rbd

Clone the Rook repository

git clone --single-branch --branch v1.6.3 https://github.com/rook/rook.git

Adjust the operator configuration

Navigate to the Ceph example directory (rook/cluster/examples/kubernetes/ceph) and edit operator.yaml to replace the default GCR image references with a reachable mirror (e.g., an Alibaba Cloud registry). Update the following keys:

ROOK_CSI_REGISTRAR_IMAGE: "registry.cn-beijing.aliyuncs.com/dotbalo/csi-node-driver-registrar:v2.0.1"
ROOK_CSI_RESIZER_IMAGE: "registry.cn-beijing.aliyuncs.com/dotbalo/csi-resizer:v1.0.1"
ROOK_CSI_PROVISIONER_IMAGE: "registry.cn-beijing.aliyuncs.com/dotbalo/csi-provisioner:v2.0.4"
ROOK_CSI_SNAPSHOTTER_IMAGE: "registry.cn-beijing.aliyuncs.com/dotbalo/csi-snapshotter:v4.0.0"
ROOK_CSI_ATTACHER_IMAGE: "registry.cn-beijing.aliyuncs.com/dotbalo/csi-attacher:v3.0.2"

Enable the discovery daemon by setting ROOK_ENABLE_DISCOVERY_DAEMON=true in the same file.

Deploy the Rook operator and CRDs

cd rook/cluster/examples/kubernetes/ceph
kubectl create -f crds.yaml -f common.yaml -f operator.yaml

Wait until all pods in the rook-ceph namespace report Running.
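One way to check this without reading the pod list by eye is to filter the kubectl output. The sketch below uses a hypothetical helper name, pods_not_ready, and assumes the default column layout of kubectl get pods --no-headers (NAME, READY, STATUS, ...).

```shell
# pods_not_ready: hypothetical helper. Reads `kubectl get pods --no-headers`
# output on stdin and prints each pod whose STATUS is neither Running nor
# Completed, together with that status. No output means nothing is pending.
pods_not_ready() {
  awk '$3 != "Running" && $3 != "Completed" { print $1 " (" $3 ")" }'
}

# Usage:
#   kubectl -n rook-ceph get pods --no-headers | pods_not_ready
```

Completed is treated as healthy because one-shot pods, such as the OSD prepare jobs that appear after the cluster is created, end in that state.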

Create the CephCluster custom resource

kubectl create -f cluster.yaml

Successful creation is indicated by the OSD pods rook-ceph-osd-0, rook-ceph-osd-1, and rook-ceph-osd-2 reaching the Running state.

Deploy the Ceph toolbox

kubectl create -f toolbox.yaml -n rook-ceph

Enter the toolbox pod to run Ceph CLI commands:

kubectl -n rook-ceph exec -it deploy/rook-ceph-tools -- bash
# Inside the toolbox shell:
ceph status

Expose the Ceph Dashboard

The default dashboard service is ClusterIP. Apply the external NodePort service definition to make it reachable from outside the cluster:

kubectl apply -f dashboard-external-https.yaml

Retrieve the allocated NodePort (e.g., 32529) and access the dashboard at https://<master‑ip>:32529. The default username is admin. Obtain the password with:

kubectl -n rook-ceph get secret rook-ceph-dashboard-password \
  -o jsonpath="{['data']['password']}" | base64 --decode && echo
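To find the allocated NodePort mentioned above, list the external service and read its PORT(S) column, which has the form 8443:32529/TCP. The sketch below extracts the port from that string. The function name node_port is hypothetical, and the service name in the usage comment is the one I would expect dashboard-external-https.yaml to create, so confirm it with kubectl -n rook-ceph get svc.

```shell
# node_port: hypothetical helper. Takes a PORT(S) value such as
# "8443:32529/TCP" and prints the NodePort part (32529).
node_port() {
  local ports=$1
  ports=${ports#*:}               # drop the service port and the colon
  printf '%s\n' "${ports%%/*}"    # drop the protocol suffix
}

# Usage (service name assumed, verify it in your cluster):
#   kubectl -n rook-ceph get svc rook-ceph-mgr-dashboard-external-https
#   node_port "8443:32529/TCP"
```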

Cleanup and Data Removal

Delete the CephCluster resource

kubectl -n rook-ceph delete cephcluster rook-ceph

Remove operator resources

kubectl delete -f operator.yaml
kubectl delete -f common.yaml
kubectl delete -f crds.yaml

Delete host‑side data

Rook stores cluster data under /var/lib/rook. Remove this directory to avoid conflicts on a subsequent deployment:

rm -rf /var/lib/rook

Wipe the raw disk

Before recreating OSDs, the underlying disk must be clean. The following script demonstrates a typical erasure procedure for /dev/vdb:

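#!/usr/bin/env bash
# Run as root on each node. This destroys all data on $DISK.
DISK="/dev/vdb"
# Zap the partition tables (GPT and MBR) so the disk looks unused
sgdisk --zap-all "$DISK"
# Overwrite the start of the disk (sufficient for HDDs)
dd if=/dev/zero of="$DISK" bs=1M count=100 oflag=direct,dsync
# For SSDs and other devices that support discard; may fail on devices that do not
blkdiscard "$DISK"
# Remove device-mapper entries left behind by ceph-volume, which would otherwise lock the disk
ls /dev/mapper/ceph-* | xargs -I% -- dmsetup remove %
# Remove leftover ceph-<UUID> entries under /dev and /dev/mapper
rm -rf /dev/ceph-*
rm -rf /dev/mapper/ceph--*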
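Because the wipe is irreversible, it may be worth guarding it with a quick check that the target disk is not mounted. The following is a sketch under stated assumptions: disk_mounted is a hypothetical helper, it only looks at direct mounts of the disk or its partitions in /proc/mounts, and it does not detect LVM or device-mapper volumes layered on top of the disk.

```shell
# disk_mounted: hypothetical safety check, not part of Rook.
# Returns 0 if DISK or one of its partitions (e.g. /dev/vdb1) appears as a
# mount source in the mounts file, and 1 otherwise.
# Limitation: volumes stacked on the disk (LVM, dm-crypt) are not detected.
disk_mounted() {
  local disk=$1 mounts=${2:-/proc/mounts}
  awk -v d="$disk" '
    $1 ~ ("^" d "p?[0-9]*$") { found = 1 }
    END { exit !found }
  ' "$mounts"
}

# Usage before wiping:
#   if disk_mounted /dev/vdb; then echo "refusing to wipe a mounted disk"; exit 1; fi
```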

FAQ

Stuck namespace deletion

If the rook-ceph namespace remains in Terminating, clear its finalizers:

NAMESPACE=rook-ceph
kubectl proxy &
kubectl get namespace $NAMESPACE -o json | \
  jq '.spec = {"finalizers":[]}' >temp.json
curl -k -H "Content-Type: application/json" -X PUT \
  --data-binary @temp.json \
  127.0.0.1:8001/api/v1/namespaces/$NAMESPACE/finalize

CephCluster not deleting after namespace removal

Edit the CephCluster resource and remove the finalizers field:

kubectl edit cephcluster.ceph.rook.io -n rook-ceph rook-ceph
# Delete the finalizers entry and save.

Dashboard shows HEALTH_WARN

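If the warning reads "mons are allowing insecure global_id reclaim", clear it by disallowing insecure global ID reclaim from inside the toolbox pod. Do this only after all Ceph clients have been updated, because unpatched clients may fail to reconnect afterwards: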

ceph config set mon auth_allow_insecure_global_id_reclaim false
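HEALTH_WARN can have many causes, so it helps to confirm which health check is firing before changing the setting. The sketch below classifies the output of ceph health detail. The function name global_id_state and its three output labels are hypothetical; the two health check codes it looks for are the ones Ceph reports for this issue.

```shell
# global_id_state: hypothetical helper. Reads `ceph health detail` on stdin
# and prints one of:
#   clients-still-insecure  AUTH_INSECURE_GLOBAL_ID_RECLAIM is present, so some
#                           clients still reclaim their global_id insecurely
#   safe-to-disable         only AUTH_INSECURE_GLOBAL_ID_RECLAIM_ALLOWED is present
#   other                   neither check is firing
global_id_state() {
  awk '
    /AUTH_INSECURE_GLOBAL_ID_RECLAIM_ALLOWED/ { allowed = 1; next }
    /AUTH_INSECURE_GLOBAL_ID_RECLAIM/         { clients = 1 }
    END {
      if (clients)      print "clients-still-insecure"
      else if (allowed) print "safe-to-disable"
      else              print "other"
    }
  '
}

# Usage inside the toolbox pod:
#   ceph health detail | global_id_state
```

Under these assumptions, only the safe-to-disable result indicates that the config change above is the appropriate fix.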
