Wednesday, January 30, 2019

Replace Failed OSD in Rook Deployed Ceph


If you have been reading some of my recent articles on Rook, you have seen how to install a Ceph cluster with Rook on Kubernetes. This article builds on that Kubernetes installation and discusses how to replace a failed OSD in the Ceph cluster.

First, let's review our current running Ceph cluster by looking at the rook-ceph-system and rook-ceph namespaces and, inside the toolbox, the Ceph status:

# kubectl get pods --all-namespaces -o wide
NAMESPACE          NAME                                      READY   STATUS      RESTARTS   AGE    IP            NODE          NOMINATED NODE   READINESS GATES
kube-system        coredns-86c58d9df4-22fps                  1/1     Running     4          3d2h   10.244.3.55   kube-node3               
kube-system        coredns-86c58d9df4-jp2zb                  1/1     Running     6          3d2h   10.244.2.66   kube-node2               
kube-system        etcd-kube-master                          1/1     Running     3          3d5h   10.0.0.81     kube-master              
kube-system        kube-apiserver-kube-master                1/1     Running     3          3d5h   10.0.0.81     kube-master              
kube-system        kube-controller-manager-kube-master       1/1     Running     5          3d5h   10.0.0.81     kube-master              
kube-system        kube-flannel-ds-amd64-5m9x5               1/1     Running     6          3d5h   10.0.0.83     kube-node2               
kube-system        kube-flannel-ds-amd64-7xgf4               1/1     Running     3          3d5h   10.0.0.81     kube-master              
kube-system        kube-flannel-ds-amd64-dhdzm               1/1     Running     5          3d2h   10.0.0.84     kube-node3               
kube-system        kube-flannel-ds-amd64-m6fx5               1/1     Running     3          3d5h   10.0.0.82     kube-node1               
kube-system        kube-proxy-bnbzn                          1/1     Running     3          3d5h   10.0.0.82     kube-node1               
kube-system        kube-proxy-gjxlg                          1/1     Running     4          3d2h   10.0.0.84     kube-node3               
kube-system        kube-proxy-kkxdb                          1/1     Running     3          3d5h   10.0.0.81     kube-master              
kube-system        kube-proxy-knzsl                          1/1     Running     6          3d5h   10.0.0.83     kube-node2               
kube-system        kube-scheduler-kube-master                1/1     Running     4          3d5h   10.0.0.81     kube-master              
rook-ceph-system   rook-ceph-agent-748v8                     1/1     Running     0          103m   10.0.0.83     kube-node2               
rook-ceph-system   rook-ceph-agent-9vznf                     1/1     Running     0          103m   10.0.0.82     kube-node1               
rook-ceph-system   rook-ceph-agent-hfdv6                     1/1     Running     0          103m   10.0.0.81     kube-master              
rook-ceph-system   rook-ceph-agent-lfh7m                     1/1     Running     0          103m   10.0.0.84     kube-node3               
rook-ceph-system   rook-ceph-operator-76cf7f88f-qmvn5        1/1     Running     0          103m   10.244.1.65   kube-node1               
rook-ceph-system   rook-discover-25h5z                       1/1     Running     0          103m   10.244.1.66   kube-node1               
rook-ceph-system   rook-discover-dcm7k                       1/1     Running     0          103m   10.244.0.41   kube-master              
rook-ceph-system   rook-discover-t4qs7                       1/1     Running     0          103m   10.244.3.61   kube-node3               
rook-ceph-system   rook-discover-w2nv5                       1/1     Running     0          103m   10.244.2.72   kube-node2               
rook-ceph          rook-ceph-mgr-a-8649f78d9b-k6gwl          1/1     Running     0          100m   10.244.3.62   kube-node3               
rook-ceph          rook-ceph-mon-a-576d9d49cc-q9pm6          1/1     Running     0          101m   10.244.0.42   kube-master              
rook-ceph          rook-ceph-mon-b-85f7b6cb6b-pnrhs          1/1     Running     0          101m   10.244.1.67   kube-node1               
rook-ceph          rook-ceph-mon-c-668f7f658d-hjf2v          1/1     Running     0          101m   10.244.2.74   kube-node2               
rook-ceph          rook-ceph-osd-0-6f76d5cc4c-t75gg          1/1     Running     0          100m   10.244.2.76   kube-node2               
rook-ceph          rook-ceph-osd-1-5759cd47c4-szvfg          1/1     Running     0          100m   10.244.3.64   kube-node3               
rook-ceph          rook-ceph-osd-2-6d69b78fbf-7s4bm          1/1     Running     0          100m   10.244.0.44   kube-master              
rook-ceph          rook-ceph-osd-3-7b457fc56d-22gw6          1/1     Running     0          100m   10.244.1.69   kube-node1               
rook-ceph          rook-ceph-osd-prepare-kube-master-72kfz   0/2     Completed   0          100m   10.244.0.43   kube-master              
rook-ceph          rook-ceph-osd-prepare-kube-node1-jp68h    0/2     Completed   0          100m   10.244.1.68   kube-node1               
rook-ceph          rook-ceph-osd-prepare-kube-node2-j89pc    0/2     Completed   0          100m   10.244.2.75   kube-node2               
rook-ceph          rook-ceph-osd-prepare-kube-node3-drh4t    0/2     Completed   0          100m   10.244.3.63   kube-node3               
rook-ceph          rook-ceph-tools-76c7d559b6-qvh2r          1/1     Running     0          6s     10.0.0.82     kube-node1               

# kubectl -n rook-ceph exec -it $(kubectl -n rook-ceph get pod -l "app=rook-ceph-tools" -o jsonpath='{.items[0].metadata.name}') bash

# ceph status
  cluster:
    id:     edc7cac7-21a3-45ae-80a9-5d470afb7576
    health: HEALTH_OK
 
  services:
    mon: 3 daemons, quorum c,a,b
    mgr: a(active)
    osd: 4 osds: 4 up, 4 in
 
  data:
    pools:   0 pools, 0 pgs
    objects: 0  objects, 0 B
    usage:   17 GiB used, 123 GiB / 140 GiB avail
    pgs:     
 
# ceph osd tree  
ID CLASS WEIGHT  TYPE NAME            STATUS REWEIGHT PRI-AFF 
-1       0.13715 root default                                 
-5       0.03429     host kube-master                         
 2   hdd 0.03429         osd.2            up  1.00000 1.00000 
-4       0.03429     host kube-node1                          
 3   hdd 0.03429         osd.3            up  1.00000 1.00000 
-2       0.03429     host kube-node2                          
 0   hdd 0.03429         osd.0            up  1.00000 1.00000 
-3       0.03429     host kube-node3                          
 1   hdd 0.03429         osd.1            up  1.00000 1.00000 

At this point the Ceph cluster is clean and in a healthy state. However, I am going to introduce some chaos that will cause osd.1 to go down. Since this is a virtual lab, I am going to simply kill the OSD process and clear out the osd.1 data to mimic a failed drive.
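For reference, my simulation looked roughly like the sketch below. The pod name, OSD id, and data path are from my lab (dataDirHostPath was set to /data/rook when the cluster was deployed), so treat the exact paths as assumptions and adjust to your environment:

```shell
# Rough sketch of faking a drive failure for osd.1 on kube-node3.
# Pod name, OSD id, and paths below are from my lab; adjust to yours.

# Kill the ceph-osd daemon inside the osd.1 pod so the OSD goes down
kubectl -n rook-ceph exec rook-ceph-osd-1-5759cd47c4-szvfg -- pkill ceph-osd

# Then, on kube-node3 itself, wipe the OSD's data directory to mimic
# a dead disk (path assumes dataDirHostPath=/data/rook)
rm -rf /data/rook/osd1/*
```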

Now when we look at the cluster state in the toolbox we can see osd.1 is down:

# kubectl -n rook-ceph exec -it $(kubectl -n rook-ceph get pod -l "app=rook-ceph-tools" -o jsonpath='{.items[0].metadata.name}') bash

# ceph status
  cluster:
    id:     edc7cac7-21a3-45ae-80a9-5d470afb7576
    health: HEALTH_WARN
            1 osds down
            1 host (1 osds) down
 
  services:
    mon: 3 daemons, quorum c,a,b
    mgr: a(active)
    osd: 4 osds: 3 up, 4 in
 
  data:
    pools:   0 pools, 0 pgs
    objects: 0  objects, 0 B
    usage:   17 GiB used, 123 GiB / 140 GiB avail
    pgs:     
 
[root@kube-node1 /]# ceph osd tree
ID CLASS WEIGHT  TYPE NAME            STATUS REWEIGHT PRI-AFF 
-1       0.13715 root default                                 
-5       0.03429     host kube-master                         
 2   hdd 0.03429         osd.2            up  1.00000 1.00000 
-4       0.03429     host kube-node1                          
 3   hdd 0.03429         osd.3            up  1.00000 1.00000 
-2       0.03429     host kube-node2                          
 0   hdd 0.03429         osd.0            up  1.00000 1.00000 
-3       0.03429     host kube-node3                          
 1   hdd 0.03429         osd.1          down  1.00000 1.00000 

Given that I removed the contents of the OSD, let's go ahead and replace the failed drive. The first step is to go into the toolbox and run the usual commands to remove a Ceph OSD from the cluster:

# kubectl -n rook-ceph exec -it $(kubectl -n rook-ceph get pod -l "app=rook-ceph-tools" -o jsonpath='{.items[0].metadata.name}') bash

# ceph osd out osd.1
marked out osd.1. 

# ceph osd crush remove osd.1
removed item id 1 name 'osd.1' from crush map

# ceph auth del osd.1
updated

# ceph osd rm osd.1
removed osd.1
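On Ceph Luminous (12.2) and newer, the four commands above can be collapsed into a single purge, which performs the out, crush remove, auth del, and rm steps in one shot:

```shell
# Equivalent one-step removal on Luminous and later,
# run from inside the toolbox container:
ceph osd purge 1 --yes-i-really-mean-it
```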

Let's exit out of the toolbox, go back to the master node command line, and delete the Ceph OSD 1 deployment:

# kubectl delete deployment -n rook-ceph rook-ceph-osd-1
deployment.extensions "rook-ceph-osd-1" deleted

Now would be the time to replace the physically failed disk. In my case the disk is still good; I just simulated the failure by downing the OSD process and removing the data.

To get the new disk back into the cluster we only need to restart the rook-ceph-operator pod, which we can do in Kubernetes with the following scale deployment commands:

# kubectl scale deployment rook-ceph-operator --replicas=0 -n rook-ceph-system
deployment.extensions/rook-ceph-operator scaled

# kubectl get pods --all-namespaces -o wide|grep operator

# kubectl scale deployment rook-ceph-operator --replicas=1 -n rook-ceph-system
deployment.extensions/rook-ceph-operator scaled

# kubectl get pods --all-namespaces -o wide|grep operator
rook-ceph-system   rook-ceph-operator-76cf7f88f-g9pxr        0/1     ContainerCreating   0          2s              kube-node2               

When the rook-ceph-operator is restarted, it re-runs a rook-ceph-osd-prepare container on each node, which scans that system for any disks that should be incorporated into the cluster based on the original cluster.yaml settings used when the Ceph cluster was deployed with Rook. In this case it will see the new disk on kube-node3 and bring it back in as osd.1.
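If you want to watch the re-provisioning happen, you can tail the prepare pod for the affected node. The label and pod name below are from my lab, and container names vary by Rook version, so adjust accordingly:

```shell
# List the prepare jobs the operator just re-ran
kubectl -n rook-ceph get pods -l app=rook-ceph-osd-prepare

# Tail the prepare pod for kube-node3 to see the disk being picked up
# (pod name will differ in your cluster)
kubectl -n rook-ceph logs rook-ceph-osd-prepare-kube-node3-cpf4s --all-containers
```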

We can confirm our assessment by seeing that a new container for osd.1 was spawned, and also by logging into the toolbox and running the familiar Ceph commands:

# kubectl get pods -n rook-ceph -o wide
NAME                                      READY   STATUS      RESTARTS   AGE     IP            NODE          NOMINATED NODE   READINESS GATES
rook-ceph-mgr-a-8649f78d9b-k6gwl          1/1     Running     0          110m    10.244.3.62   kube-node3    <none>           <none>
rook-ceph-mon-a-576d9d49cc-q9pm6          1/1     Running     0          110m    10.244.0.42   kube-master   <none>           <none>
rook-ceph-mon-b-85f7b6cb6b-pnrhs          1/1     Running     0          110m    10.244.1.67   kube-node1    <none>           <none>
rook-ceph-mon-c-668f7f658d-hjf2v          1/1     Running     0          110m    10.244.2.74   kube-node2    <none>           <none>
rook-ceph-osd-0-6f76d5cc4c-t75gg          1/1     Running     0          109m    10.244.2.76   kube-node2    <none>           <none>
rook-ceph-osd-1-69f5d5ffd-kndd7           1/1     Running     0          67s     10.244.3.68   kube-node3    <none>           <none>
rook-ceph-osd-2-6d69b78fbf-7s4bm          1/1     Running     0          109m    10.244.0.44   kube-master   <none>           <none>
rook-ceph-osd-3-7b457fc56d-22gw6          1/1     Running     0          109m    10.244.1.69   kube-node1    <none>           <none>
rook-ceph-osd-prepare-kube-master-n2t7g   0/2     Completed   0          79s     10.244.0.47   kube-master   <none>           <none>
rook-ceph-osd-prepare-kube-node1-ttznt    0/2     Completed   0          77s     10.244.1.72   kube-node1    <none>           <none>
rook-ceph-osd-prepare-kube-node2-9kxcl    0/2     Completed   0          75s     10.244.2.79   kube-node2    <none>           <none>
rook-ceph-osd-prepare-kube-node3-cpf4s    0/2     Completed   0          73s     10.244.3.66   kube-node3    <none>           <none>
rook-ceph-tools-76c7d559b6-qvh2r          1/1     Running     0          9m28s   10.0.0.82     kube-node1    <none>           <none>

# ceph status
  cluster:
    id:     edc7cac7-21a3-45ae-80a9-5d470afb7576
    health: HEALTH_OK
 
  services:
    mon: 3 daemons, quorum c,a,b
    mgr: a(active)
    osd: 4 osds: 4 up, 4 in
 
  data:
    pools:   0 pools, 0 pgs
    objects: 0  objects, 0 B
    usage:   17 GiB used, 123 GiB / 140 GiB avail
    pgs:     

# ceph osd tree 
ID CLASS WEIGHT  TYPE NAME            STATUS REWEIGHT PRI-AFF 
-1       0.13715 root default                                 
-5       0.03429     host kube-master                         
 2   hdd 0.03429         osd.2            up  1.00000 1.00000 
-4       0.03429     host kube-node1                          
 3   hdd 0.03429         osd.3            up  1.00000 1.00000 
-2       0.03429     host kube-node2                          
 0   hdd 0.03429         osd.0            up  1.00000 1.00000 
-3       0.03429     host kube-node3                          
 1       0.03429         osd.1            up  1.00000 1.00000 

As you can see, replacing a failed OSD with Rook is about as uneventful as replacing a failed OSD in a standard deployed Ceph cluster. Hopefully this demonstration provided proof of that.

Further Reading:

Rook: https://github.com/rook/rook


Rook & Ceph on Kubernetes


In a previous article I wrote about using Rook to deploy a Ceph storage cluster within Minikube (link below). The original post described what Rook can provide and demonstrated the ease of quickly setting up an all-in-one Ceph cluster. However, I wanted to explore Rook further in a multi-node configuration and see how it integrates with applications in Kubernetes.

First I needed to set up a base Kubernetes environment consisting of 1 master and 3 worker nodes. I used the following steps on all nodes to prepare them for Kubernetes: add hostnames to the hosts file, disable SELinux and swap, enable br_netfilter, install supporting utilities, enable the Kubernetes repo, install Docker, install the Kubernetes binaries, and enable/disable the relevant services.

# echo "10.0.0.81   kube-master" >> /etc/hosts
# echo "10.0.0.82   kube-node1" >> /etc/hosts
# echo "10.0.0.83   kube-node2" >> /etc/hosts
# echo "10.0.0.84   kube-node3" >> /etc/hosts
# setenforce 0
# sed -i --follow-symlinks 's/SELINUX=enforcing/SELINUX=disabled/g' /etc/sysconfig/selinux
# swapoff -a
# sed -i.bak -r 's/(.+ swap .+)/#\1/' /etc/fstab
# modprobe br_netfilter
# echo '1' > /proc/sys/net/bridge/bridge-nf-call-iptables
# echo 'br_netfilter' > /etc/modules-load.d/netfilter.conf
# echo net.bridge.bridge-nf-call-iptables=1 >> /etc/sysctl.d/10-bridge-nf-call-iptables.conf
# dnf install -y yum-utils device-mapper-persistent-data lvm2
# dnf install docker
# cat > /etc/yum.repos.d/kubernetes.repo <<EOF
  > [kubernetes]
  > name=Kubernetes
  > baseurl=https://packages.cloud.google.com/yum/repos/kubernetes-el7-x86_64
  > enabled=1
  > gpgcheck=1
  > repo_gpgcheck=1
  > gpgkey=https://packages.cloud.google.com/yum/doc/yum-key.gpg
  >         https://packages.cloud.google.com/yum/doc/rpm-package-key.gpg
  > EOF
# dnf install -y kubelet kubeadm kubectl
# systemctl enable docker ; systemctl start docker ; systemctl enable kubelet ; systemctl start kubelet ; systemctl stop firewalld ; systemctl disable firewalld
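Before initializing the cluster, it is worth sanity-checking that each prerequisite actually took effect on every node; a quick check might look like this:

```shell
# Verify the prerequisites took effect on each node
getenforce                                   # should print Disabled (or Permissive)
swapon --show                                # should print nothing (swap is off)
lsmod | grep br_netfilter                    # module should be loaded
sysctl net.bridge.bridge-nf-call-iptables    # should report 1
systemctl is-active docker kubelet           # both should report active
```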

Once the prerequisites are met on each node, let's initialize the cluster on the master node:

# kubeadm init --apiserver-advertise-address=10.0.0.81 --pod-network-cidr=10.244.0.0/16
[init] Using Kubernetes version: v1.13.2
[preflight] Running pre-flight checks
[preflight] Pulling images required for setting up a Kubernetes cluster
[preflight] This might take a minute or two, depending on the speed of your internet connection
[preflight] You can also perform this action in beforehand using 'kubeadm config images pull'
[kubelet-start] Writing kubelet environment file with flags to file "/var/lib/kubelet/kubeadm-flags.env"
[kubelet-start] Writing kubelet configuration to file "/var/lib/kubelet/config.yaml"
[kubelet-start] Activating the kubelet service
[certs] Using certificateDir folder "/etc/kubernetes/pki"
[certs] Generating "etcd/ca" certificate and key
[certs] Generating "etcd/healthcheck-client" certificate and key
[certs] Generating "apiserver-etcd-client" certificate and key
[certs] Generating "etcd/server" certificate and key
[certs] etcd/server serving cert is signed for DNS names [kube-master localhost] and IPs [10.0.0.81 127.0.0.1 ::1]
[certs] Generating "etcd/peer" certificate and key
[certs] etcd/peer serving cert is signed for DNS names [kube-master localhost] and IPs [10.0.0.81 127.0.0.1 ::1]
[certs] Generating "ca" certificate and key
[certs] Generating "apiserver-kubelet-client" certificate and key
[certs] Generating "apiserver" certificate and key
[certs] apiserver serving cert is signed for DNS names [kube-master kubernetes kubernetes.default kubernetes.default.svc kubernetes.default.svc.cluster.local] and IPs [10.96.0.1 10.0.0.81]
[certs] Generating "front-proxy-ca" certificate and key
[certs] Generating "front-proxy-client" certificate and key
[certs] Generating "sa" key and public key
[kubeconfig] Using kubeconfig folder "/etc/kubernetes"
[kubeconfig] Writing "admin.conf" kubeconfig file
[kubeconfig] Writing "kubelet.conf" kubeconfig file
[kubeconfig] Writing "controller-manager.conf" kubeconfig file
[kubeconfig] Writing "scheduler.conf" kubeconfig file
[control-plane] Using manifest folder "/etc/kubernetes/manifests"
[control-plane] Creating static Pod manifest for "kube-apiserver"
[control-plane] Creating static Pod manifest for "kube-controller-manager"
[control-plane] Creating static Pod manifest for "kube-scheduler"
[etcd] Creating static Pod manifest for local etcd in "/etc/kubernetes/manifests"
[wait-control-plane] Waiting for the kubelet to boot up the control plane as static Pods from directory "/etc/kubernetes/manifests". This can take up to 4m0s
[apiclient] All control plane components are healthy after 19.511836 seconds
[uploadconfig] storing the configuration used in ConfigMap "kubeadm-config" in the "kube-system" Namespace
[kubelet] Creating a ConfigMap "kubelet-config-1.13" in namespace kube-system with the configuration for the kubelets in the cluster
[patchnode] Uploading the CRI Socket information "/var/run/dockershim.sock" to the Node API object "kube-master" as an annotation
[mark-control-plane] Marking the node kube-master as control-plane by adding the label "node-role.kubernetes.io/master=''"
[mark-control-plane] Marking the node kube-master as control-plane by adding the taints [node-role.kubernetes.io/master:NoSchedule]
[bootstrap-token] Using token: etmucm.238nrw6a48yu0njb
[bootstrap-token] Configuring bootstrap tokens, cluster-info ConfigMap, RBAC Roles
[bootstraptoken] configured RBAC rules to allow Node Bootstrap tokens to post CSRs in order for nodes to get long term certificate credentials
[bootstraptoken] configured RBAC rules to allow the csrapprover controller automatically approve CSRs from a Node Bootstrap Token
[bootstraptoken] configured RBAC rules to allow certificate rotation for all node client certificates in the cluster
[bootstraptoken] creating the "cluster-info" ConfigMap in the "kube-public" namespace
[addons] Applied essential addon: CoreDNS
[addons] Applied essential addon: kube-proxy

Your Kubernetes master has initialized successfully!

To start using your cluster, you need to run the following as a regular user:

  mkdir -p $HOME/.kube
  sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
  sudo chown $(id -u):$(id -g) $HOME/.kube/config

You should now deploy a pod network to the cluster.
Run "kubectl apply -f [podnetwork].yaml" with one of the options listed at:
  https://kubernetes.io/docs/concepts/cluster-administration/addons/

You can now join any number of machines by running the following on each node
as root:

  kubeadm join 10.0.0.81:6443 --token etmucm.238nrw6a48yu0njb --discovery-token-ca-cert-hash sha256:963d6d9d31f2db9debfaa600ef802d05c448f7dc9e9cb92aec268cf2a8cfee7b

After the master is up and running, you can join the remaining nodes using the following command, which was presented in the output when you initialized the master:

# kubeadm join 10.0.0.81:6443 --token etmucm.238nrw6a48yu0njb --discovery-token-ca-cert-hash sha256:963d6d9d31f2db9debfaa600ef802d05c448f7dc9e9cb92aec268cf2a8cfee7b
[preflight] Running pre-flight checks
[discovery] Trying to connect to API Server "10.0.0.81:6443"
[discovery] Created cluster-info discovery client, requesting info from "https://10.0.0.81:6443"
[discovery] Requesting info from "https://10.0.0.81:6443" again to validate TLS against the pinned public key
[discovery] Cluster info signature and contents are valid and TLS certificate validates against pinned roots, will use API Server "10.0.0.81:6443"
[discovery] Successfully established connection with API Server "10.0.0.81:6443"
[join] Reading configuration from the cluster...
[join] FYI: You can look at this config file with 'kubectl -n kube-system get cm kubeadm-config -oyaml'
[kubelet] Downloading configuration for the kubelet from the "kubelet-config-1.13" ConfigMap in the kube-system namespace
[kubelet-start] Writing kubelet configuration to file "/var/lib/kubelet/config.yaml"
[kubelet-start] Writing kubelet environment file with flags to file "/var/lib/kubelet/kubeadm-flags.env"
[kubelet-start] Activating the kubelet service
[tlsbootstrap] Waiting for the kubelet to perform the TLS Bootstrap...
[patchnode] Uploading the CRI Socket information "/var/run/dockershim.sock" to the Node API object "kube-node1" as an annotation

This node has joined the cluster:
* Certificate signing request was sent to apiserver and a response was received.
* The Kubelet was informed of the new secure connection details.

Run 'kubectl get nodes' on the master to see this node join the cluster.

I like to do some housekeeping once all my nodes are joined, which includes enabling scheduling on the master and labeling the worker nodes as such:

# kubectl taint node kube-master node-role.kubernetes.io/master:NoSchedule-
# kubectl label node kube-node1 node-role.kubernetes.io/worker=worker
# kubectl label node kube-node2 node-role.kubernetes.io/worker=worker
# kubectl label node kube-node3 node-role.kubernetes.io/worker=worker
 
Once you have joined the nodes you should have a cluster that looks like this:

# kubectl get nodes
NAME          STATUS   ROLES    AGE   VERSION
kube-master   Ready    master   19h   v1.13.2
kube-node1    Ready    worker   19h   v1.13.2
kube-node2    Ready    worker   19h   v1.13.2
kube-node3    Ready    worker   17h   v1.13.2
 
Next, let's deploy Flannel for networking:

# kubectl apply -f https://raw.githubusercontent.com/coreos/flannel/master/Documentation/kube-flannel.yml

And finally, let's deploy Rook and the Ceph cluster using the familiar steps from my previous article:

# git clone https://github.com/rook/rook.git
# cd ./rook/cluster/examples/kubernetes/ceph
# sed -i.bak s+/var/lib/rook+/data/rook+g cluster.yaml
# kubectl create -f operator.yaml
# kubectl create -f cluster.yaml
# kubectl create -f toolbox.yaml
 
Once all the containers have spun up you should have something that looks like the following:
 
# kubectl get pod -n rook-ceph -o wide
NAME                                      READY   STATUS      RESTARTS   AGE   IP            NODE          NOMINATED NODE   READINESS GATES
rook-ceph-mgr-a-8649f78d9b-txsfm          1/1     Running     1          19h   10.244.2.12   kube-node2               
rook-ceph-mon-a-598b7bd4cd-kpxnx          1/1     Running     0          19h   10.244.0.3    kube-master              
rook-ceph-mon-c-759b8984f5-ggzjb          1/1     Running     1          19h   10.244.2.15   kube-node2               
rook-ceph-mon-d-77d55dcddf-mwnf8          1/1     Running     0          16h   10.244.3.3    kube-node3               
rook-ceph-osd-0-77b448bbcc-mdhsw          1/1     Running     1          19h   10.244.2.14   kube-node2               
rook-ceph-osd-1-65db4b7c5d-hgfcj          1/1     Running     0          16h   10.244.1.8    kube-node1               
rook-ceph-osd-2-5b475cb56c-x5w6n          1/1     Running     0          19h   10.244.0.5    kube-master              
rook-ceph-osd-3-657789944d-swjxd          1/1     Running     0          16h   10.244.3.6    kube-node3               
rook-ceph-osd-prepare-kube-master-tlhxf   0/2     Completed   0          16h   10.244.0.6    kube-master              
rook-ceph-osd-prepare-kube-node1-lgtrf    0/2     Completed   0          16h   10.244.1.12   kube-node1               
rook-ceph-osd-prepare-kube-node2-5tbt6    0/2     Completed   0          16h   10.244.2.17   kube-node2               
rook-ceph-osd-prepare-kube-node3-rrp4z    0/2     Completed   0          16h   10.244.3.5    kube-node3               
rook-ceph-tools-76c7d559b6-7kprh          1/1     Running     0          16h   10.0.0.84     kube-node3               

And of course we can validate the Ceph cluster is up and healthy via the toolbox container as well:

# kubectl -n rook-ceph exec -it $(kubectl -n rook-ceph get pod -l "app=rook-ceph-tools" -o jsonpath='{.items[0].metadata.name}') bash

# ceph status
  cluster:
    id:     4be6e204-3d82-4cc4-9ea4-57f0e71f99c5
    health: HEALTH_OK
 
  services:
    mon: 3 daemons, quorum d,a,c
    mgr: a(active)
    osd: 4 osds: 4 up, 4 in
 
  data:
    pools:   0 pools, 0 pgs
    objects: 0  objects, 0 B
    usage:   17 GiB used, 123 GiB / 140 GiB avail
    pgs:     
 
# ceph osd tree
ID CLASS WEIGHT  TYPE NAME            STATUS REWEIGHT PRI-AFF
-1       0.13715 root default                                 
-4       0.03429     host kube-master                         
 2   hdd 0.03429         osd.2            up  1.00000 1.00000
-3       0.03429     host kube-node1                          
 1   hdd 0.03429         osd.1            up  1.00000 1.00000
-2       0.03429     host kube-node2                          
 0   hdd 0.03429         osd.0            up  1.00000 1.00000
-9       0.03429     host kube-node3                          
 3   hdd 0.03429         osd.3            up  1.00000 1.00000

Everything we have done up to this point has been very similar to what I did in the previous article with Minikube, except instead of a single node we have a multi-node configuration. Now let's take it a step further and get an application to use our Ceph storage cluster.

The first step in Kubernetes is to create a storageclass.yaml that uses Ceph. Populate storageclass.yaml with the following:

apiVersion: ceph.rook.io/v1
kind: CephBlockPool
metadata:
  name: replicapool
  namespace: rook-ceph
spec:
  failureDomain: host
  replicated:
    size: 3
---
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
   name: rook-ceph-block
provisioner: ceph.rook.io/block
parameters:
  blockPool: replicapool
  # The value of "clusterNamespace" MUST be the same as the one in which your Rook cluster exists
  clusterNamespace: rook-ceph
  # Specify the filesystem type of the volume. If not specified, it will use `ext4`.
  fstype: xfs
# Optional, default reclaimPolicy is "Delete". Other options are: "Retain", "Recycle" as documented in https://kubernetes.io/docs/concepts/storage/storage-classes/

Next, let's create the storage class using the yaml we created and set it as the default:

# kubectl create -f storageclass.yaml
cephblockpool.ceph.rook.io/replicapool created
storageclass.storage.k8s.io/rook-ceph-block created

# kubectl get storageclass
NAME              PROVISIONER          AGE
rook-ceph-block   ceph.rook.io/block   61s

# kubectl patch storageclass rook-ceph-block -p '{"metadata": {"annotations":{"storageclass.kubernetes.io/is-default-class":"true"}}}'
storageclass.storage.k8s.io/rook-ceph-block patched

# kubectl get storageclass
NAME                        PROVISIONER          AGE
rook-ceph-block (default)   ceph.rook.io/block   3m30s
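As a quick sanity check, any claim that omits storageClassName should now bind against rook-ceph-block since it is the default class. The claim name below is my own invention, not from the Rook examples:

```shell
# Create a hypothetical test claim; with rook-ceph-block as the default
# storageclass, storageClassName can be omitted entirely
cat <<EOF | kubectl create -f -
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: test-claim
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 1Gi
EOF

kubectl get pvc test-claim     # STORAGECLASS column should show rook-ceph-block
kubectl delete pvc test-claim
```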

Now that we have a storageclass that uses Ceph as the backend, we need an application to consume it. Thankfully the Rook git repo includes a couple of examples: Wordpress and MySQL. Let's go ahead and create those apps by doing the following:

# cd ./rook/cluster/examples/kubernetes

# kubectl create -f mysql.yaml
service/wordpress-mysql created
persistentvolumeclaim/mysql-pv-claim created
deployment.apps/wordpress-mysql created

# kubectl create -f wordpress.yaml
service/wordpress created
persistentvolumeclaim/wp-pv-claim created
deployment.extensions/wordpress created



We can confirm our two applications are running with the following:

# kubectl get pods -n default -o wide
NAME                               READY   STATUS    RESTARTS   AGE     IP            NODE         NOMINATED NODE   READINESS GATES
wordpress-7b6c4c79bb-7b4dq         1/1     Running   0          68s     10.244.1.14   kube-node1              
wordpress-mysql-6887bf844f-2m4h4   1/1     Running   0          2m47s   10.244.1.13   kube-node1              

Now let's confirm they are actually using our Ceph storageclass:

# kubectl get pvc

NAME             STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS      AGE
mysql-pv-claim   Bound    pvc-0c0be0ec-2317-11e9-a462-5254003ede95   20Gi       RWO            rook-ceph-block   3m36s
wp-pv-claim      Bound    pvc-46b4b266-2317-11e9-a462-5254003ede95   20Gi       RWO            rook-ceph-block   118s

And let's also confirm Wordpress is up and running from a user perspective. Note that in this example we do not have an external IP and can only access the service via the cluster IP:

# kubectl get svc wordpress
NAME        TYPE           CLUSTER-IP      EXTERNAL-IP   PORT(S)        AGE
wordpress   LoadBalancer   10.104.120.47   <pending>     80:32592/TCP   19m
# curl -v http://10.104.120.47
* About to connect() to 10.104.120.47 port 80 (#0)
*   Trying 10.104.120.47...
* Connected to 10.104.120.47 (10.104.120.47) port 80 (#0)
> GET / HTTP/1.1
> User-Agent: curl/7.29.0
> Host: 10.104.120.47
> Accept: */*
>
< HTTP/1.1 302 Found
< Date: Mon, 28 Jan 2019 16:30:16 GMT
< Server: Apache/2.4.10 (Debian)
< X-Powered-By: PHP/5.6.28
< Expires: Wed, 11 Jan 1984 05:00:00 GMT
< Cache-Control: no-cache, must-revalidate, max-age=0
< Location: http://10.104.120.47/wp-admin/install.php
< Content-Length: 0
< Content-Type: text/html; charset=UTF-8
<
* Connection #0 to host 10.104.120.47 left intact

We can see from the above output that we do connect, but get a 302 since Wordpress needs to be configured first. Still, it confirms our applications are up and using the Ceph storageclass.
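Since the LoadBalancer external IP never materializes in a bare-metal lab like this, a couple of other access paths are worth noting; both use values taken from the service output above:

```shell
# Option 1: hit the NodePort (32592 in the service output above)
# on any node's IP from outside the cluster
curl -I http://10.0.0.82:32592/

# Option 2: forward a local port to the service, then browse
# to http://localhost:8080 from the machine running kubectl
kubectl port-forward svc/wordpress 8080:80
```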

To clean up the previous exercise, let's do the following:

# kubectl delete -f wordpress.yaml
service "wordpress" deleted
persistentvolumeclaim "wp-pv-claim" deleted
deployment.extensions "wordpress" deleted

# kubectl delete -f mysql.yaml
service "wordpress-mysql" deleted
persistentvolumeclaim "mysql-pv-claim" deleted
deployment.apps "wordpress-mysql" deleted

# kubectl delete -n rook-ceph cephblockpools.ceph.rook.io replicapool
cephblockpool.ceph.rook.io "replicapool" deleted

# kubectl delete storageclass rook-ceph-block
storageclass.storage.k8s.io "rook-ceph-block" deleted

The above example was just a simple demonstration of the capabilities Rook and Ceph bring to Kubernetes from a block storage perspective, but it leaves one wondering what other possibilities there might be.

Further Reading:

Rook: https://github.com/rook/rook
Kubernetes: https://kubernetes.io/
Previous Article:  https://www.linkedin.com/pulse/deploying-ceph-rook-benjamin-schmaus/