In a previous article I wrote about using Rook to deploy a Ceph storage cluster within Minikube (link below). The original post described what Rook can provide and demonstrated the ease of quickly setting up an all-in-one Ceph cluster. However, I wanted to explore Rook further in a multi-node configuration and see how it integrates with applications in Kubernetes.
First I needed to set up a base Kubernetes environment consisting of one master and three worker nodes. I used the following steps on all nodes to prepare them for Kubernetes: add the hostnames to the hosts file, disable SELinux and swap, enable br_netfilter, install supporting utilities, enable the Kubernetes repo, install Docker, install the Kubernetes binaries, and enable/disable the relevant services.
# echo "10.0.0.81 kube-master" >> /etc/hosts # echo "10.0.0.82 kube-node1" >> /etc/hosts # echo "10.0.0.83 kube-node2" >> /etc/hosts # echo "10.0.0.84 kube-node3" >> /etc/hosts # setenforce 0 # sed -i --follow-symlinks 's/SELINUX=enforcing/SELINUX=disabled/g' /etc/sysconfig/selinux # swapoff -a # sed -i.bak -r 's/(.+ swap .+)/#\1/' /etc/fstab # modprobe br_netfilter # echo '1' > /proc/sys/net/bridge/bridge-nf-call-iptables # echo 'br_netfilter' > /etc/modules-load.d/netfilter.conf # echo net.bridge.bridge-nf-call-iptables=1 >> /etc/sysctl.d/10-bridge-nf-call-iptables.conf # dnf install -y yum-utils device-mapper-persistent-data lvm2 # dnf install docker # cat > /etc/yum.repos.d/kubernetes.repo <[kubernetes] > name=Kubernetes > baseurl=https://packages.cloud.google.com/yum/repos/kubernetes-el7-x86_64 > enabled=1 > gpgcheck=1 > repo_gpgcheck=1 > gpgkey=https://packages.cloud.google.com/yum/doc/yum-key.gpg > https://packages.cloud.google.com/yum/doc/rpm-package-key.gpg > EOF # dnf install -y kubelet kubeadm kubectl # systemctl enable docker ; systemctl start docker ; systemctl enable kubelet ; systemctl start kubelet ; systemctl stop firewalld ; systemctl disable firewalld
Once the prerequisites are met on each node, let's initialize the cluster on the master node:
# kubeadm init --apiserver-advertise-address=10.0.0.81 --pod-network-cidr=10.244.0.0/16
[init] Using Kubernetes version: v1.13.2
[preflight] Running pre-flight checks
[preflight] Pulling images required for setting up a Kubernetes cluster
[preflight] This might take a minute or two, depending on the speed of your internet connection
[preflight] You can also perform this action in beforehand using 'kubeadm config images pull'
[kubelet-start] Writing kubelet environment file with flags to file "/var/lib/kubelet/kubeadm-flags.env"
[kubelet-start] Writing kubelet configuration to file "/var/lib/kubelet/config.yaml"
[kubelet-start] Activating the kubelet service
[certs] Using certificateDir folder "/etc/kubernetes/pki"
[certs] Generating "etcd/ca" certificate and key
[certs] Generating "etcd/healthcheck-client" certificate and key
[certs] Generating "apiserver-etcd-client" certificate and key
[certs] Generating "etcd/server" certificate and key
[certs] etcd/server serving cert is signed for DNS names [kube-master localhost] and IPs [10.0.0.81 127.0.0.1 ::1]
[certs] Generating "etcd/peer" certificate and key
[certs] etcd/peer serving cert is signed for DNS names [kube-master localhost] and IPs [10.0.0.81 127.0.0.1 ::1]
[certs] Generating "ca" certificate and key
[certs] Generating "apiserver-kubelet-client" certificate and key
[certs] Generating "apiserver" certificate and key
[certs] apiserver serving cert is signed for DNS names [kube-master kubernetes kubernetes.default kubernetes.default.svc kubernetes.default.svc.cluster.local] and IPs [10.96.0.1 10.0.0.81]
[certs] Generating "front-proxy-ca" certificate and key
[certs] Generating "front-proxy-client" certificate and key
[certs] Generating "sa" key and public key
[kubeconfig] Using kubeconfig folder "/etc/kubernetes"
[kubeconfig] Writing "admin.conf" kubeconfig file
[kubeconfig] Writing "kubelet.conf" kubeconfig file
[kubeconfig] Writing "controller-manager.conf" kubeconfig file
[kubeconfig] Writing "scheduler.conf" kubeconfig file
[control-plane] Using manifest folder "/etc/kubernetes/manifests"
[control-plane] Creating static Pod manifest for "kube-apiserver"
[control-plane] Creating static Pod manifest for "kube-controller-manager"
[control-plane] Creating static Pod manifest for "kube-scheduler"
[etcd] Creating static Pod manifest for local etcd in "/etc/kubernetes/manifests"
[wait-control-plane] Waiting for the kubelet to boot up the control plane as static Pods from directory "/etc/kubernetes/manifests". This can take up to 4m0s
[apiclient] All control plane components are healthy after 19.511836 seconds
[uploadconfig] storing the configuration used in ConfigMap "kubeadm-config" in the "kube-system" Namespace
[kubelet] Creating a ConfigMap "kubelet-config-1.13" in namespace kube-system with the configuration for the kubelets in the cluster
[patchnode] Uploading the CRI Socket information "/var/run/dockershim.sock" to the Node API object "kube-master" as an annotation
[mark-control-plane] Marking the node kube-master as control-plane by adding the label "node-role.kubernetes.io/master=''"
[mark-control-plane] Marking the node kube-master as control-plane by adding the taints [node-role.kubernetes.io/master:NoSchedule]
[bootstrap-token] Using token: etmucm.238nrw6a48yu0njb
[bootstrap-token] Configuring bootstrap tokens, cluster-info ConfigMap, RBAC Roles
[bootstraptoken] configured RBAC rules to allow Node Bootstrap tokens to post CSRs in order for nodes to get long term certificate credentials
[bootstraptoken] configured RBAC rules to allow the csrapprover controller automatically approve CSRs from a Node Bootstrap Token
[bootstraptoken] configured RBAC rules to allow certificate rotation for all node client certificates in the cluster
[bootstraptoken] creating the "cluster-info" ConfigMap in the "kube-public" namespace
[addons] Applied essential addon: CoreDNS
[addons] Applied essential addon: kube-proxy

Your Kubernetes master has initialized successfully!

To start using your cluster, you need to run the following as a regular user:

  mkdir -p $HOME/.kube
  sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
  sudo chown $(id -u):$(id -g) $HOME/.kube/config

You should now deploy a pod network to the cluster.
Run "kubectl apply -f [podnetwork].yaml" with one of the options listed at:
  https://kubernetes.io/docs/concepts/cluster-administration/addons/

You can now join any number of machines by running the following on each node as root:

  kubeadm join 10.0.0.81:6443 --token etmucm.238nrw6a48yu0njb --discovery-token-ca-cert-hash sha256:963d6d9d31f2db9debfaa600ef802d05c448f7dc9e9cb92aec268cf2a8cfee7b
After the master is up and running, you can join the remaining nodes using the join command that was presented in the output when you initialized the master:
# kubeadm join 10.0.0.81:6443 --token etmucm.238nrw6a48yu0njb --discovery-token-ca-cert-hash sha256:963d6d9d31f2db9debfaa600ef802d05c448f7dc9e9cb92aec268cf2a8cfee7b
[preflight] Running pre-flight checks
[discovery] Trying to connect to API Server "10.0.0.81:6443"
[discovery] Created cluster-info discovery client, requesting info from "https://10.0.0.81:6443"
[discovery] Requesting info from "https://10.0.0.81:6443" again to validate TLS against the pinned public key
[discovery] Cluster info signature and contents are valid and TLS certificate validates against pinned roots, will use API Server "10.0.0.81:6443"
[discovery] Successfully established connection with API Server "10.0.0.81:6443"
[join] Reading configuration from the cluster...
[join] FYI: You can look at this config file with 'kubectl -n kube-system get cm kubeadm-config -oyaml'
[kubelet] Downloading configuration for the kubelet from the "kubelet-config-1.13" ConfigMap in the kube-system namespace
[kubelet-start] Writing kubelet configuration to file "/var/lib/kubelet/config.yaml"
[kubelet-start] Writing kubelet environment file with flags to file "/var/lib/kubelet/kubeadm-flags.env"
[kubelet-start] Activating the kubelet service
[tlsbootstrap] Waiting for the kubelet to perform the TLS Bootstrap...
[patchnode] Uploading the CRI Socket information "/var/run/dockershim.sock" to the Node API object "kube-node1" as an annotation

This node has joined the cluster:
* Certificate signing request was sent to apiserver and a response was received.
* The Kubelet was informed of the new secure connection details.

Run 'kubectl get nodes' on the master to see this node join the cluster.
I like to do some housekeeping once all my nodes are joined, which includes enabling scheduling on the master and labeling the worker nodes as such:
# kubectl taint node kube-master node-role.kubernetes.io/master:NoSchedule-
# kubectl label node kube-node1 node-role.kubernetes.io/worker=worker
# kubectl label node kube-node2 node-role.kubernetes.io/worker=worker
# kubectl label node kube-node3 node-role.kubernetes.io/worker=worker
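If you want to confirm the housekeeping took effect, a quick check of the taints and labels might look like the following sketch:

# kubectl describe node kube-master | grep Taints     # should no longer show NoSchedule
# kubectl get nodes --show-labels                     # worker label should appear on the three nodes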
With all the nodes joined and labeled, you should have a cluster that looks like this:
# kubectl get nodes
NAME          STATUS   ROLES    AGE   VERSION
kube-master   Ready    master   19h   v1.13.2
kube-node1    Ready    worker   19h   v1.13.2
kube-node2    Ready    worker   19h   v1.13.2
kube-node3    Ready    worker   17h   v1.13.2
Next, let's deploy Flannel for networking:
# kubectl apply -f https://raw.githubusercontent.com/coreos/flannel/master/Documentation/kube-flannel.yml
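Flannel runs as a DaemonSet in the kube-system namespace, so an easy way to confirm the pod network is up is to check that a flannel pod is running on each node and that all nodes report Ready. A quick sketch (pod names and labels vary between flannel releases):

# kubectl get pods -n kube-system -o wide | grep flannel
# kubectl get nodes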
And finally, let's deploy Rook and the Ceph cluster using the familiar steps from my previous article:
# git clone https://github.com/rook/rook.git
# cd ./rook/cluster/examples/kubernetes/ceph
# sed -i.bak s+/var/lib/rook+/data/rook+g cluster.yaml
# kubectl create -f operator.yaml
# kubectl create -f cluster.yaml
# kubectl create -f toolbox.yaml
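The operator, monitor, and OSD pods take a few minutes to come up. While waiting, something like the following can be used to watch progress (a sketch, assuming the namespaces used by the example manifests of this Rook version; the operator namespace may differ in other releases):

# kubectl get pods -n rook-ceph-system            # the Rook operator and agents
# kubectl get pods -n rook-ceph -w                # watch the mon, mgr, and osd pods come up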
Once all the containers have spun up, you should have something that looks like the following:
# kubectl get pod -n rook-ceph -o wide
NAME                                      READY   STATUS      RESTARTS   AGE   IP            NODE          NOMINATED NODE   READINESS GATES
rook-ceph-mgr-a-8649f78d9b-txsfm          1/1     Running     1          19h   10.244.2.12   kube-node2    <none>           <none>
rook-ceph-mon-a-598b7bd4cd-kpxnx          1/1     Running     0          19h   10.244.0.3    kube-master   <none>           <none>
rook-ceph-mon-c-759b8984f5-ggzjb          1/1     Running     1          19h   10.244.2.15   kube-node2    <none>           <none>
rook-ceph-mon-d-77d55dcddf-mwnf8          1/1     Running     0          16h   10.244.3.3    kube-node3    <none>           <none>
rook-ceph-osd-0-77b448bbcc-mdhsw          1/1     Running     1          19h   10.244.2.14   kube-node2    <none>           <none>
rook-ceph-osd-1-65db4b7c5d-hgfcj          1/1     Running     0          16h   10.244.1.8    kube-node1    <none>           <none>
rook-ceph-osd-2-5b475cb56c-x5w6n          1/1     Running     0          19h   10.244.0.5    kube-master   <none>           <none>
rook-ceph-osd-3-657789944d-swjxd          1/1     Running     0          16h   10.244.3.6    kube-node3    <none>           <none>
rook-ceph-osd-prepare-kube-master-tlhxf   0/2     Completed   0          16h   10.244.0.6    kube-master   <none>           <none>
rook-ceph-osd-prepare-kube-node1-lgtrf    0/2     Completed   0          16h   10.244.1.12   kube-node1    <none>           <none>
rook-ceph-osd-prepare-kube-node2-5tbt6    0/2     Completed   0          16h   10.244.2.17   kube-node2    <none>           <none>
rook-ceph-osd-prepare-kube-node3-rrp4z    0/2     Completed   0          16h   10.244.3.5    kube-node3    <none>           <none>
rook-ceph-tools-76c7d559b6-7kprh          1/1     Running     0          16h   10.0.0.84     kube-node3    <none>           <none>
And of course we can validate the Ceph cluster is up and healthy via the toolbox container as well:
# kubectl -n rook-ceph exec -it $(kubectl -n rook-ceph get pod -l "app=rook-ceph-tools" -o jsonpath='{.items[0].metadata.name}') bash
# ceph status
  cluster:
    id:     4be6e204-3d82-4cc4-9ea4-57f0e71f99c5
    health: HEALTH_OK

  services:
    mon: 3 daemons, quorum d,a,c
    mgr: a(active)
    osd: 4 osds: 4 up, 4 in

  data:
    pools:   0 pools, 0 pgs
    objects: 0 objects, 0 B
    usage:   17 GiB used, 123 GiB / 140 GiB avail
    pgs:

# ceph osd tree
ID CLASS WEIGHT  TYPE NAME            STATUS REWEIGHT PRI-AFF
-1       0.13715 root default
-4       0.03429     host kube-master
 2   hdd 0.03429         osd.2            up  1.00000 1.00000
-3       0.03429     host kube-node1
 1   hdd 0.03429         osd.1            up  1.00000 1.00000
-2       0.03429     host kube-node2
 0   hdd 0.03429         osd.0            up  1.00000 1.00000
-9       0.03429     host kube-node3
 3   hdd 0.03429         osd.3            up  1.00000 1.00000
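The toolbox has the full Ceph CLI available, so beyond ceph status you can dig into capacity and per-OSD detail while you're in there. A few examples to run inside the toolbox pod:

# ceph df              # cluster-wide and per-pool capacity/usage
# ceph osd df tree     # per-OSD utilization laid out along the CRUSH hierarchy
# ceph health detail   # more detail whenever health is not HEALTH_OK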
Everything we have done up to this point has been very similar to what I did in the previous article with Minikube, except instead of a single node we have a multi-node configuration. Now let's take it a step further and get an application to use our Ceph storage cluster.
The first step in Kubernetes is to create a storageclass.yaml that defines a Ceph-backed storage class. Populate storageclass.yaml with the following:
apiVersion: ceph.rook.io/v1
kind: CephBlockPool
metadata:
  name: replicapool
  namespace: rook-ceph
spec:
  failureDomain: host
  replicated:
    size: 3
---
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: rook-ceph-block
provisioner: ceph.rook.io/block
parameters:
  blockPool: replicapool
  # The value of "clusterNamespace" MUST be the same as the one in which your rook cluster exist
  clusterNamespace: rook-ceph
  # Specify the filesystem type of the volume. If not specified, it will use `ext4`.
  fstype: xfs
# Optional, default reclaimPolicy is "Delete". Other options are: "Retain", "Recycle" as documented in https://kubernetes.io/docs/concepts/storage/storage-classes/
Next, let's create the storage class using the YAML we just wrote and set it as the default:
# kubectl create -f storageclass.yaml
cephblockpool.ceph.rook.io/replicapool created
storageclass.storage.k8s.io/rook-ceph-block created
# kubectl get storageclass
NAME              PROVISIONER          AGE
rook-ceph-block   ceph.rook.io/block   61s
# kubectl patch storageclass rook-ceph-block -p '{"metadata": {"annotations":{"storageclass.kubernetes.io/is-default-class":"true"}}}'
storageclass.storage.k8s.io/rook-ceph-block patched
# kubectl get storageclass
NAME                        PROVISIONER          AGE
rook-ceph-block (default)   ceph.rook.io/block   3m30s
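With rook-ceph-block set as the default storage class, any PersistentVolumeClaim that does not name a storage class will now be provisioned from the Ceph pool. As a quick illustration (the claim name test-claim is hypothetical and not part of the Rook examples), a minimal PVC could look like this:

# cat <<EOF | kubectl create -f -
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: test-claim
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 1Gi
EOF
# kubectl get pvc test-claim     # should show STATUS Bound with STORAGECLASS rook-ceph-block
# kubectl delete pvc test-claim  # clean up the test claim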
Now that we have a storage class that uses Ceph as the backend, we need an application to consume it. Thankfully the Rook git repo includes a couple of examples: Wordpress and MySQL. Let's go ahead and create those apps with the following:
# cd ./rook/cluster/examples/kubernetes
# kubectl create -f mysql.yaml
service/wordpress-mysql created
persistentvolumeclaim/mysql-pv-claim created
deployment.apps/wordpress-mysql created
# kubectl create -f wordpress.yaml
service/wordpress created
persistentvolumeclaim/wp-pv-claim created
deployment.extensions/wordpress created
We can confirm our two applications are running with the following:
# kubectl get pods -n default -o wide
NAME                               READY   STATUS    RESTARTS   AGE     IP            NODE         NOMINATED NODE   READINESS GATES
wordpress-7b6c4c79bb-7b4dq         1/1     Running   0          68s     10.244.1.14   kube-node1   <none>           <none>
wordpress-mysql-6887bf844f-2m4h4   1/1     Running   0          2m47s   10.244.1.13   kube-node1   <none>           <none>
Now let's confirm they are actually using our Ceph storage class:
# kubectl get pvc
NAME             STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS      AGE
mysql-pv-claim   Bound    pvc-0c0be0ec-2317-11e9-a462-5254003ede95   20Gi       RWO            rook-ceph-block   3m36s
wp-pv-claim      Bound    pvc-46b4b266-2317-11e9-a462-5254003ede95   20Gi       RWO            rook-ceph-block   118s
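If you want to see the claims from the Ceph side as well, the block provisioner creates an RBD image in the replicapool pool for each bound PVC. From the toolbox pod, something like this should list them (a sketch; the image names are generated by Rook and will differ, so <image-name> is a placeholder):

# rbd ls -p replicapool                  # list the images backing the PVCs
# rbd info -p replicapool <image-name>   # size, features, etc. for a single image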
And let's also confirm Wordpress is up and running from a user perspective. Note that in this example we do not have an external IP and can only access the service via the cluster IP:
# kubectl get svc wordpress
NAME        TYPE           CLUSTER-IP      EXTERNAL-IP   PORT(S)        AGE
wordpress   LoadBalancer   10.104.120.47   <pending>     80:32592/TCP   19m
# curl -v http://10.104.120.47
* About to connect() to 10.104.120.47 port 80 (#0)
*   Trying 10.104.120.47...
* Connected to 10.104.120.47 (10.104.120.47) port 80 (#0)
> GET / HTTP/1.1
> User-Agent: curl/7.29.0
> Host: 10.104.120.47
> Accept: */*
>
< HTTP/1.1 302 Found
< Date: Mon, 28 Jan 2019 16:30:16 GMT
< Server: Apache/2.4.10 (Debian)
< X-Powered-By: PHP/5.6.28
< Expires: Wed, 11 Jan 1984 05:00:00 GMT
< Cache-Control: no-cache, must-revalidate, max-age=0
< Location: http://10.104.120.47/wp-admin/install.php
< Content-Length: 0
< Content-Type: text/html; charset=UTF-8
<
* Connection #0 to host 10.104.120.47 left intact
We can see from the above output that we do connect but get a 302 redirect to the install page, since Wordpress still needs to be configured. But it does confirm our applications are up and using the Ceph storage class.
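Since the LoadBalancer never receives an external IP in this bare-metal setup, one simple way to reach the Wordpress installer from the master (or anywhere with a kubeconfig) is to port-forward the service and browse to localhost. A sketch, where the local port 8080 is an arbitrary choice:

# kubectl port-forward svc/wordpress 8080:80
# # then, from another terminal or a browser on the same host:
# curl -I http://localhost:8080/wp-admin/install.php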
To clean up the previous exercise, let's do the following:
# kubectl delete -f wordpress.yaml
service "wordpress" deleted
persistentvolumeclaim "wp-pv-claim" deleted
deployment.extensions "wordpress" deleted
# kubectl delete -f mysql.yaml
service "wordpress-mysql" deleted
persistentvolumeclaim "mysql-pv-claim" deleted
deployment.apps "wordpress-mysql" deleted
# kubectl delete -n rook-ceph cephblockpools.ceph.rook.io replicapool
cephblockpool.ceph.rook.io "replicapool" deleted
# kubectl delete storageclass rook-ceph-block
storageclass.storage.k8s.io "rook-ceph-block" deleted
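To double-check that the storage was actually released, you can verify that no claims or volumes are left behind (the Ceph cluster itself stays up, since we only removed the pool and the storage class):

# kubectl get pvc -n default   # should return no resources
# kubectl get pv               # should return no resources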
The above example was just a simple demonstration of the capabilities Rook and Ceph bring to Kubernetes from a block storage perspective, but it leaves one wondering what other possibilities there might be.
Further Reading:
Rook: https://github.com/rook/rook
Kubernetes: https://kubernetes.io/
Previous Article: https://www.linkedin.com/pulse/deploying-ceph-rook-benjamin-schmaus/