Monday, March 18, 2019

Stacking OpenShift with Rook and CNV


In previous blogs I worked with Rook/Ceph on Kubernetes, demonstrating how to set up a Ceph cluster and even replace failed OSDs. With that in mind I wanted to shift gears a bit and bring things more into alignment with OpenShift and Container Native Virtualization (CNV).

The following blog will guide us through a simple OpenShift deployment with Rook/Ceph and CNV configured. I will also demonstrate the use of a Rook PVC that provides the backend storage for a CNV-deployed virtual instance.

The configuration for this lab is four virtual machines: one node acts as both master and compute, and the other three nodes are compute. Each node has a base install of Red Hat Enterprise Linux 7, and the physical host they run on allows for nested virtualization.

Before we start installing the various software, let's do a bit of user setup to ensure the install runs smoothly.  The next few steps need to be done on all nodes so that a user named origin (this could be any non-root user) is created and has sudo rights without the use of a password:

# useradd origin
# passwd origin
# echo -e 'Defaults:origin !requiretty\norigin ALL = (root) NOPASSWD:ALL' | tee /etc/sudoers.d/openshift 
# chmod 440 /etc/sudoers.d/openshift
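
To make sure the sudoers change took effect, it is worth a quick sanity check on each node; run as root, the following should print root without ever prompting for a password:

# su - origin -c 'sudo whoami'
root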

Then we need to perform the following steps to set up keyless authentication for the origin user from the master node to the rest of the nodes that will make up the cluster:

# ssh-keygen -q -N ""
# vi /home/origin/.ssh/config
Host ocp-master
    Hostname ocp-master.schmaustech.com
    User origin
Host ocp-node1
    Hostname ocp-node1.schmaustech.com
    User origin
Host ocp-node2
    Hostname ocp-node2.schmaustech.com
    User origin
Host ocp-node3
    Hostname ocp-node3.schmaustech.com
    User origin

# chmod 600 /home/origin/.ssh/config
# ssh-copy-id ocp-master
# ssh-copy-id ocp-node1
# ssh-copy-id ocp-node2
# ssh-copy-id ocp-node3
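
Before moving on it is worth confirming keyless authentication actually works. As the origin user on the master node, a quick loop should return each hostname without any password prompts:

[origin@ocp-master ~]$ for node in ocp-master ocp-node1 ocp-node2 ocp-node3; do ssh $node hostname; done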

Now we can move on to enabling the necessary repositories on all nodes so we have access to the packages we will need for the installation:

[origin@ocp-master ~]$ sudo subscription-manager repos --enable=rhel-7-server-rpms --enable=rhel-7-server-extras-rpms --enable=rhel-7-server-rh-common-rpms --enable=rhel-7-server-ose-3.11-rpms --enable=rhel-7-server-ansible-2.6-rpms --enable=rhel-7-server-cnv-1.4-tech-preview-rpms
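
If you want to double check which repositories are active before installing anything, subscription-manager can list them; the repo IDs should match the ones enabled above:

[origin@ocp-master ~]$ sudo subscription-manager repos --list-enabled | grep "Repo ID"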

Next, let's install the initial required packages on all the nodes:

[origin@ocp-master ~]$ sudo yum -y install openshift-ansible docker-1.13.1 kubevirt-ansible kubevirt-virtctl
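
It does not hurt to confirm the packages actually landed on every node before kicking off any playbooks:

[origin@ocp-master ~]$ rpm -q openshift-ansible docker kubevirt-ansible kubevirt-virtctl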

On the master node let's configure the Ansible hosts file for our OpenShift installation.   The following is the example I used; I simply replaced /etc/ansible/hosts with it.

[OSEv3:children]
masters
nodes
etcd
[OSEv3:vars]
# admin user created in previous section
ansible_ssh_user=origin
ansible_become=true
oreg_url=registry.access.redhat.com/openshift3/ose-${component}:${version}
openshift_deployment_type=openshift-enterprise
#  use HTPasswd for authentication
openshift_master_identity_providers=[{'name': 'htpasswd_auth', 'login': 'true', 'challenge': 'true', 'kind': 'HTPasswdPasswordIdentityProvider'}]
# define default sub-domain for Master node
openshift_master_default_subdomain=apps.schmaustech.com
# allow unencrypted connection within cluster
openshift_docker_insecure_registries=172.30.0.0/16
[masters]
ocp-master.schmaustech.com openshift_schedulable=true containerized=false
[etcd]
ocp-master.schmaustech.com
[nodes]
# defined values for [openshift_node_group_name] in the file below
# [/usr/share/ansible/openshift-ansible/roles/openshift_facts/defaults/main.yml]
ocp-master.schmaustech.com openshift_node_group_name='node-config-all-in-one'
ocp-node1.schmaustech.com openshift_node_group_name='node-config-compute'
ocp-node2.schmaustech.com openshift_node_group_name='node-config-compute'
ocp-node3.schmaustech.com openshift_node_group_name='node-config-compute'
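
One note on the HTPasswd identity provider configured above: this inventory does not pre-seed any users, so once the install finishes you will want to add one before logging into the web console. A minimal sketch, assuming the installer's default htpasswd file location on the master and that httpd-tools is present (the admin username is just an example):

[origin@ocp-master ~]$ sudo htpasswd -c /etc/origin/master/htpasswd admin
[origin@ocp-master ~]$ oc adm policy add-cluster-role-to-user cluster-admin admin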

With the Ansible hosts file in place we are ready to run the OpenShift prerequisites playbook:

[origin@ocp-master ~]$ ansible-playbook /usr/share/ansible/openshift-ansible/playbooks/prerequisites.yml

Once the prerequisites playbook executes successfully, we can then run the OpenShift deploy cluster playbook:

[origin@ocp-master ~]$ ansible-playbook /usr/share/ansible/openshift-ansible/playbooks/deploy_cluster.yml

Let's validate that OpenShift is up and running:

[origin@ocp-master ~]$ oc get nodes
NAME         STATUS    ROLES                  AGE       VERSION
ocp-master   Ready     compute,infra,master   15h       v1.11.0+d4cacc0
ocp-node1    Ready     compute                14h       v1.11.0+d4cacc0
ocp-node2    Ready     compute                14h       v1.11.0+d4cacc0
ocp-node3    Ready     compute                14h       v1.11.0+d4cacc0

[origin@ocp-master ~]$ oc get pods --all-namespaces -o wide
NAMESPACE                           NAME                                           READY     STATUS      RESTARTS   AGE       IP              NODE         NOMINATED NODE
default                             docker-registry-1-g4hgd                        1/1       Running     0          14h       10.128.0.4      ocp-master   <none>
default                             registry-console-1-zwhrd                       1/1       Running     0          14h       10.128.0.6      ocp-master   <none>
default                             router-1-v8pkp                                 1/1       Running     0          14h       192.168.3.100   ocp-master   <none>
kube-service-catalog                apiserver-gxjst                                1/1       Running     0          14h       10.128.0.17     ocp-master   <none>
kube-service-catalog                controller-manager-2v6qs                       1/1       Running     3          14h       10.128.0.18     ocp-master   <none>
openshift-ansible-service-broker    asb-1-d8clq                                    1/1       Running     0          14h       10.128.0.21     ocp-master   <none>
openshift-console                   console-566f847459-pk52j                       1/1       Running     0          14h       10.128.0.12     ocp-master   <none>
openshift-monitoring                alertmanager-main-0                            3/3       Running     0          14h       10.128.0.14     ocp-master   <none>
openshift-monitoring                alertmanager-main-1                            3/3       Running     0          14h       10.128.0.15     ocp-master   <none>
openshift-monitoring                alertmanager-main-2                            3/3       Running     0          14h       10.128.0.16     ocp-master   <none>
openshift-monitoring                cluster-monitoring-operator-79d6c544f5-c8rfs   1/1       Running     0          14h       10.128.0.7      ocp-master   <none>
openshift-monitoring                grafana-8497b48bd5-bqzxb                       2/2       Running     0          14h       10.128.0.10     ocp-master   <none>
openshift-monitoring                kube-state-metrics-7d8b57fc8f-ktdq4            3/3       Running     0          14h       10.128.0.19     ocp-master   <none>
openshift-monitoring                node-exporter-5gmbc                            2/2       Running     0          14h       192.168.3.103   ocp-node3    <none>
openshift-monitoring                node-exporter-fxthd                            2/2       Running     0          14h       192.168.3.102   ocp-node2    <none>
openshift-monitoring                node-exporter-gj27b                            2/2       Running     0          14h       192.168.3.101   ocp-node1    <none>
openshift-monitoring                node-exporter-r6vjs                            2/2       Running     0          14h       192.168.3.100   ocp-master   <none>
openshift-monitoring                prometheus-k8s-0                               4/4       Running     1          14h       10.128.0.11     ocp-master   <none>
openshift-monitoring                prometheus-k8s-1                               4/4       Running     1          14h       10.128.0.13     ocp-master   <none>
openshift-monitoring                prometheus-operator-5677fb6f87-4czth           1/1       Running     0          14h       10.128.0.8      ocp-master   <none>
openshift-node                      sync-7rqcb                                     1/1       Running     0          14h       192.168.3.103   ocp-node3    <none>
openshift-node                      sync-829ql                                     1/1       Running     0          14h       192.168.3.101   ocp-node1    <none>
openshift-node                      sync-mwq6v                                     1/1       Running     0          14h       192.168.3.102   ocp-node2    <none>
openshift-node                      sync-vc4hw                                     1/1       Running     0          15h       192.168.3.100   ocp-master   <none>
openshift-sdn                       ovs-n55b8                                      1/1       Running     0          14h       192.168.3.101   ocp-node1    <none>
openshift-sdn                       ovs-nvtgq                                      1/1       Running     0          14h       192.168.3.103   ocp-node3    <none>
openshift-sdn                       ovs-t8dgh                                      1/1       Running     0          14h       192.168.3.102   ocp-node2    <none>
openshift-sdn                       ovs-wgw2v                                      1/1       Running     0          15h       192.168.3.100   ocp-master   <none>
openshift-sdn                       sdn-7r9kn                                      1/1       Running     0          14h       192.168.3.101   ocp-node1    <none>
openshift-sdn                       sdn-89284                                      1/1       Running     0          15h       192.168.3.100   ocp-master   <none>
openshift-sdn                       sdn-hmgjg                                      1/1       Running     0          14h       192.168.3.103   ocp-node3    <none>
openshift-sdn                       sdn-n7lzh                                      1/1       Running     0          14h       192.168.3.102   ocp-node2    <none>
openshift-template-service-broker   apiserver-md5sr                                1/1       Running     0          14h       10.128.0.22     ocp-master   <none>
openshift-web-console               webconsole-674f79b6fc-cjrhw                    1/1       Running     0          14h       10.128.0.9      ocp-master   <none>

With OpenShift up and running we can move on to installing the Rook/Ceph cluster.  The first step is to clone the Rook Git repo down to the master node and make an adjustment for the kubelet plugins path.  Please note that here I am cloning a colleague's Rook repo and not pulling directly from the Rook project:

[origin@ocp-master ~]$ git clone https://github.com/ksingh7/ocp4-rook.git
[origin@ocp-master ~]$ sed -i.bak s+/etc/kubernetes/kubelet-plugins/volume/exec+/usr/libexec/kubernetes/kubelet-plugins/volume/exec+g /home/origin/ocp4-rook/ceph/operator.yaml
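
A quick grep confirms the path substitution landed before we apply anything; only the /usr/libexec path should come back:

[origin@ocp-master ~]$ grep kubelet-plugins /home/origin/ocp4-rook/ceph/operator.yaml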

With the repository cloned we can now apply the security context constraints needed by the Rook pods using scc.yaml and then launch the Rook operator with operator.yaml:

[origin@ocp-master ~]$ oc create -f /home/origin/ocp4-rook/ceph/scc.yaml
[origin@ocp-master ~]$ oc create -f /home/origin/ocp4-rook/ceph/operator.yaml

Let's validate that the Rook operator came up:

[origin@ocp-master ~]$ oc get pods -n rook-ceph-system 
NAME                                 READY     STATUS    RESTARTS   AGE
rook-ceph-agent-77x5n                1/1       Running   0          1h
rook-ceph-agent-cdvqr                1/1       Running   0          1h
rook-ceph-agent-gz7tl                1/1       Running   0          1h
rook-ceph-agent-rsbwh                1/1       Running   0          1h
rook-ceph-operator-b76466dcd-zmscb   1/1       Running   0          1h
rook-discover-6p5ht                  1/1       Running   0          1h
rook-discover-fnrf4                  1/1       Running   0          1h
rook-discover-grr5w                  1/1       Running   0          1h
rook-discover-mllt7                  1/1       Running   0          1h

Once the operator is up we can proceed with deploying the Ceph cluster and, once that is up, deploy the Ceph toolbox pod:

[origin@ocp-master ~]$ oc create -f /home/origin/ocp4-rook/ceph/cluster.yaml  
[origin@ocp-master ~]$ oc create -f /home/origin/ocp4-rook/ceph/toolbox.yaml
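
For reference, the CephCluster resource inside cluster.yaml looks roughly like the sketch below. I am paraphrasing from memory, so treat the Ceph image tag, dataDirHostPath and storage settings as assumptions and defer to whatever ships in the cloned repo:

apiVersion: ceph.rook.io/v1
kind: CephCluster
metadata:
  name: rook-ceph
  namespace: rook-ceph
spec:
  cephVersion:
    image: ceph/ceph:v13
  dataDirHostPath: /var/lib/rook
  mon:
    count: 3
    allowMultiplePerNode: false
  dashboard:
    enabled: true
  storage:
    useAllNodes: true
    useAllDevices: true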

Let's validate that the Ceph cluster is up:

[origin@ocp-master ~]$ oc get pods -n rook-ceph
NAME                                     READY     STATUS      RESTARTS   AGE
rook-ceph-mgr-a-785ddd6d6c-d4w56         1/1       Running     0          1h
rook-ceph-mon-a-67855c796b-sdvqm         1/1       Running     0          1h
rook-ceph-mon-b-6d58cd7656-xkrdz         1/1       Running     0          1h
rook-ceph-mon-c-869b8d9d9-m7544          1/1       Running     0          1h
rook-ceph-osd-0-d6cbd5776-987p9          1/1       Running     0          1h
rook-ceph-osd-1-cfddf997-pzq69           1/1       Running     0          1h
rook-ceph-osd-2-79fc94c6d5-krtnj         1/1       Running     0          1h
rook-ceph-osd-3-f9b55c4d6-7jp7c          1/1       Running     0          1h
rook-ceph-osd-prepare-ocp-master-ztmhs   0/2       Completed   0          1h
rook-ceph-osd-prepare-ocp-node1-mgbcd    0/2       Completed   0          1h
rook-ceph-osd-prepare-ocp-node2-98rtw    0/2       Completed   0          1h
rook-ceph-osd-prepare-ocp-node3-ngscg    0/2       Completed   0          1h
rook-ceph-tools                          1/1       Running     0          1h

Let's also validate from the Ceph toolbox that the cluster health is OK:

[origin@ocp-master ~]$ oc -n rook-ceph rsh rook-ceph-tools
sh-4.2# ceph status
  cluster:
    id:     6ddab3e4-1730-412f-89b8-0738708adac8
    health: HEALTH_OK
 
  services:
    mon: 3 daemons, quorum b,a,c
    mgr: a(active)
    osd: 4 osds: 4 up, 4 in
 
  data:
    pools:   1 pools, 100 pgs
    objects: 281  objects, 1.1 GiB
    usage:   51 GiB used, 169 GiB / 220 GiB avail
    pgs:     100 active+clean


Now that we have confirmed the Ceph cluster is deployed, let's configure a Ceph storage class and also make it the default storage class for the environment:

[origin@ocp-master ~]$ oc create -f /home/origin/ocp4-rook/ceph/storageclass.yaml
[origin@ocp-master ~]$ oc patch storageclass rook-ceph-block -p '{"metadata": {"annotations":{"storageclass.kubernetes.io/is-default-class":"true"}}}'
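
The storageclass.yaml in the repo defines both a Ceph block pool and the storage class that consumes it. A rough sketch of that pairing is shown below; the pool name, replica size and fstype are my assumptions, while the class name and provisioner match what shows up in the output that follows:

apiVersion: ceph.rook.io/v1
kind: CephBlockPool
metadata:
  name: replicapool
  namespace: rook-ceph
spec:
  replicated:
    size: 3
---
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: rook-ceph-block
provisioner: ceph.rook.io/block
parameters:
  blockPool: replicapool
  clusterNamespace: rook-ceph
  fstype: xfs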

And now if we display the storage class we can see Rook/Ceph is our default:

[origin@ocp-master ~]$ oc get storageclass
NAME                        PROVISIONER          AGE
rook-ceph-block (default)   ceph.rook.io/block   6h


Proceeding with our stack installation, let's get CNV installed.  Again, using Ansible as we did earlier for the OpenShift install makes this a relatively easy task:

[origin@ocp-master ~]$ oc login -u system:admin
[origin@ocp-master ~]$ cd /usr/share/ansible/kubevirt-ansible
[origin@ocp-master ~]$ ansible-playbook -i /etc/ansible/hosts -e @vars/cnv.yml playbooks/kubevirt.yml -e apb_action=provision

Once the installation completes, let's run the following command to ensure the pods for CNV are up:

[origin@ocp-master ~]$ oc get pods --all-namespaces -o wide|egrep "kubevirt|cdi"
cdi                                 cdi-apiserver-7bfd97d585-tqjgt                 1/1       Running     0          6h        10.129.0.11     ocp-node3    
cdi                                 cdi-deployment-6689fcb476-4klcj                1/1       Running     0          6h        10.131.0.12     ocp-node1    
cdi                                 cdi-operator-5889d7588c-wvgl4                  1/1       Running     0          6h        10.130.0.12     ocp-node2    
cdi                                 cdi-uploadproxy-79c9fb9f59-pkskw               1/1       Running     0          6h        10.129.0.13     ocp-node3    
cdi                                 virt-launcher-f29vm-h6mc9                      1/1       Running     0          6h        10.129.0.15     ocp-node3    
kubevirt-web-ui                     console-854d4585c8-hgdhv                       1/1       Running     0          6h        10.129.0.10     ocp-node3    
kubevirt-web-ui                     kubevirt-web-ui-operator-6b4574bb95-bmsw7      1/1       Running     0          6h        10.130.0.11     ocp-node2    
kubevirt                            kubevirt-cpu-node-labeller-fvx9n               1/1       Running     0          6h        10.128.0.29     ocp-master   
kubevirt                            kubevirt-cpu-node-labeller-jr858               1/1       Running     0          6h        10.131.0.13     ocp-node1    
kubevirt                            kubevirt-cpu-node-labeller-tgq5g               1/1       Running     0          6h        10.129.0.14     ocp-node3    
kubevirt                            kubevirt-cpu-node-labeller-xqpbl               1/1       Running     0          6h        10.130.0.13     ocp-node2    
kubevirt                            virt-api-865b95d544-hg58l                      1/1       Running     0          6h        10.129.0.8      ocp-node3    
kubevirt                            virt-api-865b95d544-jrkxh                      1/1       Running     0          6h        10.131.0.10     ocp-node1    
kubevirt                            virt-controller-5c89d4978d-q79lh               1/1       Running     0          6h        10.130.0.8      ocp-node2    
kubevirt                            virt-controller-5c89d4978d-t58l7               1/1       Running     0          6h        10.130.0.10     ocp-node2    
kubevirt                            virt-handler-gblbk                             1/1       Running     0          6h        10.128.0.28     ocp-master   
kubevirt                            virt-handler-jnwx6                             1/1       Running     0          6h        10.130.0.9      ocp-node2    
kubevirt                            virt-handler-r94fb                             1/1       Running     0          6h        10.129.0.9      ocp-node3    
kubevirt                            virt-handler-z7775                             1/1       Running     0          6h        10.131.0.11     ocp-node1    
kubevirt                            virt-operator-68984b585c-265bq                 1/1       Running     0          6h        10.129.0.7      ocp-node3    

Now that CNV is up and running, let's pull down a Fedora 29 image and upload it into a PVC on the default storage class, which of course is Rook/Ceph:

[origin@ocp-master ~]$ curl -L -o /home/origin/f29.qcow2 http://ftp.usf.edu/pub/fedora/linux/releases/29/Cloud/x86_64/images/Fedora-Cloud-Base-29-1.2.x86_64.qcow2
[origin@ocp-master ~]$ virtctl image-upload --pvc-name=f29vm --pvc-size=5Gi --image-path=/home/origin/f29.qcow2 --uploadproxy-url=https://`oc describe route cdi-uploadproxy-route|grep Endpoints|cut -f2` --insecure
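
The image-upload command depends on the CDI upload proxy being reachable through a route, so if virtctl complains about the upload URL the first thing to check is that the route actually exists:

[origin@ocp-master ~]$ oc get route --all-namespaces | grep uploadproxy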

We can execute the following to see that the PVC has been created:

[origin@ocp-master ~]$ oc get pvc
NAME      STATUS    VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS      AGE
f29vm     Bound     pvc-4815df9e-4987-11e9-a732-525400767d62   5Gi        RWO            rook-ceph-block   6h
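
If you are curious, you can also hop back into the Ceph toolbox and confirm the claim is backed by an RBD image in the Rook block pool (the pool name replicapool is an assumption based on the usual default in the Rook examples):

[origin@ocp-master ~]$ oc -n rook-ceph rsh rook-ceph-tools
sh-4.2# rbd ls -p replicapool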

Besides the PVC we will also need a virtual machine configuration YAML file.  The one below is the example that will be used in this demonstration:

apiVersion: kubevirt.io/v1alpha3
kind: VirtualMachine
metadata:
  creationTimestamp: null
  labels:
    kubevirt-vm: f29vm
  name: f29vm
spec:
  running: true
  template:
    metadata:
      creationTimestamp: null
      labels:
        kubevirt.io/domain: f29vm
    spec:
      domain:
        cpu:
          cores: 2
        devices:
          disks:
          - disk:
              bus: virtio
            name: osdisk
            volumeName: osdisk
          - disk:
              bus: virtio
            name: cloudinitdisk
            volumeName: cloudinitvolume
          interfaces:
          - name: default
            bridge: {}
        resources:
          requests:
            memory: 1024M
      terminationGracePeriodSeconds: 0
      volumes:
      - name: osdisk
        persistentVolumeClaim:
          claimName: f29vm
      - name: cloudinitdisk
        cloudInitNoCloud:
          userData: |-
            #cloud-config
            password: ${PASSWORD}
            disable_root: false
            chpasswd: { expire: False }
            ssh_authorized_keys:
            - "ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAABAQDUs1KbLraX74mBM/ksoGwbsEejfpCVeMzbW7JLJjGXF8G1jyVAE3T0Uf5mO8nbNOfkjAjw24lxSsEScF2wslBzA5MIm+GB6Z+ZzR55FcRlZeouGVrfLmb67mYc2c/F/mq35TruHdRk2G5Y0+6cf8cfDs414+yiVA0heHQvWNfO7kb1z9kIOhyD6OOwdNT5jK/1O0+p6SdP+pEal51BsEf6GRGYLWc9SLIEcqtjoprnundr5UPvmC1l/pkqFQigMehwhthrdXC4GseWiyj9CnBkccxQCKvHjzko/wqsWGQLwDG3pBsHhthvbY0G5+VPB9a8YV58WJhC6nHpUTDA8jpB origin@ocp-master"
      networks:
      - name: default
        pod: {}

At this point we have all the necessary components to launch our containerized virtual machine instance.   The following command creates it using the YAML file we built in the previous step:

[origin@ocp-master ~]$ oc create -f /home/origin/f29vm.yaml

There are multiple ways to validate that the virtual machine has been instantiated.   I like to do the following to confirm the instance is running and has an IP address:

[origin@ocp-master ~]$ oc get vms
NAME      AGE       RUNNING   VOLUME
f29vm     6h        true      
[origin@ocp-master ~]$ oc get vmi
NAME      AGE       PHASE     IP            NODENAME
f29vm     6h        Running   10.129.0.15   ocp-node3

One final step is to actually log into the instance, assuming an SSH key was set in the YAML file:

[origin@ocp-master ~]$ ssh -i /home/origin/.ssh/id_rsa -o "StrictHostKeyChecking=no" fedora@10.129.0.15
[fedora@f29vm ~]$ cat /etc/fedora-release
Fedora release 29 (Twenty Nine)
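
If SSH is not an option, say the key never made it into cloud-init, virtctl can also attach to the serial console of the guest:

[origin@ocp-master ~]$ virtctl console f29vm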

Hopefully this demonstrated how easy it is to get OpenShift, Rook and CNV up and running and how one can then leverage Rook storage to provide a backend for the virtual instance that gets spun up in CNV.   What is awesome is that I have taken the steps above and put them into a DCI job where I can automatically rerun the deployment against newer versions of the code base for testing.   If you are not familiar with DCI, I will leave you with this teaser link: https://doc.distributed-ci.io/