Recently a colleague of mine was trying to get Rook to deploy a Ceph cluster that used dedicated public and private networks to segment the Ceph replication traffic from the client traffic to the cluster's OSDs. In a regular Ceph deployment this is rather trivial, but in the context of Kubernetes it becomes a little more complex because Rook is the one deploying the cluster containers. The following is the procedure I applied to ensure my OSDs were listening on the appropriate networks.
Before we get into the steps to achieve this configuration, let's take a quick look at the setup I used. First, I have a three-node Kubernetes cluster (one master with scheduling enabled and two workers):
# kubectl get nodes
NAME          STATUS   ROLES    AGE     VERSION
kube-master   Ready    master   2d22h   v1.14.0
kube-node1    Ready    worker   2d22h   v1.14.0
kube-node2    Ready    worker   2d22h   v1.14.0
On each of the nodes I have three network interfaces: eth0 on 10.0.0.0/24 (Kubernetes public), eth1 on 192.168.100.0/24 (Ceph private/cluster) and eth2 on 192.168.200.0/24 (Ceph public):
# ip a|grep eth[0-2]
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
    inet 10.0.0.81/24 brd 10.0.0.255 scope global noprefixroute eth0
3: eth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
    inet 192.168.100.81/24 brd 192.168.100.255 scope global noprefixroute eth1
4: eth2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
    inet 192.168.200.81/24 brd 192.168.200.255 scope global noprefixroute eth2
Before we begin, let's look at the current vanilla pods and namespaces on the Kubernetes cluster:
# kubectl get pods --all-namespaces -o wide
NAMESPACE     NAME                                  READY   STATUS    RESTARTS   AGE   IP           NODE          NOMINATED NODE   READINESS GATES
kube-system   coredns-fb8b8dccf-h6wfn               1/1     Running   0          3d    10.244.1.2   kube-node2    <none>           <none>
kube-system   coredns-fb8b8dccf-mv7p5               1/1     Running   0          3d    10.244.0.7   kube-master   <none>           <none>
kube-system   etcd-kube-master                      1/1     Running   0          3d    10.0.0.81    kube-master   <none>           <none>
kube-system   kube-apiserver-kube-master            1/1     Running   0          3d    10.0.0.81    kube-master   <none>           <none>
kube-system   kube-controller-manager-kube-master   1/1     Running   1          3d    10.0.0.81    kube-master   <none>           <none>
kube-system   kube-flannel-ds-amd64-szhg9           1/1     Running   0          3d    10.0.0.83    kube-node2    <none>           <none>
kube-system   kube-flannel-ds-amd64-t4fxs           1/1     Running   0          3d    10.0.0.82    kube-node1    <none>           <none>
kube-system   kube-flannel-ds-amd64-wbsdp           1/1     Running   0          3d    10.0.0.81    kube-master   <none>           <none>
kube-system   kube-proxy-sn7j7                      1/1     Running   0          3d    10.0.0.83    kube-node2    <none>           <none>
kube-system   kube-proxy-wtzm5                      1/1     Running   0          3d    10.0.0.81    kube-master   <none>           <none>
kube-system   kube-proxy-xlwd9                      1/1     Running   0          3d    10.0.0.82    kube-node1    <none>           <none>
kube-system   kube-scheduler-kube-master            1/1     Running   1          3d    10.0.0.81    kube-master   <none>           <none>

# kubectl get ns
NAME              STATUS   AGE
default           Active   3d
kube-node-lease   Active   3d
kube-public       Active   3d
kube-system       Active   3d
Before we can deploy the cluster we need to create a configmap in the rook-ceph namespace. This namespace is normally created when the cluster is deployed, but because we want specific configuration items incorporated into the cluster at deployment time, we will create the rook-ceph namespace ourselves and apply our configmap to it first.
First, create a configmap file that looks like the following; notice it references my two Ceph cluster networks. The empty public addr and cluster addr entries clear the per-daemon addresses Rook would otherwise set, so the daemons pick their addresses from the two networks instead. I will save this file under an arbitrary name like config-override.yaml:
apiVersion: v1
kind: ConfigMap
metadata:
  name: rook-config-override
  namespace: rook-ceph
data:
  config: |
    [global]
    public network = 192.168.200.0/24
    cluster network = 192.168.100.0/24
    public addr = ""
    cluster addr = ""
Next I will create the rook-ceph namespace:
# kubectl create namespace rook-ceph
namespace/rook-ceph created

# kubectl get ns
NAME              STATUS   AGE
default           Active   3d1h
kube-node-lease   Active   3d1h
kube-public       Active   3d1h
kube-system       Active   3d1h
rook-ceph         Active   5s
Now we can apply the configmap we created to the newly created namespace and validate it's there:
# kubectl create -f config-override.yaml
configmap/rook-config-override created

# kubectl get configmap -n rook-ceph
NAME                   DATA   AGE
rook-config-override   1      66s

# kubectl describe configmap -n rook-ceph
Name:         rook-config-override
Namespace:    rook-ceph
Labels:       <none>
Annotations:  <none>

Data
====
config:
----
[global]
public network = 192.168.200.0/24
cluster network = 192.168.100.0/24
public addr = ""
cluster addr = ""

Events:  <none>
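As an aside, you can produce the same configmap without hand-writing YAML by using kubectl's --from-file flag. A minimal sketch, assuming the [global] block above is saved in a plain file named ceph-override.conf (a hypothetical name), with the data key explicitly set to config as in the YAML above:

# kubectl -n rook-ceph create configmap rook-config-override --from-file=config=ceph-override.conf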
Before we actually start the deployment we need to update one more thing in our Rook cluster.yaml. Inside the cluster.yaml file we need to change hostNetwork from its default of false to true so the Ceph daemons bind to the host's network interfaces rather than the pod network:
# sed -i 's/hostNetwork: false/hostNetwork: true/g' cluster.yaml
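For reference, after running the sed the network portion of the CephCluster spec in cluster.yaml should look roughly like the following; the exact surrounding fields vary by Rook version, so treat this as an illustrative fragment rather than a complete manifest:

apiVersion: ceph.rook.io/v1
kind: CephCluster
metadata:
  name: rook-ceph
  namespace: rook-ceph
spec:
  network:
    hostNetwork: true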
Now we can begin the process of deploying the Rook/Ceph cluster, which includes launching the operator, the cluster and the toolbox. I place sleep statements between the commands to ensure the pods are up before the next command runs (a more robust readiness check is sketched after the toolbox step below). Also note there will be an error when creating the cluster about the rook-ceph namespace already existing; this is expected since we created the namespace ourselves earlier:
# kubectl create -f operator.yaml
namespace/rook-ceph-system created
customresourcedefinition.apiextensions.k8s.io/cephclusters.ceph.rook.io created
customresourcedefinition.apiextensions.k8s.io/cephfilesystems.ceph.rook.io created
customresourcedefinition.apiextensions.k8s.io/cephobjectstores.ceph.rook.io created
customresourcedefinition.apiextensions.k8s.io/cephobjectstoreusers.ceph.rook.io created
customresourcedefinition.apiextensions.k8s.io/cephblockpools.ceph.rook.io created
customresourcedefinition.apiextensions.k8s.io/volumes.rook.io created
clusterrole.rbac.authorization.k8s.io/rook-ceph-cluster-mgmt created
role.rbac.authorization.k8s.io/rook-ceph-system created
clusterrole.rbac.authorization.k8s.io/rook-ceph-global created
clusterrole.rbac.authorization.k8s.io/rook-ceph-mgr-cluster created
serviceaccount/rook-ceph-system created
rolebinding.rbac.authorization.k8s.io/rook-ceph-system created
clusterrolebinding.rbac.authorization.k8s.io/rook-ceph-global created
deployment.apps/rook-ceph-operator created
# sleep 60
# kubectl create -f cluster.yaml
serviceaccount/rook-ceph-osd created
serviceaccount/rook-ceph-mgr created
role.rbac.authorization.k8s.io/rook-ceph-osd created
role.rbac.authorization.k8s.io/rook-ceph-mgr-system created
role.rbac.authorization.k8s.io/rook-ceph-mgr created
rolebinding.rbac.authorization.k8s.io/rook-ceph-cluster-mgmt created
rolebinding.rbac.authorization.k8s.io/rook-ceph-osd created
rolebinding.rbac.authorization.k8s.io/rook-ceph-mgr created
rolebinding.rbac.authorization.k8s.io/rook-ceph-mgr-system created
rolebinding.rbac.authorization.k8s.io/rook-ceph-mgr-cluster created
cephcluster.ceph.rook.io/rook-ceph created
Error from server (AlreadyExists): error when creating "cluster.yaml": namespaces "rook-ceph" already exists
# sleep 60
# kubectl create -f toolbox.yaml
pod/rook-ceph-tools created
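As promised, a more robust alternative to the fixed sleeps is to have kubectl block until the resources report ready. A minimal sketch, assuming the resource names from the stock manifests used above:

# kubectl -n rook-ceph-system wait --for=condition=Available deployment/rook-ceph-operator --timeout=300s
# kubectl -n rook-ceph wait --for=condition=Ready pod/rook-ceph-tools --timeout=300s

The first command gates the cluster creation on the operator deployment becoming available, and the second gates the toolbox exec on the toolbox pod being ready.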
Let's validate that the Rook/Ceph operator, cluster and toolbox are up and running:
# kubectl get pods --all-namespaces -o wide
NAMESPACE          NAME                                      READY   STATUS      RESTARTS   AGE     IP           NODE          NOMINATED NODE   READINESS GATES
kube-system        coredns-fb8b8dccf-h6wfn                   1/1     Running     0          3d1h    10.244.1.2   kube-node2    <none>           <none>
kube-system        coredns-fb8b8dccf-mv7p5                   1/1     Running     0          3d1h    10.244.0.7   kube-master   <none>           <none>
kube-system        etcd-kube-master                          1/1     Running     0          3d1h    10.0.0.81    kube-master   <none>           <none>
kube-system        kube-apiserver-kube-master                1/1     Running     0          3d1h    10.0.0.81    kube-master   <none>           <none>
kube-system        kube-controller-manager-kube-master       1/1     Running     1          3d1h    10.0.0.81    kube-master   <none>           <none>
kube-system        kube-flannel-ds-amd64-szhg9               1/1     Running     0          3d1h    10.0.0.83    kube-node2    <none>           <none>
kube-system        kube-flannel-ds-amd64-t4fxs               1/1     Running     0          3d1h    10.0.0.82    kube-node1    <none>           <none>
kube-system        kube-flannel-ds-amd64-wbsdp               1/1     Running     0          3d1h    10.0.0.81    kube-master   <none>           <none>
kube-system        kube-proxy-sn7j7                          1/1     Running     0          3d1h    10.0.0.83    kube-node2    <none>           <none>
kube-system        kube-proxy-wtzm5                          1/1     Running     0          3d1h    10.0.0.81    kube-master   <none>           <none>
kube-system        kube-proxy-xlwd9                          1/1     Running     0          3d1h    10.0.0.82    kube-node1    <none>           <none>
kube-system        kube-scheduler-kube-master                1/1     Running     1          3d1h    10.0.0.81    kube-master   <none>           <none>
rook-ceph-system   rook-ceph-agent-55fqp                     1/1     Running     0          17m     10.0.0.83    kube-node2    <none>           <none>
rook-ceph-system   rook-ceph-agent-5v9v5                     1/1     Running     0          17m     10.0.0.81    kube-master   <none>           <none>
rook-ceph-system   rook-ceph-agent-spx29                     1/1     Running     0          17m     10.0.0.82    kube-node1    <none>           <none>
rook-ceph-system   rook-ceph-operator-57547fc866-ltp8z       1/1     Running     0          18m     10.244.2.4   kube-node1    <none>           <none>
rook-ceph-system   rook-discover-brxmt                       1/1     Running     0          17m     10.244.2.5   kube-node1    <none>           <none>
rook-ceph-system   rook-discover-hl748                       1/1     Running     0          17m     10.244.1.8   kube-node2    <none>           <none>
rook-ceph-system   rook-discover-qj5kd                       1/1     Running     0          17m     10.244.0.9   kube-master   <none>           <none>
rook-ceph          rook-ceph-mgr-a-5dbb44d7f8-vzs46          1/1     Running     0          16m     10.0.0.82    kube-node1    <none>           <none>
rook-ceph          rook-ceph-mon-a-5fb9568cb4-gvqln          1/1     Running     0          16m     10.0.0.81    kube-master   <none>           <none>
rook-ceph          rook-ceph-mon-b-b65c555bf-vz7ps           1/1     Running     0          16m     10.0.0.82    kube-node1    <none>           <none>
rook-ceph          rook-ceph-mon-c-69cf744c4d-8g4l6          1/1     Running     0          16m     10.0.0.83    kube-node2    <none>           <none>
rook-ceph          rook-ceph-osd-0-77499f547-d2vjx           1/1     Running     0          15m     10.0.0.81    kube-master   <none>           <none>
rook-ceph          rook-ceph-osd-1-698f76d786-lqn4w          1/1     Running     0          15m     10.0.0.82    kube-node1    <none>           <none>
rook-ceph          rook-ceph-osd-2-558c59d577-wfdlr          1/1     Running     0          15m     10.0.0.83    kube-node2    <none>           <none>
rook-ceph          rook-ceph-osd-prepare-kube-master-p55sw   0/2     Completed   0          15m     10.0.0.81    kube-master   <none>           <none>
rook-ceph          rook-ceph-osd-prepare-kube-node1-q7scn    0/2     Completed   0          15m     10.0.0.82    kube-node1    <none>           <none>
rook-ceph          rook-ceph-osd-prepare-kube-node2-8rm4d    0/2     Completed   0          15m     10.0.0.83    kube-node2    <none>           <none>
rook-ceph          rook-ceph-tools                           1/1     Running     0          3m24s   10.244.1.9   kube-node2    <none>           <none>
# kubectl -n rook-ceph exec -it rook-ceph-tools -- /bin/bash
bash: warning: setlocale: LC_CTYPE: cannot change locale (en_US.UTF-8): No such file or directory
bash: warning: setlocale: LC_COLLATE: cannot change locale (en_US.UTF-8): No such file or directory
bash: warning: setlocale: LC_MESSAGES: cannot change locale (en_US.UTF-8): No such file or directory
bash: warning: setlocale: LC_NUMERIC: cannot change locale (en_US.UTF-8): No such file or directory
bash: warning: setlocale: LC_TIME: cannot change locale (en_US.UTF-8): No such file or directory
[root@rook-ceph-tools /]# ceph status
  cluster:
    id:     b58f2a5c-2fc7-43e7-b410-2d541e78a90e
    health: HEALTH_OK

  services:
    mon: 3 daemons, quorum a,b,c
    mgr: a(active)
    osd: 3 osds: 3 up, 3 in

  data:
    pools:   0 pools, 0 pgs
    objects: 0 objects, 0 B
    usage:   57 GiB used, 49 GiB / 105 GiB avail
    pgs:

[root@rook-ceph-tools /]# exit
exit
At this point we have a fully operational cluster, but is it really using the dedicated networks for OSD public and private traffic? Let's explore that a bit further by first running netstat on any node in the cluster that has an OSD pod running. Since my cluster is small I will show all three nodes below:
[root@kube-master]# netstat -tulpn | grep LISTEN | grep osd
tcp   0   0 192.168.100.81:6800   0.0.0.0:*   LISTEN   29719/ceph-osd
tcp   0   0 192.168.200.81:6800   0.0.0.0:*   LISTEN   29719/ceph-osd
tcp   0   0 192.168.200.81:6801   0.0.0.0:*   LISTEN   29719/ceph-osd
tcp   0   0 192.168.100.81:6801   0.0.0.0:*   LISTEN   29719/ceph-osd

[root@kube-node1]# netstat -tulpn | grep LISTEN | grep osd
tcp   0   0 192.168.100.82:6800   0.0.0.0:*   LISTEN   18770/ceph-osd
tcp   0   0 192.168.100.82:6801   0.0.0.0:*   LISTEN   18770/ceph-osd
tcp   0   0 192.168.200.82:6801   0.0.0.0:*   LISTEN   18770/ceph-osd
tcp   0   0 192.168.200.82:6802   0.0.0.0:*   LISTEN   18770/ceph-osd

[root@kube-node2]# netstat -tulpn | grep LISTEN | grep osd
tcp   0   0 192.168.100.83:6800   0.0.0.0:*   LISTEN   22659/ceph-osd
tcp   0   0 192.168.200.83:6800   0.0.0.0:*   LISTEN   22659/ceph-osd
tcp   0   0 192.168.200.83:6801   0.0.0.0:*   LISTEN   22659/ceph-osd
tcp   0   0 192.168.100.83:6801   0.0.0.0:*   LISTEN   22659/ceph-osd
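If netstat is not installed on your nodes, ss reports the same listeners:

# ss -tlnp | grep ceph-osd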
From the above we can see the OSD processes listening on the corresponding public and private networks we configured in the configmap. Let's confirm further by going back into the toolbox and running a ceph osd dump:
# kubectl -n rook-ceph exec -it rook-ceph-tools -- /bin/bash
bash: warning: setlocale: LC_CTYPE: cannot change locale (en_US.UTF-8): No such file or directory
bash: warning: setlocale: LC_COLLATE: cannot change locale (en_US.UTF-8): No such file or directory
bash: warning: setlocale: LC_MESSAGES: cannot change locale (en_US.UTF-8): No such file or directory
bash: warning: setlocale: LC_NUMERIC: cannot change locale (en_US.UTF-8): No such file or directory
bash: warning: setlocale: LC_TIME: cannot change locale (en_US.UTF-8): No such file or directory
[root@rook-ceph-tools]# ceph osd dump
epoch 14
fsid 05a8b767-e3e8-42aa-b792-69f479c807f7
created 2019-04-02 13:24:24.549423
modified 2019-04-02 13:25:28.441850
flags sortbitwise,recovery_deletes,purged_snapdirs
crush_version 7
full_ratio 0.95
backfillfull_ratio 0.9
nearfull_ratio 0.85
require_min_compat_client jewel
min_compat_client firefly
require_osd_release mimic
max_osd 3
osd.0 up   in  weight 1 up_from 11 up_thru 0 down_at 0 last_clean_interval [0,0) 192.168.200.81:6800/29719 192.168.100.81:6800/29719 192.168.100.81:6801/29719 192.168.200.81:6801/29719 exists,up 2feb0edf-6652-4148-8264-6ba52d04ff80
osd.1 up   in  weight 1 up_from 14 up_thru 0 down_at 0 last_clean_interval [0,0) 192.168.200.82:6801/18770 192.168.100.82:6800/18770 192.168.100.82:6801/18770 192.168.200.82:6802/18770 exists,up f8df61b4-4ac8-4705-9f97-eb09a1cc0d6c
osd.2 up   in  weight 1 up_from 14 up_thru 0 down_at 0 last_clean_interval [0,0) 192.168.200.83:6800/22659 192.168.100.83:6800/22659 192.168.100.83:6801/22659 192.168.200.83:6801/22659 exists,up db555c80-9d81-4662-aed9-4bce1c0d5d78
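If you would rather not read the whole dump, a quick spot-check from inside the toolbox is to extract just the addresses; every address an OSD advertises should fall within the two dedicated subnets:

[root@rook-ceph-tools]# ceph osd dump | grep -Eo '192\.168\.(100|200)\.[0-9]+' | sort | uniq -c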
As you can see, it is fairly straightforward to configure Rook to deploy a Ceph cluster using segmented networks, ensuring the replication traffic runs on a dedicated network and does not interfere with client-facing performance. Hopefully this quick demonstration showed how.