Friday, December 31, 2021

Alternate Appliance Troubleshooting

 


Normally I would not write up an appliance problem.  After all, I have replaced quite a few components across a wide array of appliances, including a stop clutch in a Whirlpool washing machine.  However, this latest experience deserved better documentation because the symptoms can be confused with those of other components, and replacing those first can lead to a lot of extra cost without results.  Before we dive into the symptoms and the fix, let's introduce the appliance in question.  In my case it was a Whirlpool Gold Series dishwasher (WDF750SAYM3); however, the following will most likely apply to any Whirlpool dishwasher.

The problem started a few months ago with an undissolved soap packet after a completed cycle.  I didn't think much of it and carried on.  Then on another cycle I never heard the water spraying inside the dishwasher.  The washer would fill and drain but never engage the spray to actually wash the dishes.  At this point I was starting to wonder what was going on, so I did a little research and found how to run a diagnostic cycle on the dishwasher.  This involves pressing any three keys (except Start, Delay or Cancel) in the sequence 1-2-3-1-2-3-1-2-3, with no more than one second between key presses.  If a problem is found, the dishwasher may display an error code by flashing the Clean LED in two sequences: it flashes a number of times, pauses, and then flashes a number of times again.  Counting the flashes in each sequence gives a two-digit error code.

However, upon running the diagnostics I only got a code showing the water was too cold, which makes sense because the run from my hot water heater is quite long and unless I first run the hot water at the sink the initial water will be cool.  With the diagnostics not showing any other issues I started looking for an answer online.  Most of the information I found pointed to a bad spray pump or a controller board issue.  I did not think it was either of those, because on some days the dishwasher worked normally and on other days it was problematic.

That was when I stumbled across a post indicating that this particular model of Whirlpool dishwasher had a bad latch design and that the latch mechanism has no test in diagnostic mode.  I thought I might be onto something, so I replaced the latch with a new redesigned part.  The dishwasher seemed to be working.

The success, however, was short-lived, and if anything the failures were becoming more frequent.  In observing the dishwasher I found that a run would fail if, during the first fill, the spraying action did not start before the water shut off.  So I would hit Cancel and Start again, and sometimes it would eventually work.  I also found that if the water was hot at the start, the chances of a successful wash went up.  Again, when the dishwasher worked it worked just fine, so I was still ruling out a spray pump or controller board issue.  If either were truly bad I would expect my dishes to come out dirty, and when the dishwasher worked they were clean.

Again I went back to researching on the internet and came across a conversation about the turbidity sensor (sometimes referred to as OWI) in Whirlpool dishwashers.  So what does this sensor do?  As the soil level increases, the amount of transmitted light decreases. The turbidity sensor measures the amount of transmitted light to determine the turbidity of the wash water. These turbidity measurements are supplied to the dishwasher controller board, which makes decisions on how long to wash in all the cycles.  However this is only part of the story because this sensor also has a thermistor built into it as well which monitors water temperature.  The temperature monitoring is key because as I stated earlier my dishwasher seemed to have better success when the water was very hot coming into the dishwasher.

With my newfound information I proceeded to test my turbidity sensor.  With the power supply to the dishwasher turned off, the turbidity sensor can be tested from the main controller board at connection P12, from the wire at pin 1 to the wire at pin 3.  The resistance should measure between 46 kΩ and 52 kΩ at room temperature.  My resistance, however, was not within specification, so I knew I had found the source of my problem.
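
For anyone repeating this check with a multimeter, the pass/fail decision can be sketched as a tiny shell helper.  The 46 to 52 kΩ window is the one quoted above; the function name and the sample readings are made up for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical helper: the 46-52 kOhm window comes from the text above;
# the sample readings below are invented for illustration.
thermistor_in_spec() {
  local ohms=$1
  if [ "$ohms" -ge 46000 ] && [ "$ohms" -le 52000 ]; then
    echo "in spec"
  else
    echo "out of spec"
  fi
}

thermistor_in_spec 48500   # a reading inside the window
thermistor_in_spec 12000   # a reading outside the window
```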

I went ahead and ordered my replacement sensor and when it arrived I used the following video to guide me through replacing the sensor:


Once the sensor was replaced I needed to run another diagnostic since that is what Whirlpool recommends when replacing the turbidity sensor.  Once that was complete I tested out the dishwasher over the course of a few days running multiple loads per day.   Every cycle was successful so I could finally declare success.   I should note however that when I was replacing the sensor I noticed my water supply line was corroded and slightly leaking but I will save that story for another day.








Friday, December 17, 2021

ETCD: Where is my Memory?

 


A colleague recently approached me about some cyclical etcd memory usage on their OpenShift clusters.  The pattern appeared to be a “sawtooth” or “run and jump” pattern when looking at the etcd memory utilization graphs.  The pattern happened every two hours where over the course of the two hours memory usage would gradually increase and then roughly at the two hour mark would abruptly drop back down to a more baseline level before repeating.  My colleague wanted to understand why this behavior was occurring and what was causing the memory to be freed.  In order to answer this question we first need to explore a little more about etcd and what things impact memory utilization and allow for free pages to be returned.


Etcd  can be summarized as a distributed key-value data store in OpenShift designed to be highly available and strongly consistent for distributed systems. OpenShift uses etcd to store all of its persistent cluster data, such as configs and metadata, allowing OpenShift services to remain scalable and stateless.

Etcd’s datastore is built on top of bbolt, a fork of BoltDB.  Bolt is a key-value store that writes its data into a single memory-mapped file, which lets the underlying operating system handle how data is cached and how much of the file remains in memory.  The underlying data structure for Bolt is a B+ tree consisting of 4 KB pages that are allocated as they are needed.  It should be noted that Bolt is very good with sequential writes but weak with random writes.  This will make more sense further in this discussion.


Along with Bolt in etcd is a protocol called Raft which is a consensus algorithm that is designed to be easy to understand and provide a way to distribute a state machine across a cluster of distributed systems.  Consensus, which involves a simple majority of servers agreeing on values, can be thought of as a highly available replication log between the nodes running etcd in the OpenShift cluster.  Raft works by electing a leader and then forcing all write requests to go to the leader.  Changes are then replicated from the leader to the other nodes in the etcd cluster.  If by chance the leader node goes offline due to maintenance or failure Raft will hold another election for a leader.


Etcd uses multiversion concurrency control (MVCC) to handle concurrent operations from different clients.  This ties into the Raft protocol, as each version in MVCC relates to an index in the Raft log.  Etcd manages changes by revisions, and thus every transaction made to etcd is a new revision.  By keeping a history of revisions, etcd is able to provide the version history for specific keys.  These keys are in turn associated with their revision numbers along with their new values.  It's this key writing scheme that enables etcd to make all writes sequential, which reduces its exposure to Bolt's weakness with random writes noted above.

As discussed above, etcd's use of revisions and key history enables useful features for a key or set of keys.  However, etcd's revisions can grow very large on a cluster and consume a lot of memory and disk.  Even if a large number of keys are deleted from the etcd cluster, the space will continue to grow since the prior history for those keys still exists.  This is where the concept of compaction comes into play.  Compaction in etcd drops all revisions older than the revision being compacted to.  These compactions are just deletions in Bolt, but they do remove keys from memory, which frees up memory.  However, if those keys have also been written to disk, the disk space will not be freed until a defrag reclaims it.
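
A toy model may help make compaction concrete.  The sketch below is not etcd code: it treats the history as made-up lines of "revision key value" and uses awk to keep every entry newer than the compaction revision plus, as I understand etcd's behavior, the newest entry for each key at or below that revision so the key can still be read.  The function name and data are assumptions for illustration only.

```shell
#!/usr/bin/env bash
# Toy model only: each line is "<revision> <key> <value>" (made-up data).
history='1 foo a
2 bar x
3 foo b
4 foo c
5 bar y'

# compact <revision>: keep every entry newer than <revision>, plus the
# newest entry at or below <revision> for each key.
compact() {
  printf '%s\n' "$history" | awk -v rev="$1" '
    { lines[NR] = $0; r[NR] = $1 + 0; k[NR] = $2 }
    ($1 + 0) <= (rev + 0) && ($1 + 0) > (newest[$2] + 0) { newest[$2] = $1 + 0 }
    END {
      for (i = 1; i <= NR; i++)
        if (r[i] > rev + 0 || r[i] == newest[k[i]] + 0) print lines[i]
    }'
}

compact 4
```

Compacting to revision 4 leaves three of the five entries: the two superseded revisions of foo are gone, while the latest value of each key survives.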

Circling back to my colleague's problem, I initially thought maybe a compaction job every two hours was the cause of his “sawtooth” graph of memory usage.  However it was confirmed that his compaction job was configured to run every 5 minutes.  This obviously did not correlate to the behavior we were seeing in the graphs.

Then I recalled that, besides storing configs and metadata, etcd also stores events from the cluster.  These events are stored as key-value pairs with revisions, just as described above, although events would most likely never have new revisions because each event is a unique key-value pair.  Every cluster event has an event-ttl assigned to it.  The event-ttl is just what one would imagine: a time to live before the event is removed or aged out.  The thought was that maybe we had a persistent grouping of events that would age out over the time frame we were seeing in the memory usage pattern.  However, upon investigating further we found the event-ttl was set to three hours.  Given our pattern repeated every two hours, we abandoned looking any further at that option.

Then, as I was looking through documentation about etcd, I recalled that Raft, with all of its responsibilities in etcd, also does a form of compaction.  Recall from above that Raft has a log containing indexes, and that log happens to be memory resident.  In etcd there is a configuration option called snapshot-count which controls the number of applied Raft entries to hold in memory before compaction executes.  In versions of etcd before v3.2 that count was 10,000, but in v3.2 or greater the value has been set to 100,000, ten times the number of entries.  When the snapshot count is reached on the leader, the snapshot data is persisted to disk and the old log is truncated.  If a slow follower requests logs from before a compacted index, the leader sends a snapshot and the follower simply overwrites its state.  This was exactly the explanation for the behavior we were seeing: memory climbed as Raft entries accumulated and dropped when the log was truncated, which on this cluster appears to have happened roughly every two hours.
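
As a rough sanity check on the two hour period, the time between Raft snapshots is simply the snapshot count divided by the rate at which entries are applied.  The sketch below uses an assumed rate of 14 applied entries per second; that number is hypothetical and was not measured on the cluster in question.

```shell
#!/usr/bin/env bash
# Illustrative arithmetic only: the write rate is an assumed number,
# not a measurement from the cluster discussed above.

# seconds_until_snapshot <snapshot_count> <applied_entries_per_second>
seconds_until_snapshot() {
  echo $(( $1 / $2 ))
}

seconds_until_snapshot 100000 14   # v3.2+ default: 7142 seconds, just under two hours
seconds_until_snapshot 10000 14    # pre-v3.2 default: 714 seconds, about twelve minutes
```

At that assumed rate the larger default lines up with a sawtooth of about two hours, while the older default would have produced a much tighter one.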

Hopefully this walk through provided some details on how etcd works and how memory is impacted on a running cluster.  To read further on any of the topics feel free to explore these links:

Thursday, December 02, 2021

The Lowdown on Downward API in OpenShift

 


A customer approached me recently with a use case where they needed to have the OpenShift container know the hostname of the node it was running on.  They had found that the normal hostname file on Red Hat CoreOS was not on the node so they were not certain how they could derive the hostname value when they launched the custom daemonset they built.  Enter the downward API in OpenShift.

The downward API is an implementation that allows containers to consume information about API objects without integrating via the OpenShift API. Such information includes items like the pod’s name, namespace, and resource values. Containers can consume information from the downward API using environment variables or a volume file.

Let's go ahead and demonstrate the capabilities of the downward API with a simple example of how it can be used.  First let's create the following downward-secret.yaml file which will be used in our demonstration.  The secret file is just a basic secret, nothing exciting:

$ cat << EOF > downward-secret.yaml
apiVersion: v1
kind: Secret
metadata:
  name: downwardsecret
data:
  password: cGFzc3dvcmQ=
  username: ZGV2ZWxvcGVy
type: kubernetes.io/basic-auth
EOF
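
The values under data in a Secret are base64 encoded.  As a quick check of what the two strings above contain, they can be decoded locally; decode_secret_value is a hypothetical helper name, and the sketch assumes a base64 utility that accepts --decode.

```shell
#!/usr/bin/env bash
# The encoded strings are copied from downward-secret.yaml above;
# decode_secret_value is a hypothetical helper name.
decode_secret_value() {
  printf '%s' "$1" | base64 --decode
}

decode_secret_value 'ZGV2ZWxvcGVy'; echo   # the username value
decode_secret_value 'cGFzc3dvcmQ='; echo   # the password value
```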

Now let's create the secret on the OpenShift cluster:

$ oc create -f downward-secret.yaml
secret/downwardsecret created

Next let's create the following downward-pod.yaml file:

$ cat << EOF > downward-pod.yaml
apiVersion: v1
kind: Pod
metadata:
  name: downward-pod
spec:
  containers:
    - name: busybox-container
      image: k8s.gcr.io/busybox
      command: [ "sh", "-c"]
      args:
      - while true; do
          echo -en '\n';
          printenv NODENAME HOSTIP SERVICEACCT NAMESPACE;
          printenv DOWNWARD_SECRET;
          sleep 10;
        done;
      resources:
        requests:
          memory: "32Mi"
          cpu: "125m"
        limits:
          memory: "64Mi"
          cpu: "250m"
      volumeMounts:
        - name: downwardinfo
          mountPath: /etc/downwardinfo
          readOnly: false
          
      env:
        - name: NODENAME
          valueFrom:
            fieldRef:
              fieldPath: spec.nodeName
        - name: HOSTIP
          valueFrom:
            fieldRef:
              fieldPath: status.hostIP
        - name: SERVICEACCT
          valueFrom:
            fieldRef:
              fieldPath: spec.serviceAccountName
        - name: NAMESPACE
          valueFrom:
            fieldRef:
              fieldPath: metadata.namespace
        - name: DOWNWARD_SECRET
          valueFrom:
            secretKeyRef:
              name: downwardsecret
              key: username
  volumes:
    - name: downwardinfo
      downwardAPI:
        items:
          - path: "cpu_limit"
            resourceFieldRef:
              containerName: busybox-container
              resource: limits.cpu
          - path: "cpu_request"
            resourceFieldRef:
              containerName: busybox-container
              resource: requests.cpu
          - path: "mem_limit"
            resourceFieldRef:
              containerName: busybox-container
              resource: limits.memory
          - path: "mem_request"
            resourceFieldRef:
              containerName: busybox-container
              resource: requests.memory
EOF

Let's quickly walk through that file.  It creates a pod called downward-pod and inside it runs a container called busybox-container using the busybox image.  The container's command loop prints the downward API environment variables every ten seconds.

Under the container section we also defined some resources and added a volume mount.  The volume mount is used to mount our downward API volume files, which will contain the resource values we defined.  Those files get mounted under the path /etc/downwardinfo inside the container:

      resources:
        requests:
          memory: "32Mi"
          cpu: "125m"
        limits:
          memory: "64Mi"
          cpu: "250m"
      volumeMounts:
        - name: downwardinfo
          mountPath: /etc/downwardinfo
          readOnly: false

Next there is a section where we defined some environment variables that reference some additional downward API values.  There is also a variable that references the downwardsecret.  All of these variables will get passed into the container to be consumed by whatever processes require them:

      env:
        - name: NODENAME
          valueFrom:
            fieldRef:
              fieldPath: spec.nodeName
        - name: HOSTIP
          valueFrom:
            fieldRef:
              fieldPath: status.hostIP
        - name: SERVICEACCT
          valueFrom:
            fieldRef:
              fieldPath: spec.serviceAccountName
        - name: NAMESPACE
          valueFrom:
            fieldRef:
              fieldPath: metadata.namespace
        - name: DOWNWARD_SECRET
          valueFrom:
            secretKeyRef:
              name: downwardsecret
              key: username

And finally there is a volumes section which defines the filename and the resource value field for the downwardinfo files that we want to pass into the container:

  volumes:
    - name: downwardinfo
      downwardAPI:
        items:
          - path: "cpu_limit"
            resourceFieldRef:
              containerName: busybox-container
              resource: limits.cpu
          - path: "cpu_request"
            resourceFieldRef:
              containerName: busybox-container
              resource: requests.cpu
          - path: "mem_limit"
            resourceFieldRef:
              containerName: busybox-container
              resource: limits.memory
          - path: "mem_request"
            resourceFieldRef:
              containerName: busybox-container
              resource: requests.memory


Now that we have an idea of what the downward-pod.yaml does, let's go ahead and run the pod:

$ oc create -f downward-pod.yaml 
pod/downward-pod created
$ oc get pod
NAME           READY   STATUS    RESTARTS   AGE
downward-pod   1/1     Running   0          6s

With the pod running we can now validate the downward API variables and volume files we set.  First let's look at the pod log and see if the variables we defined and printed in our argument loop show the right values:

$ oc logs downward-pod

master-0.kni20.schmaustech.com
192.168.0.210
default
default
developer

master-0.kni20.schmaustech.com
192.168.0.210
default
default
developer


The variables look to be populated correctly with the right hostname, host IP address, service account and namespace.  Even the username from our secret shows up correctly as developer.  Since that looks correct, let's move on and execute a shell in the pod:

$ oc exec -it downward-pod sh
kubectl exec [POD] [COMMAND] is DEPRECATED and will be removed in a future version. Use kubectl exec [POD] -- [COMMAND] instead.
/ # 

Once inside, let's print the environment and see if our variables are listed there as well:

/ # printenv
KUBERNETES_PORT=tcp://172.30.0.1:443
KUBERNETES_SERVICE_PORT=443
HOSTNAME=downward-pod
SHLVL=1
HOME=/root
TERM=xterm
KUBERNETES_PORT_443_TCP_ADDR=172.30.0.1
PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin
KUBERNETES_PORT_443_TCP_PORT=443
KUBERNETES_PORT_443_TCP_PROTO=tcp
HOSTIP=192.168.0.210
DOWNWARD_SECRET=developer
NAMESPACE=default
KUBERNETES_PORT_443_TCP=tcp://172.30.0.1:443
KUBERNETES_SERVICE_PORT_HTTPS=443
PWD=/
KUBERNETES_SERVICE_HOST=172.30.0.1
SERVICEACCT=default
NSS_SDB_USE_CACHE=no
NODENAME=master-0.kni20.schmaustech.com

Again the environment variables we defined are showing up and could be consumed by a process within the container. 

Now let's explore our volume files and confirm they too were set.  We can see that the /etc/downwardinfo directory and four files exist:

/ # ls /etc/downwardinfo
cpu_limit    cpu_request  mem_limit    mem_request

Let's look at the contents of the four files:

/ # echo "$(cat /etc/downwardinfo/cpu_limit)"
1
/ # echo "$(cat /etc/downwardinfo/cpu_request)"
1
/ # echo "$(cat /etc/downwardinfo/mem_limit)"
67108864
/ # echo "$(cat /etc/downwardinfo/mem_request)"
33554432


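The values in the files correspond to the resource values we defined in the downward-pod.yaml file that launched this pod.  The memory values are reported in bytes, and the CPU values show 1 rather than 250m and 125m because, with no divisor set, the downward API rounds fractional cores up to the next whole core.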
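
The arithmetic behind those four values can be sketched in a few lines of shell.  This is only an illustration of the unit conversion, assuming the default divisor of 1 for each resourceFieldRef since none is set in downward-pod.yaml; the function names are made up.

```shell
#!/usr/bin/env bash
# Sketch of the arithmetic, assuming the default divisor of 1 for
# each resourceFieldRef (none is set in downward-pod.yaml).

# Mebibytes to bytes, e.g. "64Mi" is 64 * 1024 * 1024 bytes.
mib_to_bytes() {
  echo $(( $1 * 1024 * 1024 ))
}

# Millicores to whole cores, rounded up (ceiling division).
millicores_to_cores() {
  echo $(( ($1 + 999) / 1000 ))
}

mib_to_bytes 64           # mem_limit
mib_to_bytes 32           # mem_request
millicores_to_cores 250   # cpu_limit
millicores_to_cores 125   # cpu_request
```

If the exact millicore values are needed inside the container, a divisor of 1m can be added to the CPU items in the downwardAPI volume so the files report 250 and 125 instead.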

At this point we have validated that the downward API does indeed provide information into the pod and can present it either as an environment variable or a volume file.  So if anyone ever asks how to get the hostname of the node the pod is running on as an environment variable inside the pod, just keep the downward API in mind.

Thursday, October 28, 2021

Cluster Infrastructure Management with Red Hat Advanced Cluster Management for Kubernetes

 


In Red Hat Advanced Cluster Management for Kubernetes 2.4 there is a new component in technology preview called central infrastructure management.  This component provides separate interfaces for an infrastructure administrator and a cluster creator.  From the infrastructure admin perspective, it allows for management of on-premises compute resources across different data centers and/or locations.  Once those compute resources have been identified, it allows cluster creators, who might be part of a different DevOps team, to consume and allocate those resources for new OpenShift clusters.  The following video demonstrates a walkthrough of that process:



Monday, September 20, 2021

Deploy Single Node OpenShift via OpenShift Installer on Nvidia Jetson AGX


In a previous blog I walked through a disconnected single node OpenShift deployment using the OpenShift installer.  In this blog I will use a lot of the same steps, but instead of installing on an x86 system we will try our hand at installing on an Nvidia Jetson AGX, which contains an Arm processor.

Before we begin, let's cover what this blog assumes already exists as prerequisites:
  • Podman, the oc binary and the openshift-install binary already exist on the system
  • A disconnected registry is already configured and has the mirrored aarch64 contents of the images for a given OpenShift release.   
  • A physical Nvidia Jetson AGX with UEFI firmware and the ability to boot an ISO image from USB
  • DNS entries for basic baremetal IPI requirements exist. My environment is below:
master-0.kni7.schmaustech.com IN A 192.168.0.47
*.apps.kni7.schmaustech.com IN A 192.168.0.47
api.kni7.schmaustech.com IN A 192.168.0.47
api-int.kni7.schmaustech.com   IN A 192.168.0.47

First let's verify the version of OpenShift we will be deploying by looking at the output of oc version and openshift-install version:


$ oc version
Client Version: 4.9.0-rc.1
$ ./openshift-install version
./openshift-install 4.9.0-rc.1
built from commit 6b4296b0df51096b4ff03e4ec4aeedeead3425ab
release image quay.io/openshift-release-dev/ocp-release@sha256:2cce76f4dc2400d3c374f76ac0aa4e481579fce293e732f0b27775b7218f2c8d
release architecture amd64

While it looks like we will be deploying 4.9.0-rc.1, we will technically be deploying 4.9.0-rc.2 for aarch64.  We will set an image override for aarch64 4.9.0-rc.2 a little further along in the process.  Before that, though, ensure the disconnected registry being used has the 4.9.0-rc.2 images mirrored.  If not, use a procedure like the one in one of my previous blogs to mirror the 4.9.0-rc.2 images.

Now let's pull down a few files we will need for our deployment ISO.  We need both the coreos-installer and the RHCOS live ISO:

$ wget https://mirror.openshift.com/pub/openshift-v4/clients/coreos-installer/v0.8.0-3/coreos-installer
--2021-09-16 10:10:26--  https://mirror.openshift.com/pub/openshift-v4/clients/coreos-installer/v0.8.0-3/coreos-installer
Resolving mirror.openshift.com (mirror.openshift.com)... 54.172.173.155, 54.173.18.88
Connecting to mirror.openshift.com (mirror.openshift.com)|54.172.173.155|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 7649968 (7.3M)
Saving to: ‘coreos-installer’

coreos-installer                                     100%[=====================================================================================================================>]   7.29M  8.83MB/s    in 0.8s    

2021-09-16 10:10:27 (8.83 MB/s) - ‘coreos-installer’ saved [7649968/7649968]

$ wget https://mirror.openshift.com/pub/openshift-v4/aarch64/dependencies/rhcos/pre-release/4.9.0-rc.2/rhcos-live.aarch64.iso
--2021-09-16 10:10:40--  https://mirror.openshift.com/pub/openshift-v4/aarch64/dependencies/rhcos/pre-release/4.9.0-rc.2/rhcos-live.aarch64.iso
Resolving mirror.openshift.com (mirror.openshift.com)... 54.172.173.155, 54.173.18.88
Connecting to mirror.openshift.com (mirror.openshift.com)|54.172.173.155|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 1031798784 (984M) [application/octet-stream]
Saving to: ‘rhcos-live.aarch64.iso’

rhcos-live.aarch64.iso                   100%[=====================================================================================================================>] 984.00M  11.2MB/s    in 93s     

2021-09-16 10:12:13 (10.6 MB/s) - ‘rhcos-live.aarch64.iso’ saved [1031798784/1031798784]


Set the execution bit on the coreos-installer which is a utility to embed the ignition file we will generate:

$ chmod 755 coreos-installer

Let's go ahead now and create an install-config.yaml for our single node deployment.  Notice some of the differences in this install-config.yaml: we have no worker nodes defined, one master node defined, and a BootstrapInPlace section which tells the installer to use the nvme0n1 device in the node.  We also have imageContentSources, which tells the installer to use the local registry mirror I have already preconfigured.

$ cat << EOF > install-config.yaml
apiVersion: v1beta4
baseDomain: schmaustech.com
metadata:
  name: kni7
networking:
  networkType: OpenShiftSDN
  machineCIDR: 192.168.0.0/24
compute:
- name: worker
  replicas: 0
controlPlane:
  name: master
  replicas: 1
platform:
  none: {}
BootstrapInPlace:
  InstallationDisk: /dev/nvme0n1
pullSecret: '{ "auths": { "rhel8-ocp-auto.schmaustech.com:5000": {"auth": "ZHVtbXk6ZHVtbXk=","email": "bschmaus@schmaustech.com" } } }'
sshKey: 'ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAACAQDP+5QkRCiuhsYItXj7DzLcOIs2RbCgpMzDtPlt/hfLnDkLGozYIFapMp+o4l+6ornbZ3L+hYE0T8SyvyYVWfm1XpPcVgUIW6qp7yfEyTSRhpGnoY74PD33FIf6BtU2HoFLWjQcE6OrQOF0wijI3fgL0jSzvAxvYoXU/huMx/kI2jBcWEq5cADRfvpeYXhVEJLrIIOepoAZE1syaPT7jQEoLDfvxrDZPKObCOI2vzLiAQXI7gK1uc9YDb6IEA/4Ik4eV2R1+VCgKhgk5RUqn69+8a1o783g1tChKuLwA4K9lyEAbFBwlHMctfNOLeC1w+bYpDXH/3GydcYfq79/18dVd+xEUlzzC+2/qycWG36C1MxUZa2fXvSRWLnpkLcxtIes4MikFeIr3jkJlFUzITigzvFrKa2IKaJzQ53WsE++LVnKJfcFNLtWfdEOZMowG/KtgzSSac/iVEJRM2YTIJsQsqhhI4PTrqVlUy/NwcXOFfUF/NkF2deeUZ21Cdn+bKZDKtFu2x+ujyAWZKNq570YaFT3a4TrL6WmE9kdHnJOXYR61Tiq/1fU+y0fv1d0f1cYr4+mNRCGIZoQOgJraF7/YluLB23INkJgtbah/0t1xzSsQ59gzFhRlLkW9gQDekj2tOGJmZIuYCnTXGiqXHnri2yAPexgRiaFjoM3GCpsWw== bschmaus@bschmaus.remote.csb'
imageContentSources:
- mirrors:
  - rhel8-ocp-auto.schmaustech.com:5000/ocp4/openshift4
  source: quay.io/openshift-release-dev/ocp-release
- mirrors:
  - rhel8-ocp-auto.schmaustech.com:5000/ocp4/openshift4
  source: quay.io/openshift-release-dev/ocp-v4.0-art-dev
additionalTrustBundle: |
  -----BEGIN CERTIFICATE-----
  MIIF7zCCA9egAwIBAgIUeecEs+U5psgJ0aFgc4Q5dGVrAFcwDQYJKoZIhvcNAQEL
  BQAwgYYxCzAJBgNVBAYTAlVTMRYwFAYDVQQIDA1Ob3J0aENhcm9saW5hMRAwDgYD
  VQQHDAdSYWxlaWdoMRAwDgYDVQQKDAdSZWQgSGF0MRIwEAYDVQQLDAlNYXJrZXRp
  bmcxJzAlBgNVBAMMHnJoZWw4LW9jcC1hdXRvLnNjaG1hdXN0ZWNoLmNvbTAeFw0y
  MTA2MDkxMDM5MDZaFw0yMjA2MDkxMDM5MDZaMIGGMQswCQYDVQQGEwJVUzEWMBQG
  A1UECAwNTm9ydGhDYXJvbGluYTEQMA4GA1UEBwwHUmFsZWlnaDEQMA4GA1UECgwH
  UmVkIEhhdDESMBAGA1UECwwJTWFya2V0aW5nMScwJQYDVQQDDB5yaGVsOC1vY3At
  YXV0by5zY2htYXVzdGVjaC5jb20wggIiMA0GCSqGSIb3DQEBAQUAA4ICDwAwggIK
  AoICAQC9exAg3Ie3N3mkrQKseyri1VP2IPTc+pUEiVCPisIQAhRUfHhPR1HT7EF7
  SwaxrWjpfh9aYBPDEF3uLFQvzDEJWCh5PF55jwn3aABFGKEhfVBKd+es6nXnYaCS
  8CgLS2qM9x4WiuZxrntfB16JrjP+CrTvlAbE4DIMlDQLgh8+hDw9VPlbzY+MI+WC
  cYues1Ne+JZ5dZcKmCZ3zrVToPjreWZUuhSygci2xIQZxwWNmTvAgi+CAiQZS7VF
  RmKjj2H/o/d3I+XSS2261I8aXCAw4/3vaM9aci0eHeEhLIMrhv86WycOjcYL1Z6R
  n55diwDTSyrTo/B4zsQbmYUc8rP+pR2fyRJEGFVJ4ejcj2ZF5EbgUKupyU2gh/qt
  QeYtJ+6uAr9S5iQIcq9qvD9nhAtm3DnBb065X4jVPl2YL4zsbOS1gjoa6dRbFuvu
  f3SdsbQRF/YJWY/7j6cUaueCQOlXZRNhbQQHdIdBWFObw0QyyYtI831ue1MHPG0C
  nsAriPOkRzBBq+BPmS9CqcRDGqh+nd9m9UPVDoBshwaziSqaIK2hvfCAVb3BPJES
  CXKuIaP2IRzTjse58aAzsRW3W+4e/v9fwAOaE8nS7i+v8wrqcFgJ489HnVq+kRNc
  VImv5dBKg2frzXs1PpnWkE4u2VJagKn9B2zva2miRQ+LyvLLDwIDAQABo1MwUTAd
  BgNVHQ4EFgQUbcE9mpTkOK2ypIrURf+xYR08OAAwHwYDVR0jBBgwFoAUbcE9mpTk
  OK2ypIrURf+xYR08OAAwDwYDVR0TAQH/BAUwAwEB/zANBgkqhkiG9w0BAQsFAAOC
  AgEANTjx04NoiIyw9DyvszwRdrSGPO3dy1gk3jh+Du6Dpqqku3Mwr2ktaSCimeZS
  4zY4S5mRCgZRwDKu19z0tMwbVDyzHPFJx+wqBpZKkD1FvOPKjKLewtiW2z8AP/kF
  gl5UUNuwvGhOizazbvd1faQ8jMYoZKifM8On6IpFgqXCx98/GOWvnjn2t8YkMN3x
  blKVm5N7eGy9LeiGRoiCJqcyfGqdAdg+Z+J94AHEZb3OxG8uHLrtmz0BF3A+8V2H
  hofYI0spx5y9OcPin2yLm9DeCwWAA7maqdImBG/QpQCjcPW3Yzz9VytIMajPdnvd
  vbJF5GZNj7ods1AykCCJjGy6n9WCf3a4VLnZWtUTbtz0nrIjJjsdlXZqby5BCF0G
  iqWbg0j8onl6kmbMAhssRTlvL8w90F1IK3Hk+lz0Qy8rqZX2ohObtEYGMIAOdFJ1
  iPQrbksXOBpZNtm1VAved41sYt1txS2WZQgfklIXOhNOu4r32ZGKas4EJml0l0wL
  2P65PkPEa7AOeqwP0y6eGoNG9qFSl+yArycZGWudp88977H6CcdkdEcQzmjg5+TD
  9GHm3drUYGqBJDvIemQaXfnwy9Gxx+oBDpXLXOuo+edK812zh/q7s2FELfH5ZieE
  Q3dIH8UGsnjYxv8G3O23cYKZ1U0iiu9QvPRFm0F8JuFZqLQ=
  -----END CERTIFICATE-----
EOF

Before we can create the ignition file from the install-config.yaml we need to set the image release override variable.  We do this because all of this work is being done on an x86 host but we are trying to generate an ignition file for an aarch64 host.  To set the image release override we simply curl the aarch64 4.9.0-rc.2 release text and grab the quay release line:

$ export OPENSHIFT_INSTALL_RELEASE_IMAGE_OVERRIDE=$(curl -s https://mirror.openshift.com/pub/openshift-v4/aarch64/clients/ocp/4.9.0-rc.2/release.txt| grep 'Pull From: quay.io' | awk -F ' ' '{print $3}' | xargs)
$ echo $OPENSHIFT_INSTALL_RELEASE_IMAGE_OVERRIDE
quay.io/openshift-release-dev/ocp-release@sha256:edd47e590c6320b158a6a4894ca804618d3b1e774988c89cd988e8a841cb5f3c
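
To see what that pipeline is extracting without hitting the mirror, the same match can be run against a small sample.  The sample below is a made-up stand-in for release.txt (only the "Pull From:" line matters here, and its digest is the one echoed above), and extract_release_image is a hypothetical helper that folds the grep and awk steps into a single awk call.

```shell
#!/usr/bin/env bash
# The sample text is an invented stand-in for release.txt; the digest is
# the one echoed above. extract_release_image is a hypothetical helper.
extract_release_image() {
  awk '/Pull From: quay\.io/ { print $3 }'
}

sample='Name:      4.9.0-rc.2
Pull From: quay.io/openshift-release-dev/ocp-release@sha256:edd47e590c6320b158a6a4894ca804618d3b1e774988c89cd988e8a841cb5f3c'

printf '%s\n' "$sample" | extract_release_image
```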

Once we have the install-config.yaml and the image release override variable set, we can use the openshift-install binary to generate a single node OpenShift ignition config:

$ ./openshift-install --dir=./ create single-node-ignition-config
INFO Consuming Install Config from target directory 
WARNING Making control-plane schedulable by setting MastersSchedulable to true for Scheduler cluster settings 
WARNING Found override for release image. Please be warned, this is not advised 
INFO Single-Node-Ignition-Config created in: . and auth 
$ ls -lart
total 1017468
-rwxr-xr-x.  1 bschmaus bschmaus    7649968 Apr 27 00:49 coreos-installer
-rw-rw-r--.  1 bschmaus bschmaus 1031798784 Jul 22 13:10 rhcos-live.aarch64.iso
-rw-r--r--.  1 bschmaus bschmaus       3667 Sep 15 10:35 install-config.yaml.save
drwx------. 27 bschmaus bschmaus       8192 Sep 15 10:39 ..
drwxr-x---.  2 bschmaus bschmaus         50 Sep 15 10:45 auth
-rw-r-----.  1 bschmaus bschmaus     284253 Sep 15 10:45 bootstrap-in-place-for-live-iso.ign
-rw-r-----.  1 bschmaus bschmaus    1865601 Sep 15 10:45 .openshift_install_state.json
-rw-rw-r--.  1 bschmaus bschmaus     213442 Sep 15 10:45 .openshift_install.log
-rw-r-----.  1 bschmaus bschmaus         98 Sep 15 10:45 metadata.json
drwxrwxr-x.  3 bschmaus bschmaus        247 Sep 15 10:45 .

Now let's take the bootstrap-in-place-for-live-iso.ign config we generated and use the coreos-installer to embed it into the rhcos live iso image.  There will be no output upon completion so I usually echo $? to confirm it ended with a good exit status.

$ ./coreos-installer iso ignition embed -fi bootstrap-in-place-for-live-iso.ign rhcos-live.aarch64.iso
$ echo $?
0
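
Since the embed step is silent, the exit status check can be folded into a tiny wrapper.  A minimal sketch, where run_checked is a name I made up for illustration:

```shell
#!/usr/bin/env bash
# Hypothetical wrapper: run a command and report its exit status, which is
# handy for tools that print nothing on success.
run_checked() {
  if "$@"; then
    echo "OK: $*"
  else
    local rc=$?
    echo "FAILED (exit $rc): $*" >&2
    return "$rc"
  fi
}
```

For example: run_checked ./coreos-installer iso ignition embed -fi bootstrap-in-place-for-live-iso.ign rhcos-live.aarch64.iso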

Now that the rhcos live iso image has the ignition file embedded we can write the image to a USB device: 

$ sudo dd if=./rhcos-live.aarch64.iso of=/dev/sda bs=8M status=progress oflag=direct
[sudo] password for bschmaus: 
948783104 bytes (949 MB, 905 MiB) copied, 216 s, 4.4 MB/s
113+1 records in
113+1 records out
948783104 bytes (949 MB, 905 MiB) copied, 215.922 s, 4.4 MB/s
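
If I want extra assurance that the write succeeded, I can compare the device against the image.  The sketch below assumes GNU stat and GNU cmp, and verify_written_image is a hypothetical name.  Because the USB device is larger than the ISO, only the first N bytes are compared, where N is the image size.

```shell
#!/usr/bin/env bash
# Hypothetical helper: compare the first N bytes of TARGET with IMAGE,
# where N is the size of IMAGE. Assumes GNU stat (-c) and GNU cmp (-n).
verify_written_image() {
  local image=$1 target=$2 size
  size=$(stat -c %s "$image") || return 1
  cmp -n "$size" "$image" "$target"
}
```

Reading the raw device normally needs root, so the comparison would be run under sudo, for example sudo cmp -n "$(stat -c %s rhcos-live.aarch64.iso)" rhcos-live.aarch64.iso /dev/sda.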

Once the USB device is written, connect it to the Nvidia Jetson AGX and boot from it.  Keep in mind that during the first boot of the Jetson I had to hit the ESC key to get access to the device manager to tell it to boot from the ISO.  Then when the system rebooted I had to go back into the device manager to boot from my NVMe device.  After that the system boots from the NVMe until the next time I want to install from the ISO.  This is more a Jetson nuance than an OCP issue.

Once the system has rebooted the first time, and if the ignition file was embedded without errors, we should be able to log in using the core user and associated key that was set in the install-config.yaml we used.  Once inside the node we should be able to use crictl ps to confirm containers are being started:

$ ssh core@192.168.0.47
Red Hat Enterprise Linux CoreOS 49.84.202109152147-0
  Part of OpenShift 4.9, RHCOS is a Kubernetes native operating system
  managed by the Machine Config Operator (`clusteroperator/machine-config`).

WARNING: Direct SSH access to machines is not recommended; instead,
make configuration changes via `machineconfig` objects:
  https://docs.openshift.com/container-platform/4.9/architecture/architecture-rhcos.html

---
Last login: Fri Sep 17 20:26:28 2021 from 10.0.0.152
[core@master-0 ~]$ sudo crictl ps
CONTAINER           IMAGE                                                                                                                    CREATED              STATE               NAME                             ATTEMPT             POD ID
f022aab7d2bd2       4e462838cdd7a580f875714d898aa392db63aefa2201141eca41c49d976f0965                                                         3 seconds ago        Running             network-operator                 0                   b89d47c2e53c9
c65fe3bd5a27c       quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:8ad7f6aa04f25db941d5364fe2826cc0ed8c78b0f6ecba2cff660fab2b9327c7   About a minute ago   Running             cluster-policy-controller        0                   0df63b1ad8da3
7c5ea2f9f3ce0       quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:efec74e2c00bca3688268eca7a256d865935c73b0ad0da4d5a9ceb126411ee1e   About a minute ago   Running             kube-apiserver-insecure-readyz   0                   f19feea00d442
c8665a708e33c       055d6dcd87c13fc04afd196253127c33cd86e4e0202e6798ce5b7136de56b206                                                         About a minute ago   Running             kube-apiserver                   0                   f19feea00d442
af8c8be71a74f       quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:0d496e5f28b2d9f9bb507eb6b2a0544e46f973720bc98511bf4d05e9c81dc07a   About a minute ago   Running             kube-controller-manager          0                   0df63b1ad8da3
5c7fc277712f9       quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:0d496e5f28b2d9f9bb507eb6b2a0544e46f973720bc98511bf4d05e9c81dc07a   About a minute ago   Running             kube-scheduler                   0                   41d530f654838
98b0faec9e0cd       quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:81262ae10274989475617ac492361c3bc8853304fb409057e75d94c3eba18e48   About a minute ago   Running             etcd                             0                   f553fa481d714
[core@master-0 ~]$ uname -a
Linux master-0.kni7.schmaustech.com 4.18.0-305.19.1.el8_4.aarch64 #1 SMP Mon Aug 30 07:17:58 EDT 2021 aarch64 aarch64 aarch64 GNU/Linux

Further, once we have confirmed containers are starting, we can also use the kubeconfig to show the node state:

$ export KUBECONFIG=~/ocp/auth/kubeconfig 
$ ./oc get nodes -o wide
NAME                           STATUS ROLES         AGE   VERSION                INTERNAL-IP    EXTERNAL-IP   OS-IMAGE                                                       KERNEL-VERSION                  CONTAINER-RUNTIME
master-0.kni7.schmaustech.com  Ready  master,worker 12m   v1.22.0-rc.0+75ee307   192.168.0.47   <none>        Red Hat Enterprise Linux CoreOS 49.84.202109152147-0 (Ootpa)   4.18.0-305.19.1.el8_4.aarch64   cri-o://1.22.0-71.rhaos4.9.gitd54f8e1.el8

Now we can get the cluster operator states with the oc command to confirm when installation has completed.  If any operators are still not True under AVAILABLE then the installation is still progressing:

$ ./oc get co
NAME                                       VERSION      AVAILABLE   PROGRESSING   DEGRADED   SINCE   MESSAGE
authentication                             4.9.0-rc.2   False       False         True       12m     OAuthServerRouteEndpointAccessibleControllerAvailable: route.route.openshift.io "oauth-openshift" not found...
baremetal                                  4.9.0-rc.2   True        False         False      34s     
cloud-controller-manager                   4.9.0-rc.2   True        False         False      31s     
cloud-credential                           4.9.0-rc.2   True        False         False      11m     
cluster-autoscaler                                                                                   
config-operator                            4.9.0-rc.2   True        False         False      12m     
console                                    4.9.0-rc.2   Unknown     False         False      8s      
csi-snapshot-controller                    4.9.0-rc.2   True        False         False      12m     
dns                                        4.9.0-rc.2   True        False         False      109s    
etcd                                       4.9.0-rc.2   True        False         False      6m20s   
image-registry                                                                                       
ingress                                                 Unknown     True          Unknown    15s     Not all ingress controllers are available.
insights                                   4.9.0-rc.2   True        True          False      32s     Initializing the operator
kube-apiserver                             4.9.0-rc.2   True        False         False      96s     
kube-controller-manager                    4.9.0-rc.2   True        False         False      5m3s    
kube-scheduler                             4.9.0-rc.2   True        False         False      6m14s   
kube-storage-version-migrator              4.9.0-rc.2   True        False         False      12m     
machine-api                                4.9.0-rc.2   True        False         False      1s      
machine-approver                           4.9.0-rc.2   True        False         False      55s     
machine-config                             4.9.0-rc.2   True        False         False      38s     
marketplace                                4.9.0-rc.2   True        False         False      11m     
monitoring                                              Unknown     True          Unknown    12m     Rolling out the stack.
network                                    4.9.0-rc.2   True        False         False      13m     
node-tuning                                4.9.0-rc.2   True        False         False      11s     
openshift-apiserver                        4.9.0-rc.2   False       False         False      105s    APIServerDeploymentAvailable: no apiserver.openshift-apiserver pods available on any node....
openshift-controller-manager               4.9.0-rc.2   True        False         False      75s     
openshift-samples                                                                                    
operator-lifecycle-manager                 4.9.0-rc.2   True        False         False      24s     
operator-lifecycle-manager-catalog         4.9.0-rc.2   True        True          False      21s     Deployed 0.18.3
operator-lifecycle-manager-packageserver                False       True          False      26s     
service-ca                                 4.9.0-rc.2   True        False         False      12m     
storage                                    4.9.0-rc.2   True        False         False      30s    
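
Rather than scanning the table by eye, the operators that are not yet available can be listed with a short filter.  This is a sketch under assumptions: the function names are hypothetical, and the jsonpath query is one way I would expect to emit name and Available status per operator.  Operators that have not reported a status yet (the blank rows above) come through with an empty status and are treated as pending.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read "name<TAB>status" lines on stdin and print the
# names whose Available status is anything other than True (False, Unknown,
# or empty because the operator has not reported yet).
pending_operators() {
  awk -F'\t' '$2 != "True" { print $1 }'
}

# Assumed query that produces the name<TAB>status format from a live cluster.
co_available_status() {
  oc get clusteroperators -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.status.conditions[?(@.type=="Available")].status}{"\n"}{end}'
}
```

Running co_available_status | pending_operators should print nothing once every operator reports Available.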

After about 30 - 60 minutes we can finally see our single node cluster has completed installation:

$ ./oc get co
NAME                                       VERSION      AVAILABLE   PROGRESSING   DEGRADED   SINCE   MESSAGE
authentication                             4.9.0-rc.2   True        False         False      5m3s    
baremetal                                  4.9.0-rc.2   True        False         False      8m24s   
cloud-controller-manager                   4.9.0-rc.2   True        False         False      8m21s   
cloud-credential                           4.9.0-rc.2   True        False         False      19m     
cluster-autoscaler                         4.9.0-rc.2   True        False         False      7m35s   
config-operator                            4.9.0-rc.2   True        False         False      20m     
console                                    4.9.0-rc.2   True        False         False      4m54s   
csi-snapshot-controller                    4.9.0-rc.2   True        False         False      19m     
dns                                        4.9.0-rc.2   True        False         False      9m39s   
etcd                                       4.9.0-rc.2   True        False         False      14m     
image-registry                             4.9.0-rc.2   True        False         False      4m52s   
ingress                                    4.9.0-rc.2   True        False         False      7m4s    
insights                                   4.9.0-rc.2   True        False         False      8m22s   
kube-apiserver                             4.9.0-rc.2   True        False         False      9m26s   
kube-controller-manager                    4.9.0-rc.2   True        False         False      12m     
kube-scheduler                             4.9.0-rc.2   True        False         False      14m     
kube-storage-version-migrator              4.9.0-rc.2   True        False         False      20m     
machine-api                                4.9.0-rc.2   True        False         False      7m51s   
machine-approver                           4.9.0-rc.2   True        False         False      8m45s   
machine-config                             4.9.0-rc.2   True        False         False      8m28s   
marketplace                                4.9.0-rc.2   True        False         False      19m     
monitoring                                 4.9.0-rc.2   True        False         False      2m24s   
network                                    4.9.0-rc.2   True        False         False      21m     
node-tuning                                4.9.0-rc.2   True        False         False      8m1s    
openshift-apiserver                        4.9.0-rc.2   True        False         False      5m9s    
openshift-controller-manager               4.9.0-rc.2   True        False         False      9m5s    
openshift-samples                          4.9.0-rc.2   True        False         False      6m57s   
operator-lifecycle-manager                 4.9.0-rc.2   True        False         False      8m14s   
operator-lifecycle-manager-catalog         4.9.0-rc.2   True        False         False      8m11s   
operator-lifecycle-manager-packageserver   4.9.0-rc.2   True        False         False      7m49s   
service-ca                                 4.9.0-rc.2   True        False         False      20m     
storage                                    4.9.0-rc.2   True        False         False      8m20s 

And from the web console:



Wednesday, September 15, 2021

Deploy Disconnected Single Node OpenShift via OpenShift Installer


Deploying a single node OpenShift via the Assisted Installer has made it very easy to stand up a one node cluster.  However this requires nodes that have connectivity to the internet.  But what if the environment is disconnected?  In this blog I will show how one can use the openshift-install binary to deploy a single node OpenShift in a disconnected environment without the Assisted Installer.

Before we begin let's cover what this blog assumes already exists as prerequisites:
  • Podman, the oc binary and the openshift-install binary already exist on the system
  • A disconnected registry is already configured and has the mirrored contents of the images for a given OpenShift release.   
  • A physical baremetal node with the ability to boot an ISO image
  • DNS entries for basic baremetal IPI requirements exist. My environment is below:
master-0.kni20.schmaustech.com IN A 192.168.0.210
*.apps.kni20.schmaustech.com IN A 192.168.0.210
api.kni20.schmaustech.com IN A 192.168.0.210
api-int.kni20.schmaustech.com   IN A 192.168.0.210
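
Before going further it can be worth confirming those records actually resolve from the host doing the install.  A minimal sketch, assuming the system resolver is pointed at the right DNS server; resolve_ipv4 and check_dns are hypothetical names of my own:

```shell
#!/usr/bin/env bash
# Resolve NAME to its first IPv4 address using the system resolver.
resolve_ipv4() {
  getent ahostsv4 "$1" | awk 'NR == 1 { print $1 }'
}

# Hypothetical check: succeed only if NAME resolves to EXPECTED.
check_dns() {
  local name=$1 expected=$2 actual
  actual=$(resolve_ipv4 "$name")
  if [ "$actual" = "$expected" ]; then
    echo "ok   $name -> $actual"
  else
    echo "FAIL $name -> ${actual:-<no answer>} (expected $expected)" >&2
    return 1
  fi
}
```

For my environment I would call check_dns once each for master-0, api and api-int with 192.168.0.210, plus any made-up name under apps.kni20.schmaustech.com to exercise the wildcard record.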

First let's verify the version of OpenShift we will be deploying by looking at the output of oc version and openshift-install version:


$ oc version
Client Version: 4.8.12
$ ./openshift-install version
./openshift-install 4.8.12
built from commit 450e95767d89f809cb1afe5a142e9c824a269de8
release image quay.io/openshift-release-dev/ocp-release@sha256:c3af995af7ee85e88c43c943e0a64c7066d90e77fafdabc7b22a095e4ea3c25a


Looks like we will be deploying version 4.8.12.  Ensure the disconnected registry being used has the images for 4.8.12 mirrored.  If not, use a procedure like the one in one of my previous blogs to mirror the 4.8.12 images.
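
The version comparison can be scripted as well.  A minimal sketch that parses the two outputs shown above; the function names are hypothetical and the parsing assumes the output formats stay as they appear here:

```shell
#!/usr/bin/env bash
# Parse "Client Version: X" from `oc version` output on stdin.
oc_client_version() {
  awk -F': ' '/^Client Version:/ { print $2; exit }'
}

# Parse the version from the first line of `openshift-install version`
# output on stdin, e.g. "./openshift-install 4.8.12".
installer_version() {
  awk 'NR == 1 { print $2 }'
}
```

Comparing "$(oc version --client | oc_client_version)" with "$(./openshift-install version | installer_version)" then gives a quick check that the client and the installer match.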

Now let's pull down a few files we will need for our deployment iso.  We need both the coreos-installer and the rhcos live iso:

$ wget https://mirror.openshift.com/pub/openshift-v4/clients/coreos-installer/v0.8.0-3/coreos-installer
--2021-09-15 10:10:26--  https://mirror.openshift.com/pub/openshift-v4/clients/coreos-installer/v0.8.0-3/coreos-installer
Resolving mirror.openshift.com (mirror.openshift.com)... 54.172.173.155, 54.173.18.88
Connecting to mirror.openshift.com (mirror.openshift.com)|54.172.173.155|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 7649968 (7.3M)
Saving to: ‘coreos-installer’

coreos-installer                                     100%[=====================================================================================================================>]   7.29M  8.83MB/s    in 0.8s    

2021-09-15 10:10:27 (8.83 MB/s) - ‘coreos-installer’ saved [7649968/7649968]

$ wget https://mirror.openshift.com/pub/openshift-v4/dependencies/rhcos/latest/4.8.2/rhcos-4.8.2-x86_64-live.x86_64.iso
--2021-09-15 10:10:40--  https://mirror.openshift.com/pub/openshift-v4/dependencies/rhcos/latest/4.8.2/rhcos-4.8.2-x86_64-live.x86_64.iso
Resolving mirror.openshift.com (mirror.openshift.com)... 54.172.173.155, 54.173.18.88
Connecting to mirror.openshift.com (mirror.openshift.com)|54.172.173.155|:443... connected.
HTTP request sent, awaiting response... 200 OK
Length: 1031798784 (984M) [application/octet-stream]
Saving to: ‘rhcos-4.8.2-x86_64-live.x86_64.iso’

rhcos-4.8.2-x86_64-live.x86_64.iso                   100%[=====================================================================================================================>] 984.00M  11.2MB/s    in 93s     

2021-09-15 10:12:13 (10.6 MB/s) - ‘rhcos-4.8.2-x86_64-live.x86_64.iso’ saved [1031798784/1031798784]
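
For a download of this size it can be worth checking the digest before building on it.  A minimal sketch, assuming a published sha256 value for the ISO is available from a trusted source; where that value comes from is an assumption on my part, and verify_sha256 is a hypothetical name:

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed only if FILE hashes to EXPECTED (hex sha256).
verify_sha256() {
  local file=$1 expected=$2 actual
  actual=$(sha256sum "$file" | awk '{ print $1 }')
  [ -n "$actual" ] && [ "$actual" = "$expected" ]
}
```

Usage would look like verify_sha256 rhcos-4.8.2-x86_64-live.x86_64.iso "$EXPECTED_SHA256" && echo "checksum ok", with EXPECTED_SHA256 set by hand from the published value.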


Set the execute bit on coreos-installer, the utility we will use to embed the ignition file we generate:

$ chmod 755 coreos-installer

Let's go ahead now and create an install-config.yaml for our single node deployment.  Notice some of the differences in this install-config.yaml.  Specifically we have no worker nodes defined, one master node defined, and a BootstrapInPlace section which tells the installer to use the sda disk in the node.  We also have our imageContentSources which tells the installer to use the registry mirror.

$ cat << EOF > install-config.yaml
apiVersion: v1beta4
baseDomain: schmaustech.com
metadata:
  name: kni20
networking:
  networkType: OVNKubernetes
  machineCIDR: 192.168.0.0/24
compute:
- name: worker
  replicas: 0
controlPlane:
  name: master
  replicas: 1
platform:
  none: {}
BootstrapInPlace:
  InstallationDisk: /dev/sda
pullSecret: '{ "auths": { "rhel8-ocp-auto.schmaustech.com:5000": {"auth": "ZHVtbXk6ZHVtbXk=","email": "bschmaus@schmaustech.com" } } }'
sshKey: 'ssh-rsa AAAAB3NzaC1yc2EAAAADAQABAAACAQDP+5QkRCiuhsYItXj7DzLcOIs2RbCgpMzDtPlt/hfLnDkLGozYIFapMp+o4l+6ornbZ3L+hYE0T8SyvyYVWfm1XpPcVgUIW6qp7yfEyTSRhpGnoY74PD33FIf6BtU2HoFLWjQcE6OrQOF0wijI3fgL0jSzvAxvYoXU/huMx/kI2jBcWEq5cADRfvpeYXhVEJLrIIOepoAZE1syaPT7jQEoLDfvxrDZPKObCOI2vzLiAQXI7gK1uc9YDb6IEA/4Ik4eV2R1+VCgKhgk5RUqn69+8a1o783g1tChKuLwA4K9lyEAbFBwlHMctfNOLeC1w+bYpDXH/3GydcYfq79/18dVd+xEUlzzC+2/qycWG36C1MxUZa2fXvSRWLnpkLcxtIes4MikFeIr3jkJlFUzITigzvFrKa2IKaJzQ53WsE++LVnKJfcFNLtWfdEOZMowG/KtgzSSac/iVEJRM2YTIJsQsqhhI4PTrqVlUy/NwcXOFfUF/NkF2deeUZ21Cdn+bKZDKtFu2x+ujyAWZKNq570YaFT3a4TrL6WmE9kdHnJOXYR61Tiq/1fU+y0fv1d0f1cYr4+mNRCGIZoQOgJraF7/YluLB23INkJgtbah/0t1xzSsQ59gzFhRlLkW9gQDekj2tOGJmZIuYCnTXGiqXHnri2yAPexgRiaFjoM3GCpsWw== bschmaus@bschmaus.remote.csb'
imageContentSources:
- mirrors:
  - rhel8-ocp-auto.schmaustech.com:5000/ocp4/openshift4
  source: quay.io/openshift-release-dev/ocp-release
- mirrors:
  - rhel8-ocp-auto.schmaustech.com:5000/ocp4/openshift4
  source: quay.io/openshift-release-dev/ocp-v4.0-art-dev
additionalTrustBundle: |
  -----BEGIN CERTIFICATE-----
  MIIF7zCCA9egAwIBAgIUeecEs+U5psgJ0aFgc4Q5dGVrAFcwDQYJKoZIhvcNAQEL
  BQAwgYYxCzAJBgNVBAYTAlVTMRYwFAYDVQQIDA1Ob3J0aENhcm9saW5hMRAwDgYD
  VQQHDAdSYWxlaWdoMRAwDgYDVQQKDAdSZWQgSGF0MRIwEAYDVQQLDAlNYXJrZXRp
  bmcxJzAlBgNVBAMMHnJoZWw4LW9jcC1hdXRvLnNjaG1hdXN0ZWNoLmNvbTAeFw0y
  MTA2MDkxMDM5MDZaFw0yMjA2MDkxMDM5MDZaMIGGMQswCQYDVQQGEwJVUzEWMBQG
  A1UECAwNTm9ydGhDYXJvbGluYTEQMA4GA1UEBwwHUmFsZWlnaDEQMA4GA1UECgwH
  UmVkIEhhdDESMBAGA1UECwwJTWFya2V0aW5nMScwJQYDVQQDDB5yaGVsOC1vY3At
  YXV0by5zY2htYXVzdGVjaC5jb20wggIiMA0GCSqGSIb3DQEBAQUAA4ICDwAwggIK
  AoICAQC9exAg3Ie3N3mkrQKseyri1VP2IPTc+pUEiVCPisIQAhRUfHhPR1HT7EF7
  SwaxrWjpfh9aYBPDEF3uLFQvzDEJWCh5PF55jwn3aABFGKEhfVBKd+es6nXnYaCS
  8CgLS2qM9x4WiuZxrntfB16JrjP+CrTvlAbE4DIMlDQLgh8+hDw9VPlbzY+MI+WC
  cYues1Ne+JZ5dZcKmCZ3zrVToPjreWZUuhSygci2xIQZxwWNmTvAgi+CAiQZS7VF
  RmKjj2H/o/d3I+XSS2261I8aXCAw4/3vaM9aci0eHeEhLIMrhv86WycOjcYL1Z6R
  n55diwDTSyrTo/B4zsQbmYUc8rP+pR2fyRJEGFVJ4ejcj2ZF5EbgUKupyU2gh/qt
  QeYtJ+6uAr9S5iQIcq9qvD9nhAtm3DnBb065X4jVPl2YL4zsbOS1gjoa6dRbFuvu
  f3SdsbQRF/YJWY/7j6cUaueCQOlXZRNhbQQHdIdBWFObw0QyyYtI831ue1MHPG0C
  nsAriPOkRzBBq+BPmS9CqcRDGqh+nd9m9UPVDoBshwaziSqaIK2hvfCAVb3BPJES
  CXKuIaP2IRzTjse58aAzsRW3W+4e/v9fwAOaE8nS7i+v8wrqcFgJ489HnVq+kRNc
  VImv5dBKg2frzXs1PpnWkE4u2VJagKn9B2zva2miRQ+LyvLLDwIDAQABo1MwUTAd
  BgNVHQ4EFgQUbcE9mpTkOK2ypIrURf+xYR08OAAwHwYDVR0jBBgwFoAUbcE9mpTk
  OK2ypIrURf+xYR08OAAwDwYDVR0TAQH/BAUwAwEB/zANBgkqhkiG9w0BAQsFAAOC
  AgEANTjx04NoiIyw9DyvszwRdrSGPO3dy1gk3jh+Du6Dpqqku3Mwr2ktaSCimeZS
  4zY4S5mRCgZRwDKu19z0tMwbVDyzHPFJx+wqBpZKkD1FvOPKjKLewtiW2z8AP/kF
  gl5UUNuwvGhOizazbvd1faQ8jMYoZKifM8On6IpFgqXCx98/GOWvnjn2t8YkMN3x
  blKVm5N7eGy9LeiGRoiCJqcyfGqdAdg+Z+J94AHEZb3OxG8uHLrtmz0BF3A+8V2H
  hofYI0spx5y9OcPin2yLm9DeCwWAA7maqdImBG/QpQCjcPW3Yzz9VytIMajPdnvd
  vbJF5GZNj7ods1AykCCJjGy6n9WCf3a4VLnZWtUTbtz0nrIjJjsdlXZqby5BCF0G
  iqWbg0j8onl6kmbMAhssRTlvL8w90F1IK3Hk+lz0Qy8rqZX2ohObtEYGMIAOdFJ1
  iPQrbksXOBpZNtm1VAved41sYt1txS2WZQgfklIXOhNOu4r32ZGKas4EJml0l0wL
  2P65PkPEa7AOeqwP0y6eGoNG9qFSl+yArycZGWudp88977H6CcdkdEcQzmjg5+TD
  9GHm3drUYGqBJDvIemQaXfnwy9Gxx+oBDpXLXOuo+edK812zh/q7s2FELfH5ZieE
  Q3dIH8UGsnjYxv8G3O23cYKZ1U0iiu9QvPRFm0F8JuFZqLQ=
  -----END CERTIFICATE-----
EOF
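
A note on the pullSecret above: the auth value is the base64 encoding of user:password for the disconnected registry, and the one shown decodes to dummy:dummy.  A minimal sketch for generating that value, with registry_auth being a hypothetical name:

```shell
#!/usr/bin/env bash
# Hypothetical helper: build the base64 "auth" value for a registry entry
# in a pull secret from a username and password.
registry_auth() {
  printf '%s:%s' "$1" "$2" | base64 | tr -d '\n'
}
```

Calling registry_auth dummy dummy reproduces the ZHVtbXk6ZHVtbXk= value used in my install-config.yaml.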

Once we have the install-config.yaml created let's use the openshift-install binary to generate a single node OpenShift ignition config:

$ ~/openshift-install --dir=./ create single-node-ignition-config
INFO Consuming Install Config from target directory 
WARNING Making control-plane schedulable by setting MastersSchedulable to true for Scheduler cluster settings 
INFO Single-Node-Ignition-Config created in: . and auth 
$ ls -lart
total 1017468
-rwxr-xr-x.  1 bschmaus bschmaus    7649968 Apr 27 00:49 coreos-installer
-rw-rw-r--.  1 bschmaus bschmaus 1031798784 Jul 22 13:10 rhcos-4.8.2-x86_64-live.x86_64.iso
-rw-r--r--.  1 bschmaus bschmaus       3667 Sep 15 10:35 install-config.yaml.save
drwx------. 27 bschmaus bschmaus       8192 Sep 15 10:39 ..
drwxr-x---.  2 bschmaus bschmaus         50 Sep 15 10:45 auth
-rw-r-----.  1 bschmaus bschmaus     284253 Sep 15 10:45 bootstrap-in-place-for-live-iso.ign
-rw-r-----.  1 bschmaus bschmaus    1865601 Sep 15 10:45 .openshift_install_state.json
-rw-rw-r--.  1 bschmaus bschmaus     213442 Sep 15 10:45 .openshift_install.log
-rw-r-----.  1 bschmaus bschmaus         98 Sep 15 10:45 metadata.json
drwxrwxr-x.  3 bschmaus bschmaus        247 Sep 15 10:45 .
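
Notice the listing shows install-config.yaml.save but no install-config.yaml.  The installer consumes the install-config.yaml from the target directory, so I keep a copy before running it.  A minimal sketch of that habit; backup_install_config is a hypothetical name and the .save suffix is just my own convention:

```shell
#!/usr/bin/env bash
# Hypothetical helper: copy install-config.yaml aside before the installer
# consumes it. DIR defaults to the current directory.
backup_install_config() {
  local dir=${1:-.}
  cp -p "$dir/install-config.yaml" "$dir/install-config.yaml.save"
}
```

Running backup_install_config before openshift-install makes it easy to restore the file and regenerate the ignition config later.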


Now let's take the bootstrap-in-place-for-live-iso.ign config we generated and use the coreos-installer to embed it into the rhcos live iso image.  There will be no output upon completion so I usually echo $? to confirm it ended with a good exit status.

$ ./coreos-installer iso ignition embed -fi bootstrap-in-place-for-live-iso.ign rhcos-4.8.2-x86_64-live.x86_64.iso
$ echo $?
0

Since I am using a virtual machine as my single node OpenShift node I need to copy the boot iso over to my hypervisor host.  If this were a real baremetal server like a Dell, one might mount the iso image via virtual media or, as another method, write the iso to a USB device and physically plug it into the node being used for this single node deployment.

$ scp rhcos-4.8.2-x86_64-live.x86_64.iso root@192.168.0.20:/var/lib/libvirt/images/
root@192.168.0.20's password: 
rhcos-4.8.2-x86_64-live.x86_64.iso                                                                                                                                               100%  984MB  86.0MB/s   00:11 

Once I have the live iso over on my hypervisor host I will use Virt-Manager to set the cdrom to boot from the live iso:

Next I will start the virtual machine.  If using a physical host power on the node.  The screen should be similar:

Once the virtual machine has booted we will see the console and login prompt.  After a few minutes the machine will reboot.


If the ignition file was embedded without errors we should be able to log in using the core user and associated key that was set in the install-config.yaml we used.  Once inside the node we should be able to use crictl ps to confirm containers are being started:

$ ssh core@192.168.0.210
The authenticity of host '192.168.0.210 (192.168.0.210)' can't be established.
ECDSA key fingerprint is SHA256:B24X/7PH3+kGWwmUKPc/E+2Rg3YYsmYHISCOHfbGthg.
Are you sure you want to continue connecting (yes/no/[fingerprint])? yes
Warning: Permanently added '192.168.0.210' (ECDSA) to the list of known hosts.
Red Hat Enterprise Linux CoreOS 48.84.202109100857-0
  Part of OpenShift 4.8, RHCOS is a Kubernetes native operating system
  managed by the Machine Config Operator (`clusteroperator/machine-config`).

WARNING: Direct SSH access to machines is not recommended; instead,
make configuration changes via `machineconfig` objects:
  https://docs.openshift.com/container-platform/4.8/architecture/architecture-rhcos.html

---
[core@master-0 ~]$ sudo crictl ps
CONTAINER           IMAGE                                                                                                                    CREATED              STATE               NAME                                 ATTEMPT             POD ID
a3792d71875ab       aeee3c4eb8828bef375fa5f81bf524e84d12a0264c126b0f97703a3e5ebc06a8                                                         17 seconds ago       Running             sbdb                                 0                   4de60fd9cc622
733326d7246f8       dfd1e2430556eb4a9de83031a82c62c06debca6095dd63553ed38bd486374ac8                                                         17 seconds ago       Running             kube-rbac-proxy                      0                   4de60fd9cc622
7df7efd52c7f9       de195e3670ad1b3dd892d5a289aa83ce12122001faf02a56facb8fa4720ceaa3                                                         44 seconds ago       Running             kube-multus-additional-cni-plugins   0                   aab58f11b1f0a
ce602f830cb44       aeee3c4eb8828bef375fa5f81bf524e84d12a0264c126b0f97703a3e5ebc06a8                                                         48 seconds ago       Running             ovnkube-node                         0                   f0fea8120b806
d17912e8c762d       quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:7b7edfdb1dd3510c1a8d74144ae89fbe61a28f519781088ead1cb5e560158657   48 seconds ago       Running             kube-rbac-proxy                      0                   f0fea8120b806
f6cf9e739714e       aeee3c4eb8828bef375fa5f81bf524e84d12a0264c126b0f97703a3e5ebc06a8                                                         49 seconds ago       Running             ovn-acl-logging                      0                   f0fea8120b806
232e663c0b190       quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:03dc4dd87f6e52ad54718f31de9edfc763ce5a001d5bdff6c95fe85275fb64de   49 seconds ago       Running             northd                               0                   4de60fd9cc622
7b4b432b988d8       quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:03dc4dd87f6e52ad54718f31de9edfc763ce5a001d5bdff6c95fe85275fb64de   49 seconds ago       Running             ovn-controller                       0                   f0fea8120b806
5596f6644e1bb       quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:1fec937521df496277f7f934c079ebf48baccd8f76a5bfcc793e7c441976e6b5   About a minute ago   Running             kube-multus                          0                   7f4536275fb42
51b1c4da641f4       quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:70ffc0ed147222ab1bea6207af5415f11450c86a9de2979285ba1324f6e904c2   About a minute ago   Running             network-operator                     0                   ea0f3c0bb9567
b4b46f8f5de1c       quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:66fa2d7a5b2be88b76b5a8fa6f330bc64b57ce0fa9b8ea29e96a4c77df90f7cd   2 minutes ago        Running             kube-apiserver-insecure-readyz       0                   e3a4d81e4e99a
e49ce4745cefd       c7dbf8655b94a464b0aa15734fbd887bec8cdda46bbb3580954bf36961b4ac78                                                         2 minutes ago        Running             kube-controller-manager              1                   3cbc2d942afd8
7bd9f40dd40a3       c7dbf8655b94a464b0aa15734fbd887bec8cdda46bbb3580954bf36961b4ac78                                                         2 minutes ago        Running             kube-apiserver                       0                   e3a4d81e4e99a
e319800865018       quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:80d0fcaf10fd289e31383062293cadb91ca6f7852a82f864c088679905f67859   2 minutes ago        Running             cluster-policy-controller            0                   3cbc2d942afd8
d1e26854fc700       quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:e9de94a775df9cd6f86712410794393aa58f07374f294ba5a7b503f9fb23cf42   2 minutes ago        Running             kube-scheduler                       0                   0ae8507e3280a
e95cef37125c4       quay.io/openshift-release-dev/ocp-v4.0-art-dev@sha256:622d9bb3fe4e540054f54ec260a7e3e4f16892260658dbe32ee4750c27a94158   2 minutes ago        Running             etcd                                 0                   dcd694d4f9317
[core@master-0 ~]$ 


Further, once we have confirmed containers are starting, we can also use the kubeconfig to show the node state:

$ export KUBECONFIG=./auth/kubeconfig 
$ oc get nodes
NAME                             STATUS   ROLES           AGE   VERSION
master-0.kni20.schmaustech.com   Ready    master,worker   21m   v1.21.1+d8043e1
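
The node check can be reduced to a one-line filter if it needs to go into a script.  A minimal sketch, assuming the column layout shown above; nodes_not_ready is a hypothetical name:

```shell
#!/usr/bin/env bash
# Hypothetical helper: read `oc get nodes --no-headers` output on stdin and
# print nodes whose STATUS is not exactly "Ready". A status such as
# "Ready,SchedulingDisabled" is deliberately reported as well.
nodes_not_ready() {
  awk '$2 != "Ready" { print $1 }'
}
```

With a single node cluster, oc get nodes --no-headers | nodes_not_ready should print nothing once master-0 is Ready.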

Now we can get the cluster operator states with the oc command to confirm when installation has completed.  If any operators are still not True under AVAILABLE then the installation is still progressing:

$ oc get co
NAME                                       VERSION   AVAILABLE   PROGRESSING   DEGRADED   SINCE
authentication                             4.8.12    False       True          False      17m
baremetal                                  4.8.12    True        False         False      11m
cloud-credential                           4.8.12    True        False         False      3m37s
cluster-autoscaler                         4.8.12    True        False         False      11m
config-operator                            4.8.12    True        False         False      17m
console                                    4.8.12    False       True          False      7m35s
csi-snapshot-controller                    4.8.12    True        False         False      7m56s
dns                                        4.8.12    True        False         False      9m2s
etcd                                       4.8.12    True        False         False      12m
image-registry                             4.8.12    True        False         False      7m48s
ingress                                    4.8.12    True        False         False      8m53s
insights                                   4.8.12    True        False         False      12m
kube-apiserver                             4.8.12    True        True          False      7m53s
kube-controller-manager                    4.8.12    True        False         False      10m
kube-scheduler                             4.8.12    True        False         False      11m
kube-storage-version-migrator              4.8.12    True        False         False      17m
machine-api                                4.8.12    True        False         False      11m
machine-approver                           4.8.12    True        False         False      16m
machine-config                                                   True                     
marketplace                                4.8.12    True        False         False      16m
monitoring                                 4.8.12    True        False         False      6m18s
network                                    4.8.12    True        False         False      17m
node-tuning                                4.8.12    True        False         False      11m
openshift-apiserver                        4.8.12    True        False         False      7m45s
openshift-controller-manager               4.8.12    True        False         False      7m53s
openshift-samples                          4.8.12    True        False         False      8m
operator-lifecycle-manager                 4.8.12    True        False         False      17m
operator-lifecycle-manager-catalog         4.8.12    True        False         False      12m
operator-lifecycle-manager-packageserver   4.8.12    True        False         False      8m56s
service-ca                                 4.8.12    True        False         False      17m
storage                                    4.8.12    True        False         False      11m

After about 30 to 60 minutes we can finally see that our single node cluster has completed installation:

$ oc get co
NAME                                       VERSION   AVAILABLE   PROGRESSING   DEGRADED   SINCE
authentication                             4.8.12    True        False         False      6m55s
baremetal                                  4.8.12    True        False         False      19m
cloud-credential                           4.8.12    True        False         False      10m
cluster-autoscaler                         4.8.12    True        False         False      18m
config-operator                            4.8.12    True        False         False      24m
console                                    4.8.12    True        False         False      7m1s
csi-snapshot-controller                    4.8.12    True        False         False      15m
dns                                        4.8.12    True        False         False      16m
etcd                                       4.8.12    True        False         False      19m
image-registry                             4.8.12    True        False         False      15m
ingress                                    4.8.12    True        False         False      16m
insights                                   4.8.12    True        False         False      19m
kube-apiserver                             4.8.12    True        False         False      15m
kube-controller-manager                    4.8.12    True        False         False      18m
kube-scheduler                             4.8.12    True        False         False      18m
kube-storage-version-migrator              4.8.12    True        False         False      24m
machine-api                                4.8.12    True        False         False      19m
machine-approver                           4.8.12    True        False         False      24m
machine-config                             4.8.12    True        False         False      5m45s
marketplace                                4.8.12    True        False         False      24m
monitoring                                 4.8.12    True        False         False      13m
network                                    4.8.12    True        False         False      25m
node-tuning                                4.8.12    True        False         False      19m
openshift-apiserver                        4.8.12    True        False         False      15m
openshift-controller-manager               4.8.12    True        False         False      15m
openshift-samples                          4.8.12    True        False         False      15m
operator-lifecycle-manager                 4.8.12    True        False         False      24m
operator-lifecycle-manager-catalog         4.8.12    True        False         False      19m
operator-lifecycle-manager-packageserver   4.8.12    True        False         False      16m
service-ca                                 4.8.12    True        False         False      24m
storage                                    4.8.12    True        False         False      19m

Friday, August 06, 2021

Deploying Single Node OpenShift via Assisted Installer API

 


The Assisted Installer is a project that helps simplify OpenShift Container Platform (OCP) installation for a number of different platforms, with a focus on bare metal deployments.  The service provides validation and discovery of targeted hardware and greatly improves the success rate of installations.  It can be accessed via Red Hat’s provided SaaS portal to deploy an OCP cluster on either bare metal or virtual machines.

In this article, however, I want to demonstrate a Single Node OpenShift deployment that does not use the web UI and instead relies on the underlying REST API that drives the Assisted Installer.  This can be useful for automating the deployment of clusters without user intervention.

The first step is to obtain an OpenShift Cluster Manager API Token.  This token lets you authenticate against your Red Hat OpenShift Cluster Manager account without needing a username or password.

Place this token into a file called ocm-token:

$ echo "Token String From OCM API Token Link Above" > ~/ocm-token

Next let's set some variables that we will refer to throughout this deployment process:

export OFFLINE_ACCESS_TOKEN=$(cat ~/ocm-token)                                # Loading my token into a variable
export ASSISTED_SERVICE_API="api.openshift.com"                               # Setting the Assisted Installer API endpoint
export CLUSTER_VERSION="4.8"                                                  # OpenShift version
export CLUSTER_IMAGE="quay.io/openshift-release-dev/ocp-release:4.8.2-x86_64" # OpenShift Quay image version
export CLUSTER_NAME="kni1"                                                    # OpenShift cluster name
export CLUSTER_DOMAIN="schmaustech.com"                                       # Domain name where my cluster will be deployed
export CLUSTER_NET_TYPE="OVNKubernetes"                                       # Network type to deploy with OpenShift
export MACHINE_CIDR_NET="192.168.0.0/24"                                      # Machine CIDR network 
export SNO_STATICIP_NODE_NAME="master-0"                                      # Node name of my SNO node
export PULL_SECRET=$(cat ~/pull-secret.json | jq -R .)                        # Loading my pull-secret into variable
export CLUSTER_SSHKEY=$(cat ~/.ssh/id_rsa.pub)                                # Loading the public key into variable
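
Since every later step depends on these variables, it can help to confirm none of them came back empty (for example because ~/ocm-token or ~/pull-secret.json was missing).  The following is only a minimal sketch; require_vars is a hypothetical helper name of my own and not something the Assisted Installer provides:

```shell
# Hypothetical helper (not part of the Assisted Installer tooling):
# returns non-zero and names the first variable that is unset or empty.
require_vars() {
  for name in "$@"; do
    # Indirect lookup of the variable whose name is stored in $name.
    eval "value=\${$name:-}"
    if [ -z "$value" ]; then
      echo "missing required variable: $name" >&2
      return 1
    fi
  done
  return 0
}

# Example usage with the variables set above:
# require_vars OFFLINE_ACCESS_TOKEN PULL_SECRET CLUSTER_SSHKEY || exit 1
```

The names are passed without a leading dollar sign so the helper can report which one is missing.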

With the primary variables set let's go ahead and create a deployment.json file.  This file references some of the variables we set previously and also has a few values that are statically set.  The key one to notice in this deployment is high_availability_mode.  Having that value set to None ensures we are doing a Single Node OpenShift (SNO) deployment:

cat << EOF > ~/deployment.json
{
  "kind": "Cluster",
  "name": "$CLUSTER_NAME",
  "openshift_version": "$CLUSTER_VERSION",
  "ocp_release_image": "$CLUSTER_IMAGE",
  "base_dns_domain": "$CLUSTER_DOMAIN",
  "hyperthreading": "all",
  "user_managed_networking": true,
  "vip_dhcp_allocation": false,
  "high_availability_mode": "None",
  "hosts": [],
  "ssh_public_key": "$CLUSTER_SSHKEY",
  "pull_secret": $PULL_SECRET,
  "network_type": "OVNKubernetes"
}
EOF


Now that we have the deployment.json file created let's refresh our bearer token:

$ export TOKEN=$(curl \
--silent \
--data-urlencode "grant_type=refresh_token" \
--data-urlencode "client_id=cloud-services" \
--data-urlencode "refresh_token=${OFFLINE_ACCESS_TOKEN}" \
https://sso.redhat.com/auth/realms/redhat-external/protocol/openid-connect/token | \
jq -r .access_token)
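
If the refresh fails, TOKEN does not end up unset.  It is either empty (when the response is not JSON) or the literal string null (which is what jq -r prints when the access_token field is absent).  A small guard, sketched here with a hypothetical helper name token_ok, catches both cases before any API call is attempted:

```shell
# Hypothetical helper: succeeds only if the argument looks like a usable token.
# jq -r prints the literal string "null" when .access_token is absent,
# so that case is rejected along with an empty string.
token_ok() {
  case "$1" in
    ""|null) return 1 ;;
    *) return 0 ;;
  esac
}

# Example usage right after the refresh:
# token_ok "$TOKEN" || { echo "token refresh failed" >&2; exit 1; }
```

This only checks that something came back; it does not verify that the token is still valid.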

With the token refreshed let's go ahead and create our deployment via the Assisted Installer REST API using a curl POST command.  When the command completes the output will only be a cluster ID wrapped in quotes.  I used sed to strip the quotes so we end up with just the UUID.  Note that the cluster configuration has only been created at this point, not installed.

$ export CLUSTER_ID=$( curl -s -X POST "https://$ASSISTED_SERVICE_API/api/assisted-install/v1/clusters" \
  -d @"$HOME/deployment.json" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $TOKEN" \
  | jq '.id' )

$ export CLUSTER_ID=$( sed -e 's/^"//' -e 's/"$//' <<<"$CLUSTER_ID")
$ echo $CLUSTER_ID
e85fc7d5-f274-4359-acc5-48044fc67132
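
The quote stripping can also be done with shell parameter expansion (or avoided entirely with jq -r '.id'), and it is worth checking that what came back really is a UUID, because a failed request typically leaves CLUSTER_ID set to null.  A minimal sketch, with strip_quotes and is_uuid being hypothetical helper names:

```shell
# Strip one leading and one trailing double quote using parameter expansion.
strip_quotes() {
  value=$1
  value=${value#\"}
  value=${value%\"}
  printf '%s\n' "$value"
}

# Succeeds if the argument has the 8-4-4-4-12 hexadecimal shape of a UUID.
is_uuid() {
  printf '%s\n' "$1" |
    grep -Eq '^[0-9a-fA-F]{8}(-[0-9a-fA-F]{4}){3}-[0-9a-fA-F]{12}$'
}

# Example usage:
# CLUSTER_ID=$(strip_quotes "$CLUSTER_ID")
# is_uuid "$CLUSTER_ID" || { echo "cluster creation failed" >&2; exit 1; }
```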

At this point we need to generate a discovery ISO for the SNO node to boot from.  Before we do that, however, I wanted to make sure that my SNO node would use a static IP address instead of the default of DHCP.  To do this we need to create a data file that describes how the static IP should be set.  NMState will apply this information to the OCP node during the installation.  Below we define some arguments that provide a MAC interface map and an NMState YAML file.  All of this information gets written to the temp file that the DATA variable points to.

$ DATA=$(mktemp)
$ jq -n --arg SSH_KEY "$CLUSTER_SSHKEY" --arg NMSTATE_YAML1 "$(cat ~/sno-server.yaml)"  \
'{
  "ssh_public_key": $SSH_KEY,
  "image_type": "full-iso",
  "static_network_config": [
    {
      "network_yaml": $NMSTATE_YAML1,
      "mac_interface_map": [{"mac_address": "52:54:00:82:23:e2", "logical_nic_name": "ens9"}]
    }
  ]
}' >> $DATA

The sno-server.yaml used in the NMState argument looks like the following.  It contains the IP address, prefix length, interface, DNS and route information.

$ cat ~/sno-server.yaml 
dns-resolver:
  config:
    server:
    - 192.168.0.10
interfaces:
- ipv4:
    address:
    - ip: 192.168.0.204
      prefix-length: 24
    dhcp: false
    enabled: true
  name: ens9
  state: up
  type: ethernet
routes:
  config:
  - destination: 0.0.0.0/0
    next-hop-address: 192.168.0.1
    next-hop-interface: ens9
    table-id: 254

We can confirm that the data was set appropriately by echoing the DATA variable and then displaying the temp file it points to:

$ echo $DATA
/tmp/tmp.3Jqw7lU6Qf

$ cat /tmp/tmp.3Jqw7lU6Qf
{
  "ssh_public_key": "SSHKEY REDACTED",
  "image_type": "full-iso",
  "static_network_config": [
    {
      "network_yaml": "dns-resolver:\n  config:\n    server:\n    - 192.168.0.10\ninterfaces:\n- ipv4:\n    address:\n    - ip: 192.168.0.204\n      prefix-length: 24\n    dhcp: false\n    enabled: true\n  name: ens9\n  state: up\n  type: ethernet\nroutes:\n  config:\n  - destination: 0.0.0.0/0\n    next-hop-address: 192.168.0.1\n    next-hop-interface: ens9\n    table-id: 254",
      "mac_interface_map": [
        {
          "mac_address": "52:54:00:82:23:e2",
          "logical_nic_name": "ens9"
        }
      ]
    }
  ]
}

With the static IP configuration set we can go ahead and generate our discovery ISO with another curl POST command.  The command will generate quite a bit of output, but our main concern is seeing the section where the static network configuration gets defined:

$ curl -X POST \
"https://$ASSISTED_SERVICE_API/api/assisted-install/v1/clusters/$CLUSTER_ID/downloads/image" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $TOKEN" \
  -d @$DATA

(...)
"static_network_config":"dns-resolver:\n  config:\n    server:\n    - 192.168.0.10\ninterfaces:\n- ipv4:\n    address:\n    - ip: 192.168.0.204\n      prefix-length: 24\n    dhcp: false\n    enabled: true\n  name: ens9\n  state: up\n  type: ethernet\nroutes:\n  config:\n  - destination: 0.0.0.0/0\n    next-hop-address: 192.168.0.1\n    next-hop-interface: ens9\n    table-id: 254HHHHH52:54:00:82:23:e2=ens9","type":"full-iso"}
(...)

Now that the discovery image has been created let's go ahead and download that image:

$ curl -L \
  "https://$ASSISTED_SERVICE_API/api/assisted-install/v1/clusters/$CLUSTER_ID/downloads/image" \
  -o ~/discovery-image-$CLUSTER_NAME.iso \
  -H "Authorization: Bearer $TOKEN"
% Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
  0     0    0     0    0     0      0      0 --:--:-- --:--:-- --:--:--     0
100  984M  100  984M    0     0  10.4M      0  0:01:34  0:01:34 --:--:-- 10.5M


Now that the image is downloaded we can move it to wherever the SNO node will boot from.  This node could be bare metal or, in my case, a virtual machine.  If it were bare metal, a Dell for example, we might use racadm to do a virtual media mount and then ipmitool to power on the server.  Since I am using a virtual machine I need to do a few things.  First I copy the image over to my KVM hypervisor host.  Next I ensure the virtual machine is powered off; I can use ipmitool here because I am leveraging virtual BMC.  Then I use the virsh command to change the media to the ISO I copied over.  I also format the virtual machine's disk image so that I do not have to mess around with boot order: the primary disk will be skipped because it's empty and the CDROM will boot.  Finally I power on the host to initiate the discovery phase.  At this point we have to wait for the node to boot up and report what was discovered from an introspection perspective.  I usually wait 5 minutes before proceeding, hence the sleep command.

$ scp ~/discovery-image-kni1.iso root@192.168.0.5:/slowdata/images/

$ /usr/bin/ipmitool -I lanplus -H192.168.0.10 -p6252 -Uadmin -Ppassword chassis power off

$ ssh root@192.168.0.5 "virsh change-media rhacm-master-0 hda /slowdata/images/discovery-image-kni1.iso"

$ ssh root@192.168.0.5 "virt-format --format=raw --partition=none -a /fastdata2/images/master-0.img"

$ /usr/bin/ipmitool -I lanplus -H192.168.0.10 -p6252 -Uadmin -Ppassword chassis power on

$ sleep 300
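
A fixed sleep works, but a polling loop can move on as soon as the node has reported in and can give up cleanly if it never does.  The sketch below is a generic retry helper; wait_for is a hypothetical name of my own, and the check it runs is left to the caller:

```shell
# Hypothetical helper: run a command repeatedly until it succeeds.
# $1 = maximum attempts, $2 = seconds between attempts, rest = command to run.
wait_for() {
  wf_attempts=$1
  wf_delay=$2
  shift 2
  wf_count=0
  while [ "$wf_count" -lt "$wf_attempts" ]; do
    if "$@"; then
      return 0
    fi
    wf_count=$((wf_count + 1))
    # Only pause when another attempt is still to come.
    if [ "$wf_count" -lt "$wf_attempts" ]; then
      sleep "$wf_delay"
    fi
  done
  return 1
}

# Example usage, assuming a function host_reported that queries the cluster
# and succeeds once at least one host is listed (an assumption, e.g. a curl
# GET on the cluster piped to: jq -e '.hosts | length > 0'):
# wait_for 30 10 host_reported || echo "node never reported in" >&2
```

With 30 attempts and a 10 second delay this waits roughly the same 5 minutes at most, but returns early when the check passes.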


After 5 minutes the node should have reported in to the Assisted Installer portal.  An inventory of the machine's hardware and capabilities is provided in the portal.  We can now proceed with the deployment.

First though we need to ensure the hostname is set correctly.  With DHCP it was automatically being set but since we used a static IP I found I needed to set it manually.  To do this we will patch the installation and set the requested_hostname:

$ curl -X PATCH \
  "https://$ASSISTED_SERVICE_API/api/assisted-install/v1/clusters/$CLUSTER_ID" \
  -H "accept: application/json" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $TOKEN" \
  -d "{ \"requested_hostname\": \"$SNO_STATICIP_NODE_NAME.$CLUSTER_NAME.$CLUSTER_DOMAIN\"}" | jq

(...)
"requested_hostname": "master-0.kni1.schmaustech.com",
(...)

We also need to patch the machine network to the appropriate network:

$ curl -X PATCH \
  "https://$ASSISTED_SERVICE_API/api/assisted-install/v1/clusters/$CLUSTER_ID" \
  -H "accept: application/json" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $TOKEN" \
  -d "{ \"machine_network_cidr\": \"$MACHINE_CIDR_NET\"}" | jq

(...)
"machine_network_cidr": "192.168.0.0/24",
(...)
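
The machine network has to contain the static IP assigned to the node in sno-server.yaml (192.168.0.204 in my case).  As a quick sanity check before patching, the membership test can be sketched in plain shell arithmetic.  The function names ip_to_int and in_cidr are hypothetical, and the sketch assumes well-formed IPv4 input without leading zeros:

```shell
# Convert a dotted-quad IPv4 address to a single integer.
# The body runs in a subshell so the IFS change does not leak out.
ip_to_int() (
  IFS=.
  set -- $1
  echo $(( ($1 << 24) + ($2 << 16) + ($3 << 8) + $4 ))
)

# Succeeds if address $1 falls inside the CIDR block $2 (e.g. 192.168.0.0/24).
in_cidr() {
  net=${2%/*}
  bits=${2#*/}
  # Build the netmask from the prefix length, limited to 32 bits.
  mask=$(( (0xFFFFFFFF << (32 - $bits)) & 0xFFFFFFFF ))
  [ $(( $(ip_to_int "$1") & $mask )) -eq $(( $(ip_to_int "$net") & $mask )) ]
}

# Example usage:
# in_cidr 192.168.0.204 "$MACHINE_CIDR_NET" || echo "IP outside machine network" >&2
```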

After all of the preparation we can finally run the curl POST command that actually starts the installation process:

$ curl -X POST \
  "https://$ASSISTED_SERVICE_API/api/assisted-install/v1/clusters/$CLUSTER_ID/actions/install" \
  -H "accept: application/json" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $TOKEN" | jq

(...)
  "status": "preparing-for-installation",
  "status_info": "Preparing cluster for installation",
  "status_updated_at": "2021-08-06T20:56:17.565Z",
(...)


The installation process takes about 60 minutes to complete, so go grab lunch or a cup of coffee.

After 60 minutes or so we can check and see if the cluster is installed or still in progress.  The first thing we should do though is refresh our token again:

$ export TOKEN=$(curl \
--silent \
--data-urlencode "grant_type=refresh_token" \
--data-urlencode "client_id=cloud-services" \
--data-urlencode "refresh_token=${OFFLINE_ACCESS_TOKEN}" \
https://sso.redhat.com/auth/realms/redhat-external/protocol/openid-connect/token | \
jq -r .access_token)

After we have refreshed our token let's go ahead and confirm whether the cluster has indeed finished installing.  We can achieve this by doing a curl GET against the cluster ID.  There will be a lot of output, but we are specifically looking for the status and status_info lines:

$ curl -s -X GET \
   -H "Content-Type: application/json" \
   -H "Authorization: Bearer $TOKEN" \
   "https://$ASSISTED_SERVICE_API/api/assisted-install/v1/clusters/$CLUSTER_ID" | jq .

(...)
  "status": "installed",
  "status_info": "Cluster is installed",
  "status_updated_at": "2021-08-06T21:45:04.375Z",
(...)
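
When scripting this check it helps to turn the status string into an exit code rather than reading the output by eye.  The sketch below uses a hypothetical function install_result.  Only installed and preparing-for-installation appear in my output above; treating error and cancelled as failure states is an assumption that should be checked against the API definition:

```shell
# Map a cluster status string to an exit code:
# 0 = installed, 1 = still in progress, 2 = failed.
# "error" and "cancelled" are assumed failure states, not taken from the
# output shown in this article.
install_result() {
  case "$1" in
    installed) return 0 ;;
    error|cancelled) return 2 ;;
    *) return 1 ;;
  esac
}

# Example usage, reusing the GET request above with jq -r .status:
# STATUS=$(curl -s -H "Authorization: Bearer $TOKEN" \
#   "https://$ASSISTED_SERVICE_API/api/assisted-install/v1/clusters/$CLUSTER_ID" | jq -r .status)
# install_result "$STATUS" && echo "cluster is ready"
```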

From the output above my cluster has completed so now I can pull my kubeconfig down and save it to a file:

$ curl -s -X GET \
  "https://$ASSISTED_SERVICE_API/api/assisted-install/v1/clusters/$CLUSTER_ID/downloads/kubeconfig" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer $TOKEN" \
  -o ~/kubeconfig-kni1

Now let's export the KUBECONFIG variable and look at the cluster with some oc commands:

$ export KUBECONFIG=~/kubeconfig-kni1

$ oc get nodes -o wide
NAME                            STATUS   ROLES           AGE    VERSION           INTERNAL-IP     EXTERNAL-IP   OS-IMAGE                                                       KERNEL-VERSION                 CONTAINER-RUNTIME
master-0.kni1.schmaustech.com   Ready    master,worker   156m   v1.21.1+051ac4f   192.168.0.204   none        Red Hat Enterprise Linux CoreOS 48.84.202107202156-0 (Ootpa)   4.18.0-305.10.2.el8_4.x86_64   cri-o://1.21.2-5.rhaos4.8.gitb27d974.el8

$ oc get co
NAME                                       VERSION   AVAILABLE   PROGRESSING   DEGRADED   SINCE
authentication                             4.8.2     True        False         False      134m
baremetal                                  4.8.2     True        False         False      144m
cloud-credential                           4.8.2     True        False         False      147m
cluster-autoscaler                         4.8.2     True        False         False      146m
config-operator                            4.8.2     True        False         False      151m
console                                    4.8.2     True        False         False      135m
csi-snapshot-controller                    4.8.2     True        False         False      147m
dns                                        4.8.2     True        False         False      144m
etcd                                       4.8.2     True        False         False      146m
image-registry                             4.8.2     True        False         False      141m
ingress                                    4.8.2     True        False         False      141m
insights                                   4.8.2     True        False         False      135m
kube-apiserver                             4.8.2     True        False         False      143m
kube-controller-manager                    4.8.2     True        False         False      143m
kube-scheduler                             4.8.2     True        False         False      143m
kube-storage-version-migrator              4.8.2     True        False         False      151m
machine-api                                4.8.2     True        False         False      146m
machine-approver                           4.8.2     True        False         False      147m
machine-config                             4.8.2     True        False         False      143m
marketplace                                4.8.2     True        False         False      144m
monitoring                                 4.8.2     True        False         False      138m
network                                    4.8.2     True        False         False      152m
node-tuning                                4.8.2     True        False         False      146m
openshift-apiserver                        4.8.2     True        False         False      143m
openshift-controller-manager               4.8.2     True        False         False      146m
openshift-samples                          4.8.2     True        False         False      142m
operator-lifecycle-manager                 4.8.2     True        False         False      144m
operator-lifecycle-manager-catalog         4.8.2     True        False         False      147m
operator-lifecycle-manager-packageserver   4.8.2     True        False         False      144m
service-ca                                 4.8.2     True        False         False      151m
storage                                    4.8.2     True        False         False      146m

Everything looks good with this example Single Node OpenShift installation!  If you are interested in pursuing more complex examples, it is worth looking at what else is available in the Assisted Installer REST API.  To do that take a look at this swagger.yaml file and use it with the online Swagger Editor.